Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-04 Thread Boszormenyi Zoltan

On 2013-04-03 20:58, Gavin Flower wrote:

On 04/04/13 05:36, David E. Wheeler wrote:

On Apr 3, 2013, at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:


Fortran ... Basic ... actually I'd have thought that zero was a
minority position.  Fashions change I guess.

I say we turn the default lower bound up to 11.

David


In keeping with the level of irrationality in this thread, maybe we should set it to an 
irrational number like the square root of 2, or transcend ourselves and make it a 
transcendental number like pi!  :-)


I suppose using the square root of minus one would be considered too 
imaginative???  :-)


Nah, that would make arrays have 2 dimensions as a minimum... :-)




Cheers,
Gavin



--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-04-04 Thread Amit Kapila
On Thursday, April 04, 2013 2:52 AM Robert Haas wrote:
 On Wed, Apr 3, 2013 at 2:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Tue, Apr 2, 2013 at 12:19 PM, Peter Eisentraut pete...@gmx.net
 wrote:
  It's weird that SET LOCAL and SET SESSION actually *set* the value, and
  the second key word determines how long the setting will last.  SET
  PERSISTENT doesn't actually set the value.  I predict that this will be
  a new favorite help-it-doesn't-work FAQ.
 
  I think this is another argument against this particular syntax.  I
  have always thought that something along the lines of ALTER SYSTEM
  would be more appropriate.  ALTER DATABASE .. SET and ALTER ROLE ..
  SET don't change the value immediately either, and nobody gets
  confused about that to my knowledge.  But I can see where SET
  PERSISTENT could cause that sort of confusion.
 
  Yeah, I think I argued for using the SET syntax to start with, but
  I'm coming around to the position that SET PERSISTENT is too much
  unlike the behavior of other varieties of SET.  ALTER is sounding
  more attractive to me now.  Not sure about ALTER SYSTEM in
 particular
  though --- it's not clear that that has any real merit other than
  already existing as a keyword.  (Not that that's negligible.)
  ALTER CONFIGURATION is another alternative using an existing keyword
  that might be worth considering.
 
 Yeah, I thought about something like that.  Aside from saving on
 keywords, the reason I like ALTER SYSTEM or similar is that I suspect
 there will be other system-wide things that we may want to let people
 ALTER in the future, so I think that route might avoid an unnecessary
 proliferation of top-level commands.  I am not, however, deadly
 attached to the idea, if someone's got a good reason for preferring
 something else.

I think a second parameter in the SET command telling the scope should be
fine. As far as I can see, Oracle also has similar syntax for its ALTER
SYSTEM command (ALTER SYSTEM ... SCOPE [MEMORY | SPFILE | BOTH]).
Description in short:
SPFILE indicates that the change is made in the server parameter file. The
new setting takes effect when the database is next shut down and started up
again.
MEMORY indicates that the change is made in memory, takes effect
immediately, and persists until the database is shut down.

The only reason to show the above example is that a second parameter telling
the scope exists in other databases as well.


However, if you are not convinced by the above reasoning, then the ALTER
syntax can be as follows:
ALTER SYSTEM SET configuration_parameter {TO | =} {value | 'value'};

With Regards,
Amit Kapila.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-04 Thread Amit Kapila
On Thursday, April 04, 2013 7:19 AM Greg Smith wrote:
 On 4/2/13 11:54 AM, Robert Haas wrote:
  But, having said that, I still think the best idea is what Andres
  proposed, which pretty much matches my own thoughts: the bgwriter
  needs to populate the free list, so that buffer allocations don't
 have
  to wait for linear scans of the buffer array.
 
 I was hoping this one would make it to a full six years of being on the
 TODO list before it came up again, missed it by a few weeks.  The
 funniest part is that Amit even submitted a patch on this theme a few
 months ago without much feedback:
 http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs
   That stalled where a few things have, on a) needing more regression
 test workloads, and b) wondering just what the deal with large
 shared_buffers setting degrading performance was.

For b), below are links to results where performance decreased due to large shared_buffers.

http://www.postgresql.org/message-id/attachment/27489/Results.htm
http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C38285442C5@szxeml509-mbx


As per my observation, it occurs when I/O starts. The dip could be due to
fluctuation, to OS scheduling, or to dirty pages being evicted sooner than
they would be otherwise.

I think further investigation would be more meaningful if the results could
be reproduced by someone other than me.

One idea for proceeding along this line could be to start with this patch and
then, based on the results, do further experiments to make it more useful.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-04 Thread Nicolas Barbier
2013/4/3 Tom Lane t...@sss.pgh.pa.us:

 Kevin Grittner kgri...@ymail.com writes:

 To be honest, I don't think I've personally seen a single use case
 for matviews where they could be used if you couldn't count on an
 error if attempting to use them without the contents reflecting a
 materialization of the associated query at *some* point in time.

 Well, if we remove the WITH NO DATA clause from CREATE MATERIALIZED
 VIEW, that minimum requirement is satisfied no?

An argument against that is that computing the contents may be very expensive.

 Granting that throwing an error is actually of some use to some people,
 I would not think that people would want to turn it on via a command
 that throws away the existing view contents altogether, nor turn it off
 with a full-throated REFRESH.  There are going to need to be ways to
 incrementally update matviews, and ways to disable/enable access that
 are not tied to a complete rebuild, not to mention being based on
 user-determined rather than hard-wired criteria for what's too stale.
 So I don't think this is a useful base to build on.

Am I correct when I think that you are saying here that the “zero
pages == unscannable” logic is not very future-proof? In that case I
concur, and I also think that this knowledge leaks into way too many
other places (the VACUUM bug mentioned by Kevin is a good example).

 If you feel that scannability disable is an absolute must for version 0,
 let's invent a matview reloption or some such to implement it and let
 users turn it on and off as they wish.  That seems a lot more likely
 to still be useful two years from now.

(In the context of making an unlogged matview unscannable after a crash:)

Is it imaginable that such a reloption could (in a future
implementation) be changed during or right after crash recovery? For
example, by storing the set of “truncated by crash recovery” relations
in a shared catalog table, which is then inspected when connecting to
a database to continue the truncation (in the case of a matview by
making it unscannable)?

 And if you're absolutely convinced that unlogged matviews mustn't work as I
 suggest, we can lose those from 9.3, too.

+1. Having unlogged matviews without having incremental updates yet
isn’t super useful anyway.

 What I'd actually rather see us spending time on right now is making
 some provision for incremental updates, which I will boldly propose
 could be supported by user-written triggers on the underlying tables
 if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
 matviews, and allowed them to operate on a matview's contents just like
 it was a table.  Now admittedly that would foreclose allowing matviews
 to be updatable in the updatable-view sense, but that's a feature I
 would readily give up if it meant users could build incremental update
 mechanisms this year and not two years down the road.

Please make the syntax for updating the “extent” (physical
representation) of a matview different from updating the view’s
logical contents. Examples:

(1) Require to use a special function to update the extent:

SELECT pg_mv_maintain('INSERT INTO example_matview ...');

While parsing the INSERT, the parser would know that it must interpret
“example_matview” as the matview’s extent. As the extent and the view are
currently the same, nothing must be done except allowing the INSERT only
when it is parsed in the context of pg_mv_maintain, and otherwise saying
that matviews aren’t updatable yet (“NOTICE: did you mean to update the
extent? In that case use pg_mv_maintain”).

(2) Use a different schema (cf. TOAST) for the extent, e.g., view
“public.example_matview” vs. extent “pg_mv_extent.example_matview”. I
imagine future implementations to possibly require multiple extents
anyway, e.g., for storing the “not yet applied changesets” or other
intermediate things.

 Why exactly do you think that what I'm suggesting would cause a dump and
 reload to not regenerate the data?

Expensiveness: Matviews are used in cases where the data is expensive
to compute.

 We *need* to get rid of that aspect of things.  If you must have
 scannability state in version 0, okay, but it has to be a catalog property
 not this.

+1

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE EXTENSION BLOCKS

2013-04-04 Thread Dimitri Fontaine
Hi,

I thought we were more specific in our documentation about the extension
object itself not living in a schema, but I agree we still have room
for progress here.

David E. Wheeler da...@justatheory.com writes:
 +Note that only the extension objects will be placed into the named
 +schema; the extension itself is a database-global object.

I think you're patching the right place, but I'm not sure about the term
database-global object, which I can't find by grepping in sgml/ref.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Simon Riggs
On 4 April 2013 02:39, Andres Freund and...@2ndquadrant.com wrote:

 Ok, I think I see the bug. And I think its been introduced in the
 checkpoints patch.

Well spotted. (I think you mean checksums patch).

 If by now the first backend has proceeded to PageSetLSN() we are writing
 different data to disk than the one we computed the checksum of
 before. Boom.

Right, so nothing else we were doing was wrong; that's why we couldn't
spot a bug. The problem is that we aren't replaying enough WAL because
the checksum on the WAL record is broken.

 I think the whole locking interactions in MarkBufferDirtyHint() need to
 be thought over pretty carefully.

When we write out a buffer with checksums enabled, we take a copy of
the buffer so that the checksum is consistent, even while other
backends may be writing hints to the same buffer.

I missed out on doing that with XLOG_HINT records, so the WAL CRC can
be incorrect because the data is scanned twice; normally that would be
OK because we have an exclusive lock on the block, but with hints we
only have share lock. So what we need to do is take a copy of the
buffer before we do XLogInsert().

Simple patch to do this attached for discussion. (Not tested).

We might also do this by modifying the WAL record to take the whole
block and bypass the BkpBlock mechanism entirely. But that's more work
and doesn't seem like it would be any cleaner. I figure let's solve the
problem first, then discuss which approach is best.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


copy_before_XLOG_HINT.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-04 Thread Robert Haas
On Wed, Apr 3, 2013 at 9:49 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 4/2/13 11:54 AM, Robert Haas wrote:
 But, having said that, I still think the best idea is what Andres
 proposed, which pretty much matches my own thoughts: the bgwriter
 needs to populate the free list, so that buffer allocations don't have
 to wait for linear scans of the buffer array.

 I was hoping this one would make it to a full six years of being on the TODO
 list before it came up again, missed it by a few weeks.  The funniest part
 is that Amit even submitted a patch on this theme a few months ago without
 much feedback:
 http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs
 That stalled where a few things have, on a) needing more regression test
 workloads, and b) wondering just what the deal with large shared_buffers
 setting degrading performance was.

Those are impressive results.  I think we should seriously consider
doing something like that for 9.4.  TBH, although more workloads to
test is always better, I don't think this problem is so difficult that
we can't have some confidence in a theoretical analysis.  If I read
the original thread correctly (and I haven't looked at the patch
itself), the proposed patch would actually invalidate buffers before
putting them on the freelist.  That effectively amounts to reducing
shared_buffers, so workloads that are just on the edge of what can fit
in shared_buffers will be harmed, and those that benefit incrementally
from increased shared_buffers will be as well.

What I think we should do instead is collect the buffers that we think
are evictable and stuff them onto the freelist without invalidating
them.  When a backend allocates from the freelist, it can double-check
that the buffer still has usage_count 0.  The odds should be pretty
good.  But even if we sometimes notice that the buffer has been
touched again after being put on the freelist, we haven't expended all
that much extra effort, and that effort happened mostly in the
background.  Consider a scenario where only 10% of the buffers have
usage count 0 (which is not unrealistic).  We scan 5000 buffers and
put 500 on the freelist.  Now suppose that, due to some accident of
the workload, 75% of those buffers get touched again before they're
allocated off the freelist (which I believe to be a pessimistic
estimate for most workloads).  Now, that means that only 125 of those
500 buffers will succeed in satisfying an allocation request.  That's
still a huge win, because it means that each backend only has to examine
an average of 4 buffers before it finds one to allocate.  If it had
needed to do the freelist scan itself, it would have had to touch 40
buffers before finding one to allocate.

In real life, I think the gains are apt to be, if anything, larger.
IME, it's common for most or all of the buffer pool to be pinned at
usage count 5.  So you could easily have a situation where the arena
scan has to visit millions of buffers to find one to allocate.  If
that's happening in the background instead of the foreground, it's a
huge win.  Also, note that there's nothing to prevent the arena scan
from happening in parallel with allocations off of the freelist - so
while foreground processes are emptying the freelist, the background
process can be looking for more things to add to it.
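
To make the scheme concrete, here is a rough, self-contained sketch (hypothetical
names, single-threaded, with locking and the clock-sweep fallback omitted -- this is
not PostgreSQL's actual buffer-manager code): the background writer lists evictable
buffers without invalidating them, and an allocating backend re-checks usage_count
before accepting one.

#include <stddef.h>

#define NBUFFERS 16384

typedef struct BufDesc
{
    int     buf_id;
    int     usage_count;    /* clock-sweep usage counter */
    int     free_next;      /* next free-list entry, or -1 if not listed */
} BufDesc;

static BufDesc buffers[NBUFFERS];   /* free_next must be initialized to -1 */
static int     freelist_head = -1;

/* Background writer: list still-valid buffers with usage_count == 0. */
static void
bgwriter_fill_freelist(int start, int count)
{
    int     i;

    for (i = 0; i < count; i++)
    {
        BufDesc *buf = &buffers[(start + i) % NBUFFERS];

        if (buf->usage_count == 0 && buf->free_next == -1)
        {
            buf->free_next = freelist_head;
            freelist_head = buf->buf_id;
        }
    }
}

/* Backend: pop entries until one is still evictable. */
static BufDesc *
backend_get_victim(void)
{
    while (freelist_head != -1)
    {
        BufDesc *buf = &buffers[freelist_head];

        freelist_head = buf->free_next;
        buf->free_next = -1;

        if (buf->usage_count == 0)
            return buf;         /* still untouched since being listed */
        /* touched again in the meantime: skip it, cheaply */
    }
    return NULL;                /* caller falls back to the clock sweep */
}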

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Number of spinlocks

2013-04-04 Thread Heikki Linnakangas
With --disable-spinlocks, we need to know the number of spinlocks in the 
system at startup, so that we can reserve enough semaphores to mimic the 
spinlocks. It's calculated in SpinlockSemas():



/*
 * Report number of semaphores needed to support spinlocks.
 */
int
SpinlockSemas(void)
{
/*
 * It would be cleaner to distribute this logic into the affected 
modules,
 * similar to the way shmem space estimation is handled.
 *
 * For now, though, we just need a few spinlocks (10 should be plenty)
 * plus one for each LWLock and one for each buffer header. Plus one
 * for each XLog insertion slot in xlog.c.
 */
return NumLWLocks() + NBuffers + 10 + NUM_XLOGINSERT_SLOTS;
}


Ten spinlocks might've been plenty in 2001 when that comment was 
written, but we have reached that number. I grepped the sources for 
SpinLockInit, and found that we use:


1 spinlock for each LWLock
1 spinlock for each buffer
1 spinlock for each wal sender process slot (max_wal_senders in total)

1 spinlock for each partitioned hash table (2 in predicate.c, 2 in 
lock.c, 1 in buf_table.c)

1 spinlock in XLogCtl->info_lck
1 spinlock for WAL receiver (WalRcv->mutex)
1 spinlock for hot standby xid tracking (procArray->known_assigned_xids_lck)
1 spinlock for shared memory allocator (ShmemLock)
1 spinlock for shared inval messaging (shmInvalBuffer->msgnumLock)
1 spinlock for the proc array freelist (ProcStructLock)
1 spinlock for fast-path lock mechanism (FastPathStrongRelationLocks->mutex)
1 spinlock for the checkpointer (CheckpointerShmem->ckpt_lck)

That's a fixed number of 13 spinlocks, plus 1 for each LWLock, buffer, 
and wal sender.


I'll go and adjust SpinlockSemas() to take the walsenders into account, 
and bump the fixed number from 10 to 30. That should be enough headroom 
for the next ten years.
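
For reference, a minimal sketch of what the adjusted estimate might look like
(an assumption about the eventual patch, not the committed code), keeping the
per-LWLock and per-buffer terms from the function quoted above, adding one
semaphore per WAL sender slot, and bumping the fixed allowance to 30:

int
SpinlockSemas(void)
{
    /*
     * Sketch only: one semaphore per LWLock, per buffer header, per WAL
     * sender slot and per XLog insertion slot, plus a fixed allowance of
     * 30 for the miscellaneous spinlocks listed above.
     */
    return NumLWLocks() + NBuffers + max_wal_senders +
        NUM_XLOGINSERT_SLOTS + 30;
}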


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Andres Freund
On 2013-04-04 13:30:40 +0100, Simon Riggs wrote:
 On 4 April 2013 02:39, Andres Freund and...@2ndquadrant.com wrote:

  Ok, I think I see the bug. And I think its been introduced in the
  checkpoints patch.

 Well spotted. (I think you mean checksums patch).

Heh, yes. I was slightly tired at that point ;)

  If by now the first backend has proceeded to PageSetLSN() we are writing
  different data to disk than the one we computed the checksum of
  before. Boom.

 Right, so nothing else we were doing was wrong; that's why we couldn't
 spot a bug. The problem is that we aren't replaying enough WAL because
 the checksum on the WAL record is broken.

Well, there might be other bugs we can't see yet... But let's hope not.

  I think the whole locking interactions in MarkBufferDirtyHint() need to
  be thought over pretty carefully.

 When we write out a buffer with checksums enabled, we take a copy of
 the buffer so that the checksum is consistent, even while other
 backends may be writing hints to the same buffer.

 I missed out on doing that with XLOG_HINT records, so the WAL CRC can
 be incorrect because the data is scanned twice; normally that would be
 OK because we have an exclusive lock on the block, but with hints we
 only have share lock. So what we need to do is take a copy of the
 buffer before we do XLogInsert().

 Simple patch to do this attached for discussion. (Not tested).

 We might also do this by modifying the WAL record to take the whole
 block and bypass the BkpBlock mechanism entirely. But that's more work
 and doesn't seem like it would be any cleaner. I figure let's solve the
 problem first, then discuss which approach is best.

Unfortunately I find that approach unacceptably ugly. I don't think it's
ok to make an already *very* complicated function (XLogInsert()) in one
of the most complicated files (xlog.c) even more complicated. It
literally took me years to understand a large percentage of it.
I even think the XLOG_HINT escape hatch should be taken out and be
replaced by something outside of XLogInsert().

I don't think that approach would work with as few lines anyway, since you
need to properly pass it into XLogCheckBuffer() et al, which checks the
buffer tags and such - which it needs to properly compute the struct
BkpBlock. Also we iterate over rdata->page for the CRC computation.

I think the route you quickly sketched is more realistic. That would
remove all knowledge about XLOG_HINT from generic code, which is a very
good thing; I spent like 15 minutes yesterday wondering whether the early
return in there might be the cause of the bug...

Something like:

XLogRecPtr
XLogSaveBufferForHint(Buffer buffer)
{
    XLogRecPtr  recptr = InvalidXLogRecPtr;
    XLogRecPtr  lsn;
    XLogRecData rdata[2];
    BkpBlock    bkp;

    /*
     * make sure no checkpoint can happen while we decide about logging
     * something or not, since we don't hold any lock preventing that
     * otherwise.
     */
    Assert(MyPgXact->delayChkpt);

    /* update RedoRecPtr */
    GetRedoRecPtr();

    /* setup phony rdata element */
    rdata[0].buffer = buffer;
    rdata[0].buffer_std = true; /* is that actually ok? */

    /* force full pages on, otherwise checksums won't work? */
    if (XLogCheckBuffer(rdata, true, &lsn, &bkp))
    {
        char        copied_buffer[BLCKSZ];

        /*
         * copy buffer so we don't have to worry about concurrent hint bit
         * or lsn updates. We assume pd_lower/upper cannot be changed
         * without an exclusive lock, so the contents of bkp are not racy.
         */
        memcpy(copied_buffer, BufferGetBlock(buffer), BLCKSZ);

        rdata[0].data = (char *) &bkp;
        rdata[0].len = sizeof(BkpBlock);
        rdata[0].buffer = InvalidBuffer;
        rdata[0].buffer_std = false;
        rdata[0].next = &(rdata[1]);

        rdata[1].data = copied_buffer;
        rdata[1].len = BLCKSZ;
        rdata[1].buffer = InvalidBuffer;
        rdata[1].buffer_std = false;
        rdata[1].next = NULL;

        recptr = XLogInsert(RM_XLOG_ID, XLOG_HINT, rdata);
    }

    return recptr;
}

That should work?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why there is a PG_GETARG_UINT32 and PG_RETURN_UINT32?

2013-04-04 Thread Rodrigo Barboza
Hi guys.
I am wondering when I can use the PG_GETARG_UINT32 and PG_RETURN_UINT32.
If postgres has no unsigned int type, what is the use of these macros?


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-04 Thread Merlin Moncure
On Wed, Apr 3, 2013 at 11:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In any case, the whole exercise is pointless if we don't change the
 visible behavior of array_dims et al.  So I think the idea that this
 would be without visible consequence is silly.  What's up for argument
 is just how much incompatibility is acceptable.

The only reasonable answer for this (a provably used, non-security,
non-standards violating, non-gross functionality breakage case) is
*zero*.  Our historically cavalier attitude towards compatibility
breakage has been an immense disservice to our users and encourages
very bad upgrade habits and is, IMNSHO, embarrassing.

Changing the way array_dims works for a minor functionality
enhancement is gratuitous and should be done, if at all, via a loudly
advertised deprecation/replacement cycle with a guarding GUC (yes, I
hate them too, but not nearly as much as the expense of qualifying
vast code bases against random compatibility breakages every release).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-pass planner

2013-04-04 Thread Robert Haas
On Wed, Apr 3, 2013 at 9:40 PM, Greg Stark st...@mit.edu wrote:
 I used to advocate a similar idea. But when questioned on list I tried to
 work out the details and ran into some problem coming up with a concrete
 plan.

 How do you compare a plan that you think has a 99% chance of running in 1ms
 but a 1% chance of taking 1s against a plan that has a 90% chance of 1ms and
 a 10% chance of taking 100ms? Which one is actually riskier? They might even
 both have the same 95% percentile run-time.

 And additionally there are different types of unknowns. Do you want to treat
 plans where we have a statistical sample that gives us a probabilistic
 answer the same as plans where we think our model just has a 10% chance of
 being wrong? The model is going to either be consistently right or
 consistently wrong for a given query but the sample will vary from run to
 run. (Or vice versa depending on the situation).

One idea that someone threw up against a wall recently during an
EnterpriseDB development meeting - I think it was Kevin Grittner but
it might have been Noah Misch, or some combination - was to have a GUC
for estimate_worstcase_fraction.  So, when computing the cost of a
path, we'd compute our current expected-case estimate, and also a
worst-case estimate, and then compute the final cost as:

estimated_cost = estimate_worstcase_fraction * worst_case_estimate +
(1 - estimate_worstcase_fraction) * expected_case_estimate

I think Kevin and I both have the intuition that even a rather small
value for estimate_worstcase_fraction - like 0.01 or 0.001 or even
smaller - would prevent a pretty significant fraction of the problems
people encounter in this area.  But users could change it, in general
or for particular queries, if it ended up being wrong in their
environment.
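
As a sketch (hypothetical names, not an existing planner API), the blend is just a
weighted sum, and even a small weight makes a plan with a catastrophic worst case
lose to a slightly more expensive but safer one:

typedef struct CostPair
{
    double      expected;   /* expected-case total cost */
    double      worst;      /* worst-case total cost */
} CostPair;

static double
blended_cost(CostPair c, double estimate_worstcase_fraction)
{
    return estimate_worstcase_fraction * c.worst +
        (1.0 - estimate_worstcase_fraction) * c.expected;
}

With estimate_worstcase_fraction = 0.01, a plan costing 100 expected but 1,000,000
worst case blends to about 10,099, so it loses to a plan costing 200 expected and
400 worst case (about 202).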

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE EXTENSION BLOCKS

2013-04-04 Thread David E. Wheeler
On Apr 4, 2013, at 5:16 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

 David E. Wheeler da...@justatheory.com writes:
 +Note that only the extension objects will be placed into the named
 +schema; the extension itself is a database-global object.
 
 I think you're patching the right place, but I'm not sure about the term
 database-global object, which I can't find by grepping in sgml/ref.

Yeah, I wasn't sure, either, but figured someone here would know what to call 
those sorts of things.

Thanks,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Simon Riggs
On 4 April 2013 15:53, Andres Freund and...@2ndquadrant.com wrote:

 Unfortunately I find that approach unacceptably ugly.

Yeh. If we can confirm its a fix we can discuss a cleaner patch and
that is much better.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql crash fix

2013-04-04 Thread Bruce Momjian
On Tue, Apr  2, 2013 at 08:48:53PM -0400, Bruce Momjian wrote:
 I found that psql will crash if given a PSQLRC value containing a tilde:
 
   $ PSQLRC=~/x psql test
   *** glibc detected *** psql: free(): invalid pointer: 
 0x7fffb7c933ec ***
 
 This is on Debian Squeeze 6.0.7.  The fix is to pstrdup() the value
 returned by getenv(), so it can be free()'ed later --- you can't free
 getenv()-returned values:
 
As typically implemented, getenv() returns a pointer to a string
within the environment list.  The caller must take care not to
modify this string, since that would change the environment of
the process.
 
 This bug exists in 9.2 and git head.  I also removed the return value
 from expand_tilde() as no caller was using it.

Applied.
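
The fix described above boils down to roughly this pattern (a sketch using psql's
expand_tilde() and process_psqlrc_file() helpers, not the committed hunk):

    char       *envrc = getenv("PSQLRC");

    if (envrc != NULL && strlen(envrc) > 0)
    {
        /*
         * expand_tilde() may free and replace the string it is given, so it
         * must get a freshly allocated copy, never the getenv() result.
         */
        char       *fname = pstrdup(envrc);

        expand_tilde(&fname);
        process_psqlrc_file(fname);
    }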

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regex with 32k different chars causes a backend crash

2013-04-04 Thread Heikki Linnakangas

On 04.04.2013 03:32, Noah Misch wrote:

On Wed, Apr 03, 2013 at 08:09:15PM +0300, Heikki Linnakangas wrote:

--- a/src/include/regex/regguts.h
+++ b/src/include/regex/regguts.h
@@ -148,6 +148,7 @@
  typedef short color;  /* colors of characters */
  typedef int pcolor;   /* what color promotes to */

+#define MAX_COLOR  32767   /* max value that fits in 'color' datatype */


This should use SHRT_MAX, no?  (Not that any supported platform differs here.)


I considered that, but I got all confused on whether limits.h needs to 
be included and where, if we use that. So I just used a constant 32767 
in the end. Committed that way.


I opened a ticket in the TCL bug tracker for this: 
https://sourceforge.net/tracker/?func=detail&aid=3610026&group_id=10894&atid=110894. 



- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] puzzling JSON bug

2013-04-04 Thread Andrew Dunstan

David Wheeler has presented me with a nasty bug case.

If I do this:

   select '{"members": { "add": [3, 4]}}'::json #> '{members,add}';

then I get a crash.

If I comment out the pfree() at json.c:parse_object_field() lines 378-9 
then I get back the right result but instead get a warning like this:


   WARNING:  problem in alloc set ExprContext: bogus aset link in block
   0x1efaa80, chunk 0x1efb1f0


I'm not quite sure where I should go looking for what I've done wrong here.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Join cost estimates

2013-04-04 Thread Stephen Frost
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 So that's at least going in the right direction.

I agree that this is going in the right direction; it certainly would
make the plan that I *expect* to be chosen more likely, however..

I've been fiddling with this on the very much larger overall database
where this test case came from and have found that hashing the large
table can actually be *faster* and appears to cause a more consistent
and constant amount of disk i/o (which is good).

The test case exhibits a bit of why this is the case- the per-tuple hash
lookup is way closer to the per-tuple cost of building the hash table
than I'd expect.

per-tuple cost to build the hash table (41M tuples): 0.33us
per-tuple cost to scan/do hash lookups (41M tuples): 0.29us
  (with a small hash table of only 41K entries)

The total difference being: 1233.854 vs. 1428.424, or only 194.570ms in
favor of scanning the big table instead of hashing it.

These numbers are just from those posted with my original email:

http://explain.depesz.com/s/FEq
http://explain.depesz.com/s/FOU

I've seen much worse though- I have one case where hash-the-big-table
took 5s and hash-the-small-table took almost 10s (total times).  I'm
trying to see if I can pull that out and isolate how it's different (and
see if it was just due to other load on the box).

What I'm trying to get at in this overall email is: why in the world is
it so expensive to do hash lookups?  I would have expected the cost of
the hash table to be *much* more than the cost to do a hash lookup, and
that doing hash lookups against a small hash table would be fast enough
to put serious pressure on the i/o subsystem.  Instead, the building of
the hash table actually puts more pressure and can end up being more
efficient overall.  We have a process that basically does this a whole
bunch and the hash-the-big-table operation takes about 4.7 hrs, while
the hash-the-small-table approach went well past 5 hours and was only
about 70% complete.

Thoughts?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-04 Thread Kevin Grittner
Early versions of the matview patch had a relisvalid flag in
pg_class to determine whether the relation was scannable.  The name
was chosen based on a similarity to the purpose of indisvalid,
although the proliferation of new bools for related issues left me
wondering if a char would be a better idea.  Based on on-list
reviews I removed that in favor of basing the state on a
zero-length heap file, in an attempt to work better with unlogged
matviews.  I can easily look back through my development branch to
find the commits which made this change and revert them if this
approach is preferred.

I realize this would need to be Real Soon Now, but before reverting
to the earlier code I want to allow a *little* more time for
opinions.

Responses to particular points below.


Tom Lane t...@sss.pgh.pa.us wrote:

 Granting that throwing an error is actually of some use to some people,
 I would not think that people would want to turn it on via a command
 that throws away the existing view contents altogether, nor turn it off
 with a full-throated REFRESH.

There is no doubt that in future versions we will want to be able
to disallow scans based on other criteria than just whether the
data is valid as of *some* point in time.  I can't imagine a
circumstance under which we would want to allow scans if it
isn't.  So, at least for this release and as a default for future
releases, I think it makes sense that a matview last CREATEd or
REFRESHed WITH NO DATA should not be scannable.  Additional knobs
for the users to control this can and should be added in future
releases.

 There are going to need to be ways to incrementally update
 matviews, and ways to disable/enable access that are not tied to
 a complete rebuild, not to mention being based on user-determined
 rather than hard-wired criteria for what's too stale.

Absolutely.  So far as I know, nobody has ever suggested or
expected otherwise.  This paper provides a useful summary of
techniques for incremental updates, with references to more
detailed papers:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254rep=rep1type=pdf

I expect to be working on implementing the most obvious special
cases and a catch-all general solution for 9.4.  Efficient
incremental update seems to depend on the ability to have at least
one new hidden system column, so once we get to a good point for
discussing 9.4 issues, I'll be on about that.

 If you feel that scannability disable is an absolute must for version 0,
 let's invent a matview reloption or some such to implement it and let
 users turn it on and off as they wish.  That seems a lot more likely
 to still be useful two years from now.

What about the idea of a relisvalid bool or some char column in
pg_class?

 And if you're absolutely convinced that unlogged matviews mustn't
 work as I suggest, we can lose those from 9.3, too.

I was loath to do that, but as Nicolas points out, they really
aren't that interesting without incremental update.  Perhaps it is
better to drop those until next release and have a more
sophisticated way to deal with invalidation of those -- or as you
suggested, just punt them to be empty.  (I would hate to do that,
but it might be the lesser of evils.)

 What I'd actually rather see us spending time on right now is making
 some provision for incremental updates, which I will boldly propose
 could be supported by user-written triggers on the underlying tables
 if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
 matviews, and allowed them to operate on a matview's contents just like
 it was a table.  Now admittedly that would foreclose allowing matviews
 to be updatable in the updatable-view sense, but that's a feature I
 would readily give up if it meant users could build incremental update
 mechanisms this year and not two years down the road.

I think that should be the last resort, after we have a more
automated declarative way of maintaining the majority of cases.
Since I don't see too much there where using that technique with
matviews would give you more than you could do right now with
tables and triggers, hand-coding the incremental maintenance is
very low on my list of priorities.  In any event, I don't want to
rush into any such thing this close to release; that seems to me to
be clearly 9.4 material.

 Individual judges have a dashboard showing such things as the
 number of court cases which have gone beyond certain thresholds
 without action

 If you need 100% accuracy, which is what this scenario appears to be
 demanding, matviews are not for you (yet).  The existing implementation
 certainly is nowhere near satisfying this scenario.

No, we're talking about timelines measured in weeks or months.  A
nightly update is acceptable, although there are occasional gripes
by a judge that they would like to see the results of cleaning up
such cases sooner than the next morning.  An hour latency would
probably make them happy.  If async incremental update could give a

Re: [HACKERS] pg_dump selectively ignores extension configuration tables

2013-04-04 Thread Vibhor Kumar

On Mar 25, 2013, at 12:01 PM, Vibhor Kumar vibhor.ku...@enterprisedb.com 
wrote:

 
 On Mar 25, 2013, at 10:48 AM, Joe Conway m...@joeconway.com wrote:
 
 On 03/25/2013 08:12 AM, Vibhor Kumar wrote:
 Since, nobody has picked this one.
 
 If there is no objection,then I can test this patch against 9.1  9.2.
 
 Here are diffs for 9.1 and 9.2. The previous email was against 9.3 dev.
 
 Thanks Joe!
 
 will test both for 9.1 and 9.2

I did some testing on this patch with the 9.1 and 9.2 source code. Testing
included the following:
1. Configured PostGIS with 9.1 and 9.2.
2. Verified all switches of pg_dump with the regression db.
3. Checked other extensions, to verify whether this impacts those.

Everything is working as expected and I haven't found any issues during my
tests with this patch.

While testing this patch, some thoughts came to mind that I wanted to share
on this thread.
Would it be possible to have two switches for extensions in pg_dump:
1. An extension dump including the user data in extension tables.
2. A user-data-only dump from extensions.


Thanks  Regards,
Vibhor Kumar
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Blog:http://vibhork.blogspot.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Join cost estimates

2013-04-04 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 I've been fiddling with this on the very much larger overall database
 where this test case came from and have found that hashing the large
 table can actually be *faster* and appears to cause a more consistent
 and constant amount of disk i/o (which is good).

Interesting.

 What I'm trying to get at in this overall email is: why in the world is
 it so expensive to do hash lookups?

perf or oprofile reveal anything?

Also, I assume that the cases you are looking at are large enough that
even the small table doesn't fit in a single hash batch?  It could
well be that the answer has to do with some bogus or at least
unintuitive behavior of the batching process, and it isn't really at all
a matter of individual hash lookups being slow.

(You never did mention what work_mem setting you're testing, anyway.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Join cost estimates

2013-04-04 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
  What I'm trying to get at in this overall email is: why in the world is
  it so expensive to do hash lookups?
 
 perf or oprofile reveal anything?

Working on a test case actually- I've got one now:
http://snowman.net/~sfrost/test_case2.sql

In this example, hashing the large table is actually 2 seconds *faster*
than hashing the small table (again, all on my laptop).

 Also, I assume that the cases you are looking at are large enough that
 even the small table doesn't fit in a single hash batch?  

No, quite the opposite, sorry for not mentioning that before.  Either
side fits completely into memory w/ a single batch.  The EXPLAIN ANALYZE
output that I posted before shows that, either way, there's only one
batch involved.

 (You never did mention what work_mem setting you're testing, anyway.)

The test case above (where I got a 2s faster run time by hashing
the big table) used a work_mem of 1GB.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Multi-pass planner

2013-04-04 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 for estimate_worstcase_fraction.  So, when computing the cost of a
 path, we'd compute our current expected-case estimate, and also a
 worst-case estimate, and then compute the final cost as:

There also was the idea for the executor to be able to handle alternate
plans and some heuristic to determine that the actual cost of running a
plan is much higher than what's been estimated, so much so as to switch
to starting from scratch with the other plan instead.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Join cost estimates

2013-04-04 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 perf or oprofile reveal anything?

Here's what we get from oprofile (perhaps not too surprising):

Hash the small table / scan the big table:
samples  cum. samples  %        cum. %   linenr info               image name    symbol name
167374   167374        47.9491  47.9491  nodeHash.c:915            postgres      ExecScanHashBucket
85041    252415        24.3624  72.3115  mcount.c:60               libc-2.15.so  __mcount_internal
28370    280785         8.1274  80.4389  _mcount.S:33              libc-2.15.so  mcount
15856    296641         4.5424  84.9814  (no location information)  [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ff)  [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ff)
6291     302932         1.8022  86.7836  xact.c:682                postgres      TransactionIdIsCurrentTransactionId
4555     307487         1.3049  88.0885  instrument.c:70           postgres      InstrStopNode
3849     311336         1.1027  89.1912  heapam.c:711              postgres      heapgettup_pagemode
3567     314903         1.0219  90.2130  nodeHashjoin.c:63         postgres      ExecHashJoin

Hash the big table / scan the small table:
samples  cum. samples  %        cum. %   linenr info               image name    symbol name
112060   112060        39.2123  39.2123  mcount.c:60               libc-2.15.so  __mcount_internal
36547    148607        12.7886  52.0009  nodeHash.c:709            postgres      ExecHashTableInsert
33570    182177        11.7469  63.7477  _mcount.S:33              libc-2.15.so  mcount
16383    198560         5.7328  69.4805  (no location information)  [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ff)  [vdso] (tgid:30643 range:0x7fffe6fff000-0x7fffe6ff)
13200    211760         4.6190  74.0995  (no location information)  no-vmlinux   /no-vmlinux
6345     218105         2.2203  76.3197  xact.c:682                postgres      TransactionIdIsCurrentTransactionId
5250     223355         1.8371  78.1568  nodeHash.c:915            postgres      ExecScanHashBucket
4797     228152         1.6786  79.8354  heapam.c:711              postgres      heapgettup_pagemode
4661     232813         1.6310  81.4664  aset.c:563                postgres      AllocSetAlloc
4588     237401         1.6054  83.0718  instrument.c:70           postgres      InstrStopNode
3550     240951         1.2422  84.3140  memcpy-ssse3-back.S:60    libc-2.15.so  __memcpy_ssse3_back
3013     243964         1.0543  85.3684  aset.c:1109               postgres      AllocSetCheck

Looking at the 'Hash the small table / scan the big table', opannotate
claims that this is by far the worst offender:

147588 42.2808 :hashTuple = hashTuple->next;

While most of the time in the 'Hash the big table / scan the small
table' is in:

 34572 12.0975 :hashTuple->next = hashtable->buckets[bucketno];

Neither of those strike me as terribly informative though.  To be
honest, I've not really played w/ oprofile all that much.  Now that I've
got things set up to support this, I'd be happy to provide more info if
anyone has suggestions on how to get something more useful.

It does look like reducing bucket depth, as I outlined before through
the use of a 2-level hashing system, might help speed up
ExecScanHashBucket, as it would hopefully have very few (eg: 1-2)
entries to consider instead of more.  Along those same lines, I really
wonder if we're being too generous wrt the bucket-depth goal of '10'
instead of, say, '1', especially when we've got plenty of work_mem
available.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Hash Join cost estimates

2013-04-04 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
 85041    252415        24.3624  72.3115  mcount.c:60   libc-2.15.so  __mcount_internal
 28370    280785         8.1274  80.4389  _mcount.S:33  libc-2.15.so  mcount
[...]

And as a side-note, I'm rebuilding w/o profiling, asserts, etc and will
run it again, though I don't really expect the top-hitters to change.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] puzzling JSON bug

2013-04-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 David Wheeler has presented me with a nasty bug case.
 If I do this:

  select '{"members": { "add": [3, 4]}}'::json #> '{members,add}';

 then I get a crash.

 If I comment out the pfree() at json.c:parse_object_field() lines 378-9 
 then I get back the right result but instead get a warning like this:

 WARNING:  problem in alloc set ExprContext: bogus aset link in block
 0x1efaa80, chunk 0x1efb1f0

 I'm not quite sure where I should go looking for what I've done wrong here.

Routine array-overrun memory stomp.  The chunk header data for fname's
alloc chunk is being overwritten here:

Watchpoint 2: *(int *) 1075253088

Old value = 1074925616
New value = -1
0x50fe14 in get_array_start (state=0x40170e88) at jsonfuncs.c:688
688 _state->array_level_index[lex_level] = -1;

It appears that lex_level is 2 but only enough room for 2 entries has
been allocated in array_level_index[].

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] puzzling JSON bug

2013-04-04 Thread Andrew Dunstan


On 04/04/2013 03:39 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

David Wheeler has presented me with a nasty bug case.
If I do this:
 select '{"members": { "add": [3, 4]}}'::json #> '{members,add}';
then I get a crash.
If I comment out the pfree() at json.c:parse_object_field() lines 378-9
then I get back the right result but instead get a warning like this:
 WARNING:  problem in alloc set ExprContext: bogus aset link in block
 0x1efaa80, chunk 0x1efb1f0
I'm not quite sure where I should go looking for what I've done wrong here.

Routine array-overrun memory stomp.  The chunk header data for fname's
alloc chunk is being overwritten here:

Watchpoint 2: *(int *) 1075253088

Old value = 1074925616
New value = -1
0x50fe14 in get_array_start (state=0x40170e88) at jsonfuncs.c:688
688 _state->array_level_index[lex_level] = -1;

It appears that lex_level is 2 but only enough room for 2 entries has
been allocated in array_level_index[].



OK, many thanks, will fix.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Jeff Davis
Andres,

Thank you for diagnosing this problem!

On Thu, 2013-04-04 at 16:53 +0200, Andres Freund wrote:
 I think the route you quickly sketched is more realistic. That would
 remove all knowledge about XLOG_HINT from generic code, which is a very
 good thing; I spent like 15 minutes yesterday wondering whether the early
 return in there might be the cause of the bug...

I like this approach. It may have some performance impact though,
because there are a couple extra spinlocks taken, and an extra memcpy.
The code looks good to me except that we should be consistent about the
page hole -- XLogCheckBuffer is calculating it, but then we copy the
entire page. I don't think anything can change the size of the page hole
while we have a shared lock on the buffer, so it seems OK to skip the
page hole during the copy.
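
A sketch of what skipping the hole could look like (assuming the standard page
layout from storage/bufpage.h and that pd_lower/pd_upper cannot move while at
least a shared lock is held; an illustration, not the actual patch):

/* requires storage/bufpage.h and <string.h> */
static void
copy_page_skipping_hole(char *dst, Page src)
{
    PageHeader  hdr = (PageHeader) src;
    uint16      lower = hdr->pd_lower;  /* end of the line pointer array */
    uint16      upper = hdr->pd_upper;  /* start of the tuple data */

    memcpy(dst, src, lower);                            /* header + line pointers */
    memset(dst + lower, 0, upper - lower);              /* zero the hole */
    memcpy(dst + upper, src + upper, BLCKSZ - upper);   /* tuples + special space */
}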

Another possible approach is to drop the lock on the buffer and
re-acquire it in exclusive mode after we find out we'll need to do
XLogInsert. It means that MarkBufferDirtyHint may end up blocking for
longer, but would avoid the memcpy. I haven't really thought through the
details.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Join cost estimates

2013-04-04 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
 It does look like reducing bucket depth, as I outlined before through
 the use of a 2-level hashing system, might help speed up
 ExecScanHashBucket, as it would hopefully have very few (eg: 1-2)
 entries to consider instead of more.  Along those same lines, I really
 wonder if we're being too generous wrt the bucket-depth goal of '10'
 instead of, say, '1', especially when we've got plenty of work_mem
 available.

Rerunning using a minimally configured build (only --enable-openssl
and --enable-debug passed to configure) with NTUP_PER_BUCKET set to '1'
results in a couple of interesting things-

First, the planner actually picks the plan to hash the small table and
seqscan the big one.  That also, finally, turns out to be *faster* for
this test case.

explain analyze results here:
Hash small table / seqscan big table: http://explain.depesz.com/s/nP1
Hash big table / seqscan small table: http://explain.depesz.com/s/AUv

Here's the oprofile reports:

Hash small table / seqscan big table:
samples  cum. samples  %        cum. %   linenr info               image name    symbol name
39023    39023         52.8574  52.8574  nodeHash.c:915            postgres      ExecScanHashBucket
3743     42766          5.0700  57.9273  xact.c:682                postgres      TransactionIdIsCurrentTransactionId
3110     45876          4.2126  62.1399  nodeHashjoin.c:63         postgres      ExecHashJoin
2561     48437          3.4689  65.6088  heapam.c:711              postgres      heapgettup_pagemode
2427     50864          3.2874  68.8962  heapam.c:300              postgres      heapgetpage
2395     53259          3.2441  72.1403  heaptuple.c:1028          postgres      slot_deform_tuple
2395     55654          3.2441  75.3843  heaptuple.c:1135          postgres      slot_getattr
2383     58037          3.2278  78.6122  nodeHash.c:786            postgres      ExecHashGetHashValue
1811     59848          2.4530  81.0652  tqual.c:1044              postgres      HeapTupleSatisfiesMVCC
1796     61644          2.4327  83.4979  execScan.c:111            postgres      ExecScan
1298     62942          1.7582  85.2561  hashfunc.c:517            postgres      hash_uint32
1274     64216          1.7257  86.9817  execProcnode.c:356        postgres      ExecProcNode
1011     65227          1.3694  88.3511  heapam.c:1453             postgres      heap_getnext
905      66132          1.2258  89.5770  execTuples.c:333          postgres      ExecStoreTuple
858      66990          1.1622  90.7392  fmgr.c:1291               postgres      FunctionCall1Coll
835      67825          1.1310  91.8702  execQual.c:668            postgres      ExecEvalScalarVarFast
834      68659          1.1297  92.      mcxt.c:126                postgres      MemoryContextReset
818      69477          1.1080  94.1078  nodeSeqscan.c:48          postgres      SeqNext

Hash big table / seqscan small table:
samples  cum. samples  %        cum. %   linenr info               image name    symbol name
38612    38612         41.2901  41.2901  nodeHash.c:709            postgres      ExecHashTableInsert
7435     46047          7.9507  49.2408  (no location information)  no-vmlinux   /no-vmlinux
4900     50947          5.2399  54.4806  aset.c:563                postgres      AllocSetAlloc
3803     54750          4.0668  58.5474  xact.c:682                postgres      TransactionIdIsCurrentTransactionId
3335     58085          3.5663  62.1137  heapam.c:711              postgres      heapgettup_pagemode
2532     60617          2.7076  64.8213  nodeHash.c:786            postgres      ExecHashGetHashValue
2523     63140          2.6980  67.5193  memcpy-ssse3-back.S:60    libc-2.15.so  __memcpy_ssse3_back
2518     65658          2.6926  70.2119  heaptuple.c:1028          postgres      slot_deform_tuple
2378     68036          2.5429  72.7549  heapam.c:300              postgres      heapgetpage
2374     70410          2.5387  75.2935  heaptuple.c:1135          postgres      slot_getattr
1852     72262          1.9805  77.2740  nodeHash.c:915            postgres      ExecScanHashBucket
1831     74093          1.9580  79.2320  tqual.c:1044              postgres      HeapTupleSatisfiesMVCC
1732     75825          1.8521  81.0841  heapam.c:1453             postgres      heap_getnext
1320     77145          1.4116  82.4957  nodeHash.c:76             postgres      MultiExecHash
1219     78364          1.3035  

Re: [HACKERS] Multi-pass planner

2013-04-04 Thread Robert Haas
On Thu, Apr 4, 2013 at 2:53 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 for estimate_worstcase_fraction.  So, when computing the cost of a
 path, we'd compute our current expected-case estimate, and also a
 worst-case estimate, and then compute the final cost as:

 There also was the idea for the executor to be able to handle alternate
 plans and some heuristic to determine that the actual cost of running a
 plan is much higher than what's been estimated, so much so as to switch
 to starting from scratch with the other plan instead.

Yeah.  The thing is, if the plan has any side effects, that's not
really an option.  And even if it doesn't, it may throw away a lot of
work.  One thing we could do is switch from a unparameterized nested
loop to a hash join if the outer side turns out to be much larger than
expected, but that's only going to benefit a pretty narrow set of use
cases.  Which is why I think a planner-based approach is probably more
promising.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Join cost estimates

2013-04-04 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 Looking with opannotate, there's two main hotspots in
 ExecScanHashBucket:

  12846 17.4001 :hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];
 and
  22100 29.9348 :hashTuple = hashTuple->next;

Those are, of course, pretty trivial statements; so the way I read this
is that the fundamental slowdown comes from the hash table being large
compared to the CPU's cache, so that you're incurring lots of cache
misses at these particular fetches.  (You might be able to confirm that
if you can set oprofile to count cache misses rather than wall clock
time.)

 I'm certainly curious about those, but I'm also very interested in the
 possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable
 depending on the work_mem setting.

Not sure about that.  That would make the hash-bucket-header array
larger without changing the size of the rest of the hash table, thus
probably making the CPU cache situation worse not better (which would
manifest as more time at the first of these two statements relative to
the second).

Can you add some instrumentation code to get us information about the
average/max length of the bucket chains?  And maybe try to figure out
how many distinct hash values per bucket, which would give us a clue
whether your two-level-list idea is worth anything.
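
For the chain-length half of that, a rough sketch of the kind of instrumentation
meant (assuming the HashJoinTable/HashJoinTuple layout in executor/hashjoin.h,
called once after the hash table is built, e.g. under a debug #ifdef; counting
distinct hash values per bucket would need an extra pass comparing hashvalue
within each chain):

static void
report_bucket_chain_lengths(HashJoinTable hashtable)
{
    long        total = 0;
    int         maxlen = 0;
    int         i;

    for (i = 0; i < hashtable->nbuckets; i++)
    {
        int             len = 0;
        HashJoinTuple   tuple = hashtable->buckets[i];

        while (tuple != NULL)
        {
            len++;
            tuple = tuple->next;
        }
        total += len;
        if (len > maxlen)
            maxlen = len;
    }

    elog(LOG, "hash buckets: %d, avg chain length: %.2f, max chain length: %d",
         hashtable->nbuckets,
         (double) total / hashtable->nbuckets,
         maxlen);
}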

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-04 Thread Robert Haas
On Thu, Apr 4, 2013 at 11:10 AM, Merlin Moncure mmonc...@gmail.com wrote:
 The only reasonable answer for this (a provably used, non-security,
 non-standards violating, non-gross functionality breakage case) is
 *zero*.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clang compiler warning on 9.3 HEAD

2013-04-04 Thread Alvaro Herrera
Will Leinweber wrote:
 On ref 8507907 when compiling with clang on OS X, I got this warning which
 seems like a possible bug.
 
 I thought to report this because I imagine clang isn't frequently used
 day-to-day by most.

Ugh.  My fault.  Yes, this is a bug.

I don't see any nice way to convert ObjectType to ObjectClass or the
other way around; it seems to me the only simple way to fix this is to
add a new function EventTriggerSupportsObjectClass(), which takes
ObjectClass instead of ObjectType.  Patch attached.

Now, it annoys me that we now have three places that know about object
types supported by event triggers: there's a large struct of command tag
substrings (event_trigger_support), then there's these two functions.
It might be better to add ObjectType and ObjectClass entries to the
struct, so that only the struct needs to know about that.  The problem
is that these two functions would have to walk the struct every time,
instead of being a simple switch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/catalog/dependency.c
--- b/src/backend/catalog/dependency.c
***
*** 210,216  deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
  			ObjectAddress *thisobj = targetObjects->refs + i;
  
  			if ((!(flags & PERFORM_DELETION_INTERNAL)) &&
! 				EventTriggerSupportsObjectType(getObjectClass(thisobj)))
  			{
  EventTriggerSQLDropAddObject(thisobj);
  			}
--- 210,216 
  			ObjectAddress *thisobj = targetObjects->refs + i;
  
  			if ((!(flags & PERFORM_DELETION_INTERNAL)) &&
! 				EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
  			{
  EventTriggerSQLDropAddObject(thisobj);
  			}
*** a/src/backend/commands/event_trigger.c
--- b/src/backend/commands/event_trigger.c
***
*** 912,917  EventTriggerSupportsObjectType(ObjectType obtype)
--- 912,939 
  }
  
  /*
+  * Do event triggers support this object class?
+  */
+ bool
+ EventTriggerSupportsObjectClass(ObjectClass objclass)
+ {
+ 	switch (objclass)
+ 	{
+ 		case OCLASS_DATABASE:
+ 		case OCLASS_TBLSPACE:
+ 		case OCLASS_ROLE:
+ 			/* no support for global objects */
+ 			return false;
+ 		case OCLASS_EVENT_TRIGGER:
+ 			/* no support for event triggers on event triggers */
+ 			return false;
+ 		default:
+ 			break;
+ 	}
+ 	return true;
+ }
+ 
+ /*
   * Prepare event trigger state for a new complete query to run, if necessary;
   * returns whether this was done.  If it was, EventTriggerEndCompleteQuery must
   * be called when the query is done, regardless of whether it succeeds or fails
*** a/src/include/commands/event_trigger.h
--- b/src/include/commands/event_trigger.h
***
*** 13,18 
--- 13,19 
  #ifndef EVENT_TRIGGER_H
  #define EVENT_TRIGGER_H
  
+ #include "catalog/dependency.h"
  #include "catalog/objectaddress.h"
  #include "catalog/pg_event_trigger.h"
  #include "nodes/parsenodes.h"
***
*** 41,46  extern Oid AlterEventTriggerOwner(const char *name, Oid newOwnerId);
--- 42,48 
  extern void AlterEventTriggerOwner_oid(Oid, Oid newOwnerId);
  
  extern bool EventTriggerSupportsObjectType(ObjectType obtype);
+ extern bool EventTriggerSupportsObjectClass(ObjectClass objclass);
  extern void EventTriggerDDLCommandStart(Node *parsetree);
  extern void EventTriggerDDLCommandEnd(Node *parsetree);
  extern void EventTriggerSQLDrop(Node *parsetree);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Andres Freund
On 2013-04-04 12:59:36 -0700, Jeff Davis wrote:
 Andres,
 
 Thank you for diagnosing this problem!
 
 On Thu, 2013-04-04 at 16:53 +0200, Andres Freund wrote:
  I think the route you quickly sketched is more realistic. That would
  remove all knowledge about XLOG_HINT from generic code which is a very
  good thing, I spent like 15 minutes yesterday wondering whether the early
  return in there might be the cause of the bug...
 
 I like this approach. It may have some performance impact though,
 because there are a couple extra spinlocks taken, and an extra memcpy.

I don't think it's really slower. Earlier the code took WalInsertLock
every time, even if we ended up not logging anything. That's far more
expensive than a single spinlock. And the copy should also only be made
in the case where we need to log. So I think we end up ahead of the
current state.

 The code looks good to me except that we should be consistent about the
 page hole -- XLogCheckBuffer is calculating it, but then we copy the
 entire page. I don't think anything can change the size of the page hole
 while we have a shared lock on the buffer, so it seems OK to skip the
 page hole during the copy.

I don't think it can change either, but I doubt that there's a
performance advantage in not copying the hole. I'd guess the simpler
code ends up faster.
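
(For concreteness, the copy-then-log approach under discussion might look
roughly like the sketch below; it is simplified, assumes the 9.3-era
single-record XLogInsert/XLogRecData API and the usual backend headers, and
is not the actual patch.)

/*
 * Simplified sketch: with only a share lock on the buffer, hint bits can
 * change underneath us while the WAL CRC is being computed, so log a
 * private copy of the page rather than the shared buffer itself.
 */
static XLogRecPtr
log_hint_page_copy(Buffer buffer)
{
	char		copied_page[BLCKSZ];
	XLogRecData rdata;

	memcpy(copied_page, BufferGetPage(buffer), BLCKSZ);

	rdata.data = copied_page;
	rdata.len = BLCKSZ;
	rdata.buffer = InvalidBuffer;	/* the copy is self-contained */
	rdata.buffer_std = false;
	rdata.next = NULL;

	return XLogInsert(RM_XLOG_ID, XLOG_HINT, &rdata);
}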

 Another possible approach is to drop the lock on the buffer and
 re-acquire it in exclusive mode after we find out we'll need to do
 XLogInsert. It means that MarkBufferDirtyHint may end up blocking for
 longer, but would avoid the memcpy. I haven't really thought through the
 details.

That sounds like it would be prone to deadlocks. So I would dislike to
go there.

Will write up a patch tomorrow.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Apr 4, 2013 at 11:10 AM, Merlin Moncure mmonc...@gmail.com wrote:
 The only reasonable answer for this (a provably used, non-security,
 non-standards violating, non-gross functionality breakage case) is
 *zero*.

 +1.

Well, if we're going to take that hard a line on it, then we can't
change anything about array data storage or the existing functions'
behavior; which leaves us with either doing nothing at all, or
inventing new functions that have saner behavior while leaving the
old ones in place.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Join cost estimates

2013-04-04 Thread k...@rice.edu
On Thu, Apr 04, 2013 at 04:16:12PM -0400, Stephen Frost wrote:
 * Stephen Frost (sfr...@snowman.net) wrote:
  It does look like reducing bucket depth, as I outlined before through
  the use of a 2-level hashing system, might help speed up
  ExecScanHashBucket, as it would hopefully have very few (eg: 1-2)
  entries to consider instead of more.  Along those same lines, I really
  wonder if we're being too generous wrt the bucket-depth goal of '10'
  instead of, say, '1', especially when we've got plenty of work_mem
  available.
 
 Rerunning using a minimally configured build (only --enable-openssl
 and --enable-debug passed to configure) with NTUP_PER_BUCKET set to '1'
 results in a couple of interesting things-
 
 First, the planner actually picks the plan to hash the small table and
 seqscan the big one.  That also, finally, turns out to be *faster* for
 this test case.
 
 ...
 
 I'm certainly curious about those, but I'm also very interested in the
 possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable
 depending on the work_mem setting.  It's only used in
 ExecChooseHashTableSize, so while making it variable or depending on
 work_mem could slow planning down a bit, it's not a per-tuple cost item.
 
+1 for adjusting this based on work_mem value.
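
As a strawman only (NTUP_PER_BUCKET is a fixed compile-time constant used
in ExecChooseHashTableSize today), a work_mem-dependent target could be as
simple as:

/*
 * Strawman sketch: pick a tuples-per-bucket target from work_mem (in kB)
 * instead of the fixed NTUP_PER_BUCKET.  The thresholds are invented for
 * illustration, not measured.
 */
static int
choose_ntup_per_bucket(int work_mem_kb)
{
	if (work_mem_kb >= 256 * 1024)	/* 256MB or more: aim for ~1 per bucket */
		return 1;
	if (work_mem_kb >= 16 * 1024)	/* at least 16MB */
		return 4;
	return 10;						/* historical default */
}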

Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE EXTENSION BLOCKS

2013-04-04 Thread Tom Lane
David E. Wheeler da...@justatheory.com writes:
 On Apr 3, 2013, at 11:41 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Right -- an extension is not considered to live within a schema, they
 are database-global.  The objects might live in a particular schema (if
 it is relocatable), and there's support to move those to a different
 schema, but this doesn't affect the extension itself.

 Thanks. I humbly submit this patch to help prevent silly questions like this 
 in the future.

I think this should be addressed in extend.sgml not only on the CREATE
EXTENSION reference page.  After thinking awhile I came up with the
attached wording.  Further wordsmithing anyone?

regards, tom lane

diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index 672d0df..bc1cd59 100644
*** a/doc/src/sgml/extend.sgml
--- b/doc/src/sgml/extend.sgml
***
*** 359,364 
--- 359,370 
  extension.)  Also notice that while a table can be a member of an
  extension, its subsidiary objects such as indexes are not directly
  considered members of the extension.
+ Another important point is that schemas can belong to extensions, but not
+ vice versa: an extension as such has an unqualified name and does not
+ exist <quote>within</quote> any schema.  The extension's member objects,
+ however, will belong to schemas whenever appropriate for their object
+ types.  It may or may not be appropriate for an extension to own the
+ schema(s) its member objects are within.
 </para>
  
 <sect2>
diff --git a/doc/src/sgml/ref/create_extension.sgml b/doc/src/sgml/ref/create_extension.sgml
index 4f3b9a5..9c9bf6f 100644
*** a/doc/src/sgml/ref/create_extension.sgml
--- b/doc/src/sgml/ref/create_extension.sgml
*** CREATE EXTENSION [ IF NOT EXISTS ] repl
*** 94,99 
--- 94,105 
  If not specified, and the extension's control file does not specify a
  schema either, the current default object creation schema is used.
 </para>
+<para>
+ Remember that the extension itself is not considered to be within any
+ schema: extensions have unqualified names that must be unique
+ database-wide.  But objects belonging to the extension can be within
+ schemas.
+</para>
 </listitem>
   </varlistentry>
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Jeff Janes
On Thu, Apr 4, 2013 at 5:30 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 4 April 2013 02:39, Andres Freund and...@2ndquadrant.com wrote:

  If by now the first backend has proceeded to PageSetLSN() we are writing
  different data to disk than the one we computed the checksum of
  before. Boom.

 Right, so nothing else we were doing was wrong, that's why we couldn't
 spot a bug. The problem is that we aren't replaying enough WAL because
 the checksum on the WAL record is broken.


This brings up a pretty frightening possibility to me, unrelated to data
checksums.  If a bit gets twiddled in the WAL file due to a hardware issue
or a cosmic ray, and then a crash happens, automatic recovery will stop
early at the failed WAL checksum with an innocuous-looking message.  The
system will start up but will be invisibly inconsistent, and will proceed
to overwrite that portion of the WAL file which contains the old data (real
data that would have been necessary to reconstruct, once the corruption is
finally realized) with an end-of-recovery checkpoint record and continue
to chew up real data from there.

I don't know a solution here, though, other than trusting your hardware.
 Changing timelines upon ordinary crash recovery, not just media recovery,
seems excessive but also seems to be exactly what timelines were invented
for, right?

Back to the main topic here, Jeff Davis mentioned earlier "You'd still
think this would cause incorrect results, but"  I didn't realize the
significance of that until now.  It does produce incorrect query results.
 I was just bailing out before detecting them.  Once I specify
ignore_checksum_failure=1
my test harness complains bitterly about the data not being consistent with
what the Perl program knows it is supposed to be.


 I missed out on doing that with XLOG_HINT records, so the WAL CRC can
 be incorrect because the data is scanned twice; normally that would be
 OK because we have an exclusive lock on the block, but with hints we
 only have share lock. So what we need to do is take a copy of the
 buffer before we do XLogInsert().

 Simple patch to do this attached for discussion. (Not tested).


 We might also do this by modifying the WAL record to take the whole
 block and bypass the BkpBlock mechanism entirely. But that's more work
 and doesn't seem like it would be any cleaner. I figure lets solve the
 problem first then discuss which approach is best.



I've tested your patch it and it seems to do the job.  It has survived far
longer than unpatched ever did, with neither checksum warnings, nor
complaints of inconsistent data.  (I haven't analyzed the code much, just
the results, and leave the discussion of the best approach to others)


 Thanks,

Jeff


Re: [HACKERS] Clang compiler warning on 9.3 HEAD

2013-04-04 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Now, it annoys me that we now have three places that know about object
 types supported by event triggers: there's a large struct of command tag
 substrings (event_trigger_support), then there's these two functions.
 It might be better to add ObjectType and ObjectClass entries to the
 struct, so that only the struct needs to know about that.  The problem
 is that these two functions would have to walk the struct every time,
 instead of being a simple switch.

Yeah.  For the moment I think efficiency trumps having the information
in just one place, ie I agree with doing it like this.  If we find we're
constantly forgetting to fix the functions when we need to, it'd be time
enough to revisit the decision.

One thing you could do that might make it less likely we'd miss things
is to have the switches explicitly cover all the possible enum members,
instead of defaulting the majority of them as you do here.  I know when
I think about adding a new object type, I tend to choose the most
similar existing type and grep for references to it.  Likely you could
even draft the code so that most compilers would warn about an enum
value not covered in the switch.
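
A minimal, self-contained illustration of that technique, using an invented
DemoClass enum rather than the real ObjectClass:

#include <stdbool.h>

/* Invented enum standing in for ObjectClass, just to show the technique. */
typedef enum DemoClass
{
	DEMO_DATABASE,
	DEMO_TABLESPACE,
	DEMO_ROLE,
	DEMO_TABLE
} DemoClass;

/*
 * No "default" label: with -Wswitch, gcc and clang warn as soon as a new
 * DemoClass member is added but not handled here.
 */
static bool
demo_supports_class(DemoClass c)
{
	switch (c)
	{
		case DEMO_DATABASE:
		case DEMO_TABLESPACE:
		case DEMO_ROLE:
			return false;		/* "global" objects: not supported */
		case DEMO_TABLE:
			return true;
	}
	return true;				/* not reached; keeps -Wreturn-type quiet */
}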

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-04 Thread Noah Misch
Subject updated to account for the wider topics now appearing.

On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
 What I'd actually rather see us spending time on right now is making
 some provision for incremental updates, which I will boldly propose
 could be supported by user-written triggers on the underlying tables
 if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
 matviews, and allowed them to operate on a matview's contents just like
 it was a table.  Now admittedly that would foreclose allowing matviews
 to be updatable in the updatable-view sense, but that's a feature I
 would readily give up if it meant users could build incremental update
 mechanisms this year and not two years down the road.

I can't see taking MVs in the direction of allowing arbitrary DML; that's what
tables are for.  Users wishing to do that should keep using current methods:

  CREATE VIEW mv_impl AS SELECT ...;
  CREATE TABLE mv AS SELECT * FROM mv_impl;
  -- REFRESH analogue: fancier approaches exist
  BEGIN; TRUNCATE mv; INSERT INTO mv SELECT * FROM mv_impl; COMMIT;

If anything, I'd help these users by introducing mechanisms for obtaining a
TRUNCATE;INSERT with the bells and whistles of REFRESH MATERIALIZED VIEW.
Namely, bulk index rebuilds, skipping WAL, and frozen tuples.

  ... Making sure that
  the heap has at least one page if data has been generated seems
  like a not-entirely-unreasonable way to do that, although there
  remains at least one vacuum bug to fix if we keep it, in addition
  to Tom's concerns.
 
 No.  This is an absolute disaster.  It's taking something we have always
 considered to be an irrelevant implementation detail and making it into
 user-visible DDL state, despite the fact that it doesn't begin to satisfy
 basic transactional behaviors.  We *need* to get rid of that aspect of
 things.  If you must have scannability state in version 0, okay, but
 it has to be a catalog property not this.

In large part, this ended up outside the catalogs due to key limitations of
the startup process.  This isn't the first time we've arranged an unusual
dance for this reason, and it probably won't be the last.  Sure, we could take
out unlogged MVs to evade the problem, but re-adding them will mean either
restoring relfilenode-based bookkeeping or attacking the startup process
limitation directly.  There exist fundamental challenges to a direct attack,
like the potential inconsistency of system catalogs themselves.  We could
teach pg_relation_is_scannable() that unlogged MVs are always non-scannable
during recovery, then somehow update system catalogs in all databases at the
end of recovery.  Not a project for 9.3, and I wouldn't be surprised to see it
mushroom in complexity.  The currently-committed approach is a good one given
the applicable constraints.

A slight variation on the committed approach would be to add a _scannable
relation fork.  The fork would either be absent (not scannable if an MV) or
empty (possibly-scannable).  Create it in CREATE MATERIALIZED VIEW sans WITH
NO DATA and in REFRESH MATERIALIZED VIEW.  Remove it in TRUNCATE.  When the
startup process reinitializes an unlogged relation, it would also remove any
_scannable fork.  This has a few advantages over the current approach: VACUUM
won't need a special case, and pg_upgrade will be in a better position to blow
away all traces if we introduce a better approach.  The disadvantage is an
extra inode per healthy MV.  (Though it does not lead to a 9.3 solution, I'll
note that an always-present relation metapage would help here.)

Note that I said possibly-scannable -- the relfilenode-based indicator
(whether the committed approach or something else) doesn't need to remain the
only input to the question of scannability.  If 9.5 introduces the concept of
age-based scannability expiration, the applicable timestamp could go in
pg_class, and pg_relation_is_scannable() could check both that and the
relfilenode-based indicator.


Concerning the original $SUBJECT, I would look at fixing the performance
problem by having pg_relation_is_scannable() use an algorithm like this:

1. Grab the pg_class entry from syscache.  If it's not found, return NULL.
2. If it's not a matview, return false.
3. Lock the matview and try to open a relcache entry.  Return NULL on failure.
4. Return the scannability as reported by the relcache.

This boils down to the CASE statement you noted upthread, except putting the
fast-exit logic in the function rather than in its caller(s).
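
In code form, that would be roughly the following (a sketch only, usual
backend headers assumed; the RelationIsScannable() test at the end stands
in for however the relcache reports scannability):

Datum
pg_relation_is_scannable(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	HeapTuple	tuple;
	Form_pg_class classForm;
	Relation	rel;
	bool		result;

	/* 1. Grab the pg_class entry from syscache; NULL if it's gone. */
	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
	if (!HeapTupleIsValid(tuple))
		PG_RETURN_NULL();
	classForm = (Form_pg_class) GETSTRUCT(tuple);

	/* 2. Anything that is not a matview is simply "false". */
	if (classForm->relkind != RELKIND_MATVIEW)
	{
		ReleaseSysCache(tuple);
		PG_RETURN_BOOL(false);
	}
	ReleaseSysCache(tuple);

	/* 3. Lock the matview and build a relcache entry; NULL on failure. */
	rel = try_relation_open(relid, AccessShareLock);
	if (rel == NULL)
		PG_RETURN_NULL();

	/* 4. Report scannability as the relcache sees it. */
	result = RelationIsScannable(rel);
	relation_close(rel, AccessShareLock);

	PG_RETURN_BOOL(result);
}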

nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-04 Thread Noah Misch
On Thu, Apr 04, 2013 at 12:28:01PM +0200, Nicolas Barbier wrote:
 2013/4/3 Tom Lane t...@sss.pgh.pa.us:
  And if you're absolutely convinced that unlogged matviews mustn't work as I
  suggest, we can lose those from 9.3, too.
 
 +1. Having unlogged matviews without having incremental updates yet
 isn't super useful anyway.

I would have surmised the opposite: since an unlogged MV requires a full
refresh at unpredictable moments, logged MVs will be preferred where a refresh
is prohibitively expensive.  Why might unlogged-MV applications desire
incremental updates more acutely than logged-MV applications?

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-04 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
 No.  This is an absolute disaster.  It's taking something we have always
 considered to be an irrelevant implementation detail and making it into
 user-visible DDL state, despite the fact that it doesn't begin to satisfy
 basic transactional behaviors.  We *need* to get rid of that aspect of
 things.  If you must have scannability state in version 0, okay, but
 it has to be a catalog property not this.

 In large part, this ended up outside the catalogs due to key limitations of
 the startup process.

Yeah, I realize that there's no other (easy) way to make unlogged
matviews reset to an invalid state on crash, but that doesn't make this
design choice less of a disaster.  It boxes us into something that's
entirely unable to support transitions between scannable and unscannable
states by any means short of a complete rewrite of the matview contents;
which seems fundamentally incompatible with any sort of incremental
update scenario.  And I remain of the opinion that it's going to box us
into not being able to fix the problems because of pg_upgrade
on-disk-compatibility issues.  We will be far better off to drop
unlogged matviews until we can come up with a better design.  If that's
so far off that no one can see it happening, well, that's tough.
Leaving the door open for incremental maintenance is more important.

 A slight variation on the committed approach would be to add a _scannable
 relation fork.

Not very transaction-safe, I think (consider crash midway through a
transaction that adds or removes the fork), and in any case orders of
magnitude more expensive than looking at a pg_class field.  This really
needs to be catalog state, not filesystem state.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)

2013-04-04 Thread Alvaro Herrera
Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:

  A slight variation on the committed approach would be to add a _scannable
  relation fork.
 
 Not very transaction-safe, I think (consider crash midway through a
 transaction that adds or removes the fork), and in any case orders of
 magnitude more expensive than looking at a pg_class field.  This really
 needs to be catalog state, not filesystem state.

We could revive the pg_class_nt patch proposed a decade ago, perhaps ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch to add \watch to psql

2013-04-04 Thread Tom Lane
Will Leinweber w...@heroku.com writes:
 Here is an updated patch that addresses several of the points brought up so
 far, such as the sleep, internationalization banner, and zero wait check,
 and it removes the premature input check.

I whacked this around some more, added basic docs, and committed it.

 Unfortunately rl_clear_screen() is not included at all in libedit, causing
 compilation to fail, and I was completely unable to find a way to
 distinguish libedit from readline on OS X. It tries extraordinarily hard to
 pretend that it's readline. Instead falling back to simple control
 characters to clear the screen worked very well on both linux and OS X.

I took that out; "works on the two cases I tried" does not mean portable.

It's possible we could do something involving having configure check for
rl_clear_screen() etc, but that seems like more work than is justified,
not to mention that the results would then be platform-dependent
*by design*.  Frankly I kinda prefer the behavior without a screen clear
anyway; though this may just prove that I'm not accustomed to using the
original watch.  Anyway, that's open to a followup patch if anybody is
sufficiently set on doing it differently.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch to add \watch to psql

2013-04-04 Thread Will Leinweber
On Thu, Apr 4, 2013 at 5:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:


 I whacked this around some more, added basic docs, and committed it.


Thanks!


   Unfortunately rl_clear_screen() is not included at all in libedit, causing
  compilation to fail, and I was completely unable to find a way to
  distinguish libedit from readline on OS X. It tries extraordinarily hard to
  pretend that it's readline. Instead falling back to simple control
  characters to clear the screen worked very well on both Linux and OS X.

 I took that out; works on the two cases I tried does not mean portable.


Completely understandable.


I'm very excited to have helped contribute something, however small this
was, to the project.

Thanks again,
Will


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Jeff Davis
On Thu, 2013-04-04 at 14:21 -0700, Jeff Janes wrote:

 This brings up a pretty frightening possibility to me, unrelated to
 data checksums.  If a bit gets twiddled in the WAL file due to a
 hardware issue or a cosmic ray, and then a crash happens, automatic
 recovery will stop early with the failed WAL checksum with
 an innocuous looking message.  The system will start up but will be
 invisibly inconsistent, and will proceed to overwrite that portion of
 the WAL file which contains the old data (real data that would have
 been necessary to reconstruct, once the corruption is finally realized
 ) with an end-of-recovery checkpoint record and continue to chew up
 real data from there.

I've been worried about that for a while, and I may have even seen
something like this happen before. We could perhaps do some checks, but
in general it seems hard to solve without writing flushing some data to
two different places. For example, you could flush WAL, and then update
an LSN stored somewhere else indicating how far the WAL has been
written. Recovery could complain if it gets an error in the WAL before
that point.
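
No such mechanism exists today, but the shape of it might be roughly the
following (file name and placement purely hypothetical; assumes
access/xlogdefs.h for XLogRecPtr):

#include <fcntl.h>
#include <unistd.h>
#include "access/xlogdefs.h"

/*
 * Hypothetical sketch: after a successful WAL flush, persist the flushed
 * LSN in a second, separately fsync'd location so that recovery can tell
 * "end of WAL" apart from "corrupt WAL".
 */
static void
record_flushed_lsn(XLogRecPtr flushed_lsn)
{
	int			fd = open("pg_durable_lsn", O_WRONLY | O_CREAT, 0600);

	if (fd < 0)
		return;					/* real code would ereport() instead */
	(void) write(fd, &flushed_lsn, sizeof(flushed_lsn));
	(void) fsync(fd);
	(void) close(fd);
}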

But obviously, that makes WAL flushes expensive (in many cases, about
twice as expensive).

Maybe it's not out of the question to offer that as an option if nobody
has a better idea. Performance-conscious users could place the extra LSN
on an SSD or NVRAM or something; or maybe use commit_delay or async
commits. It would only need to store a few bytes.

Streaming replication mitigates the problem somewhat, by being a second
place to write WAL.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Jeff Davis
On Thu, 2013-04-04 at 22:39 +0200, Andres Freund wrote:
 I don't think it's really slower. Earlier the code took WalInsertLock
 every time, even if we ended up not logging anything. That's far more
 expensive than a single spinlock. And the copy should also only be made
 in the case where we need to log. So I think we end up ahead of the
 current state.

Good point.

  The code looks good to me except that we should be consistent about the
  page hole -- XLogCheckBuffer is calculating it, but then we copy the
  entire page. I don't think anything can change the size of the page hole
  while we have a shared lock on the buffer, so it seems OK to skip the
  page hole during the copy.
 
 I don't think it can change either, but I doubt that there's a
 performance advantage by not copying the hole. I'd guess the simpler
 code ends up faster.

I was thinking more about the WAL size, but I don't have a strong
opinion.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Thu, 2013-04-04 at 14:21 -0700, Jeff Janes wrote:
 This brings up a pretty frightening possibility to me, unrelated to
 data checksums.  If a bit gets twiddled in the WAL file due to a
 hardware issue or a cosmic ray, and then a crash happens, automatic
 recovery will stop early with the failed WAL checksum with
 an innocuous looking message.  The system will start up but will be
 invisibly inconsistent, and will proceed to overwrite that portion of
 the WAL file which contains the old data (real data that would have
 been necessary to reconstruct, once the corruption is finally realized
 ) with an end-of-recovery checkpoint record and continue to chew up
 real data from there.

 I've been worried about that for a while, and I may have even seen
 something like this happen before. We could perhaps do some checks, but
 in general it seems hard to solve without writing flushing some data to
 two different places. For example, you could flush WAL, and then update
 an LSN stored somewhere else indicating how far the WAL has been
 written. Recovery could complain if it gets an error in the WAL before
 that point.

 But obviously, that makes WAL flushes expensive (in many cases, about
 twice as expensive).

 Maybe it's not out of the question to offer that as an option if nobody
 has a better idea. Performance-conscious users could place the extra LSN
 on an SSD or NVRAM or something; or maybe use commit_delay or async
 commits. It would only need to store a few bytes.

At least on traditional rotating media, this is only going to perform
well if you dedicate two drives to the purpose.  At which point you
might as well just say let's write two copies of WAL.  Or maybe three
copies, so that you can take a vote when they disagree.  While this is
not so unreasonable on its face for ultra-high-reliability requirements,
I can't escape the feeling that we'd just be reinventing software RAID.
There's no reason to think that we can deal with this class of problems
better than the storage system can.

 Streaming replication mitigates the problem somewhat, by being a second
 place to write WAL.

Yeah, if you're going to do this at all it makes more sense for the
redundant copies to be on other machines.  So the questions that that
leads to are how smart is our SR code about dealing with a master that
tries to re-send WAL regions it already sent, and what a slave ought to
do in such a situation if the new data doesn't match.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-04 Thread Brendan Jurd
On 5 April 2013 07:43, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, if we're going to take that hard a line on it, then we can't
 change anything about array data storage or the existing functions'
 behavior; which leaves us with either doing nothing at all, or
 inventing new functions that have saner behavior while leaving the
 old ones in place.

And then we are in the awkward position of trying to explain the
differences in behaviour between the old and new functions ...
presumably with a dash of "for historical reasons" and a sprinkling of
"to preserve compatibility" in every other paragraph.

The other suggestion that had been tossed around elsewhere upthread
was inventing a new type that serves the demand for a straightforward
mutable list, which has exactly one dimension, and which may be
sensibly empty.  Those few who are interested in dimensions >= 2 could
keep on using arrays, with all their backwards-compatible silliness
intact, and everybody else could migrate to lists at their leisure.

I don't hate the latter idea from a user perspective, but from a
developer perspective I suspect there are valid objections to be made.

Cheers,
BJ


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-04 Thread Jeff Davis
On Thu, 2013-04-04 at 21:06 -0400, Tom Lane wrote:
 I can't escape the feeling that we'd just be reinventing software RAID.
 There's no reason to think that we can deal with this class of problems
 better than the storage system can.

The goal would be to reliably detect a situation where WAL that has been
flushed successfully was later corrupted; not necessarily to recover
when we find it. Unless it's something like ZFS, I don't think most
software RAID will reliably detect corruption.

A side benefit is that it would also help catch bugs like this in the
future.

I'm not advocating for this particular solution, but I do concur with
Jeff Janes that it's a little bit scary and that we can entertain some
ideas about how to mitigate it.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-04 Thread Tom Lane
Brendan Jurd dire...@gmail.com writes:
 The other suggestion that had been tossed around elsewhere upthread
 was inventing a new type that serves the demand for a straightforward
 mutable list, which has exactly one dimension, and which may be
 sensibly empty.  Those few who are interested in dimensions = 2 could
 keep on using arrays, with all their backwards-compatible silliness
 intact, and everybody else could migrate to lists at their leisure.

 I don't hate the latter idea from a user perspective, but from a
 developer perspective I suspect there are valid objections to be made.

The real problem with that is that the existing arrays have glommed onto
the syntax that is both most natural and SQL-spec-required.  I don't
think there is a lot of room to shove in a different kind of critter
there.  (There's been a remarkable lack of attention to the question
of spec compliance in this thread, btw.  Surely the standard has
something to say on the matter of zero-length arrays?)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-04 Thread Brendan Jurd
On 5 April 2013 13:04, Tom Lane t...@sss.pgh.pa.us wrote:
 (There's been a remarkable lack of attention to the question
 of spec compliance in this thread, btw.  Surely the standard has
 something to say on the matter of zero-length arrays?)

From 4.10 in my draft copy of Foundation, arrays are one of two
collection types (the other being multisets), and:

  A collection is a composite value comprising zero or more elements,
each a value of some data type DT

  The number of elements in C is the cardinality of C

  An array is a collection A in which each element is associated with
exactly one ordinal position in A. If n is
the cardinality of A, then the ordinal position p of an element is an
integer in the range 1 (one) ≤ p ≤ n.

The language specifically allows for zero elements, and does not
contemplate multiple dimensions.  The specification for the array
constructor syntax (6.36) and array element reference by subscript
(6.23) also make it fairly clear that only 1-D arrays were being
considered.

I'd say we've already gone way off-menu by having multidims.  A more
compliant approach would have been to implement arrays as 1-D only,
and then maybe have a separate thing (matrices?) for multidims.

While I was in there I noticed CARDINALITY, which would be pretty easy
to add and would at least provide a more productive way to get the
real length of an array without disrupting existing functionality:

<cardinality expression> ::=
    CARDINALITY <left paren> <collection value expression> <right paren>

Cheers,
BJ


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-04 Thread Tom Lane
Brendan Jurd dire...@gmail.com writes:
 On 5 April 2013 13:04, Tom Lane t...@sss.pgh.pa.us wrote:
 (There's been a remarkable lack of attention to the question
 of spec compliance in this thread, btw.  Surely the standard has
 something to say on the matter of zero-length arrays?)

 The language specifically allows for zero elements, and does not
 contemplate multiple dimensions.  The specification for the array
 constructor syntax (6.36) and array element reference by subscript
 (6.23) also make it fairly clear that only 1-D arrays were being
 considered.

 I'd say we've already gone way off-menu by having multidims.

Yeah, we knew that.  I don't have a problem with seeing multidim arrays
as an extension to the standard though.  The point is that the behavior
for the 1-D, default-lower-bound case ought to match the standard.

 While I was in there I noticed CARDINALITY, which would be pretty easy
 to add and would at least provide a more productive way to get the
 real length of an array without disrupting existing functionality:

Yeah, that would at least fix the null-result-for-empty-array problem
for that particular functionality.  Still, this is ammunition for the
position that null results for empty arrays are just broken.

BTW ... if you check the archives you will find that we had
cardinality() for a short while, and removed it before 8.4 release,
because we couldn't agree on what it ought to return when given a
multi-dimensional array.  I'm afraid that issue is still unresolved.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why there is a PG_GETARG_UINT32 and PG_RETURN_UINT32?

2013-04-04 Thread Amit Kapila
On Thursday, April 04, 2013 8:30 PM Rodrigo Barboza wrote:

 Hi guys.
 I am wondering when I can use the PG_GETARG_UINT32 and PG_RETURN_UINT32.
 If postgres has no unsigned int type, what is the use of these macros?

They are mainly used for contrib module functionality or some built-in
functions which are not exposed.
For example, bt_page_items() receives relation name (text) and block number
(int), but internally the block number
is uint32 as the max block number can be 0xFFFFFFFE.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why there is a PG_GETARG_UINT32 and PG_RETURN_UINT32?

2013-04-04 Thread Rodrigo Barboza
I am creating my own uint32 type and faced this function.
But my args are always of type my_int and one of the signed int types as
postgres doesn't have unsigned.
Could I use those functions in operations between those types?
I can't see a place for this; I don't know if I am missing something


On Fri, Apr 5, 2013 at 1:12 AM, Amit Kapila amit.kap...@huawei.com wrote:

 On Thursday, April 04, 2013 8:30 PM Rodrigo Barboza wrote:

  Hi guys.
  I am wondering when I can use the PG_GETARG_UINT32 and PG_RETURN_UINT32.
  If postgres has no unsigned int type, what is the use of these macros?

 They are mainly used for contrib module functionality or some built-in
 functions which are not exposed.
 For example, bt_page_items() receives relation name (text) and block number
 (int), but internally the block number
 is uint32 as the max block number can be 0xFFFFFFFE.

 With Regards,
 Amit Kapila.




Re: [HACKERS] Multi-pass planner

2013-04-04 Thread Amit Kapila
On Friday, April 05, 2013 1:59 AM Robert Haas wrote:
 On Thu, Apr 4, 2013 at 2:53 PM, Dimitri Fontaine
 dimi...@2ndquadrant.fr wrote:
  Robert Haas robertmh...@gmail.com writes:
  for estimate_worstcase_fraction.  So, when computing the cost of a
  path, we'd compute our current expected-case estimate, and also a
  worst-case estimate, and then compute the final cost as:
 
  There also was the idea for the executor to be able to handle
 alternate
  plans and some heuristic to determine that the actual cost of running
 a
  plan is much higher than what's been estimated, so much so as to
 switch
  to starting from scratch with the other plan instead.
 
 Yeah.  The thing is, if the plan has any side effects, that's not
 really an option.  And even if it doesn't, it may throw away a lot of
 work.  

Why throw away all the work? It could as well try to repair the plan, or,
even if plan repair is not possible, it could keep multiple plans and
next time try to choose the best among the available ones.


With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-04 Thread Brendan Jurd
On 5 April 2013 15:05, Tom Lane t...@sss.pgh.pa.us wrote:
 Brendan Jurd dire...@gmail.com writes:
 While I was in there I noticed CARDINALITY, which would be pretty easy
 to add and would at least provide a more productive way to get the
 real length of an array without disrupting existing functionality:

 Yeah, that would at least fix the null-result-for-empty-array problem
 for that particular functionality.  Still, this is ammunition for the
 position that null results for empty arrays are just broken.

 BTW ... if you check the archives you will find that we had
 cardinality() for a short while, and removed it before 8.4 release,
 because we couldn't agree on what it ought to return when given a
 multi-dimensional array.  I'm afraid that issue is still unresolved.

Well for what it's worth I would expect cardinality() to return the
total number of elements in the array (per ArrayGetNItems).  It's
consistent with the spec's identification of an array as a
collection.  You can chunk the elements into dimensions however you
want, but it's still a collection of elements, and the cardinality is
still the number of elements.
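
For what it's worth, a cardinality() with those semantics is nearly trivial
on top of the existing array code (sketch only, usual backend headers
assumed):

/*
 * Sketch: count all elements regardless of dimensionality, per the
 * "collection" interpretation above.
 */
Datum
array_cardinality_sketch(PG_FUNCTION_ARGS)
{
	ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);

	PG_RETURN_INT32(ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v)));
}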

The nesting interpretation doesn't accord with our internal
representation, nor with our requirement that multidim arrays be
regular, nor with the fact that we can't put an array of texts inside
an array of ints.  Our array input syntaxes for multidim arrays look
nest-ish but what they produce is not nested.

Cheers,
BJ


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-04 Thread Amit Kapila
On Thursday, April 04, 2013 6:12 PM Robert Haas wrote:
 On Wed, Apr 3, 2013 at 9:49 PM, Greg Smith g...@2ndquadrant.com
 wrote:
  On 4/2/13 11:54 AM, Robert Haas wrote:
  But, having said that, I still think the best idea is what Andres
  proposed, which pretty much matches my own thoughts: the bgwriter
  needs to populate the free list, so that buffer allocations don't
 have
  to wait for linear scans of the buffer array.
 
  I was hoping this one would make it to a full six years of being on the
  TODO list before it came up again, missed it by a few weeks.  The funniest
  part is that Amit even submitted a patch on this theme a few months ago
  without much feedback:
  http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs
  That stalled where a few things have, on a) needing more regression test
  workloads, and b) wondering just what the deal with large shared_buffers
  setting degrading performance was.
 
 Those are impressive results.  I think we should seriously consider
 doing something like that for 9.4.  TBH, although more workloads to
 test is always better, I don't think this problem is so difficult that
 we can't have some confidence in a theoretical analysis.  If I read
 the original thread correctly (and I haven't looked at the patch
 itself), the proposed patch would actually invalidate buffers before
 putting them on the freelist.  That effectively amounts to reducing
 shared_buffers, so workloads that are just on the edge of what can fit
 in shared_buffers will be harmed, and those that benefit incrementally
 from increased shared_buffers will be as well.
 
 What I think we should do instead is collect the buffers that we think
 are evictable and stuff them onto the freelist without invalidating
 them.  When a backend allocates from the freelist, it can double-check
 that the buffer still has usage_count 0.  The odds should be pretty
 good.  But even if we sometimes notice that the buffer has been
 touched again after being put on the freelist, we haven't expended all
 that much extra effort, and that effort happened mostly in the
 background.  

If we just put it on the freelist, then the next time it gets allocated
directly from the buffer hash table, who will remove it from the freelist?
Or do you think that in BufferAlloc, if the buffer is found via the hash
table, it should check whether it's on the freelist and, if so, remove it?

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why there is a PG_GETARG_UINT32 and PG_RETURN_UINT32?

2013-04-04 Thread Amit Kapila
On Friday, April 05, 2013 10:00 AM Rodrigo Barboza wrote:

 I am creating my own uint32 type and faced this function.
 But my args are always of type my_int and one of the signed int types as
 postgres doesn't have unsigned.
 Could I use those functions in operations between those types?

It should not be a problem if your signed type doesn't have any negative
value.
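
For example (names hypothetical, assuming a custom my_uint32 type stored as
uint32 and the usual backend headers), a cross-type comparison only needs to
reject negative signed input before casting:

/*
 * Hypothetical cross-type operator function: my_uint32 = int4.  A negative
 * int4 can never equal an unsigned value, so test that before casting.
 */
Datum
my_uint32_eq_int4(PG_FUNCTION_ARGS)
{
	uint32		a = PG_GETARG_UINT32(0);	/* the custom unsigned type */
	int32		b = PG_GETARG_INT32(1);

	if (b < 0)
		PG_RETURN_BOOL(false);

	PG_RETURN_BOOL(a == (uint32) b);
}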


 I can't see a place for this; I don't know if I am missing something

On Fri, Apr 5, 2013 at 1:12 AM, Amit Kapila amit.kap...@huawei.com wrote:
On Thursday, April 04, 2013 8:30 PM Rodrigo Barboza wrote:

 Hi guys.
 I am wondering when I can use the PG_GETARG_UINT32 and PG_RETURN_UINT32.
 If postgres has no unsigned int type, what is the use of these macros?
 They are mainly used for contrib module functionality or some built-in
 functions which are not exposed.
 For example, bt_page_items() receives relation name (text) and block number
 (int), but internally the block number
 is uint32 as the max block number can be 0xFFFFFFFE.

With Regards,
Amit Kapila.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers