Re: [HACKERS] [GENERAL] VACUUM touching file but not updating relation

2011-11-22 Thread Simon Riggs
On Fri, Nov 18, 2011 at 3:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 So the correct number of WAL records is emitted and I see no bug there.

 What Thom's complaining about is that the buffer may be marked dirty
 unnecessarily, ie when there has been no actual data change.

Based upon feedback from both of you, I made a change to stop the block
being marked dirty, though Tom now wants that removed.

Thom, your earlier analysis showing that the md5 checksum of a
relation had not changed cannot be explained by the section of code
you identified: that code sets some data on the page, which would
cause the md5 checksum to change. So it cannot be the btree code at
_bt_delitems_vacuum() causing this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Avoid marking buffer dirty when VACUUM has no work to do.

2011-11-22 Thread Simon Riggs
On Tue, Nov 22, 2011 at 2:32 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 Avoid marking buffer dirty when VACUUM has no work to do.
 When wal_level = 'hot_standby' we touched the last page of the
 relation during a VACUUM, even if nothing else had happened.
 That would alter the LSN of the last block and set the mtime
 of the relation file unnecessarily. Noted by Thom Brown.

 This doesn't look right to me --- you have not accounted for the
 possibility that btpo_cycleid or BTP_HAS_GARBAGE is changed.

 Also, I'm confused about the business of not setting the LSN.  Thom
 claimed that he was seeing the page not change at all (or at least
 md5sum of the file didn't change) despite mtime changing.  If we'd
 been plastering a new LSN on the page each time, then that should
 certainly not have been possible.  So I now think maybe we've
 mis-analyzed what was happening in his example.

 I think this requires more careful analysis.

 Ping?  If you don't respond, I'm going to take it on my own authority to
 revert this patch, because it's definitely broken as-is, and I don't
 think the consequences of not updating the page LSN have been thought
 through either.

Tom, waiting across a weekend isn't cause for concern.

I made that change for you, so I am happy to revert it for you also.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Allow substitute allocators for PGresult.

2011-11-22 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 11 Nov 2011 11:29:30 +0200, Heikki Linnakangas wrote
 You could use the resource owner mechanism to track
 them. Register a callback function with
 RegisterResourceReleaseCallback().

Thank you for letting me know about it. I have dug up a message in
pgsql-hackers referring to that mechanism in a discussion about
postgresql-fdw. I'll put further thought into dblink-plus, taking
it into account.


By the way, memory management for the result inside libpq is worth
considering as a separate issue.

At Sat, 12 Nov 2011 12:29:50 -0500, Tom Lane wrote
 To start with, the patch proposes exposing some global
 variables that affect the behavior of libpq process-wide.  This
 seems like a pretty bad design, because a single process could
 contain multiple usages of libpq

You're right that the design is bad. I designed it to have minimal
impact on libpq by limiting its usage and placing all responsibility
on the users, that is, the developers of the modules using it. If
other applications want to use their own allocators, there are some
further points to consider.

Considering multi-threading, I think it is preferable to have libpq
write the PGresult into memory blocks passed in by the application,
as OCI does, instead of letting libpq itself request them.

This approach hands the responsibility of memory management to
the user and gives them the capability to avoid memory exhaustion
by their own measures.

On the other hand, this approach could produce a situation where
libpq cannot fit all of the data received from the server into the
memory block it was handed. So the API must be able to return
partial data to the caller.

Going further, if libpq could store the result directly into
user-allocated memory through a tuplestore-like interface, performance
would be better when the final storage is a tuplestore itself.

I would be happy with part-by-part passing of the result, so I will
treat this as the next issue.


 So I'd feel happier about the patch if it came along with a
 patch to make dblink.c use it to prevent memory leaks.

I take it this is about my original patch.

Hmm, I understood that dblink copies the received data from the
PGresult into a tuplestore not only to avoid the memory leak, but also
to reduce memory usage (once the copy is finished). I think I could
show you a patch that ignores the latter, but it might take me some
time to understand dblink and tuplestore closely enough...


If I find RegisterResourceReleaseCallback insufficient for our
requirements, I will show that. If not, I will withdraw this patch
from the ongoing CF and propose another patch, based on the
discussion above, at a later time.


Please let me have a little more time.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] dblink: enable parameters

2011-11-22 Thread Heikki Linnakangas

On 22.11.2011 06:55, Pavel Stehule wrote:

I know that dblink is a deprecated interface, but it still has necessary
functionality: it supports writable statements.


It would be nice to add support for SQL/MED passthrough mode... That 
would allow you to run any queries/updates against foreign servers. It 
doesn't sound very difficult at first glance, though I'm not sure what 
it would mean for our parser, for example.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] dblink: enable parameters

2011-11-22 Thread Itagaki Takahiro
On Tue, Nov 22, 2011 at 20:09, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 It would be nice to add support for SQL/MED passthrough mode... That would
 allow you to run any queries/updates against foreign servers. It doesn't
 sound very difficult at first glance, though I'm not sure what it would
 mean for our parser, for example.

I think passthrough mode is almost impossible, or of very limited use,
because we cannot pass query text that our parser doesn't accept;
we cannot support foreign-table-specific queries.

 On 22.11.2011 06:55, Pavel Stehule wrote:
 I know that dblink is a deprecated interface, but it still has necessary
 functionality: it supports writable statements.

So, dblink on an FDW connection seems to be a possible solution:
we pass the query as plain text.

-- 
Itagaki Takahiro



Re: [HACKERS] explain analyze query execution time

2011-11-22 Thread Rudyar

On 22/11/11 02:58, Kevin Grittner wrote:

Rudyar  wrote:


I'm trying to measure the execution time of a query workload. I tried
using EXPLAIN ANALYZE, but the reported time is always higher than the
execution time of the same query run through a client like pgAdmin3.

What is the reason for that difference?


It's the observer effect -- there is a cost to the timing,
counting, measuring, and reporting which is done by EXPLAIN ANALYZE,
which distorts what is being measured.  It's just about impossible to
avoid entirely.

-Kevin

Thanks kevin,

What tool do you recommend for measuring the real query execution time?

--
Rudyar Cortés.
Estudiante de Ingeniería Civil Informática
Universidad Técnica Federico Santa María.




[HACKERS] Review: CHECK FUNCTION statement

2011-11-22 Thread Albe Laurenz
I tried to look at the patch, but it does not apply to the current
master, probably because of bit rot.

Can you submit an updated version?

The patch contains docs and regression tests and is in context diff format.
I'll mark it waiting for author.

Yours,
Laurenz Albe



Re: [HACKERS] explain analyze query execution time

2011-11-22 Thread Kevin Grittner
Rudyar  wrote:
 
 What tool do you recommend for measuring the real query
 execution time?
 
The -hackers list is for discussion to coordinate development of the
PostgreSQL database product.  For user questions like this, please
pick a more appropriate list based on the descriptions here:
 
http://www.postgresql.org/community/lists/
 
While any further follow-up should be on another list, I'll briefly
answer here.  EXPLAIN ANALYZE is great for seeing how a query is
being executed, but for accurate timing of how long the query runs
without generating all that extra information, you can measure it on
the client side, or turn on logging of statements running longer than
some particular time.  In psql you can use \timing on; in Java you
can call System.currentTimeMillis() before and after running the
query, etc.
 
-Kevin





Re: [HACKERS] explain analyze query execution time

2011-11-22 Thread Rudyar

On 22/11/11 10:26, Kevin Grittner wrote:

[...]

Thanks Kevin ;)




Re: [HACKERS] Review: CHECK FUNCTION statement

2011-11-22 Thread Pavel Stehule
2011/11/22 Albe Laurenz laurenz.a...@wien.gv.at:
 I tried to look at the patch, but it does not apply to current master,
 probably because of bit rot.

 Can you submit an updated version?

 The patch contains docs and regression tests and is in context diff format.
 I'll mark it waiting for author.

please wait, I plan to work on this topic on Thursday.

Regards

Pavel


 Yours,
 Laurenz Albe




Re: [HACKERS] Singleton range constructors versus functional coercion notation

2011-11-22 Thread Robert Haas
On Mon, Nov 21, 2011 at 9:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Nov 20, 2011, at 10:24 PM, Jeff Davis pg...@j-davis.com wrote:
 Well, if there were a good shorter notation, then probably so. But it
 doesn't look like we have a good idea, so I'm fine with dropping it.

 We should also keep in mind that people who use range types can and likely 
 will define their own convenience functions.  If people use singletons, or 
 open ranges, or closed ranges, or one-hour timestamp ranges frequently, they 
 can make their own notational shorthand with a 3-line CREATE FUNCTION 
 statement.  We don't need to have it all in core.

 But if you believe that, what syntax do you think people are likely to
 try if they want a singleton range constructor?  Leaving the user to
 discover the problem and try to invent a workaround is not better than
 doing it ourselves ...

Wait, do I hear the great Tom Lane arguing for putting more than the
minimal amount of stuff in core?  :-)

I honestly don't know what function names people will pick, and I
don't care.  Someone might like singleton(x), which would be
impractical as a built-in because there could be more than one range
type over the same base type, but if the user defines the function
they can pick what's convenient for them.  If they use singletons
exceedingly frequently they might even want something really short,
like just(x) or s(x).  Or they might say daterange1(x), along the
lines you suggested earlier.  The point is that by not defining more
than necessary in core, we give the user the flexibility to do what
they want.  In cases where that amounts to handing them a loaded gun
with the safety off, we shouldn't do it, but that doesn't seem to be
the case here.  It doesn't take a lot of programming acumen to write a
function that passes two copies of its single argument to a built-in.
The only mistake anyone's likely to make is forgetting to declare it
non-VOLATILE.
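As a sketch, the kind of 3-line user-defined shorthand being discussed might look like this (the function name and bound choices are, of course, the user's own; daterange1 follows the hypothetical name mentioned above):

```sql
-- Hypothetical convenience function: a singleton (closed-closed)
-- date range from one value.  Declared IMMUTABLE (i.e. non-VOLATILE)
-- so the planner can inline it.
CREATE FUNCTION daterange1(d date) RETURNS daterange
AS $$ SELECT daterange($1, $1, '[]') $$
LANGUAGE sql IMMUTABLE;
```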

But the real point is that I don't think we should assume that
singleton ranges are unique in being things for which people will want
shortcuts.  We talked about having a behavior-changing GUC to control
whether the bounds are [) or [] or () or (], but we didn't do it,
because behavior-changing GUCs are a bad idea.  But I fully expect
that people who make heavy use of range types will (and we should
encourage them to) define convenience functions with names of their
own choosing that pass the bounds that are useful to them.  If this
isn't the very model of a use-case for inline-able SQL functions, I'm
not sure what is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] EXPLAIN (plan off, rewrite off) for benchmarking

2011-11-22 Thread Robert Haas
On Mon, Nov 21, 2011 at 8:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... Maybe we could find a way to reduce the size of the parse
 tree (i.e. fewer nodes), or the number of times that it has to be
 walked/copied.

 We could eliminate some annoying tree-copy steps if we could institute
 the policy that parse analysis doesn't scribble on the raw parse tree,
 rewriter doesn't modify parse analysis output, and planner doesn't
 modify rewriter output.  However, it would be a lot of work, and I'm not
 entirely sure that we'd end up with a significant speed benefit.  In a
 lot of places, the only way to not scribble on the input is to copy it
 anyway ...

This is probably a stupid question, but why does it matter if parse
analysis scribbles on the raw parse tree, or the rewriter on the parse
analysis output?  I understand that we may sometimes need to replan
the output of the rewriter, so we'd better not modify it
destructively, but I would have thought that parse and parse analysis
would always be done together, in which case it doesn't obviously
matter.  I'm probably missing something here...

Another thing we might want to consider doing is introducing some kind
of container data structure other than List.  I think that trying to
change the semantics of the existing List datatype in any meaningful
way is probably doomed, but if we introduce a new abstraction and
gradually convert things over, we have a lot more flexibility.  What
I'm imagining is something that is optimized for holding a small
number of pointers (say, 4) that are normally added in order, but
which can grow if needed, at some cost in performance.  For example:

struct Thingy
{
    unsigned short nused;   /* slots in use */
    unsigned short nalloc;  /* slots allocated in item[] */
    struct Thingy *next;    /* if we run out of elements in this
                             * Thingy, we can link to another Thingy
                             * with more space */
    void   *item[FLEXIBLE_ARRAY_MEMBER];    /* really nalloc slots */
};

This would mean fewer palloc() calls and less overall memory usage
than a List, and might also improve memory access locality.  Right
now, walking a three element list could involve pulling in one cache
line for the list, three more for the list cells, and then
(presumably) three more for the elements themselves.  There's not much
help for the fact that the elements might end up in different cache
lines, but at least if we did something like this you wouldn't need to
bounce around to find the list cells.  This is particularly obnoxious
for things like RangeVars, where we build up the list representation
mostly as an intermediate step and then flatten it again.

Now, one obvious problem with this line of attack is that each
individual patch in this area isn't likely to save anything noticeable
on real workloads.  But I don't think that should stop us from trying.
 With every performance patch that goes in, the number of workloads
that are dominated by the cost of backend-local memory allocation is
growing.  The cost is extremely distributed and it's very hard to pin
down where it's all coming from, but we've gotta start somewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] EXPLAIN (plan off, rewrite off) for benchmarking

2011-11-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Nov 21, 2011 at 8:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We could eliminate some annoying tree-copy steps if we could institute
 the policy that parse analysis doesn't scribble on the raw parse tree,
 rewriter doesn't modify parse analysis output, and planner doesn't
 modify rewriter output.

 This is probably a stupid question, but why does it matter if parse
 analysis scribbles on the raw parse tree, or the rewriter on the parse
 analysis output?

Because we frequently need to save the original tree for possible
re-analysis later.  This doesn't matter in the simple-query protocol,
but it does matter in any code path that involves plancache.

 I understand that we may sometimes need to replan
 the output of the rewriter, so we'd better not modify it
 destructively, but I would have thought that parse and parse analysis
 would always be done together, in which case it doesn't obviously
 matter.

No, actually it's the raw grammar output tree that gets saved for
re-analysis in case we are told of a DDL change.  (I've considered
having the code save only the original query string, but then you'd
have to repeat the flex/bison work, and those things show up high
enough on any profile to make it seem unlikely that this is cheaper
than copying the parse tree.)

It's possible that we don't need a read-only guarantee for the rewriter
versus parse analysis output, but I doubt that helps much.

regards, tom lane



Re: [HACKERS] Singleton range constructors versus functional coercion notation

2011-11-22 Thread Jeff Davis
On Tue, 2011-11-22 at 09:07 -0500, Robert Haas wrote:
 I honestly don't know what function names people will pick, and I
 don't care.  Someone might like singleton(x), which would be
 impractical as a built-in because there could be more than one range
 type over the same base type, but if the user defines the function
 they can pick what's convenient for them.  If they use singletons
 exceedingly frequently they might even want something really short,
 like just(x) or s(x).  Or they might say daterange1(x), along the
 lines you suggested earlier.

For that matter, they might pick daterange(x), as I picked earlier, and
run into the same problems.

It's a little strange that we allow people to define functions with one
argument and the same name as a type if such functions are confusing.

This isn't intended as an argument in either direction, just an
observation.

Regards,
Jeff Davis




Re: [HACKERS] strange nbtree corruption report

2011-11-22 Thread Alvaro Herrera

Excerpts from Tom Lane's message of mar nov 22 01:14:33 -0300 2011:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:

  ERROR:  left link changed unexpectedly in block 3378 of index index_name 
  CONTEXT:  automatic vacuum of table table_name
 
  This was reported not once, but several dozens of times, by each new
  autovacuum worker that tried to vacuum the table.
 
  As far as I can see, there is just no way for this to happen ... much
  less happen repeatedly.
 
 It's not hard to believe that that would happen repeatedly given a
 corrupted set of sibling links, eg deletable page A links left to page
 B, which links right to C, which links right to A.  The question is how
 the index got into such a state.  A dropped update during a page split
 would explain it (ie, B used to be A's left sibling, then at some point
 B got split into B and C, but A's left-link never got updated on disk).
 I wonder how reliable their disk+filesystem is ...

Well, there are no other signs of random data corruption, such as toast
pointers getting corrupted, which is the number one symptom showing up
when the underlying storage is flaky.  However, it is possible that
there was a transient storage problem that affected only this one page;
if it persisted in the way you describe, it might well explain these
symptoms.

Another thing I noticed is that there was corruption in heap pages (not
on the same server, though; it was a different Londiste slave).  This
was even stranger: the pages would be completely fine, except that the
first six words, corresponding to the page header, were all zeros.
When they were filled with valid-looking data (mostly I copied the
bytes from neighboring pages), the rest of the page would decode fine.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Singleton range constructors versus functional coercion notation

2011-11-22 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 It's a little strange that we allow people to define functions with one
 argument and the same name as a type if such functions are confusing.

As long as your mental model is that such a function is a cast,
everything is fine.  The trouble with the range constructors is that
they aren't really casts, as shown by the fact that when you read
textrange('foo')
you expect 'foo' to be text and not textrange.

regards, tom lane



Re: [HACKERS] testing ProcArrayLock patches

2011-11-22 Thread Pavan Deolasee
On Tue, Nov 22, 2011 at 4:40 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Pavan Deolasee pavan.deola...@gmail.com wrote:

 It will be a great help if you could spare a few minutes to also
 test the patch that moves the frequently accessed PGPROC members
 into a separate array. We are seeing good improvements on the HP-UX
 IA platform and on AMD Opteron, and it will be interesting to know
 what happens on the Intel platform too.

 For a read only comparison (which was run using the simple
 protocol), using identical settings to the previous master run, but
 with the PGPROC split patch:

 m32 tps = 201738.209348 (including connections establishing)
 p32 tps = 201620.966988 (including connections establishing)

 m128 tps = 352159.631878 (including connections establishing)
 p128 tps = 363998.703900 (including connections establishing)

 Clearly a win at 128 clients; not at 32.

 For updates:

 sm32 tps = 27392.393850 (including connections establishing)
 sp32 tps = 27995.784333 (including connections establishing)

 sm128 tps = 22261.902571 (including connections establishing)
 sp128 tps = 23690.408272 (including connections establishing)

 pm32 tps = 34983.352396 (including connections establishing)
 pp32 tps = 36076.373389 (including connections establishing)

 pm128 tps = 24164.441954 (including connections establishing)
 pp128 tps = 27070.824588 (including connections establishing)

 That's a pretty decisive win all around.


Thanks for running those tests. The numbers are not bad, but
definitely not as good as we saw on some other platforms. It's
possible, though, that they may improve in percentage terms with an
even larger number of clients on this box. And given that we are
seeing big gains on other platforms, hopefully this will give us the
confidence to proceed with the patch.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] testing ProcArrayLock patches

2011-11-22 Thread Kevin Grittner
Pavan Deolasee pavan.deola...@gmail.com wrote:
 
 The numbers are not that bad, but definitely not as good as we saw
 on some other platforms.
 
Well, this machine is definitely designed to hold up under high
concurrency.  As I understand it, each core is the memory manager
for two 4GB DIMMs, with two channels to them, each with two buffers.
The way the cores are connected, a core never needs to go through
more than one other core to get to memory not directly managed, and
that uses snoop technology which hands the cached data right over
from one core to the other when possible, rather than making the
core which now owns the cache line pull it from RAM.  It seems the
2.6.32 kernel is able to manage that technology in a reasonable
fashion.
 
At first I was surprised to see performance top out on the update
tests between 80 and 96 clients.  But then, that lands almost
exactly where my old reliable formula, ((2 * core count) + effective
spindle count), would predict.  The SELECT-only tests peaked at 64 clients,
but those were fully cached, so effective spindle count was zero,
again fitting the formula.  So these optimizations seem to me to
break down the barriers which had previously capped the number of
clients which could be handled, letting them peak at their natural
levels.
 
 But it's possible that they may improve in percentage terms with
 an even larger number of clients on this box.
 
I think so; I think this box is just so scalable that at 128
clients we were just barely getting past the knee in the
performance graphs to where these patches help most.
 
-Kevin



Re: [HACKERS] vpath builds and verbose error messages

2011-11-22 Thread Peter Eisentraut
On fre, 2011-11-18 at 09:44 -0500, Tom Lane wrote:
 It wouldn't be that hard for elog.c to do strrchr(fname, '/') or
 something like that, but the idea that there are hundreds of full-path
 strings embedded in the executable is a bit annoying.  I guess we
 could
 hope that the compiler is bright enough to store it only once per
 source
 file, so the actual space requirement may not be all *that* bad. 

A look at the output of strings shows that the compiler does appear to
optimize this.  So your suggestion is probably the way to go.

One thing that isn't so nice about all this is that it embeds the
personal directory structure of the builder of the binary into the
shipped product.  But gcc's cpp doesn't like redefining __FILE__, so the
only way to get around that altogether would be to use something other
than __FILE__ and define that for all builds.  Might not be worth it.





[HACKERS] WIP: index support for regexp search

2011-11-22 Thread Alexander Korotkov
Hackers,

A WIP patch with index support for regexp search in the pg_trgm contrib
module is attached.
In contrast to techniques that extract contiguous text parts from the
regexp, this patch presents a technique of automaton transformation,
which allows more comprehensive trigram extraction.

A little example of the possible performance benefit.

test=# explain analyze select * from words where s ~ 'a[bc]+[de]';
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on words  (cost=0.00..1703.11 rows=10 width=9) (actual time=3.092..242.303 rows=662 loops=1)
   Filter: (s ~ 'a[bc]+[de]'::text)
   Rows Removed by Filter: 97907
 Total runtime: 243.213 ms
(4 rows)

test=# explain analyze select * from words where s ~ 'a[bc]+[de]';
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on words  (cost=260.08..295.83 rows=10 width=9) (actual time=4.166..7.506 rows=662 loops=1)
   Recheck Cond: (s ~ 'a[bc]+[de]'::text)
   Rows Removed by Index Recheck: 18
   ->  Bitmap Index Scan on words_trgm_idx  (cost=0.00..260.07 rows=10 width=0) (actual time=4.076..4.076 rows=680 loops=1)
         Index Cond: (s ~ 'a[bc]+[de]'::text)
 Total runtime: 8.424 ms
(6 rows)

The current version of the patch has some limitations:
1) The algorithm for extracting a logical expression over trigrams has
high computational complexity, so it can become really slow on regexps
with many branches. Improvements to this algorithm are probably
possible.
2) Naturally, there is no performance benefit if no trigrams can be
extracted from the regexp. That is inevitable.
3) Currently, only GIN indexes are supported. There are no serious
problems; the GiST code just hasn't been written yet.
4) There appears to be a problem extracting multibyte-encoded
characters from pg_wchar. I've posted a question about it here:
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01222.php
For now I've hardcoded a dirty solution, so
PG_EUC_JP, PG_EUC_CN, PG_EUC_KR, PG_EUC_TW and PG_EUC_JIS_2004 are not
supported yet.
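For reference, the words_trgm_idx index used in the plans above would presumably be created along these lines (assuming the patched pg_trgm; the table and column names follow the example):

```sql
CREATE EXTENSION pg_trgm;

-- GIN trigram index; with this patch, the ~ (regexp match) operator
-- can use it, as the Bitmap Index Scan above shows.
CREATE INDEX words_trgm_idx ON words USING gin (s gin_trgm_ops);
```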

--
With best regards,
Alexander Korotkov.


trgm-regexp-0.1.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] vpath builds and verbose error messages

2011-11-22 Thread Alvaro Herrera

Excerpts from Peter Eisentraut's message of mar nov 22 16:22:15 -0300 2011:

 One thing that isn't so nice about all this is that it embeds the
 personal directory structure of the builder of the binary into the
 shipped product.  But gcc's cpp doesn't like redefining __FILE__, so the
 only way to get around that altogether would be to use something other
 than __FILE__ and define that for all builds.  Might not be worth it.

This is similar to what's suggested here:
http://stackoverflow.com/questions/1591873/how-do-i-write-a-cpp-dir-macro-similar-to-file

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] pg_upgrade relation OID mismatches

2011-11-22 Thread Peter Eisentraut
I thought these were fixed a while ago, but I'm still seeing these when
upgrading from master to self (using testing script sent in a while
ago).   This is completely reproducible.  What's happening?

...
Restoring user relation files
  /home/peter/devel/postgresql/git/postgresql/contrib/pg_upgra
Mismatch of relation OID in database regression: old OID 16701, new OID 16689
Failure, exiting





[HACKERS] Not HOT enough

2011-11-22 Thread Simon Riggs
Some time ago, I changed GetOldestXmin() to ignore procs in other
databases resulting in a potentially later xmin.

GetSnapshotData() was not touched when that happened, even though the
comments say "...This is the same computation done by
GetOldestXmin(true, true)". The transam/README file puts it more strongly:
"GetSnapshotData also performs an oldest-xmin calculation (which had
better match GetOldestXmin's)". Doh.

As a result, VACUUM ignores procs in other databases, whereas HOT does
not. That means we aren't cleaning up as much as we could do when
running multiple databases. If it's OK for VACUUM, then it must be OK
for HOT cleanup also.

Attached patch ignores procs in other databases during
GetSnapshotData() when IsMVCCSnapshot(), using similar coding to
GetOldestXmin().

Any doubters?

I suggest this is backpatched a few releases.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


fix_getsnapshotdata.v1.patch
Description: Binary data



Re: [HACKERS] pg_upgrade relation OID mismatches

2011-11-22 Thread Bruce Momjian
Peter Eisentraut wrote:
 I thought these were fixed a while ago, but I'm still seeing these when
 upgrading from master to self (using testing script sent in a while
 ago).   This is completely reproducible.  What's happening?
 
 ...
 Restoring user relation files
   /home/peter/devel/postgresql/git/postgresql/contrib/pg_upgra
 Mismatch of relation OID in database regression: old OID 16701, new OID 
 16689
 Failure, exiting

Yes, I certainly thought they were all addressed.  What object is 16701
in the old database?  Anything unusual about it?  This is saying the
relation oid was not preserved.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Review for Add permission check on SELECT INTO

2011-11-22 Thread Robert Haas
On Mon, Nov 21, 2011 at 6:38 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 Kohei KaiGai wrote:
 The attached patch is a revised version.
 It fixed up this bug, and revised test cases to ensure permission
 check error shall be raised due to the new table.

 Thanks.
 The second patch seems fine to me, I'll mark it ready for committer.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Not HOT enough

2011-11-22 Thread Robert Haas
On Tue, Nov 22, 2011 at 3:23 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Some time ago, I changed GetOldestXmin() to ignore procs in other
 databases resulting in a potentially later xmin.

 GetSnapshotData() was not touched when that happened, even though the
 comments say "...This is the same computation done by
 GetOldestXmin(true, true)". The transam/README file puts it more strongly:
 "GetSnapshotData also performs an oldest-xmin calculation (which had
 better match GetOldestXmin's)". Doh.

 As a result, VACUUM ignores procs in other databases, whereas HOT does
 not. That means we aren't cleaning up as much as we could do when
 running multiple databases. If it's OK for VACUUM, then it must be OK
 for HOT cleanup also.

 Attached patch ignores procs in other databases during
 GetSnapshotData() when IsMVCCSnapshot(), using similar coding to
 GetOldestXmin().

 Any doubters?

I think this is unsafe for shared catalogs.

 I suggest this is backpatched a few releases.

-1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Optimize postgres protocol for fixed size arrays

2011-11-22 Thread Mikko Tiihonen

Hi,

During conversion of the jdbc driver to use binary encoding when receiving 
array objects from postgres it was noticed
that for example for int[] arrays the binary encoding is normally 30% to 200% 
larger in bytes than the text encoding.

This was of concern to some users with slower links to the database. Even though 
the encoded size was larger, the binary
encoding was still consistently faster (with a 50% speed-up for float[]).

Here is a patch that adds a new flag to the protocol that is set when all 
elements of the array are of same fixed size.
When the bit is set the 4 byte length is only sent once and not for each 
element. Another restriction is that the flag
can only be set when there are no NULLs in the array.

The postgres part is my first try at the problem and I really need some 
feedback around the detection of fixed size
elements. I just made a guess and noticed that typlen != 0 seemed to work for 
the basic types I used for testing.

First the postgres patch:

diff --git a/src/backend/utils/adt/arrayfuncs.c 
b/src/backend/utils/adt/arrayfuncs.c
index bfb6065..970272f 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -86,7 +86,7 @@ static void ReadArrayBinary(StringInfo buf, int nitems,
FmgrInfo *receiveproc, Oid typioparam, int32 
typmod,
int typlen, bool typbyval, char typalign,
Datum *values, bool *nulls,
-   bool *hasnulls, int32 *nbytes);
+   bool *hasnulls, int32 *nbytes, bool fixedlen);
 static void CopyArrayEls(ArrayType *array,
 Datum *values, bool *nulls, int nitems,
 int typlen, bool typbyval, char typalign,
@@ -1242,7 +1242,7 @@ array_recv(PG_FUNCTION_ARGS)
ndim, MAXDIM)));

flags = pq_getmsgint(buf, 4);
-   if (flags != 0 && flags != 1)
+   if ((flags & ~3) != 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 errmsg("invalid array flags")));
@@ -1327,7 +1327,7 @@ array_recv(PG_FUNCTION_ARGS)
&my_extra->proc, typioparam, typmod,
typlen, typbyval, typalign,
dataPtr, nullsPtr,
-   &hasnulls, &nbytes);
+   &hasnulls, &nbytes, (flags & 2) != 0);
if (hasnulls)
{
dataoffset = ARR_OVERHEAD_WITHNULLS(ndim, nitems);
@@ -1390,7 +1390,8 @@ ReadArrayBinary(StringInfo buf,
Datum *values,
bool *nulls,
bool *hasnulls,
-   int32 *nbytes)
+   int32 *nbytes,
+   bool fixedlen)
 {
int i;
boolhasnull;
@@ -1403,7 +1404,7 @@ ReadArrayBinary(StringInfo buf,
charcsave;

/* Get and check the item length */
-   itemlen = pq_getmsgint(buf, 4);
+   itemlen = fixedlen ? typlen : pq_getmsgint(buf, 4);
if (itemlen < -1 || itemlen > (buf->len - buf->cursor))
ereport(ERROR,

(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
@@ -1496,6 +1497,7 @@ array_send(PG_FUNCTION_ARGS)
int bitmask;
int nitems,
i;
+   int flags;
int ndim,
   *dim;
StringInfoData buf;
@@ -1535,6 +1537,8 @@ array_send(PG_FUNCTION_ARGS)
typbyval = my_extra-typbyval;
typalign = my_extra-typalign;

+   flags = ARR_HASNULL(v) ? 1 : (typlen > 0 ? 2 : 0);
+
ndim = ARR_NDIM(v);
dim = ARR_DIMS(v);
nitems = ArrayGetNItems(ndim, dim);
@@ -1543,7 +1547,7 @@ array_send(PG_FUNCTION_ARGS)

/* Send the array header information */
pq_sendint(&buf, ndim, 4);
-   pq_sendint(&buf, ARR_HASNULL(v) ? 1 : 0, 4);
+   pq_sendint(&buf, flags, 4);
pq_sendint(&buf, element_type, sizeof(Oid));
for (i = 0; i < ndim; i++)
{
@@ -1571,7 +1575,8 @@ array_send(PG_FUNCTION_ARGS)

itemvalue = fetch_att(p, typbyval, typlen);
outputbytes = SendFunctionCall(&my_extra->proc, 
itemvalue);
-   pq_sendint(&buf, VARSIZE(outputbytes) - VARHDRSZ, 4);
+   if ((flags & 2) == 0)
+   pq_sendint(&buf, VARSIZE(outputbytes) - 
VARHDRSZ, 4);
pq_sendbytes(&buf, VARDATA(outputbytes),
 

Re: [HACKERS] pg_upgrade relation OID mismatches

2011-11-22 Thread Peter Eisentraut
On tis, 2011-11-22 at 15:42 -0500, Bruce Momjian wrote:
 Peter Eisentraut wrote:
  I thought these were fixed a while ago, but I'm still seeing these when
  upgrading from master to self (using testing script sent in a while
  ago).   This is completely reproducible.  What's happening?
  
  ...
  Restoring user relation files
/home/peter/devel/postgresql/git/postgresql/contrib/pg_upgra
  Mismatch of relation OID in database regression: old OID 16701, new OID 
  16689
  Failure, exiting
 
 Yes, I certainly thought they were all addressed.  What object is 16701
 in the old database?  Anything unusual about it?  This is saying the
 relation oid was not preserved.

It came in with the range types feature:

+ psql -d regression -x -c 'select * from pg_class where oid = 16701'
-[ RECORD 1 ]--+
relname| test_range_gist
relnamespace   | 2200
reltype| 16703
reloftype  | 0
relowner   | 10
relam  | 0
relfilenode| 16701
reltablespace  | 0
relpages   | 33
reltuples  | 6200
relallvisible  | 33
reltoastrelid  | 16704
reltoastidxid  | 0
relhasindex| t
relisshared| f
relpersistence | p
relkind| r
relnatts   | 1
relchecks  | 0
relhasoids | f
relhaspkey | f
relhasrules| f
relhastriggers | f
relhassubclass | f
relfrozenxid   | 1627
relacl | 
reloptions | 





Re: [HACKERS] Optimize postgres protocol for fixed size arrays

2011-11-22 Thread k...@rice.edu
On Tue, Nov 22, 2011 at 11:47:22PM +0200, Mikko Tiihonen wrote:
 Hi,
 
 During conversion of the jdbc driver to use binary encoding when receiving 
 array objects from postgres it was noticed
 that for example for int[] arrays the binary encoding is normally 30% to 200% 
 larger in bytes than the text encoding.
 
 This was of concern to some users with slower links to database. Even though 
 the encoded size was larger the binary
 encoding was still constantly faster (with 50% speed up for float[]).
 
 Here is a patch that adds a new flag to the protocol that is set when all 
 elements of the array are of same fixed size.
 When the bit is set the 4 byte length is only sent once and not for each 
 element. Another restriction is that the flag
 can only be set when there are no NULLs in the array.
 

Cool. This would be very useful with the DSPAM binary array driver. Although
the binary is shorter because the values are 8 byte integers, they would be
much shorter without the redundant sizing information. Barring issues:

+1

Regards,
Ken



Re: [HACKERS] FlexLocks

2011-11-22 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 Updated patches attached.
 
I've gotten through several days of performance tests for this pair
of related patches, with results posted on a separate thread.  I'll
link those in to the CF application shortly.  To summarize the other
(fairly long) thread on benchmarks, it seemed like there might be a
very slight slowdown at low concurrency, but it could be the random
alignment of code with and without the patch; it was a small enough
fraction of a percent to be negligible, in my opinion.  At higher
concurrency levels the patch showed significant performance
improvements.  Besides the overall improvement in the median tps
numbers of several percent, there was significant mitigation of the
performance collapse phenomenon, where some runs were much slower
than others.  It seems a clear win in terms of performance.
 
I've gotten through code review of the flexlock-v2.patch, and have
decided to post on that before I go through the
procarraylock-v1.patch code.
 
Not surprisingly, this patch was in good form and applied cleanly. 
There were doc changes, and I can't see where any changes to the
tests are required.  I liked the structure, and only found a few
nit-picky things to point out:
 
I didn't see why num_held_flexlocks and held_flexlocks had the
static keyword removed from their declarations.  
 
FlexLockRemember() seems to have a pasto for a comment.  Maybe
change to something like: Add lock to list of locks held by this
backend.
 
In procarraylock.c there is this:
 
/* If there are no lockers, clar the critical PGPROC fields. */
 
s/clar/clear/
 
I have to admit I don't have my head around the extraWaits issue, so
I can't personally vouch for that code, although I have no reason to
doubt it, either.  Everything else was something that I at least
*think* I understand, and it looked good to me.
 
I'm not changing the status until I get through the other patch
file, which should be tomorrow.
 
-Kevin



Re: [HACKERS] Optimize postgres protocol for fixed size arrays

2011-11-22 Thread Merlin Moncure
On Tue, Nov 22, 2011 at 3:47 PM, Mikko Tiihonen
mikko.tiiho...@nitorcreations.com wrote:
 Hi,

 During conversion of the jdbc driver to use binary encoding when receiving
 array objects from postgres it was noticed
 that for example for int[] arrays the binary encoding is normally 30% to
 200% larger in bytes than the text encoding.

 This was of concern to some users with slower links to database. Even though
 the encoded size was larger the binary
 encoding was still constantly faster (with 50% speed up for float[]).

 Here is a patch that adds a new flag to the protocol that is set when all
 elements of the array are of same fixed size.
 When the bit is set the 4 byte length is only sent once and not for each
 element. Another restriction is that the flag
 can only be set when there are no NULLs in the array.

 The postgres part is my first try at the problem and I really need some
 feedback around the detection of fixed size
 elements. I just made a guess and noticed that typlen != 0 seemed to work
 for the basic types I used for testing.

 First the postgres patch:

 diff --git a/src/backend/utils/adt/arrayfuncs.c
 b/src/backend/utils/adt/arrayfuncs.c
 index bfb6065..970272f 100644
 --- a/src/backend/utils/adt/arrayfuncs.c
 +++ b/src/backend/utils/adt/arrayfuncs.c
 @@ -86,7 +86,7 @@ static void ReadArrayBinary(StringInfo buf, int nitems,
                                FmgrInfo *receiveproc, Oid typioparam, int32
 typmod,
                                int typlen, bool typbyval, char typalign,
                                Datum *values, bool *nulls,
 -                               bool *hasnulls, int32 *nbytes);
 +                               bool *hasnulls, int32 *nbytes, bool
 fixedlen);
  static void CopyArrayEls(ArrayType *array,
                         Datum *values, bool *nulls, int nitems,
                         int typlen, bool typbyval, char typalign,
 @@ -1242,7 +1242,7 @@ array_recv(PG_FUNCTION_ARGS)
                                                ndim, MAXDIM)));

        flags = pq_getmsgint(buf, 4);
 -       if (flags != 0 && flags != 1)
 +       if ((flags & ~3) != 0)
                ereport(ERROR,

  (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                                  errmsg("invalid array flags")));
 @@ -1327,7 +1327,7 @@ array_recv(PG_FUNCTION_ARGS)
                                         &my_extra->proc, typioparam, typmod,
                                        typlen, typbyval, typalign,
                                        dataPtr, nullsPtr,
 -                                       &hasnulls, &nbytes);
 +                                       &hasnulls, &nbytes, (flags & 2) !=
 0);
        if (hasnulls)
        {
                dataoffset = ARR_OVERHEAD_WITHNULLS(ndim, nitems);
 @@ -1390,7 +1390,8 @@ ReadArrayBinary(StringInfo buf,
                                Datum *values,
                                bool *nulls,
                                bool *hasnulls,
 -                               int32 *nbytes)
 +                               int32 *nbytes,
 +                               bool fixedlen)
  {
        int                     i;
        bool            hasnull;
 @@ -1403,7 +1404,7 @@ ReadArrayBinary(StringInfo buf,
                char            csave;

                /* Get and check the item length */
 -               itemlen = pq_getmsgint(buf, 4);
 +               itemlen = fixedlen ? typlen : pq_getmsgint(buf, 4);
                if (itemlen < -1 || itemlen > (buf->len - buf->cursor))
                        ereport(ERROR,

  (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 @@ -1496,6 +1497,7 @@ array_send(PG_FUNCTION_ARGS)
        int                     bitmask;
        int                     nitems,
                                i;
 +       int                     flags;
        int                     ndim,
                           *dim;
        StringInfoData buf;
 @@ -1535,6 +1537,8 @@ array_send(PG_FUNCTION_ARGS)
        typbyval = my_extra-typbyval;
        typalign = my_extra-typalign;

 +       flags = ARR_HASNULL(v) ? 1 : (typlen > 0 ? 2 : 0);
 +
        ndim = ARR_NDIM(v);
        dim = ARR_DIMS(v);
        nitems = ArrayGetNItems(ndim, dim);
 @@ -1543,7 +1547,7 @@ array_send(PG_FUNCTION_ARGS)

        /* Send the array header information */
         pq_sendint(&buf, ndim, 4);
 -       pq_sendint(&buf, ARR_HASNULL(v) ? 1 : 0, 4);
 +       pq_sendint(&buf, flags, 4);
         pq_sendint(&buf, element_type, sizeof(Oid));
         for (i = 0; i < ndim; i++)
        {
 @@ -1571,7 +1575,8 @@ array_send(PG_FUNCTION_ARGS)

                        itemvalue = fetch_att(p, typbyval, typlen);
                         outputbytes = SendFunctionCall(&my_extra->proc,
 itemvalue);
 -                       pq_sendint(&buf, VARSIZE(outputbytes) - VARHDRSZ,
 4);
 +                       if ((flags & 2) == 0)
 +                               pq_sendint(&buf, VARSIZE(outputbytes) -
 VARHDRSZ, 4);
                         pq_sendbytes(&buf, 

Re: [HACKERS] pg_upgrade relation OID mismatches

2011-11-22 Thread Bruce Momjian
Peter Eisentraut wrote:
 On tis, 2011-11-22 at 15:42 -0500, Bruce Momjian wrote:
  Peter Eisentraut wrote:
   I thought these were fixed a while ago, but I'm still seeing these when
   upgrading from master to self (using testing script sent in a while
   ago).   This is completely reproducible.  What's happening?
   
   ...
   Restoring user relation files
 /home/peter/devel/postgresql/git/postgresql/contrib/pg_upgra
   Mismatch of relation OID in database regression: old OID 16701, new OID 
   16689
   Failure, exiting
  
  Yes, I certainly thought they were all addressed.  What object is 16701
  in the old database?  Anything unusual about it?  This is saying the
  relation oid was not preserved.
 
 It came in with the range types feature:
 
 + psql -d regression -x -c 'select * from pg_class where oid = 16701'
 -[ RECORD 1 ]--+
 relname| test_range_gist
 relnamespace   | 2200
 reltype| 16703
 reloftype  | 0
 relowner   | 10
 relam  | 0
 relfilenode| 16701
 reltablespace  | 0
 relpages   | 33
 reltuples  | 6200
 relallvisible  | 33
 reltoastrelid  | 16704
 reltoastidxid  | 0
 relhasindex| t
 relisshared| f
 relpersistence | p
 relkind| r
 relnatts   | 1
 relchecks  | 0
 relhasoids | f
 relhaspkey | f
 relhasrules| f
 relhastriggers | f
 relhassubclass | f
 relfrozenxid   | 1627
 relacl | 
 reloptions | 

OK, that is a heap table.  My only guess is that the heap is being
created without binary_upgrade_next_heap_pg_class_oid being set.
Looking at the code, I can't see how the heap could be created without
this happening.  Another idea is that pg_dumpall isn't outputting the proper
value, but again, how is this data type different from the others?

I will try to research this further.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] Permissions checks for range-type support functions

2011-11-22 Thread Tom Lane
It strikes me that we are rather lacking in $SUBJECT.  There are a
couple of distinct issues:

1. Since no permissions checks are applied at runtime, a nefarious
person could bypass ACL_EXECUTE checks for any function that happens
to match the signature for a subtype_diff function.  Just create a
range type that specifies that function as subtype_diff, and think
up an index operation that will call it with the arguments you want.
In principle one could bypass permissions checks on canonical functions
as well, though in practice I think that's uninteresting because of the
restrictions on the function signature.

In the analogous situation for base types, we don't worry about this
because only superusers can define base types; we just presume (and
document) that the superuser is effectively granting public access
permissions on all functions referenced in a base type definition.
This will clearly not fly for range types.

The nearest analogy I can see for non-superusers is use of functions
within aggregate definitions.  There, we check that the aggregate
definer has permission to call the referenced function at aggregate
definition time, and then recheck it at the start of any query that
uses the aggregate.  (So, the aggregate definer can give away permission
to call the target function, but he could do that anyway by creating
a SECURITY DEFINER wrapper function.)

For the range-type support functions, it would be simple enough to check
call permission at range-type definition time.  But I really don't want
to put in a run-time check, because there doesn't seem to be a very
practical way to do it less often than every call, and besides that it's
not very clear who to blame an index operation on.  Is it good enough to
test this only at range-type creation time?  The implication of course is
that you might not be able to revoke execute permission from a bad guy,
short of dropping your function.  This might be all right given the
relatively narrow cross-section of functions that could be called this
way.

2. The ANALYZE option is flat out dangerous, because it allows any
function with the signature "f(internal) returns bool" to be called as
though it's a typanalyze function.  There are a couple of such functions
in the catalogs already, and either of them will probably crash the
backend if invoked as typanalyze on a range column.

Again, this isn't a hazard for the existing use with base types,
because only superusers (presumed to be responsible adults) can
define base types.  But for range types it's an easy DOS attack for
less responsible persons.

I'm inclined to think that the right solution for this one is to drop
the ANALYZE option altogether, and just have DefineRange automatically
install a system-wide typanalyze function for ranges.  Under what
circumstances is range-type-specific knowledge going to be needed for
typanalyze, anyway?  Especially since the functions that would use the
results will be system-wide selectivity functions associated with the
ANYRANGE operators.

regards, tom lane



Re: [HACKERS] Not HOT enough

2011-11-22 Thread Simon Riggs
On Tue, Nov 22, 2011 at 9:21 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Nov 22, 2011 at 3:23 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Some time ago, I changed GetOldestXmin() to ignore procs in other
 databases resulting in a potentially later xmin.

 GetSnapshotData() was not touched when that happened, even though the
  comments say "...This is the same computation done by
  GetOldestXmin(true, true)". The transam/README file puts it more strongly:
  "GetSnapshotData also performs an oldest-xmin calculation (which had
  better match GetOldestXmin's)". Doh.

 As a result, VACUUM ignores procs in other databases, whereas HOT does
 not. That means we aren't cleaning up as much as we could do when
  running multiple databases. If it's OK for VACUUM, then it must be OK
 for HOT cleanup also.

 Attached patch ignores procs in other databases during
 GetSnapshotData() when IsMVCCSnapshot(), using similar coding to
 GetOldestXmin().

 Any doubters?

 I think this is unsafe for shared catalogs.

I think so too. That's why it uses IsMVCCSnapshot() to confirm when it
is safe to do so.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [JDBC] Optimize postgres protocol for fixed size arrays

2011-11-22 Thread Oliver Jowett
On 23 November 2011 10:47, Mikko Tiihonen
mikko.tiiho...@nitorcreations.com wrote:

 Here is a patch that adds a new flag to the protocol that is set when all
 elements of the array are of same fixed size.
 When the bit is set the 4 byte length is only sent once and not for each
 element. Another restriction is that the flag
 can only be set when there are no NULLs in the array.

How does a client detect that this feature is supported?

At a glance the JDBC patch doesn't use it on the send path, but
presumably clients could use this when sending binary-format arrays to
the server - but only if the server understood the format.

(Ideally a pair of settings would be useful here - one to say "the
server understands the new format" and another the client sets to say
"please use the new format" that defaults to off - then you could
avoid confusing old clients, too)

Oliver



Re: [HACKERS] [JDBC] Optimize postgres protocol for fixed size arrays

2011-11-22 Thread Tom Lane
Oliver Jowett oli...@opencloud.com writes:
 On 23 November 2011 10:47, Mikko Tiihonen
 mikko.tiiho...@nitorcreations.com wrote:
 Here is a patch that adds a new flag to the protocol that is set when all
 elements of the array are of same fixed size.

 How does a client detect that this feature is supported?

The only way that anything like this will go in is as part of a protocol
version bump, so discoverability would reduce to knowing which protocol
you're using.  We should file this away as one of the things we might
want to do whenever there's sufficient critical mass for a new wire
protocol version.

Note that COPY BINARY files would be affected too, and so we'd want to
make sure that this sort of change is recognizable from a binary file's
header.

regards, tom lane



Re: [HACKERS] vpath builds and verbose error messages

2011-11-22 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 One thing that isn't so nice about all this is that it embeds the
 personal directory structure of the builder of the binary into the
 shipped product.  But gcc's cpp doesn't like redefining __FILE__, so the
 only way to get around that altogether would be to use something other
 than __FILE__ and define that for all builds.  Might not be worth it.

Well, if you have a problem with that, don't use a vpath build.  I don't
think it's our job to work around gcc behaviors that someone else might
feel to be security issues --- they should take that up with the gcc
maintainers.  To me, the only argument for doing anything about this at
all is that we'd like Postgres' behavior (in terms of what it prints in
error messages) to be consistent across different build scenarios.

regards, tom lane



Re: [HACKERS] Not HOT enough

2011-11-22 Thread Simon Riggs
On Tue, Nov 22, 2011 at 11:40 PM, Simon Riggs si...@2ndquadrant.com wrote:

 I think this is unsafe for shared catalogs.

 I think so too. Thats why it uses IsMVCCSnapshot() to confirm when it
 is safe to do so.


Ah, you mean access to shared catalogs using MVCC snapshots.

Fixed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


fix_getsnapshotdata.v2.patch
Description: Binary data



Re: [HACKERS] Inlining comparators as a performance optimisation

2011-11-22 Thread Peter Geoghegan
On 21 November 2011 22:55, Robert Haas robertmh...@gmail.com wrote:
 Results on my machine, for what they're worth:

I was curious as to how much of an improvement I'd made to sorting in
isolation. I've added simple sort timing instrumentation to the code
in a revised patch, attached, for the convenience of reviewers.

This patch adds an int8 specialisation, but it also supports fast path
sorting of timestamps by using that same int8 specialisation (for
HAVE_INT64_TIMESTAMP builds at least - didn't know if it was worth
handling the NaN crud for the other case). ISTM that various different
types with identical internal representations can share
specialisations like this, which I hope will alleviate some earlier
concerns about both the limited applicability of this optimisation and
the possible proliferation of specialisations.

Here are some raw qsort figures in seconds for the query "select * from
orderlines order by test" (where test is a timestamptz column I added
and updated with some data, executing repeatedly on the same machine
as before):

Before:

DEBUG:  elapsed: 0.031738
DEBUG:  elapsed: 0.031595
DEBUG:  elapsed: 0.031502
DEBUG:  elapsed: 0.031378
DEBUG:  elapsed: 0.031359
DEBUG:  elapsed: 0.031413
DEBUG:  elapsed: 0.031499
DEBUG:  elapsed: 0.031394
DEBUG:  elapsed: 0.031368

After:

DEBUG:  elapsed: 0.013341
DEBUG:  elapsed: 0.014608
DEBUG:  elapsed: 0.013809
DEBUG:  elapsed: 0.013244
DEBUG:  elapsed: 0.014307
DEBUG:  elapsed: 0.013207
DEBUG:  elapsed: 0.013227
DEBUG:  elapsed: 0.013264
DEBUG:  elapsed: 0.013143
DEBUG:  elapsed: 0.013455
DEBUG:  elapsed: 0.013447

I wonder, is it worth qualifying that the Sort Method was a
"quicksort (fast path)" sort within explain analyze output? This is a
rather large improvement, so it might actually be something that
people look out for, as it might be tricky to remember the exact
circumstances under which the optimisation kicks in by the time we're
done here.

I haven't had as much time as I'd like to polish this patch, or to get
clearer answers. I expect to spend more time on it over the weekend.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
From 687779e799eb0d0064f86192cc74ea473ff9a607 Mon Sep 17 00:00:00 2001
From: peter2ndQuadrant pe...@2ndquadrant.com
Date: Sun, 20 Nov 2011 20:36:25 +
Subject: [PATCH] Initial commit of optimization

Stop directly using oid

Added int8 quicksort fast path specialisation, which can also be used in
place of F_TIMESTAMP_CMP for HAVE_INT64_TIMESTAMP builds.

Rebased, revised patch for -hackers, with timestamp and int8 fast path
sorting using the same int8 specialization.

Remove unneeded line

Rebase for -hackers
---
 src/backend/utils/sort/tuplesort.c |  105 +++-
 src/include/utils/template_qsort_arg.h |  217 
 2 files changed, 316 insertions(+), 6 deletions(-)
 create mode 100644 src/include/utils/template_qsort_arg.h

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 3505236..4962973 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -107,11 +107,13 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/datum.h"
+#include "utils/fmgroids.h"
 #include "utils/logtape.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/pg_rusage.h"
 #include "utils/rel.h"
+#include "utils/template_qsort_arg.h"
 #include "utils/tuplesort.h"
 
 
@@ -1225,12 +1227,60 @@ puttuple_common(Tuplesortstate *state, SortTuple *tuple)
 }
 
 /*
- * All tuples have been provided; finish the sort.
+ * Manufacture type specific sorting specialisations
+ * with inline comparators
+ */
+static inline int
+compare_int4(Datum first, Datum second)
+{
+	int32		first_int = DatumGetInt32(first);
+	int32		second_int = DatumGetInt32(second);
+
+	if (first_int > second_int)
+		return 1;
+	else if (first_int == second_int)
+		return 0;
+	else
+		return -1;
+}
+
+static inline int
+compare_int8(Datum first, Datum second)
+{
+	int64		first_int = DatumGetInt64(first);
+	int64		second_int = DatumGetInt64(second);
+
+	if (first_int > second_int)
+		return 1;
+	else if (first_int == second_int)
+		return 0;
+	else
+		return -1;
+}
+
+TEMPLATE_QSORT_ARG(int4, compare_int4);
+TEMPLATE_QSORT_ARG(int8, compare_int8);
+
+double timeval_subtract(struct timeval *x, struct timeval *y);
+double timeval_subtract(struct timeval *x, struct timeval *y)
+{
+	struct timeval result;
+	/* Compute the time remaining to wait. tv_usec is certainly positive. */
+	result.tv_sec = x->tv_sec - y->tv_sec;
+	result.tv_usec = x->tv_usec - y->tv_usec;
+
+	/* return difference in seconds */
+	return result.tv_sec + ((double) result.tv_usec / 1000000);
+}
+
+/*
+ * All tuples have been provided; perform the sort.
  */
 void
 tuplesort_performsort(Tuplesortstate *state)
 {
 	MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
+	struct timeval begin, end;
 
 #ifdef TRACE_SORT
 	if 

Re: [HACKERS] range_adjacent and discrete ranges

2011-11-22 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 More formally, let there be two arbitrary ranges
   <a,b,i_a,i_b>
   <c,d,i_c,i_d>
 where the first two parameters are the lower respectively upper bound, and
 the last two are booleans saying whether the lower respectively upper bound
 is inclusive (true) or exclusive (false).

 These ranges are then adjacent exactly if the range
   <b,c,!i_b,!i_c>
 is empty.

I tried to implement this, and I think it has a small bug.  It works as
stated if we have b < c.  However, if we have b == c, then we want to
consider the ranges adjacent if i_b != i_c (ie, only one of them claims
the common boundary point).  But a singleton range is empty unless it's
inclusive on both sides.  So we have to have a special case when the
bounds are equal.

(If b > c, then of course we have to consider the two ranges in the
opposite order.)

Attached is a draft patch for this.  It passes regression tests but I've
not tried to exercise it with a canonical function that actually does
something different.  It's going to be a bit slower than Jeff's
original, because it does not only range_cmp_bound_values but also a
make_range cycle (in most cases).  So I guess the question is how much
we care about supporting canonical functions with non-default policies.
Thoughts?

regards, tom lane

diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 3326cb17c895273fd01c4eda5eb0d65a521d0168..890b6850cb6f5611e9afa922b85b0c64125f31bb 100644
*** a/src/backend/utils/adt/rangetypes.c
--- b/src/backend/utils/adt/rangetypes.c
*** range_adjacent(PG_FUNCTION_ARGS)
*** 699,704 
--- 699,706 
  upper2;
  	bool		empty1,
  empty2;
+ 	RangeType  *r3;
+ 	int			cmp;
  
  	/* Different types should be prevented by ANYRANGE matching rules */
  	if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2))
*** range_adjacent(PG_FUNCTION_ARGS)
*** 714,736 
  		PG_RETURN_BOOL(false);
  
  	/*
! 	 * For two ranges to be adjacent, the lower boundary of one range has to
! 	 * match the upper boundary of the other. However, the inclusivity of
! 	 * those two boundaries must also be different.
! 	 *
! 	 * The semantics for range_cmp_bounds aren't quite what we need here, so
! 	 * we do the comparison more directly.
  	 */
! 	if (lower1.inclusive != upper2.inclusive)
  	{
! 		if (range_cmp_bound_values(typcache, &lower1, &upper2) == 0)
! 			PG_RETURN_BOOL(true);
  	}
! 
! 	if (upper1.inclusive != lower2.inclusive)
  	{
! 		if (range_cmp_bound_values(typcache, &upper1, &lower2) == 0)
! 			PG_RETURN_BOOL(true);
  	}
  
  	PG_RETURN_BOOL(false);
--- 716,758 
  		PG_RETURN_BOOL(false);
  
  	/*
! 	 * Given two ranges A..B and C..D, where B < C, the ranges are adjacent
! 	 * if and only if the range B..C is empty, where inclusivity of these two
! 	 * bounds is inverted compared to the original bounds.  For discrete
! 	 * ranges, we have to rely on the canonicalization function to normalize
! 	 * B..C to empty if it contains no elements of the subtype.  If B == C,
! 	 * the ranges are adjacent only if these bounds have different inclusive
! 	 * flags (i.e., exactly one of the ranges includes the common boundary).
! 	 * And if B > C then the ranges cannot be adjacent in this order, but we
! 	 * must consider the other order (i.e., check D <= A).
  	 */
! 	cmp = range_cmp_bound_values(typcache, &upper1, &lower2);
! 	if (cmp < 0)
  	{
! 		upper1.inclusive = !upper1.inclusive;
! 		upper1.lower = true;
! 		lower2.inclusive = !lower2.inclusive;
! 		lower2.lower = false;
! 		r3 = make_range(typcache, &upper1, &lower2, false);
! 		PG_RETURN_BOOL(RangeIsEmpty(r3));
  	}
! 	if (cmp == 0)
  	{
! 		PG_RETURN_BOOL(upper1.inclusive != lower2.inclusive);
! 	}
! 	cmp = range_cmp_bound_values(typcache, &upper2, &lower1);
! 	if (cmp < 0)
! 	{
! 		upper2.inclusive = !upper2.inclusive;
! 		upper2.lower = true;
! 		lower1.inclusive = !lower1.inclusive;
! 		lower1.lower = false;
! 		r3 = make_range(typcache, &upper2, &lower1, false);
! 		PG_RETURN_BOOL(RangeIsEmpty(r3));
! 	}
! 	if (cmp == 0)
! 	{
! 		PG_RETURN_BOOL(upper2.inclusive != lower1.inclusive);
  	}
  
  	PG_RETURN_BOOL(false);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Permissions checks for range-type support functions

2011-11-22 Thread Tom Lane
I wrote:
 For the range-type support functions, it would be simple enough to check
 call permission at range-type definition time.  But I really don't want
 to put in a run-time check, because there doesn't seem to be a very
 practical way to do it less often than every call, and besides that it's
 not very clear who to blame an index operation on.  Is it good enough to
 test this only at range-type creation time?  The implication of course is
 that you might not be able to revoke execute permission from a bad guy,
 short of dropping your function.  This might be all right given the
 relatively narrow cross-section of functions that could be called this
 way.

On further reflection I think we can get away with this, because of the
additional check that's already in there that insists the support
functions be IMMUTABLE.  That means they can't have any side-effects,
which makes the potential security consequences of unauthorized calls
very minimal.  It's conceivable that the return value of a misused
subtype_diff function could be interesting (think some sort of
decryption function); but the system doesn't offer any way for a user
to see that return value.  It will only factor into GiST index page
split decisions, and in a pretty indirect way at that.  So I don't see
any interesting security risk for a misused subtype_diff function;
and there's not likely to be any meaningful hole for abusing canonical
functions either, due to the restrictions on argument/result datatype.

I still think we ought to add a creation-time permission check, just
for pro-forma purposes.

 2. The ANALYZE option is flat out dangerous,

But this is still true.

regards, tom lane



Re: [HACKERS] dblink: enable parameters

2011-11-22 Thread Pavel Stehule
 On 22.11.2011 06:55, Pavel Stehule wrote:
 I know that dblink is a deprecated interface - but it still has
 necessary functionality - it supports writable statements.

 So, dblink on FDW connection seems to be a possible solution.
 We pass a query as a form of a plain text.

 --

it is an interesting idea

Pavel

 Itagaki Takahiro




Re: [HACKERS] Writeable FDWs?

2011-11-22 Thread Pavel Stehule
Hello

2011/11/21 Josh Berkus j...@agliodbs.com:
 All,

 When I present to people about the features of 9.1, one of the most
 frequent questions is when we will get writeable FDWs for data sources
 where it is appropriate.   Is anyone working on this?

just a note - it is a relatively complex feature and needs a 2PC
transaction coordinator

Regards

Pavel


 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com


