Re: [HACKERS] Streaming base backups

2011-01-09 Thread Hannu Krosing

On 7.1.2011 15:45, Magnus Hagander wrote:

 On Fri, Jan 7, 2011 at 02:15, Simon Riggs si...@2ndquadrant.com wrote:

  One very useful feature will be some way of confirming the number and
  size of files to transfer, so that the base backup client can find out
  the progress.

 The patch already does this. Or rather, as it's coded it does this
 once per tablespace.

 It'll give you an approximation only of course, that can change,

In this case you actually could send exact numbers, as you need to only
transfer the files up to the size they were when starting the base
backup. The rest will be taken care of by WAL replay.

 but it should be enough for the purposes of a progress indication.



  It would also be good to avoid writing a backup_label file at all on the
  master, so there was no reason why multiple concurrent backups could not
  be taken. The current coding allows for the idea that the start and stop
  might be in different sessions, whereas here we know we are in one
  session.

 Yeah, I have that on the todo list suggested by Heikki. I consider it
 a later phase though.





--

Hannu Krosing
Senior Consultant,
Infinite Scalability & Performance
http://www.2ndQuadrant.com/books/




Re: [HACKERS] WIP: Range Types

2011-01-09 Thread Jeff Davis
On Sat, 2011-01-08 at 21:58 -0500, Robert Haas wrote:
 I mean, one semi-obvious possibility is to write one set of C
 functions that can have multiple SQL-level definitions bound to it.
 Then when the function is called, it can peek at flinfo->fn_oid to
 figure out which incarnation was called and then get the type info
 from there.  That's ugly, though.

That would work, but it is pretty ugly. It means it would be very
difficult for users to write new generic functions, because they would
need to add a new catalog entry for every existing range type.

Then again, wasting 4 bytes per row is not ideal, either. And maybe
users could still write some useful generic functions if they were a
combination of other generic functions, using the polymorphic system.
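
To make Robert's idea concrete, a minimal sketch (names entirely
hypothetical) of one C function backing several catalog entries:

Datum
range_contains_generic(PG_FUNCTION_ARGS)
{
    /*
     * Each pg_proc entry bound to this C function has its own OID, so
     * we can tell which SQL-level incarnation was called:
     */
    Oid     fn_oid = fcinfo->flinfo->fn_oid;

    /* ... map fn_oid to the concrete range type it was declared for ... */
    PG_RETURN_NULL();
}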

 It'd be really nice if we could just arrange for the info on which
 type anyrange actually is at the moment to be available in the right
 place.  Storing it on disk to work around that is pretty horrible, but
 maybe there's no other reasonable option.

I was surprised when I saw the solution for records. Maybe we should
consider something like that as a last resort (if it's possible for
non-record types)? I'd rather give up typmod than anyrange.

It also might be worth figuring out why input functions get the type oid
and output functions do not. I see this comment above getTypeIOParam():

 * As of PostgreSQL 8.1, output functions receive only the value
itself
 * and not any auxiliary parameters, so the name of this routine is
now
 * a bit of a misnomer ... it should be
getTypeInputParam.  
   

So, why was it eliminated? If the type output function got the type OID,
would it be enough to use fn_expr_get_argtype() for the other generic
functions?
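
Assuming the function meant here is fmgr's get_fn_expr_argtype(), a
minimal sketch of what a generic range function could then do:

Datum
range_generic_fn(PG_FUNCTION_ARGS)      /* name hypothetical */
{
    /* Recover the concrete type the parser resolved for argument 0. */
    Oid     rngtypid = get_fn_expr_argtype(fcinfo->flinfo, 0);

    if (!OidIsValid(rngtypid))
        elog(ERROR, "could not determine range type of argument");

    /* ... look up subtype, comparison functions, etc. from rngtypid ... */
    PG_RETURN_NULL();
}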

Regards,
Jeff Davis




Re: [HACKERS] Streaming base backups

2011-01-09 Thread Magnus Hagander
On Sun, Jan 9, 2011 at 09:55, Hannu Krosing ha...@2ndquadrant.com wrote:
 On 7.1.2011 15:45, Magnus Hagander wrote:

 On Fri, Jan 7, 2011 at 02:15, Simon Riggs si...@2ndquadrant.com wrote:

 One very useful feature will be some way of confirming the number and
 size of files to transfer, so that the base backup client can find out
 the progress.

 The patch already does this. Or rather, as it's coded it does this
 once per tablespace.

 It'll give you an approximation only of course, that can change,

 In this case you actually could send exact numbers, as you need to only
 transfer the files up to the size they were when starting the base
 backup. The rest will be taken care of by WAL replay.

It will still be an estimate, because files can get smaller, and even
go away completely.

But we really don't need more than an estimate...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Streaming base backups

2011-01-09 Thread Hannu Krosing

On 9.1.2011 10:44, Magnus Hagander wrote:

On Sun, Jan 9, 2011 at 09:55, Hannu Krosing ha...@2ndquadrant.com wrote:

On 7.1.2011 15:45, Magnus Hagander wrote:

On Fri, Jan 7, 2011 at 02:15, Simon Riggs si...@2ndquadrant.com wrote:


One very useful feature will be some way of confirming the number and
size of files to transfer, so that the base backup client can find out
the progress.

The patch already does this. Or rather, as it's coded it does this
once per tablespace.

It'll give you an approximation only of course, that can change,

In this case you actually could send exact numbers, as you need to only
transfer the files up to the size they were when starting the base
backup. The rest will be taken care of by WAL replay.

It will still be an estimate, because files can get smaller, and even
go away completely.
Sure. I just wanted to point out that you don't need to send the tail
part of the file that was added after the start of the backup.

And you can give a worst-case estimate for the space needed by the base
backup.

OTOH, streaming the WAL files in parallel can still fill up all
available space :P



But we really don't need more than an estimate...


Agreed.

--

Hannu Krosing
Senior Consultant,
Infinite Scalability & Performance
http://www.2ndQuadrant.com/books/




Re: [HACKERS] Streaming base backups

2011-01-09 Thread Magnus Hagander
On Sun, Jan 9, 2011 at 12:05, Hannu Krosing ha...@2ndquadrant.com wrote:
 On 9.1.2011 10:44, Magnus Hagander wrote:

 On Sun, Jan 9, 2011 at 09:55, Hannu Krosing ha...@2ndquadrant.com wrote:

 On 7.1.2011 15:45, Magnus Hagander wrote:

 On Fri, Jan 7, 2011 at 02:15, Simon Riggs si...@2ndquadrant.com wrote:

 One very useful feature will be some way of confirming the number and
 size of files to transfer, so that the base backup client can find out
 the progress.

 The patch already does this. Or rather, as it's coded it does this
 once per tablespace.

 It'll give you an approximation only of course, that can change,

 In this case you actually could send exact numbers, as you need to only
 transfer the files up to the size they were when starting the base
 backup. The rest will be taken care of by WAL replay.

 It will still be an estimate, because files can get smaller, and even
 go away completely.

 Sure. I just wanted to point out that you don't need to send the tail
 part of the file that was added after the start of the backup.

True - but that's a PITA to keep track of. We do handle the case where
a file changes during the transmission of that *file* (since otherwise
the tar header would specify an incorrect size), but not across the
whole backup.
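
As a minimal sketch (not the patch's actual code) of why that per-file
cap is needed: the tar header is written before the file body, so the
body must be cut off or zero-padded to exactly the advertised size.
send_chunk() here is a hypothetical stand-in for the transport call.

static void
send_file_body(FILE *fp, pgoff_t header_size)
{
    char        buf[32768];
    pgoff_t     sent = 0;
    size_t      cnt;

    /* Never send more than the size already promised in the header. */
    while (sent < header_size &&
           (cnt = fread(buf, 1,
                        Min(sizeof(buf), (size_t) (header_size - sent)),
                        fp)) > 0)
    {
        send_chunk(buf, cnt);
        sent += cnt;
    }

    /* If the file shrank mid-transfer, pad with zeros to the header size. */
    memset(buf, 0, sizeof(buf));
    while (sent < header_size)
    {
        cnt = Min(sizeof(buf), (size_t) (header_size - sent));
        send_chunk(buf, cnt);
        sent += cnt;
    }
}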


 And you can give a worst-case estimate for the space needed by the base backup.

 OTOH, streaming the WAL files in parallel can still fill up all available
 space :P

Yeah. I don't think it's worth the extra complexity of having to
enumerate and keep records for every individual file before the
streaming starts.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] system views for walsender activity

2011-01-09 Thread Magnus Hagander
On Fri, Jan 7, 2011 at 22:21, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 7, 2011 at 1:46 PM, Josh Berkus j...@agliodbs.com wrote:

 To my way of thinking, pg_stat_walsender and pg_stat_walreceiver would
 be more clear than pg_stat_replication_master and
 pg_stat_replication_slave.

 Let's commit it so that some of us can get a look at the data it
 contains, and then we can fix the name during beta.

 Well, the first half is committed, under the name pg_stat_replication.
  So go look at that, for starters...

One thing I noticed is that it gives an interesting property to my
patch for streaming base backups - they now show up in
pg_stat_replication, with a streaming location of 0/0.

If the view is named pg_stat_replication, we probably want to filter
that out somehow. But then, do we want a separate view listing the
walsenders that are busy sending base backups?

For that matter, do we want an indication that separates a walsender
not sending data from one sending that happens to be at location 0/0?
Most will leave 0/0 really quickly, but a walsender can be idle (not
received a command yet), or it can be running IDENTIFY_SYSTEM for
example.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] estimating # of distinct values

2011-01-09 Thread Jim Nasby
 A resource fork? Not sure what you mean, could you describe it in more
 detail?

Oops, resource forks are a filesystem thing; we call them relation forks.
From src/backend/storage/smgr/README:

Relation Forks
==============

Since 8.4, a single smgr relation can be comprised of multiple physical
files, called relation forks. This allows storing additional metadata like
Free Space information in additional forks, which can be grown and truncated
independently of the main data file, while still treating it all as a single
physical relation in system catalogs.

It is assumed that the main fork, fork number 0 or MAIN_FORKNUM, always
exists. Fork numbers are assigned in src/include/storage/relfilenode.h.
Functions in smgr.c and md.c take an extra fork number argument, in addition
to relfilenode and block number, to identify which relation fork you want to
access. Since most code wants to access the main fork, a shortcut version of
ReadBuffer that accesses MAIN_FORKNUM is provided in the buffer manager for
convenience.
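
For illustration (my addition, not part of the README), access to a
non-main fork goes through ReadBufferExtended:

    /* Sketch: read block 0 of a relation's free-space-map fork. */
    Buffer  buf;

    buf = ReadBufferExtended(rel, FSM_FORKNUM, 0, RBM_NORMAL, NULL);
    /* ... inspect the page ... */
    ReleaseBuffer(buf);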
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Streaming base backups

2011-01-09 Thread Cédric Villemain
2011/1/7 Garick Hamlin gham...@isc.upenn.edu:
 On Thu, Jan 06, 2011 at 07:47:39PM -0500, Cédric Villemain wrote:
 2011/1/5 Magnus Hagander mag...@hagander.net:
  On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine dimi...@2ndquadrant.fr 
  wrote:
  Magnus Hagander mag...@hagander.net writes:
  * Stefan mentioned it might be useful to put some
  posix_fadvise(POSIX_FADV_DONTNEED)
    in the process that streams all the files out. Seems useful, as long 
  as that
    doesn't kick them out of the cache *completely*, for other backends as 
  well.
    Do we know if that is the case?
 
  Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
  not already in SHM?
 
  I think that's way more complex than we want to go here.
 

 DONTNEED will remove the block from OS buffer everytime.

 It should not be that hard to implement a snapshot (it needs mincore())
 and to restore the previous state. I don't know how basebackup is
 performed exactly... so perhaps I am wrong.

 posix_fadvise support is already in postgresql core...we can start by
 just doing a snapshot of the files before starting, or at some point
 in the basebackup, it will need only 256kB per GB of data...

 It is actually possible to be more scalable than the simple solution you
 outline here (although that solution works pretty well).

Yes, I suggest something pretty simple as a first shot.


 I've written a program that synchronizes the OS cache state using
 mmap()/mincore() between two computers.  I haven't actually tested its
 impact on performance yet, but I was surprised by how fast it actually runs
 and how compact cache maps can be.

 If one encodes the data so one remembers the number of zeros between 1s,
 storage scales with the amount of memory on each side rather than with the
 dataset size.  I actually played with doing that, then doing huffman
 encoding of that.  I get around 1.2-1.3 bits / page of _physical memory_
 in my tests.

 I don't have my notes handy, but here are some numbers from memory...

That is interesting. Even though I haven't had issues with the size of
the maps so far, I thought that a simple zlib compression should be
enough.
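
A rough sketch (my illustration, using the Linux-flavored mincore()
signature) of the gap encoding Garick describes: for each resident page,
emit the count of non-resident pages since the previous resident one.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    int         fd;
    struct stat st;
    long        pagesz = sysconf(_SC_PAGESIZE);
    size_t      npages, i, gap = 0;
    unsigned char *vec;
    void       *map;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0 ||
        fstat(fd, &st) < 0 || st.st_size == 0)
        return 1;
    npages = (st.st_size + pagesz - 1) / pagesz;
    vec = malloc(npages);
    map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (vec == NULL || map == MAP_FAILED ||
        mincore(map, st.st_size, vec) < 0)
        return 1;

    for (i = 0; i < npages; i++)
    {
        if (vec[i] & 1)             /* page is resident */
        {
            printf("%zu\n", gap);   /* zeros since the previous 1 */
            gap = 0;
        }
        else
            gap++;
    }
    return 0;
}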


 The obvious worst cases are 1 bit per page of _dataset_ or 19 bits per page
 of physical memory in the machine.  The latter limit gets better, however,
 since there are < 1024 symbols possible for the encoder (since in this
 case symbols are spans of zeros that need to fit in a file that is 1 GB in
 size).  So the actual worst case is much closer to 1 bit per page of
 the dataset or ~10 bits per page of physical memory.  The real performance
 I see with huffman is more like 1.3 bits per page of physical memory.  All
 the encoding/decoding is actually very fast.  zlib would actually compress
 even better than huffman, but the huffman encoder/decoder is pretty good
 and very straightforward code.

pgfincore currently holds this information in a flat file. The ongoing
development is simpler and provides the data as bits, so you can store it
in a table, restore it on your slave thanks to SR, and use it on the
slave.


 I would like to integrate something like this into PG or perhaps even into
 something like rsync, but it was written as a proof of concept and I haven't
 had time to work on it recently.

 Garick

 --
 Cédric Villemain               2ndQuadrant
 http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support





-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] system views for walsender activity

2011-01-09 Thread Simon Riggs
On Sun, 2011-01-09 at 12:52 +0100, Magnus Hagander wrote:

 One thing I noticed is that it gives an interesting property to my
 patch for streaming base backups - they now show up in
 pg_stat_replication, with a streaming location of 0/0.
 
 If the view is named pg_stat_replication, we probably want to filter
 that out somehow. But then, do we want a separate view listing the
 walsenders that are busy sending base backups?
 
 For that matter, do we want an indication that separates a walsender
 not sending data from one sending that happens to be at location 0/0?
 Most will leave 0/0 really quickly, but a walsender can be idle (not
 received a command yet), or it can be running IDENTIFY_SYSTEM for
 example.

I think we need a status enum ('BACKUP', 'CATCHUP', 'STREAM') for the 3
phases of replication.
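
A hypothetical sketch of how monitoring could use such a column (the
column and value names are illustrative, not committed syntax):

    -- Watch only walsenders that are actually streaming WAL, ignoring
    -- ones still doing a base backup or catching up:
    SELECT procpid, client_addr, sent_location
    FROM pg_stat_replication
    WHERE state = 'STREAM';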

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] SSI patch(es)

2011-01-09 Thread Heikki Linnakangas

On 09.01.2011 05:06, Robert Haas wrote:

On Sat, Jan 8, 2011 at 4:10 PM, Kevin Grittner
kevin.gritt...@wicourts.gov  wrote:

Splitting out those three would leave src/backend/ and src/include/
which comes in at a svelte 5891 lines.

With a little more work I could split the three new files
(predicate.c, predicate.h, and predicate_internals.h) out from the
changes scattered around the rest of the code.  That's 4346 lines and
1545 lines, respectively.

Now, these numbers are likely to change a little in the next few
days, but not much as a percentage outside the documentation.

Thoughts?


Well, my first thought is - I'm not sure it's realistic to think we're
going to get this committed to 9.1.

But that's not a very helpful thought.  I just don't know who is going
to review 7700 lines of code in the next month, and it doesn't sound
like it can be decomposed into independent subfeatures that can be
committed independently.


I'm tempted to raise my hand and volunteer, but I don't want to make 
promises I might not be able to keep. But I'll definitely at least take 
another look at it.


I don't think pushing this to 9.2 helps at all. This patch has gone 
through several rounds of reviews already, and unless someone sees a 
major issue with it, it's not going to get any more ready by postponing 
it. And it wouldn't be fair to Kevin and others who've worked hard on 
it. We'll just have to slurp it in somehow.



 Splitting it up by directory isn't really
all that helpful.


Agreed. If it can't easily be split into increments by functionality,
then we'll just have to deal with it as one big patch.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [COMMITTERS] pgsql: Properly install gram.h on MSVC builds

2011-01-09 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 Properly install gram.h on MSVC builds
 This file is now needed by pgAdmin builds, which started
 failing since it was missing in the installer builds.

I'd like to protest this patch as misguided.  AFAICS it is a *seriously*
bad idea for pgAdmin to be including gram.h --- we don't even want most
of the backend to include that, let alone external pieces of code.  It's
not stable enough.  See the note in gramparse.h.

To my way of thinking, not installing that file is a fine idea.
What we really need to be asking is why the pgAdmin folks think
they should be including it.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Properly install gram.h on MSVC builds

2011-01-09 Thread Magnus Hagander
On Sun, Jan 9, 2011 at 17:31, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 Properly install gram.h on MSVC builds
 This file is now needed by pgAdmin builds, which started
 failing since it was missing in the installer builds.

 I'd like to protest this patch as misguided.  AFAICS it is a *seriously*

Uh, we install the file on Unix, so we should do the same on Windows.


 bad idea for pgAdmin to be including gram.h --- we don't even want most
 of the backend to include that, let alone external pieces of code.  It's
 not stable enough.  See the note in gramparse.h.

Whether that's misguided is another thing :-)


 To my way of thinking, not installing that file is a fine idea.
 What we really need to be asking is why the pgAdmin folks think
 they should be including it.

It is required in order to pull in kwlist.h, which is used to determine
which keywords to highlight in the SQL editor (and other places that
do syntax highlighting). It never actually uses the token values, but it
needs gram.h for kwlist.h to compile.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Support for negative index values in array fetching

2011-01-09 Thread Hannu Valtonen

On 1/5/11 6:19 PM, Florian Pflug wrote:

Sorry, but it isn't too intuitive, at least for me. Why not think about
simple functions with only positive arguments? There are only four
combinations. I don't think we must have only one super-function.

we need functionality for:

a) get first n items
b) get items without last n items
c) get last n items
d) skip first n items

Now you've moved the goalpost - the OP wanted to access individual
elements, not slices! To support slices, a three-argument version
of array_relative() would be required, with the signature

   array_relative(some_array anyarray, first int[], last int[])

Your requirements (a) to (d) are then easily satisfied

a) array_relative(ary, array[0], array[n-1])
b) array_relative(ary, array[0], array[-n-1])
c) array_relative(ary, array[-n], array[-1])
d) array_relative(ary, array[n], array[-1])

The individual function approach might be a tad more readable for
one-dimensional arrays, but they don't scale well to the general
case.

Maybe the OP could comment on whether any of these solutions
would fit his needs?


Hi,

(sorry for the late reply, got lost in my Inbox)

For my main use case I just needed the last element of the array, but I
could see myself needing a slice as well (e.g. give me the last 5 items
in an array).


So in that sense yes, this would fit the bill.
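
For reference, a sketch of the last-5-items case with today's slice
syntax (which is exactly what negative indexes would simplify):

    SELECT ary[array_upper(ary, 1) - 4 : array_upper(ary, 1)]
    FROM (VALUES (ARRAY[1,2,3,4,5,6,7])) AS t(ary);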

Hannu Valtonen
Lead Software Architect
Technology Office
F-Secure Corporation    http://www.F-Secure.com




Re: [HACKERS] [COMMITTERS] pgsql: Properly install gram.h on MSVC builds

2011-01-09 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Sun, Jan 9, 2011 at 17:31, Tom Lane t...@sss.pgh.pa.us wrote:
 I'd like to protest this patch as misguided.  AFAICS it is a *seriously*

 Uh, we install the file on Unix, so we should do the same on Windows.

Well, my idea of how to fix that would be the other way 'round.

 What we really need to be asking is why the pgAdmin folks think
 they should be including it.

 It is required in order to pull kwlist.h,

No, it is not required.  What they should be doing is #define'ing
PG_KEYWORD() in a way that ignores its second argument.  See pg_dump's
keywords.c for an example of safe usage.
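
For illustration, the safe pattern looks roughly like this (a sketch
modeled on pg_dump's keywords.c; the exact field layout may differ):

    /* Build a keyword table without needing gram.h's token values:
     * define PG_KEYWORD to ignore its second argument. */
    #define PG_KEYWORD(kwname, value, category) {kwname, 0, category},

    const ScanKeyword FEScanKeywords[] = {
    #include "parser/kwlist.h"
    };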

If we allow this to stand then we are going to find people complaining
that we've broken ABI anytime we make a minor change in the grammar.
I repeat: it's a SERIOUSLY bad idea.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Properly install gram.h on MSVC builds

2011-01-09 Thread Magnus Hagander
On Sun, Jan 9, 2011 at 17:49, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Sun, Jan 9, 2011 at 17:31, Tom Lane t...@sss.pgh.pa.us wrote:
 I'd like to protest this patch as misguided.  AFAICS it is a *seriously*

 Uh, we install the file on Unix, so we should do the same on Windows.

 Well, my idea of how to fix that would be the other way 'round.

Sure, then it's at least consistent...


 What we really need to be asking is why the pgAdmin folks think
 they should be including it.

 It is required in order to pull kwlist.h,

 No, it is not required.  What they should be doing is #define'ing
 PG_KEYWORD() in a way that ignores its second argument.  See pg_dump's
 keywords.c for an example of safe usage.

Ahh, good point.

And yes, that seems to work for pgadmin. I'll commit a patch there as
soon as I've finished testing, at which point it won't be required
anymore.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] SSI and 2PC

2011-01-09 Thread Kevin Grittner
In going back through old emails to see what issues might have been
raised but not yet addressed for the SSI patch, I found the subject
issue described in a review by Jeff Davis here:
 
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php
 
I think this is already handled based on feedback from Heikki:
 
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00789.php
 
Basically, the PREPARE TRANSACTION statement plays the same role for
the prepared transaction that COMMIT does for other transactions --
final conflict checking is done and the transaction becomes immune to
later serialization_failure rollback.  A transaction which starts
after PREPARE TRANSACTION executes is not considered to overlap with
the prepared transaction.
 
In Jeff's example, if Session2 runs a query before Session1
executes PREPARE TRANSACTION, one of them will fail.  (I tested to
make sure.)
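
A schematic of the timing rule (my illustration, not Jeff's original
example):

    -- Session 1:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    -- ... reads and writes ...
    PREPARE TRANSACTION 'tx1';  -- final conflict checking happens here

    -- Session 2, started only after the PREPARE:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    -- Not considered to overlap 'tx1', so it can neither force nor
    -- suffer a serialization failure against it.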
 
Does that sound sane, or is something else needed here?
 
-Kevin




[HACKERS] Compatibility GUC for serializable

2011-01-09 Thread Kevin Grittner
There's an issue where we don't seem to have consensus yet, so I
figured I'd bounce it off the list.
 
If the SSI patch were to be accepted as is, REPEATABLE READ would
continue to provide the exact same snapshot isolation behavior which
both it and SERIALIZABLE do through 9.0, and SERIALIZABLE would
always use SSI on top of the snapshot isolation to prevent
serialization anomalies.  In his review, Jeff argued for a
compatibility GUC which could be changed to provide legacy behavior
for SERIALIZABLE transactions -- if set, SERIALIZABLE would fall back
to working the same as REPEATABLE READ.
 
In an off-list exchange with me, David Fetter expressed opposition to
this, as a foot-gun.  I'm not sure where anyone else stands on this. 
Personally, I don't care a whole lot because it's trivial to add, so
that seems to leave the vote at 1 to 1.  Anyone else care to tip the
scales?
 
-Kevin




Re: [HACKERS] Compatibility GUC for serializable

2011-01-09 Thread David Fetter
On Sun, Jan 09, 2011 at 12:07:49PM -0600, Kevin Grittner wrote:
 There's an issue where we don't seem to have consensus yet, so I
 figured I'd bounce it off the list.
  
 If the SSI patch were to be accepted as is, REPEATABLE READ would
 continue to provide the exact same snapshot isolation behavior which
 both it and SERIALIZABLE do through 9.0, and SERIALIZABLE would
 always use SSI on top of the snapshot isolation to prevent
 serialization anomalies.  In his review, Jeff argued for a
 compatibility GUC which could be changed to provide legacy behavior
 for SERIALIZABLE transactions -- if set, SERIALIZABLE would fall
 back to working the same as REPEATABLE READ.
  
 In an off-list exchange with me, David Fetter expressed opposition
 to this, as a foot-gun.  I'm not sure where anyone else stands on
 this.  Personally, I don't care a whole lot because it's trivial to
 add, so that seems to leave the vote at 1 to 1.  Anyone else care to
 tip the scales?

For what it's worth, that exchange started with my proposing a
separate SNAPSHOT isolation level, but since we'll already be providing
that isolation level and calling it REPEATABLE READ, I figured we didn't
need an extra one that did the exact same thing. :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



[HACKERS] multiset patch review

2011-01-09 Thread Pavel Stehule
Patching:

patching file doc/src/sgml/func.sgml
Hunk #6 succeeded at 10567 (offset 1 line).
Hunk #7 succeeded at 10621 (offset 1 line).
Hunk #8 succeeded at 10721 (offset 1 line).
Hunk #9 succeeded at 10775 (offset 1 line).
patching file src/backend/nodes/makefuncs.c
patching file src/backend/parser/gram.y
patching file src/backend/utils/adt/array_userfuncs.c
patching file src/include/catalog/pg_aggregate.h
patching file src/include/catalog/pg_proc.h
patching file src/include/nodes/makefuncs.h
patching file src/include/parser/kwlist.h
patching file src/include/utils/array.h
Hunk #1 succeeded at 281 (offset 1 line).
patching file src/test/regress/expected/arrays.out
Hunk #1 succeeded at 1558 (offset 272 lines).
patching file src/test/regress/sql/arrays.sql
Hunk #1 succeeded at 438 (offset 12 lines).

Compilation:

there are no warnings related to the patch

regression tests failed:

*** 1639,1647 
..
  SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],
 ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
!  submultiset_of | submultiset_of ^M
! +^M
! | f^M
  (1 row)
..
  SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];
--- 1639,1647 
..
  SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],
 ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
!  submultiset_of | submultiset_of.
! +
! | f
  (1 row)
..
  SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];

There is a frequently used fragment:

+   fn.arg[0] = values1[n1];
+   fn.arg[1] = values2[n2];
+   fn.argnull[0] = false;
+   fn.argnull[1] = false;
+   fn.isnull = false;
+   r = DatumGetInt32(FunctionCallInvoke(&fn));

Could it be moved to a helper procedure?
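
Something like this hypothetical helper (a sketch, assuming fn is a
FunctionCallInfoData already initialized for the comparison function)
would remove the repetition:

    static int32
    invoke_cmp(FunctionCallInfoData *fn, Datum arg1, Datum arg2)
    {
        fn->arg[0] = arg1;
        fn->arg[1] = arg2;
        fn->argnull[0] = false;
        fn->argnull[1] = false;
        fn->isnull = false;
        return DatumGetInt32(FunctionCallInvoke(fn));
    }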

Docs and regression tests are OK.

I see only one issue: it isn't documented anywhere what a MULTISET is.

Regards

Pavel Stehule



Re: [HACKERS] GiST insert algorithm rewrite

2011-01-09 Thread Heikki Linnakangas

On 09.01.2011 07:05, Tom Lane wrote:

I just found out that the benchmark test script in
contrib/intarray/bench/ crashes HEAD in gistdoinsert() --- it looks like
it's trying to pop to a stack entry that isn't there.

Run it per the instructions in the intarray documentation:

$ createdb TEST
$ psql TEST < ../_int.sql
...
$ ./create_test.pl | psql TEST
CREATE TABLE
CREATE TABLE
CREATE INDEX
CREATE INDEX
server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
connection to server was lost

The script generates randomized data, so possibly it won't fail every
time, but it failed three out of three times for me.  The changes I'm
about to commit in intarray don't seem to make any difference.


Thanks, fixed. Apparently my testing never hit the case where an update, 
rather than an insert, into the root page causes it to split.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] hstore ?& operator versus mathematics

2011-01-09 Thread Tom Lane
hstore's hstore ?& text[] operator is defined as "contains all", i.e.,
it will return true if all the key names found in the text array are
present in the hstore.

ISTM that a sane definition of this operator would provide that if the
array is empty, it returns true: every set contains the empty set.
However, the code goes out of its way to return false instead.

Perhaps this was done intentionally because there was no way to make
GIN index searches work compatibly ... but now there is, so I see no
reason to remain bug-compatible with this behavior.

Barring objections, I'm going to fix it.
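
An illustration (my example, not from the message) of the behavior in
question:

    SELECT 'a=>1, b=>2'::hstore ?& ARRAY['a','b'];   -- true
    SELECT 'a=>1, b=>2'::hstore ?& ARRAY[]::text[];  -- currently false;
                                                     -- proposed: true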

regards, tom lane



[HACKERS] ALTER TYPE 1: recheck index-based constraints

2011-01-09 Thread Noah Misch
When ALTER TABLE rewrites a table, it reindexes, but the reindex does not
revalidate UNIQUE/EXCLUDE constraints.  This behaves badly in cases like this,
neglecting to throw an error on the new UNIQUE violation:

CREATE TABLE t (c numeric UNIQUE);
INSERT INTO t VALUES (1.1),(1.2);
ALTER TABLE t ALTER c TYPE int;

The comment gave a reason for skipping the checks: it would cause deadlocks when
we rewrite a system catalog.  So, this patch changes things to only skip the
check for system catalogs.
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***
*** 2544,2552  reindex_index(Oid indexId, bool skip_constraint_checks)
   * catalog indexes while collecting the list.)
   *
   * We also skip rechecking uniqueness/exclusion constraint properties if
!  * heap_rebuilt is true.  This avoids likely deadlock conditions when doing
!  * VACUUM FULL or CLUSTER on system catalogs.  REINDEX should be used to
!  * rebuild an index if constraint inconsistency is suspected.
   *
   * Returns true if any indexes were rebuilt.  Note that a
   * CommandCounterIncrement will occur after each index rebuild.
--- 2544,2554 
   * catalog indexes while collecting the list.)
   *
   * We also skip rechecking uniqueness/exclusion constraint properties if
!  * heap_rebuilt is true and the relation is a system catalog.  This avoids
!  * likely deadlock conditions when doing VACUUM FULL or CLUSTER.  We trust
!  * that constraints will remain consistent across a user-issued ALTER TABLE
!  * against a system catalog.  REINDEX should be used to rebuild an index if
!  * constraint inconsistency is suspected.
   *
   * Returns true if any indexes were rebuilt.  Note that a
   * CommandCounterIncrement will occur after each index rebuild.
***
*** 2629,2635  reindex_relation(Oid relid, bool toast_too, bool heap_rebuilt)
  		if (is_pg_class)
  			RelationSetIndexList(rel, doneIndexes, InvalidOid);
  
!   reindex_index(indexOid, heap_rebuilt);
  
CommandCounterIncrement();
  
--- 2631,2637 
if (is_pg_class)
RelationSetIndexList(rel, doneIndexes, 
InvalidOid);
  
!   reindex_index(indexOid, heap_rebuilt && IsSystemRelation(rel));
  
CommandCounterIncrement();
  
*** a/src/test/regress/expected/alter_table.out
--- b/src/test/regress/expected/alter_table.out
***
*** 1774,1779  DEBUG:  Rebuilding index t_strarr_idx
--- 1774,1781 
  DEBUG:  Rebuilding index t_square_idx
  DEBUG:  Rebuilding index t_expr_idx
  DEBUG:  Rebuilding index t_touchy_f_idx
+ ERROR:  could not create unique index t_touchy_f_idx
+ DETAIL:  Key (touchy_f(constraint1))=(100) is duplicated.
  ALTER TABLE t ALTER constraint2 TYPE trickint; -- noop-e
  DEBUG:  Rewriting table t
  DEBUG:  Rebuilding index t_constraint4_key
***
*** 1792,1816  DEBUG:  Rebuilding index t_strarr_idx
  DEBUG:  Rebuilding index t_square_idx
  DEBUG:  Rebuilding index t_touchy_f_idx
  DEBUG:  Rebuilding index t_expr_idx
! -- Temporary fixup until behavior of the previous two improves.
! ALTER TABLE t ALTER constraint1 TYPE int, ALTER constraint2 TYPE int;
! DEBUG:  Rewriting table t
! DEBUG:  Rebuilding index t_constraint4_key
! DEBUG:  Rebuilding index t_integral_key
! DEBUG:  Rebuilding index t_rational_key
! DEBUG:  Rebuilding index t_daytimetz_key
! DEBUG:  Rebuilding index t_daytime_key
! DEBUG:  Rebuilding index t_stamptz_key
! DEBUG:  Rebuilding index t_stamp_key
! DEBUG:  Rebuilding index t_timegap_key
! DEBUG:  Rebuilding index t_bits_key
! DEBUG:  Rebuilding index t_network_key
! DEBUG:  Rebuilding index t_string_idx
! DEBUG:  Rebuilding index t_string_idx1
! DEBUG:  Rebuilding index t_strarr_idx
! DEBUG:  Rebuilding index t_square_idx
! DEBUG:  Rebuilding index t_touchy_f_idx
! DEBUG:  Rebuilding index t_expr_idx
  -- Change a column with an outgoing foreign key constraint.
  ALTER TABLE t ALTER constraint3 TYPE numeric(8,1); -- rewrite, FK error
  DEBUG:  Rewriting table t
--- 1794,1801 
  DEBUG:  Rebuilding index t_square_idx
  DEBUG:  Rebuilding index t_touchy_f_idx
  DEBUG:  Rebuilding index t_expr_idx
! ERROR:  could not create unique index t_expr_idx
! DETAIL:  Key ((1))=(1) is duplicated.
  -- Change a column with an outgoing foreign key constraint.
  ALTER TABLE t ALTER constraint3 TYPE numeric(8,1); -- rewrite, FK error
  DEBUG:  Rewriting table t
*** a/src/test/regress/sql/alter_table.sql
--- b/src/test/regress/sql/alter_table.sql
***
*** 1246,1253  FROM pg_trigger WHERE tgrelid = 't'::regclass ORDER BY tgname;
  ALTER TABLE t ALTER constraint0 TYPE trickint; -- verify-e
  ALTER TABLE t ALTER constraint1 TYPE trickint; -- noop-e
  ALTER TABLE 

[HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites

2011-01-09 Thread Noah Misch
This patch removes ALTER TYPE rewrites in cases we can already prove valid.  I
add a function GetCoerceExemptions() that walks an Expr according to rules
discussed in the design thread, simplified slightly pending additions in the
next patch.  See the comment at that function for a refresher.  I use it to
populate two new bools in AlteredTableInfo, new_bits and mayerror.
new_bits is a superset of new_changeoids, so I subsume that.  I change
ATRewriteTable to act on those and support the notion of evaluating the
transformation expressions when we're not rewriting the table.

As unintended fallout, it's no longer an error to add oids or a column with a
default value to a table whose rowtype is used in columns elsewhere.  This seems
best.  Defaults on the origin table do not even apply to new inserts into such a
column, and the rowtype does not gain an OID column via its table.

This helps on conversions like varchar(X)->text, xml->text, and conversions
between domains and their base types.
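
A quick illustrative session (my example, assuming this patch and
client_min_messages = debug1):

    CREATE TABLE v (c varchar(10));
    ALTER TABLE v ALTER c TYPE text;  -- no "Rewriting table" DEBUG line:
                                      -- the no-work case is proven upfront
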
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 71,76 
--- 71,77 
  #include storage/smgr.h
  #include utils/acl.h
  #include utils/builtins.h
+ #include utils/datum.h
  #include utils/fmgroids.h
  #include utils/inval.h
  #include utils/lsyscache.h
***
*** 140,147  typedef struct AlteredTableInfo
/* Information saved by Phases 1/2 for Phase 3: */
List   *constraints;/* List of NewConstraint */
List   *newvals;/* List of NewColumnValue */
  	bool		new_notnull;	/* T if we added new NOT NULL constraints */
- 	bool		new_changeoids;	/* T if we added/dropped the OID column */
  	Oid			newTableSpace;	/* new tablespace; 0 means no change */
/* Objects to rebuild after completing ALTER TYPE operations */
List   *changedConstraintOids;  /* OIDs of constraints to 
rebuild */
--- 141,149 
/* Information saved by Phases 1/2 for Phase 3: */
List   *constraints;/* List of NewConstraint */
List   *newvals;/* List of NewColumnValue */
+ 	bool		new_bits;		/* Could stored tuple or OID bits change? */
+ 	bool		mayerror;		/* Could the newval expressions throw errors? */
  	bool		new_notnull;	/* T if we added new NOT NULL constraints */
  	Oid			newTableSpace;	/* new tablespace; 0 means no change */
/* Objects to rebuild after completing ALTER TYPE operations */
List   *changedConstraintOids;  /* OIDs of constraints to 
rebuild */
***
*** 3184,3190  ATRewriteTables(List **wqueue, LOCKMODE lockmode)
 * We only need to rewrite the table if at least one column 
needs to
 * be recomputed, or we are adding/removing the OID column.
 */
!   if (tab->newvals != NIL || tab->new_changeoids)
{
/* Build a temporary relation and copy data */
RelationOldHeap;
--- 3186,3192 
 * We only need to rewrite the table if at least one column 
needs to
 * be recomputed, or we are adding/removing the OID column.
 */
!   if (tab->new_bits)
{
/* Build a temporary relation and copy data */
RelationOldHeap;
***
*** 3250,3256  ATRewriteTables(List **wqueue, LOCKMODE lockmode)
 * Test the current data within the table against new 
constraints
 * generated by ALTER TABLE commands, but don't rebuild 
data.
 */
!   if (tab->constraints != NIL || tab->new_notnull)
ATRewriteTable(tab, InvalidOid, lockmode);
  
/*
--- 3252,3258 
 * Test the current data within the table against new 
constraints
 * generated by ALTER TABLE commands, but don't rebuild 
data.
 */
!   if (tab->mayerror || tab->constraints != NIL || tab->new_notnull)
ATRewriteTable(tab, InvalidOid, lockmode);
  
/*
***
*** 3310,3316  ATRewriteTables(List **wqueue, LOCKMODE lockmode)
  /*
   * ATRewriteTable: scan or rewrite one table
   *
!  * OIDNewHeap is InvalidOid if we don't need to rewrite
   */
  static void
  ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
--- 3312,3319 
  /*
   * ATRewriteTable: scan or rewrite one table
   *
!  * OIDNewHeap is InvalidOid if we don't need to rewrite.  If the only new
!  * constraints are foreign key constraints, we do nothing here.
   */
  static void
  

[HACKERS] ALTER TYPE 4: temporal data types

2011-01-09 Thread Noah Misch
Add exemptor functions to avoid rewrites for conversions involving the temporal
data types.  I needed a find-last-set function for the interval_scale exemptor
function, so I imported one from FreeBSD.  To improve timestamp->timestamptz
when the timezone is UTC/GMT, I compare the current timezone definition to the
gmt_timezone we keep handy.  To make that actually work, I've changed pg_tzset
to zero its unused bytes.
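
Illustrative casts these exemptors can certify (my examples, assuming the
patch series is applied):

    CREATE TABLE ts (stamp timestamp(3));
    ALTER TABLE ts ALTER stamp TYPE timestamp(6); -- precision widens: no rewrite
    ALTER TABLE ts ALTER stamp TYPE timestamp(2); -- precision narrows: values
                                                  -- may round, so data is rewritten
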
*** a/configure
--- b/configure
***
*** 20269,20275  LIBS=`echo $LIBS | sed -e 's/-ledit//g' -e 
's/-lreadline//g'`
  
  
  
! for ac_func in crypt erand48 getopt getrusage inet_aton random rint srandom strdup strerror strlcat strlcpy strtol strtoul
  do
  as_ac_var=`$as_echo ac_cv_func_$ac_func | $as_tr_sh`
  { $as_echo $as_me:$LINENO: checking for $ac_func 5
--- 20269,20276 
  
  
  
! 
! for ac_func in crypt erand48 fls getopt getrusage inet_aton random rint srandom strdup strerror strlcat strlcpy strtol strtoul
  do
  as_ac_var=`$as_echo ac_cv_func_$ac_func | $as_tr_sh`
  { $as_echo $as_me:$LINENO: checking for $ac_func 5
*** a/configure.in
--- b/configure.in
***
*** 1288,1294  fi
  pgac_save_LIBS=$LIBS
  LIBS=`echo $LIBS | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_REPLACE_FUNCS([crypt erand48 getopt getrusage inet_aton random rint srandom strdup strerror strlcat strlcpy strtol strtoul])
  
  case $host_os in
  
--- 1288,1294 
  pgac_save_LIBS=$LIBS
  LIBS=`echo $LIBS | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_REPLACE_FUNCS([crypt erand48 fls getopt getrusage inet_aton random rint srandom strdup strerror strlcat strlcpy strtol strtoul])
  
  case $host_os in
  
*** a/src/backend/utils/adt/date.c
--- b/src/backend/utils/adt/date.c
***
*** 1187,1192  timetypmodout(PG_FUNCTION_ARGS)
--- 1187,1203 
  }
  
  
+ /* time_exemptor()
+  * Identify superfluous calls to time_scale.  Also suitable for timetz_scale.
+  */
+ Datum
+ time_exemptor(PG_FUNCTION_ARGS)
+ {
+   PG_RETURN_INT32(TemporalExemptor(MAX_TIME_PRECISION,
+                                    PG_GETARG_INT32(0),
+                                    PG_GETARG_INT32(1)));
+ }
+ 
  /* time_scale()
   * Adjust time type for specified scale factor.
   * Used by PostgreSQL type system to stuff columns.
*** a/src/backend/utils/adt/datetime.c
--- b/src/backend/utils/adt/datetime.c
***
*** 4135,4140  CheckDateTokenTables(void)
--- 4135,4158 
  }
  
  /*
+  * Helper for temporal exemptor functions.  Under !HAVE_INT64_TIMESTAMP and
+  * the wrong floating point implementation, rounding could make a fib of our
+  * COERCE_EXEMPT_NOCHANGE.  Genuine users will appreciate this as a feature:
+  * we avoid gratuitous data churn due to rounding anomalies in the cast.
+  * This would yield an Assert failure in a suitably enabled build, at which
+  * point we might need to weaken that Assert or get serious about the
+  * precise truth here.
+  */
+ int32
+ TemporalExemptor(int32 max_precis, int32 old_precis, int32 new_precis)
+ {
+   if (new_precis == -1 ||
+   new_precis == max_precis ||
+       (old_precis != -1 && new_precis >= old_precis))
+   return COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR;
+   return COERCE_EXEMPT_NOERROR;
+ }
+ 
+ /*
   * This function gets called during timezone config file load or reload
   * to create the final array of timezone tokens.  The argument array
   * is already sorted in name order.  This data is in a temporary memory
*** a/src/backend/utils/adt/timestamp.c
--- b/src/backend/utils/adt/timestamp.c
***
*** 308,313  timestamptypmodout(PG_FUNCTION_ARGS)
--- 308,329 
  }
  
  
+ /* timestamp_exemptor()
+  * Identify superfluous calls to timestamp_scale.  Also suitable for
+  * timestamptz_scale.
+  */
+ Datum
+ timestamp_exemptor(PG_FUNCTION_ARGS)
+ {
+   /*
+    * timestamp_scale throws an error when the typmod is out of range, but
+    * we can't get there from a cast: our typmodin will have caught it
+    * already.
+    */
+   PG_RETURN_INT32(TemporalExemptor(MAX_TIMESTAMP_PRECISION,
+                                    PG_GETARG_INT32(0),
+                                    PG_GETARG_INT32(1)));
+ }
+ 
  /* timestamp_scale()
   * Adjust time type for specified scale factor.
   * Used by PostgreSQL type system to stuff columns.
***
*** 745,750  interval_send(PG_FUNCTION_ARGS)
--- 761,779 
PG_RETURN_BYTEA_P(pq_endtypsend(buf));
  }
  
+ /*
+  * The interval typmod stores a range in its high 16 bits and a precision in
+  * its low 16 bits.  Both address the resolution of the type, with the range
+  * affecting resolution larger than one second, and precision affecting
+  * resolution smaller than one second.  The representation is sufficient to
+  * represent all 

[HACKERS] ALTER TYPE 5: varbit and bit

2011-01-09 Thread Noah Misch
Add exemptor functions for bit and varbit.  These are probably the simplest
examples of the full range of optimizations.  I would have used them as the test
case in the initial exemptor function patch if it were a more mainstream use
case.
*** a/src/backend/utils/adt/varbit.c
--- b/src/backend/utils/adt/varbit.c
***
*** 18,23 
--- 18,24 
  
  #include access/htup.h
  #include libpq/pqformat.h
+ #include nodes/primnodes.h
  #include utils/array.h
  #include utils/varbit.h
  
***
*** 337,342  bit_send(PG_FUNCTION_ARGS)
--- 338,359 
  }
  
  /*
+  * bit_exemptor()
+  * Identify superfluous calls to our length coercion function.
+  */
+ Datum
+ bit_exemptor(PG_FUNCTION_ARGS)
+ {
+   bool        isExplicit = PG_GETARG_BOOL(2);
+ 
+   /*
+* We could add COERCE_EXEMPT_NOERROR when the old and new lengths are
+* identical, but that has zero practical value.
+*/
+   PG_RETURN_INT32(isExplicit ? 0 : COERCE_EXEMPT_NOCHANGE);
+ }
+ 
+ /*
   * bit()
   * Converts a bit() type to a specific internal length.
   * len is the bitlength specified in the column definition.
***
*** 645,650  varbit_send(PG_FUNCTION_ARGS)
--- 662,683 
  }
  
  /*
+  * varbit_exemptor()
+  * Identify superfluous calls to our length coercion function.
+  */
+ Datum
+ varbit_exemptor(PG_FUNCTION_ARGS)
+ {
+   int32   old_max = PG_GETARG_INT32(0);
+   int32   new_max = PG_GETARG_INT32(1);
+   bool        isExplicit = PG_GETARG_BOOL(2);
+ 
+   if (new_max <= 0 || (old_max > 0 && new_max >= old_max))
+   PG_RETURN_INT32(COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR);
+   PG_RETURN_INT32(isExplicit ? 0 : COERCE_EXEMPT_NOCHANGE);
+ }
+ 
+ /*
   * varbit()
   * Converts a varbit() type to a specific internal length.
   * len is the maximum bitlength specified in the column definition.
*** a/src/include/catalog/catversion.h
--- b/src/include/catalog/catversion.h
***
*** 53,58 
   */
  
  /*       yyyymmddN */
! #define CATALOG_VERSION_NO  201101102
  
  #endif
--- 53,58 
   */
  
  /*       yyyymmddN */
! #define CATALOG_VERSION_NO  201101103
  
  #endif
*** a/src/include/catalog/pg_cast.h
--- b/src/include/catalog/pg_cast.h
***
*** 355,362  DATA(insert ( 1114 1114 1961 3542 i f ));
  DATA(insert ( 1184 1184 1967 3542 i f ));
  DATA(insert ( 1186 1186 1200 3543 i f ));
  DATA(insert ( 1266 1266 1969 3541 i f ));
! DATA(insert ( 1560 1560 1685 0 i f ));
! DATA(insert ( 1562 1562 1687 0 i f ));
  DATA(insert ( 1700 1700 1703 0 i f ));
  
  #endif   /* PG_CAST_H */
--- 355,362 
  DATA(insert ( 1184 1184 1967 3542 i f ));
  DATA(insert ( 1186 1186 1200 3543 i f ));
  DATA(insert ( 1266 1266 1969 3541 i f ));
! DATA(insert ( 1560 1560 1685 3817 i f ));
! DATA(insert ( 1562 1562 1687 3819 i f ));
  DATA(insert ( 1700 1700 1703 0 i f ));
  
  #endif   /* PG_CAST_H */
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***
*** 2433,2440  DESCR(int4 to bitstring);
--- 2433,2444 
  DATA(insert OID = 1684 (  int4PGNSP PGUID 12 
1 0 0 f f f t f i 1 0 23 1560 _null_ _null_ _null_ _null_ bittoint4 _null_ 
_null_ _null_ ));
  DESCR(bitstring to int4);
  
+ DATA(insert OID = 3817 (  bit_exemptor   PGNSP PGUID 12 1 0 0 f f f t 
f i 3 0 23 23 23 16 _null_ _null_ _null_ _null_ bit_exemptor _null_ _null_ 
_null_ ));
+ DESCR(bit cast exemptor);
  DATA(insert OID = 1685 (  bitPGNSP PGUID 12 1 0 0 f f f t 
f i 3 0 1560 1560 23 16 _null_ _null_ _null_ _null_ bit _null_ _null_ _null_ 
));
  DESCR(adjust bit() to typmod length);
+ DATA(insert OID = 3819 (  varbit_exemptor  PGNSP PGUID 12 1 0 0 f f f t f i 3 
0 23 23 23 16 _null_ _null_ _null_ _null_ varbit_exemptor _null_ _null_ 
_null_ ));
+ DESCR(varbit cast exemptor);
  DATA(insert OID = 1687 (  varbit PGNSP PGUID 12 1 0 0 f f f t 
f i 3 0 1562 1562 23 16 _null_ _null_ _null_ _null_ varbit _null_ _null_ 
_null_ ));
  DESCR(adjust varbit() to typmod length);
  
*** a/src/include/utils/varbit.h
--- b/src/include/utils/varbit.h
***
*** 71,77  extern Datum varbit_recv(PG_FUNCTION_ARGS);
--- 71,79 
  extern Datum varbit_send(PG_FUNCTION_ARGS);
  extern Datum varbittypmodin(PG_FUNCTION_ARGS);
  extern Datum varbittypmodout(PG_FUNCTION_ARGS);
+ extern Datum bit_exemptor(PG_FUNCTION_ARGS);
  extern Datum bit(PG_FUNCTION_ARGS);
+ extern Datum varbit_exemptor(PG_FUNCTION_ARGS);
  extern Datum varbit(PG_FUNCTION_ARGS);
  extern Datum biteq(PG_FUNCTION_ARGS);
  extern Datum bitne(PG_FUNCTION_ARGS);
*** a/src/test/regress/expected/alter_table.out
--- b/src/test/regress/expected/alter_table.out
***
*** 2454,2478  DEBUG:  Rebuilding index t_timegap_key
  ALTER TABLE t ALTER timegap TYPE interval(2); 

[HACKERS] ALTER TYPE 6: numeric

2011-01-09 Thread Noah Misch
Add an exemptor function for numeric.  We store the scale in every datum, making
numeric(7,2)->numeric(8,3) unoptimizable.  Precision changes work, though.
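
Concretely (my illustration of the rules below):

    CREATE TABLE n (c numeric(7,2));
    ALTER TABLE n ALTER c TYPE numeric(9,2); -- precision up, scale same: no work
    ALTER TABLE n ALTER c TYPE numeric(5,2); -- precision down, scale same:
                                             -- values checked, never rewritten
    ALTER TABLE n ALTER c TYPE numeric(8,3); -- scale changes: full rewrite
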
*** a/src/backend/utils/adt/numeric.c
--- b/src/backend/utils/adt/numeric.c
***
*** 712,717  numeric_send(PG_FUNCTION_ARGS)
--- 712,754 
  }
  
  
+ Datum
+ numeric_exemptor(PG_FUNCTION_ARGS)
+ {
+   int32       old_typmod = PG_GETARG_INT32(0);
+   int32       new_typmod = PG_GETARG_INT32(1);
+   int         old_scale;
+   int         new_scale;
+   int         old_precis;
+   int         new_precis;
+ 
+   /* Destination is unconstrained: never any work to do. */
+   if (new_typmod < VARHDRSZ)
+       PG_RETURN_INT32(COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR);
+ 
+   /*
+    * Source is unconstrained, and destination is constrained.  A value like
+    * 10^-1001 will always get truncated in this situation, so we can't help.
+    */
+   if (old_typmod < VARHDRSZ)
+       PG_RETURN_INT32(0);
+ 
+   /* Each value stores its scale.  If the scale changes, we can't help. */
+   old_scale = (old_typmod - VARHDRSZ) & 0xffff;
+   new_scale = (new_typmod - VARHDRSZ) & 0xffff;
+   if (old_scale != new_scale)
+       PG_RETURN_INT32(0);
+ 
+   old_precis = ((old_typmod - VARHDRSZ) >> 16) & 0xffff;
+   new_precis = ((new_typmod - VARHDRSZ) >> 16) & 0xffff;
+   if (new_precis >= old_precis)
+       /* Precision increases with scale unchanged: never any work to do. */
+       PG_RETURN_INT32(COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR);
+ 
+   /* Precision falls with scale unchanged: might error, never changes data. */
+   PG_RETURN_INT32(COERCE_EXEMPT_NOCHANGE);
+ }
+ 
  /*
   * numeric() -
   *
*** a/src/include/catalog/catversion.h
--- b/src/include/catalog/catversion.h
***
*** 53,58 
   */
  
  /*       yyyymmddN */
! #define CATALOG_VERSION_NO  201101103
  
  #endif
--- 53,58 
   */
  
  /*       yyyymmddN */
! #define CATALOG_VERSION_NO  201101104
  
  #endif
*** a/src/include/catalog/pg_cast.h
--- b/src/include/catalog/pg_cast.h
***
*** 357,362  DATA(insert ( 1186 1186 1200 3543 i f ));
  DATA(insert ( 1266 1266 1969 3541 i f ));
  DATA(insert ( 1560 1560 1685 3817 i f ));
  DATA(insert ( 1562 1562 1687 3819 i f ));
! DATA(insert ( 1700 1700 1703 0 i f ));
  
  #endif   /* PG_CAST_H */
--- 357,362 
  DATA(insert ( 1266 1266 1969 3541 i f ));
  DATA(insert ( 1560 1560 1685 3817 i f ));
  DATA(insert ( 1562 1562 1687 3819 i f ));
! DATA(insert ( 1700 1700 1703 3818 i f ));
  
  #endif   /* PG_CAST_H */
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***
*** 2595,2600  DATA(insert OID = 2917 (  numerictypmodinPGNSP 
PGUID 12 1 0 0 f f f t f i 1 0
--- 2595,2602 
  DESCR(I/O typmod);
  DATA(insert OID = 2918 (  numerictypmodoutPGNSP PGUID 12 1 0 0 f 
f f t f i 1 0 2275 23 _null_ _null_ _null_ _null_  numerictypmodout 
_null_ _null_ _null_ ));
  DESCR(I/O typmod);
+ DATA(insert OID = 3818 (  numeric_exemptor  PGNSP PGUID 12 1 0 0 f f f t f i 
3 0 23 23 23 16 _null_ _null_ _null_ _null_ numeric_exemptor _null_ _null_ 
_null_ ));
+ DESCR(numeric cast exemptor);
  DATA(insert OID = 1703 ( numeric  PGNSP PGUID 12 
1 0 0 f f f t f i 2 0 1700 1700 23 _null_ _null_ _null_ _null_ numeric _null_ 
_null_ _null_ ));
  DESCR(adjust numeric to typmod precision/scale);
  DATA(insert OID = 1704 ( numeric_abs  PGNSP PGUID 12 1 0 0 f 
f f t f i 1 0 1700 1700 _null_ _null_ _null_ _null_ numeric_abs _null_ _null_ 
_null_ ));
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
***
*** 878,883  extern Datum numeric_recv(PG_FUNCTION_ARGS);
--- 878,884 
  extern Datum numeric_send(PG_FUNCTION_ARGS);
  extern Datum numerictypmodin(PG_FUNCTION_ARGS);
  extern Datum numerictypmodout(PG_FUNCTION_ARGS);
+ extern Datum numeric_exemptor(PG_FUNCTION_ARGS);
  extern Datum numeric (PG_FUNCTION_ARGS);
  extern Datum numeric_abs(PG_FUNCTION_ARGS);
  extern Datum numeric_uminus(PG_FUNCTION_ARGS);
*** a/src/test/regress/expected/alter_table.out
--- b/src/test/regress/expected/alter_table.out
***
*** 1804,1845  DEBUG:  Rebuilding index t_touchy_f_idx
  DEBUG:  Rebuilding index t_expr_idx
  DEBUG:  Validating foreign key constraint t_constraint3_fkey
  ALTER TABLE t ALTER constraint3 TYPE numeric(7,2); -- verify; FK noop
! DEBUG:  Rewriting table t
! DEBUG:  Rebuilding index t_constraint4_key
! DEBUG:  Rebuilding index t_integral_key
! DEBUG:  Rebuilding index t_rational_key
! DEBUG:  Rebuilding index t_daytimetz_key
! DEBUG:  Rebuilding index t_daytime_key
! DEBUG:  Rebuilding index t_stamptz_key
! DEBUG:  

Re: [HACKERS] Streaming base backups

2011-01-09 Thread Cédric Villemain
2011/1/7 Magnus Hagander mag...@hagander.net:
 On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 2011/1/5 Magnus Hagander mag...@hagander.net:
 On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine dimi...@2ndquadrant.fr 
 wrote:
 Magnus Hagander mag...@hagander.net writes:
 * Stefan mentioned it might be useful to put some
 posix_fadvise(POSIX_FADV_DONTNEED)
   in the process that streams all the files out. Seems useful, as long as 
 that
   doesn't kick them out of the cache *completely*, for other backends as 
 well.
   Do we know if that is the case?

 Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
 not already in SHM?

 I think that's way more complex than we want to go here.


 DONTNEED will remove the block from OS buffer everytime.

 Then we definitely don't want to use it - because some other backend
 might well want the file. Better leave it up to the standard logic in
 the kernel.

Looking at the patch, it is (very) easy to add support for that in
basebackup.c.
That supposes allowing mincore(), so mmap(), and so probably switching
the fopen() to an open() (or adding an open() just for the mmap
requirement...)

Let's go?


 It should not be that hard to implement a snapshot (it needs mincore())
 and to restore the previous state. I don't know how basebackup is
 performed exactly... so perhaps I am wrong.

 Uh, it just reads the files out of the filesystem. Just like you'd to
 today, except it's now integrated and streams the data across a
 regular libpq connection.

 --
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



[HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-09 Thread Tom Lane
Whilst fooling around with GIN over the past few days, I noticed the
following rather surprising behavior:

regression=# create table t1 (f1 int[]);
CREATE TABLE
regression=# insert into t1 values (array[42]);
INSERT 0 1
regression=# create index ti1 on t1 using gin (f1);
CREATE INDEX
regression=# set enable_seqscan to 0;
SET
regression=# explain select * from t1 where f1 = array[42];
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on t1  (cost=100.00..101.01 rows=1 width=32)
   Filter: (f1 = '{42}'::integer[])
(2 rows)

The system absolutely will not use the index in this state, even though
the GIN array opclass does support equality.  I dug around and
eventually figured out why:

1. We don't have any pg_stats statistics in this state; but we do know
reltuples = 1, since the CREATE INDEX helpfully updated the pg_class row
with that information.  Without stats, eqsel() falls back to a
selectivity estimate of 1/numdistinct; and without stats,
get_variable_numdistinct estimates the number of distinct values as
equal to reltuples, if that's a small number.  The upshot is that we get
a selectivity estimate of exactly 1.0 for the equality condition.

2. In indxpath.c, we have the following comment and code about selecting
which paths to generate bitmap scan paths for:

 * ... Also, pick out the ones that might be useful
 * as bitmap scans.  For that, we must discard indexes that don't support
 * bitmap scans, and we also are only interested in paths that have some
 * selectivity; we should discard anything that was generated solely for
 * ordering purposes.

    if (ipath->indexinfo->amhasgetbitmap &&
        ipath->indexselectivity < 1.0 &&
        !ScanDirectionIsBackward(ipath->indexscandir))
        bitindexpaths = lappend(bitindexpaths, ipath);

Since GIN doesn't support plain indexscans, only bitmap scans, this
means that the planner simply won't use a GIN index unless the estimated
selectivity is less than 1.0.

There are several things we might choose to do about this:

1. Do nothing.  The issue seems quite unlikely to affect anyone in the
field, since in fact "use a seqscan" is probably the right answer
anytime reltuples = 1; and anyway using a GIN index for plain equality
is a corner case to begin with.  However, it could confuse people who
were doing testing (it confused me!).

2. Hack the selectivity estimators so that we don't get exactly 1.0
for this sort of situation.  It might be plausible for
get_variable_numdistinct to set a floor of 2 on its result, for example;
or we could hack eqsel() to bound the no-stats estimate to a bit less
than 1 (a standalone sketch of this option appears below).

3. Change the indxpath.c code to have some different way of determining
whether a path is sensible to use as a bitmap scan.  We could for
instance make the test be "path has some indexquals" and/or "index is
partial".  Or maybe there should be an exception in the case where the
index type doesn't support plain indexscan.
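
As a concrete illustration of alternative 2, here is a toy standalone
program, not the actual planner code (no_stats_eqsel is an invented
name), showing how a floor of 2 on the ndistinct fallback keeps the
no-stats equality estimate strictly below 1.0:

#include <stdio.h>

/* Mimics the no-stats path: eqsel() estimates 1/ndistinct, and
 * get_variable_numdistinct() falls back to reltuples, so reltuples = 1
 * yields a selectivity of exactly 1.0, which disqualifies bitmap paths.
 * A floor of 2 on ndistinct keeps the estimate under 1.0. */
static double no_stats_eqsel(double reltuples)
{
    double ndistinct = reltuples;   /* fallback when there is no pg_stats row */

    if (ndistinct < 2.0)
        ndistinct = 2.0;            /* the proposed floor */
    return 1.0 / ndistinct;
}

int main(void)
{
    printf("reltuples=1   -> selectivity %.3f\n", no_stats_eqsel(1.0));   /* 0.500 */
    printf("reltuples=100 -> selectivity %.3f\n", no_stats_eqsel(100.0)); /* 0.010 */
    return 0;
}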

I'm not really sold on any of these alternatives.  Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compatibility GUC for serializable

2011-01-09 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 If the SSI patch were to be accepted as is, REPEATABLE READ would
 continue to provide the exact same snapshot isolation behavior which
 both it and SERIALIZABLE do through 9.0, and SERIALIZABLE would
 always use SSI on top of the snapshot isolation to prevent
 serialization anomalies.  In his review, Jeff argued for a
 compatibility GUC which could be changed to provide legacy behavior
 for SERIALIZABLE transactions -- if set, SERIALIZABLE would fall back
 to working the same as REPEATABLE READ.

 In an off-list exchange with me, David Fetter expressed opposition to
 this, as a foot-gun.

I think we've learned over the years that GUCs that significantly change
semantics can be foot-guns.  I'm not sure exactly how dangerous this one
would be, but on the whole I'd prefer to avoid introducing a GUC here.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-09 Thread Josh Berkus
On 1/9/11 3:38 PM, Tom Lane wrote:
 1. Do nothing.  The issue seems quite unlikely to affect anyone in the
 field, since in fact "use a seqscan" is probably the right answer
 anytime reltuples = 1; and anyway using a GIN index for plain equality
 is a corner case to begin with.  However, it could confuse people who
 were doing testing (it confused me!).

+1.  It's an unexpected result, but not actually a bad one.  It just
doesn't seem worth messing with code which works in production just to
help testing.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compatibility GUC for serializable

2011-01-09 Thread Robert Haas
On Sun, Jan 9, 2011 at 7:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 If the SSI patch were to be accepted as is, REPEATABLE READ would
 continue to provide the exact same snapshot isolation behavior which
 both it and SERIALIZABLE do through 9.0, and SERIALIZABLE would
 always use SSI on top of the snapshot isolation to prevent
 serialization anomalies.  In his review, Jeff argued for a
 compatibility GUC which could be changed to provide legacy behavior
 for SERIALIZABLE transactions -- if set, SERIALIZABLE would fall back
 to working the same as REPEATABLE READ.

 In an off-list exchange with me, David Fetter expressed opposition to
 this, as a foot-gun.

 I think we've learned over the years that GUCs that significantly change
 semantics can be foot-guns.  I'm not sure exactly how dangerous this one
 would be, but on the whole I'd prefer to avoid introducing a GUC here.

I agree.  I think we should assume that existing code which asks for
serializable behavior wants serializable behavior, not broken
serializable behavior.  There certainly could be cases where the
opposite is true (the code wants, specifically, our traditional
definition of serializability rather than actual serializability) but
I bet there's not a whole lot of them, and changing such code to ask
for REPEATABLE READ probably isn't extremely difficult.
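
To illustrate how small that change usually is, here is a minimal libpq
sketch (compile with -lpq; the connection string is a placeholder) of a
client that formerly began its transactions with SERIALIZABLE and now
requests REPEATABLE READ explicitly:

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");  /* placeholder */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* was: "BEGIN ISOLATION LEVEL SERIALIZABLE" */
    PGresult *res = PQexec(conn, "BEGIN ISOLATION LEVEL REPEATABLE READ");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s", PQerrorMessage(conn));
    PQclear(res);

    /* ... the transaction's queries would run here ... */

    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}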

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN indexscans versus equality selectivity estimation

2011-01-09 Thread Robert Haas
On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 or we could hack eqsel() to bound the no-stats estimate to a bit less
 than 1.

This seems like a pretty sensible thing to do.  I can't immediately
imagine a situation in which 1.0 is a sensible selectivity estimate in
the no-stats case and 0.90 (say) is a major regression.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hstore ?& operator versus mathematics

2011-01-09 Thread David Fetter
On Sun, Jan 09, 2011 at 04:10:25PM -0500, Tom Lane wrote:
 hstore's "hstore ?& text[]" operator is defined as "contains all", ie,
 it will return true if all the key names found in the text array are
 present in the hstore.
 
 ISTM that a sane definition of this operator would provide that if the
 array is empty, it returns true: every set contains the empty set.
 However, the code goes out of its way to return false instead.
 
 Perhaps this was done intentionally because there was no way to make
 GIN index searches work compatibly ... but now there is, so I see no
 reason to remain bug-compatible with this behavior.
 
 Barring objections, I'm going to fix it.

+1 for fixing it.
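
For reference, a toy standalone model of the proposed semantics
(contains_all is an invented name, not hstore's implementation): with
zero requested keys the loop body never runs and the result is
vacuously true.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True iff every requested key appears among the hstore's keys;
 * an empty request set falls through to true (vacuous truth). */
static bool contains_all(const char **keys, int nkeys,
                         const char **req, int nreq)
{
    for (int i = 0; i < nreq; i++)
    {
        bool found = false;

        for (int j = 0; j < nkeys; j++)
        {
            if (strcmp(req[i], keys[j]) == 0)
            {
                found = true;
                break;
            }
        }
        if (!found)
            return false;
    }
    return true;
}

int main(void)
{
    const char *keys[] = {"a", "b"};
    const char *req[] = {"c"};

    printf("%d\n", contains_all(keys, 2, NULL, 0));  /* 1: empty array contained */
    printf("%d\n", contains_all(keys, 2, req, 1));   /* 0: "c" missing */
    return 0;
}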

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers