Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-19 Thread Jeff Davis
On Tue, 2012-04-17 at 14:24 -0400, Robert Haas wrote:
 I thought Jeff was parenthetically complaining about cases like A LEFT
 JOIN (B INNER JOIN C ON b.y = c.y) ON a.x && b.x.  That presumably
 would require the parameterized-path stuff to have any chance of doing
 partial index scans over B.  However, I understand that's not the main
 issue here.

To take the mystery out of it, I was talking about any case where an
index scan is impossible or impractical. For instance, let's say the
ranges are computed values. Just to make it really impossible, let's say
the ranges are computed from columns in two different tables joined in a
subquery.

But yes, the ability of the planner to find the plan is also an issue
(hopefully less of one with the recent improvements).

 One thing that I think needs some analysis is when the range join idea
 is better or worse than a nested loop with inner index-scan, because
 potentially those are the options the planner has to choose between,
 and the costing model had better know enough to make the right thing
 happen.  It strikes me that the nested loop with inner index-scan is
 likely to be a win when there are large chunks of the indexed relation
 that the nestloop never needs to visit at all - imagine small JOIN big
 ON small.a && big.a, for example.  I suppose the really interesting
 question is how much we can save when the entirety of both relations
 has to be visited anyway - it seems promising, but I guess we won't
 know for sure without testing it.

Right, I will need to come up with a prototype that can at least test
the executor piece. I suspect that the plan choice won't be all that
different from an ordinary index nestloop versus mergejoin case, but
with much worse cardinality estimates to work with.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: SPGiST versus hot standby - question about conflict resolution rules

2012-04-19 Thread Noah Misch
On Mon, Mar 12, 2012 at 10:50:36PM -0400, Tom Lane wrote:
 There is one more (known) stop-ship problem in SPGiST, which I'd kind of
 like to get out of the way now before I let my knowledge of that code
 get swapped out again.  This is that SPGiST is unsafe for use by hot
 standby slaves.

I suspect that swap-out has passed, but ...

 The problem comes from redirect tuples, which are short-lifespan
 objects that replace a tuple that's been moved to another page.
 A redirect tuple can be recycled as soon as no active indexscan could
 be in flight from the parent index page to the moved tuple.  SPGiST
 implements this by marking each redirect tuple with the XID of the
 creating transaction, and assuming that the tuple can be recycled once
 that XID is below the OldestXmin horizon (implying that all active
 transactions started after it ended).  This is fine as far as
 transactions on the master are concerned, but there is no guarantee that
 the recycling WAL record couldn't be replayed on a hot standby slave
 while there are still HS transactions that saw the old state of the
 parent index tuple.
 
 Now, btree has a very similar problem with deciding when it's safe to
 recycle a deleted index page: it has to wait out transactions that could
 be in flight to the page, and it does that by marking deleted pages with
 XIDs.  I see that the problem has been patched for btree by emitting a
 special WAL record just before a page is recycled.  However, I'm a bit
 nervous about copying that solution, because the details are a bit
 different.  In particular, I see that btree marks deleted pages with
 ReadNewTransactionId() --- that is, the next-to-be-assigned XID ---
 rather than the XID of the originating transaction, and then it
 subtracts one from the XID before sending it to the WAL stream.
 The comments about this are not clear enough for me, and so I'm

Attempting to write an explanation for that btree code led me to conclude that
the code is incorrect.  (FWIW, I caused that wrongness.)  I will start a
separate thread to fix it.

 wondering whether it's okay to use the originating transaction XID
 in a similar way, or if we need to modify SPGiST's rule for how to
 mark redirection tuples.  I think that the use of ReadNewTransactionId
 is because btree page deletion happens in VACUUM, which does not have
 its own XID; this is unlike the situation for SPGiST where creation of
 redirects is caused by index tuple insertion, so there is a surrounding
 transaction with a real XID.  But it's not clear to me how
 GetConflictingVirtualXIDs makes use of the limitXmin and whether a live
 XID is okay to pass to it, or whether we actually need next XID - 1.
 
 Info appreciated.

GetConflictingVirtualXIDs() selects transactions with pgxact->xmin <=
limitXmin.  The prototype use case was VACUUM, where limitXmin is the xmax of
a dead tuple cleaned from a page.  Transactions with xmin <= limitXmin might
still regard the limitXmin XID as uncommitted; we conflict so they cannot fail
to see the tuple we're about to purge.

All hot standby transactions holding snapshots taken before the startup
process applies the tuple-mover transaction's commit record will have xmin <=
its XID.  Therefore, passing that XID to ResolveRecoveryConflictWithSnapshot()
meets the need here precisely.  vacuum_defer_cleanup_age and
hot_standby_feedback inform GetOldestXmin(), so the master will delay cleanup
long enough to prevent conflicts when so configured.
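
To make the rule concrete, here is a tiny standalone sketch (not server
code; the data and names below are made up, and real comparisons are
wraparound-aware) of which backends the standby would single out for a
given limitXmin:

/*
 * Tiny standalone illustration of that selection rule: a standby backend
 * conflicts with cleanup up to limitXmin when its snapshot xmin is less
 * than or equal to limitXmin.  The array and values are hypothetical,
 * and real code compares XIDs with wraparound-aware primitives rather
 * than plain integers.
 */
#include <stdio.h>
#include <stdint.h>

typedef uint32_t TransactionId;

int
main(void)
{
    /* snapshot xmins of the running standby backends (made-up values) */
    TransactionId backend_xmin[] = {990, 1000, 1001, 1010};
    int         nbackends = sizeof(backend_xmin) / sizeof(backend_xmin[0]);

    /* XID carried by the recycling/cleanup WAL record */
    TransactionId limitXmin = 1000;

    for (int i = 0; i < nbackends; i++)
    {
        if (backend_xmin[i] <= limitXmin)
            printf("backend %d (xmin %u) conflicts\n", i, backend_xmin[i]);
        else
            printf("backend %d (xmin %u) is safe\n", i, backend_xmin[i]);
    }
    return 0;
}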

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-19 Thread Jeff Davis
On Tue, 2012-04-17 at 14:03 -0400, Robert Haas wrote:
 I'm actually not sure these are equivalent formulations.  Suppose one
 side has [i,i] where i ranges from 1 to 1000 and the other side
 the exact same thing plus [1,1000].  That one really big range
 will come up second on the right side, and you will not be able to
 discard it until you reach the end of the join.  If you just keep
 rewinding the right side, you're going to end up with O(n^2) behavior,
 whereas if you can discard tuples from the middle on the right side,
 then you will get O(n) behavior, which is a big difference.  In other
 words, in your original algorithm the tuples that you discard in step
 4 or 5 need not be the first remaining tuple on whichever side of the
 join we're talking about.

To illustrate the problem (slightly modified), I'll write the sets out
verbatim rather than use range syntax:

  {1,2,3}  {1,2,3,4,5,6,7,8,9}
  {  2,3,4}{  2,3,4}. . . . .
  {3,4,5}  {3,4,5}. . . .
  {  4,5,6}{  4,5,6}. . .
  {5,6,7}  {5,6,7}. .
  {  6,7,8}{  6,7,8}.
  {7,8,9}  {7,8,9}

The . are supposed to represent a shadow that the large range [1,9]
casts below it. This shadow prevents the discarding of [2,4] on the RHS
even when processing range [5,7] on the LHS, because we can't discard
out of the middle.

Note that, if you just have some large ranges and a lot of overlap,
that's not really a problem with the algorithm, it's just a large result
to compute. The problem comes when the ranges vary in size by quite a
lot, and there are many ranges that could be eliminated but can't
because of the shadow.

This problem can be mitigated substantially, I believe. Let me change
the algorithm (I don't like the way the pseudocode was structured
anyway), and word it a little more loosely:

1. Look at the top ranges on each side. Choose the one with the greater
upper bound, and call that Range1 from Side1, and the other range Range2
from Side2. If either side has no ranges left, terminate.
2. Scan down Side2, discarding ranges that are strictly before, and
joining with ranges that overlap, stopping when you hit a range that is
strictly after.
3. Now, discard Range1, and reset Side2 to the first non-discarded
range. Goto 1.

The benefit is, in step 1, we choose a side that will *always* discard
the top tuple. And if we choose the one with the greater upper bound,
then we are going to eliminate the largest shadow.

That doesn't eliminate the problem entirely, but it seems like it would
reduce it a lot.
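
To make the revised loop concrete, below is a small standalone sketch of
steps 1-3. It assumes both inputs arrive sorted by lower bound and uses a
simple discarded flag, in the spirit of the tuple-header flag mentioned
further down; the structures and driver are illustrative only, not
executor code:

/*
 * Standalone sketch of the revised merge loop (steps 1-3 above).  Both
 * inputs are assumed to be sorted by lower bound; "discarding" just sets
 * a flag.  This is an illustration of the algorithm, not PostgreSQL code.
 */
#include <stdio.h>
#include <stdbool.h>

typedef struct { int lo, hi; bool discarded; } Range;
typedef struct { Range *r; int n; int head; } Side;

static void
advance_head(Side *s)
{
    while (s->head < s->n && s->r[s->head].discarded)
        s->head++;
}

static void
range_merge_join(Side *a, Side *b)
{
    for (;;)
    {
        Side   *s1, *s2;
        Range  *r1;

        advance_head(a);
        advance_head(b);
        if (a->head >= a->n || b->head >= b->n)
            return;                     /* one side is exhausted: terminate */

        /* Step 1: Range1 is the head range with the greater upper bound. */
        if (a->r[a->head].hi >= b->r[b->head].hi)
        {
            s1 = a;
            s2 = b;
        }
        else
        {
            s1 = b;
            s2 = a;
        }
        r1 = &s1->r[s1->head];

        /* Step 2: scan Side2; discard strictly-before, join overlaps. */
        for (int i = s2->head; i < s2->n; i++)
        {
            Range  *r2 = &s2->r[i];

            if (r2->discarded)
                continue;
            if (r2->hi < r1->lo)
                r2->discarded = true;   /* strictly before Range1 */
            else if (r2->lo > r1->hi)
                break;                  /* strictly after: stop scanning */
            else
                printf("join [%d,%d] with [%d,%d]\n",
                       r1->lo, r1->hi, r2->lo, r2->hi);
        }

        /* Step 3: discard Range1 and start over. */
        r1->discarded = true;
    }
}

int
main(void)
{
    Range   ra[] = {{1, 3, false}, {2, 4, false}, {5, 7, false}};
    Range   rb[] = {{1, 9, false}, {2, 4, false}, {6, 8, false}};
    Side    a = {ra, 3, 0};
    Side    b = {rb, 3, 0};

    range_merge_join(&a, &b);
    return 0;
}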

Regarding the idea of discarding tuples in the middle, that might be an
interesting approach as well. It might be as simple as setting a flag in
the tuple header (like was done for full hash joins). Still not perfect,
but would make redundant checks very cheap. Combined with my strategy,
there's a good chance that we practically eliminate the problem.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: SPGiST versus hot standby - question about conflict resolution rules

2012-04-19 Thread Simon Riggs
On Thu, Apr 19, 2012 at 7:55 AM, Noah Misch n...@leadboat.com wrote:
 On Mon, Mar 12, 2012 at 10:50:36PM -0400, Tom Lane wrote:
 There is one more (known) stop-ship problem in SPGiST, which I'd kind of
 like to get out of the way now before I let my knowledge of that code
 get swapped out again.  This is that SPGiST is unsafe for use by hot
 standby slaves.

 I suspect that swap-out has passed, but ...

 The problem comes from redirect tuples, which are short-lifespan
 objects that replace a tuple that's been moved to another page.
 A redirect tuple can be recycled as soon as no active indexscan could
 be in flight from the parent index page to the moved tuple.  SPGiST
 implements this by marking each redirect tuple with the XID of the
 creating transaction, and assuming that the tuple can be recycled once
 that XID is below the OldestXmin horizon (implying that all active
 transactions started after it ended).  This is fine as far as
 transactions on the master are concerned, but there is no guarantee that
 the recycling WAL record couldn't be replayed on a hot standby slave
 while there are still HS transactions that saw the old state of the
 parent index tuple.

 Now, btree has a very similar problem with deciding when it's safe to
 recycle a deleted index page: it has to wait out transactions that could
 be in flight to the page, and it does that by marking deleted pages with
 XIDs.  I see that the problem has been patched for btree by emitting a
 special WAL record just before a page is recycled.  However, I'm a bit
 nervous about copying that solution, because the details are a bit
 different.  In particular, I see that btree marks deleted pages with
 ReadNewTransactionId() --- that is, the next-to-be-assigned XID ---
 rather than the XID of the originating transaction, and then it
 subtracts one from the XID before sending it to the WAL stream.
 The comments about this are not clear enough for me, and so I'm

 Attempting to write an explanation for that btree code led me to conclude that
 the code is incorrect.  (FWIW, I caused that wrongness.)  I will start a
 separate thread to fix it.

Wrong or not, we need to better document why we picked
ReadNewTransactionId(), rather than OldestXmin, which seems the more
obvious and cheaper choice.

 All hot standby transactions holding snapshots taken before the startup
 process applies the tuple-mover transaction's commit record will have xmin <=
 its XID.  Therefore, passing that XID to ResolveRecoveryConflictWithSnapshot()
 meets the need here precisely.

Yes, agreed. i.e. don't subtract 1.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-19 Thread Susanne Ebrecht

On 18.04.2012 14:28, Robert Haas wrote:
So I think Greg has exactly the right idea: we shouldn't try to 
incorporate one of these systems that aims to manage workflow; we 
should just design something really simple that tracks what happened 
and lets people who wish to volunteer to do so help keep that tracking 
information up to date. 


I tested a lot of tools for bug / issue tracking, and I found that
almost all of them either had too much overhead or were not really made
for the database business. Additionally, more often than not the
sentence "we support PostgreSQL" was just a marketing trap.

In other words, I found that the PostgreSQL support totally sucked.

My opinion is that a tool should mirror your business, and not that you
build your business around the given tool.

Looking for a bug tracking tool, there are some points that are
mandatory for us:

1. it should run on PostgreSQL
2. it should be open source - if possible BSD license - and if possible
   there shouldn't be a single company behind it
3. it should be user friendly - no need to click here and there to get
   all the information
4. it should be able to communicate with our version control system -
   if we ever decide to move away from git to something else, it should
   take only a few changes for the tool to communicate with the new
   system
5. it should be possible to do almost everything via email

My personal dream would be to do almost everything via an IRC bot, but
that is fiction.

I think a tool should be slim and simple. It should do exactly what you
want it to do. You should be able to change the tool's code so that it
does exactly what you want.

Let me give you an example: bugs.mysql.com may be far from perfect, but
it is slim, and on the developer side it had everything the developers
needed. My information is that they originally took the code from the
PHP bug tracking system and recoded it / implemented features so that
it did a good job on database bugs. When the developers needed tool
changes or additional features, those were simply coded. I never heard
a developer say that he hated the system. There were just lots of ideas
about how this or that could be solved better. That is normal - when
you work with a tool every day, you will of course get ideas about what
could be improved.

So yes - I think Greg is right. We should design something really
simple that does exactly what we need. With some luck we might not need
to start from scratch. Maybe there is a tool out there that is slim and
good enough to deliver the base code from which we can start recoding.


Just my 2ct,

Susanne

--
Dipl. Inf. Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-19 Thread Sandro Santilli
On Tue, Apr 17, 2012 at 04:29:52PM -0400, Stephen Frost wrote:
 Josh,
 
 * Josh Berkus (j...@agliodbs.com) wrote:
  FWIW, the PostGIS folks would *really* love to have a TABLESAMPLE which
  worked with geographic indexes.  This would be tremendously useful for
  constructing low-resolution zoom out tiles on maps and similar.
 
 I'm familiar with the concept of 'zoom out' tiles and PostGIS, but I
 don't actually see the connection between that and TABLESAMPLE.  Perhaps
 I'm missing something, but I've never seen a case where you create 'zoom
 out' tiles by just grabbing some portion of the data at random, as that
 would end up creating empty spots or missing pieces.

Actually a random sample would really be representative of the data
distribution. What the type analyzer gets is a sample, and that sample
is what the estimator looks at to answer the question:

 How many rows fall in this rectangle?

You can see how well it works by passing your queries using the && operator
to EXPLAIN ANALYZE and comparing the estimated and real row counts.

I'm looking for a way to fetch random samples these days, so I can confirm
the need for a quick way to fetch, at the SQL level, the same kind of sample
that the ANALYZE command fetches.
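
For what it's worth, here is a minimal standalone sketch of one-pass
uniform sampling (reservoir sampling, Algorithm R). ANALYZE's actual
sampling is block-based and more elaborate, so this only illustrates the
kind of sample meant here, not the server code:

/*
 * Minimal one-pass uniform sampling (reservoir sampling, "Algorithm R").
 * ANALYZE's real sampling is block-based and more sophisticated; this is
 * only an illustration of drawing a fixed-size random sample in a single
 * pass over a stream of rows.  All names and values are made up.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLE_SIZE 5

int
main(void)
{
    int         reservoir[SAMPLE_SIZE];
    int         seen = 0;

    srand((unsigned) time(NULL));

    /* pretend each value of "row" is a row streaming out of a scan */
    for (int row = 1; row <= 1000; row++)
    {
        if (seen < SAMPLE_SIZE)
            reservoir[seen] = row;          /* fill the reservoir first */
        else
        {
            int         j = rand() % (seen + 1);    /* uniform in [0, seen] */

            if (j < SAMPLE_SIZE)
                reservoir[j] = row;         /* keep with probability k/(seen+1) */
        }
        seen++;
    }

    for (int i = 0; i < SAMPLE_SIZE; i++)
        printf("sampled row %d\n", reservoir[i]);
    return 0;
}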

--strk; 

  ,--o-. 
  |   __/  |Delivering high quality PostGIS 2.0 !
  |  / 2.0 |http://strk.keybit.net - http://vizzuality.com
  `-o--'


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-19 Thread Jeff Davis
On Wed, 2012-04-18 at 01:21 -0400, Tom Lane wrote:
 It would be a pretty weird implementation of mergejoin that could
 discard tuples from the middle of an input stream.  Or to be more
 specific, it wouldn't be the mergejoin itself that could do that at all
 --- you'd need the input plan node to be some sort of tweaked version of
 tuplestore or tuplesort that could respond to a request like that.

As I said in my reply to Robert, I think there are some ways we can make
this idea work.

 I can't escape the feeling that Jeff has chosen the wrong basis for his
 thought experiment, and that what he really ought to be thinking about
 is hashjoin, which keeps an in-memory table that it could easily modify
 on the fly if it chose to.  The multi-batch aspect of hybrid hashjoin
 could be useful too (IOW, when you're out of better ideas, throw the
 tuple back in the water to process later).

Obviously hashing is not going to be much use for anything but equality.
So I believe this approach is very similar to the temporary-index
method, except with batching, and always keeping the index in memory.

I don't think we would get the partitioning benefit of hashjoin, because
we'd have to put the same tuple in multiple partitions, so it's probably
better to just leave the outer side intact.

But in-memory indexes and multiple passes of the outer seems like a
reasonable alternative, particularly because an in-memory index might be
very fast (to build as well as query).

 This is just handwaving of course.  I think some digging in the
 spatial-join literature would likely find ideas better than any of
 these.

I will look in some more detail. The merge-like approach did seem to be
represented in the paper referenced by Alexander (the external plane
sweep), but it also refers to several methods based on partitioning.

I'm beginning to think that more than one of these ideas has merit.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] New sync commit mode remote_write

2012-04-19 Thread Magnus Hagander
I admit to not having followed the discussion around the new mode for
synchronous_commit very closely, so my apologies if this has been
discussed and dismissed - I blame failing to find it in the archives
;)

My understanding from looking at the docs is that
synchronous_commit=remote_write will always imply a *local* commit as
well.

Is there any way to set the system up to do a write to the remote,
ensure it's in memory of the remote (remote_write mode, not full sync
to disk), but *not* necessarily to the local disk? Meaning we're ok to
release the transaction when the data is in memory both locally and
remotely but not wait for I/O?

Seems there is a pretty large use case for this, particularly in our
lovely new cloud environments with pathetic I/O performance

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-19 Thread Sandro Santilli
On Mon, 16 Apr 2012 23:17:25 -0700, Heikki Linnakangas wrote:

 1. We probably don't want the SQL syntax to be added to the
grammar. This should be written as an extension, using custom
functions as the API, instead of extra SQL syntax.

I can't find the discussion about this, do you have any pointer?

I've found a patch from 2007 by Gavin Sherry implementing the SQL:2003
TABLESAMPLE syntax. Maybe a good starting point?
http://www.neilconway.org/talks/hacking/

--strk;

  ,--o-. 
  |   __/  |Delivering high quality PostGIS 2.0 !
  |  / 2.0 |http://strk.keybit.net - http://vizzuality.com
  `-o--'


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New sync commit mode remote_write

2012-04-19 Thread Robert Haas
On Apr 19, 2012, at 5:05 AM, Magnus Hagander mag...@hagander.net wrote:
 I admit to not having followed the discussion around the new mode for
 synchronous_commit very closely, so my apologies if this has been
 discussed and dismissed - I blame failing to find it in the archives
 ;)
 
 My understanding from looking at the docs is that
 synchronous_commit=remote_write will always imply a *local* commit as
 well.
 
 Is there any way to set the system up to do a write to the remote,
 ensure it's in memory of the remote (remote_write mode, not full sync
 to disk), but *not* necessarily to the local disk? Meaning we're ok to
 release the transaction when the data is in memory both locally and
 remotely but not wait for I/O?

If we crash, the slave can end up ahead of the master, and then it's hopelessly 
corrupted...

Maybe we could engineer around this, but it hasn't been done yet.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New sync commit mode remote_write

2012-04-19 Thread Magnus Hagander
On Thu, Apr 19, 2012 at 12:40, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Apr 19, 2012 at 10:05 AM, Magnus Hagander mag...@hagander.net wrote:
 I admit to not having followed the discussion around the new mode for
 synchronous_commit very closely, so my apologies if this has been
 discussed and dismissed - I blame failing to find it in the archives
 ;)

 My understanding from looking at the docs is that
 synchronous_commit=remote_write will always imply a *local* commit as
 well.

 Is there any way to set the system up to do a write to the remote,
 ensure it's in memory of the remote (remote_write mode, not full sync
 to disk), but *not* necessarily to the local disk? Meaning we're ok to
 release the transaction when the data is in memory both locally and
 remotely but not wait for I/O?

 Seems there is a pretty large use case for this, particularly in our
 lovely new cloud environments with pathetic I/O performance

 Yeah, it's on my TODO list.

 What we need to do is to send the last written point as part of the
 replication protocol, so the standby can receive it, yet know not to
 apply it yet in case of crash.

 I was expecting that to change as a result of efforts to improve
 WALInsertLock, so I didn't want to do something that would be
 immediately invalidated.

Understood. Something to look forward to in 9.3 then :-)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New sync commit mode remote_write

2012-04-19 Thread Simon Riggs
On Thu, Apr 19, 2012 at 10:05 AM, Magnus Hagander mag...@hagander.net wrote:
 I admit to not having followed the discussion around the new mode for
 synchronous_commit very closely, so my apologies if this has been
 discussed and dismissed - I blame failing to find it in the archives
 ;)

 My understanding from looking at the docs is that
 synchronous_commit=remote_write will always imply a *local* commit as
 well.

 Is there any way to set the system up to do a write to the remote,
 ensure it's in memory of the remote (remote_write mode, not full sync
 to disk), but *not* necessarily to the local disk? Meaning we're ok to
 release the transaction when the data is in memory both locally and
 remotely but not wait for I/O?

 Seems there is a pretty large use case for this, particularly in our
 lovely new cloud environments with pathetic I/O performance

Yeah, it's on my TODO list.

What we need to do is to send the last written point as part of the
replication protocol, so the standby can receive it, yet know not to
apply it yet in case of crash.

I was expecting that to change as a result of efforts to improve
WALInsertLock, so I didn't want to do something that would be
immediately invalidated.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Dump EXTENSION sequences too

2012-04-19 Thread Gianni Ciolli
Hi,

currently an EXTENSION can mark some of its tables as configuration
tables using pg_catalog.pg_extension_config_dump(), so that pg_dump
does the right thing.

I think it would be useful to mark sequences too, but unfortunately it
is not possible; hence, each time a dump is reloaded, all the
sequences in the extension are reset to 1, causing all the related
problems.

Moreover, the error message that we get if we try to mark a sequence
does not mention the requirement that the relation is a table. The
"OID %u does not refer to a table" error message seems to be wrongly
restricted to the case when get_rel_name can't find a relation.

Is there any objection to the above proposal? I did a little search of
the archives, but I couldn't find any related discussions; I apologise
if I missed something.

Thanks,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog stops upon server restart

2012-04-19 Thread Thom Brown
On 10 April 2012 21:07, Magnus Hagander mag...@hagander.net wrote:
 On Friday, April 6, 2012, Thom Brown wrote:

 Hi,

 I've tried out pg_receivexlog and have noticed that when restarting
 the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
 This is surprising as the DBA would have to remember to start
 pg_receivexlog up again.


 This is intentional insofar as that's how the code was written; there's not a
 malfunctioning piece of code somewhere.

 It would probably make sense to have an auto-reconnect feature, and to have
 an option to turn it on/off.

 If you haven't already (my wifi here is currently quite useless, which is
 why I'm working on my email backlog, so I can't check), please add it to the
 open items list.

I think it would also be useful to add a paragraph to the
documentation stating use-cases for this feature, and its advantages.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-19 Thread Stephen Frost
* Sandro Santilli (s...@keybit.net) wrote:
 Actually a random sample would really be representative of the data
 distribution. What the type analyzer gets is a sample and that sample
 is what the estimator looks at to answer the question:

That might work if all you have is point data, but lines, polygons, etc.,
you're typically going to want to see, just not at the same resolution.
At least when you're talking about 'zoom-out' tiles, which is what this
was about up thread.

 I'm looking for a way to fetch random samples these days so I confirm
 the need for a quick way to fetch the same sample that analyze
 command fetches but at SQL level.

I'm all for supporting that and implementing this feature, I just don't
think it's going to be all that useful for zoom-out tiles when complex
geometries are involved.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Gsoc2012 idea, tablesample

2012-04-19 Thread Sandro Santilli
On Thu, Apr 19, 2012 at 08:47:51AM -0400, Stephen Frost wrote:
 * Sandro Santilli (s...@keybit.net) wrote:
  Actually a random sample would really be representative of the data
  distribution. What the type analyzer gets is a sample and that sample
  is what the estimator looks at to answer the question:
 
 That might work if all you have is point data, but lines, polygons, etc,
 you're typically going to want to see, just not at the same resolution..
 At least, when you're talking about 'zoom-out' tiles, which is what this
 was about up thread.
 
  I'm looking for a way to fetch random samples these days so I confirm
  the need for a quick way to fetch the same sample that analyze
  command fetches but at SQL level.
 
 I'm all for supporting that and implementing this feature, I just don't
 think it's going to be all that useful for zoom-out tiles when complex
 geometries are involved.

Me neither. But for points it sounds very useful.
And we know it is useful for lines and polygons as well when it comes
to estimating overlaps... (since the estimator does a good job even for
lines and polygons)

I really hope Neil Conway's work of 2007 could make it into PostgreSQL.

Look, the same work was the topic of a homework assignment at Berkeley
in 2005: http://inst.eecs.berkeley.edu/~cs186/fa05/hw/hw2/hw2.html

And the whole thing is in the SQL:2003 standard ...

--strk;

  ,--o-. 
  |   __/  |Delivering high quality PostGIS 2.0 !
  |  / 2.0 |http://strk.keybit.net - http://vizzuality.com
  `-o--'


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-19 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Wed Apr 18 03:12:09 -0300 2012:
 
 Magnus Hagander mag...@hagander.net writes:
  I think this clearly outlines that we need to remember that there are
  *two* different patterns that people are trying to solve with the
  bugtracker.
 
 Yeah, remember we drifted to this topic from discussion of management of
 CF patches, which might be yet a third use-case.  It's not obvious that
 it's the same as tracking unfixed bugs, at least; though maybe the
 requirements end up the same.
 
  Any tool we'd go for should aim to cover *both* usecases.
 
 Not convinced that we should expect one tool to be good at both
 (or all three) things.

So maybe we need more than one tool to present the information to
different kinds of users, but we need only *one* database backing them
all, right?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-19 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 So I think Greg has exactly the right idea: we shouldn't try to 
 incorporate one of these systems that aims to manage workflow; we 
 should just design something really simple that tracks what happened 
 and lets people who wish to volunteer to do so help keep that tracking 
 information up to date. 

Note: the above is the other Greg :)

If we are serious about building this ourselves, and we feel it 
is important, maybe we should sponsor it via our group funds or 
some other means? Seems like everyone here has lots of ideas but 
little free time.

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201204191031
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAk+QIeUACgkQvJuQZxSWSsi5NACg4ruX3jvuQ5zKnxbBPu2Kc9wW
C+EAoPsIt2n0bbYau/aPhPbVdm+JPHj3
=j1XN
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-19 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 My github.com account currently has 4264 notifications in the inbox.
 Almost all of those are spam, growing constantly.  Because of that, the
 platform is currently fairly useless to me for actually communicating or
 collaborating on code.

 That's about the same amount that I have.

I have no spam at all, despite being a fairly early github adopter. 
Wonder what the difference is?

- -- 
Greg Sabino Mullane g...@turnstep.com
PGP Key: 0x14964AC8 201204191044
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAk+QJO4ACgkQvJuQZxSWSsg7OgCggq2MVw10W2+XxCyoDSdbjTYP
JOAAoLVJeX/V5j1h8r0dpvyJAw9/O+BU
=puT/
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-19 Thread Magnus Hagander
On Thu, Apr 19, 2012 at 16:45, Greg Sabino Mullane g...@turnstep.com wrote:

 My github.com account currently has 4264 notifications in the inbox.
 Almost all of those are spam, growing constantly.  Because of that, the
 platform is currently fairly useless to me for actually communicating or
 collaborating on code.

 That's about the same amount that I have.

 I have no spam at all, despite being a fairly early github adopter.
 Wonder what the difference is?

The vast majority of the spam I have originates in the postgresql git
repository. You don't have any commits there...

But I would've assumed it should hit equally hard on other
repositories that have been around a long time.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-19 Thread Christopher Browne
On Thu, Apr 19, 2012 at 10:49 AM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Apr 19, 2012 at 16:45, Greg Sabino Mullane g...@turnstep.com wrote:

 My github.com account currently has 4264 notifications in the inbox.
 Almost all of those are spam, growing constantly.  Because of that, the
 platform is currently fairly useless to me for actually communicating or
 collaborating on code.

 That's about the same amount that I have.

 I have no spam at all, despite being a fairly early github adopter.
 Wonder what the difference is?

 The vast majority of the spam I have originates in the postgresql git
 repository. You don't have any commits there...

 But I would've assumed it should hit equally hard on other
 repositories that have been around a long time.

I have plenty of commits on the Slony Git repo, which has had clones
at github for about as long as PostgreSQL has.

And I don't get any noticeable amounts of spam at github.  Not all
notifications are hugely interesting, but I don't see anything that's
not reasonably related to things I have commented on.

So I think there has to be some other effect in play.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug #6593, extensions, and proposed new patch policy

2012-04-19 Thread Dimitri Fontaine
Hi,

Robert Haas robertmh...@gmail.com writes:
 On Wed, Apr 18, 2012 at 7:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, ignoring that issue for the moment, this patch is making me
 uncomfortable.  I have a vague recollection that we deliberately omitted
 ALTER EXTENSION OWNER because of security or definitional worries.
 (Dimitri, does that ring any bells?)  I wonder whether we should insist
 that the new owner be a superuser, as the original owner must have been.

I remember not having included any OWNER in my submitted patch, because
I thought we didn't really need one as far as the pg_extension entry is
concerned. You added one to better integrate with our standard practice
and for pg_dump purposes IIRC, but that's about it. I don't remember
that we talked about the owner of the extension's individual objects:
they get owned by the user installing them.

 Don't we have non-superuser extensions, that can be installed with
 just DBA privileges?

We do, so the extension owner could well be the database owner and not a
superuser. That can only happen if the extension contains no C-coded
parts, obviously, and that's why it's so easy to forget about --- plans
have been discussed to work on per-database module installation, but not
yet on non-superusers installing a C module. I think the open possibility
of crashing the whole cluster is what keeps us from ever relaxing the
trusted bit for the C language.

 Anyhow, it seems a bit nannyish, unless I'm missing something.  If the
 current owner is a superuser and s/he wants to give the object to a
 non-superuser, you can't really stop them.  They can just make the
 target user a superuser, give 'em the object, and make them not a
 superuser, all in one transaction no less.

The interesting use case would be reassigning the extension from a
database owner to a superuser so that the database owner can no longer
remove it or upgrade it. I think we want to allow that.

The reassign from a superuser to a non-superuser can be argued both
ways, unless the extension is marked as superuser = true in the control
file. Which means we had better register that in the catalogs, because
the control file can be updated on the file system and that's out of
reach for the cluster.

What about only issuing a WARNING that extensions are not supported by
REASSIGN OWNED in 9.1 (and probably 9.2)?

In next versions, we will be able to register the control file superuser
parameter and allow reassigning in a more controlled fashion, both the
catalog entry in pg_extension and the extension's objects.

Maybe rather than adding the control file into the catalogs piece by
piece, we should install a copy of it at create/alter time in a
database-specific location, taken from the per-major-version shared
location in the OS. That way the user could update the OS-shipped file
and then decide which cluster and which database to upgrade with it, and
you could have PostGIS 1.5 in database foo and 2.0 in database bar. The
OS would still need to provide for a way to upgrade them separately, but
that would be much better than what we have now.

When doing things this way, we could trust the extension's control file
in cases such as ALTER EXTENSION OWNER, because it's now a private copy
of it.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New sync commit mode remote_write

2012-04-19 Thread Jeff Janes
On Thu, Apr 19, 2012 at 2:47 AM, Robert Haas robertmh...@gmail.com wrote:
 On Apr 19, 2012, at 5:05 AM, Magnus Hagander mag...@hagander.net wrote:
 I admit to not having followed the discussion around the new mode for
 synchronous_commit very closely, so my apologies if this has been
 discussed and dismissed - I blame failing to find it in the archives
 ;)

 My understanding from looking at the docs is that
 synchronous_commit=remote_write will always imply a *local* commit as
 well.

 Is there any way to set the system up to do a write to the remote,
 ensure it's in memory of the remote (remote_write mode, not full sync
 to disk), but *not* necessarily to the local disk? Meaning we're ok to
 release the transaction when the data is in memory both locally and
 remotely but not wait for I/O?

 If we crash, the slave can end up ahead of the master, and then it's 
 hopelessly corrupted...

 Maybe we could engineer around this, but it hasn't been done yet.

The workaround would be for the master to refuse to automatically
restart after a crash, insisting on a failover instead (or a manual
forcing of recovery)?

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-19 Thread Andrew Dunstan



On 04/19/2012 11:25 AM, Christopher Browne wrote:


The vast majority of the spam I have originates in the postgresql git
repository. You don't have any commits there...

But I would've assumed it should hit equally hard on other
repositories that have been around a long time.

I have plenty of commits on the Slony Git repo, which has had clones
at github for about as long as PostgreSQL has.

And I don't get any noticeable amounts of spam at github.  Not all
notifications are hugely interesting, but I don't see anything that's
not reasonably related to things I have commented on.

So I think there has to be some other effect in play.


The spammers pick certain well known projects, I believe.

At any rate, I found that my spam went to nil by turning off 
notifications for comments on my commits and comments that mention me.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timsort performance, quicksort

2012-04-19 Thread Dimitri Fontaine
Peter Geoghegan pe...@2ndquadrant.com writes:
 1.  What we should be doing with timsort, if anything.  It is one
 thing to demonstrate that it's a useful algorithm under certain
 artificial conditions, but quite another to argue for its inclusion in
 Postgres, or for it being selectively used at points where that is
 likely to be a win, based on some criteria or another like known
 cardinality, physical/logical correlation or assumed costs of
 comparisons for each type.  At the very least, it is an interesting
 algorithm, but without integration that makes it actually serve user
 needs, that's all it will ever be to us.  Deciding if and when it
 should be used is a rather nuanced process, and I'm certainly not
 about to declare that we should get rid of quicksort.  It does appear
 to be a fairly good fit to some of our requirements though.

I kind of understood timsort would shine in sorting text in non-C
collation, because of the comparison cost. So a test in some UTF8
collation or other would be interesting, right?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URI and regression testing

2012-04-19 Thread Peter Eisentraut
On Thu, 2012-04-19 at 00:13 +0300, Alex wrote:
 +#!/usr/bin/env perl

Don't do that.  Call the script using $(PERL) from the makefile.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New sync commit mode remote_write

2012-04-19 Thread Robert Haas
On 4/19/12, Jeff Janes jeff.ja...@gmail.com wrote:
 The work around would be for the master to refuse to automatically
 restart after a crash, insisting on a fail-over instead (or a manual
 forcing of recovery)?

I suppose that would work, but I think Simon's idea is better: don't
let the slave replay the WAL until either (a) it's promoted or (b) the
master finishes the fsync.   That boils down to adding some more
handshaking to the replication protocol, I think.
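
To sketch the shape of that handshaking (purely conceptual, with
hypothetical names; not the actual replication protocol): the standby
would track both the WAL it has received and the position the master
reports as flushed, and replay only up to the flushed position unless it
is being promoted.

/*
 * Purely conceptual sketch of the handshaking idea; all names here are
 * hypothetical and this is not the actual replication protocol.  The
 * walreceiver would track both the WAL it has received and the position
 * the master reports as flushed to disk, and the startup process would
 * replay no further than the flushed position unless the standby is
 * being promoted.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t XLogRecPtr;    /* stand-in for a WAL position */

static XLogRecPtr
replay_limit(XLogRecPtr received_upto, XLogRecPtr master_flushed_upto,
             bool promoting)
{
    if (promoting)
        return received_upto;   /* failover: apply everything we have */
    return (master_flushed_upto < received_upto) ?
        master_flushed_upto : received_upto;
}

int
main(void)
{
    /* the standby has received more WAL than the master has fsync'd */
    printf("normal operation: replay up to %llu\n",
           (unsigned long long) replay_limit(2000, 1500, false));
    /* on promotion, all received WAL may be applied */
    printf("after promotion:  replay up to %llu\n",
           (unsigned long long) replay_limit(2000, 1500, true));
    return 0;
}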

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-19 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 At any rate, I found that my spam went to nil by turning off 
 notifications for comments on my commits and comments that mention me.

The first part of that seems like it would destroy most of the point
of having the mechanism at all?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-19 Thread Andrew Dunstan



On 04/19/2012 03:05 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

At any rate, I found that my spam went to nil by turning off
notifications for comments on my commits and comments that mention me.

The first part of that seems like it would destroy most of the point
of having the mechanism at all?





Yes, the notification piece is pretty much useless, because of the spammers.

I use github as a convenient place to stash public repositories (e.g. 
buildfarm code), and PostgreSQL Experts uses it for both public and 
private repos, but if people want me to get their comments they need to 
go to the trouble of emailing me.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dump EXTENSION sequences too

2012-04-19 Thread Jim Nasby

On 4/19/12 5:42 AM, Gianni Ciolli wrote:

currently an EXTENSION can mark some of its tables as configuration
tables using pg_catalog.pg_extension_config_dump(), so that pg_dump
does the right thing.

I think it would be useful to mark sequences too, but unfortunately it
is not possible; hence, each time a dump is reloaded, all the
sequences in the extension are reset to 1, causing all the related
problems.

Moreover, the error message that we get if we try to mark a sequence
does not mention the requirement that the relation is a table. The
OID %u does not refer to a table error message seems to be wrongly
restricted to the case when get_rel_name can't find a relation.

Is there any objection to the above proposal? I did a little search of
the archives, but I couldn't find any related discussions; I apologise
if I missed something.


I'll toss in something related to this...

At work we use the concept of "seed tables" that have their data dumped along with their
structure (using a script that calls pg_dump). These are similar to the concept of
configuration tables.

The problem that we've discovered with this is that surrogate keys based on 
sequences can really screw you if you're not careful. The issue comes about if 
you're using the dump in more than one database (ie: a dump of a common set of 
tools) and the different databases have also added configuration. In that 
scenario it's easy to end up with duplicated surrogate key values.

The solution we plan to implement to get around this is to add support for 
dumping config data via something other than just copying raw table data. So 
our dump script would call a database function that would be responsible for 
spitting out raw SQL that gets injected directly into the dump. That SQL would 
then be able to remove all references to surrogate keys (doing stuff like 
INSERT WHERE NOT EXISTS and then JOINing to avoid outputting raw surrogate 
keys).
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URI and regression testing

2012-04-19 Thread Alex
Peter Eisentraut pete...@gmx.net writes:

 On Thu, 2012-04-19 at 00:13 +0300, Alex wrote:
 +#!/usr/bin/env perl

 Don't do that.  Call the script using $(PERL) from the makefile.

Thank you for the suggestion.  Attached v2 does just this (while keeping
a more commonly found shebang line in the perl script for running it w/o
the makefile.)

I figure src/tools/msvc/vcregress.pl will need to be updated too, but
trying to model all this after ecpg regression tests I'm stuck at
replicating ecpg_regression.proj for libpq's uri-regress.  I'd
appreciate any help from the Windows guys at this point.

--
Alex

diff --git a/src/interfaces/libpq/test/Makefile b/src/interfaces/libpq/test/Makefile
index b9023c3..048f092 100644
--- a/src/interfaces/libpq/test/Makefile
+++ b/src/interfaces/libpq/test/Makefile
@@ -15,7 +15,7 @@ all: $(PROGS)
 
 installcheck: all
 	SRCDIR='$(top_srcdir)' SUBDIR='$(subdir)' \
-		   $(SHELL) $(top_srcdir)/$(subdir)/regress.sh
+	  $(PERL) $(top_srcdir)/$(subdir)/regress.pl
 
 clean distclean maintainer-clean:
 	rm -f $(PROGS)
diff --git a/src/interfaces/libpq/test/regress.pl b/src/interfaces/libpq/test/regress.pl
new file mode 100755
index 000..a19f793
--- /dev/null
+++ b/src/interfaces/libpq/test/regress.pl
@@ -0,0 +1,56 @@
+#!/usr/bin/perl
+use strict;
+
+# use of SRCDIR/SUBDIR is required for supporting VPath builds
+my $srcdir = $ENV{'SRCDIR'} or die '$SRCDIR environment variable is not set';
+my $subdir = $ENV{'SUBDIR'} or die '$SUBDIR environment variable is not set';
+
+my $regress_in   = "$srcdir/$subdir/regress.in";
+my $expected_out = "$srcdir/$subdir/expected.out";
+
+# the output file should land in the build_dir of VPath, or just in
+# the current dir, if VPath isn't used
+my $regress_out  = "regress.out";
+
+# open input file first, so possible error isn't sent to redirected STDERR
+open(REGRESS_IN, $regress_in) or die "Can't open $regress_in: $!";
+
+# save STDOUT/ERR and redirect both to regress.out
+open(OLDOUT, ">&STDOUT") or die "Can't dup STDOUT: $!";
+open(OLDERR, ">&STDERR") or die "Can't dup STDERR: $!";
+
+open(STDOUT, ">$regress_out") or die "Can't open $regress_out: $!";
+open(STDERR, ">&STDOUT")  or die "Can't dup STDOUT: $!";
+
+# read lines from regress.in and run uri-regress on them
+while (<REGRESS_IN>) {
+  chomp;
+  print "trying $_\n";
+  system("./uri-regress \"$_\"");
+  print "\n";
+}
+
+# restore STDOUT/ERR so we can print the outcome to the user
+open(STDERR, ">&OLDERR") or die; # can't complain as STDERR is still duped
+open(STDOUT, ">&OLDOUT") or die "Can't restore STDOUT: $!";
+
+# just in case
+close REGRESS_IN;
+
+my $diff_status = system("diff -c \"$srcdir/$subdir/expected.out\" regress.out >regress.diff");
+if ($diff_status == 0) {
+  print <<EOF;
+==
+All tests passed
+EOF
+  exit 0;
+} else {
+  print <<EOF;
+==
+FAILED: the test result differs from the expected output
+
+Review the difference in $subdir/regress.diff
+==
+EOF
+  exit 1;
+}
diff --git a/src/interfaces/libpq/test/regress.sh b/src/interfaces/libpq/test/regress.sh
deleted file mode 100644
index 298d8bd..000
--- a/src/interfaces/libpq/test/regress.sh
+++ /dev/null
@@ -1,21 +0,0 @@
-#!/bin/sh
-
-while read line
-do
-	echo "trying $line"
-	./uri-regress "$line"
-	echo ""
-done < "${SRCDIR}/${SUBDIR}/regress.in" >regress.out 2>&1
-
-if diff -c "${SRCDIR}/${SUBDIR}/expected.out" regress.out >regress.diff; then
-	echo ""
-	echo "All tests passed"
-	exit 0
-else
-	echo ""
-	echo "FAILED: the test result differs from the expected output"
-	echo
-	echo "Review the difference in ${SUBDIR}/regress.diff"
-	echo ""
-	exit 1
-fi
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index f0fad43..e65971c 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -229,6 +229,15 @@ sub mkvcbuild
 $libpq->ReplaceFile('src\interfaces\libpq\libpqrc.c','src\interfaces\libpq\libpq.rc');
 $libpq->AddReference($libpgport);
 
+my $libpq_uri_regress = $solution->AddProject('libpq_uri_regress','exe','misc');
+$libpq_uri_regress->AddFile('src\interfaces\libpq\test\uri-regress.c');
+$libpq_uri_regress->AddIncludeDir('src\port');
+$libpq_uri_regress->AddIncludeDir('src\interfaces\libpq');
+$libpq_uri_regress->AddLibrary('wsock32.lib');
+$libpq_uri_regress->AddDefine('HOST_TUPLE=i686-pc-win32vc');
+$libpq_uri_regress->AddDefine('FRONTEND');
+$libpq_uri_regress->AddReference($libpq,$libpgport);
+
 my $libpqwalreceiver = $solution->AddProject('libpqwalreceiver', 'dll', '',
 'src\backend\replication\libpqwalreceiver');
 $libpqwalreceiver->AddIncludeDir('src\interfaces\libpq');

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] Bug #6593, extensions, and proposed new patch policy

2012-04-19 Thread Alvaro Herrera

Excerpts from Dimitri Fontaine's message of Thu Apr 19 12:42:00 -0300 2012:

 What about only issuing a WARNING that the extensions are not supported
 by reassign owned in 9.1 (and probably 9.2)?

Raise a warning and then do what?  While you can continue reassigning
the rest of the objects to someone else, this doesn't help the poor
fella who's just trying to drop the owner of the extension -- it still
can't be dropped.  Moreover, since there's no ALTER OWNER command for
extensions, the user can't just change it to someone else manually.
The only option is to do DROP OWNED, which will drop the extension along
with all the objects that belong to it.  In fact, the documentation
states that the way to drop a user that owns objects is to run REASSIGN
OWNED, then DROP OWNED, (repeat for all databases), then DROP ROLE.  So
if the DBA does that, he might end up dropping the extension by
accident.

Maybe we could just add a protection that the user to which the
extension is reassigned must be a superuser or the database owner.

Remember that we're talking about REASSIGN OWNED here, which will
automatically reassign not only the extension itself, but also the
contained objects.  There is no danger that we will end up with an
inconsistent installation.  Also, if the objects in the extension have
been manually given to someone else, they will stay owned by that other
user, precisely because the code as written does not recurse.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug #6593, extensions, and proposed new patch policy

2012-04-19 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 Remember that we're talking about REASSIGN OWNED here, which will
 automatically reassign not only the extension itself, but also the
 contained objects.  There is no danger that we will end up with an
 inconsistent installation.  Also, if the objects in the extension have
 been manually given to someone else, they will stay owned by that other
 user, precisely because the code as written does not recurse.

Oh, right, I forgot the scope of the command. Given those bits of missed
context, +1 from me here. Sorry about missing that in my previous email.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Plan stability versus near-exact ties in cost estimates

2012-04-19 Thread Tom Lane
So after committing the latest round of parameterized-plan hacking,
I was dismayed to see the buildfarm breaking out in pink, thanks to
some of the members producing a different plan than expected for one
test query.  I eventually managed to reproduce that (on the fourth
machine I tried locally), and after some quality time with gdb
I understand what is happening, to wit: the two alternative plans have
exactly the same cost so far as our cost model is concerned.  On my
main development machine, the two plans look like this to add_path:

$13 = {type = T_NestPath, pathtype = T_NestLoop, parent = 0x40193c08, 
  param_info = 0x40194458, rows = 5, startup_cost = 0, 
  total_cost = 47.6284988, pathkeys = 0x0}

$16 = {type = T_NestPath, pathtype = T_NestLoop, parent = 0x40193c08, 
  param_info = 0x40194458, rows = 5, startup_cost = 0, 
  total_cost = 47.6284981, pathkeys = 0x0}

so it picks the second one on the basis that its total_cost is better at
the sixteenth decimal place.  On the other machine, the same two paths
look like this:

$12 = {type = T_NestPath, pathtype = T_NestLoop, parent = 0x895b9e0, 
  param_info = 0x895c198, rows = 5, startup_cost = 0, 
  total_cost = 47.5784977, pathkeys = 0x0}

Breakpoint 2, add_path (parent_rel=0x895b9e0, new_path=0x895c208)
$15 = {type = T_NestPath, pathtype = T_NestLoop, parent = 0x895b9e0, 
  param_info = 0x895c198, rows = 5, startup_cost = 0, 
  total_cost = 47.5784977, pathkeys = 0x0}

and add_path is coded to arbitrarily keep the first one when two
paths are exactly the same on all its preference measures.

Now, the fact that the two machines get different costs at the third
decimal place isn't very interesting here; that's a pretty standard
cross-platform difference arising from different MAXALIGN values.
The important point is that the total_costs of the two paths are
exactly the same on one machine, and on the other one different only
by a microscopic amount that probably arises from a slightly different
floating-point calculation sequence chosen by a different compiler.
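
(As a toy illustration, nothing to do with the planner code itself: just
re-associating a three-term sum is enough to flip the last bit of a
double, which is all an exact tie-breaking comparison needs to notice.)

#include <stdio.h>

/*
 * Toy example only: the same three terms summed with a different
 * association yield doubles that differ in the final bit, much as a
 * different compiler-chosen evaluation order can perturb total_cost.
 */
int
main(void)
{
    double      left = (0.1 + 0.2) + 0.3;  /* prints 0.60000000000000009 */
    double      right = 0.1 + (0.2 + 0.3); /* prints 0.59999999999999998 */

    printf("%.17g\n%.17g\nequal: %s\n",
           left, right, left == right ? "yes" : "no");
    return 0;
}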

So, as our code stands, we're inevitably going to have very platform-
and compiler-specific decisions as to which plan to prefer.  I'm a bit
surprised that it's taken us this long to trip over this type of
situation, because it's surely not specific to parameterized paths.

We could deal with this either by giving up on showing the selected
plan in the regression test, or by creating multiple expected files,
but neither of those alternatives is very appealing.

The idea that I'm toying with is to try to make the choice a bit less
platform-specific, by removing the exact cost test that add_path uses
as its last-ditch comparison step, essentially this:

    /*
     * Same pathkeys and outer rels, and fuzzily
     * the same cost, so keep just one; to decide
     * which, first check rows and then do an
     * exact cost comparison.
     */
    if (new_path->rows < old_path->rows)
        remove_old = true;      /* new dominates old */
-   else if (new_path->rows > old_path->rows)
-       accept_new = false;     /* old dominates new */
-   else if (compare_path_costs(new_path, old_path,
-                               TOTAL_COST) < 0)
-       remove_old = true;      /* new dominates old */
    else
        accept_new = false;     /* old equals or
                                 * dominates new */

This change would mean that, when two paths have the same pathkeys,
parameterization, and rowcount, and fuzzily the same cost, we
arbitrarily keep the first-submitted one rather than looking at
low-order digits of the costs.  Since the order in which different paths
are generated is a property of our code and not platform-specific,
this should eliminate platform dependencies in cases where two paths
are essentially identical to the cost model.

A variant idea would be to replace the exact cost comparison with a
second round of fuzzy cost comparison, but with a much tighter fuzz
factor, maybe 1e-6 instead of 0.01.
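
Concretely, the second-round test might look about like this (sketch
only, with an invented helper name, not the actual add_path code):

/*
 * Second, much tighter fuzzy comparison, to use in place of the exact
 * compare_path_costs() call above.  Returns <0 if the new path is
 * clearly cheaper, >0 if the old one is, 0 for a near-exact tie.
 */
static int
compare_costs_nearly_exact(double new_cost, double old_cost)
{
    const double fuzz = 1.0e-6;     /* versus 0.01 in the first round */

    if (new_cost > old_cost * (1.0 + fuzz))
        return 1;                   /* new is clearly more expensive */
    if (old_cost > new_cost * (1.0 + fuzz))
        return -1;                  /* new is clearly cheaper */
    return 0;                       /* near-exact tie */
}

On a tie we'd still arbitrarily keep the first-submitted path, so this
behaves like the first proposal except when the costs differ by more
than about one part in a million.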

Now, neither of these fixes is perfect: what they would do is remove
platform-specific instability from where the costs are basically equal
and add some more in the range where the costs differ by almost exactly
the fuzz factor.  But the behavior near that point is platform-specific
already, just not quite as much, and it's surely something we're
unlikely to trip over in the regression tests.

Thoughts, better ideas?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Plan stability versus near-exact ties in cost estimates

2012-04-19 Thread Jim Nasby

On 4/19/12 5:39 PM, Tom Lane wrote:

Now, neither of these fixes is perfect: what they would do is remove
platform-specific instability from where the costs are basically equal
and add some more in the range where the costs differ by almost exactly
the fuzz factor.  But the behavior near that point is platform-specific
already, just not quite as much, and it's surely something we're
unlikely to trip over in the regression tests.


I can't help but think of complaints we get from users regarding plan 
stability, even though this is a case of taking that to an extreme. Because 
this case is extreme (changing plans due to a 1e-16 difference), it's fairly 
easy to fix. In order to get 9.2 out the door, maybe fixing 
just this case is the right thing to do. But ISTM this is just an example of a 
bigger problem.

One of the complaints we've seen is that the planner will sometimes choose a 
plan that has a marginally lower cost (where marginally in this case is 
significantly more than 1e-16 ;) even though that plan will perform really 
poorly if the stats are off. I have wondered if that could be addressed by 
introducing the concept of an error range to each plan. My idea is that each 
node would predict how much the cost estimate would change if the stats were 
off by some amount. If two plans are close to the same cost, you would want to 
choose the plan that had the lower error range, trading off a small amount of 
theoretical performance for less risk of getting a horrible plan if the stats 
assumptions proved to be wrong.

I believe that would fix this specific case because even though two plans might 
come out with a nearly identical cost, it is unlikely that they would also have 
a nearly identical error range.
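
Hand-wavy sketch of what I have in mind (every name here is invented;
nothing like this exists in the planner today):

#include <stdbool.h>

typedef struct
{
    double      total_cost;     /* the usual cost estimate */
    double      cost_error;     /* how far off the estimate could be if
                                 * the stats assumptions are wrong */
} RiskyCost;

/* Return true if 'a' should be kept in preference to 'b'. */
static bool
prefer_lower_risk(const RiskyCost *a, const RiskyCost *b)
{
    const double fuzz = 0.01;   /* the same 1% fuzz add_path uses today */

    /* A clearly cheaper plan still wins outright, as it does now. */
    if (a->total_cost * (1.0 + fuzz) < b->total_cost)
        return true;
    if (b->total_cost * (1.0 + fuzz) < a->total_cost)
        return false;

    /* Near-tie on cost: prefer the plan with the smaller error range. */
    return a->cost_error < b->cost_error;
}
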
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timsort performance, quicksort

2012-04-19 Thread Peter Geoghegan
On 19 April 2012 19:24, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 I kind of understood timsort would shine in sorting text in non-C
 collation, because of the comparison cost. So a test in some UTF8
 collation or other would be interesting, right?

That's certainly the theory, yes. In practice, even though timsort
lives up to its promise of significantly reducing the number of
comparisons required in many common situations, my implementation does
not actually perform better than qsort_arg. Even a reduction of over
10% in the number of comparisons does not result in a net performance
gain. It wouldn't surprise me if the implementation used is quite
suboptimal, and it might well be worth profiling and optimising. It
doesn't appear to be the big win that I'd hoped for though. It's
necessary to stretch the assumed cost of a comparison rather a lot
further than the very common case of sorting a single key of non-C
collated text for it to be worth it, and that's just too thin for me
to sink more time into this right now.
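
For anyone who wants a rough feel for the comparison-count side of this
outside the backend, a counting comparator is all it takes (a toy, not
the harness behind the numbers above):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count comparator calls, since doing fewer of them is where timsort is
 * supposed to win. */
static long ncompares = 0;

static int
counting_strcmp(const void *a, const void *b)
{
    ncompares++;
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

int
main(void)
{
    const char *words[] = {"delta", "alpha", "echo", "bravo", "charlie"};
    int         nwords = (int) (sizeof(words) / sizeof(words[0]));

    qsort(words, nwords, sizeof(words[0]), counting_strcmp);
    printf("%d elements sorted with %ld comparisons\n", nwords, ncompares);
    return 0;
}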

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Plan stability versus near-exact ties in cost estimates

2012-04-19 Thread Tom Lane
Jim Nasby j...@nasby.net writes:
 [ add some error ranges to cost estimates ]

 I believe that would fix this specific case because even though two plans 
 might come out with a nearly identical cost, it is unlikely that they would 
 also have a nearly identical error range.

Actually, I think that *wouldn't* fix this specific case --- did you
look at the details?  The two formulations of the plan are really pretty
nearly equivalent; you can do the two nestloops in either order and it's
not clear it'll make much difference.  I'm suspicious that the addition
of parameterized planning might open up more scope for this type of
thing, even though in principle it was always possible.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers