Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Amit Kapila
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Robert Haas  writes:
>> From pg_upgrade's perspective, it would
>> be nice to have a flag that starts the server in some mode where
>> nobody but pg_upgrade can connect to it and all connections are
>> automatically allowed, but it's not exactly clear how to implement
>> "nobody but pg_upgrade can connect to it".

> The implementation I've wanted to see for some time is that you can
> start a standalone backend, but it speaks FE/BE protocol to its caller
> (preferably over pipes, so that there is no issue whatsoever of where
> you can securely put a socket or anything like that).  

Couldn't this be done by following the FE/BE protocol, but calling the
server APIs directly at the required places?
This kind of implementation could be more performant than adding a
communication layer, which would be beneficial for embedded databases.

> Making that
> happen might be a bit too much work if pg_upgrade were the only use
> case, but there are a lot of people who would like to use PG as an
> embedded database, and this might be close enough for such use-cases.

Seeing PG run as an embedded database would be interesting to many people
using PG.
Another use case for embedded databases is allowing additional remote
connections as well, to monitor the operations in the database. However,
that can be done in a later version of the implementation.

With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proof concept - access to session variables on client side

2012-06-25 Thread Pavel Stehule
Hello

I have worked on a simple patch that enables access from the server side
to client-side data. It adds two new hooks to libpq - one for returning
the local context, and a second for setting the local context.

The motivation is to integrate the capabilities of the psql console with
a stronger language - plpgsql. A second goal is enabling the possibility
of saving the result of some server-side process in psql. It improves
the vars feature in psql.

pavel ~/src/postgresql/src $ cat test.sql
\echo value of external parameter is :"myvar"

do $$
begin
  -- we can take any session variable on client side
  -- it is safe against to SQL injection
  raise notice 'external parameter accessed from plpgsql is "%"',
hgetvar('myvar');

  -- we can change this session variable and finish transaction
  perform hsetvar('myvar', 'Hello, World');
end;
$$ language plpgsql;

\echo new value of session variable is :"myvar"

cat test.sql | psql postgres -v myvar=Hello
value of external parameter is "Hello"
NOTICE:  external parameter accessed from plpgsql is "Hello"
DO
new value of session variable is "Hello, World"

This is just a proof of concept - there should be better integration with
PL languages, a cache for reads on the server side, ...

Comments?

Regards

Pavel Stehule


client-session-vars.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-25 Thread Alvaro Herrera
I cleaned up the framework patch a bit.  My version's attached.  Mainly,
returning false for failure in some code paths that are only going to
have the caller elog(FATAL) is rather pointless -- it seems much better
to just have the code itself do the elog(FATAL).  In any case, I
searched for reports of those error messages turning up in the wild
and found none.

There are other things but they are in the nitpick department.  (A
reference to "->check_timeout" remains that needs to be fixed too).

There is one thing that still bothers me about this whole framework patch,
but I'm not sure it's easily fixable.  Basically, the API to timeout.c is
the whole list of timeouts and their whole behaviors.  If you want to
add a new one, you can't just call into the entry points; you have to
modify timeout.c itself (as well as timeout.h, as well as whatever code
it is that you want to add timeouts to).  This may be good enough, but I
don't like it.  I think it boils down to proctimeout.h cheating.

The interface I would actually like to have is something that lets the
modules that want a timeout register their timeout-checking function
as a callback.  API-wise, it would be much cleaner.  However, I'm not
really sure that code-wise it would be any cleaner at all.  In fact I
think it'd be rather cumbersome.  However, if you could give that idea
some thought, it'd be swell.
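
Just to illustrate what I mean, something along these lines (all names
invented for the example, not taken from the patch):

#include <stdbool.h>

typedef void (*timeout_handler) (void);
typedef int TimeoutId;

/* timeout.c side: a generic registration table, with no built-in
   knowledge of the individual timeouts */
#define MAX_TIMEOUTS 16

static timeout_handler registered_handlers[MAX_TIMEOUTS];
static int n_registered = 0;

TimeoutId
RegisterTimeout(timeout_handler handler)
{
    registered_handlers[n_registered] = handler;
    return n_registered++;
}

/* caller side, e.g. the lock_timeout patch: register a handler once and
   keep the returned id for arming/disarming it later */
static volatile bool lock_timeout_fired = false;
static TimeoutId lock_timeout_id;

static void
lock_timeout_handler(void)
{
    lock_timeout_fired = true;    /* just set a flag; no elog() here */
}

void
lock_timeout_init(void)
{
    lock_timeout_id = RegisterTimeout(lock_timeout_handler);
}

The SIGALRM code in timeout.c would then just walk registered_handlers[]
and call whichever entries are currently armed, instead of knowing each
timeout by name.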

I haven't looked at the second patch at all yet.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


1-timeout-framework-v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Amit Kapila
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Heikki Linnakangas  writes:
>>  So I think we should change pg_resetxlog -l option to take a WAL file 
>>  name as argument, and fix pg_upgrade accordingly.

> Seems reasonable I guess.  It's really specifying a starting WAL
> location, but only to file granularity, so treating the argument as a
> file name is sort of a type cheat but seems convenient.

> If we do it that way, we'd better validate that the argument is a legal
> WAL file name, so as to catch any cases where somebody tries to do it
> old-style.

> BTW, does pg_resetxlog's logic for setting the default -l value (from
> scanning pg_xlog to find the largest existing file name) still work?
  

It finds the segment number of the largest existing file name in pg_xlog and
then compares it with the input provided by the
user for the -l option; if the input is greater, it uses the input to set the
value in the control file.
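
In other words, roughly this (a simplified sketch with made-up variable
names, not the actual pg_resetxlog code):

uint64 newest_seg = newest_segment_found_in_pg_xlog;  /* from the dir scan */
uint64 user_seg   = segment_from_l_option;            /* from the -l value */

/* whichever is larger becomes the first segment after reset */
uint64 first_seg_after_reset = (user_seg > newest_seg) ? user_seg
                                                       : newest_seg;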

With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Satoshi Nagayasu

2012/06/26 7:04, Kevin Grittner wrote:

Josh Berkus  wrote:

On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:

(1) Performance

   I've measured LWLock performance both with and without the
   patch, and confirmed that this patch does not affect the LWLock
   performance at all.


This would be my main concern with this patch; it's hard for me to
imagine that it has no performance impact *at all*, since
trace_lwlocks has quite a noticeable one in my experience.
However, the answer to that is to submit the patch and let people
test.


I think overhead is going to depend quite a bit on the
gettimeofday() implementation, since that is called twice per lock
wait.


Yes.
It's one of my concerns, and what I actually want hackers to test.


It looked to me like there was nothing to prevent concurrent updates
of the counts while gathering the accumulated values for display.
Won't this be a problem on 32-bit builds?
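
(To spell that out: on a 32-bit build a uint64 increment is performed as
two 32-bit stores, so a reader that polls the counters without any
interlock can see a torn value.  A self-contained illustration, not code
from the patch:

#include <stdint.h>

/* shared counter, bumped by writers inside LWLockAcquire() */
static volatile uint64_t calls = 0;

/* writer side: on a 32-bit target this ++ becomes two separate stores */
static void bump(void) { calls++; }

/* reader side (the view function): without synchronization this load can
   pair a new low word with an old high word around a carry, e.g. observe
   0x1FFFFFFFF while the counter moves from 0xFFFFFFFF to 0x100000000 */
static uint64_t read_snapshot(void) { return calls; }

Taking the per-lock spinlock around both the update and the read, or
keeping the accumulation in backend-local memory, would avoid that.)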


Actually, I'd like to know how I can improve my code for 32-bit boxes.

However, I unfortunately don't have any 32-bit (physical) box right now,
so I'd like someone to test it if it needs to be tested.


Please add this to the open CommitFest for a proper review:

https://commitfest.postgresql.org/action/commitfest_view/open


Will submit soon. Thanks.



-Kevin



Regards,
--
Satoshi Nagayasu 
Uptime Technologies, LLC. http://www.uptime.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-06-25 Thread Jeff Janes
On Mon, Jun 18, 2012 at 5:42 PM, Robert Haas  wrote:
>
> Hmm.  That was actually a gloss I added on existing code to try to
> convince myself that it was safe; I don't think that the changes I
> made make that any more or less safe than it was before.

Right, sorry.  I thought there was some strength reduction going on
there as well.

Thanks for the various explanations, they address my concerns.  I see
that v2 applies over v1.

I've verified performance improvements using 8 cores with my proposed
pgbench -P benchmark, with a scale that fits in shared_buffers.
It brings it most of the way, but not quite, up to the btree performance.


I've marked this as ready for committer.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Satoshi Nagayasu

2012/06/26 6:44, Josh Berkus wrote:

On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:

(1) Performance

   I've measured LWLock performance both with and without the patch,
   and confirmed that this patch does not affect the LWLock performance
   at all.


This would be my main concern with this patch; it's hard for me to
imagine that it has no performance impact *at all*, since trace_lwlocks
has quite a noticeable one in my experience.  However, the answer to that
is to submit the patch and let people test.


Thanks. I will submit the patch to the CommitFest page with some fixes
so that it works with the latest PostgreSQL in Git.


I will remark that it would be far more useful to me if we could also
track lwlocks per session.  Overall counts are somewhat useful, but more
granular counts are even more useful.  What period of time does the
table cover?  Since last reset?


Yes. It has not been implemented yet, since this code is just a PoC,
but it is another design issue which needs to be discussed.

To implement it, a new array could be added in local process memory
to hold lwlock statistics, and the counters would be updated both in
shared memory and in local process memory at once. Then, a session could
retrieve 'per-session' statistics from its local process memory
via some dedicated function.
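
A minimal sketch of that idea (hypothetical names, not from the PoC patch):

#include "postgres.h"

#define TRACKED_LWLOCKS 128            /* illustration only */

/* per-backend mirror, never shared */
static uint64 local_calls[TRACKED_LWLOCKS];
static uint64 local_waits[TRACKED_LWLOCKS];

static void
record_lwlock_stat(int lockid, bool had_to_wait,
                   uint64 *shared_calls, uint64 *shared_waits)
{
    shared_calls[lockid]++;            /* cluster-wide counters in shmem */
    local_calls[lockid]++;             /* per-session counters */

    if (had_to_wait)
    {
        shared_waits[lockid]++;
        local_waits[lockid]++;
    }
}

A dedicated function (say, pg_stat_get_my_lwlocks(); the name is invented
here) could then expose the local arrays, while pg_stat_lwlocks keeps
reading the shared ones.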

Does it make sense? Any comments?

Regards,
--
Satoshi Nagayasu 
Uptime Technologies, LLC. http://www.uptime.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Tom Lane
Heikki Linnakangas  writes:
> So I think we should change pg_resetxlog -l option to take a WAL file 
> name as argument, and fix pg_upgrade accordingly.

Seems reasonable I guess.  It's really specifying a starting WAL
location, but only to file granularity, so treating the argument as a
file name is sort of a type cheat but seems convenient.

If we do it that way, we'd better validate that the argument is a legal
WAL file name, so as to catch any cases where somebody tries to do it
old-style.
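
Something like this would probably do for the validation (just a sketch;
the real check should presumably also reject backup-history file names
and the like):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* A WAL file name is exactly 24 hex digits: the timeline plus two
   8-digit groups encoding the segment. */
static int
parse_wal_file_name(const char *fname,
                    unsigned int *tli, unsigned int *log, unsigned int *seg)
{
    int         i;

    if (strlen(fname) != 24)
        return 0;
    for (i = 0; i < 24; i++)
        if (!isxdigit((unsigned char) fname[i]))
            return 0;
    return sscanf(fname, "%08X%08X%08X", tli, log, seg) == 3;
}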

BTW, does pg_resetxlog's logic for setting the default -l value (from
scanning pg_xlog to find the largest existing file name) still work?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess

2012-06-25 Thread Tom Lane
I wrote:
> Barring objections or better ideas, I'll push forward with applying
> this patch and the dependency-fixup patch.

Applied and back-patched.

BTW, I had complained at the end of the pgsql-bugs thread about bug #6699
that it seemed like FK constraints would get prevented from being
restored in parallel fashion if they depended on a constraint-style
unique index, because the dependency for the FK constraint would
reference the index's dump ID, which is nowhere to be seen in the dump.
But in fact they are restored in parallel, and the reason is that
pg_restore silently ignores any references to unknown dump IDs (a hack
put in specifically because of the bogus dependency data, no doubt).
So that raises the opposite question: how come pg_restore doesn't fall
over from trying to create the FK constraint before the unique index it
depends on is created?  And the answer to that is dumb luck, more or
less.  The "locking dependencies" hack in pg_restore knows that creation
of an FK constraint requires exclusive lock on both tables, so it won't
try to restore the FK constraint before creation of the constraint-style
index is done.  So getting the dependency information fixed is a
necessary prerequisite for any attempt to reduce the locking
requirements there.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of lun jun 25 20:09:34 -0400 2012:
> On 25.06.2012 21:01, Robert Haas wrote:
> > On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao  wrote:
> >>> "<<" should be">>". The attached patch fixes this typo.
> >>
> >> Oh, I forgot to attach the patch.. Here is the patch.
> >
> > I committed both of the patches you posted to this thread.
> 
> Thanks Robert. I was thinking that "pg_resetxlog -l" would accept a WAL 
> file name, instead of comma-separated tli, xlogid, segno arguments. The 
> latter is a bit meaningless now that we don't use the xlogid+segno 
> combination anywhere else. Alvaro pointed out that pg_upgrade was broken 
> by the change in pg_resetxlog -n output - I changed that too to print 
> the "First log segment after reset" information as a WAL file name, 
> instead of logid+segno. Another option would be to print the 64-bit 
> segment number, but I think that's worse, because the 64-bit segment 
> number is harder to associate with a physical WAL file.
> 
> So I think we should change pg_resetxlog -l option to take a WAL file 
> name as argument, and fix pg_upgrade accordingly.

The only thing pg_upgrade does with the tli/logid/segno combo, AFAICT,
is pass it back to pg_resetxlog -l, so this plan seems reasonable.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes

2012-06-25 Thread Steve Singer

On 12-06-21 04:37 AM, Andres Freund wrote:

Hi Steve,
Thanks!



Attached is a detailed review of the patch.

Very good analysis, thanks!

Another reason why we cannot easily do 1) is that subtransactions aren't
discernible from top-level transactions before the top-level commit happens,
we can only properly merge in the right order (by "sorting" via lsn) once we
have seen the commit record which includes a list of all committed
subtransactions.



Based on that and your comments further down in your reply (and that no 
one spoke up and disagreed with you), it sounds like doing (1) isn't 
going to be practical.



I also don't think 1) would be particularly welcome by people trying to
replicate into foreign systems.



They could still sort the changes into transaction groups before 
applying to the foreign system.



I planned to have some cutoff 'max_changes_in_memory_per_txn' value. 
If it has

been reached for one transaction all existing changes are spilled to disk. New
changes again can be kept in memory till its reached again.

Do you want max_changes_in_memory_per_txn, or do you want to put a limit 
on the total amount of memory that the cache is able to use? How are you 
going to tell a DBA to tune max_changes_in_memory_per_txn? They know how 
much memory their system has and how much of it they can devote to the 
apply cache versus other things; asking them to estimate how many open 
transactions they might have at a point in time and how many WAL change 
records each transaction generates seems like a step backwards from the 
progress we've been making in getting PostgreSQL to be easier to tune.  
The maximum number of transactions that could be open at a time is 
governed by max_connections on the master at the time the WAL was 
generated, so I don't even see how the machine processing the WAL 
records could autotune/guess that.





We need to support serializing the cache for crash recovery + shutdown of the
receiving side as well. Depending on how we do the wal decoding we will need
it more frequently...



Have you described your thoughts on crash recovery on another thread?

I am thinking that this module would have to serialize some state 
every time it calls cache->commit(), to ensure that consumers don't get 
invoked twice on the same transaction.


If the apply module is making changes to the same backend that the apply 
cache serializes to, then the apply cache's state and the changes that 
committed transactions make will be persisted (or not persisted) 
together.   What if I am replicating from x86 to x86_64 via an apply 
module that does textout conversions?


  x86                    Proxy                        x86_64

  WAL ----------->  apply cache
                        |      (proxy catalog)
                        |
                    apply module
                      (textout)  ----------------->  SQL statements


How do we ensure that the commits are all visible (or not visible) in the 
catalog on the proxy instance used for decoding WAL, in the destination 
database, and in the state + spill files of the apply cache, i.e. that 
they all stay consistent in the event of a crash of either the proxy or 
the target?
I don't think you can (unless we consider two-phase commit, and I'd 
rather we didn't).  Can we come up with a way of avoiding the need for 
them to be consistent with each other?


I think apply modules will need to be able to be passed the same 
transaction twice (once before the crash and again after) and come up 
with a way of deciding whether that transaction has a) been applied to 
the translation/proxy catalog and b) been applied on the replica 
instance.  How is the walreceiver going to decide which WAL segments it 
needs to re-process after a crash?  I would want to see more of these 
details worked out before we finalize the interface between the apply 
cache and the apply modules and how the serialization works.



Code Review
=

applycache.h
---
+typedef struct ApplyCacheTupleBuf
+{
+    /* position in preallocated list */
+    ilist_s_node node;
+
+    HeapTupleData tuple;
+    HeapTupleHeaderData header;
+    char data[MaxHeapTupleSize];
+} ApplyCacheTupleBuf;

Each ApplyCacheTupleBuf will be about 8k (BLCKSZ) big no matter how big 
the data in the transaction is? Wouldn't workloads with inserts of lots 
of small rows in a transaction eat up lots of memory that is allocated 
but stores nothing?  The only alternative I can think of is dynamically 
allocating these, and I don't know what the cost/benefit of that overhead 
will be versus spilling to disk sooner.


+* FIXME: better name
+ */
+ApplyCacheChange*
+ApplyCacheGetChange(ApplyCache*);

How about:

ApplyCacheReserveChangeStruct(..)
ApplyCacheReserveChange(...)
ApplyCacheAllocateChange(...)

as ideas?
+/*
+ * Return an unused ApplyCacheChange struct
+ */
+void
+ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);

ApplyCac

Re: [HACKERS] WAL format changes

2012-06-25 Thread Heikki Linnakangas

On 25.06.2012 21:01, Robert Haas wrote:

On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao  wrote:

"<<" should be">>". The attached patch fixes this typo.


Oh, I forgot to attach the patch.. Here is the patch.


I committed both of the patches you posted to this thread.


Thanks Robert. I was thinking that "pg_resetxlog -l" would accept a WAL 
file name, instead of comma-separated tli, xlogid, segno arguments. The 
latter is a bit meaningless now that we don't use the xlogid+segno 
combination anywhere else. Alvaro pointed out that pg_upgrade was broken 
by the change in pg_resetxlog -n output - I changed that too to print 
the "First log segment after reset" information as a WAL file name, 
instead of logid+segno. Another option would be to print the 64-bit 
segment number, but I think that's worse, because the 64-bit segment 
number is harder to associate with a physical WAL file.


So I think we should change pg_resetxlog -l option to take a WAL file 
name as argument, and fix pg_upgrade accordingly.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Josh Berkus
On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:
> (1) Performance
> 
>   I've measured LWLock performance both with and without the patch,
>   and confirmed that this patch does not affect the LWLock performance
>   at all.

This would be my main concern with this patch; it's hard for me to
imagine that it has no performance impact *at all*, since trace_lwlocks
has quite a noticeable one in my experience.  However, the answer to that
is to submit the patch and let people test.

I will remark that it would be far more useful to me if we could also
track lwlocks per session.  Overall counts are somewhat useful, but more
granular counts are even more useful.  What period of time does the
table cover?  Since last reset?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Merlin Moncure
On Mon, Jun 25, 2012 at 3:38 PM, Euler Taveira  wrote:
> On 25-06-2012 16:45, Florian Pflug wrote:
>> Agreed. If we extend the protocol to support compression, and not rely
>> on SSL, then let's pick one of these LZ77-style compressors, and let's
>> integrate it into our tree.
>>
> If we have an option to keep it out of our tree, good; if not, let's integrate
> it. I, particularly, don't see a compelling reason to integrate compression
> code into our tree; I mean, if we want to support more than one algorithm, it is
> clear that the overhead of maintaining the compression code is too high (a lot
> of my-new-compression-algorithm patches).
>
>> We should then also make it possible to enable compression only for
>> the server -> client direction. Since those types of compressions are
>> usually pretty easy to decompress, that reduces the amount to work
>> non-libpq clients have to put in to take advantage of compression.
>>
> I don't buy this idea. My use case (data load) will not be covered if we only
> enable server -> client compression. I figure that there are use cases for
> server -> client compression (replication, for example), but there are also
> important use cases for client -> server (data load, for example). If you
> implement decompression, why not implement compression too?

+1.  lz4, which is looking like a strong candidate, has c#, java,
etc. implementations, which are the main languages that are going to lag
behind in terms of protocol support.  I don't think you're saving a lot
by going only one way (although you could make a case for the client to
signal interest in compression separately from decompression?).

another point:
It's been obvious for years now that zlib is somewhat of a dog in
terms of cpu usage for what it gives you.  however, raw performance #s
are not the whole story -- it would be interesting to compress real
world protocol messages to/from the server in various scenarios to see
how compression would work, in particular with OLTP type queries --
for example pgbench runs, etc. That would speak a lot more to the
benefits than canned benchmarks.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Euler Taveira
On 25-06-2012 14:30, Greg Jaskiewicz wrote:
> Wasn't this more of an issue in de-coupling compression from encryption ?
> 
Let me try to draw a conclusion. If we decide on independent
compression code instead of an SSL-based one, there is a comparison to be
tested: SSL + SSL-based compression vs. SSL + our compression code. If the
latter is faster, then we could discuss the possibility of disabling
compression in our SSL code and documenting that enabling our own
compression is recommended for SSL connections.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Euler Taveira
On 25-06-2012 16:45, Florian Pflug wrote:
> Agreed. If we extend the protocol to support compression, and not rely
> on SSL, then let's pick one of these LZ77-style compressors, and let's
> integrate it into our tree.
> 
If we have an option to keep it out of our tree, good; if not, let's integrate
it. I, particularly, don't see a compelling reason to integrate compression
code into our tree; I mean, if we want to support more than one algorithm, it is
clear that the overhead of maintaining the compression code is too high (a lot
of my-new-compression-algorithm patches).

> We should then also make it possible to enable compression only for
> the server -> client direction. Since those types of compressions are
> usually pretty easy to decompress, that reduces the amount to work
> non-libpq clients have to put in to take advantage of compression.
> 
I don't buy this idea. My use case (data load) will not be covered if we only
enable server -> client compression. I figure that there are use cases for
server -> client compression (replication, for example), but there are also
important use cases for client -> server (data load, for example). If you
implement decompression, why not implement compression too?


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_stat_lwlocks view - lwlocks statistics

2012-06-25 Thread Satoshi Nagayasu
Hi all,

I've been working on a new system view, pg_stat_lwlocks, to observe
LWLock usage, and I have just completed my 'proof-of-concept' code that
works with version 9.1.

Now, I'd like to gauge the chances of this feature making it into a
future release.

With this patch, a DBA can easily identify bottlenecks around lwlocks.
--
postgres=# SELECT * FROM pg_stat_lwlocks ORDER BY time_ms DESC LIMIT 10;
 lwlockid | calls  | waits | time_ms
----------+--------+-------+--------
       49 | 193326 | 32096 |   23688
        8 |   3305 |   133 |    1335
        2 |     21 |     0 |       0
        4 | 135188 |     0 |       0
        5 |  57935 |     0 |       0
        6 |    141 |     0 |       0
        7 |  24580 |     1 |       0
        3 |   3282 |     0 |       0
        1 |     41 |     0 |       0
        9 |      3 |     0 |       0
(10 rows)

postgres=#
--

In this view,
  'lwlockid' represents the LWLockId used in the backends.
  'calls' represents how many times LWLockAcquire() was called.
  'waits' represents how many times LWLockAcquire() had to wait
  before acquiring the lock.
  'time_ms' represents the total time, in milliseconds, that
  LWLockAcquire() waited on the lwlock.

And lwlocks that use an LWLockId range, such as BufMappingLock or
LockMgrLock, would be grouped and summed up in a single record.
For example, lwlockid 49 in the above view represents LockMgrLock
statistics.

Now, I know there are some considerations.

(1) Performance

  I've measured LWLock performance both with and without the patch,
  and confirmed that this patch does not affect the LWLock performance
  at all.

  pgbench scores with the patch:
tps = 900.906658 (excluding connections establishing)
tps = 908.528422 (excluding connections establishing)
tps = 903.900977 (excluding connections establishing)
tps = 910.470595 (excluding connections establishing)
tps = 909.685396 (excluding connections establishing)

  pgbench scores without the patch:
tps = 909.096785 (excluding connections establishing)
tps = 894.868712 (excluding connections establishing)
tps = 910.074669 (excluding connections establishing)
tps = 904.022770 (excluding connections establishing)
tps = 895.673830 (excluding connections establishing)

  Of course, this experiment was not I/O bound, and the cache hit ratio
  was >99.9%.

(2) Memory space

  In this patch, I added three new uint64 members to the LWLock structure
  to collect statistics.

  That means those members must be held in shared memory, but I'm not
  sure whether that's appropriate.

  Another possible option is holding those statistics values in local
  (backend) process memory, and sending them through the stats collector
  process (like other statistics values).
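
  For illustration, the shared-memory variant looks roughly like this
  (the existing members are as in 9.1's lwlock.c; the three counter names
  below are placeholders, not necessarily what the patch uses):

typedef struct LWLock
{
    slock_t     mutex;          /* protects LWLock and queue of PGPROCs */
    bool        releaseOK;      /* T if ok to release waiters */
    char        exclusive;      /* # of exclusive holders (0 or 1) */
    int         shared;         /* # of shared holders (0..MaxBackends) */
    PGPROC     *head;           /* head of list of waiting PGPROCs */
    PGPROC     *tail;           /* tail of list of waiting PGPROCs */
    /* new members for statistics, kept in shared memory */
    uint64      calls;          /* LWLockAcquire() calls */
    uint64      waits;          /* acquisitions that had to sleep */
    uint64      wait_time_us;   /* total time spent waiting */
} LWLock;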

(3) LWLock names (or labels)

  Right now, the pg_stat_lwlocks view shows the LWLockId itself, but a raw
  LWLockId makes it hard for a DBA to determine the actual lock type.

  So, I want to show LWLock names (or labels), like 'WALWriteLock'
  or 'LockMgrLock', but how should I implement it?
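
  One simple-minded option would be a small helper that maps an id to a
  label, something like this (only a sketch; individually-named locks get
  their enum name, while range-allocated ones need range checks):

#include "storage/lwlock.h"

static const char *
lwlock_label(LWLockId id)
{
    switch (id)
    {
        case ProcArrayLock:  return "ProcArrayLock";
        case WALInsertLock:  return "WALInsertLock";
        case WALWriteLock:   return "WALWriteLock";
        /* ... one case per individually-named LWLockId ... */
        default:
            if (id >= FirstLockMgrLock &&
                id < FirstLockMgrLock + NUM_LOCK_PARTITIONS)
                return "LockMgrLock";
            return "other";
    }
}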

Any comments?

Regards,
-- 
Satoshi Nagayasu 
Uptime Technologies, LLC. http://www.uptime.jp
diff -rc postgresql-9.1.2.orig/src/backend/catalog/postgres.bki 
postgresql-9.1.2/src/backend/catalog/postgres.bki
*** postgresql-9.1.2.orig/src/backend/catalog/postgres.bki  2012-06-20 
03:32:46.0 +0900
--- postgresql-9.1.2/src/backend/catalog/postgres.bki   2012-06-26 
01:51:52.0 +0900
***
*** 1553,1558 
--- 1553,1559 
  insert OID = 3071 ( pg_xlog_replay_pause 11 10 12 1 0 0 f f f t f v 0 0 2278 
"" _null_ _null_ _null_ _null_ pg_xlog_replay_pause _null_ _null_ _null_ )
  insert OID = 3072 ( pg_xlog_replay_resume 11 10 12 1 0 0 f f f t f v 0 0 2278 
"" _null_ _null_ _null_ _null_ pg_xlog_replay_resume _null_ _null_ _null_ )
  insert OID = 3073 ( pg_is_xlog_replay_paused 11 10 12 1 0 0 f f f t f v 0 0 
16 "" _null_ _null_ _null_ _null_ pg_is_xlog_replay_paused _null_ _null_ _null_ 
)
+ insert OID = 3764 ( pg_stat_get_lwlocks 11 10 12 1 100 0 f f f f t s 0 0 2249 
"" "{20,20,20,20}" "{o,o,o,o}" "{lwlockid,calls,waits,time_ms}" _null_ 
pg_stat_get_lwlocks _null_ _null_ _null_ )
  insert OID = 2621 ( pg_reload_conf 11 10 12 1 0 0 f f f t f v 0 0 16 "" 
_null_ _null_ _null_ _null_ pg_reload_conf _null_ _null_ _null_ )
  insert OID = 2622 ( pg_rotate_logfile 11 10 12 1 0 0 f f f t f v 0 0 16 "" 
_null_ _null_ _null_ _null_ pg_rotate_logfile _null_ _null_ _null_ )
  insert OID = 2623 ( pg_stat_file 11 10 12 1 0 0 f f f t f v 1 0 2249 "25" 
"{25,20,1184,1184,1184,1184,16}" "{i,o,o,o,o,o,o}" 
"{filename,size,access,modification,change,creation,isdir}" _null_ pg_stat_file 
_null_ _null_ _null_ )
diff -rc postgresql-9.1.2.orig/src/backend/catalog/postgres.description 
postgresql-9.1.2/src/backend/catalog/postgres.description
*** postgresql-9.1.2.orig/src/backend/catalog/postgres.description  
2012-06-20 03:32:46.0 +

Re: [HACKERS] libpq compression

2012-06-25 Thread k...@rice.edu
On Mon, Jun 25, 2012 at 09:45:26PM +0200, Florian Pflug wrote:
> On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
> > Magnus Hagander  writes:
> >> Or that it takes less code/generates cleaner code...
> > 
> > So we're talking about some LZO things such as snappy from google, and
> > that would be another run time dependency IIUC.
> > 
> > I think it's time to talk about fastlz:
> > 
> >  http://fastlz.org/
> >  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c
> > 
> >  551 lines of C code under MIT license, works also under windows
> > 
> > I guess it would be easy (and safe) enough to embed in our tree should
> > we decide going this way.
> 
> Agreed. If we extend the protocol to support compression, and not rely
> on SSL, then let's pick one of these LZ77-style compressors, and let's
> integrate it into our tree.
> 
> We should then also make it possible to enable compression only for
> the server -> client direction. Since those types of compressions are
> usually pretty easy to decompress, that reduces the amount to work
> non-libpq clients have to put in to take advantage of compression.
> 
> best regards,
> Florian Pflug
> 

Here is the benchmark list from the Google lz4 page:

Name             Ratio   C.speed  D.speed
LZ4 (r59)        2.084     330      915
LZO 2.05 1x_1    2.038     311      480
QuickLZ 1.5 -1   2.233     257      277
Snappy 1.0.5     2.024     227      729
LZF              2.076     197      465
FastLZ           2.030     190      420
zlib 1.2.5 -1    2.728      39      195
LZ4 HC (r66)     2.712      18     1020
zlib 1.2.5 -6    3.095      14      210

lz4 absolutely dominates on compression/decompression speed. While fastlz
is faster than zlib(-1) on compression, lz4 is almost 2X faster still.

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Peter Geoghegan
On 25 June 2012 20:59, Tom Lane  wrote:
> Andres Freund  writes:
>> Also, 'static inline' *is* C99 conforming as far as I can see?
>
> Hmm.  I went back and re-read the C99 spec, and it looks like most of
> the headaches we had in the past with C99 inline are specific to the
> case where you want an extern declaration to be available.  For a
> function that exists *only* as a static it might be all right.  So maybe
> I'm misremembering how well this would work.  We'd have to be sure we
> don't need any extern declarations, though.

Yeah, the extern inline functions sound at least superficially
similar to what happened with extern templates in C++ - exactly one
compiler vendor implemented them to the letter of the standard (they
remained completely unimplemented elsewhere), and subsequently went
bust, before they were eventually removed from the standard last year.

Note that when you build Postgres with Clang, it's implicitly and
automatically building C code as C99. There is an excellent analysis
of the situation here, under "C99 inline functions":

http://clang.llvm.org/compatibility.html

> Having said that, I'm still of the opinion that it's not so hard to deal
> with that we should just blow off compilers where "inline" doesn't work
> well.

Fair enough.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Tom Lane
Robert Haas  writes:
> From pg_upgrade's perspective, it would
> be nice to have a flag that starts the server in some mode where
> nobody but pg_upgrade can connect to it and all connections are
> automatically allowed, but it's not exactly clear how to implement
> "nobody but pg_upgrade can connect to it".

The implementation I've wanted to see for some time is that you can
start a standalone backend, but it speaks FE/BE protocol to its caller
(preferably over pipes, so that there is no issue whatsoever of where
you can securely put a socket or anything like that).  Making that
happen might be a bit too much work if pg_upgrade were the only use
case, but there are a lot of people who would like to use PG as an
embedded database, and this might be close enough for such use-cases.

However, that has got little to do with whether --maintenance-db is a
worthwhile thing or not, because that's about external client-side
tools, not pg_upgrade.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Alvaro Herrera

Excerpts from Kevin Grittner's message of lun jun 25 14:50:54 -0400 2012:

> One fine point regarding before and after images -- if a value
> doesn't change in an UPDATE, there's no reason to include it in both
> the BEFORE and AFTER tuple images, as long as we have the null
> column bitmaps -- or some other way of distinguishing unchanged from
> NULL.  (This could be especially important when the unchanged column
> was a 50 MB bytea.)

Yeah, probably the best approach is to have the whole thing in BEFORE, and just
send AFTER values for those columns that changed, and include the
'replace' bool array (probably packed as a bitmap), so that the update
can be trivially constructed at the other end just like in
heap_modify_tuple.
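
For what it's worth, the receiving side would then look roughly like this
(hand-wavy sketch; it assumes the BEFORE tuple, the changed values, and
the replace array have already been decoded from the stream):

#include "postgres.h"
#include "access/htup.h"
#include "access/tupdesc.h"

static HeapTuple
rebuild_after_image(HeapTuple before, TupleDesc desc,
                    Datum *new_values, bool *new_isnull, bool *replace)
{
    /* columns with replace[i] == false simply keep their BEFORE value */
    return heap_modify_tuple(before, desc, new_values, new_isnull, replace);
}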

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Andres Freund  writes:
> On Monday, June 25, 2012 05:57:51 PM Tom Lane wrote:
>> Well, my response is "no".  I could see saying that we require (some) C99
>> features at this point, but not features that are in no standard, no
>> matter how popular gcc might be.

> Also, 'static inline' *is* C99 conforming as far as I can see?

Hmm.  I went back and re-read the C99 spec, and it looks like most of
the headaches we had in the past with C99 inline are specific to the
case where you want an extern declaration to be available.  For a
function that exists *only* as a static it might be all right.  So maybe
I'm misremembering how well this would work.  We'd have to be sure we
don't need any extern declarations, though.

Having said that, I'm still of the opinion that it's not so hard to deal
with that we should just blow off compilers where "inline" doesn't work
well.  I have no sympathy at all for the "we'd need two copies"
argument.  First off, if the code is at any risk whatsoever of changing
intra-major-release, it is not acceptable to inline it (there would be
inline copies in third-party modules where we couldn't ensure
recompilation).  So that's going to force us to use this only in cases
where the code is small and stable enough that two copies aren't such
a big problem.  Second, it's not that hard to set things up so there's
only one source-code copy, as was noted upthread.
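
For the archives, the arrangement I have in mind is something like this
(macro and file names invented for the example):

/* foo_impl.h */
#ifndef FOO_IMPL_H
#define FOO_IMPL_H

#if defined(USE_INLINE) || defined(FOO_INCLUDE_DEFINITIONS)

#ifdef USE_INLINE
#define FOO_SCOPE static inline
#else
#define FOO_SCOPE              /* external linkage, compiled exactly once */
#endif

FOO_SCOPE int
foo_twice(int x)
{
    return 2 * x;
}

#else                          /* no usable "inline": declaration only */
extern int foo_twice(int x);
#endif

#endif   /* FOO_IMPL_H */

/* foo_impl.c: provides the single out-of-line copy when !USE_INLINE */
#define FOO_INCLUDE_DEFINITIONS
#include "foo_impl.h"

Every other file just includes foo_impl.h and either gets a static inline
definition or a plain extern declaration resolved by foo_impl.c, so the
function body exists in exactly one place in the source.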

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Phil Sorber
On Mon, Jun 25, 2012 at 3:45 PM, Florian Pflug  wrote:
> On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
>> Magnus Hagander  writes:
>>> Or that it takes less code/generates cleaner code...
>>
>> So we're talking about some LZO things such as snappy from google, and
>> that would be another run time dependency IIUC.
>>
>> I think it's time to talk about fastlz:
>>
>>  http://fastlz.org/
>>  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c
>>
>>  551 lines of C code under MIT license, works also under windows
>>
>> I guess it would be easy (and safe) enough to embed in our tree should
>> we decide going this way.
>
> Agreed. If we extend the protocol to support compression, and not rely
> on SSL, then let's pick one of these LZ77-style compressors, and let's
> integrate it into our tree.
>
> We should then also make it possible to enable compression only for
> the server -> client direction. Since those types of compressions are
> usually pretty easy to decompress, that reduces the amount to work
> non-libpq clients have to put in to take advantage of compression.

+1

>
> best regards,
> Florian Pflug
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Florian Pflug
On Jun25, 2012, at 21:21 , Dimitri Fontaine wrote:
> Magnus Hagander  writes:
>> Or that it takes less code/generates cleaner code...
> 
> So we're talking about some LZO things such as snappy from google, and
> that would be another run time dependency IIUC.
> 
> I think it's time to talk about fastlz:
> 
>  http://fastlz.org/
>  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c
> 
>  551 lines of C code under MIT license, works also under windows
> 
> I guess it would be easy (and safe) enough to embed in our tree should
> we decide going this way.

Agreed. If we extend the protocol to support compression, and not rely
on SSL, then let's pick one of these LZ77-style compressors, and let's
integrate it into our tree.

We should then also make it possible to enable compression only for
the server -> client direction. Since those types of compressions are
usually pretty easy to decompress, that reduces the amount to work
non-libpq clients have to put in to take advantage of compression.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Dimitri Fontaine
Magnus Hagander  writes:
> Or that it takes less code/generates cleaner code...

So we're talking about some LZO things such as snappy from google, and
that would be another run time dependency IIUC.

I think it's time to talk about fastlz:

  http://fastlz.org/
  http://code.google.com/p/fastlz/source/browse/trunk/fastlz.c

  551 lines of C code under MIT license, works also under windows

I guess it would be easy (and safe) enough to embed in our tree should
we decide going this way.
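
To illustrate how small the surface is, the whole API is basically two
calls (a sketch assuming the signatures as published at the URLs above;
per its docs the output buffer has to be ~5% larger than the input and
at least 66 bytes):

#include "fastlz.h"

/* Compress one protocol message; returns the compressed size. */
static int
compress_message(const void *msg, int msglen, void *out)
{
    return fastlz_compress(msg, msglen, out);
}

/* Decompress into a caller-provided buffer of size maxout; returns the
   decompressed size, or 0 if the input is corrupted. */
static int
decompress_message(const void *in, int inlen, void *out, int maxout)
{
    return fastlz_decompress(in, inlen, out, maxout);
}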

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
Hi,

On Monday, June 25, 2012 08:13:54 PM Robert Haas wrote:
> On Mon, Jun 25, 2012 at 1:50 PM, Andres Freund  
wrote:
> > Its an argument why related infrastructure would be interesting to more
> > than that patch and thats not bad.
> > If the garbage collecting is done in a very simplistic manner it doesn't
> > sound too hard... The biggest problem is probably crash-recovery of that
> > knowledge and how to hook knowledge into it that logical rep needs that
> > data...
> I suppose the main reason we haven't done it already is that it
> increases the period of time during which we're using 2X the disk
> space.
I find that an acceptable price if it's optional. Making it such doesn't seem 
to be a problem for me.

> >> > I don't think its that easy. If you e.g. have multiple ALTER's in the
> >> > same transaction interspersed with inserted rows they will all have
> >> > different TupleDesc's.
> >> 
> >> If new columns were added, then tuples created with those older
> >> tuple-descriptors can still be interpreted with the latest
> >> tuple-descriptor.
> > 
> > But you need to figure that out. If you have just the before-after images
> > of the tupledescs you don't know what happened in there... That would
> > mean either doing special things on catalog changes or reassembling the
> > meaning from the changed pg_* rows. Neither seems enticing.
> 
> I think there is absolutely nothing wrong with doing extra things in
> ALTER TABLE when logical replication is enabled.  We've got code
> that's conditional on Hot Standby being enabled in many places in the
> system; why should logical replication be any different?  If we set
> the bar for logical replication at "the system can't do anything
> differently when logical replication is enabled" then I cheerfully
> submit that we are doomed.  You've already made WAL format changes to
> support logging the pre-image of the tuple, which is a hundred times
> more likely to cause a performance problem than any monkeying around
> we might want to do in ALTER TABLE.
>
> I am deeply skeptical that we need to look inside of transactions that
> do full-table rewrites.  But even if we do, I don't see that what I'm
> proposing precludes it.  For example, I think we could have ALTER
> TABLE emit WAL records specifically for logical replication that allow
> us to disentangle which tuple descriptor to use at which point in the
> transaction.  I don't see that that would even be very difficult to
> set up.
Sorry, I was imprecise above: I have no problem doing some changes during 
ALTER TABLE if logical rep is enabled. I am worried though that to make that 
robust you would need loads of places that emit additional information:
* ALTER TABLE
* ALTER FUNCTION
* ALTER OPERATOR
* ALTER/CREATE CAST
* TRUNCATE
* CLUSTER
* ...

I have the feeling that, were we to do that, the full amount of required 
information would be rather high and would end up being essentially the catalog. 
Just having an intermediate tupledesc doesn't help that much if you have e.g. 
record_out doing type lookups of its own.

There also is the issue you have talked about before, that a user-defined type 
might depend on values in other tables. Unless we're ready to break at least 
transactional behaviour for those (for now...) I don't see how decoding outside 
of the transaction is ever going to be valid. I wouldn't have a big problem 
declaring that as broken for now...

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Alvaro Herrera

Excerpts from Robert Haas's message of lun jun 25 14:58:25 -0400 2012:
> 
> On Mon, Jun 25, 2012 at 2:49 PM, Alvaro Herrera
>  wrote:
> > Excerpts from Robert Haas's message of lun jun 25 11:57:36 -0400 2012:
> >> Really, I think
> >> pg_upgrade needs this option too, unless we're going to kill the
> >> problem at its root by providing a reliable way to enumerate database
> >> names without first knowing the name one that you can connect to.
> >
> > I think pg_upgrade could do this one task by using a standalone backend
> > instead of a full-blown postmaster.  It should be easy enough ...
> 
> Maybe, but it seems like baking even more hackery into a tool that's
> already got too much hackery.  It's also hard for pg_upgrade to know
> things like - whether pg_hba.conf prohibits access to certain
> users/databases/etc. or just requires the use of authentication
> methods that happen to fail.  From pg_upgrade's perspective, it would
> be nice to have a flag that starts the server in some mode where
> nobody but pg_upgrade can connect to it and all connections are
> automatically allowed, but it's not exactly clear how to implement
> "nobody but pg_upgrade can connect to it".

Well, have it specify a private socket directory, listen only on that
(not TCP), and bypass all pg_hba rules.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 2:49 PM, Alvaro Herrera
 wrote:
> Excerpts from Robert Haas's message of lun jun 25 11:57:36 -0400 2012:
>> Really, I think
>> pg_upgrade needs this option too, unless we're going to kill the
>> problem at its root by providing a reliable way to enumerate database
>> names without first knowing the name one that you can connect to.
>
> I think pg_upgrade could do this one task by using a standalone backend
> instead of a full-blown postmaster.  It should be easy enough ...

Maybe, but it seems like baking even more hackery into a tool that's
already got too much hackery.  It's also hard for pg_upgrade to know
things like - whether pg_hba.conf prohibits access to certain
users/databases/etc. or just requires the use of authentication
methods that happen to fail.  From pg_upgrade's perspective, it would
be nice to have a flag that starts the server in some mode where
nobody but pg_upgrade can connect to it and all connections are
automatically allowed, but it's not exactly clear how to implement
"nobody but pg_upgrade can connect to it".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Kevin Grittner
Andres Freund  wrote:
 
>> We most particularly *don't* want DDL to replicate automatically,
>> because the schema changes are deployed along with related
>> software changes, and we like to pilot any changes for at least a
>> few days.  Depending on the release, the rollout may take a
>> couple months, or we may slam in out everywhere a few days after
>> the first pilot deployment.
> That's sensible for your use-case - but I do not think that's
> the appropriate behaviour for anything which is somewhat
> out-of-the-box...
 
Right.  We currently consider the issues involved in a change and
script it by hand.  I think we want to continue to do that.  The
point was that, given the variety of timings and techniques for
distributing schema changes, maybe it was only worth doing that
automatically for the most constrained and controlled cases.
 
>> So you could certainly punt all of this for any release as far as
>> Wisconsin Courts are concerned.  We need to know table and column
>> names, before and after images, and some application-supplied
>> metadata.
> I am not sure we're going to get all that into 9.3.
 
Sure, that was more related to why I was questioning how much these
use cases even *could* integrate -- whether it even paid to
*consider* these use cases at this point.  Responses from Robert and
you have pretty much convinced me that I was being overly
pessimistic on that.
 
One fine point regarding before and after images -- if a value
doesn't change in an UPDATE, there's no reason to include it in both
the BEFORE and AFTER tuple images, as long as we have the null
column bitmaps -- or some other way of distinguishing unchanged from
NULL.  (This could be especially important when the unchanged column
was a 50 MB bytea.)  It doesn't matter to me which way this is
optimized -- in our trigger-based system we chose to keep the full
BEFORE tuple and just show AFTER values which were distinct from the
corresponding BEFORE values, but it would be trivial to adapt the
code to the other way.
 
Sorry for that bout of pessimism.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Alvaro Herrera

Excerpts from Robert Haas's message of lun jun 25 11:57:36 -0400 2012:

> Really, I think
> pg_upgrade needs this option too, unless we're going to kill the
> problem at its root by providing a reliable way to enumerate database
> names without first knowing the name one that you can connect to.

I think pg_upgrade could do this one task by using a standalone backend
instead of a full-blown postmaster.  It should be easy enough ...

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
Hi,

(munching the mail from Robert and Kevin together)

On Monday, June 25, 2012 06:42:41 PM Kevin Grittner wrote:
> Robert Haas  wrote:
> > I bet for a lot of replication systems, the answer is "do a full
> > resync".  In other words, we either forbid the operation outright
> > when the table is enabled for logical replication, or else we emit
> > an LCR that says, in effect, "transaction 12345 monkeyed with the
> > table, please resync".  It strikes me that it's really the job of
> > some higher-level control logic to decide what the "correct"
> > behavior is in these cases; the decoding process doesn't really
> > have enough information about what the user is trying to do to
> > make a sensible decision anyway.
> 
> This is clearly going to depend on the topology.  You would
> definitely want to try to replicate the DDL for the case on which
> Simon is focused (which seems to me to be essentially physical
> replication of catalogs with logical replication of data changes
> from any machine to all others).  What you do about transactions in
> flight is the hard part.  You could try to suppress concurrent DML
> of the same objects or have some complex matrix of rules for trying
> to resolve the transactions in flight.  I don't see how the latter
> could ever be 100% accurate.
Yes. That's why I dislike that proposal. I don't think that's going to be 
understandable and robust enough.

If we really look inside transactions (3b) and 1)) that shouldn't be a problem 
though. So I think it really has to be one of those.


> In our shop it is much easier.  We always have one database which is
> the only valid source for any tuple, although rows from many such
> databases can be in one table, and one row might replicate to many
> databases.  Thus, we don't want automatic replication of DDL.
> 
>  - When a column is going to be added to the source machines, we
>first add it to the targets, with either a default or as
>NULL-capable.
> 
>  - When a column is going to be deleted from the source machines, we
>make sure it is NULL-capable or has a default on the replicas.
>We drop it from all replicas after it is gone from all sources.
> 
>  - If a column is changing name or is changing to a fundamentally
>different type we need to give the new column a new name, have
>triggers to convert old to new (and vice versa) on the replicas,
>and drop the old after all sources are updated.
> 
>  - If a column is changing in a minor way, like its precision, we
>make sure the replicas can accept either format until all sources
>have been converted.  We update the replicas to match the sources
>after all sources are converted.
>
> We most particularly *don't* want DDL to replicate automatically,
> because the schema changes are deployed along with related software
> changes, and we like to pilot any changes for at least a few days.
> Depending on the release, the rollout may take a couple months, or
> we may slam in out everywhere a few days after the first pilot
> deployment.
That's sensible for your use-case - but I do not think that's the 
appropriate behaviour for anything which is somewhat out-of-the-box...

> So you could certainly punt all of this for any release as far as
> Wisconsin Courts are concerned.  We need to know table and column
> names, before and after images, and some application-supplied
> metadata.
I am not sure we're going to get all that into 9.3. More on that below.

On Monday, June 25, 2012 07:09:38 PM Robert Haas wrote:
> On Mon, Jun 25, 2012 at 12:42 PM, Kevin Grittner wrote:
> > I don't know that what we're looking for is any easier (although I
> > doubt that it's any harder), but I'm starting to wonder how much
> > mechanism they can really share.  The 2Q code is geared toward page
> > format OIDs and data values for automated DDL distribution and
> > faster replication, while we're looking for something which works
> > between releases, architectures, and OSes.  We keep coming back to
> > the idea of one mechanism because both WAL and a logical transaction
> > stream would have "after" tuples, although they need them in
> > different formats.
> > 
> > I think the need for truly logical replication is obvious, since so
> > many different people have developed trigger-based versions of that.
> > And it sure seems like 2Q has clients who are willing to pay for the
> > other.
> >
> > Perhaps the first question is: Is there enough in common between
> > logical replication (and all the topologies that might be created
> > with that) and the proposal on the table (which seems to be based
> > around one particular topology with a vague notion of bolting
> > logical replication on to it after the fact) to try to resolve the
> > differences in one feature?  Or should the "identical schema with
> > multiple identical copies" case be allowed to move forward more or
> > less in isolation, with logical replication having its own design if
> and when someone wants to take it on?

Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 1:50 PM, Andres Freund  wrote:
> Its an argument why related infrastructure would be interesting to more than
> that patch and thats not bad.
> If the garbage collecting is done in a very simplistic manner it doesn't sound
> too hard... The biggest problem is probably crash-recovery of that knowledge
> and how to hook knowledge into it that logical rep needs that data...

I suppose the main reason we haven't done it already is that it
increases the period of time during which we're using 2X the disk
space.

>> > I don't think its that easy. If you e.g. have multiple ALTER's in the
>> > same transaction interspersed with inserted rows they will all have
>> > different TupleDesc's.
>>
>> If new columns were added, then tuples created with those older
>> tuple-descriptors can still be interpreted with the latest
>> tuple-descriptor.
> But you need to figure that out. If you have just the before-after images of
> the tupledescs you don't know what happened in there... That would mean either
> doing special things on catalog changes or reassembling the meaning from the
> changed pg_* rows. Neither seems enticing.

I think there is absolutely nothing wrong with doing extra things in
ALTER TABLE when logical replication is enabled.  We've got code
that's conditional on Hot Standby being enabled in many places in the
system; why should logical replication be any different?  If we set
the bar for logical replication at "the system can't do anything
differently when logical replication is enabled" then I cheerfully
submit that we are doomed.  You've already made WAL format changes to
support logging the pre-image of the tuple, which is a hundred times
more likely to cause a performance problem than any monkeying around
we might want to do in ALTER TABLE.

>> Can we just agree to punt all this complexity for version 1 (and maybe
>> versions 2, 3, and 4)?  I'm not sure what Slony does in situations
>> like this but I bet for a lot of replication systems, the answer is
>> "do a full resync".  In other words, we either forbid the operation
>> outright when the table is enabled for logical replication, or else we
>> emit an LCR that says, in effect, "transaction 12345 monkeyed with the
>> table, please resync".  It strikes me that it's really the job of some
>> higher-level control logic to decide what the "correct" behavior is in
>> these cases; the decoding process doesn't really have enough
>> information about what the user is trying to do to make a sensible
>> decision anyway.  It would be nice to be able to support some simple
>> cases like "adding a column that has no default" or "dropping a
>> column" without punting, but going much further than that seems like
>> it will require embedding policy decisions that should really be
>> happening at a higher level.
> I am totally fine with saying that we do not support everything from the
> start. But we need to choose an architecture where it's possible to add that
> support gradually, and I don't think that's possible without looking inside
> transactions.

I am deeply skeptical that we need to look inside of transactions that
do full-table rewrites.  But even if we do, I don't see that what I'm
proposing precludes it.  For example, I think we could have ALTER
TABLE emit WAL records specifically for logical replication that allow
us to disentangle which tuple descriptor to use at which point in the
transaction.  I don't see that that would even be very difficult to
set up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao  wrote:
>> "<<" should be ">>". The attached patch fixes this typo.
>
> Oh, I forgot to attach the patch.. Here is the patch.

I committed both of the patches you posted to this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Tue, Jun 26, 2012 at 2:53 AM, Fujii Masao  wrote:
> On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
>  wrote:
>> Ok, committed all the WAL format changes now.
>
> I found the typo.
>
> In walsender.c
> -                reply.write.xlogid, reply.write.xrecoff,
> -                reply.flush.xlogid, reply.flush.xrecoff,
> -                reply.apply.xlogid, reply.apply.xrecoff);
> +                (uint32) (reply.write << 32), (uint32) reply.write,
> +                (uint32) (reply.flush << 32), (uint32) reply.flush,
> +                (uint32) (reply.apply << 32), (uint32) reply.apply);
>
> "<<" should be ">>". The attached patch fixes this typo.

Oh, I forgot to attach the patch.. Here is the patch.

Regards,

-- 
Fujii Masao


walsender_typo_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
 wrote:
> Ok, committed all the WAL format changes now.

I found the typo.

In walsender.c
-                reply.write.xlogid, reply.write.xrecoff,
-                reply.flush.xlogid, reply.flush.xrecoff,
-                reply.apply.xlogid, reply.apply.xrecoff);
+                (uint32) (reply.write << 32), (uint32) reply.write,
+                (uint32) (reply.flush << 32), (uint32) reply.flush,
+                (uint32) (reply.apply << 32), (uint32) reply.apply);

"<<" should be ">>". The attached patch fixes this typo.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 05:34:13 PM Robert Haas wrote:
> On Mon, Jun 25, 2012 at 9:43 AM, Andres Freund  
wrote:
> >> > The only theoretical way I see against that problem would be to
> >> > postpone all relation unlinks untill everything that could possibly
> >> > read them has finished. Doesn't seem to alluring although it would be
> >> > needed if we ever move more things of SnapshotNow.
> >> > 
> >> > Input/Ideas/Opinions?
> >> 
> >> Yeah, this is slightly nasty.  I'm not sure whether or not there's a
> >> way to make it work.
> > 
> > Postponing all non-rollback unlinks to the next "logical checkpoint" is
> > the only thing I can think of...
> There are a number of cool things we could do if we postponed unlinks.
>  Like, why can't we allow concurrent read-only queries while a CLUSTER
> operation is in progress?  Well, two reasons.  The first is that we
> currently can't do ANY DDL with less than a full table lock because of
> SnapshotNow-related race conditions.  The second is that people might
> still need to look at the old heap after the CLUSTER transaction
> commits.  Some kind of delayed unlink facility where we
> garbage-collect relation backing files when their refcount falls to
> zero would solve the second problem - not that that's any help by
> itself without a solution to the first one, but hey.
It's an argument why related infrastructure would be interesting to more than 
that patch, and that's not bad.
If the garbage collecting is done in a very simplistic manner it doesn't sound 
too hard... The biggest problem is probably crash-recovery of that knowledge 
and how to hook in the fact that logical rep needs that data...

> >> I had another idea.  Suppose decoding happens directly on the primary,
> >> because I'm still hoping there's a way to swing that.  Suppose further
> >> that we handle DDL by insisting that (1) any backend which wants to
> >> add columns or change the types of existing columns must first wait
> >> for logical replication to catch up and (2) if a backend which has
> >> added columns or changed the types of existing columns then writes to
> >> the modified table, decoding of those writes will be postponed until
> >> transaction commit.  I think that's enough to guarantee that the
> >> decoding process can just use the catalogs as they stand, with plain
> >> old SnapshotNow.
> > 
> > I don't think its that easy. If you e.g. have multiple ALTER's in the
> > same transaction interspersed with inserted rows they will all have
> > different TupleDesc's.
> 
> If new columns were added, then tuples created with those older
> tuple-descriptors can still be interpreted with the latest
> tuple-descriptor.
But you need to figure that out. If you have just the before-after images of 
the tupledescs you don't know what happened in there... That would mean either 
doing special things on catalog changes or reassembling the meaning from the 
changed pg_* rows. Neither seems enticing.

> Columns that are dropped or retyped are a little trickier, but
> honestly... how much do we care about those cases?  How practical is
> it to suppose we're going to be able to handle them sanely anyway?
> Suppose that the user defines a type which works just like int4 except
> that the output functions writes out each number in pig latin (and the
> input function parses pig latin).  The user defines the types as
> binary coercible to each other and then does ALTER TABLE on a large
> table with an int4 column, transforming it into an int4piglatin
> column.  Due to Noah Misch's fine work, we will conclude that no table
> rewrite is needed.  But if logical replication is in use, then in
> theory we should scan the whole table and generate an LCR for each row
> saying "the row with primary key X was updated, and column Y, which
> used to contain 42, now contains ourty-two-fay".  Otherwise, if we're
> doing heterogenous replication into a system that just stores that
> column as text, it'll end up with the wrong contents.  On the other
> hand, if we're trying to ship data to another PostgreSQL instance
> where the column hasn't yet been updated, then all of those LCRs are
> just going to error out when we try to apply them.

> A more realistic scenario where you have the same problem is with
> something like ALTER TABLE .. ADD COLUMN .. DEFAULT.   If you add a
> column with a default in a single step (as opposed to first adding the
> column and then setting its default), we rewrite the table and set
> every row to the default value.  Should that generate LCRs showing
> every row being updated to add that new value, or should we generate
> no LCRs and assume that the DBA will independently do the same
> operation on the remote side?  Either answer could be correct,
> depending on how the LCRs are being used.  If you're just rewriting
> with a constant default, then perhaps the sensible thing is to
> generate no LCRs, since it will be more efficient to mimic the
> operation on the remote side than to replay the changes row-by-row.

Re: [HACKERS] WAL format changes

2012-06-25 Thread Fujii Masao
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas
 wrote:
> Ok, committed all the WAL format changes now.

This breaks pg_resetxlog -l entirely. When I ran "pg_resetxlog -l 0x01,0x01,0x01"
on HEAD, I got the following error message, though the same command
completed successfully in 9.1.

pg_resetxlog: invalid argument for option -l
Try "pg_resetxlog --help" for more information.

I think the attached patch needs to be applied.

Regards,

-- 
Fujii Masao


resetxlog_bugfix_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Greg Jaskiewicz
Wasn't this more of an issue of decoupling compression from encryption?

 
On 25 Jun 2012, at 16:36, Euler Taveira wrote:

> On 24-06-2012 23:04, Robert Haas wrote:
>> So I think we really
>> need someone to try this both ways and compare.  Right now it seems
>> like we're mostly speculating on how well either approach would work,
>> which is not as good as having some experimental results.
>> 
> Not a problem. That's what I'm thinking too but I would like to make sure that
> others don't object to general idea. Let me give it a try in both ideas...
> 
> 
> -- 
>   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
>   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 05:57:51 PM Tom Lane wrote:
> Andres Freund  writes:
> > On Monday, June 25, 2012 05:15:43 PM Tom Lane wrote:
> >> So you propose to define any compiler that strictly implements C99 as
> >> not sensible and not one that will be able to compile Postgres?
> > 
> > I propose to treat any compiler which has no way to get to equivalent
> > behaviour as not sensible. Yes.

> Well, my response is "no".  I could see saying that we require (some) C99
> features at this point, but not features that are in no standard, no
> matter how popular gcc might be.
I fail to see how gcc is the relevant point here given that there are 
equivalent definitions available from multiple compiler vendors.

Also, 'static inline' *is* C99 conforming as far as I can see? The problem 
with it is that some compilers may warn if the function isn't used in the same 
translation unit. That doesn't make not using a function non-standard-
conforming though.

> > I don't think there really are many of those
> > around. As you pointed out there is only one compiler in the buildfarm
> > with problems
> This just means we don't have a wide enough collection of non-mainstream
> machines in the buildfarm.  Deciding to break any platform with a
> non-gcc-equivalent compiler isn't going to improve that.
No, it won't improve that. But neither will the contrary.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner
 wrote:
> On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
> build:

This appears to be because of the following hunk from commit
dfda6ebaec6763090fb78b458a979b558c50b39b:

@@ -558,10 +536,10 @@ PrintControlValues(bool guessed)
snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT,
 ControlFile.system_identifier);

-   printf(_("First log file ID after reset:%u\n"),
-  newXlogId);
-   printf(_("First log file segment after reset:   %u\n"),
-  newXlogSeg);
+   XLogFileName(fname, ControlFile.checkPointCopy.ThisTimeLineID, newXlogSe
+
+   printf(_("First log segment after reset:%s\n"),
+  fname);
printf(_("pg_control version number:%u\n"),
   ControlFile.pg_control_version);
printf(_("Catalog version number:   %u\n"),

Evidently, Heikki failed to realize that pg_upgrade gets the control
data information by parsing the output of pg_controldata.
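
For illustration only (hypothetical code, not the actual pg_upgrade
controldata.c): because pg_upgrade scans the pg_controldata text for fixed
label strings, once the labels change nothing matches, the fields are never
filled in, and the "required control information is missing" check fires.

    #include <stdio.h>

    int main(void)
    {
        /* one line of pg_controldata output after the xlog numbering change */
        const char *line =
            "First log segment after reset:        000000010000000000000002";
        unsigned int logid = 0, seg = 0;
        int found = 0;

        /* old-style labels that a 9.1-era parser would still look for */
        if (sscanf(line, "First log file ID after reset: %u", &logid) == 1)
            found++;
        if (sscanf(line, "First log file segment after reset: %u", &seg) == 1)
            found++;

        if (found < 2)
            fprintf(stderr, "Some required control information is missing\n");
        return 0;
    }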

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove sanity test in XRecOffIsValid.

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 12:35 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Remove sanity test in XRecOffIsValid.
>
>> Commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d changed the rules
>> for splitting xlog records across pages, but neglected to update this
>> test.  It's possible that there's some better action here than just
>> removing the test completely, but this at least appears to get some
>> of the things that are currently broken (like initdb on MacOS X)
>> working again.
>
> Offhand, I'm wondering why this macro doesn't include a MAXALIGN test.
> If it did, I don't think that the upper-limit test would have any
> use anymore.

Yeah, I wondered that, too, but wasn't sure enough about what the real
alignment requirements were to do it myself.
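
For what it's worth, one possible shape of the macro along those lines -- just
a sketch against the usual xlog constants, not something that has been tested
or proposed as-is:

    /*
     * Sketch only: accept any offset that is past the (short) page header
     * and MAXALIGN'd, rather than comparing against an upper per-page
     * limit.  Whether the short or long header size is the right lower
     * bound, and what the real alignment requirement is, is exactly the
     * open question above.
     */
    #define XRecOffIsValid(xlrp) \
        ((xlrp) % XLOG_BLCKSZ >= SizeOfXLogShortPHD && \
         (xlrp) == MAXALIGN(xlrp))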

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 12:42 PM, Kevin Grittner
 wrote:
> Perhaps the first question is: Is there enough in common between
> logical replication (and all the topologies that might be created
> with that) and the proposal on the table (which seems to be based
> around one particular topology with a vague notion of bolting
> logical replication on to it after the fact) to try to resolve the
> differences in one feature?  Or should the "identical schema with
> multiple identical copies" case be allowed to move forward more or
> less in isolation, with logical replication having its own design if
> and when someone wants to take it on?  Two non-compromised features
> might be cleaner -- I'm starting to feel like we're trying to design
> a toaster which can also water your garden.

I think there are a number of shared pieces.  Being able to read WAL
and do something with it is a general need that both solutions share;
I think actually that might be the piece that we should try to get
committed first.  I suspect that there are a number of applications
for just that and nothing more - for example, it might allow a contrib
module that reads WAL as it's generated and prints out a debug trace,
which I can imagine being useful.
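
To make the shape of such a consumer concrete, a purely hypothetical sketch --
the DemoWalRecord struct and callback below are invented for illustration and
correspond to no existing API -- of a "read a record, print a trace line"
module:

    #include <stdint.h>
    #include <stdio.h>

    typedef struct DemoWalRecord
    {
        uint64_t    lsn;        /* where the record starts */
        uint8_t     rmid;       /* resource manager id */
        uint32_t    tot_len;    /* total record length */
    } DemoWalRecord;

    /* called once per decoded record by whatever walks the WAL stream */
    static void
    demo_trace_record(const DemoWalRecord *rec)
    {
        fprintf(stderr, "rec at %X/%X rmid=%u len=%u\n",
                (unsigned) (rec->lsn >> 32), (unsigned) rec->lsn,
                (unsigned) rec->rmid, (unsigned) rec->tot_len);
    }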

Also, I think that even for MMR there will be a need for control
logic, resynchronization, and similar mechanisms.  I mean, suppose you
have four servers in an MMR configuration.  Now, you want to deploy a
schema change that adds a new column and which, as it so happens,
requires a table rewrite to add the default.  It is very possible that
you do NOT want that to automatically replicate around the cluster.
Instead, you likely want to redirect load to the remaining three
servers, do the change on the fourth, put it back into the ring and
take out a different one, do the change on that one, and so on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Kevin Grittner
Robert Haas  wrote:
 
> I bet for a lot of replication systems, the answer is "do a full
> resync".  In other words, we either forbid the operation outright
> when the table is enabled for logical replication, or else we emit
> an LCR that says, in effect, "transaction 12345 monkeyed with the
> table, please resync".  It strikes me that it's really the job of
> some higher-level control logic to decide what the "correct"
> behavior is in these cases; the decoding process doesn't really
> have enough information about what the user is trying to do to
> make a sensible decision anyway.
 
This is clearly going to depend on the topology.  You would
definitely want to try to replicate the DDL for the case on which
Simon is focused (which seems to me to be essentially physical
replication of catalogs with logical replication of data changes
from any machine to all others).  What you do about transactions in
flight is the hard part.  You could try to suppress concurrent DML
of the same objects or have some complex matrix of rules for trying
to resolve the transactions in flight.  I don't see how the latter
could ever be 100% accurate.
 
In our shop it is much easier.  We always have one database which is
the only valid source for any tuple, although rows from many such
databases can be in one table, and one row might replicate to many
databases.  Thus, we don't want automatic replication of DDL.
 
 - When a column is going to be added to the source machines, we
   first add it to the targets, with either a default or as
   NULL-capable.
 
 - When a column is going to be deleted from the source machines, we
   make sure it is NULL-capable or has a default on the replicas. 
   We drop it from all replicas after it is gone from all sources.
 
 - If a column is changing name or is changing to a fundamentally
   different type we need to give the new column a new name, have
   triggers to convert old to new (and vice versa) on the replicas,
   and drop the old after all sources are updated.
 
 - If a column is changing in a minor way, like its precision, we
   make sure the replicas can accept either format until all sources
   have been converted.  We update the replicas to match the sources
   after all sources are converted.
 
We most particularly *don't* want DDL to replicate automatically,
because the schema changes are deployed along with related software
changes, and we like to pilot any changes for at least a few days. 
Depending on the release, the rollout may take a couple months, or
we may slam it out everywhere a few days after the first pilot
deployment.
 
So you could certainly punt all of this for any release as far as
Wisconsin Courts are concerned.  We need to know table and column
names, before and after images, and some application-supplied
metadata.
 
I don't know that what we're looking for is any easier (although I
doubt that it's any harder), but I'm starting to wonder how much
mechanism they can really share.  The 2Q code is geared toward page
format OIDs and data values for automated DDL distribution and
faster replication, while we're looking for something which works
between releases, architectures, and OSes.  We keep coming back to
the idea of one mechanism because both WAL and a logical transaction
stream would have "after" tuples, although they need them in
different formats.
 
I think the need for truly logical replication is obvious, since so
many different people have developed trigger-based versions of that.
And it sure seems like 2Q has clients who are willing to pay for the
other.
 
Perhaps the first question is: Is there enough in common between
logical replication (and all the topologies that might be created
with that) and the proposal on the table (which seems to be based
around one particular topology with a vague notion of bolting
logical replication on to it after the fact) to try to resolve the
differences in one feature?  Or should the "identical schema with
multiple identical copies" case be allowed to move forward more or
less in isolation, with logical replication having its own design if
and when someone wants to take it on?  Two non-compromised features
might be cleaner -- I'm starting to feel like we're trying to design
a toaster which can also water your garden.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove sanity test in XRecOffIsValid.

2012-06-25 Thread Tom Lane
Robert Haas  writes:
> Remove sanity test in XRecOffIsValid.

> Commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d changed the rules
> for splitting xlog records across pages, but neglected to update this
> test.  It's possible that there's some better action here than just
> removing the test completely, but this at least appears to get some
> of the things that are currently broken (like initdb on MacOS X)
> working again.

Offhand, I'm wondering why this macro doesn't include a MAXALIGN test.
If it did, I don't think that the upper-limit test would have any
use anymore.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Ryan Kelly
On Mon, Jun 25, 2012 at 12:07:55PM -0400, Tom Lane wrote:
> "David E. Wheeler"  writes:
> > Hrm, I think that `use warnings 'FATAL';` might only work for core 
> > warnings. Which is annoying. I missed what was warning up-thread, but the 
> > most foolproof way to make all warnings fatal is the originally suggested
> 
> >   local $SIG{__WARN__} = sub { die shift };
> 
> Sigh, let's do it that way then.
> 
> > A *bit* cleaner is to use Carp::croak:
> 
> > use Carp;
> > local $SIG{__WARN__} = \&croak;
> 
> Just as soon not add a new module dependency if we don't have to.
Carp has been core since Perl 5 [1994-10-17].

> In this case, since we're not really expecting the warnings to get
> thrown, it seems like there'd be little value added by doing so.
> 
>   regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-Ryan Kelly

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Tom Lane
"David E. Wheeler"  writes:
> Hrm, I think that `use warnings 'FATAL';` might only work for core warnings. 
> Which is annoying. I missed what was warning up-thread, but the most 
> foolproof way to make all warnings fatal is the originally suggested

>   local $SIG{__WARN__} = sub { die shift };

Sigh, let's do it that way then.

> A *bit* cleaner is to use Carp::croak:

> use Carp;
> local $SIG{__WARN__} = \&croak;

Just as soon not add a new module dependency if we don't have to.
In this case, since we're not really expecting the warnings to get
thrown, it seems like there'd be little value added by doing so.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 11:50 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On MacOS X, on latest sources, initdb fails:
>
>> creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... 
>> ok
>> creating subdirectories ... ok
>> selecting default max_connections ... 100
>> selecting default shared_buffers ... 32MB
>> creating configuration files ... ok
>> creating template1 database in
>> /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
>> initializing pg_authid ... ok
>> initializing dependencies ... ok
>> creating system views ... ok
>> loading system objects' descriptions ... ok
>> creating collations ... ok
>> creating conversions ... ok
>> creating dictionaries ... FATAL:  control file contains invalid data
>> child process exited with exit code 1
>
> Same for me.  It's crashing here:
>
>    if (ControlFile->state < DB_SHUTDOWNED ||
>        ControlFile->state > DB_IN_PRODUCTION ||
>        !XRecOffIsValid(ControlFile->checkPoint))
>        ereport(FATAL,
>                (errmsg("control file contains invalid data")));
>
> state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
> ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?
>
> (I suppose the reason this is only failing on some machines is
> platform-specific variations in xlog entry size, but it's still a bit
> distressing that this got committed in such a broken state.)

I'm guessing that the problem is as follows: in the old code, the
XLogRecord header could not be split, so any offset that was closer to
the end of the page than SizeOfXLogRecord was a sure sign of trouble.
But commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d relaxed that
restriction, so now it IS legal for the checkpoint record to be where
it is.  But it seems that XRecOffIsValid() didn't get the memo.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread David E. Wheeler
On Jun 25, 2012, at 5:51 PM, Alvaro Herrera wrote:

>> However, that works only for the current lexical scope. If there are 
>> warnings in the code you are calling from the current scope, the use of 
>> `local $SIG{__WARN__}` is required.
> 
> So lets add 'FATAL' to the already existing "use warnings" lines in
> Catalog.pm and genbki.pl.
> 
> I think the other files we should add this to  are generate-errcodes.pl,
> generate-plerrorcodes.pl, generate-spiexceptions.pl, Gen_fmgrtab.pl.
> Maybe psql/create_help.pl too.
> 
> We have a bunch of files in ECPG and MSVC areas and others in src/tools;
> not sure about those.
> 
> We also have gen_qsort_tuple.pl which amusingly does not even
> use warnings.

Hrm, I think that `use warnings 'FATAL';` might only work for core warnings. 
Which is annoying. I missed what was warning up-thread, but the most foolproof 
way to make all warnings fatal is the originally suggested

  local $SIG{__WARN__} = sub { die shift };

A *bit* cleaner is to use Carp::croak:

use Carp;
local $SIG{__WARN__} = \&croak;

Or if you wanted to get a stack trace out of it, use Carp::confess:

use Carp;
local $SIG{__WARN__} = \&confess;

Exception-handling in Perl is one of the few places that annoy me regularly.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Andres Freund  writes:
> On Monday, June 25, 2012 05:15:43 PM Tom Lane wrote:
>> So you propose to define any compiler that strictly implements C99 as
>> not sensible and not one that will be able to compile Postgres?

> I propose to treat any compiler which has no way to get to equivalent 
> behaviour as not sensible. Yes.

Well, my response is "no".  I could see saying that we require (some) C99
features at this point, but not features that are in no standard, no
matter how popular gcc might be.

> I don't think there really are many of those 
> around. As you pointed out there is only one compiler in the buildfarm with 
> problems

This just means we don't have a wide enough collection of non-mainstream
machines in the buildfarm.  Deciding to break any platform with a
non-gcc-equivalent compiler isn't going to improve that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new --maintenance-db options

2012-06-25 Thread Robert Haas
On Sat, Jun 23, 2012 at 6:26 PM, Peter Eisentraut  wrote:
> About the new --maintenance-db options:
>
> Why was this option not added to createuser and dropuser?  In the
> original discussion[0] they were mentioned, but it apparently never made
> it into the code.

Oops.  That was an oversight.

> I find the name to be unfortunate.  For example, I think of running
> vacuum as "maintenance".  So running vacuumdb --maintenance-db=X would
> imply that the vacuum maintenance is done on X.  In fact, the whole
> point of this option is to find out where the maintenance is to be run,
> not to run the maintenance.  Maybe something like --initial-db would be
> better?

As Dave says, I picked this because pgAdmin has long used that terminology.

> What is the purpose of these options?  The initial discussion was
> unclear on this.  The documentation contains no explanation of why they
> should be used.  If we want to really support the case where both
> postgres and template1 are removed, an environment variable might be
> more useful than requiring this to be typed out for every command.
>
> [0]: 
> http://archives.postgresql.org/message-id/ca+tgmoacjwsis9nnqjgaaml1vg6c8b6o1ndgqnuca2gm00d...@mail.gmail.com

Well, I would be opposed to having ONLY an environment variable,
because I think that anything that can be controlled via an
environment variable should be able to be overridden on the command
line.  It might be OK to have both an environment variable AND a
command-line option, but I tend to think it's too marginal to justify
that.

In retrospect, it seems as though it might have been a good idea to
make the postgres database read-only and undroppable, so that all
client utilities could count on being able to connect to it and get a
list of databases in the cluster without the need for all this
complexity.  Or else having some other way for a client to
authenticate and list out all the available databases.  In the absence
of such a mechanism, I don't think we can turn around and say that not
having a postgres database is an unsupported configuration, and
therefore we need some way to cope with it when it happens.

I think the original report that prompted this change was a complaint
that pg_upgrade failed when the postgres database had been dropped.
Now, admittedly, pg_upgrade fails for all kinds of crazy stupid
reasons and the chances of fixing that problem completely any time in
the next 5 years do not seem good, but that's not a reason not to keep
plugging the holes we can.  Anyhow, the same commit that introduced
--maintenance-db "fixed" that problem by arranging to try both
postgres and template1 before giving up...  but having two hard-coded
database names, either of which can be dropped or renamed, seems only
marginally better than having one, hence the switch.  Really, I think
pg_upgrade needs this option too, unless we're going to kill the
problem at its root by providing a reliable way to enumerate database
names without first knowing the name of one that you can connect to.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Alvaro Herrera

Excerpts from David E. Wheeler's message of Mon Jun 25 11:23:34 -0400 2012:
> On Jun 25, 2012, at 3:35 PM, Tom Lane wrote:
> 
> > +1 for the concept of turning warnings into errors, but is that really
> > the cleanest, most idiomatic way to do so in Perl?  Sheesh.
> 
> It’s the most backward-compatible, but the most idiomatic way to do it 
> lexically is:
> 
> use warnings 'FATAL';
> 
> However, that works only for the current lexical scope. If there are warnings 
> in the code you are calling from the current scope, the use of `local 
> $SIG{__WARN__}` is required.

So let's add 'FATAL' to the already existing "use warnings" lines in
Catalog.pm and genbki.pl.

I think the other files we should add this to  are generate-errcodes.pl,
generate-plerrorcodes.pl, generate-spiexceptions.pl, Gen_fmgrtab.pl.
Maybe psql/create_help.pl too.

We have a bunch of files in ECPG and MSVC areas and others in src/tools;
not sure about those.

We also have gen_qsort_tuple.pl which amusingly does not even
use warnings.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Tom Lane
Robert Haas  writes:
> On MacOS X, on latest sources, initdb fails:

> creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 32MB
> creating configuration files ... ok
> creating template1 database in
> /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
> initializing pg_authid ... ok
> initializing dependencies ... ok
> creating system views ... ok
> loading system objects' descriptions ... ok
> creating collations ... ok
> creating conversions ... ok
> creating dictionaries ... FATAL:  control file contains invalid data
> child process exited with exit code 1

Same for me.  It's crashing here:

if (ControlFile->state < DB_SHUTDOWNED ||
ControlFile->state > DB_IN_PRODUCTION ||
!XRecOffIsValid(ControlFile->checkPoint))
ereport(FATAL,
(errmsg("control file contains invalid data")));

state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?

(I suppose the reason this is only failing on some machines is
platform-specific variations in xlog entry size, but it's still a bit
distressing that this got committed in such a broken state.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Euler Taveira
On 24-06-2012 23:04, Robert Haas wrote:
> So I think we really
> need someone to try this both ways and compare.  Right now it seems
> like we're mostly speculating on how well either approach would work,
> which is not as good as having some experimental results.
> 
Not a problem. That's what I'm thinking too, but I would like to make sure that
others don't object to the general idea. Let me give it a try both ways...


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 05:15:43 PM Tom Lane wrote:
> Andres Freund  writes:
> > On Friday, June 22, 2012 02:04:02 AM Tom Lane wrote:
> >> This is nonsense.  There are at least three buildfarm machines running
> >> compilers that do not "pretend to be gcc" (at least, configure
> >> recognizes them as not gcc) and are not MSVC either.
> > 
> > Should there be no other trick - I think there is though - we could just
> > specify -W2177 as an alternative parameter to test in the 'quiet static
> > inline' test.
> What is that, an MSVC switch?  If so it's rather irrelevant to non-MSVC
> compilers.
HP-UX/aCC, the only compiler in the buildfarm I found that seems to fall short 
in the "quiet inline" test.

MSVC seems to work fine in supported versions; USE_INLINE is defined 
these days.

> > I definitely do not want to bar any sensible compiler from compiling
> > postgres but the keyword here is 'sensible'. If it requires some modest
> > force/trickery to behave sensible, thats ok, but if we need to ship
> > around huge unreadable crufty macros just to support them I don't find
> > it ok.
> So you propose to define any compiler that strictly implements C99 as
> not sensible and not one that will be able to compile Postgres?  I do
> not think that's acceptable.  I have no problem with producing better
> code on gcc than elsewhere (as we already do), but being flat out broken
> for compilers that don't match gcc's interpretation of "inline" is not
> good enough.
I propose to treat any compiler which has no way to get to equivalent 
behaviour as not sensible. Yes. I don't think there really are many of those 
around. As you pointed out there is only one compiler in the buildfarm with 
problems and I think those can be worked around (can't test it yet though, the 
only HP-UX I could get my hands on quickly is at 11.11...).

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 9:43 AM, Andres Freund  wrote:
>> > The only theoretical way I see against that problem would be to postpone
>> > all relation unlinks untill everything that could possibly read them has
>> > finished. Doesn't seem to alluring although it would be needed if we
>> > ever move more things of SnapshotNow.
>> >
>> > Input/Ideas/Opinions?
>>
>> Yeah, this is slightly nasty.  I'm not sure whether or not there's a
>> way to make it work.
> Postponing all non-rollback unlinks to the next "logical checkpoint" is the
> only thing I can think of...

There are a number of cool things we could do if we postponed unlinks.
 Like, why can't we allow concurrent read-only queries while a CLUSTER
operation is in progress?  Well, two reasons.  The first is that we
currently can't do ANY DDL with less than a full table lock because of
SnapshotNow-related race conditions.  The second is that people might
still need to look at the old heap after the CLUSTER transaction
commits.  Some kind of delayed unlink facility where we
garbage-collect relation backing files when their refcount falls to
zero would solve the second problem - not that that's any help by
itself without a solution to the first one, but hey.

>> I had another idea.  Suppose decoding happens directly on the primary,
>> because I'm still hoping there's a way to swing that.  Suppose further
>> that we handle DDL by insisting that (1) any backend which wants to
>> add columns or change the types of existing columns must first wait
>> for logical replication to catch up and (2) if a backend which has
>> added columns or changed the types of existing columns then writes to
>> the modified table, decoding of those writes will be postponed until
>> transaction commit.  I think that's enough to guarantee that the
>> decoding process can just use the catalogs as they stand, with plain
>> old SnapshotNow.
> I don't think its that easy. If you e.g. have multiple ALTER's in the same
> transaction interspersed with inserted rows they will all have different
> TupleDesc's.

If new columns were added, then tuples created with those older
tuple-descriptors can still be interpreted with the latest
tuple-descriptor.
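
A minimal backend-side sketch of that property (assuming the standard
heap_deform_tuple() API; the wrapper function is invented for illustration and
the header location is the 9.2-era one):

    #include "postgres.h"
    #include "access/htup.h"        /* heap_deform_tuple() */

    /*
     * Deform a tuple written under an older tuple descriptor using the
     * table's current descriptor.  When the only schema change was adding
     * columns, the trailing attributes simply come back as NULLs.
     */
    static void
    decode_with_latest_descriptor(HeapTuple old_tuple, TupleDesc newdesc)
    {
        Datum  *values = palloc(newdesc->natts * sizeof(Datum));
        bool   *isnull = palloc(newdesc->natts * sizeof(bool));

        heap_deform_tuple(old_tuple, newdesc, values, isnull);
        /* attributes not physically present in old_tuple have isnull = true */

        pfree(values);
        pfree(isnull);
    }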

Columns that are dropped or retyped are a little trickier, but
honestly... how much do we care about those cases?  How practical is
it to suppose we're going to be able to handle them sanely anyway?
Suppose that the user defines a type which works just like int4 except
that the output functions writes out each number in pig latin (and the
input function parses pig latin).  The user defines the types as
binary coercible to each other and then does ALTER TABLE on a large
table with an int4 column, transforming it into an int4piglatin
column.  Due to Noah Misch's fine work, we will conclude that no table
rewrite is needed.  But if logical replication is in use, then in
theory we should scan the whole table and generate an LCR for each row
saying "the row with primary key X was updated, and column Y, which
used to contain 42, now contains ourty-two-fay".  Otherwise, if we're
doing heterogenous replication into a system that just stores that
column as text, it'll end up with the wrong contents.  On the other
hand, if we're trying to ship data to another PostgreSQL instance
where the column hasn't yet been updated, then all of those LCRs are
just going to error out when we try to apply them.

A more realistic scenario where you have the same problem is with
something like ALTER TABLE .. ADD COLUMN .. DEFAULT.   If you add a
column with a default in a single step (as opposed to first adding the
column and then setting its default), we rewrite the table and set
every row to the default value.  Should that generate LCRs showing
every row being updated to add that new value, or should we generate
no LCRs and assume that the DBA will independently do the same
operation on the remote side?  Either answer could be correct,
depending on how the LCRs are being used.  If you're just rewriting
with a constant default, then perhaps the sensible thing is to
generate no LCRs, since it will be more efficient to mimic the
operation on the remote side than to replay the changes row-by-row.
But what if the default isn't a constant, like maybe it's
nextval('new_synthetic_pkey_seq') or even something like now().  In
those cases, it seems quite likely that if you don't generate LCRs,
manual user intervention will be required to get things back on track.
 On the other hand, if you do generate LCRs, the remote side will
become horribly bloated on replay, unless the LCRs also instruct the
far side that they should be applied via a full-table rewrite.

Can we just agree to punt all this complexity for version 1 (and maybe
versions 2, 3, and 4)?  I'm not sure what Slony does in situations
like this but I bet for a lot of replication systems, the answer is
"do a full resync".  In other words, we either forbid the operation
outright when the table is enabled for logical replication, or else we emit
an LCR that says, in effect, "transaction 12345 monkeyed with the table,
please resync".

Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread David E. Wheeler
On Jun 25, 2012, at 3:35 PM, Tom Lane wrote:

> +1 for the concept of turning warnings into errors, but is that really
> the cleanest, most idiomatic way to do so in Perl?  Sheesh.

It’s the most backward-compatible, but the most idiomatic way to do it 
lexically is:

use warnings 'FATAL';

However, that works only for the current lexical scope. If there are warnings 
in the code you are calling from the current scope, the use of `local 
$SIG{__WARN__}` is required.

HTH,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Thom Brown
On 25 June 2012 13:11, Kevin Grittner  wrote:
> On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
> build:
>
> + pg_upgrade -d
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> -B
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> Performing Consistency Checks
> -
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Some required control information is missing;  cannot find:
>  first log file ID after reset
>  first log file segment after reset
>
> Cannot continue without required control information, terminating
> Failure, exiting

I get precisely the same on 64-bit Linux.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Andres Freund  writes:
> On Friday, June 22, 2012 02:04:02 AM Tom Lane wrote:
>> This is nonsense.  There are at least three buildfarm machines running
>> compilers that do not "pretend to be gcc" (at least, configure
>> recognizes them as not gcc) and are not MSVC either.

> Should there be no other trick - I think there is though - we could just 
> specify -W2177 as an alternative parameter to test in the 'quiet static 
> inline' test.

What is that, an MSVC switch?  If so it's rather irrelevant to non-MSVC
compilers.

> I definitely do not want to bar any sensible compiler from compiling postgres
> but the keyword here is 'sensible'. If it requires some modest force/trickery
> to behave sensible, thats ok, but if we need to ship around huge unreadable 
> crufty macros just to support them I don't find it ok.

So you propose to define any compiler that strictly implements C99 as
not sensible and not one that will be able to compile Postgres?  I do
not think that's acceptable.  I have no problem with producing better
code on gcc than elsewhere (as we already do), but being flat out broken
for compilers that don't match gcc's interpretation of "inline" is not
good enough.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-25 Thread Tom Lane
Peter Geoghegan  writes:
> On 22 June 2012 01:04, Tom Lane  wrote:
>> This is nonsense.  There are at least three buildfarm machines running
>> compilers that do not "pretend to be gcc" (at least, configure
>> recognizes them as not gcc) and are not MSVC either.

> So those three don't have medium to high degrees of compatibility with GCC?

Uh, they all compile C, so perforce they have reasonable degrees of
compatibility with gcc.  That doesn't mean they implement gcc's
nonstandard extensions.

>> We ought to have more IMO, because software monocultures are dangerous.
>> Of those three, two pass the "quiet inline" test and one --- the newest
>> of the three if I guess correctly --- does not.  So it is not the case that
>> !USE_INLINE is dead code, even if you adopt the position that we don't
>> care about any compiler not represented in the buildfarm.

> I note that you said that it doesn't pass the "quiet inline" test, and
> not that it doesn't support inline functions.

What's your point?  If the compiler isn't implementing inline the same
way gcc does, we can't use the same inlining arrangements.  I will be
the first to agree that C99's definition of inline sucks, but that
doesn't mean we can assume that gcc's version is implemented everywhere.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Robert Haas
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner
 wrote:
> On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
> build:
>
> + pg_upgrade -d
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> -B
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> Performing Consistency Checks
> -
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Some required control information is missing;  cannot find:
>  first log file ID after reset
>  first log file segment after reset
>
> Cannot continue without required control information, terminating
> Failure, exiting

On MacOS X, on latest sources, initdb fails:

creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in
/Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... FATAL:  control file contains invalid data
child process exited with exit code 1
initdb: data directory
"/Users/rhaas/pgsql/src/test/regress/./tmp_check/data" not removed at
user's request

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2012 at 3:46 PM, Tom Lane  wrote:
> Pavel Golub  writes:
>> I'm aware of problems caused by this hard coded column. My proposal is
>> to convert pg_tablespace to system view may be?
>
> It's not that easy to make a 100% backwards compatible view for a system
> catalog.  The main sticking point is the OID column; see recent problems
> with pg_roles' OID column for an example.  On the whole I don't think
> renaming pg_tablespace and inserting a mostly-compatible view would be
> a net advance.
>
> More generally, I don't see that this particular incompatibility in
> the system catalogs is worse than many others that we've perpetrated.

I'd say it's a lot less bad than some others. At least for this one
you can reasonably connect and figure it out. There were the changes
for database config, I think, which made at least pgadmin break even
before it managed to connect properly. (It got the connection in the
libpq sense, but not in the pgadmin sense).

Bottom line is, any admin tool will *always* have to know
about the specific versions and have code to deal with being able to
run different queries on different versions anyway. And this one is
trivial to reimplement with a different query, compared to most
others.
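
For instance, a hedged sketch (connection parameters are placeholders) of how
a 9.2-aware tool can fetch tablespace paths through pg_tablespace_location()
instead of reading the removed spclocation column:

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");
        PGresult   *res;
        int         i;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        /* 9.2+: ask the server to resolve the tablespace symlink for us */
        res = PQexec(conn,
                     "SELECT oid, spcname, pg_tablespace_location(oid) "
                     "FROM pg_catalog.pg_tablespace");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            for (i = 0; i < PQntuples(res); i++)
                printf("%s\t%s\t%s\n", PQgetvalue(res, i, 0),
                       PQgetvalue(res, i, 1), PQgetvalue(res, i, 2));
        }
        PQclear(res);
        PQfinish(conn);
        return 0;
    }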

Yes, if we had a set of those stable-system-views that people keep
asking for every now and then, this is one of the few changes that
they *would* actually help with. But there would still be changes that
even those couldn't deal with, because they simply can't know the
future...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Tom Lane
Pavel Golub  writes:
> I'm aware of problems caused by this hard coded column. My proposal is
> to convert pg_tablespace to system view may be?

It's not that easy to make a 100% backwards compatible view for a system
catalog.  The main sticking point is the OID column; see recent problems
with pg_roles' OID column for an example.  On the whole I don't think
renaming pg_tablespace and inserting a mostly-compatible view would be
a net advance.

More generally, I don't see that this particular incompatibility in
the system catalogs is worse than many others that we've perpetrated.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal

2012-06-25 Thread Andres Freund
On Monday, June 25, 2012 03:08:51 AM Robert Haas wrote:
> On Sun, Jun 24, 2012 at 5:11 PM, Andres Freund  
wrote:
> > There are some interesting problems related to locking and snapshots
> > here. Not sure if they are resolvable:
> > 
> > We need to restrict SnapshotNow to represent the view it had back when
> > the wal record we're currently decoding was created. Otherwise we would
> > possibly get wrong column types and similar. As we're working in the past,
> > locking doesn't protect us against much here. I have that (mostly, and
> > inefficiently).
> > 
> > One interesting problem are table rewrites (truncate, cluster, some ALTER
> > TABLE's) and dropping tables. Because we nudge SnapshotNow to the past
> > view it had back when the wal record was created we get the old
> > relfilenode. Which might have been dropped in part of the transaction
> > cleanup...
> > With most types thats not a problem. Even things like records and arrays
> > aren't problematic. More interesting cases include VACUUM FULL $systable
> > (e.g. pg_enum) and vacuum full'ing a table which is used in the *_out
> > function of a type (like a user level pg_enum implementation).
> > 
> > The only theoretical way I see around that problem would be to postpone
> > all relation unlinks until everything that could possibly read them has
> > finished. Doesn't seem too alluring, although it would be needed if we
> > ever move more things off SnapshotNow.
> > 
> > Input/Ideas/Opinions?
> 
> Yeah, this is slightly nasty.  I'm not sure whether or not there's a
> way to make it work.
Postponing all non-rollback unlinks to the next "logical checkpoint" is the 
only thing I can think of...

> I had another idea.  Suppose decoding happens directly on the primary,
> because I'm still hoping there's a way to swing that.  Suppose further
> that we handle DDL by insisting that (1) any backend which wants to
> add columns or change the types of existing columns must first wait
> for logical replication to catch up and (2) if a backend which has
> added columns or changed the types of existing columns then writes to
> the modified table, decoding of those writes will be postponed until
> transaction commit.  I think that's enough to guarantee that the
> decoding process can just use the catalogs as they stand, with plain
> old SnapshotNow.
I don't think it's that easy. If you e.g. have multiple ALTERs in the same
transaction interspersed with inserted rows, they will all have different
TupleDescs.
I don't see how that's resolvable without either replicating DDL to the target
system or changing what SnapshotNow does...
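
To make that concrete, consider a hypothetical table t and a transaction like
the one below; each INSERT writes WAL under a different TupleDesc, so decoding
needs the catalog state that was current at that exact point, not the one at
commit:

BEGIN;
INSERT INTO t (a) VALUES (1);              -- written with the original TupleDesc
ALTER TABLE t ADD COLUMN b integer;
INSERT INTO t (a, b) VALUES (2, 20);       -- needs the TupleDesc with column b
ALTER TABLE t ALTER COLUMN b TYPE bigint;
INSERT INTO t (a, b) VALUES (3, 30);       -- needs yet another TupleDesc
COMMIT;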

> The downside of this approach is that it makes certain kinds of DDL
> suck worse if logical replication is in use and behind.  But I don't
> necessarily see that as prohibitive because (1) logical replication
> being behind is likely to suck for a lot of other reasons too and (2)
> adding or retyping columns isn't a terribly frequent operation and
> people already expect a hit when they do it.  Also, I suspect that we
> could find ways to loosen those restrictions at least in common cases
> in some future version; meanwhile, less work now.
Agreed.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Tom Lane
Peter Eisentraut  writes:
> On sön, 2012-06-24 at 16:05 -0400, Robert Haas wrote:
>>> +local $SIG{__WARN__} = sub { die $_[0] };

>> This seems like a band-aid.

> I'd think of it as a safety net.

+1 for the concept of turning warnings into errors, but is that really
the cleanest, most idiomatic way to do so in Perl?  Sheesh.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread k...@rice.edu
On Mon, Jun 25, 2012 at 03:12:46PM +0200, Florian Pflug wrote:
> On Jun25, 2012, at 04:04 , Robert Haas wrote:
> > If, for
> > example, someone can demonstrate that an awesomebsdlz compresses 10x
> > as fast as OpenSSL...  that'd be pretty compelling.
> 
> That, actually, is demonstrably the case for at least Google's snappy.
> (and LZO, but that's not an option since its license is GPL) They state in
> their documentation that
> 
>   In our tests, Snappy usually is faster than algorithms in the same class
>   (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable
>   compression ratios.
> 
> The only widely supported compression method for SSL seems to be DEFLATE,
> which is also what gzip/zlib uses. I've benchmarked LZO against gzip/zlib
> a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by
> an order of magnitude.
> 
> The compression ratio achieved by DEFLATE/gzip/zlib is much better, though.
> The snappy documentation states
> 
>   Typical compression ratios (based on the benchmark suite) are about
>   1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
>   JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
>   in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.
> 
> Here are a few numbers for LZO vs. gzip. Snappy should be comparable to
> LZO - I tested LZO because I still had the command-line compressor lzop
> lying around on my machine, whereas I'd have needed to download and compile
> snappy first.
> 
> $ dd if=/dev/random of=data bs=1m count=128
> $ time gzip -1 < data > data.gz
> real  0m6.189s
> user  0m5.947s
> sys   0m0.224s
> $ time lzop < data > data.lzo
> real  0m2.697s
> user  0m0.295s
> sys   0m0.224s
> $ ls -lh data*
> -rw-r--r--  1 fgp  staff   128M Jun 25 14:43 data
> -rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.gz
> -rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.lzo
> 
> $ dd if=/dev/zero of=zeros bs=1m count=128
> $ time gzip -1 < zeros > zeros.gz
> real  0m1.083s
> user  0m1.019s
> sys   0m0.052s
> $ time lzop < zeros > zeros.lzo
> real  0m0.186s
> user  0m0.123s
> sys   0m0.053s
> $ ls -lh zeros*
> -rw-r--r--  1 fgp  staff   128M Jun 25 14:47 zeros
> -rw-r--r--  1 fgp  staff   572K Jun 25 14:47 zeros.gz
> -rw-r--r--  1 fgp  staff   598K Jun 25 14:47 zeros.lzo
> 
> To summarize, on my 2.66 Ghz Core2 Duo Macbook Pro, LZO compresses about
> 350MB/s if the data is purely random, and about 800MB/s if the data
> compresses extremely well. (Numbers based on user time since that indicates
> the CPU time used, and ignores the IO overhead, which is substantial)
> 
> IMHO, the only compelling argument (and a very compelling one) to use
> SSL compression was that it requires very little code on our side. We've
> since discovered that it's not actually that simple, at least if we want
> to support compression without authentication or encryption, and don't
> want to restrict ourselves to using OpenSSL forever. So unless we give
> up at least one of those requirements, the arguments for using
> SSL-compression are rather thin, I think.
> 
> best regards,
> Florian Pflug
> 
+1 for http://code.google.com/p/lz4/ support. It has a BSD license too.
Using SSL libraries gives all the complexity without any real benefit.

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-25 Thread Florian Pflug
On Jun25, 2012, at 04:04 , Robert Haas wrote:
> If, for
> example, someone can demonstrate that an awesomebsdlz compresses 10x
> as fast as OpenSSL...  that'd be pretty compelling.

That, actually, is demonstrably the case for at least Google's snappy.
(and LZO, but that's not an option since its license is GPL) They state in
their documentation that

  In our tests, Snappy usually is faster than algorithms in the same class
  (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable
  compression ratios.

The only widely supported compression method for SSL seems to be DEFLATE,
which is also what gzip/zlib uses. I've benchmarked LZO against gzip/zlib
a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by
an order of magnitude.

The compression ratio achieved by DEFLATE/gzip/zlib is much better, though.
The snappy documentation states

  Typical compression ratios (based on the benchmark suite) are about
  1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
  JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
  in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.

Here are a few numbers for LZO vs. gzip. Snappy should be comparable to
LZO - I tested LZO because I still had the command-line compressor lzop
lying around on my machine, whereas I'd have needed to download and compile
snappy first.

$ dd if=/dev/random of=data bs=1m count=128
$ time gzip -1 < data > data.gz
real0m6.189s
user0m5.947s
sys 0m0.224s
$ time lzop < data > data.lzo
real0m2.697s
user0m0.295s
sys 0m0.224s
$ ls -lh data*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:43 data
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.gz
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.lzo

$ dd if=/dev/zero of=zeros bs=1m count=128
$ time gzip -1 < zeros > zeros.gz
real0m1.083s
user0m1.019s
sys 0m0.052s
$ time lzop < zeros > zeros.lzo
real0m0.186s
user0m0.123s
sys 0m0.053s
$ ls -lh zeros*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:47 zeros
-rw-r--r--  1 fgp  staff   572K Jun 25 14:47 zeros.gz
-rw-r--r--  1 fgp  staff   598K Jun 25 14:47 zeros.lzo

To summarize, on my 2.66 Ghz Core2 Duo Macbook Pro, LZO compresses about
350MB/s if the data is purely random, and about 800MB/s if the data
compresses extremely well. (Numbers based on user time since that indicates
the CPU time used, and ignores the IO overhead, which is substantial)

IMHO, the only compelling argument (and a very compelling one) to use
SSL compression was that it requires very little code on our side. We've
since discovered that it's not actually that simple, at least if we want
to support compression without authentication or encryption, and don't
want to restrict ourselves to using OpenSSL forever. So unless we give
up at least one of those requirements, the arguments for using
SSL-compression are rather thin, I think.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-06-25 Thread Kohei KaiGai
Fujita-san,

The revised patch is almost good for me.
The only point I noticed is the check for redundant or duplicated options
that I pointed out in my previous post.
So, if you agree with the attached patch, I'd like to hand it over to the
committers.

All I changed is:
 --- a/src/backend/commands/copy.c
 +++ b/src/backend/commands/copy.c
 @@ -122,6 +122,11 @@ typedef struct CopyStateData
@@ -217,7 +221,7 @@ index 98bcb2f..0143d81 100644
}
 +  else if (strcmp(defel->defname, "convert_binary") == 0)
 +  {
-+  if (cstate->convert_binary)
++  if (cstate->convert_selectively)
 +  ereport(ERROR,
 +  (errcode(ERRCODE_SYNTAX_ERROR),
 +   errmsg("conflicting or redundant options")));

Thanks,

2012/6/20 Etsuro Fujita :
> Hi KaiGai-san,
>
> Thank you for the review.
>
>> -Original Message-
>> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kohei KaiGai
>> Sent: Wednesday, June 20, 2012 1:26 AM
>> To: Etsuro Fujita
>> Cc: Robert Haas; pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
>> foreign tables
>>
>> Hi Fujita-san,
>>
>> Could you rebase this patch towards the latest tree?
>> It was unavailable to apply the patch cleanly.
>
> Sorry, I updated the patch.  Please find attached an updated version of the
> patch.
>
>> I looked over the patch, then noticed a few points.
>>
>> At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
>> If so, cstate->convert_binary is not a suitable flag to check redundant
>> option. It seems to me cstate->convert_selectively is more suitable flag
>> to check it.
>>
>> +       else if (strcmp(defel->defname, "convert_binary") == 0)
>> +       {
>> +           if (cstate->convert_binary)
>> +               ereport(ERROR,
>> +                       (errcode(ERRCODE_SYNTAX_ERROR),
>> +                        errmsg("conflicting or redundant options")));
>> +           cstate->convert_selectively = true;
>> +           if (defel->arg == NULL || IsA(defel->arg, List))
>> +               cstate->convert_binary = (List *) defel->arg;
>> +           else
>> +               ereport(ERROR,
>> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                        errmsg("argument to option \"%s\" must be a
>> list of column names",
>> +                               defel->defname)));
>> +       }
>
> Yes, defel->arg can be NIL.  defel->arg is a List structure listing all the
> columns needed to be converted to binary representation, which is NIL for
> the case where no columns are needed to be converted.  For example,
> defel->arg is NIL for SELECT COUNT(*).  In this case, while
> cstate->convert_selectively is set to true, no columns are converted at
> NextCopyFrom().  Most efficient case!  In short, cstate->convert_selectively
> represents whether to do selective binary conversion at NextCopyFrom(), and
> cstate->convert_binary represents all the columns to be converted at
> NextCopyFrom(), that can be NIL.
>
>> At NextCopyFrom(), this routine computes default values if configured.
>> In case when these values are not referenced, it might be possible to skip
>> unnecessary calculations.
>> Is it unavailable to add logic to avoid to construct cstate->defmap on
>> unreferenced columns at  BeginCopyFrom()?
>
> I think that we don't need to add the above logic because file_fdw does
> BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom()
> doesn't construct cstate->defmap at all.
>
> I fixed a bug plus some minor optimization in check_binary_conversion() that
> is renamed to check_selective_binary_conversion() in the updated version,
> and now file_fdw gives up selective binary conversion for the following
> cases:
>
>  a) BINARY format
>  b) CSV/TEXT format and whole row reference
>  c) CSV/TEXT format and all the user attributes needed
>
>
> Best regards,
> Etsuro Fujita
>
>> Thanks,
>>
>> 2012/5/11 Etsuro Fujita :
>> >> -Original Message-
>> >> From: Robert Haas [mailto:robertmh...@gmail.com]
>> >> Sent: Friday, May 11, 2012 1:36 AM
>> >> To: Etsuro Fujita
>> >> Cc: pgsql-hackers@postgresql.org
>> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
>> >> file foreign tables
>> >>
>> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
>> > 
>> >> wrote:
>> >> > I would like to propose to improve parsing efficiency of
>> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et
>> >> > al.[1], which means that for a CSV/TEXT file foreign table,
>> >> > file_fdw performs binary conversion only for the columns needed for
>> >> > query processing.  Attached is a WIP patch implementing the feature.
>> >>
>> >> Can you add this to the next CommitFest?  Looks interesting.
>> >
>> > Done.
>> >
>> > Best regards,
>> > Etsuro Fujita
>> >
>> >> --
>> >> Robert Haas
>> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise Pos

Re: [HACKERS] libpq compression

2012-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2012 at 4:04 AM, Robert Haas  wrote:
> On Fri, Jun 22, 2012 at 12:38 PM, Euler Taveira  wrote:
>> On 20-06-2012 17:40, Marko Kreen wrote:
>>> On Wed, Jun 20, 2012 at 10:05 PM, Florian Pflug  wrote:
 I'm starting to think that relying on SSL/TLS for compression of
 unencrypted connections might not be such a good idea after all. We'd
 be using the protocol in a way it quite clearly never was intended to
 be used...
>>>
>>> Maybe, but what is the argument that we should avoid
>>> on encryption+compression at the same time?
>>>
>>> AES is quite lightweight compared to compression, so should
>>> be no problem in situations where you care about compression.
>>>
>> If we could solve compression problem without AES that will turn things
>> easier. Compression-only via encryption is a weird manner to solve the 
>> problem
>> in the user's POV.
>>
>>> RSA is noticeable, but only for short connections.
>>> Thus easily solvable with connection pooling.
>>>
>> RSA overhead is not the main problem. SSL/TLS setup is.
>>
>>> And for really special compression needs you can always
>>> create a UDF that does custom compression for you.
>>>
>> You have to own the code to modify it; it is not always an option.
>>
>>> So what exactly is the situation we need to solve
>>> with postgres-specific protocol compression?
>>>
>> Compression only support. Why do I need to set up SSL/TLS just for 
>> compression?
>>
>> IMHO SSL/TLS use is no different from relying in another library to handle
>> compression for the protocol and more it is compression-specific. That way, 
>> we
>> could implement another algorithms in such library without needing to modify
>> libpq code. Using SSL/TLS you are bounded by what SSL/TLS software products
>> decide to use as compression algorithms. I'll be happy to maintain the code
>> iif it is postgres-specific or even as close as possible to core.
>
> I guess my feeling on this is that, so far as I can see, supporting
> compression via OpenSSL involves work and trade-offs, and supporting
> it without depending on OpenSSL also involves work, and trade-offs.

Nice summary :)

> So it's not real evident to me that we should prefer one to the other
> on general principle.  It seems to me that a lot might come down to
> performance.  If someone can demonstrate that using an external
> library involves gets significantly better compression, chews up
> significantly less CPU time, and/or is significantly less code than
> supporting this via OpenSSL, then maybe we ought to consider it.

I think we should, yes. But as you say, we need to know first. It's
also a question of whether one of these compression schemes is trivial
enough that we could embed the code rather than rely on it externally
- I have no idea if that's even remotely possible, but that would move
the goalposts a bit too.

> OpenSSL is kind of an ugly piece of code, and all things being equal
> depending on it more heavily would not be my first choice.

Indeed.

But we should really stop saying "rely on openssl" and start saying
"rely on the ssl library". There are other SSL libraries which are not
quite so ugly, which we should eventually support.

That said, it's *still* a bit ugly to rely on the SSL library for this, IMO.


> On the other hand, this does not seem to me to be a situation where we
> should accept a patch to use an external library just because someone
> takes the time to write one, because there is also a non-trivial cost
> for the entire project to depending on more things; or if the
> compression code gets put into the backend, then there's a cost to us
> to maintain that code inside our source base.  So I think we really
> need someone to try this both ways and compare.  Right now it seems
> like we're mostly speculating on how well either approach would work,
> which is not as good as having some experimental results.  If, for
> example, someone can demonstrate that an awesomebsdlz compresses 10x
> as fast as OpenSSL...  that'd be pretty compelling.

Or that it takes less code/generates cleaner code...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade broken by xlog numbering

2012-06-25 Thread Kevin Grittner
On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
build:

+ pg_upgrade -d
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
-B
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
Performing Consistency Checks
-
Checking current, bin, and data directories ok
Checking cluster versions   ok
Some required control information is missing;  cannot find:
  first log file ID after reset
  first log file segment after reset

Cannot continue without required control information, terminating
Failure, exiting



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts

2012-06-25 Thread Ryan Kelly
Shigeru:

Thank you very much for your review. Comments are inline below, and a
new patch is attached.

On Fri, Jun 22, 2012 at 10:06:53AM +0900, Shigeru HANADA wrote:
> (2012/05/01 0:30), Ryan Kelly wrote:
> >On Mon, Apr 30, 2012 at 09:02:33AM -0400, Alvaro Herrera wrote:
> >>Well, do *you* want it?
> >Of course. That way I can stop patching my psql and go back to using the
> >one that came with my release :)
> 
> Here is result of my review of v4 patch.
> 
> Submission
> --
> - The patch is in context diff format
> - It needs rebase for HEAD, but patch command takes care automatically.
> - Make finishes cleanly, and all regression tests pass
> 
> Functionality
> -
> After applying this patch, I could cancel connection attempt at
> start-up by pressing Ctrl+C on terminal just after invoking psql
> command with an unused IP address.  Without this patch, such attempt
> ends up with error such as "No route to host" after uninterruptable
> few seconds (maybe the duration until error would depend on
> environment).
> 
> Connection attempt by \connect command could be also canceled by
> pressing Ctrl+C on psql prompt.
> 
> In addition, I tried setting PGCONNECT_TIMEOUT to 0 (infinite), but
> psql gave up after few seconds, for both start-up and re-connect.
> Is this intentional behavior?
A timeout of 0 (infinite) means to keep trying until we succeed or fail,
not keep trying forever. As you mentioned above, your connection
attempts error out after a few seconds. This is what is happening. In my
environment no such error occurs and as a result psql continues to
attempt to connect for as long as I let it.

> Detail of changes
> -
> Basically this patch consists of three changes:
> 
> 1) defer setup of cancel handler until start-up connection has established
> 2) new libpq API PQconnectTimeout which returns connect_timeout
> value of current session
> 3) use asynchronous connection API at \connect psql command, this
> requires API added by 2)
> 
> These seem reasonable to achieve canceling connection attempt at
> start-up and \connect, but, as Ryan says, codes added to do_connect
> might need more simplification.
> 
> I have some random comments.
> 
> - Checking status by calling PQstatus just after
> PQconnectStartParams is necessary.
Yes, I agree.

> - Copying only "select()" part of pqSocketPoll seems not enough,
> should we use poll(2) if it is supported?
I did not think the additional complexity was worth it in this case.
Unless you see some reason to use poll(2) that I do not.

> - Don't we need to clear error message stored in PGconn after
> PQconnectPoll returns OK status, like connectDBComplete?
I do not believe there is a client interface for clearing the error
message. Additionally, the documentation states that PQerrorMessage
"Returns the error message most recently generated by an operation on
the connection." This seems to indicate that the error message should be
cleared as this behavior is part of the contract of PQerrorMessage.

> 
> Regards,
> -- 
> Shigeru HANADA

-Ryan Kelly
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 5c5dd68..79f4bc0 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1496,6 +1496,24 @@ char *PQoptions(const PGconn *conn);
   
  
 
+
+
+ 
+  PQconnectTimeout
+  
+   PQconnectTimeout
+  
+ 
+
+ 
+  
+   Returns the connect_timeout property as given to libpq.
+
+char *PQconnectTimeout(const PGconn *conn);
+
+  
+ 
+

   
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 0926786..23b0106 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1506,7 +1506,7 @@ static bool
 do_connect(char *dbname, char *user, char *host, char *port)
 {
 	PGconn	   *o_conn = pset.db,
-			   *n_conn;
+			   *n_conn = NULL;
 	char	   *password = NULL;
 
 	if (!dbname)
@@ -1539,7 +1539,7 @@ do_connect(char *dbname, char *user, char *host, char *port)
 
 	while (true)
 	{
-#define PARAMS_ARRAY_SIZE	8
+#define PARAMS_ARRAY_SIZE	9
 		const char **keywords = pg_malloc(PARAMS_ARRAY_SIZE * sizeof(*keywords));
 		const char **values = pg_malloc(PARAMS_ARRAY_SIZE * sizeof(*values));
 
@@ -1557,30 +1557,140 @@ do_connect(char *dbname, char *user, char *host, char *port)
 		values[5] = pset.progname;
 		keywords[6] = "client_encoding";
 		values[6] = (pset.notty || getenv("PGCLIENTENCODING")) ? NULL : "auto";
-		keywords[7] = NULL;
-		values[7] = NULL;
+		keywords[7] = "connect_timeout";
+		values[7] = PQconnectTimeout(o_conn);
+		keywords[8] = NULL;
+		values[8] = NULL;
 
-		n_conn = PQconnectdbParams(keywords, values, true);
-
-		free(keywords);
-		free(values);
-
-		/* We can immediately discard the password -- no longer needed */
-		if (password)
-			free(password);
-
-		if (PQstatus(n_conn) == CONNECTION_OK)
-			break;
+		/* attempt connection asynchronously */
+		n_conn = PQconnectStartParams(keywords, values, true);
 
 		/*
-	

Re: [HACKERS] compiler warnings on mingw

2012-06-25 Thread Magnus Hagander
On Mon, Jun 25, 2012 at 11:42 AM, Peter Eisentraut  wrote:
> I've tried to cross-compile PostgreSQL from Linux to Windows, following
> the ideas of Andrew Dunstan [0].  This works quite well.  I see two
> compiler warnings altogether, which might be worth getting rid of:
>
> #1
>
> mingwcompat.c:60:1: warning: ‘RegisterWaitForSingleObject’ redeclared without 
> dllimport attribute: previous dllimport ignored [-Wattributes]
>
> This can apparently go away with this:
>
> diff --git a/src/backend/port/win32/mingwcompat.c 
> b/src/backend/port/win32/mingwcompat.c
> index 0978e8c..b1a5ca5 100644
> --- a/src/backend/port/win32/mingwcompat.c
> +++ b/src/backend/port/win32/mingwcompat.c
> @@ -56,6 +56,7 @@
>            (PHANDLE, HANDLE, WAITORTIMERCALLBACK, PVOID, ULONG, ULONG);
>  static __RegisterWaitForSingleObject _RegisterWaitForSingleObject = NULL;
>
> +__attribute__((dllimport))
>  BOOL       WINAPI
>  RegisterWaitForSingleObject(PHANDLE phNewWaitObject,
>                            HANDLE hObject,

Seems like a proper fix to me - but could be verified by checking what
the actual mingw header looks like. Or maybe that's what you did
already..


> Oddly, the mingw buildfarm member[1] complains about
>
> mingwcompat.c:66: warning: no previous prototype for 
> 'RegisterWaitForSingleObject'

I think that one is just laziness - in the case when we're injecting
that API function into mingw we should declare it in our own headers.
It was likely just left out because the proper API headers already
carry it, it was just missing in mingw.. So it should be added to the
port header, under an #ifdef.


> instead.  So there might be some divergent header files around.
>
> Anyone know details about this?

Perhaps mingw has added it to their api *properly* this time, and the
whole function should go away from mingwcompat.c? In that case it'd
obviously require a configure test, since it doesn't exist in previous
releases.


> #2
>
> pg_stat_statements.c: In function ‘pgss_ProcessUtility’:
> pg_stat_statements.c:840:4: warning: unknown conversion type character ‘l’ in 
> format [-Wformat]
> pg_stat_statements.c:840:4: warning: too many arguments for format 
> [-Wformat-extra-args]
>
> We use a replacement snprintf and set the int64 format to %lld and %llu
> based on that.  But pg_stat_statements.c uses sscanf, for which we have
> no replacement.  The configure check comments
>
> # MinGW uses '%I64d', though gcc throws an warning with -Wall,
> # while '%lld' doesn't generate a warning, but doesn't work.
>
> So assuming that sscanf in the mingw C library works consistently with
> snprintf, that might mean that pg_stat_statements is broken on that
> platform.  (The claim that %lld doesn't generate a warning is also
> questionable here.)

Can't comment on that part without more investigation.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] compiler warnings on mingw

2012-06-25 Thread Peter Eisentraut
I've tried to cross-compile PostgreSQL from Linux to Windows, following
the ideas of Andrew Dunstan [0].  This works quite well.  I see two
compiler warnings altogether, which might be worth getting rid of:

#1

mingwcompat.c:60:1: warning: ‘RegisterWaitForSingleObject’ redeclared without 
dllimport attribute: previous dllimport ignored [-Wattributes]

This can apparently go away with this:

diff --git a/src/backend/port/win32/mingwcompat.c 
b/src/backend/port/win32/mingwcompat.c
index 0978e8c..b1a5ca5 100644
--- a/src/backend/port/win32/mingwcompat.c
+++ b/src/backend/port/win32/mingwcompat.c
@@ -56,6 +56,7 @@
(PHANDLE, HANDLE, WAITORTIMERCALLBACK, PVOID, ULONG, ULONG);
 static __RegisterWaitForSingleObject _RegisterWaitForSingleObject = NULL;
 
+__attribute__((dllimport))
 BOOL   WINAPI
 RegisterWaitForSingleObject(PHANDLE phNewWaitObject,
HANDLE hObject,

Oddly, the mingw buildfarm member[1] complains about

mingwcompat.c:66: warning: no previous prototype for 
'RegisterWaitForSingleObject'

instead.  So there might be some divergent header files around.

Anyone know details about this?

#2

pg_stat_statements.c: In function ‘pgss_ProcessUtility’:
pg_stat_statements.c:840:4: warning: unknown conversion type character ‘l’ in 
format [-Wformat]
pg_stat_statements.c:840:4: warning: too many arguments for format 
[-Wformat-extra-args]

We use a replacement snprintf and set the int64 format to %lld and %llu
based on that.  But pg_stat_statements.c uses sscanf, for which we have
no replacement.  The configure check comments

# MinGW uses '%I64d', though gcc throws an warning with -Wall,
# while '%lld' doesn't generate a warning, but doesn't work.

So assuming that sscanf in the mingw C library works consistently with
snprintf, that might mean that pg_stat_statements is broken on that
platform.  (The claim that %lld doesn't generate a warning is also
questionable here.)


[0]: 
http://people.planetpostgresql.org/andrew/index.php?/archives/264-Cross-compiling-PostgreSQL-for-WIndows.html
[1]: 
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=narwhal&dt=2012-06-22%2004%3A00%3A05&stg=make

PS: Instructions for Debian:

apt-get install gcc-mingw-w64

./configure --build=$(config/config.guess) --host=i686-w64-mingw32 
--without-zlib --without-readline ZIC=/usr/sbin/zic
make world



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-06-25 Thread Vlad Arkhipov

On 05/31/2012 11:52 AM, Jeff Davis wrote:

> On Wed, 2012-05-16 at 23:14 +0200, Miroslav Šimulčík wrote:
>> Hi all,
>>
>> as a part of my master's thesis I have created temporal support patch
>> for PostgreSQL. It enables the creation of special temporal tables
>> with entries versioning. Modifying operations (UPDATE, DELETE,
>> TRUNCATE) on these tables don't cause permanent changes to entries,
>> but create new versions of them. Thus user can easily get to the past
>> states of the table.


> I would be very interested to see this, thank you for working on it.
>
> There are quite a few aspects to a temporal database system, and you are
> working on a system-maintained transaction-time historical table, right?
> Or are there other aspects to your proposal?
>
> Some general comments:
>
> * I'd very much like to see you make use of Range Types from 9.2; in
> particular, TSTZRANGE would be much better than holding two timestamps.
> If a standard requires you to display two timestamps in certain
> situations, perhaps you could use ranges internally and display the
> boundaries as timestamps when needed.

It's not sufficient to store only a period of validity for a row. If two
transactions started at the same time change the same record, you have a
problem with the TSTZRANGE type because it's normalized to an empty interval.
The other issue is how to handle multiple changes of the same record
within the transaction. Should they be stored or not?
Also it's necessary to store some kind of operation type that was
applied to the record (insert/update/delete). For example, there is a
table with one record with validity period [0, ) and value 'A'.


First way
1. Delete this record at time 1; now there is [0, 1), A in the history
table.
2. Insert a new record at time 1; now there is [0, 1), A in the history
table and a [1, ), B record in the current data table.

Second way
1. Update this record at time 1; now there is [0, 1), A in the history
table and a [1, ), B record in the current data table.

So you end up with the same data in the tables, but the actions that led to
this configuration were different and part of the history has been lost.
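
A minimal sketch (hypothetical names) of a history row that keeps the two
cases distinguishable by recording the operation alongside the validity
period:

CREATE TABLE example_history (
    id         integer,
    value      text,
    validity   tstzrange,  -- period during which this version was current
    operation  char(1)     -- 'I', 'U' or 'D': how this version was superseded
);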



> * There is other useful information that could be recorded, such as the
> user who inserted/updated/deleted the record.

I'm not sure that the database user is the proper thing to be stored in
the history table. Many applications usually connect to a database using
some virtual user and have their own users/roles tables to handle
privileges. There should be some way to substitute the stored user in
the history table with the application's one. It's also helpful to store
the transaction id that inserted/updated/deleted the record.



> * For some purposes, it's very useful to keep track of the columns that
> changed. For instance, a query like "show me any time a salary was
> changed over the last month" (or some other rare event) would be very
> slow to run if there was not some explicit annotation on the historical
> records (e.g. a "columns changed" bitmap or something).

It's a great proposal, but it seems impossible to implement with a
triggers-only solution, doesn't it? Is there any kind of hook on ALTER
TABLE ... in PostgreSQL to update changed-columns bitmaps when the table
structure changes?
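
For what it's worth, a sketch of the query described above, assuming (purely
hypothetically) that each history row carried a text[] column listing the
changed column names and a tstzrange validity column:

SELECT *
  FROM employee_history
 WHERE 'salary' = ANY (changed_columns)          -- changed_columns: text[]
   AND validity && tstzrange(now() - interval '1 month', now());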

> * In general, I'm not fond of adorning queries with TRANSACTION TIME AS
> OF... kinds of things. Those constructs are redundant with a WHERE
> clause (on a range type, you'd use the "contains" operator). If a
> standard requires that, maybe it would be OK to allow such things as
> syntactic sugar.

In SQL:2011 there is only one table with all the data, historical and
current. So it's not very convenient to specify a WHERE condition on
system time everywhere and for all tables in the query. By default only
the current data is selected with a query like SELECT * FROM table.
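
A rough sketch of the difference, with hypothetical table and column names
(the first form is the SQL:2011 spelling, which PostgreSQL does not
implement; the second is what one would write today with an explicit range
column):

SELECT * FROM employees
  FOR SYSTEM_TIME AS OF TIMESTAMP '2012-01-01 00:00:00+00';

SELECT * FROM employees_history
 WHERE sys_period @> TIMESTAMP WITH TIME ZONE '2012-01-01 00:00:00+00';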

> * As Jim mentioned, it might make sense to use something resembling
> inheritance so that selecting from the historical table includes the
> current data (but with no upper bound for the range).

We have had good experience with inheritance in our trigger-based
solution. It's completely transparent to the existing applications and
does not have any impact on performance.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Pavel Golub
Hello, Guillaume.

You wrote:

GL> Hi Pavel,

GL> On Mon, 2012-06-25 at 08:26 +0300, Pavel Golub wrote:
>> Hello, Pgsql-bugs.
>> 
>> According to the "Moving tablespaces" thread started by Bruce
>> http://archives.postgresql.org/pgsql-docs/2011-12/msg3.php
>> pg_tablespace.spclocation column is removed in the 9.2beta. However
>> this breaks backward compatibility for a bunch of products, e.g.
>> pgAdmin, phpPgAdmin, PgMDD etc.
>> 
>> I'm not sure this is the best choice. Because each application with
>> tablespace support will need additional check now to determine what
>> way to use for obtaining tablespace location:
>> pg_get_tablespace_location(oid) or tablespace.spclocation
>> 
>> I'm aware of problems caused by this hard coded column. My proposal is
>> to convert pg_tablespace to system view may be?
>> 

GL> I don't see why it causes you so much trouble.

Not so much. However.

GL> You should already have
GL> many locations in your code where you need to check the version to be
GL> compatible with the latest major releases.

This is wholly true.

GL> I know pgAdmin does. So I
GL> guess that one more is not a big deal.

GL> And this change in PostgreSQL helps a lot DBAs who want to move
GL> tablespaces (not really common work AFAIK, I agree).

I know. I just followed the advice of Josh Berkus and added this as a
bug.

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_tablespace.spclocation column removed in 9.2

2012-06-25 Thread Guillaume Lelarge
Hi Pavel,

On Mon, 2012-06-25 at 08:26 +0300, Pavel Golub wrote:
> Hello, Pgsql-bugs.
> 
> According to the "Moving tablespaces" thread started by Bruce
> http://archives.postgresql.org/pgsql-docs/2011-12/msg3.php
> pg_tablespace.spclocation column is removed in the 9.2beta. However
> this breaks backward compatibility for a bunch of products, e.g.
> pgAdmin, phpPgAdmin, PgMDD etc.
> 
> I'm not sure this is the best choice. Because each application with
> tablespace support will need additional check now to determine what
> way to use for obtaining tablespace location:
> pg_get_tablespace_location(oid) or tablespace.spclocation
> 
> I'm aware of problems caused by this hard coded column. My proposal is
> to convert pg_tablespace to system view may be?
> 

I don't see why it causes you so much trouble. You should already have
many locations in your code where you need to check the version to be
compatible with the latest major releases. I know pgAdmin does. So I
guess that one more is not a big deal.

And this change in PostgreSQL helps a lot DBAs who want to move
tablespaces (not really common work AFAIK, I agree).


-- 
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] warning handling in Perl scripts

2012-06-25 Thread Peter Eisentraut
On sön, 2012-06-24 at 16:05 -0400, Robert Haas wrote:
> > diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
> > index ebc4825..7d66da9 100644
> > --- a/src/backend/catalog/genbki.pl
> > +++ b/src/backend/catalog/genbki.pl
> > @@ -19,6 +19,8 @@
> >  use strict;
> >  use warnings;
> >
> > +local $SIG{__WARN__} = sub { die $_[0] };
> > +
> >  my @input_files;
> >  our @include_path;
> >  my $output_path = '';
> >
> > would address that.
> >
> > Could that cause any other problems?  Should it be added to all Perl
> > scripts?
> 
> This seems like a band-aid.

I'd think of it as a safety net.

> How about if we instead add whatever error-handling the script is
> missing, so that it produces an appropriate, human-readable error
> message?

That could also be worthwhile, but I think given the audience of that
script and the complexity of the possible failures, it could be a very
large and aimless project.  But there should still be a catch-all for
uncaught failures.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers