Re: [HACKERS] bg worker: general purpose requirements

2010-09-17 Thread tomas

On Fri, Sep 17, 2010 at 11:21:13PM -0400, Robert Haas wrote:

[...]

> Wow, 100 processes??! Really?  I guess I don't actually know how large
> modern proctables are, but on my MacOS X machine, for example, there
> are only 75 processes showing up right now in "ps auxww".  My Fedora
> 12 machine has 97.  That's including a PostgreSQL instance in the
> first case and an Apache instance in the second case.  So 100 workers
> seems like a ton to me.

As an equally unscientific data point: on my box, a typical desktop box
(actually a netbook -- slow CPU, but beefed up to 2GB RAM), I have 5
PostgreSQL processes running, which take about 1.2 MB (resident) -- not
each one, but together! By contrast, there is *one* mysql daemon (don't
ask!) taking 17 MB. The worst offenders are, by far, the eye-candy
thingies, as one has come to expect :-(

What I wanted to say is that the PostgreSQL processes are unusually
light-weight by modern standards.

Regards
-- tomás



Re: [HACKERS] bg worker: general purpose requirements

2010-09-17 Thread Tom Lane
Robert Haas  writes:
> Wow, 100 processes??! Really?  I guess I don't actually know how large
> modern proctables are, but on my MacOS X machine, for example, there
> are only 75 processes showing up right now in "ps auxww".  My Fedora
> 12 machine has 97.  That's including a PostgreSQL instance in the
> first case and an Apache instance in the second case.  So 100 workers
> seems like a ton to me.

The part of that that would worry me is open files.  PG backends don't
have any compunction about holding open hundreds of files.  Apiece.
You can dial that down but it'll cost you performance-wise.  Last
I checked, most Unix kernels still had limited-size FD arrays.
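
To put rough numbers on that concern, here is a minimal standalone C
sketch (not PostgreSQL code; the backend and per-backend FD counts are
invented for illustration, and max_files_per_process is the setting
behind "dial that down"):

    /*
     * Minimal sketch, not PostgreSQL code: compare the kernel's
     * per-process open-file limit with an invented estimate of what
     * many backends, each holding a few hundred FDs, would demand.
     */
    #include <stdio.h>
    #include <sys/resource.h>

    int
    main(void)
    {
        struct rlimit rl;
        long    backends = 100;     /* hypothetical idle-backend count */
        long    fds_each = 300;     /* hypothetical FDs held per backend */

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        {
            perror("getrlimit");
            return 1;
        }

        printf("per-process FD soft limit:  %ld\n", (long) rl.rlim_cur);
        printf("estimated total open files: %ld\n", backends * fds_each);
        return 0;
    }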

And as you say, ProcArray manipulations aren't going to be terribly
happy about large numbers of idle backends, either.

regards, tom lane



Re: [HACKERS] bg worker: general purpose requirements

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 4:49 PM, Markus Wanner  wrote:
>> If you're optimizing for
>> ability to respond quickly to a sudden load, keeping idle backends
>> will probably win even when the number of them you're keeping around
>> is fairly high.  If you're optimizing for minimal overall resource
>> consumption, though, you'll not be as happy about that.
>
> What resources are we talking about here? Are idle backends really that
> resource hungry? My feeling so far has been that idle processes are
> relatively cheap (i.e. some 100 idle processes shouldn't hurt on a modern
> server).

Wow, 100 processes??! Really?  I guess I don't actually know how large
modern proctables are, but on my MacOS X machine, for example, there
are only 75 processes showing up right now in "ps auxww".  My Fedora
12 machine has 97.  That's including a PostgreSQL instance in the
first case and an Apache instance in the second case.  So 100 workers
seems like a ton to me.

>> What I'm
>> struggling to understand is this: if there aren't any preforked
>> workers around when the load hits, how much does it slow things down?
>
> As the startup code is pretty much the same as for the current avlauncher,
> the coordinator can only request one bgworker at a time.
>
> This means the signal needs to reach the postmaster, which then forks a
> bgworker process. That new process starts up, connects to the requested
> database and then sends an imessage to the coordinator to register. Only
> after having received that registration, the coordinator can request another
> bgworker (note that this is a one-overall limitation, not per database).
>
> I haven't measured the actual time it takes, but given the use case of a
> connection pool, I so far thought it's obvious that this process takes too
> long.

Maybe that would be a worthwhile exercise...

> (It's exactly what apache pre-fork does, no? Is anybody concerned about the
> idle processes there? Or do they consume much less resources?)

I think the kicker here is the idea of having a certain number of
extra workers per database.  On my vanilla Apache server on the
above-mentioned Fedora 12 VM, there are a total of 10 processes
running.  I am sure that could balloon to 100 or more under load, but
it's not keeping around 100 processes on an otherwise idle system.  So
if you knew you only had 1 database, keeping around 2 or 3 or 5 or
even 10 workers might seem reasonable, but since you might have 1
database or 1000 databases, it doesn't.  Keeping 2 or 3 or 5 or 10
workers TOTAL around could be reasonable, but not per-database.  As
Tom said upthread, we don't want to assume that we're the only thing
running on the box and are therefore entitled to take up all the
available memory/disk/process slots/whatever.  And even if we DID feel
so entitled, there could be hundreds of databases, and it certainly
doesn't seem practical to keep 1000 workers around "just in case".

I don't know whether an idle Apache worker consumes more or less
memory than an idle PostgreSQL worker, but another difference between
the Apache case and the PostgreSQL case is that presumably all those
backend processes have attached shared memory and have ProcArray
slots.  We know that code doesn't scale terribly well, especially in
terms of taking snapshots, and that's one reason why high-volume
PostgreSQL installations pretty much require a connection pooler.  I
think the sizes of the connection pools I've seen recommended are
considerably smaller than 100, more like 2 * CPUs + spindles, or
something like that.  It seems like if you actually used all 100
workers at the same time performance might be pretty awful.
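
(For scale, with hypothetical hardware: 8 cores and 4 spindles would
give 2 * 8 + 4 = 20 pooled connections under that rule of thumb, a
factor of five below 100.)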

I was taking a look at the Mammoth Replicator code this week
(parenthetical note: I couldn't figure out where mcp_server was or how
to set it up) and it apparently has a limitation that only one
database in the cluster can be replicated.  I'm a little fuzzy on how
Mammoth works, but apparently this problem of scaling to large numbers
of databases is not unique to Postgres-R.

>> Is the ramp-up time longer than that,
>> or is even that much delay unacceptable for Postgres-R, or is there
>> some other aspect to the problem I'm failing to grasp?  I can tell you
>> have some experience tuning this so I'd like to try to understand
>> where you're coming from.
>
> I didn't ever compare to a max_spare_background_workers = 0 configuration,
> so I don't have any hard numbers, sorry.

Hmm, OK.

>> I think this is an interesting example, and worth some further
>> thought.  I guess I don't really understand how Postgres-R uses these
>> bgworkers.
>
> The given example doesn't only apply to Postgres-R. But with fewer bgworkers
> in total, you are more likely to want to use them all for one database, yes.
>
>> Are you replicating one transaction at a time, or how does
>> the data get sliced up?
>
> Yes, one transaction at a time. One transaction per backend (bgworker). On a
> cluster with n nodes that only performs writing transactions, avg. at a rate
> of m concurrent transactions/node, you ideally end up having m normal
> backends and (n-1) * m bgworkers that concurrently apply the remote
> transactions.

Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-09-17 Thread Itagaki Takahiro
On Sat, Sep 18, 2010 at 11:46 AM, Robert Haas  wrote:
> On Fri, Sep 17, 2010 at 10:28 PM, Itagaki Takahiro wrote:
>> One of my proposals is that we don't have to keep the original input text.
>> We would store JSON data in an efficient internal format. If users want
>> human-readable output, they can use stringify() with an indentation option.
>
> There's a trade-off here: this will make some things faster, but other
> things slower.  Probably some discussion of the pros and cons is in
> order.

I didn't intend to introduce a non-text internal format. The original
patch spent some code keeping all of the whitespace in the input as-is,
but I'd say we can simplify that.

Whitespace aside, normalization of strings and numbers might be a
problem when we support JSON comparison operators -- for example,
comparing Unicode-escaped characters in strings against their literal
equivalents, or 0 vs. 0.0 in numbers.
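
A tiny standalone C illustration of the number half of that problem
(not from either patch): the two texts differ byte-wise but denote the
same JSON number.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        const char *a = "0", *b = "0.0";

        /* byte-wise comparison says they differ ... */
        printf("bytes equal:  %s\n", strcmp(a, b) == 0 ? "yes" : "no");
        /* ... numeric comparison says they are the same value */
        printf("values equal: %s\n",
               strtod(a, NULL) == strtod(b, NULL) ? "yes" : "no");
        return 0;
    }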

-- 
Itagaki Takahiro



Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 10:28 PM, Itagaki Takahiro wrote:
> One of my proposals is that we don't have to keep the original input text.
> We would store JSON data in an efficient internal format. If users want
> human-readable output, they can use stringify() with an indentation option.

There's a trade-off here: this will make some things faster, but other
things slower.  Probably some discussion of the pros and cons is in
order.

(Also, it's important not to break EXPLAIN (FORMAT JSON), which thinks
that the internal format of JSON is text.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-09-17 Thread Bruce Momjian
Itagaki Takahiro wrote:
> On Sat, Sep 18, 2010 at 6:45 AM, Joseph Adams wrote:
> >>  Why do we need to store the value in UTF8 encoding?
> >
> >  because the JSON code needs to handle Unicode escapes like
> > "\u266B", but there is no simple and efficient way (that I know of) to
> > convert single characters to/from the server encoding.
> 
> Ah, so we don't need UTF8 encoding just to store JSON data, but we do
> need to care about Unicode escapes when we support comparison and
> extraction of values from JSON, right? I see the worth of encoding to
> UTF8.
> 
> One of my proposals is that we don't have to keep the original input text.
> We would store JSON data in an efficient internal format. If users want
> human-readable output, they can use stringify() with an indentation option.
> 
> > I think your patch is on a better footing than mine, so maybe I should
> > start contributing to your code rather than the other way around.
> > Before the next commitfest, I could merge the testcases from my patch
> > in and identify parsing discrepancies (if any).  Afterward, I could
> > help merge the other features into the new JSON infrastructure.
> 
> Thanks! I'll contribute the code I developed for another project
> (PL/JavaScript), and let's merge our code into core.
> 
> > I can't compile your initial patch against the latest checkout because
> > json_parser.h and json_scanner.h are missing.
> 
> Hmm, those files should be generated from .y and .l files. I'll check it.

I am pleased the two efforts can be joined.  I like the idea of
PL/JavaScript too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-09-17 Thread Itagaki Takahiro
On Sat, Sep 18, 2010 at 6:45 AM, Joseph Adams wrote:
>>  Why do we need to store the value in UTF8 encoding?
>
>  because the JSON code needs to handle Unicode escapes like
> "\u266B", but there is no simple and efficient way (that I know of) to
> convert single characters to/from the server encoding.

Ah, so we don't need UTF8 encoding just to store JSON data, but we do
need to care about Unicode escapes when we support comparison and
extraction of values from JSON, right? I see the worth of encoding to
UTF8.

One of my proposals is that we don't have to keep the original input text.
We would store JSON data in an efficient internal format. If users want
human-readable output, they can use stringify() with an indentation option.

> I think your patch is on a better footing than mine, so maybe I should
> start contributing to your code rather than the other way around.
> Before the next commitfest, I could merge the testcases from my patch
> in and identify parsing discrepancies (if any).  Afterward, I could
> help merge the other features into the new JSON infrastructure.

Thanks! I'll contribute the code I developed for another project
(PL/JavaScript), and let's merge our code into core.

> I can't compile your initial patch against the latest checkout because
> json_parser.h and json_scanner.h are missing.

Hmm, those files should be generated from .y and .l files. I'll check it.

-- 
Itagaki Takahiro



Re: [HACKERS] Heartbeat between Primary and Standby replicas

2010-09-17 Thread fazool mein
Apologies. I'm new to Postgres and I didn't see that feature. It satisfies
what I want to do.

Thanks.

On Thu, Sep 16, 2010 at 7:34 PM, Fujii Masao  wrote:

> On Fri, Sep 17, 2010 at 6:49 AM, fazool mein  wrote:
> > I am designing a heartbeat system between replicas to know when a replica
> > goes down so that necessary measures can be taken. As I see, there are
> two
> > ways of doing it:
> >
> > 1) Creating a separate heartbeat process on replicas.
> > 2) Creating a heartbeat message, and sending it over the connection that
> is
> > already established between walsender and walreceiver.
> >
> > With 2, sending heartbeat from walsender to walreceiver seems trivial.
> > Sending a heartbeat from walreceiver to walsender seems tricky. Going
> > through the code, it seems that the walreceiver is always in the
> > PGASYNC_COPY_OUT mode (except in the beginning when handshaking is done).
> >
> > Can you recommend the right way of doing this?
>
> The existing keepalive feature doesn't help?
>
> Regards,
>
> --
> Fujii Masao
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>


Re: [HACKERS] VACUUM produces odd freespace values

2010-09-17 Thread Bruce Momjian
Thom Brown wrote:
> > VACUUM mvcc_demo;
> > VACUUM
> > SELECT pg_relation_size('mvcc_demo');
> >  pg_relation_size
> > --
> >                 0
> > (1 row)
> >
> 
> That's odd.  When I delete val 2, the freespace goes back up in 9.0rc1
> (attached).

Your numbers are odd too.  With one row after INSERT you show:

 (0,8128)

but after deleting down to one row you show:

 (0,8096)

You also dip to (0,8032), which I don't see.  I now see we only track
ranges of free space values.  For example, freespace.c has:

 * We use just one byte to store the amount of free space on a page, so we
 * divide the amount of free space a page can have into 256 different
 * categories. The highest category, 255, represents a page with at least
 * MaxFSMRequestSize bytes of free space, and the second highest category
 * represents the range from 254 * FSM_CAT_STEP, inclusive, to
 * MaxFSMRequestSize, exclusive.
 *
 * MaxFSMRequestSize depends on the architecture and BLCKSZ, but assuming
 * default 8k BLCKSZ, and that MaxFSMRequestSize is 8164 bytes, the
 * categories look like this:
 *
 * Range       Category
 * 0    - 31        0
 * 32   - 63        1
 * ...  - ...     ...
 * 8096 - 8127    253
 * 8128 - 8163    254
 * 8164 - 8192    255
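
As a standalone C sketch of that mapping (simplified from
fsm_space_avail_to_cat() in freespace.c, with the constants taken from
the quoted comment), the rounding explains why the demo readings snap
to multiples of 32 like 8096 and 8128:

    #include <stdio.h>

    #define FSM_CATEGORIES 256
    #define FSM_CAT_STEP   32

    static int
    space_to_cat(int avail)
    {
        int cat = avail / FSM_CAT_STEP;

        return (cat >= FSM_CATEGORIES) ? FSM_CATEGORIES - 1 : cat;
    }

    int
    main(void)
    {
        printf("8128 bytes -> category %d\n", space_to_cat(8128)); /* 254 */
        printf("8127 bytes -> category %d\n", space_to_cat(8127)); /* 253 */
        printf("8096 bytes -> category %d\n", space_to_cat(8096)); /* 253 */
        return 0;
    }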

So, my guess is that the unused item pointers are causing the free space
to fall into a smaller category than we had after the first INSERT.  I
bet if I pulled more columns from heap_page_items() I could see it.

Anyway, I think I have my explanation now.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
I wrote:
> Oh, mystery explained upon comparing the cvs2git.options files.
> I was using Max's file which had this in it:
> ExcludeRegexpStrategyRule(r'unlabeled-.*'),
> I think I'll rerun without that just to convince myself of what it is
> we're dropping.  But right now it seems that everything is pretty sane.

OK, I found out what we're dropping.  Those "unlabeled" branches each
have a manufactured creation commit that just deletes a lot of files,
plus a single real commit, which all look like this:

  
+ commit 6c23cb5f371f273cade66408f3a888c2f94af293   
refs/heads/unlabeled-1.51.2
+ Author: Tom Lane 
+ Date:   Sat Sep 28 20:00:29 2002 +
+ 
+ Make the world at least somewhat safe for zero-column tables, and
+ remove the special case in ALTER DROP COLUMN to prohibit dropping a
+ table's last column.
+ 
+ M src/test/regress/expected/alter_table.out
+ 
+ commit d554d56827e4337835b0792c2568e8e9fee2d950   
refs/heads/unlabeled-1.87.2
+ Author: Tom Lane 
+ Date:   Sat Sep 28 20:00:28 2002 +
+ 
+ Make the world at least somewhat safe for zero-column tables, and
+ remove the special case in ALTER DROP COLUMN to prohibit dropping a
+ table's last column.
+ 
+ M src/backend/executor/nodeAgg.c
+ 
+ commit 7724eba3a121e5c84da3467fe6f0b176bf3f3d4b   
refs/heads/unlabeled-1.59.2
+ Author: Tom Lane 
+ Date:   Sat Sep 28 20:00:27 2002 +
+ 
+ Make the world at least somewhat safe for zero-column tables, and
+ remove the special case in ALTER DROP COLUMN to prohibit dropping a
+ table's last column.
+ 
+ M src/backend/executor/execTuples.c
+ 
+ commit 6924390f0cfa2aac1308719e51cb4cc2b24e2bc7   
refs/heads/unlabeled-1.44.2
+ Author: Tom Lane 
+ Date:   Sat Sep 28 20:00:26 2002 +
+ 
+ Make the world at least somewhat safe for zero-column tables, and
+ remove the special case in ALTER DROP COLUMN to prohibit dropping a
+ table's last column.
+ 
+ M src/backend/commands/tablecmds.c
+ 
+ commit 83029b0525f3324c30d35fcc9c77d2c301bdf7cf   
refs/heads/unlabeled-1.90.2
+ Author: Tom Lane 
+ Date:   Sat Sep 28 20:00:25 2002 +
+ 
+ Make the world at least somewhat safe for zero-column tables, and
+ remove the special case in ALTER DROP COLUMN to prohibit dropping a
+ table's last column.
+ 
+ M src/backend/access/common/tupdesc.c
+ M src/backend/parser/parse_target.c
+ 
  commit 6d0d15c451739396851d3f93f81c63a47535bf1e   refs/tags/REL7_4_BETA1
  Author: Tom Lane 
  Date:   Sat Sep 28 20:00:19 2002 +


If you go and look at the CVS history of these files, you'll see that
indeed they all have branches sprouted on that date that are
unreferenced.

This commit occurred during that interval between where we'd initially
branched 7.3 and where we moved it up to head.  I think what happened
was that I tried to back-patch a fix into what was then the 7.3 branch,
and when Marc moved the branch point, these commits all ended up
orphans.  So indeed there's no harm discarding them.  (I always
suspected that what Marc had done wasn't entirely kosher, and yup,
now the chickens are coming home to roost.)

regards, tom lane



Re: [HACKERS] VACUUM produces odd freespace values

2010-09-17 Thread Thom Brown
On 18 September 2010 02:00, Bruce Momjian  wrote:
> Bruce Momjian wrote:
>> Can anyone explain why VACUUM after INSERT shows steadily decreasing
>> freespace, while DELETE of the same rows does not decrease consistently?
>>
>> Specifically, after one row is inserted I see:
>>
>>       SELECT pg_freespace('mvcc_demo');
>>        pg_freespace
>>       --
>>        (0,8128)
>>       (1 row)
>>
>> but after inserting two more rows and deleting those two rows, I see:
>>
>>       SELECT pg_freespace('mvcc_demo');
>>        pg_freespace
>>       --
>>        (0,8096)
>>       (1 row)
>>
>> Seems that value should be '(0,8128)'.  Is it the unused line pointers
>> that are causing this?
>>
>> Another odd thing --- if I change the second VACUUM to VACUUM FULL I
>> see:
>>
>>       VACUUM FULL mvcc_demo;
>>       VACUUM
>>       SELECT pg_freespace('mvcc_demo');
>>        pg_freespace
>>       --
>>        (0,0)
>>       (1 row)
>>
>> There is still a row in the table, so why is there no free space
>> reported?  I realize after VACUUM FULL that only the last page has
>> freespace --- do we assume that will be used as default for the next
>> addition and just not bother with the free space map? --- makes sense if
>> we do that.  Does this happen because cluster creates a new relfilenode?
>>
>> I am attaching the init script, the SQL query script, and the results I
>> obtained against our CVS HEAD.
>
> Sorry.  Attached is a trimmed-down result file that shows just the
> problem.
>
> --
>  Bruce Momjian          http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com
>
>  + It's impossible for everything to be true. +
>
>
> 00-init.sql
> --
> This script is designed to run in a database called test
> and requires installation of /contrib/pageinspect and
> /contrib/pg_freespacemap.
> You are now connected to database "test" as user "postgres".
> DROP TABLE IF EXISTS mvcc_demo;
> DROP TABLE
> CREATE TABLE mvcc_demo (val INTEGER);
> CREATE TABLE
> DROP VIEW IF EXISTS mvcc_demo_page0;
> DROP VIEW
> CREATE VIEW mvcc_demo_page0 AS
>        SELECT  '(0,' || lp || ')' AS ctid,
>                CASE lp_flags
>                        WHEN 0 THEN 'Unused'
>                        WHEN 1 THEN 'Normal'
>                        WHEN 2 THEN 'Redirect to ' || lp_off
>                        WHEN 3 THEN 'Dead'
>                END,
>                t_xmin::text::int8 AS xmin,
>                t_xmax::text::int8 AS xmax,
>                t_ctid
>        FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>        ORDER BY lp;
> CREATE VIEW
>
> 31-vacuum-freesp.sql
> --
> TRUNCATE mvcc_demo;
> TRUNCATE TABLE
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --
> (0 rows)
>
> INSERT INTO mvcc_demo VALUES (1);
> INSERT 0 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --
>  (0,8128)
> (1 row)
>
> INSERT INTO mvcc_demo VALUES (2);
> INSERT 0 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --
>  (0,8096)
> (1 row)
>
> INSERT INTO mvcc_demo VALUES (3);
> INSERT 0 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --
>  (0,8064)
> (1 row)
>
> DELETE FROM mvcc_demo WHERE val = 3;
> DELETE 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --
>  (0,8096)
> (1 row)
>
> DELETE FROM mvcc_demo WHERE val = 2;
> DELETE 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --
>  (0,8096)
> (1 row)
>
> SELECT * FROM mvcc_demo_page0;
>  ctid  |  case  | xmin | xmax | t_ctid
> ---++--+--+
>  (0,1) | Normal | 1339 |    0 | (0,1)
>  (0,2) | Unused |      |      |
>  (0,3) | Unused |      |      |
> (3 rows)
>
> DELETE FROM mvcc_demo WHERE val = 1;
> DELETE 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --
> (0 rows)
>
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_relation_size('mvcc_demo');
>  pg_relation_size
> --
>                0
> (1 row)
>

That's odd.  When I delete val 2, the freespace goes back up in 9.0rc1
(attached).

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
mvcctest=# INSERT INTO mvcc_demo VALUES (1);VACUUM mvcc_demo;SELECT 
pg_freespace('mvcc_demo');
INSERT 0 1
VACUUM
 pg_freespace 
--
 (0,8128)
(1 row)

mvcctest=# INSERT INTO mvcc_demo VALUES (2);VACUUM mvcc_demo;SELECT 
pg_freespace('mvcc_demo');
INSERT 0 1
VACUUM
 pg_freespace 
--
 (0,8064)
(1 row)

mvcctest=# INSERT INTO mvcc_demo VALUES (3);VACUUM mvcc_demo;SELECT 
pg_freespace('mvcc_demo');
INSERT 0 1
VACUUM
 pg_freespace 
--
 (0,8032)
(1 row)

=# DELETE FROM mvcc_demo WHERE val = 3;VACUUM mvcc_demo;SELECT 
pg_freespace('mvcc_demo');
DELETE 1
VACUUM

Re: [HACKERS] Bad cast priority for DATE?

2010-09-17 Thread Josh Berkus

> timestamptz is a preferred type, so no you probably can't change that
> without breaking a lot of stuff.  It's not immediately clear to me why
> that's wrong anyway.

Just that having a value implicitly acquire time zone information it
didn't originally have seems dangerous.  But I can't come up with a
specific example of breakage right now -- at least not one on a single
server.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] VACUUM produces odd freespace values

2010-09-17 Thread Bruce Momjian
Bruce Momjian wrote:
> Can anyone explain why VACUUM after INSERT shows steadily decreasing
> freespace, while DELETE of the same rows does not decrease consistently?
> 
> Specifically, after one row is inserted I see:
> 
>   SELECT pg_freespace('mvcc_demo');
>pg_freespace
>   --
>(0,8128)
>   (1 row)
> 
> but after inserting two more rows and deleting those two rows, I see:
> 
>   SELECT pg_freespace('mvcc_demo');
>pg_freespace
>   --
>(0,8096)
>   (1 row)
> 
> Seems that value should be '(0,8128)'.  Is it the unused line pointers
> that are causing this?
> 
> Another odd thing --- if I change the second VACUUM to VACUUM FULL I
> see:
> 
>   VACUUM FULL mvcc_demo;
>   VACUUM
>   SELECT pg_freespace('mvcc_demo');
>pg_freespace
>   --
>(0,0)
>   (1 row)
> 
> There is still a row in the table, so why is there no free space
> reported?  I realize after VACUUM FULL that only the last page has
> freespace --- do we assume that will be used as default for the next
> addition and just not bother with the free space map? --- makes sense if
> we do that.  Does this happen because cluster creates a new relfilenode?
> 
> I am attaching the init script, the SQL query script, and the results I
> obtained against our CVS HEAD.

Sorry.  Attached is a trimmed-down result file that shows just the
problem.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

00-init.sql
--
This script is designed to run in a database called test
and requires installation of /contrib/pageinspect and
/contrib/pg_freespacemap.
You are now connected to database "test" as user "postgres".
DROP TABLE IF EXISTS mvcc_demo;
DROP TABLE
CREATE TABLE mvcc_demo (val INTEGER);
CREATE TABLE
DROP VIEW IF EXISTS mvcc_demo_page0;
DROP VIEW
CREATE VIEW mvcc_demo_page0 AS
SELECT  '(0,' || lp || ')' AS ctid,
CASE lp_flags
WHEN 0 THEN 'Unused'
WHEN 1 THEN 'Normal'
WHEN 2 THEN 'Redirect to ' || lp_off
WHEN 3 THEN 'Dead'
END,
t_xmin::text::int8 AS xmin,
t_xmax::text::int8 AS xmax, 
t_ctid
FROM heap_page_items(get_raw_page('mvcc_demo', 0))
ORDER BY lp;
CREATE VIEW

31-vacuum-freesp.sql
--
TRUNCATE mvcc_demo;
TRUNCATE TABLE
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace 
--
(0 rows)

INSERT INTO mvcc_demo VALUES (1);
INSERT 0 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace 
--
 (0,8128)
(1 row)

INSERT INTO mvcc_demo VALUES (2);
INSERT 0 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace 
--
 (0,8096)
(1 row)

INSERT INTO mvcc_demo VALUES (3);
INSERT 0 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace 
--
 (0,8064)
(1 row)

DELETE FROM mvcc_demo WHERE val = 3;
DELETE 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace 
--
 (0,8096)
(1 row)

DELETE FROM mvcc_demo WHERE val = 2;
DELETE 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace 
--
 (0,8096)
(1 row)

SELECT * FROM mvcc_demo_page0;
 ctid  |  case  | xmin | xmax | t_ctid 
---++--+--+
 (0,1) | Normal | 1339 |0 | (0,1)
 (0,2) | Unused |  |  | 
 (0,3) | Unused |  |  | 
(3 rows)

DELETE FROM mvcc_demo WHERE val = 1;
DELETE 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace 
--
(0 rows)

VACUUM mvcc_demo;
VACUUM
SELECT pg_relation_size('mvcc_demo');
 pg_relation_size 
--
0
(1 row)



[HACKERS] VACUUM produces odd freespace values

2010-09-17 Thread Bruce Momjian
Can anyone explain why VACUUM after INSERT shows steadily decreasing
freespace, while DELETE of the same rows does not decrease consistently?

Specifically, after one row is inserted I see:

SELECT pg_freespace('mvcc_demo');
 pg_freespace
--
 (0,8128)
(1 row)

but after inserting two more rows and deleting those two rows, I see:

SELECT pg_freespace('mvcc_demo');
 pg_freespace
--
 (0,8096)
(1 row)

Seems that value should be '(0,8128)'.  Is it the unused line pointers
that are causing this?

Another odd thing --- if I change the second VACUUM to VACUUM FULL I
see:

VACUUM FULL mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace
--
 (0,0)
(1 row)

There is still a row in the table, so why is there no free space
reported?  I realize after VACUUM FULL that only the last page has
freespace --- do we assume that will be used as default for the next
addition and just not bother with the free space map? --- makes sense if
we do that.  Does this happen because cluster creates a new relfilenode?

I am attaching the init script, the SQL query script, and the results I
obtained against our CVS HEAD.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

\echo This script is designed to run in a database called 'test'
\echo and requires installation of /contrib/pageinspect and
\echo /contrib/pg_freespacemap.
\c test

DROP TABLE IF EXISTS mvcc_demo;
CREATE TABLE mvcc_demo (val INTEGER);

DROP VIEW IF EXISTS mvcc_demo_page0;
CREATE VIEW mvcc_demo_page0 AS
SELECT  '(0,' || lp || ')' AS ctid,
CASE lp_flags
WHEN 0 THEN 'Unused'
WHEN 1 THEN 'Normal'
WHEN 2 THEN 'Redirect to ' || lp_off
WHEN 3 THEN 'Dead'
END,
t_xmin::text::int8 AS xmin,
t_xmax::text::int8 AS xmax, 
t_ctid
FROM heap_page_items(get_raw_page('mvcc_demo', 0))
ORDER BY lp;
-- clear out heap file
TRUNCATE mvcc_demo;

VACUUM mvcc_demo;

SELECT pg_freespace('mvcc_demo');

INSERT INTO mvcc_demo VALUES (1);

VACUUM mvcc_demo;

SELECT pg_freespace('mvcc_demo');

INSERT INTO mvcc_demo VALUES (2);

VACUUM mvcc_demo;

SELECT pg_freespace('mvcc_demo');

INSERT INTO mvcc_demo VALUES (3);

VACUUM mvcc_demo;

SELECT pg_freespace('mvcc_demo');

DELETE FROM mvcc_demo WHERE val = 3;

VACUUM mvcc_demo;

SELECT pg_freespace('mvcc_demo');

DELETE FROM mvcc_demo WHERE val = 2;

VACUUM mvcc_demo;

SELECT pg_freespace('mvcc_demo');

SELECT * FROM mvcc_demo_page0;

DELETE FROM mvcc_demo WHERE val = 1;

VACUUM mvcc_demo;

SELECT pg_freespace('mvcc_demo');

VACUUM mvcc_demo;

SELECT pg_relation_size('mvcc_demo');

00-init.sql
--
This script is designed to run in a database called test
and requires installation of /contrib/pageinspect and
/contrib/pg_freespacemap.
You are now connected to database "test" as user "postgres".
DROP TABLE IF EXISTS mvcc_demo;
DROP TABLE
CREATE TABLE mvcc_demo (val INTEGER);
CREATE TABLE
DROP VIEW IF EXISTS mvcc_demo_page0;
DROP VIEW
CREATE VIEW mvcc_demo_page0 AS
SELECT  '(0,' || lp || ')' AS ctid,
CASE lp_flags
WHEN 0 THEN 'Unused'
WHEN 1 THEN 'Normal'
WHEN 2 THEN 'Redirect to ' || lp_off
WHEN 3 THEN 'Dead'
END,
t_xmin::text::int8 AS xmin,
t_xmax::text::int8 AS xmax, 
t_ctid
FROM heap_page_items(get_raw_page('mvcc_demo', 0))
ORDER BY lp;
CREATE VIEW

01-xmin_ins.sql
--
DELETE FROM mvcc_demo;
DELETE 0
INSERT INTO mvcc_demo VALUES (1);
INSERT 0 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val 
--+--+-
 1301 |0 |   1
(1 row)


02-xmax_del.sql
--
DELETE FROM mvcc_demo;
DELETE 1
INSERT INTO mvcc_demo VALUES (1);
INSERT 0 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val 
--+--+-
 1303 |0 |   1
(1 row)

BEGIN WORK;
BEGIN
DELETE FROM mvcc_demo;
DELETE 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val 
--+--+-
(0 rows)

SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val 
--+--+-
 1303 | 1304 |   1
(1 row)

COMMIT WORK;
COMMIT

03-xmax_upd.sql
--
DELETE FROM mvcc_demo;
DELETE 0
INSERT INTO mvcc_demo VALUES (1);
INSERT 0 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val 
--+--+-
 1305 |0 |   1
(1 row)

BEGIN WORK;
BEGIN
UPDATE mvcc_demo SET val = 2;
UPDATE 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val 
--+--+-
 1306 |0 |   2
(1 row)


Re: [HACKERS] Bad cast priority for DATE?

2010-09-17 Thread Tom Lane
Josh Berkus  writes:
> I was noticing that, where we have a function which has two versions,
> timestamp and timestamptz (for example, date_trunc()), if I use a DATE,
> timestamptz is the default cast.  Shouldn't timestamp without time zone
> be the default?  Is this something we can fix without an overhaul of the
> type casting system?

timestamptz is a preferred type, so no you probably can't change that
without breaking a lot of stuff.  It's not immediately clear to me why
that's wrong anyway.

regards, tom lane



[HACKERS] Bad cast priority for DATE?

2010-09-17 Thread Josh Berkus
Folks,

I was noticing that, where we have a function which has two versions,
timestamp and timestamptz (for example, date_trunc()), if I use a DATE,
timestamptz is the default cast.  Shouldn't timestamp without time zone
be the default?  Is this something we can fix without an overhaul of the
type casting system?


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Progress indication prototype

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 4:50 PM, Peter Eisentraut  wrote:
> On tor, 2010-09-16 at 19:14 -0400, Robert Haas wrote:
>> I think that there should be a function which returns just this one
>> piece of data and is not automatically called as part of select * from
>> pg_stat_activity.  Then we could eventually decide to give backends a
>> way to know if that function had been invoked on them and how
>> recently.
>
> Displaying this as part of pg_stat_activity is completely trivial: it's
> just displaying the value of a float variable.
>
> It seems you are advocating a completely different architecture, where
> someone can find out on demand what the progress or status of another
> session is, without that other session having known about that request
> before it started its current command.  But that seems pretty outlandish
> to me, and I would ask for more details on what you have in mind.

What you just said is about what I had in mind.  I admit I can't
articulate a more detailed design right off the top of my head, but
the architecture you're proposing seems dead certain to never cover
more than 0.1% of what people actually do.  If there's not even an
obvious way of generalizing this to the case of a full-database
VACUUM, let alone actual queries, that seems like a strong hint that
it might be badly designed.  Leaving some parts of the problem for
future development is perfectly reasonable, but there should be some
realistic hope that the next guy will be able to make some further
progress.

It seems to me that this is the sort of information that people will
normally never see, and therefore won't be willing to pay a
performance penalty for.  But when they need it (because something is
running long) they'll be happy to pay a modest penalty to get it.
Which is good, because the chances that we'll be able to provide this
information "for free" seem very poor even for utility commands.  But
it also means that we shouldn't carve the "can get this for free"
aspect of it into stone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
Magnus Hagander  writes:
> On Sat, Sep 18, 2010 at 00:06, Tom Lane  wrote:
>> [ scratches head ... ]  That's weird.  We probably ought to figure out
>> why you and I are getting different results.  I wonder if there's some
>> other discrepancy in the anoncvs pull?

> Could be.

Oh, mystery explained upon comparing the cvs2git.options files.
I was using Max's file which had this in it:

ExcludeRegexpStrategyRule(r'unlabeled-.*'),

I think I'll rerun without that just to convince myself of what it is
we're dropping.  But right now it seems that everything is pretty sane.

regards, tom lane



Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Magnus Hagander
On Sat, Sep 18, 2010 at 00:06, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Fri, Sep 17, 2010 at 23:01, Tom Lane  wrote:
>>> git branch -D unlabeled-1.44.2
>>> git branch -D unlabeled-1.51.2
>>> git branch -D unlabeled-1.59.2
>>> git branch -D unlabeled-1.87.2
>>> git branch -D unlabeled-1.90.2
>>>
>>> You should not need any of the above; I don't see those being generated
>>> anymore with the cleanup script in place.
>
>> I see them - at least the script didn't give an error when it ran, but
>> said it had deleted them.
>
> [ scratches head ... ]  That's weird.  We probably ought to figure out
> why you and I are getting different results.  I wonder if there's some
> other discrepancy in the anoncvs pull?

Could be.


> Anyway, if what you have up at
> http://github.com/mhagander/pggit_migrate
> is current, I can try to reproduce your results here.

It is - I just double-checked that.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
Magnus Hagander  writes:
> On Fri, Sep 17, 2010 at 23:01, Tom Lane  wrote:
>> git branch -D unlabeled-1.44.2
>> git branch -D unlabeled-1.51.2
>> git branch -D unlabeled-1.59.2
>> git branch -D unlabeled-1.87.2
>> git branch -D unlabeled-1.90.2
>> 
>> You should not need any of the above; I don't see those being generated
>> anymore with the cleanup script in place.

> I see them - at least the script didn't give an error when it ran, but
> said it had deleted them.

[ scratches head ... ]  That's weird.  We probably ought to figure out
why you and I are getting different results.  I wonder if there's some
other discrepancy in the anoncvs pull?

Anyway, if what you have up at
http://github.com/mhagander/pggit_migrate
is current, I can try to reproduce your results here.

regards, tom lane



Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-09-17 Thread Joseph Adams
On Fri, Sep 17, 2010 at 8:32 AM, Itagaki Takahiro wrote:
> On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams wrote:
>> Updated patch:  the JSON code has all been moved into core, so this
>> patch is now for a built-in data type.
>
> I have a question about the design of the JSON type. Why do we need to
> store the value in UTF8 encoding? It's true the RFC of JSON says the
> the encoding SHALL be encoded in Unicode, but I don't understand why
> we should reject other encodings.

Actually, the code in my original patch should work with any server
encoding in PostgreSQL.  However, internally, it operates in UTF-8 and
converts to/from the server encoding when necessary.  I did it this
way because the JSON code needs to handle Unicode escapes like
"\u266B", but there is no simple and efficient way (that I know of) to
convert single characters to/from the server encoding.
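
As a hedged sketch of why that is: the backend does provide
unicode_to_utf8(), pg_utf_mblen() and pg_do_encoding_conversion(), so a
single \uXXXX escape can be pushed through the generic converter by way
of UTF-8, but that machinery is designed for whole strings, not single
characters.  The helper below is invented for illustration and is not
code from either patch:

    #include "postgres.h"
    #include "mb/pg_wchar.h"

    /*
     * Illustrative only: convert one Unicode code point (from a JSON
     * \uXXXX escape) to the server encoding, by way of UTF-8.
     */
    static char *
    unicode_escape_to_server(pg_wchar codepoint)
    {
        unsigned char buf[8];
        int           len;

        unicode_to_utf8(codepoint, buf);   /* code point -> UTF-8 bytes */
        len = pg_utf_mblen(buf);
        buf[len] = '\0';

        /* run the whole-string conversion machinery on one character */
        return (char *) pg_do_encoding_conversion(buf, len,
                                                  PG_UTF8,
                                                  GetDatabaseEncoding());
    }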

I noticed that in your new patch, you sidestepped the encoding issue
by simply storing strings in their encoded form (right?).  This is
nice and simple, but in the future, JSON tree conversions and updates
will still need to deal with the encoding issue somehow.

> As I said before, I'd like to propose only 3 features in the commitfest:
>  * TYPE json data type
>  * text to json: FUNCTION json_parse(text)
>  * json to text: FUNCTION json_stringify(json, whitelist, space)

Although casting from JSON to TEXT does "stringify" it in my original
patch, I think json_stringify would be much more useful.  In addition
to the formatting options, if the internal format of the JSON type
changes and no longer preserves original formatting, then the behavior
of the following would change:

$$"unnecessary\u0020escape" $$ :: JSON :: TEXT

json_stringify would be more predictable because it would re-encode
the whitespace (but not the \u0020, unless we went out of our way to
make it do that).

Also, json_parse is "unnecessary" if you allow casting from TEXT to
JSON (which my patch does), but I think having json_parse would be
more intuitive for the same reason you do.

Long story short: I like it :-)  If you're keeping track, features
from my patch not in the new code yet are:
 * Programmatically validating JSON ( json_validate() )
 * Getting the type of a JSON value ( json_type() )
 * Converting scalar values to/from JSON
 * Converting arrays to JSON
 * JSONPath


> JSONPath will be re-implemented on the basic functionalities in the
> subsequent commitfest. Do you have a plan to split your patch?
> Or, can I continue to develop my patch? If so, JSONPath needs
> to be adjusted to the new infrastructure.

I think your patch is on a better footing than mine, so maybe I should
start contributing to your code rather than the other way around.
Before the next commitfest, I could merge the testcases from my patch
in and identify parsing discrepancies (if any).  Afterward, I could
help merge the other features into the new JSON infrastructure.

I can't compile your initial patch against the latest checkout because
json_parser.h and json_scanner.h are missing.  Is there a more recent
patch, or could you update the patch so it compiles?   I'd like to
start tinkering with the new code.  Thanks!


Joey Adams



Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Magnus Hagander
On Fri, Sep 17, 2010 at 23:21, Tom Lane  wrote:
> Actually, the simplest way to handle this might be to just delete all
> five of those tags during the conversion, and then I'll put them back
> in the right places later when I add the other old-release tags.
> That way we won't have any tags getting moved after the repository is
> published.  (Or am I wrong about that being something to avoid?  But
> in any case we want to gc the manufactured commits for Release_2_0_0
> and Release_2_0.)

As long as nobody has cloned the repository, it's not a problem moving
them. But it *is* something you should generally avoid, so let's do
that :-)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Magnus Hagander
On Fri, Sep 17, 2010 at 23:01, Tom Lane  wrote:
> BTW, on the cleanup steps:
>
> # Remove bogus branches
> git branch -D unlabeled-1.44.2
> git branch -D unlabeled-1.51.2
> git branch -D unlabeled-1.59.2
> git branch -D unlabeled-1.87.2
> git branch -D unlabeled-1.90.2
>
> You should not need any of the above; I don't see those being generated
> anymore with the cleanup script in place.

I see them - at least the script didn't give an error when it ran, but
said it had deleted them.

> git branch -D ecpg_big_bison
>
> I don't agree with removing this.  It is a legitimate part of the
> project history.  Yeah, it was a workaround, but we've had workarounds
> in place for other broken software.

Ok. I was just going by Michael's email - I'll leave it in then.


> # Remove broken tags
> git tag -d SUPPORT
> git tag -d MANUAL_1_0
> git tag -d Release-1-6-0
>
> Also get rid of the "creation" tag, please.  Also, just for the record,
> these aren't really broken.  Maybe "Remove unwanted partial tags"?

Right. I didn't really pay much attention to the comments, since it's
just a temporary thing.
"creation" added to the list.


> In addition to the above, we're going to want to clean up the
> Release_2_0_0 and Release_2_0 tags, but I'm not sure if there's
> a reasonable way to script those when the commit SHA1's aren't frozen
> yet.  I can give you timestamps for the commits they should point at,
> but I lack the git-fu to convert that into a git tag command.

yeah, that's definitely easier to do after the fact.


> Similarly, we're going to want to repoint REL6_5, REL7_1, REL7_1_2
> tags at more appropriate places, but right now all I have for those is
> timestamps.  (I've also identified places to tag the other early
> releases, but those tags can certainly be added later.)

I like your downthread suggestion of removing them, and then putting
them back in manually later. I'll add them all to the delete list.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Update comment for README.HOT

2010-09-17 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> > + This means that UPDATE, DELETE, and SELECT can trigger space
> > + reclamation, while INSERT ... VALUES cannot because it does not retrieve
> > + a row.
> 
> I don't believe that's correct.  It might have happened to work that way
> for you in a particular test.  It's certainly not something I'd document
> as being intended long-term behavior.

Well, I would like to document something about this because I was
surprised that INSERT did not trigger a cleanup.  I realize we might
change the behavior, but then we would update the file too, hopefully.

How is the attached version, using "often"?  I also clarified it is
< 10% free.

I found this while doing tests for a new MVCC talk I will be delivering
at PG West.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
Index: src/backend/access/heap/README.HOT
===
RCS file: /cvsroot/pgsql/src/backend/access/heap/README.HOT,v
retrieving revision 1.6
diff -c -c -r1.6 README.HOT
*** src/backend/access/heap/README.HOT  23 Apr 2010 23:21:44 -  1.6
--- src/backend/access/heap/README.HOT  17 Sep 2010 21:21:56 -
***
*** 246,251 
--- 246,257 
  is arbitrarily capped at MaxHeapTuplesPerPage (the most tuples that
  could fit without HOT pruning).
  
+ Effectively, space reclamation happens during tuple retrieval when the
+ page is nearly full (<10% free) and a buffer cleanup lock can be
+ acquired.  This means that UPDATE, DELETE, and SELECT can trigger space
+ reclamation, but often not during INSERT ... VALUES because it does
+ not retrieve a row.
+ 
  
  VACUUM
  --



Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
I wrote:
> In addition to the above, we're going to want to clean up the
> Release_2_0_0 and Release_2_0 tags, but I'm not sure if there's
> a reasonable way to script those when the commit SHA1's aren't frozen
> yet.  I can give you timestamps for the commits they should point at,
> but I lack the git-fu to convert that into a git tag command.

> Similarly, we're going to want to repoint REL6_5, REL7_1, REL7_1_2
> tags at more appropriate places, but right now all I have for those is
> timestamps.  (I've also identified places to tag the other early
> releases, but those tags can certainly be added later.)

Actually, the simplest way to handle this might be to just delete all
five of those tags during the conversion, and then I'll put them back
in the right places later when I add the other old-release tags.
That way we won't have any tags getting moved after the repository is
published.  (Or am I wrong about that being something to avoid?  But
in any case we want to gc the manufactured commits for Release_2_0_0
and Release_2_0.)

regards, tom lane



Re: [HACKERS] Update comment for README.HOT

2010-09-17 Thread Bruce Momjian
Thom Brown wrote:
> On 17 September 2010 20:52, Bruce Momjian  wrote:
> > I would like to apply the attached patch to README.HOT so clarify when
> > single-page cleanup happens, e.g. not during INSERT.
> >
> 
> "... when the page is nearly full (<10%) ..."
> 
> Shouldn't that be >90%?
> 
> "... while INSERT ... VALUES cannot because it does not retrieve a row."
> 
> Is this still true when it's used in conjunction with RETURNING?

I think RETURNING might cause a cleanup --- I have not tested that.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] compile/install of git

2010-09-17 Thread Bruce Momjian
FYI, I have compiled/installed git 1.7.3.rc2 on my BSD/OS 4.3.1 machine
with the attached minor changes.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
*** ./git-1.7.3.rc2/Makefile.orig	Fri Sep 17 14:01:17 2010
--- ./git-1.7.3.rc2/Makefile	Fri Sep 17 14:36:57 2010
***
*** 1141,1146 
--- 1141,1152 
  endif
  endif
  
+ NO_NSEC = YesPlease
+ NO_PREAD = YesPlease
+ NEEDS_LIBGEN = 
+ NO_TCLTK = YesPlease
+ NO_PYTHON = YesPlease
+ 
  -include config.mak.autogen
  -include config.mak
  
***
*** 1244,1250 
  	PROGRAM_OBJS += daemon.o
  endif
  ifndef NO_OPENSSL
! 	OPENSSL_LIBSSL = -lssl
  	ifdef OPENSSLDIR
  		BASIC_CFLAGS += -I$(OPENSSLDIR)/include
  		OPENSSL_LINK = -L$(OPENSSLDIR)/$(lib) $(CC_LD_DYNPATH)$(OPENSSLDIR)/$(lib)
--- 1250,1256 
  	PROGRAM_OBJS += daemon.o
  endif
  ifndef NO_OPENSSL
! 	OPENSSL_LIBSSL = -lssl -lcrypto
  	ifdef OPENSSLDIR
  		BASIC_CFLAGS += -I$(OPENSSLDIR)/include
  		OPENSSL_LINK = -L$(OPENSSLDIR)/$(lib) $(CC_LD_DYNPATH)$(OPENSSLDIR)/$(lib)
***
*** 1260,1266 
  	OPENSSL_LIBSSL =
  endif
  ifdef NEEDS_SSL_WITH_CRYPTO
! 	LIB_4_CRYPTO = $(OPENSSL_LINK) -lcrypto -lssl
  else
  	LIB_4_CRYPTO = $(OPENSSL_LINK) -lcrypto
  endif
--- 1266,1272 
  	OPENSSL_LIBSSL =
  endif
  ifdef NEEDS_SSL_WITH_CRYPTO
! 	LIB_4_CRYPTO = $(OPENSSL_LINK) -lssl -lcrypto
  else
  	LIB_4_CRYPTO = $(OPENSSL_LINK) -lcrypto
  endif
***
*** 1274,1281 
  	EXTLIBS += $(ICONV_LINK) -liconv
  endif
  ifdef NEEDS_LIBGEN
! 	EXTLIBS += -lgen
  endif
  ifdef NEEDS_SOCKET
  	EXTLIBS += -lsocket
  endif
--- 1280,1289 
  	EXTLIBS += $(ICONV_LINK) -liconv
  endif
  ifdef NEEDS_LIBGEN
! #	EXTLIBS += -lgen
  endif
+ EXTLIBS += -lpoll
+ 
  ifdef NEEDS_SOCKET
  	EXTLIBS += -lsocket
  endif
***
*** 1498,1504 
  ifndef V
  	QUIET_CC   = @echo '   ' CC $@;
  	QUIET_AR   = @echo '   ' AR $@;
! 	QUIET_LINK = @echo '   ' LINK $@;
  	QUIET_BUILT_IN = @echo '   ' BUILTIN $@;
  	QUIET_GEN  = @echo '   ' GEN $@;
  	QUIET_LNCP = @echo '   ' LN/CP $@;
--- 1506,1513 
  ifndef V
  	QUIET_CC   = @echo '   ' CC $@;
  	QUIET_AR   = @echo '   ' AR $@;
! #	QUIET_LINK = @echo '   ' LINK $@;
! QUIET_LINK=
  	QUIET_BUILT_IN = @echo '   ' BUILTIN $@;
  	QUIET_GEN  = @echo '   ' GEN $@;
  	QUIET_LNCP = @echo '   ' LN/CP $@;
*** ./git-1.7.3.rc2/git-compat-util.h.orig	Fri Sep 17 12:25:09 2010
--- ./git-1.7.3.rc2/git-compat-util.h	Fri Sep 17 12:35:23 2010
***
*** 106,112 
  #include 
  #include 
  #include 
! #include <stdint.h>
  #if defined(__CYGWIN__)
  #undef _XOPEN_SOURCE
  #include 
--- 106,119 
  #include 
  #include 
  #include 
! 
! typedef u_int64_t uint64_t;
! typedef u_int32_t uint32_t;
! typedef u_int16_t uint16_t;
! typedef long int intptr_t;
! typedef unsigned long int uintptr_t;
! 
! //#include <stdint.h>
  #if defined(__CYGWIN__)
  #undef _XOPEN_SOURCE
  #include 



Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
BTW, on the cleanup steps:

# Remove bogus branches
git branch -D unlabeled-1.44.2
git branch -D unlabeled-1.51.2
git branch -D unlabeled-1.59.2
git branch -D unlabeled-1.87.2
git branch -D unlabeled-1.90.2

You should not need any of the above; I don't see those being generated
anymore with the cleanup script in place.

git branch -D ecpg_big_bison

I don't agree with removing this.  It is a legitimate part of the
project history.  Yeah, it was a workaround, but we've had workarounds
in place for other broken software.


# Remove broken tags
git tag -d SUPPORT
git tag -d MANUAL_1_0
git tag -d Release-1-6-0

Also get rid of the "creation" tag, please.  Also, just for the record,
these aren't really broken.  Maybe "Remove unwanted partial tags"?

In addition to the above, we're going to want to clean up the
Release_2_0_0 and Release_2_0 tags, but I'm not sure if there's
a reasonable way to script those when the commit SHA1's aren't frozen
yet.  I can give you timestamps for the commits they should point at,
but I lack the git-fu to convert that into a git tag command.

Similarly, we're going to want to repoint REL6_5, REL7_1, REL7_1_2
tags at more appropriate places, but right now all I have for those is
timestamps.  (I've also identified places to tag the other early
releases, but those tags can certainly be added later.)

regards, tom lane



Re: [HACKERS] Progress indication prototype

2010-09-17 Thread Peter Eisentraut
On tor, 2010-09-16 at 19:14 -0400, Robert Haas wrote:
> I think that there should be a function which returns just this one
> piece of data and is not automatically called as part of select * from
> pg_stat_activity.  Then we could eventually decide to give backends a
> way to know if that function had been invoked on them and how
> recently.

Displaying this as part of pg_stat_activity is completely trivial: it's
just displaying the value of a float variable.

It seems you are advocating a completely different architecture, where
someone can find out on demand what the progress or status of another
session is, without that other session having known about that request
before it started its current command.  But that seems pretty outlandish
to me, and I would ask for more details on what you have in mind.




Re: [HACKERS] bg worker: general purpose requirements

2010-09-17 Thread Markus Wanner

Robert,

On 09/17/2010 05:52 PM, Robert Haas wrote:

Technically, you could start an autonomous transaction from within an
autonomous transaction, so I don't think there's a hard maximum of one
per normal backend.  However, I agree that the expected case is to not
have very many.


Thanks for pointing that out. I somehow knew that was wrong..


I guess it depends on what your goals are.


Agreed.


If you're optimizing for
ability to respond quickly to a sudden load, keeping idle backends
will probably win even when the number of them you're keeping around
is fairly high.  If you're optimizing for minimal overall resource
consumption, though, you'll not be as happy about that.


What resources are we talking about here? Are idle backends really that 
resource hungry? My feeling so far has been that idle processes are 
relatively cheap (i.e. some 100 idle processes shouldn't hurt on a 
modern server).



What I'm
struggling to understand is this: if there aren't any preforked
workers around when the load hits, how much does it slow things down?


As the startup code is pretty much the same as for the current 
avlauncher, the coordinator can only request one bgworker at a time.


This means the signal needs to reach the postmaster, which then forks a 
bgworker process. That new process starts up, connects to the requested 
database and then sends an imessage to the coordinator to register. Only 
after having received that registration can the coordinator request 
another bgworker (note that this limit of one outstanding request at a 
time is global, not per database).


I haven't measured the actual time it takes, but given the use case of a 
connection pool, I have so far assumed it's obvious that this process 
takes too long.


(It's exactly what Apache's pre-fork mode does, no? Is anybody concerned 
about the idle processes there? Or do they consume far fewer resources?)



I would have thought that a few seconds to ramp up to speed after an
extended idle period (5 minutes, say) would be acceptable for most of
the applications you mention.


A few seconds? That might be sufficient for autovacuum, but most queries 
are completed in less than one second. So for parallel querying, 
autonomous transactions and Postgres-R, I certainly don't think that a 
few seconds are reasonable. Especially considering the cost of idle 
backends.



Is the ramp-up time longer than that,
or is even that much delay unacceptable for Postgres-R, or is there
some other aspect to the problem I'm failing to grasp?  I can tell you
have some experience tuning this so I'd like to try to understand
where you're coming from.


I didn't ever compare to a max_spare_background_workers = 0 
configuration, so I don't have any hard numbers, sorry.



I think this is an interesting example, and worth some further
thought.  I guess I don't really understand how Postgres-R uses these
bgworkers.


The given example doesn't only apply to Postgres-R. But with fewer 
bgworkers in total, you are more likely to want to use them all for one 
database, yes.



Are you replicating one transaction at a time, or how does
the data get sliced up?


Yes, one transaction at a time. One transaction per backend (bgworker). 
On a cluster of n nodes that performs only writing transactions, at an 
average rate of m concurrent transactions per node, you ideally end up 
having m normal backends and (n-1) * m bgworkers that concurrently apply 
the remote transactions (e.g. for n = 3 and m = 10: 10 normal backends 
plus 20 bgworkers on each node).



I remember you mentioning
sync/async/eager/other replication strategies previously - do you have
a pointer to some good reading on that topic?


Postgres-R is mainly eager multi-master replication. www.postgres-r.org 
has some links; the most up-to-date is my concept paper:

http://www.postgres-r.org/downloads/concept.pdf


That seems like it would be useful, too.


Okay, will try to come up with something, soon(ish).

Thank you for your feedback and constructive criticism.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Magnus Hagander
On Fri, Sep 17, 2010 at 20:37, Magnus Hagander  wrote:
> On Fri, Sep 17, 2010 at 20:32, Tom Lane  wrote:
>> I wrote:
>>> Something in the rsync process thinks that ignoring subdirectories
>>> named "core" is a good idea.  I'm a bit surprised nobody ever noticed
>>> these were missing from anoncvs before ...
>>
>> That's because they aren't.  It's the -C switch in your rsync call
>> that's at fault.  (And this demonstrates why comparing against checkouts
>> from the same CVS repo isn't an end-to-end test :-()
>
> Yeah, I noticed that too. Re-running now.

Ok, re-run off a correct rsync pushed. How does it look now?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE extensions

2010-09-17 Thread Peter Eisentraut
On fre, 2010-09-17 at 18:15 +0900, KaiGai Kohei wrote:
> * At the ATPrepAddColumn(), it seems to me someone added a check
>   to prevent adding a new column to a typed table, as you try to
>   do in this patch.

Good catch.  Redundant checks removed.

> * At the ATPrepAlterColumnType(), you enclosed an existing code
>   block by "if (tab->relkind == RELKIND_RELATION) { ... }", but
>   it is not indented to the appropriate level.

Yeah, just to keep the patch small. ;-)

> * RENAME ATTRIBUTE ... TO ...
> 
>   Even if the composite type to be altered is in use, we can alter
>   the name of an attribute. Is that intended?

No.  Added a check for it now.

> BTW, is there any requirement from SQL standard about behavior
> when we try to add/drop an attribute of composite type in use?
> This patch always prohibit it, using find_typed_table_dependencies()
> and find_composite_type_dependencies().
> However, it seems to me not difficult to alter columns of typed
> tables subsequently with this ALTER TYPE, although it might be
> not easy to alter definitions of embedded composite type already
> in use.
> Of course, it may be our future works. If so, it's good.

The prohibition on altering types that are used in typed tables is
actually from the SQL standard.  But for now it's just because it's not
implemented; I plan to work on extending that later.

The restriction by find_composite_type_dependencies() was already there
for altering tables, and I just kept it the same for now.


New patch attached.

diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index e2fec32..ade300f 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -23,9 +23,17 @@ PostgreSQL documentation
 
  
 
-ALTER TYPE name RENAME TO new_name
+ALTER TYPE name action [, ... ]
 ALTER TYPE name OWNER TO new_owner 
+ALTER TYPE name RENAME ATTRIBUTE attribute_name TO new_attribute_name
+ALTER TYPE name RENAME TO new_name
 ALTER TYPE name SET SCHEMA new_schema
+
+where action is one of:
+
+ADD ATTRIBUTE attribute_name data_type
+DROP ATTRIBUTE [ IF EXISTS ] attribute_name
+ALTER ATTRIBUTE attribute_name [ SET DATA ] TYPE data_type
 
  
 
@@ -34,6 +42,76 @@ ALTER TYPE name SET SCHEMA 
ALTER TYPE changes the definition of an existing type.
+   There are several subforms:
+
+  
+   
+ADD ATTRIBUTE
+
+ 
+  This form adds a new attribute to a composite type, using the same syntax as
+  .
+ 
+
+   
+
+   
+DROP ATTRIBUTE [ IF EXISTS ]
+
+ 
+  This form drops an attribute from a composite type.
+  If IF EXISTS is specified and the attribute
+  does not exist, no error is thrown. In this case a notice
+  is issued instead.
+ 
+
+   
+
+   
+SET DATA TYPE
+
+ 
+  This form changes the type of an attribute of a composite type.
+ 
+
+   
+
+   
+OWNER
+
+ 
+  This form changes the owner of the type.
+ 
+
+   
+
+   
+RENAME
+
+ 
+  This form changes the name of the type or the name of an
+  individual attribute of a composite type.
+ 
+
+   
+
+   
+SET SCHEMA
+
+ 
+  This form moves the type into another schema.
+ 
+
+   
+  
+  
+
+  
+   The ADD ATTRIBUTE, DROP
+   ATTRIBUTE, and ALTER ATTRIBUTE actions
+   can be combined into a list of multiple alterations to apply in
+   parallel.  For example, it is possible to add several attributes
+   and/or alter the type of several attributes in a single command.
   
 
   
@@ -90,6 +168,34 @@ ALTER TYPE name SET SCHEMA 
  
 
+ 
+  attribute_name
+  
+   
+The name of the attribute to add, alter, or drop.
+   
+  
+ 
+
+ 
+  new_attribute_name
+  
+   
The new name of the attribute being renamed.
+   
+  
+ 
+
+ 
+  data_type
+  
+   
+The data type of the attribute to add, or the new type of the
+attribute to alter.
+   
+  
+ 
+
 

   
@@ -119,14 +225,21 @@ ALTER TYPE email OWNER TO joe;
 ALTER TYPE email SET SCHEMA customers;
 
   
+
+  
+   To add a new attribute to a type:
+
+ALTER TYPE compfoo ADD ATTRIBUTE f3 int;
+
+  
  
 
  
   Compatibility
 
   
-   There is no ALTER TYPE statement in the SQL
-   standard.
+   The variants to add and drop attributes are part of the SQL
+   standard; the other variants are PostgreSQL extensions.
   
  
 
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 788a4db..9da2690 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -89,6 +89,7 @@ ExecRenameStmt(RenameStmt *stmt)
 		case OBJECT_VIEW:
 		case OBJECT_INDEX:
 		case OBJECT_COLUMN:
+		case OBJECT_ATTRIBUTE:
 		case OBJECT_TRIGGER:
 			{
 Oid			relid;
@@ -123,6 +124,7 @@ ExecRenameStmt(RenameStmt *stmt)
 			break;
 		}
 	case OBJECT_COLUMN:
+	case OBJECT_ATTRIBUTE:
 		renameatt(relid,
 

Re: [HACKERS] Update comment for README.HOT

2010-09-17 Thread Tom Lane
Bruce Momjian  writes:
> + This means that UPDATE, DELETE, and SELECT can trigger space
> + reclamation, while INSERT ... VALUES cannot because it does not retrieve
> + a row.

I don't believe that's correct.  It might have happened to work that way
for you in a particular test.  It's certainly not something I'd document
as being intended long-term behavior.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Update comment for README.HOT

2010-09-17 Thread Thom Brown
On 17 September 2010 20:52, Bruce Momjian  wrote:
> I would like to apply the attached patch to README.HOT so clarify when
> single-page cleanup happens, e.g. not during INSERT.
>

"... when the page is nearly full (<10%) ..."

Shouldn't that be >90%?

"... while INSERT ... VALUES cannot because it does not retrieve a row."

Is this still true when it's used in conjunction with RETURNING?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Update comment for README.HOT

2010-09-17 Thread Bruce Momjian
I would like to apply the attached patch to README.HOT so clarify when
single-page cleanup happens, e.g. not during INSERT.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
Index: src/backend/access/heap/README.HOT
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/heap/README.HOT,v
retrieving revision 1.6
diff -c -c -r1.6 README.HOT
*** src/backend/access/heap/README.HOT	23 Apr 2010 23:21:44 -0000	1.6
--- src/backend/access/heap/README.HOT	17 Sep 2010 19:48:16 -0000
***************
*** 246,251 ****
--- 246,257 ----
  is arbitrarily capped at MaxHeapTuplesPerPage (the most tuples that
  could fit without HOT pruning).
  
+ Effectively, space reclamation happens during tuple retrieval when the
+ page is nearly full (<10%) and a buffer cleanup lock can be acquired. 
+ This means that UPDATE, DELETE, and SELECT can trigger space
+ reclamation, while INSERT ... VALUES cannot because it does not retrieve
+ a row.
+ 
  
  VACUUM
  --

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Dimitri Fontaine
Simon Riggs  writes:
> On Fri, 2010-09-17 at 21:20 +0900, Fujii Masao wrote:
>> What synchronization level does each combination of sync_replication
>> and sync_replication_service lead to?
>
> There are only 4 possible outcomes. There is no combination, so we don't
> need a table like that above.
>
> The "service" specifies the highest request type available from that
> specific standby. If someone requests a higher service than is currently
> offered by this standby, they will either 
> a) get that service from another standby that does offer that level
> b) automatically downgrade the sync rep mode to the highest available.

I like the a) part; I can't say the same about the b) part. There's no
reason to accept a COMMIT of a transaction when the requested durability
is known not to have been reached, unless the user said so.

> For example, if you request recv but there is only one standby and it
> only offers async, then you get downgraded to async.

If you so choose, but with a net slowdown, as you're now reaching the
timeout for each transaction, with what I have in mind, and I don't see
how you can avoid that. Even if you set up the replication from the
master, you still can mess it up the same way, right?

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Dimitri Fontaine
Simon Riggs  writes:
> On Fri, 2010-09-17 at 21:20 +0900, Fujii Masao wrote:
>> According to what I heard, some people want to guarantee that all the
>> transactions are *always* written in *all* the synchronous standbys.
>
> You don't need standby registration at all. You can do that with a
> single parameter, already proposed:
>
> quorum_commit = N.

I think you also need another parameter to control the behavior upon
timeout. You received fewer than N votes, now what? Your current idea
seems to be COMMIT, Aidan says ROLLBACK, and I say that's to be a GUC
set at the transaction level.

As far as registration goes, I see no harm in having the master maintain
a list of known standby systems; it's just maintaining that list from
the master that I don't understand the use case for.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
Magnus Hagander  writes:
> Will not doing the backbranches make it harder to backport patches?
> Probably shouldn't, unless you're changing the very first line of the
> file, right?

The $PostgreSQL$ lines haven't been a backporting problem in the past,
so I don't see why they'd be one now.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Magnus Hagander
On Fri, Sep 17, 2010 at 20:49, Tom Lane  wrote:
> I looked a bit more at your pggit_migrate stuff.  I'm not terribly happy
> with the proposed clean_keywords.pl script.  I'd like it to reduce the
> $PostgreSQL$ thingies to the full pathname of the file, rather than
> try to remove all trace of them, eg
>  *        $PostgreSQL: pgsql/src/port/unsetenv.c,v 1.12 2010/09/07 14:10:30 
> momjian Exp $
> becomes
>  *        src/port/unsetenv.c
>
> This would then be followed up by moving those pathname comments to
> somewhere more sensible.  I don't think that part can be managed with
> a script like this, but leaving the data in place will make it easier
> to do the moving.  Some places, like the .sgml files, won't need any
> additional changing to get to where I would like to be.
>
> Also, I'd be inclined to make these changes only in master, not in the
> back branches.  We don't for example run pg_indent against back branches.

We discussed that before, and the consensus then was to do that, and
backport it to *active* backbranches. Just to not remove the lines,
but replace them with an empty one so that line numbers would stay the
same.

But it's not too late to change that again... Shouldn't be too hard to
change the script to change it the way you suggest, and it's
*certainly* not hard to restrict it to just to the master branch...

Will not doing the backbranches make it harder to backport patches?
Probably shouldn't, unless you're changing the very first line of the
file, right?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Andrew Dunstan



On 09/17/2010 02:37 PM, Magnus Hagander wrote:

On Fri, Sep 17, 2010 at 20:32, Tom Lane  wrote:

I wrote:

Something in the rsync process thinks that ignoring subdirectories
named "core" is a good idea.  I'm a bit surprised nobody ever noticed
these were missing from anoncvs before ...

That's because they aren't.  It's the -C switch in your rsync call
that's at fault.  (And this demonstrates why comparing against checkouts
from the same CVS repo isn't an end-to-end test :-()

Yeah, I noticed that too. Re-running now.

FWIW, I got the command off the wiki's recommendations for how to use rsync




This is what I have been using for a very long time:

   rsync -avzH --delete --exclude-from=/home/cvsmirror/pg-exclude
   anoncvs.postgresql.org::pgsql-cvs /home/cvsmirror/pg


The exclude file contains:

   /sup/
   /CVSROOT/loginfo*
   /CVSROOT/commitinfo*
   /CVSROOT/config*
   /CVSROOT/passwd
   /CVSROOT/history


cheers

andrew


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
I looked a bit more at your pggit_migrate stuff.  I'm not terribly happy
with the proposed clean_keywords.pl script.  I'd like it to reduce the
$PostgreSQL$ thingies to the full pathname of the file, rather than
try to remove all trace of them, eg
 *$PostgreSQL: pgsql/src/port/unsetenv.c,v 1.12 2010/09/07 14:10:30 
momjian Exp $
becomes
 *src/port/unsetenv.c

This would then be followed up by moving those pathname comments to
somewhere more sensible.  I don't think that part can be managed with
a script like this, but leaving the data in place will make it easier
to do the moving.  Some places, like the .sgml files, won't need any
additional changing to get to where I would like to be.
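
A rough sketch of that reduction (GNU sed assumed; illustrative only,
not the actual clean_keywords.pl script under discussion):

    find . -type f -print0 | \
        xargs -0 sed -i 's|\$PostgreSQL: pgsql/\([^,]*\),v [^$]*\$|\1|'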

Also, I'd be inclined to make these changes only in master, not in the
back branches.  We don't for example run pg_indent against back branches.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Magnus Hagander
On Fri, Sep 17, 2010 at 20:32, Tom Lane  wrote:
> I wrote:
>> Something in the rsync process thinks that ignoring subdirectories
>> named "core" is a good idea.  I'm a bit surprised nobody ever noticed
>> these were missing from anoncvs before ...
>
> That's because they aren't.  It's the -C switch in your rsync call
> that's at fault.  (And this demonstrates why comparing against checkouts
> from the same CVS repo isn't an end-to-end test :-()

Yeah, I noticed that too. Re-running now.

FWIW, I got the command off the wiki's recommendations for how to use rsync


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
I wrote:
> Something in the rsync process thinks that ignoring subdirectories
> named "core" is a good idea.  I'm a bit surprised nobody ever noticed
> these were missing from anoncvs before ...

That's because they aren't.  It's the -C switch in your rsync call
that's at fault.  (And this demonstrates why comparing against checkouts
from the same CVS repo isn't an end-to-end test :-()

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Magnus Hagander
On Fri, Sep 17, 2010 at 20:24, Tom Lane  wrote:
> I wrote:
>> Hmm.  I didn't try rsync'ing from anoncvs ... I logged into the master
>> and tar'd up the /cvsroot directory ;-).  I wonder if there's something
>> wrong with the anoncvs copy of that subdirectory?  Will do the rsync
>> and compare.
>
> Doh:
>
> Only in myrepo/pgsql/contrib/retep/uk/org/retep/xml: core
> Only in myrepo/pgsql/src/interfaces/jdbc/org/postgresql: core
>
> Something in the rsync process thinks that ignoring subdirectories
> named "core" is a good idea.  I'm a bit surprised nobody ever noticed
> these were missing from anoncvs before ...

Oh FFS..

It's the addition of -C to the rsync command line. And it's not added
when syncing to anoncvs - it's there on anoncvs.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
I wrote:
> Hmm.  I didn't try rsync'ing from anoncvs ... I logged into the master
> and tar'd up the /cvsroot directory ;-).  I wonder if there's something
> wrong with the anoncvs copy of that subdirectory?  Will do the rsync
> and compare.

Doh:

Only in myrepo/pgsql/contrib/retep/uk/org/retep/xml: core
Only in myrepo/pgsql/src/interfaces/jdbc/org/postgresql: core

Something in the rsync process thinks that ignoring subdirectories
named "core" is a good idea.  I'm a bit surprised nobody ever noticed
these were missing from anoncvs before ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
Magnus Hagander  writes:
> Just to confirm, you ran your patch against current cvs, right? So you
> also got the "hunk succeeded at offset 1 line" a whole bunch of times?
> Then it's not that that's broken.

Right, the patch still applies fine, it's just off by a line or so in
many places (probably because of the new REL9_0_0 tags).

> The script I've run is on http://github.com/mhagander/pggit_migrate -
> it's the "migrate_cvs.sh" script. The repository_fixups script is a
> direct import of yours except I added a "set -e" at the start.

Hmm.  I didn't try rsync'ing from anoncvs ... I logged into the master
and tar'd up the /cvsroot directory ;-).  I wonder if there's something
wrong with the anoncvs copy of that subdirectory?  Will do the rsync
and compare.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Make CLUSTER VERBOSE more verbose

2010-09-17 Thread Abhijit Menon-Sen
At 2010-09-16 21:22:54 +0900, itagaki.takah...@gmail.com wrote:
>
> Please read the thread. The patch is intended to be applied after
> "sequence scan + sort for CLUSTER" patch.

Sorry. I missed that. The patch looks fine, then.

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Magnus Hagander
On Fri, Sep 17, 2010 at 19:20, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Fri, Sep 17, 2010 at 18:28, Robert Haas  wrote:
>>> Sorry, I must not have cleaned out the old state properly.
>
> Turns out I did the same thing from my box to the repo on git.postgresql.org.
>
>> So I've now wiped that repository and re-pushed mine. Can you give it
>> another check? (from a fresh clone)
>
> I re-cloned but got the exact same state as before --- jdbc still wonky.

That's weird.

Just to confirm, you ran your patch against current cvs, right? So you
also got the "hunk succeeded at offset 1 line" a whole bunch of times?
Then it's not that that's broken.

And I'm on cvs2git revision 5270.

The script I've run is on http://github.com/mhagander/pggit_migrate -
it's the "migrate_cvs.sh" script. The repository_fixups script is a
direct import of yours except I added a "set -e" at the start.

(the version pushed hasn't had the git gc step run, but that's the
only one that differs)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system catalogs page

2010-09-17 Thread Tom Lane
Robert Haas  writes:
> I was just noticing that, on this page here:
> http://www.postgresql.org/docs/9.0/static/catalogs.html

> ...all of the catalogs are in alphabetical order, except for
> pg_db_role_setting, which is filed in the place in alphabetical order
> where it would belong if the name were spelled pg_setting.

> Is there any reason I shouldn't go fix this?

Sounds like a mistake to me ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system catalogs page

2010-09-17 Thread Thom Brown
On 17 September 2010 18:17, Robert Haas  wrote:
> I was just noticing that, on this page here:
>
> http://www.postgresql.org/docs/9.0/static/catalogs.html
>
> ...all of the catalogs are in alphabetical order, except for
> pg_db_role_setting, which is filed in the place in alphabetical order
> where it would belong if the name were spelled pg_setting.

or pg_role_setting

> Is there any reason I shouldn't go fix this?

Looks wrong to me, so I'd say should be fixed.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
Magnus Hagander  writes:
> On Fri, Sep 17, 2010 at 18:28, Robert Haas  wrote:
>> Sorry, I must not have cleaned out the old state properly.

> Turns out I did the same thing from my box to the repo on git.postgresql.org.

> So I've now wiped that repository and re-pushed mine. Can you give it
> another check? (from a fresh clone)

I re-cloned but got the exact same state as before --- jdbc still wonky.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] system catalogs page

2010-09-17 Thread Robert Haas
I was just noticing that, on this page here:

http://www.postgresql.org/docs/9.0/static/catalogs.html

...all of the catalogs are in alphabetical order, except for
pg_db_role_setting, which is filed in the place in alphabetical order
where it would belong if the name were spelled pg_setting.

Is there any reason I shouldn't go fix this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Magnus Hagander
On Fri, Sep 17, 2010 at 18:28, Robert Haas  wrote:
> On Fri, Sep 17, 2010 at 11:55 AM, Robert Haas  wrote:
>> On Fri, Sep 17, 2010 at 11:39 AM, Tom Lane  wrote:
>>> Robert Haas  writes:
 Magnus posted an updated conversion this morning.
>>>
 http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary
>>>
 Evidently, however, he didn't do the same things you did, because
 there are DEFINITELY more than 9 manufactured commits in this one.
>>>
>>> Um ... I just did
>>>        git clone git://git.postgresql.org/git/postgresql-migration.git
>>> and I only see nine.  It's got some *other* problems though; compared
>>> to a conversion I just finished locally, it's missing a whole lot of
>>> history for some of the old jdbc files.
>>>
>>> Is there any possibility that "git clone" isn't very trustworthy?
>>> It's a bit scary that we don't see identical views of this repository.
>>
>> *scratches head*
>>
>> I did a git-fetch into an existing copy of the old contents of that
>> repository, rather than a fresh clone.  Let me nuke it and start over.
>
> OK, the fresh clone does in fact show just 9 manufactured commits.
> Sorry, I must not have cleaned out the old state properly.

Turns out I did the same thing from my box to the repo on git.postgresql.org.

So I've now wiped that repository and re-pushed mine. Can you give it
another check? (from a fresh clone)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
Robert Haas  writes:
>> On Fri, Sep 17, 2010 at 11:39 AM, Tom Lane  wrote:
>>> and I only see nine.  It's got some *other* problems though; compared
>>> to a conversion I just finished locally, it's missing a whole lot of
>>> history for some of the old jdbc files.

> What's the problem with the old JDBC files?

This is what I sent to Magnus off-list:

There seems to be something wrong with this :-(.  I pulled this down
and compared the output of "git log --all --source --name-status"
to what I got from a fresh conversion of my own.  There seems to be
a substantial loss of history around some of the jdbc files.  See
attached diffs from my log to yours.
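
For reference, the comparison amounts to roughly the following, run once
in each repository (a sketch reconstructed from the diff header below):

    git log --all --source --name-status > revhist.git17   # my conversion
    git log --all --source --name-status > revhist.gitmha  # Magnus's
    diff -c revhist.git17 revhist.gitmha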

regards, tom lane


*** revhist.git17   Fri Sep 17 10:44:50 2010
--- revhist.gitmha  Fri Sep 17 11:24:03 2010
***************
*** 236673,236679 ****
  
  From Donald Fraser.
  
- M src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java
  M src/interfaces/jdbc/org/postgresql/fastpath/Fastpath.java
  M src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Connection.java
  
--- 236673,236678 ----
***************
*** 249776,249786 ****
jdbc/org/postgresql/test/jdbc2/ServerCursorTest.java
  
  M src/interfaces/jdbc/org/postgresql/Driver.java.in
- M src/interfaces/jdbc/org/postgresql/core/BaseResultSet.java
- M src/interfaces/jdbc/org/postgresql/core/BaseStatement.java
- M src/interfaces/jdbc/org/postgresql/core/Field.java
- M src/interfaces/jdbc/org/postgresql/core/PGStream.java
- M src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java
  M 
src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1DatabaseMetaData.java
  M src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1ResultSet.java
  M src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java
--- 249775,249780 ----
***************
*** 254896,254902 ****
  
  Per report from Hans Nather.
  
- M src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java
  M src/interfaces/jdbc/org/postgresql/fastpath/Fastpath.java
  M src/interfaces/jdbc/org/postgresql/test/jdbc2/Jdbc2TestSuite.java
  A src/interfaces/jdbc/org/postgresql/test/jdbc2/NotifyTest.java
--- 254890,254895 ----
***************
*** 256307,256321 ****
  D src/interfaces/jdbc/org/postgresql/PGNotification.java
  D src/interfaces/jdbc/org/postgresql/PGRefCursorResultSet.java
  D src/interfaces/jdbc/org/postgresql/PGStatement.java
- D src/interfaces/jdbc/org/postgresql/core/BaseConnection.java
- D src/interfaces/jdbc/org/postgresql/core/BaseResultSet.java
- D src/interfaces/jdbc/org/postgresql/core/BaseStatement.java
- D src/interfaces/jdbc/org/postgresql/core/Encoding.java
- D src/interfaces/jdbc/org/postgresql/core/Field.java
- D src/interfaces/jdbc/org/postgresql/core/Notification.java
- D src/interfaces/jdbc/org/postgresql/core/PGStream.java
- D src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java
- D src/interfaces/jdbc/org/postgresql/core/StartupPacket.java
  D src/interfaces/jdbc/org/postgresql/errors.properties
  D src/interfaces/jdbc/org/postgresql/errors_de.properties
  D src/interfaces/jdbc/org/postgresql/errors_fr.properties
--- 256300,256305 ----
***************
*** 261723,261729 ****
  M src/interfaces/jdbc/example/corba/StockServer.java
  M src/interfaces/jdbc/example/corba/stock.idl
  M src/interfaces/jdbc/example/corba/stock.sql
- M src/interfaces/jdbc/org/postgresql/core/StartupPacket.java
  M src/interfaces/jdbc/org/postgresql/test/jdbc2/BlobTest.java
  M src/interfaces/jdbc/org/postgresql/test/jdbc2/ConnectionTest.java
  M src/interfaces/jdbc/org/postgresql/test/jdbc2/DatabaseMetaDataTest.java
--- 261707,261712 ----
***************
*** 262635,262648 ****
  M src/interfaces/jdbc/org/postgresql/PGNotification.java
  M src/interfaces/jdbc/org/postgresql/PGRefCursorResultSet.java
  M src/interfaces/jdbc/org/postgresql/PGStatement.java
- M src/interfaces/jdbc/org/postgresql/core/BaseConnection.java
- M src/interfaces/jdbc/org/postgresql/core/BaseResultSet.java
- M src/interfaces/jdbc/org/postgresql/core/BaseStatement.java
- M src/interfaces/jdbc/org/postgresql/core/Encoding.java
- M src/interfaces/jdbc/org/postgresql/core/Field.java
- M src/interfaces/jdbc/org/postgresql/core/Notification.java
- M src/interfaces/jdbc/org/postgresql/core/PGStream.java
- M src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java
  M src/interfaces/jdbc/org/postgresql/errors_de.properties
  M src/interfaces/jdbc/org/postgresql/fastpath/Fastpath.java
  M src/interfaces/jdbc/org/postgresql/fastpath/FastpathArg.java
--- 262618,262623 ----
***************
*** 266110,266117 ****
  
  Patches from Oliver Jowett to fix CursorFetchTest, 7.4 now does not 
automatically delete cursors
  
- M src/interfaces/jdbc/org/postgresql/core/BaseConnection.java
- M src/interfaces/j

Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 11:55 AM, Robert Haas  wrote:
> On Fri, Sep 17, 2010 at 11:39 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Magnus posted an updated conversion this morning.
>>
>>> http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary
>>
>>> Evidently, however, he didn't do the same things you did, because
>>> there are DEFINITELY more than 9 manufactured commits in this one.
>>
>> Um ... I just did
>>        git clone git://git.postgresql.org/git/postgresql-migration.git
>> and I only see nine.  It's got some *other* problems though; compared
>> to a conversion I just finished locally, it's missing a whole lot of
>> history for some of the old jdbc files.
>>
>> Is there any possibility that "git clone" isn't very trustworthy?
>> It's a bit scary that we don't see identical views of this repository.
>
> *scratches head*
>
> I did a git-fetch into an existing copy of the old contents of that
> repository, rather than a fresh clone.  Let me nuke it and start over.

OK, the fresh clone does in fact show just 9 manufactured commits.
Sorry, I must not have cleaned out the old state properly.

What's the problem with the old JDBC files?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] CVS to Git conversion: second try Monday

2010-09-17 Thread Tom Lane
We will take another run at converting our CVS repository to Git on
Monday (the 20th).  Hopefully this won't interfere too much with
the commitfest, since most work will still be on reviewing rather than
actual committing at that point.

As before, there will be a "quiet time" to let interested people capture
final copies of the CVS repository before access is turned off.  Please
do not make any commits into CVS after 1400 UTC (10AM EDT, 7AM PDT) on
Monday.  ssh access to the master repository will be turned off at 1600
UTC or shortly after.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Jesper Krogh

On 2010-09-17 10:09, Heikki Linnakangas wrote:

 I think it makes most sense to set sync vs. async in the master, and
 the level of synchronicity in the slave. Although I have sympathy
 for the argument that it's simpler if you configure it all from the
 master side as well.


Just a comment as a sysadmin: it would be hugely beneficial if the
master and slaves were all able to run from the "exact same" configuration
file. This would remove any doubt about the configuration of the
"complete cluster" when debugging. A slave would be able to just "copy"
over the master's configuration, etc. etc.

I don't know if it is doable or has any huge downsides.

--
Jesper


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Andrew Dunstan



On 09/17/2010 11:39 AM, Tom Lane wrote:

  Is there any possibility that "git clone" isn't very trustworthy?
It's a bit scary that we don't see identical views of this repository.


I should have thought that very unlikely.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 16:09 +0300, Heikki Linnakangas wrote:

> >> I don't expect any meaningful differences in terms of performance
> >> between any of the discussed options. The big question right now is...
> >
> > This is the critical point. Politely, I would observe that *You* do not
> > think there is a meaningful difference. *I* do, and evidence suggests
> > that both Oracle and DRBD think so too. So we differ on what the "big
> > question" is here.
> 
> We must be talking about different things again. There's certainly big 
> differences in the different synchronization levels and configurations, 
> but I don't expect there to be big performance differences between 
> patches to implement those levels. Once we got rid of the polling loops, 
> I expect the network and disk latencies to dominate.

So IIUC you seem to agree with
* 4 levels of synchronous replication (specified on master)
* transaction-controlled replication from the master
* sending 3 LSN values back from standby

Well, then that pretty much is my patch, except for the parameter UI.
Did I misunderstand?

We also agree that we need a standby to master protocol change; I used
Zoltan's directly and I've had zero problems with it in testing.

The only disagreement has been about
 
* the need for standby registration (I understand "want")
which seems to boil down to whether we wait for servers that *ought* to
be there, but currently aren't.

* whether to have wal writer active (I'm happy to add that later in this
release, so we get the "recv" option also)

* whether we have a parameter for quorum commit > 1 (happy to add later)
Not sure if there is debate about whether quorum_commit = 1 is the
default.

* whether we provide replication_exceptions as a core feature or as a
plugin

The only area of doubt is when we send replies, which you haven't
thought about yet. So presumably you've no design-level objection to
what I've proposed.

Things we all seem to like are

* different standbys can offer different sync levels

* standby names

* a set returning function which tells you current LSNs of all standbys

* the rough idea of being able to specify a "service" and have that
equate to a more complex config underneath the covers, without needing
to have the application know the details - I think we need more details
on that before we could say "we agree".

So seems like a good day's work.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 11:39 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Magnus posted an updated conversion this morning.
>
>> http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary
>
>> Evidently, however, he didn't do the same things you did, because
>> there are DEFINITELY more than 9 manufactured commits in this one.
>
> Um ... I just did
>        git clone git://git.postgresql.org/git/postgresql-migration.git
> and I only see nine.  It's got some *other* problems though; compared
> to a conversion I just finished locally, it's missing a whole lot of
> history for some of the old jdbc files.
>
> Is there any possibility that "git clone" isn't very trustworthy?
> It's a bit scary that we don't see identical views of this repository.

*scratches head*

I did a git-fetch into an existing copy of the old contents of that
repository, rather than a fresh clone.  Let me nuke it and start over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bg worker: general purpose requirements

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 11:29 AM, Markus Wanner  wrote:
> autonomous transactions: max. one per normal backend (correct?), way fewer
> should suffice in most cases, only control data to be passed around

Technically, you could start an autonomous transaction from within an
autonomous transaction, so I don't think there's a hard maximum of one
per normal backend.  However, I agree that the expected case is to not
have very many.

> All of the potential users of bgworkers benefit from a pre-connected
> bgworker. Meaning having at least one spare bgworker around per database
> could be beneficial, potentially more depending on how often spike loads
> occur. As long as there are only few databases, it's easily possible to have
> at least one spare process around per database, but with thousands of
> databases, that might get prohibitively expensive (not sure where the
> boundary between win vs. lose is, though. Idle backends vs. connection
> cost).

I guess it depends on what your goals are.  If you're optimizing for
ability to respond quickly to a sudden load, keeping idle backends
will probably win even when the number of them you're keeping around
is fairly high.  If you're optimizing for minimal overall resource
consumption, though, you'll not be as happy about that.  What I'm
struggling to understand is this: if there aren't any preforked
workers around when the load hits, how much does it slow things down?
I would have thought that a few seconds to ramp up to speed after an
extended idle period (5 minutes, say) would be acceptable for most of
the applications you mention.  Is the ramp-up time longer than that,
or is even that much delay unacceptable for Postgres-R, or is there
some other aspect to the problem I'm failing to grasp?  I can tell you
have some experience tuning this so I'd like to try to understand
where you're coming from.

> However, I feel like this gives less control over how the bgworkers are
> used. For example, I'd prefer to be able to prevent the system from
> allocating all bgworkers to a single database at once.

I think this is an interesting example, and worth some further
thought.  I guess I don't really understand how Postgres-R uses these
bgworkers.  Are you replicating one transaction at a time, or how does
the data get sliced up?  I remember you mentioning
sync/async/eager/other replication strategies previously - do you have
a pointer to some good reading on that topic?

> Hope that sheds some more light on how bgworkers could be useful. Maybe I
> just need to describe the job handling features of the coordinator better as
> well? (Simon also requested better documentation...)

That seems like it would be useful, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 11:30 -0400, Aidan Van Dyk wrote:
> * Robert Haas  [100917 11:24]:
> > On Fri, Sep 17, 2010 at 11:22 AM, Simon Riggs  wrote:
> > > On Fri, 2010-09-17 at 09:36 -0400, Aidan Van Dyk wrote:
> > >
> > >> I want to have them configured in a fsync WAL/style sync rep, I want to
> > >> make sure that if the master comes up first after I get power back, it's
> > >> not going to be claiming transactions are committed while the slave
> > >> (which happens to have 4x the disks because it keeps PITR backups for a
> > >> period too) is still chugging away on SCSI probes, not having gotten
> > >> PostgreSQL up yet...
> > >
> > > Nobody has mentioned the ability to persist the not-committed state
> > > across a crash before, and I think it's an important discussion point.
> > 
> > Eh?  I think all Aidan is asking for is the ability to have a mode
> > where sync rep is really always sync, or nothing commits.  Rather than
> > timing out and continuing merrily on its way...
> 
> Right, I'm not asking for a "new" mode.  I just hope that there will
> be a way to guarantee my "sync rep" is actually replicating.  Having it
> "not replicate" simply because no slave has (yet) connected means I have
> to dance jigs around pg_hba.conf so that it won't allow non-replication
> connections until I've manually verified that the replication slave
> is connected...

I agree that aspect is a problem.

One solution, to me, would be to have a directive included in the
pg_hba.conf that says entries below it are only allowed if it passes the
test. So your hba file looks like this

local   postgres        postgres
host    replication     ...
need    replication
host    any             any

So the "need" test is an extra option in the first column. We might want
additional "need" tests before we allow other rules also. Text following
the "need" verb will be additional info for that test, sufficient to
allow some kind of execution on the backend.

I definitely don't like the idea that anyone who commits will just sit
there waiting until the standby comes up. That just sounds like an insane
way of doing it.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Tom Lane
Robert Haas  writes:
> Magnus posted an updated conversion this morning.

> http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary

> Evidently, however, he didn't do the same things you did, because
> there are DEFINITELY more than 9 manufactured commits in this one.

Um ... I just did
git clone git://git.postgresql.org/git/postgresql-migration.git
and I only see nine.  It's got some *other* problems though; compared
to a conversion I just finished locally, it's missing a whole lot of
history for some of the old jdbc files.

Is there any possibility that "git clone" isn't very trustworthy?
It's a bit scary that we don't see identical views of this repository.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Aidan Van Dyk
* Robert Haas  [100917 11:24]:
> On Fri, Sep 17, 2010 at 11:22 AM, Simon Riggs  wrote:
> > On Fri, 2010-09-17 at 09:36 -0400, Aidan Van Dyk wrote:
> >
> >> I want to have them configured in a fsync WAL/style sync rep, I want to
> >> make sure that if the master comes up first after I get power back, it's
> >> not going to be claiming transactions are committed while the slave
> >> (which happens to have 4x the disks because it keeps PITR backups for a
> >> period too) is still chugging away on SCSI probes, not having gotten
> >> PostgreSQL up yet...
> >
> > Nobody has mentioned the ability to persist the not-committed state
> > across a crash before, and I think it's an important discussion point.
> 
> Eh?  I think all Aidan is asking for is the ability to have a mode
> where sync rep is really always sync, or nothing commits.  Rather than
> timing out and continuing merrily on its way...

Right, I'm not asking for a "new" mode.  I just hope that there will
be a way to guarantee my "sync rep" is actually replicating.  Having it
"not replicate" simply because no slave has (yet) connected means I have
to dance jigs around pg_hba.conf so that it won't allow non-replication
connections until I've manually verified that the replication slave
is connected...

a.
-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] bg worker: general purpose requirements

2010-09-17 Thread Markus Wanner

Hi,

On 09/16/2010 07:47 PM, Robert Haas wrote:

It would be nice if there were a way to create
a general facility here that we could then build various applications
on, but I'm not sure whether that's the case.  We had some
back-and-forth about what is best for replication vs. what is best for
vacuum vs. what is best for parallel query.  If we could somehow
conceive of a system that could serve all of those needs without
introducing any more configuration complexity than what we have now,
that would of course be very interesting.


Lets think about this again from a little distance. We have the existing 
autovacuum and the Postgres-R project. Then there are the potential 
features 'parallel querying' and 'autonomous transactions' that could in 
principle benefit from the bgworker infrastructure.


For all of those, one could head for a multi-threaded, a multi-process 
or an async, event based approach. Multi-threading seems to be out of 
the question for Postgres. We don't have much of an async event framework 
anywhere, so at least for parallel querying that seems out of the 
question as well. Only the 'autonomous transactions' feature seems simple 
enough to be doable within a single process. That approach would still 
miss the isolation that a separate process provides (not sure that's 
required, but 'autonomous' sounds like it could be a good thing to have).


So assuming we use the multi-process approach provided by bgworkers for 
both potential features. What are the requirements?


autovacuum: only very few jobs at a time, not very resource intensive, 
not passing around lots of data


Postgres-R: lots of concurrent jobs, easily more than normal backends, 
depending on the amount of nodes in the cluster and read/write ratio, 
lots of data to be passed around


parallel querying: a couple dozen concurrent jobs (by number of CPUs or 
spindles available?), more doesn't help, lots of data to be passed around


autonomous transactions: max. one per normal backend (correct?), way 
fewer should suffice in most cases, only control data to be passed around



So, for both potential features as well as for autovacuum, a ratio of 
1:10 (or even less) for max_bgworkers:max_connections would suffice. 
Postgres-R clearly seems to be the outlier here. It needs special 
configuration anyway, so I'd have no problem with defaults that target 
the other use cases.


All of the potential users of bgworkers benefit from a pre-connected 
bgworker. Meaning having at least one spare bgworker around per database 
could be beneficial, potentially more depending on how often spike loads 
occur. As long as there are only few databases, it's easily possible to 
have at least one spare process around per database, but with thousands 
of databases, that might get prohibitively expensive (not sure where the 
boundary between win vs. lose is, though. Idle backends vs. connection 
cost).


None the less, bgworkers would make the above features easier to 
implement, as they provide the controlled background worker process 
infrastructure, including job handling (and even queuing) in the 
coordinator process. Having spare workers available is not a perquisite 
to use bgworkers, it's just an optimization.


Autovacuum could possibly benefit from bgworkers by enabling a finer 
grained choice for what database and table to vacuum when. I didn't look 
too much into that, though.


Regarding the additional configuration overhead of the bgworkers patch: 
max_autovacuum_workers gets turned into max_background_workers, so the 
only additional GUCs currently are: min_spare_background_workers and 
max_spare_background_workers (sorry, I thought I named them idle 
workers, looks like I've gone with spare workers for the GUCs).
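
For illustration, the resulting configuration surface would look 
something like this (GUC names as described here; the values are 
invented, not recommendations):

    max_background_workers       = 8   # replaces max_autovacuum_workers
    min_spare_background_workers = 1   # idle workers kept per database
    max_spare_background_workers = 4   # cap on idle workers per database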


Those are used to control and limit (in both directions) the amount of 
spare workers (per database). It's the simplest possible variant I could 
think of. But I'm open to other mechanisms, especially ones that require 
less configuration. Simply keeping spare workers around for a given 
timeout *could* be a replacement and would save us one GUC.


However, I feel like this gives less control over how the bgworkers are 
used. For example, I'd prefer to be able to prevent the system from 
allocating all bgworkers to a single database at once. And as mentioned 
above, it also makes sense to pre-fork some bgworkers in advance, if 
there are still enough available. The timeout approach doesn't take care 
of that, but assumes that the past is a good indicator of use for the 
future.


Hope that sheds some more light on how bgworkers could be useful. Maybe 
I just need to describe the job handling features of the coordinator 
better as well? (Simon also requested better documentation...)


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 11:22 AM, Simon Riggs  wrote:
> On Fri, 2010-09-17 at 09:36 -0400, Aidan Van Dyk wrote:
>
>> I want to have them configured in a fsync WAL/style sync rep, I want to
>> make sure that if the master comes up first after I get power back, it's
>> not going to be claiming transactions are committed while the slave
>> (which happens to have 4x the disks because it keeps PITR backups for a
>> period too) is still chugging away on SCSI probes, not having gotten
>> PostgreSQL up yet...
>
> Nobody has mentioned the ability to persist the not-committed state
> across a crash before, and I think it's an important discussion point.

Eh?  I think all Aidan is asking for is the ability to have a mode
where sync rep is really always sync, or nothing commits.  Rather than
timing out and continuing merrily on its way...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 09:36 -0400, Aidan Van Dyk wrote:

> I want to have them configured in an fsync WAL-style sync rep, and I want
> to make sure that if the master comes up first after I get power back, it's
> not going to be claiming transactions are committed while the slave
> (which happens to have 4x the disks because it keeps PITR backups for a
> period too) is still chugging away on SCSI probes and hasn't gotten
> PostgreSQL up yet...

Nobody has mentioned the ability to persist the not-committed state
across a crash before, and I think it's an important discussion point. 

We already have it: it's called two-phase commit (2PC).

If you run 2PC on 3 servers and one goes down, you can just commit the
in-flight transactions and continue. But it doesn't work on hot standby.

It could: if we want that, we could prepare the transaction on the master
and not allow commit until we get positive confirmation from the standby.
All of the machinery is there.
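
For illustration, a minimal SQL sketch of that existing machinery (the
transaction identifier and table are made up, and max_prepared_transactions
must be set above zero for this to work):

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'txn_0001';  -- phase 1: durably prepared on the master
-- ... wait here for positive confirmation from the standby ...
COMMIT PREPARED 'txn_0001';      -- phase 2: make the transaction visible
-- or, if the standby never confirms: ROLLBACK PREPARED 'txn_0001';

The point is only that prepare and commit are already separable, durable
steps; the idea above would drive the second step off the standby's reply.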

I'm not sure if that's a 5th sync rep mode, or whether that idea is
actually good enough to replace all the ideas we've had up until now. I
would say probably not, but we should think about this.

A slightly modified idea would be to avoid writing the transaction
prepare file as a separate file, and just write the WAL for the prepare.
We then remember the LSN of the prepare so we can re-access the WAL copy
of it by re-reading the WAL files on the master. Make sure we don't get
rid of WAL that refers to waiting transactions. That would then give us
the option to commit or abort depending upon whether we receive a reply
within the timeout.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Robert Haas
On Tue, Sep 14, 2010 at 10:19 AM, Tom Lane  wrote:
> Attached is an updated repository.fixups script that inserts dead
> revisions in every case where a new file was back-patched into an
> existing branch.  With that, we are down to a total of nine manufactured
> commits, to wit:
> [details]

Magnus posted an updated conversion this morning.

http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary

Evidently, however, he didn't do the same things you did, because
there are DEFINITELY more than 9 manufactured commits in this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable Snapshot Isolation

2010-09-17 Thread Kevin Grittner
Tom Lane  wrote:
 
> That assumption is absolutely, totally not going to fly.
 
Understood; I'm already working on it based on Heikki's input.
 
>> This needs to work when the xid of a transaction is found in the
>> MVCC data of a tuple for any overlapping serializable transaction
>> -- even if that transaction has completed and its connection has
>> been closed. It didn't look to me like
>> SubTransGetTopmostTransaction() would work after the transaction
>> was gone.
> 
> Yes, it should work.  If it doesn't, you are failing to manage the
> TransactionXmin horizon correctly.
 
So far I haven't wanted to mess with the global xmin values for fear
of the possible impact on other transactions.  It actually hasn't
been that hard to maintain a SerializableGlobalXmin value, which is
more efficient than the existing ones for predicate lock cleanup
purposes.  That still isn't exactly what I need to modify cleanup of
the subtransaction information, though.  Once I've got my head
around the subtrans.c code, I think I'll need to maintain a minimum
that includes the xids for serializable transactions which *overlap*
SerializableGlobalXmin.  That doesn't seem very hard to do; I just
haven't needed it until now.  Then I'll modify the subtransaction
cleanup to only remove entries before the earlier of the global xmin
of all transactions and the xmin of serializable transactions which
overlap active serializable transactions.
 
Does all that sound reasonable?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable Snapshot Isolation

2010-09-17 Thread Tom Lane
"Kevin Grittner"  writes:
> Heikki Linnakangas  wrote:
>> That sounds like it can eat through your shared memory very quickly
>> if you have a lot of subtransactions.
 
> Hmmm  I've never explicitly used subtransactions, so I don't tend
> to think of them routinely going too deep.  And the struct is pretty
> small.

That assumption is absolutely, totally not going to fly.

>> Why not use SubTransGetTopmostTransaction() ?
 
> This needs to work when the xid of a transaction is found in the MVCC
> data of a tuple for any overlapping serializable transaction -- even
> if that transaction has completed and its connection has been
> closed. It didn't look to me like SubTransGetTopmostTransaction()
> would work after the transaction was gone.

Yes, it should work.  If it doesn't, you are failing to manage the
TransactionXmin horizon correctly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Aidan Van Dyk
* Fujii Masao  [100917 07:57]:
 
> Synchronous replication is basically used to reduce the
> downtime, and "wait forever" option opposes that.

Hm... I'm not sure I'd agree with that.  I'd rather have some
downtime and my data available, than have less downtime but find that
I'm missing valuable data that was committed, but happened not to be
replicated because no slave was available "yet".

Sync rep is about "data availability", "data recoverability", *and*
"downtime".  The three are definitely related, but each use case has its
own tradeoffs.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Aidan Van Dyk
* Robert Haas  [100917 07:44]:
> On Fri, Sep 17, 2010 at 7:31 AM, Simon Riggs  wrote:
> > The only thing standby registration allows you to do is know whether
> > there was supposed to be a standby there, but yet it isn't there now. I
> > don't see that point as being important because it seems strange to me
> > to want to wait for a standby that ought to be there, but isn't anymore.
> > What happens if it never comes back? Manual intervention required.
 
> > The absence of registration in my patch makes some things easier and
> > some things harder. For example, you can add a new standby without
> > editing the config on the master.
> 
> That's actually one of the reasons why I like the idea of
> registration.  It seems rather scary to add a new standby without
> editing the config on the master.  Actually, adding a new fully-async
> slave without touching the master seems reasonable, but adding a new
> sync slave without touching the master gives me the willies.  The
> behavior of the system could change quite sharply when you do this,
> and it might not be obvious what has happened.  (Imagine DBA #1 makes
> the change and DBA #2 is then trying to figure out what's happened -
> he checks the configs of all the machines he knows about and finds
> them all unchanged... head-scratching ensues.)

So, those both give me the willies too...

I've had a rack lose all power.  Now, let's say I've got two servers
(plus trays of disks for each) in the same rack.  Ya, I know, I should
move them to separate racks, preferably in separate buildings on the
same campus, but realistically...

I want to have them configured in an fsync WAL-style sync rep, and I want
to make sure that if the master comes up first after I get power back, it's
not going to be claiming transactions are committed while the slave
(which happens to have 4x the disks because it keeps PITR backups for a
period too) is still chugging away on SCSI probes and hasn't gotten
PostgreSQL up yet...

And I want to make sure that the dev box another slave setup was being
tested on, which is running in some test area under some other DBA but
not in the same rack, *can't* through some mis-configuration make my
master think that its production slave has properly fsync'ed the
replicated WAL.



-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Heikki Linnakangas

On 17/09/10 15:56, Simon Riggs wrote:
> On Fri, 2010-09-17 at 13:41 +0300, Heikki Linnakangas wrote:
>> On 17/09/10 12:49, Simon Riggs wrote:
>>> Without performance tests to demonstrate "why", these do sound hard to
>>> understand. But we should note that DRBD offers recv ("B") and fsync
>>> ("C") as separate options. And Oracle implements all 3 of recv, fsync
>>> and apply. Neither of them describe those options so simply and easily
>>> as the way we are proposing with a 4 valued enum (with async as the
>>> fourth option).
>>>
>>> If we have only one option for sync_rep = 'on' which of recv | fsync |
>>> apply would it implement? You don't mention that. Which do you choose?
>>
>> You would choose between recv, fsync and apply in the slave, with a GUC.
>
> So you would have both registration on the master and parameter settings
> on the standby? I doubt you mean that, so possibly need more explanation
> there for me to understand what you mean and also why you would do that.

Yes, that's what I meant. No-one else seems to think that's a good idea :-).

>> I don't expect any meaningful differences in terms of performance
>> between any of the discussed options. The big question right now is...
>
> This is the critical point. Politely, I would observe that *You* do not
> think there is a meaningful difference. *I* do, and evidence suggests
> that both Oracle and DRBD think so too. So we differ on what the "big
> question" is here.

We must be talking about different things again. There's certainly big 
differences in the different synchronization levels and configurations, 
but I don't expect there to be big performance differences between 
patches to implement those levels. Once we got rid of the polling loops, 
I expect the network and disk latencies to dominate.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 21:43 +0900, Fujii Masao wrote:
> On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas
>  wrote:
> > * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers
> > can be seen as important special cases of this.
> 
> I think that we should skip quorum commit at the first phase
> because the design seems to be still poorly-thought-out.

Agreed

> I'm concerned about the case where the faster synchronous standby
> goes down and the lagged synchronous one remains when n=1. In this
> case, some transactions marked as committed in a client might not
> be replicated to the remaining synchronous standby yet. What if
> the master goes down at this point? How can we determine whether
> promoting the remaining standby to the master causes data loss?

In that config if the faster sync standby goes down then your
application performance goes down dramatically. That would be fragile.

So you would set up like this
master - requests are > async
standby1 - fast - so use recv | fsync | apply
standby2 - async

So if standby1 goes down we don't wait for standby2, but we do continue
to stream to it.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 13:41 +0300, Heikki Linnakangas wrote:
> On 17/09/10 12:49, Simon Riggs wrote:
> > Fujii has long talked about 4 levels of service also. Why change? I had
> > thought that part was pretty much agreed between all of us.
> 
> Now you lost me. I agree that we need 4 levels of service (at least 
> ultimately, not necessarily in the first phase).

OK, good.

> > Without performance tests to demonstrate "why", these do sound hard to
> > understand. But we should note that DRBD offers recv ("B") and fsync
> > ("C") as separate options. And Oracle implements all 3 of recv, fsync
> > and apply. Neither of them describe those options so simply and easily
> > as the way we are proposing with a 4 valued enum (with async as the
> > fourth option).
> >
> > If we have only one option for sync_rep = 'on' which of recv | fsync |
> > apply would it implement? You don't mention that. Which do you choose?
> 
> You would choose between recv, fsync and apply in the slave, with a GUC.

So you would have both registration on the master and parameter settings
on the standby? I doubt you mean that, so possibly need more explanation
there for me to understand what you mean and also why you would do that.

> > I no longer seek to persuade by words alone. The existence of my patch
> > means that I think that only measurements and tests will show why I have
> > been saying these things. We need performance tests.
> 


> I don't expect any meaningful differences in terms of performance 
> between any of the discussed options. The big question right now is...

This is the critical point. Politely, I would observe that *You* do not
think there is a meaningful difference. *I* do, and evidence suggests
that both Oracle and DRBD think so too. So we differ on what the "big
question" is here.

It's sounding to me that if we don't know these things, then we're quite
a long way from committing something. This is basic research.

> what 
> features we provide and how they're configured. Performance will depend 
> primarily on the mode you use, and secondarily on the implementation of 
> the mode. It would be completely premature to do performance testing yet 
> IMHO.

If a patch is "ready" then we should be able to performance test it
*before* we commit it. From what you say it sounds like Fujii's patch
might yet require substantial tuning, so it might even be the case that
my patch is closer in terms of readiness to commit. Whatever the case,
we have two patches and I can't see any benefit in avoiding performance
tests.

> >> Putting all of that together. I think Fujii-san's standby.conf is pretty
> >> close.
> >
> >> What it needs is the additional GUC for transaction-level control.
> >
> > The difference between the patches is not a simple matter of a GUC.
> >
> > My proposal allows a single standby to provide efficient replies to
> > multiple requested durability levels all at the same time. With
> > efficient use of network resources. ISTM that because the other patch
> > cannot provide that you'd like to persuade us that we don't need that,
> > ever. You won't sell me on that point, cos I can see lots of uses for
> > it.
> 
> Simon, how the replies are sent is an implementation detail I haven't 
> given much thought yet. 

It seems clear we've thought about different details around these
topics. Now that I understand your work on latches, I see it is an
important contribution and I very much respect that. IMHO, each of us
has seen something important that the other has not.

> The reason we delved into that discussion 
> earlier was that you seemed to contradict yourself with the claims that 
> you don't need to send more than one reply per transaction, and that the 
> standby doesn't need to know the synchronization level. Other than 
> the curiosity about that contradiction, it doesn't seem like a very 
> interesting detail to me right now. It's not a question that drives the 
> rest of the design, but the other way round.

There was no contradiction. You just didn't understand how it could be
possible, so dismissed it.

It's a detail, yes. Some are critical, some are not. (e.g. latches.)

My view is that it is critical and drives the design. So I don't agree
with you on "the other way around".

> But FWIW, something like your proposal of sending 3 XLogRecPtrs in each 
> reply seems like a good approach. I'm not sure about using walwriter. I 
> can see that it helps with getting the 'recv' and 'replay' 
> acknowledgments out faster, but 


> I still have the scars from starting 
> bgwriter during recovery.

I am happy to apologise for those problems. I was concentrating on HS at
the time, not on that aspect. You sorted out those problems for me and I
thank you for that. 

With that in mind, I will remove the aspects of my patch that relate to
starting the WAL writer; it's a small amount of code. That means we will
effectively disable recv mode for now, but I definitely want to be able
to put it back later.

-- 
 Simon Riggs

Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 20:56 +0900, Fujii Masao wrote:
> On Fri, Sep 17, 2010 at 7:41 PM, Heikki Linnakangas
>  wrote:
> >> The question is do we want standby registration on master and if so,
> >> why?
> >
> > Well, aside from how to configure synchronous replication, standby
> > registration would help with retaining the right amount of WAL in the
> > master. wal_keep_segments doesn't guarantee that enough is retained, and
> > OTOH when all standbys are connected you retain much more than might be
> > required.
> 
> Yep.

Setting wal_keep_segments is difficult, but it's not a tunable.

The sysadmin needs to tell us the maximum number of files she'd like to
keep. Otherwise we may fill up a disk, use space intended for another
app, etc.

The server cannot determine what limits the sysadmin may wish to impose.
The only sane default is 0, because "store everything, forever" makes no
sense. Similarly, if we register a server, it goes down, and we forget to
deregister it, then we will attempt to store everything forever and our
system will go down.

The bigger problem is base backups, not server restarts. We don't know
how to get that right because we don't register base backups
automatically. If we did dynamically alter the number of WALs we store
then we'd potentially screw up new base backups. Server registration
won't help with that at all, so you'd need to add a base backup
registration scheme as well. But even if you had that, you'd still need
a "max" setting defined by sysadmin.

So the only sane thing to do is to set wal_keep_segments as high as
possible. And doing that doesn't need server reg.
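
For example (the figure is invented, purely to show the arithmetic), with
the default 16 MB segment size:

wal_keep_segments = 6400    # 6400 * 16 MB = ~100 GB cap on retained WAL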

> And standby registration is required when we support "wait forever when
> synchronous standby isn't connected at the moment" option that Heikki
> explained upthread. Though I don't think that standby registration is
> required in the first phase since "wait forever" option is not used in
> basic use case. Synchronous replication is basically used to reduce the
> downtime, and "wait forever" option opposes that.

Agreed, but I'd say "if" we support that. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 21:20 +0900, Fujii Masao wrote:
> On Fri, Sep 17, 2010 at 8:31 PM, Simon Riggs  wrote:
> > The only thing standby registration allows you to do is know whether
> > there was supposed to be a standby there, but yet it isn't there now. I
> > don't see that point as being important because it seems strange to me
> > to want to wait for a standby that ought to be there, but isn't anymore.
> 
> According to what I heard, some people want to guarantee that all the
> transactions are *always* written in *all* the synchronous standbys.
> IOW, they want to keep the transaction waiting until it has been written
> in all the synchronous standbys. Standby registration is required to
> support such a use case. Without the registration, the master cannot
> determine whether the transaction has been written in all the synchronous
> standbys.

You don't need standby registration at all. You can do that with a
single parameter, already proposed:

quorum_commit = N.

But most people said they didn't want it. If they do we can put it back
later.

I don't think we're getting anywhere here. I just don't see any *need*
to have it. Some people might *want* to set things up that way, and if
that's true, that's enough for me to agree with them. The trouble is, I
know some people have said they *want* to set it in the standby and we
definitely *need* to set it somewhere. After this discussion, I think
"both" is easily done and quite cool.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 8:43 AM, Fujii Masao  wrote:
> On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas
>  wrote:
>> * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers
>> can be seen as important special cases of this.
>
> I think that we should skip quorum commit at the first phase
> because the design seems to be still poorly-thought-out.
>
> I'm concerned about the case where the faster synchronous standby
> goes down and the lagged synchronous one remains when n=1. In this
> case, some transactions marked as committed in a client might not
> be replicated to the remaining synchronous standby yet. What if
> the master goes down at this point? How can we determine whether
> promoting the remaining standby to the master causes data loss?

Yep. That issue has been raised before, and I think it's quite valid.
That's not to say the feature isn't valid, but I think trying to
include it in the first commit is going to lead to endless wrangling
about design.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Fujii Masao
On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas
 wrote:
> * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers
> can be seen as important special cases of this.

I think that we should skip quorum commit at the first phase
because the design seems to be still poorly-thought-out.

I'm concerned about the case where the faster synchronous standby
goes down and the lagged synchronous one remains when n=1. In this
case, some transactions marked as committed in a client might not
be replicated to the remaining synchronous standby yet. What if
the master goes down at this point? How can we determine whether
promoting the remaining standby to the master causes data loss?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 21:20 +0900, Fujii Masao wrote:

> What synchronization level does each combination of sync_replication
> and sync_replication_service lead to? I'd like to see something like
> the following table.
> 
>  sync_replication | sync_replication_service | result
> ------------------+---------------------------+--------
>  async            | async                     | ???
>  async            | recv                      | ???
>  async            | fsync                     | ???
>  async            | apply                     | ???
>  recv             | async                     | ???
>  ...

Good question.

There are only 4 possible outcomes. There is no combination, so we don't
need a table like that above.

The "service" specifies the highest request type available from that
specific standby. If someone requests a higher service than is currently
offered by this standby, they will either 
a) get that service from another standby that does offer that level
b) automatically downgrade the sync rep mode to the highest available.

For example, if you request recv but there is only one standby and it
only offers async, then you get downgraded to async.

In all cases, if you request async then we act the same as in 9.0.
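
To make the downgrade rule concrete, here is a throwaway SQL sketch that
enumerates the outcomes Fujii asked for, under case b) above (a single
standby, no fallback to another one). The numeric ranks are purely my
illustration, not anything in either patch:

WITH levels(name, rank) AS (
  VALUES ('async', 0), ('recv', 1), ('fsync', 2), ('apply', 3)
)
SELECT r.name AS sync_replication,
       s.name AS sync_replication_service,
       (SELECT name FROM levels WHERE rank = least(r.rank, s.rank)) AS result
FROM levels r CROSS JOIN levels s
ORDER BY r.rank, s.rank;

The effective level is simply the lower of what the transaction requests
and what the standby offers.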

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-09-17 Thread Itagaki Takahiro
On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams
 wrote:
> Updated patch:  the JSON code has all been moved into core, so this
> patch is now for a built-in data type.

I have a question about the design of the JSON type. Why do we need to
store the value in UTF-8 encoding? It's true that the JSON RFC says the
text SHALL be encoded in Unicode, but I don't understand why we should
reject other encodings.

As I said before, I'd like to propose only 3 features in the commitfest:
  * TYPE json data type
  * text to json: FUNCTION json_parse(text)
  * json to text: FUNCTION json_stringify(json, whitelist, space)

JSONPath will be re-implemented on the basic functionalities in the
subsequent commitfest. Do you have a plan to split your patch?
Or, can I continue to develop my patch? If so, JSONPath needs
to be adjusted to the new infrastructure.

I think json_parse() and json_stringify() are well-known APIs for JSON:
  https://developer.mozilla.org/En/Using_JSON_in_Firefox
So it'd be worth borrowing those names and signatures for our APIs.
(I'll rename json_pretty in my previous patch to json_stringify.)
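
For illustration, hypothetical usage of the two proposed functions (the
signatures follow the proposal above; none of this is a committed
PostgreSQL API):

SELECT json_parse('{"a": [1, 2, 3]}');          -- text -> json, validated
SELECT json_stringify('{"a":[1,2,3]}'::json,
                      NULL, '  ');              -- json -> text, 2-space indent

Here a NULL whitelist would mean "no filtering", mirroring the behaviour
of JavaScript's JSON.stringify(value, replacer, space).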

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

2010-09-17 Thread Michael Meskes
On Tue, Sep 14, 2010 at 12:01:18PM -0400, Tom Lane wrote:
> Well ... I guess the other attitude we could take is that that was a
> private development branch of Michael's.  If we'd been working in git

Actually it wasn't. This branch was created when ecpg grew too big for the
released version of bison. We had to use a development snapshot of bison to
even compile ecpg back then and didn't want to put the ecpg changes into the
main branch because this would have prevented most people from compiling PG. At
least that's what I still remember.

As far as converting this branch to git, I'd say forget about it. All changes
to the parser went into CVS HEAD as soon as a suitable bison version was
released.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Fujii Masao
On Fri, Sep 17, 2010 at 8:31 PM, Simon Riggs  wrote:
> The only thing standby registration allows you to do is know whether
> there was supposed to be a standby there, but yet it isn't there now. I
> don't see that point as being important because it seems strange to me
> to want to wait for a standby that ought to be there, but isn't anymore.

According to what I heard, some people want to guarantee that all the
transactions are *always* written in *all* the synchronous standbys.
IOW, they want to keep the transaction waiting until it has been written
in all the synchronous standbys. Standby registration is required to
support such a use case. Without the registration, the master cannot
determine whether the transaction has been written in all the synchronous
standbys.

> What happens if it never comes back? Manual intervention required.

Yep.

> In the use cases we discussed we had simple 2 or 3 server configs.
>
> master
> standby1 - preferred sync target - set to recv, fsync or apply
> standby2 - non-preferred sync target, maybe test server - set to async
>
> So in the two cases you mention we might set
>
> "wait for ack from reporting slave"
> master: sync_replication = 'recv'   #as default, can be changed
> reporting-slave: sync_replication_service = 'recv' #gives max level
>
> "wait until replayed in the server on the west coast"
> master: sync_replication = 'recv'   #as default, can be changed
> west-coast: sync_replication_service = 'apply' #gives max level

What synchronization level does each combination of sync_replication
and sync_replication_service lead to? I'd like to see something like
the following table.

 sync_replication | sync_replication_service | result
------------------+---------------------------+--------
 async            | async                     | ???
 async            | recv                      | ???
 async            | fsync                     | ???
 async            | apply                     | ???
 recv             | async                     | ???
 ...

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable Snapshot Isolation

2010-09-17 Thread Kevin Grittner
Heikki Linnakangas  wrote:
> On 17/09/10 14:56, Kevin Grittner wrote:
>> Heikki Linnakangas wrote:
>>> Why not use SubTransGetTopmostTransaction() ?
>>
>> This needs to work when the xid of a transaction is found in the
>> MVCC data of a tuple for any overlapping serializable transaction
>> -- even if that transaction has completed and its connection has
>> been closed.  It didn't look to me like
>> SubTransGetTopmostTransaction() would work after the transaction
>> was gone.
>
> You're right, it doesn't retain transactions that old. But it could
> easily be modified to do so.
 
I shall look into it.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable Snapshot Isolation

2010-09-17 Thread Heikki Linnakangas

On 17/09/10 14:56, Kevin Grittner wrote:
> Heikki Linnakangas wrote:
>> Why not use SubTransGetTopmostTransaction() ?
>
> This needs to work when the xid of a transaction is found in the MVCC
> data of a tuple for any overlapping serializable transaction -- even
> if that transaction has completed and its connection has been
> closed. It didn't look to me like SubTransGetTopmostTransaction()
> would work after the transaction was gone.

You're right, it doesn't retain transactions that old. But it could 
easily be modified to do so.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Fujii Masao
On Fri, Sep 17, 2010 at 7:41 PM, Heikki Linnakangas
 wrote:
>> The question is do we want standby registration on master and if so,
>> why?
>
> Well, aside from how to configure synchronous replication, standby
> registration would help with retaining the right amount of WAL in the
> master. wal_keep_segments doesn't guarantee that enough is retained, and
> OTOH when all standbys are connected you retain much more than might be
> required.

Yep.

And standby registration is required when we support "wait forever when
synchronous standby isn't connected at the moment" option that Heikki
explained upthread. Though I don't think that standby registration is
required in the first phase since "wait forever" option is not used in
basic use case. Synchronous replication is basically used to reduce the
downtime, and "wait forever" option opposes that.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable Snapshot Isolation

2010-09-17 Thread Kevin Grittner
Heikki Linnakangas  wrote:
 
> So, the purpose of SerializableXidHash is to provide quick access
> to the SERIALIZABLEXACT struct of a top-level transaction, when you
> know its transaction id or any of its subtransaction ids.
 
Right.
 
> To implement the "or any of its subtransaction ids" part, you need
> to have a SERIALIZABLEXID struct for each subtransaction in shared
> memory.
 
Close -- each subtransaction which writes any tuples.
 
> That sounds like it can eat through your shared memory very quickly
> if you have a lot of subtransactions.
 
Hmmm  I've never explicitly used subtransactions, so I don't tend
to think of them routinely going too deep.  And the struct is pretty
small.
 
> Why not use SubTransGetTopmostTransaction() ?
 
This needs to work when the xid of a transaction is found in the MVCC
data of a tuple for any overlapping serializable transaction -- even
if that transaction has completed and its connection has been
closed. It didn't look to me like SubTransGetTopmostTransaction()
would work after the transaction was gone.
 
I guess that's something I should mention in the comments...
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 7:31 AM, Simon Riggs  wrote:
> The only thing standby registration allows you to do is know whether
> there was supposed to be a standby there, but yet it isn't there now. I
> don't see that point as being important because it seems strange to me
> to want to wait for a standby that ought to be there, but isn't anymore.
> What happens if it never comes back? Manual intervention required.
>
> (We agree on how to handle a standby that *is* "connected", yet never
> returns a reply or takes too long to do so).

Doesn't Oracle provide a mode where it shuts down if this occurs?

> The absence of registration in my patch makes some things easier and
> some things harder. For example, you can add a new standby without
> editing the config on the master.

That's actually one of the reasons why I like the idea of
registration.  It seems rather scary to add a new standby without
editing the config on the master.  Actually, adding a new fully-async
slave without touching the master seems reasonable, but adding a new
sync slave without touching the master gives me the willies.  The
behavior of the system could change quite sharply when you do this,
and it might not be obvious what has happened.  (Imagine DBA #1 makes
the change and DBA #2 is then trying to figure out what's happened -
he checks the configs of all the machines he knows about and finds
them all unchanged... head-scratching ensues.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Robert Haas
On Fri, Sep 17, 2010 at 6:41 AM, Heikki Linnakangas
 wrote:
>>> So what should the user interface be like? Given the 1st and 2nd
>>> requirement, we need standby registration. If some standbys are
>>> important and others are not, the master needs to distinguish between
>>> them to be able to determine that a transaction is safely delivered to
>>> the important standbys.
>>
>> My patch provides those two requirements without standby registration,
>> so we very clearly don't "need" standby registration.
>
> It's still not clear to me how you would configure things like "wait for ack
> from reporting slave, but not other slaves" or "wait until replayed in the
> server on the west coast" in your proposal. Maybe it's possible, but doesn't
> seem very intuitive, requiring careful configuration in both the master and
> the slaves.

Agreed.  I think this will be much simpler if all the configuration is
in one place (on the master).

> In your proposal, you also need to be careful not to connect e.g. a test
> slave with "synchronous_replication_service = apply" to the master, or it
> will possibly shadow a real production slave, acknowledging transactions
> that are not yet received by the real slave. It's certainly possible to
> screw up with standby registration too, but you have more direct control of
> the master behavior in the master, instead of distributing it across all
> slaves.

Similarly agreed.

>> The question is do we want standby registration on master and if so,
>> why?
>
> Well, aside from how to configure synchronous replication, standby
> registration would help with retaining the right amount of WAL in the
> master. wal_keep_segments doesn't guarantee that enough is retained, and
> OTOH when all standbys are connected you retain much more than might be
> required.

+1.

> Giving names to slaves also allows you to view their status in the master in
> a more intuitive format. Something like:
>
> postgres=# SELECT * FROM pg_slave_status ;
>     name    | connected |  received  |   fsyncd   |  applied
> ------------+-----------+------------+------------+------------
>  reporting  | t         | 0/2620     | 0/2620     | 0/25550020
>  ha-standby | t         | 0/2620     | 0/2620     | 0/2620
>  testserver | f         |            | 0/1520     |
> (3 rows)

+1.

Having said all of the above, I am not in favor of your (Heikki's)
proposal to configure sync/async on the master and the level on the
slave.  That seems like a somewhat bizarre division of labor,
splitting what is essentially one setting across two machines.

>>> For the control between async/recv/fsync/replay, I like to think in
>>> terms of
>>> a) asynchronous vs synchronous
>>> b) if it's synchronous, how synchronous is it? recv, fsync or replay?
>>>
>>> I think it makes most sense to set sync vs. async in the master, and the
>>> level of synchronicity in the slave. Although I have sympathy for the
>>> argument that it's simpler if you configure it all from the master side
>>> as well.
>>
>> I have catered for such requests by suggesting a plugin that allows you
>> to implement that complexity without overburdening the core code.
>
> Well, plugins are certainly one possibility, but then we need to design the
> plugin API. I've been thinking along the lines of a proxy, which can
> implement whatever logic you want to decide when to send the acknowledgment.
> With a proxy as well, if we push any features people want to a proxy or
> plugin, we need to make sure that the proxy/plugin has all the necessary
> information available.

I'm not really sold on the proxy idea. That seems like it adds a lot
of configuration complexity, not to mention additional hops.  Of
course, the plug-in idea also won't be suitable for any but the most
advanced users.  Of the two I slightly prefer the idea of a plug-in,
but maybe this doesn't have to be done in version 1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 13:41 +0300, Heikki Linnakangas wrote:
> On 17/09/10 12:49, Simon Riggs wrote:
> > This isn't just about UI, there are significant and important
> > differences between the proposals in terms of the capability and control
> > they offer.
> 
> Sure. The point of focusing on the UI is that the UI demonstrates what 
> capability and control a proposal offers.

My patch does not include server registration. It could be added later
on top of my patch without any issues.

The core parts of my patch are the fine grained transaction-level
control and the ability to mix them dynamically with good performance.

To me server registration is not a core issue. I'm not actively against
it, I just don't see the need for it at all. Certainly not committed
first, especially since it's not actually needed by either of our
patches.

Standby registration doesn't provide *any* parameter that can't be
supplied from standby recovery.conf. 

The only thing standby registration allows you to do is know whether
there was supposed to be a standby there, but yet it isn't there now. I
don't see that point as being important because it seems strange to me
to want to wait for a standby that ought to be there, but isn't anymore.
What happens if it never comes back? Manual intervention required.

(We agree on how to handle a standby that *is* "connected", yet never
returns a reply or takes too long to do so).

> >> So what should the user interface be like? Given the 1st and 2nd
> >> requirement, we need standby registration. If some standbys are
> >> important and others are not, the master needs to distinguish between
> >> them to be able to determine that a transaction is safely delivered to
> >> the important standbys.
> >
> > My patch provides those two requirements without standby registration,
> > so we very clearly don't "need" standby registration.
> 
> It's still not clear to me how you would configure things like "wait for 
> ack from reporting slave, but not other slaves" or "wait until replayed 
> in the server on the west coast" in your proposal. Maybe it's possible, 
> but doesn't seem very intuitive, requiring careful configuration in both 
> the master and the slaves.

In the use cases we discussed we had simple 2 or 3 server configs.

master
standby1 - preferred sync target - set to recv, fsync or apply
standby2 - non-preferred sync target, maybe test server - set to async

So in the two cases you mention we might set

"wait for ack from reporting slave"
master: sync_replication = 'recv'   #as default, can be changed
reporting-slave: sync_replication_service = 'recv' #gives max level

"wait until replayed in the server on the west coast"
master: sync_replication = 'recv'   #as default, can be changed
west-coast: sync_replication_service = 'apply' #gives max level


The absence of registration in my patch makes some things easier and
some things harder. For example, you can add a new standby without
editing the config on the master.

If you had 2 standbys, both offering the same level of protection, my
proposal would *not* allow you to specify that you preferred one standby
over the other. But we could add a priority parameter as well if that's
an issue. 

> In your proposal, you also need to be careful not to connect e.g. a test 
> slave with "synchronous_replication_service = apply" to the master, or 
> it will possibly shadow a real production slave, acknowledging 
> transactions that are not yet received by the real slave. It's certainly 
> possible to screw up with standby registration too, but you have more 
> direct control of the master behavior in the master, instead of 
> distributing it across all slaves.
> 
> > The question is do we want standby registration on master and if so,
> > why?
> 
> Well, aside from how to configure synchronous replication, standby 
> registration would help with retaining the right amount of WAL in the 
> master. wal_keep_segments doesn't guarantee that enough is retained, and 
> OTOH when all standbys are connected you retain much more than might be 
> required.
> 
> Giving names to slaves also allows you to view their status in the 
> master in a more intuitive format. Something like:

We can give servers a name without registration. It actually makes more
sense to set the name in the standby; it can then be passed through when
the standby connects.

I very much like the idea of server names and think this next SRF looks
really cool.

> postgres=# SELECT * FROM pg_slave_status ;
>     name    | connected |  received  |   fsyncd   |  applied
> ------------+-----------+------------+------------+------------
>  reporting  | t         | 0/2620     | 0/2620     | 0/25550020
>  ha-standby | t         | 0/2620     | 0/2620     | 0/2620
>  testserver | f         |            | 0/1520     |
> (3 rows)

That could be added on top of my patch also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Dimitri Fontaine
Simon Riggs  writes:
> So far, I have added the point that if a user requests a level of
> confirmation that is currently unavailable, then it will use the highest
> level of confirmation available now. That stops us from waiting for
> timeout for every transaction we run if standby goes down hard, which
> just freezes the application for long periods to no real benefit. It
> also prevents applications from requesting durability levels the cluster
> cannot satisfy, in the opinion of the sysadmin, since the sysadmin
> specifies the max level on each standby.

That sounds like the commit-or-rollback-when-slaves-are-gone question. I
think this behavior should be user-settable, again per-transaction. I
agree with you that the general case looks like your proposed default,
but we already know that some will need "don't ack if not replied before
the timeout", and they will even go as far as asking for it to be
reported as a serialisation error of some sort, I guess…

Regards,
-- 
Dimitri Fontaine
PostgreSQL DBA, Architecte

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Heikki Linnakangas

On 17/09/10 12:49, Simon Riggs wrote:
> This isn't just about UI, there are significant and important
> differences between the proposals in terms of the capability and control
> they offer.

Sure. The point of focusing on the UI is that the UI demonstrates what 
capability and control a proposal offers.

>> So what should the user interface be like? Given the 1st and 2nd
>> requirement, we need standby registration. If some standbys are
>> important and others are not, the master needs to distinguish between
>> them to be able to determine that a transaction is safely delivered to
>> the important standbys.
>
> My patch provides those two requirements without standby registration,
> so we very clearly don't "need" standby registration.

It's still not clear to me how you would configure things like "wait for 
ack from reporting slave, but not other slaves" or "wait until replayed 
in the server on the west coast" in your proposal. Maybe it's possible, 
but doesn't seem very intuitive, requiring careful configuration in both 
the master and the slaves.

In your proposal, you also need to be careful not to connect e.g. a test 
slave with "synchronous_replication_service = apply" to the master, or 
it will possibly shadow a real production slave, acknowledging 
transactions that are not yet received by the real slave. It's certainly 
possible to screw up with standby registration too, but you have more 
direct control of the master behavior in the master, instead of 
distributing it across all slaves.

> The question is do we want standby registration on master and if so,
> why?

Well, aside from how to configure synchronous replication, standby 
registration would help with retaining the right amount of WAL in the 
master. wal_keep_segments doesn't guarantee that enough is retained, and 
OTOH when all standbys are connected you retain much more than might be 
required.

Giving names to slaves also allows you to view their status in the 
master in a more intuitive format. Something like:

postgres=# SELECT * FROM pg_slave_status ;
    name    | connected |  received  |   fsyncd   |  applied
------------+-----------+------------+------------+------------
 reporting  | t         | 0/2620     | 0/2620     | 0/25550020
 ha-standby | t         | 0/2620     | 0/2620     | 0/2620
 testserver | f         |            | 0/1520     |
(3 rows)

>> For the control between async/recv/fsync/replay, I like to think in
>> terms of
>> a) asynchronous vs synchronous
>> b) if it's synchronous, how synchronous is it? recv, fsync or replay?
>>
>> I think it makes most sense to set sync vs. async in the master, and the
>> level of synchronicity in the slave. Although I have sympathy for the
>> argument that it's simpler if you configure it all from the master side
>> as well.
>
> I have catered for such requests by suggesting a plugin that allows you
> to implement that complexity without overburdening the core code.

Well, plugins are certainly one possibility, but then we need to design 
the plugin API. I've been thinking along the lines of a proxy, which can 
implement whatever logic you want to decide when to send the 
acknowledgment. With a proxy as well, if we push any features people 
want to a proxy or plugin, we need to make sure that the proxy/plugin 
has all the necessary information available.

> This strikes me as an "ad absurdum" argument. Since the above
> over-complexity would doubtless be seen as insane by Tom et al, it
> attempts to persuade that we don't need recv, fsync and apply either.
>
> Fujii has long talked about 4 levels of service also. Why change? I had
> thought that part was pretty much agreed between all of us.

Now you lost me. I agree that we need 4 levels of service (at least 
ultimately, not necessarily in the first phase).

> Without performance tests to demonstrate "why", these do sound hard to
> understand. But we should note that DRBD offers recv ("B") and fsync
> ("C") as separate options. And Oracle implements all 3 of recv, fsync
> and apply. Neither of them describe those options so simply and easily
> as the way we are proposing with a 4 valued enum (with async as the
> fourth option).
>
> If we have only one option for sync_rep = 'on' which of recv | fsync |
> apply would it implement? You don't mention that. Which do you choose?

You would choose between recv, fsync and apply in the slave, with a GUC.

> I no longer seek to persuade by words alone. The existence of my patch
> means that I think that only measurements and tests will show why I have
> been saying these things. We need performance tests.

I don't expect any meaningful differences in terms of performance 
between any of the discussed options. The big question right now is what 
features we provide and how they're configured. Performance will depend 
primarily on the mode you use, and secondarily on the implementation of 
the mode. It would be completely premature to do performance testing yet 
IMHO.

>> Putting all of that together. I think Fujii-san's standby.conf is pretty
>> close.


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 12:30 +0300, Heikki Linnakangas wrote:

> If the synchronicity is configured in the standby, how does the master 
> know that there's a synchronous slave out there that it should wait for, 
> if that slave isn't connected at the moment?

That isn't a question you need standby registration to answer.

In my proposal, the user requests a certain level of confirmation and
will wait until timeout to see if it is received. The standby can crash
and restart, come back and provide the answer, and it will still work.

So it is the user request that informs the master that there would
normally be a synchronous slave out there it should wait for.

So far, I have added the point that if a user requests a level of
confirmation that is currently unavailable, then it will use the highest
level of confirmation available now. That stops us from waiting for
timeout for every transaction we run if standby goes down hard, which
just freezes the application for long periods to no real benefit. It
also prevents applications from requesting durability levels the cluster
cannot satisfy, in the opinion of the sysadmin, since the sysadmin
specifies the max level on each standby.
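
As a concrete, purely hypothetical illustration of such a per-transaction
request (GUC name as in my examples elsewhere in this thread; the table is
made up):

SET sync_replication = 'apply';  -- this session wants commits replayed on a standby
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;  -- waits for 'apply', or for the highest level currently available

The durability request travels with the transaction, not with the server
topology.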

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Dimitri Fontaine
Heikki Linnakangas  writes:
> If the synchronicity is configured in the standby, how does the master know
> that there's a synchronous slave out there that it should wait for, if that
> slave isn't connected at the moment?

That's what quorum is trying to solve. The master knows how many votes
per sync level the transaction needs. If no slave is acknowledging any
vote, that's all you need to know to ROLLBACK (after the timeout),
right? — if setup says so, on the master.

> Yeah, the quorum stuff. That's all good, but doesn't change the way you
> would do per-transaction control. 

That's when I bought in on the feature. It's all dynamic and
distributed, and it offers per-transaction control.

Regards,
-- 
Dimitri Fontaine
PostgreSQL DBA, Architecte

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] trigger failover with signal

2010-09-17 Thread Fujii Masao
On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas
 wrote:
> That said, there's a few small things that can be progressed regardless of
> the details of synchronous replication. There's the changes to trigger
> failover with a signal, and it seems that we'll need some libpq changes to
> allow acknowledgments to be sent back to the master regardless of the rest
> of the design. We can discuss those in separate threads in parallel.

First of all, we should fix the problem that the startup process cannot
respond to SIGTERM and SIGHUP immediately, by committing the patch
that I submitted in the following post:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00858.php

> pg_ctl failover ? At the moment, the location of the trigger file is
> configurable, but if we accept a constant location like "$PGDATA/failover"
> pg_ctl could do the whole thing, create the file and send signal. pg_ctl on
> Windows already knows how to send the "signal" via the named pipe signal
> emulation.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00797.php

This makes sense. The remaining problem is which signal we should send
to the postmaster. SIGHUP, SIGTERM, SIGINT, SIGQUIT and SIGUSR1 are
already used for other purposes. SIGUSR2 is actually unused, but reserved
for children according to the source code comment.

We can use SIGHUP to wake up the startup process, since the postmaster
sends it on to the startup process when it arrives. But the reload of the
configuration files happens as a side effect. So I think that the SIGHUP
handler should check whether the default trigger file exists first, and
then skip the reload and just wake up the startup process if it does.
Thoughts?

Or should we track the pid of the startup process and make pg_ctl send
the signal directly to it?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 11:09 +0300, Heikki Linnakangas wrote:
> (changed subject again.)
> 
> On 17/09/10 10:06, Simon Riggs wrote:
> > I don't think we can determine how far to implement without considering
> > both approaches in detail. With regard to your points below, I don't
> > think any of those points could be committed first.
> 
> Yeah, I think we need to decide on the desired feature set first, before 
> we dig deeper into the the patches. The design and implementation will 
> fall out of that.

Well, we've discussed these things many times and talking hasn't got us
very far on its own. We need measurements and neutral assessments.

The patches are simple and we have time.

This isn't just about UI; there are significant differences between
the proposals in terms of the capability and control they offer.

I propose we develop both patches further and performance test them.
Many of the features I have proposed are performance related, and
people need to be able to see what is important and what is not. Not
through mere discussion; we need numbers to show which things matter
and which don't, and those numbers need to be derived objectively.

> * Support multiple standbys with various synchronization levels.
> 
> * What happens if a synchronous standby isn't connected at the moment? 
> Return immediately vs. wait forever.
> 
> * Per-transaction control. Some transactions are important, others are not.
> 
> * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all 
> servers can be seen as important special cases of this.
> 
> * async, recv, fsync and replay levels of synchronization.

That's a reasonable starting list of points, there may be others.


> So what should the user interface be like? Given the 1st and 2nd 
> requirement, we need standby registration. If some standbys are 
> important and others are not, the master needs to distinguish between 
> them to be able to determine that a transaction is safely delivered to 
> the important standbys.

My patch meets those two requirements without standby registration,
so we very clearly don't "need" standby registration.

The question is: do we want standby registration on the master, and if
so, why?


> For per-transaction control, ISTM it would be enough to have a simple 
> user-settable GUC like synchronous_commit. Let's call it 
> "synchronous_replication_commit" for now. 

If you wish to change the name of the GUC away from the one I have
proposed, fine. Please note that aspect isn't important to me and I will
happily concede all such points to the majority view.

> For non-critical transactions, 
> you can turn it off. That's very simple for developers to understand and 
> use. I don't think we need more fine-grained control than that at 
> transaction level, in all the use cases I can think of you have a stream 
> of important transactions, mixed with non-important ones like log 
> messages that you want to finish fast in a best-effort fashion. 

Sounds like we're getting somewhere. See below.

> I'm 
> actually tempted to tie that to the existing synchronous_commit GUC, the 
> use case seems exactly the same.

http://archives.postgresql.org/pgsql-hackers/2008-07/msg01001.php
Check the date!

I think that particular point is going to confuse us. It will draw much
bikeshedding and won't help us decide between patches. It's a nicety
that can be left until after we have the core feature committed.

> OTOH, if we do want fine-grained per-transaction control, a simple 
> boolean or even an enum GUC doesn't really cut it. For truly 
> fine-grained control you want to be able to specify exceptions like 
> "wait until this is replayed in slave named 'reporting'" or 'don't wait 
> for acknowledgment from slave named 'uk-server'". With standby 
> registration, we can invent a syntax for specifying overriding rules in 
> the transaction. Something like SET replication_exceptions = 
> 'reporting=replay, uk-server=async'.
> 
> For the control between async/recv/fsync/replay, I like to think in 
> terms of
> a) asynchronous vs synchronous
> b) if it's synchronous, how synchronous is it? recv, fsync or replay?
> 
> I think it makes most sense to set sync vs. async in the master, and the 
> level of synchronicity in the slave. Although I have sympathy for the 
> argument that it's simpler if you configure it all from the master side 
> as well.

I have catered for such requests by suggesting a plugin that allows you
to implement that complexity without overburdening the core code.

This strikes me as an "ad absurdum" argument. Since the above
over-complexity would doubtless be seen as insane by Tom et al, it
attempts to persuade us that we don't need recv, fsync and apply
either.

Fujii has long talked about 4 levels of service also. Why change? I had
thought that part was pretty much agreed between all of us.

Without performance tests to demonstrate "why", these do sound hard to
understand. But we should note that DRBD offers recv


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Heikki Linnakangas

On 17/09/10 12:10, Dimitri Fontaine wrote:

> Heikki Linnakangas  writes:
>
>> * Support multiple standbys with various synchronization levels.
>>
>> * What happens if a synchronous standby isn't connected at the moment?
>> Return immediately vs. wait forever.
>>
>> * Per-transaction control. Some transactions are important, others are not.
>>
>> * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers
>> can be seen as important special cases of this.
>>
>> * async, recv, fsync and replay levels of synchronization.
>>
>> So what should the user interface be like? Given the 1st and 2nd
>> requirement, we need standby registration. If some standbys are important
>> and others are not, the master needs to distinguish between them to be able
>> to determine that a transaction is safely delivered to the important
>> standbys.


> Well the 1st point can be handled in a distributed fashion, where the
> sync level is set up at the slave.


If the synchronicity is configured in the standby, how does the master 
know that there's a synchronous slave out there that it should wait for, 
if that slave isn't connected at the moment?



>> OTOH, if we do want fine-grained per-transaction control, a simple boolean
>> or even an enum GUC doesn't really cut it. For truly fine-grained control
>> you want to be able to specify exceptions like "wait until this is replayed
>> in slave named 'reporting'" or "don't wait for acknowledgment from slave
>> named 'uk-server'". With standby registration, we can invent a syntax for
>> specifying overriding rules in the transaction. Something like SET
>> replication_exceptions = 'reporting=replay, uk-server=async'.


> Then you want to be able to have more than one reporting server and need
> only one of them at the "replay" level, but you don't need to know which
> it is. Or on the contrary you have a failover server and you want to be
> sure this one is at the replay level whatever happens.
>
> Then you want topology flexibility: you need to be able to replace a
> reporting server with another, ditto for the failover one.
>
> Did I tell you my current thinking on how to tackle that yet? :) Using a
> distributed setup, where each slave has a weight (several votes per
> transaction) and a level offering would allow that I think.


Yeah, the quorum stuff. That's all good, but doesn't change the way you
would do per-transaction control. By specifying overrides on a
per-transaction basis, you get control that is as fine-grained as it
can possibly be. Anything you can specify in a configuration file can
then also be specified per-transaction with overrides. The syntax just
needs to be flexible enough.


If we buy into the concept of per-transaction exceptions, we can put 
that issue aside for the moment, and just consider how to configure 
things in a config file. Anything you can express in the config file can 
also be expressed per-transaction with the exceptions GUC.
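
To make that concrete, a hypothetical client-side sketch (the GUC name
is just the strawman from above, not an existing setting):

    #include "libpq-fe.h"

    /* Run one transaction with a per-transaction replication override.
     * SET LOCAL scopes the setting to this transaction only.  Error
     * checking is omitted for brevity; each result should normally be
     * inspected before being cleared. */
    static void
    run_with_override(PGconn *conn)
    {
        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn,
                       "SET LOCAL replication_exceptions = "
                       "'reporting=replay, uk-server=async'"));
        PQclear(PQexec(conn,
                       "UPDATE accounts SET balance = balance - 100 "
                       "WHERE id = 42"));
        PQclear(PQexec(conn, "COMMIT"));
    }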



> Now something similar to your idea that I can see a need for is being
> able to have a multi-part quorum target: when you currently say that you
> want 2 votes for sync, you would be able to say you want 2 votes for
> recv, 2 for fsync and 1 for replay. Remember that any slave is set up to
> offer only one level of synchronicity but can offer multiple votes.
>
> How would this look in the setup? Best would be to register the
> different service levels your application needs. Time to bikeshed a
> little?
>
>    sync_rep_services = {critical: recv=2, fsync=2, replay=1;
>                         important: fsync=3;
>                         reporting: recv=2, apply=1}
>
> Well you get the idea, it could maybe get stored on a catalog somewhere
> with nice SQL commands etc. The goal is then to be able to handle a much
> simpler GUC in the application, sync_rep_service = important for
> example. Reserved label would be off, the default value


So ignoring the quorum stuff for a moment, the general idea is that you 
have predefined sets of configurations (or exceptions to the general 
config) specified in a config file, and in the application you just 
choose among those with "sync_rep_service=XXX". Yeah, I like that, it 
allows you to isolate the details of the topology from the application.



> If you add a weight to each slave then a quorum commit, you don't change
> the implementation complexity and you offer a lot of setup flexibility. If
> the slave sync-level and weight are SIGHUP, then it even becomes rather
> easy to switch roles online or to add new servers or to organise a
> maintenance window — the quorum to reach is a per-transaction GUC on the
> master, too, right?


I haven't bought into the quorum idea yet, but yeah, if we have quorum 
support, then it would be configurable on a per-transaction basis too 
with the above mechanism.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] libpq changes for synchronous replication

2010-09-17 Thread Fujii Masao
On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas
 wrote:
> That said, there's a few small things that can be progressed regardless of
> the details of synchronous replication. There's the changes to trigger
> failover with a signal, and it seems that we'll need some libpq changes to
> allow acknowledgments to be sent back to the master regardless of the rest
> of the design. We can discuss those in separate threads in parallel.

Agreed. The attached patch introduces a new function which is used
to send an ACK back from walreceiver. The function sends a message
up the XLOG stream by calling PQputCopyData. I also allowed
PQputCopyData to be called even during COPY OUT.
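
For readers without the patch handy, the shape of the change is
roughly this (an illustrative sketch only, with guessed names; see the
attached patch for the real code):

    #include "libpq-fe.h"

    /* Send an acknowledgment message back to the master over the
     * walreceiver's replication connection.  PQputCopyData queues the
     * buffer on the COPY stream; PQflush pushes it out. */
    static int
    libpqrcv_send(PGconn *streamConn, const char *buffer, int nbytes)
    {
        if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||
            PQflush(streamConn) != 0)
            return -1;          /* failed to send the ACK */
        return 0;
    }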

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


libpqrcv_send_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring synchronous replication

2010-09-17 Thread Simon Riggs
On Fri, 2010-09-17 at 09:15 +0100, Simon Riggs wrote:
> On Fri, 2010-09-17 at 11:09 +0300, Heikki Linnakangas wrote:
> > That said, there's a few small things that can be progressed
> > regardless of the details of synchronous replication. There's the
> > changes to trigger failover with a signal, and it seems that we'll
> > need some libpq changes to allow acknowledgments to be sent back to
> > the master regardless of the rest of the design. We can discuss those
> > in separate threads in parallel. 
> 
> Agree to both of those points.

But I don't agree that those things should be committed just yet.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE extensions

2010-09-17 Thread KaiGai Kohei
(2010/08/09 5:54), Peter Eisentraut wrote:
> For the next review cycle, here is a patch that adds some ALTER TYPE
> subcommands for composite types:
> 
> ALTER TYPE ... ADD ATTRIBUTE
> ALTER TYPE ... DROP ATTRIBUTE
> ALTER TYPE ... ALTER ATTRIBUTE ... SET DATA TYPE
> ALTER TYPE ... RENAME ATTRIBUTE
> 
> These work similarly to the analogous ALTER TABLE / $ACTION COLUMN
> commands.  The first two above are from the SQL standard.
> 

I checked this patch and noticed some points:

* At ATPrepAddColumn(), it seems that someone has already added a check
  to prevent adding a new column to a typed table, which this patch
  also tries to add.
  Since this patch was submitted about one month ago, it probably needs
  a rebase onto the latest master.

* At ATPrepAlterColumnType(), you enclosed an existing code block
  in "if (tab->relkind == RELKIND_RELATION) { ... }", but the block
  is not re-indented to the appropriate level.

|if (tab->relkind == RELKIND_RELATION)
|{
|/*
| * Set up an expression to transform the old data value to the new type.
| * If a USING option was given, transform and use that expression, else
| * just take the old value and try to coerce it.  We do this first so that
| * type incompatibility can be detected before we waste effort, and
| * because we need the expression to be parsed against the original table
| * rowtype.
| */
|if (cmd->transform)
|{
|RangeTblEntry *rte;
|
|/* Expression must be able to access vars of old table */
|rte = addRangeTableEntryForRelation(pstate,
|:

  This seems to violate the common coding style.

* RENAME ATTRIBUTE ... TO ...

  Even if the composite type being altered is in use, we can still
  rename its attributes. Is that intended?
  In this case, the renaming does not affect the column names of
  typed tables already in use.
  Should we prohibit the renaming, or instead also call renameatt()
  for the dependent typed tables?

  postgres=# CREATE TYPE comp as (a int, b text);
  CREATE TYPE
  postgres=# CREATE TABLE t OF comp;
  CREATE TABLE
  postgres=# SELECT * FROM t;
   a | b
  ---+---
  (0 rows)

  postgres=# ALTER TYPE comp RENAME ATTRIBUTE b TO bbb;
  ALTER TYPE
  postgres=# CREATE TABLE s OF comp;
  CREATE TABLE
  postgres=# SELECT * FROM t;
   a | b
  ---+---
  (0 rows)

  postgres=# SELECT * FROM s;
   a | bbb
  ---+-
  (0 rows)


BTW, does the SQL standard say anything about the behavior when we
try to add/drop an attribute of a composite type that is in use?
This patch always prohibits it, using find_typed_table_dependencies()
and find_composite_type_dependencies().
However, it does not seem difficult to alter the columns of typed
tables as a follow-up to this ALTER TYPE, although it might not be
easy to alter the definition of an embedded composite type already
in use.
Of course, that may be future work. If so, that's fine.

Thanks,
-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

