Re: [HACKERS] bg worker: general purpose requirements
On Fri, Sep 17, 2010 at 11:21:13PM -0400, Robert Haas wrote:
[...]
> Wow, 100 processes??! Really? I guess I don't actually know how large
> modern proctables are, but on my MacOS X machine, for example, there
> are only 75 processes showing up right now in "ps auxww". My Fedora
> 12 machine has 97. That's including a PostgreSQL instance in the
> first case and an Apache instance in the second case. So 100 workers
> seems like a ton to me.

As an equally unscientific data point: on my box, a typical desktop box (actually a netbook, slow CPU, but beefed up to 2GB RAM), I have 5 PostgreSQL processes running, which take away about 1.2 MB resident -- not each one, but together. As a contrast, there is *one* mysql daemon (don't ask!), taking away 17 MB. The worst offenders are, by far, the eye-candy thingies, as one has come to expect :-(

What I wanted to say is that the PostgreSQL processes are unusually lightweight by modern standards.

Regards
-- tomás

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bg worker: general purpose requirements
Robert Haas writes:
> Wow, 100 processes??! Really? I guess I don't actually know how large
> modern proctables are, but on my MacOS X machine, for example, there
> are only 75 processes showing up right now in "ps auxww". My Fedora
> 12 machine has 97. That's including a PostgreSQL instance in the
> first case and an Apache instance in the second case. So 100 workers
> seems like a ton to me.

The part of that that would worry me is open files. PG backends don't have any compunction about holding open hundreds of files. Apiece. You can dial that down but it'll cost you performance-wise. Last I checked, most Unix kernels still had limited-size FD arrays. And as you say, ProcArray manipulations aren't going to be terribly happy about large numbers of idle backends, either.

regards, tom lane
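Tom's open-files concern is easy to quantify on a given box. The sketch below is illustrative only: it reads this process's file-descriptor rlimit via Python's standard resource module (Unix only) and multiplies out a worst case for a hypothetical fleet of backends. The 1000-files figure is PostgreSQL's default for the max_files_per_process setting; the backend count of 100 is the number discussed in the thread.

```python
import resource

# Per-process limit on open file descriptors (Unix only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"fd limit for this process: soft={soft}, hard={hard}")

# Worst case for the system-wide open-file table if every backend
# holds its full quota of descriptors.
max_files_per_process = 1000   # PostgreSQL's default for this GUC
backends = 100                 # the count debated in this thread
print(f"{backends} backends could hold up to "
      f"{backends * max_files_per_process} open files")
```

The per-process limit is not the bottleneck here, since each backend is its own process; it is the kernel's global file table that has to absorb the product.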
Re: [HACKERS] bg worker: general purpose requirements
On Fri, Sep 17, 2010 at 4:49 PM, Markus Wanner wrote:
>> If you're optimizing for ability to respond quickly to a sudden load,
>> keeping idle backends will probably win even when the number of them
>> you're keeping around is fairly high. If you're optimizing for minimal
>> overall resource consumption, though, you'll not be as happy about that.
>
> What resources are we talking about here? Are idle backends really that
> resource hungry? My feeling so far has been that idle processes are
> relatively cheap (i.e. some 100 idle processes shouldn't hurt on a modern
> server).

Wow, 100 processes??! Really? I guess I don't actually know how large modern proctables are, but on my MacOS X machine, for example, there are only 75 processes showing up right now in "ps auxww". My Fedora 12 machine has 97. That's including a PostgreSQL instance in the first case and an Apache instance in the second case. So 100 workers seems like a ton to me.

>> What I'm struggling to understand is this: if there aren't any preforked
>> workers around when the load hits, how much does it slow things down?
>
> As the startup code is pretty much the same as for the current avlauncher,
> the coordinator can only request one bgworker at a time.
>
> This means the signal needs to reach the postmaster, which then forks a
> bgworker process. That new process starts up, connects to the requested
> database and then sends an imessage to the coordinator to register. Only
> after having received that registration, the coordinator can request another
> bgworker (note that this is a one-overall limitation, not per database).
>
> I haven't measured the actual time it takes, but given the use case of a
> connection pool, I so far thought it's obvious that this process takes too
> long. Maybe that would be a worthwhile exercise...
>
> (It's exactly what apache pre-fork does, no? Is anybody concerned about the
> idle processes there? Or do they consume much less resources?)
I think the kicker here is the idea of having a certain number of extra workers per database. On my vanilla Apache server on the above-mentioned Fedora 12 VM, there are a total of 10 processes running. I am sure that could balloon to 100 or more under load, but it's not keeping around 100 processes on an otherwise idle system. So if you knew you only had 1 database, keeping around 2 or 3 or 5 or even 10 workers might seem reasonable, but since you might have 1 database or 1000 databases, it doesn't. Keeping 2 or 3 or 5 or 10 workers TOTAL around could be reasonable, but not per-database.

As Tom said upthread, we don't want to assume that we're the only thing running on the box and are therefore entitled to take up all the available memory/disk/process slots/whatever. And even if we DID feel so entitled, there could be hundreds of databases, and it certainly doesn't seem practical to keep 1000 workers around "just in case".

I don't know whether an idle Apache worker consumes more or less memory than an idle PostgreSQL worker, but another difference between the Apache case and the PostgreSQL case is that presumably all those backend processes have attached shared memory and have ProcArray slots. We know that code doesn't scale terribly well, especially in terms of taking snapshots, and that's one reason why high-volume PostgreSQL installations pretty much require a connection pooler. I think the sizes of the connection pools I've seen recommended are considerably smaller than 100, more like 2 * CPUs + spindles, or something like that. It seems like if you actually used all 100 workers at the same time performance might be pretty awful.

I was taking a look at the Mammoth Replicator code this week (parenthetical note: I couldn't figure out where mcp_server was or how to set it up) and it apparently has a limitation that only one database in the cluster can be replicated.
I'm a little fuzzy on how Mammoth works, but apparently this problem of scaling to large numbers of databases is not unique to Postgres-R.

>> Is the ramp-up time longer than that, or is even that much delay
>> unacceptable for Postgres-R, or is there some other aspect to the
>> problem I'm failing to grasp? I can tell you have some experience
>> tuning this so I'd like to try to understand where you're coming from.
>
> I didn't ever compare to a max_spare_background_workers = 0 configuration,
> so I don't have any hard numbers, sorry.

Hmm, OK.

>> I think this is an interesting example, and worth some further
>> thought. I guess I don't really understand how Postgres-R uses these
>> bgworkers.
>
> The given example doesn't only apply to Postgres-R. But with fewer bgworkers
> in total, you are more likely to want to use them all for one database, yes.
>
>> Are you replicating one transaction at a time, or how does
>> the data get sliced up?
>
> Yes, one transaction at a time. One transaction per backend (bgworker). On a
> cluster with n nodes that only performs writing transaction
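The pool-size rule of thumb Robert quotes earlier in this message ("2 * CPUs + spindles") can be written down directly. This is only a sketch of that heuristic as stated in the thread, not an official formula; the function name and the spindle counts are made up for illustration.

```python
import os

def suggested_pool_size(cpus: int, spindles: int) -> int:
    # Rule of thumb quoted in the thread: roughly twice the CPU
    # count plus the number of effective disk spindles.
    return 2 * cpus + spindles

print(suggested_pool_size(8, 2))                    # 8 cores, 2 disks -> 18
print(suggested_pool_size(os.cpu_count() or 1, 4))  # this machine, 4 disks
```

Even for large servers this lands well below the 100-worker figure being debated, which is the point being made about snapshot-taking overhead.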
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Sat, Sep 18, 2010 at 11:46 AM, Robert Haas wrote:
>> One of my proposal is we don't have to keep the original input text.
>> We store JSON data in effective internal formats. If users want to get
>> human-readable output, they can use stringify() with indentation option.
>
> There's a trade-off here: this will make some things faster, but other
> things slower. Probably some discussion of the pros and cons is in
> order.

I didn't intend to introduce non-text internal formats. The original patch spent some code keeping all of the input's whitespace as-is, but I'd say we can simplify that. Whitespace aside, normalization of strings and numbers might be a problem when we support JSON comparison operators -- comparison of Unicode-escaped characters in strings, or 0 vs. 0.0 in numbers.

--
Itagaki Takahiro
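The two normalization hazards named here are easy to demonstrate. This is just an illustration in Python (not the patch's C code) of why textually different JSON can denote equal values, and why the number case is the harder one:

```python
import json

# String case: a Unicode escape and the literal character are the same
# JSON value, and re-serializing with fixed options gives both the
# same canonical text.
assert json.loads('"\\u266B"') == json.loads('"♫"')
assert json.dumps(json.loads('"\\u266B"')) == json.dumps(json.loads('"♫"'))

# Number case: 0 and 0.0 compare equal as parsed values, but a naive
# parse/re-serialize round trip still leaves them textually distinct,
# so a comparison operator needs an explicit numeric normalization rule.
assert json.loads('0') == json.loads('0.0')
assert json.dumps(json.loads('0')) != json.dumps(json.loads('0.0'))

print("string escapes normalize on re-serialization; 0 vs 0.0 does not")
```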
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Fri, Sep 17, 2010 at 10:28 PM, Itagaki Takahiro wrote:
> One of my proposal is we don't have to keep the original input text.
> We store JSON data in effective internal formats. If users want to get
> human-readable output, they can use stringify() with indentation option.

There's a trade-off here: this will make some things faster, but other things slower. Probably some discussion of the pros and cons is in order.

(Also, it's important not to break EXPLAIN (FORMAT JSON), which thinks that the internal format of JSON is text.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Itagaki Takahiro wrote:
> On Sat, Sep 18, 2010 at 6:45 AM, Joseph Adams wrote:
> >> Why do we need to store the value in UTF8 encoding?
> >
> > because the JSON code needs to handle Unicode escapes like
> > "\u266B", but there is no simple and efficient way (that I know of) to
> > convert single characters to/from the server encoding.
>
> Ah, we don't need UTF8 encoding only to store JSON data, but we should
> care about Unicode escape when we support comparison and extracting
> values from JSON, right? I see the worth of encoding to UTF8.
>
> One of my proposal is we don't have to keep the original input text.
> We store JSON data in effective internal formats. If users want to get
> human-readable output, they can use stringify() with indentation option.
>
> > I think your patch is on a better footing than mine, so maybe I should
> > start contributing to your code rather than the other way around.
> > Before the next commitfest, I could merge the testcases from my patch
> > in and identify parsing discrepancies (if any). Afterward, I could
> > help merge the other features into the new JSON infrastructure.
>
> Thanks! I'll contribute my codes developed for another project
> (PL/JavaScript), and let's merge our codes to the core.
>
> > I can't compile your initial patch against the latest checkout because
> > json_parser.h and json_scanner.h are missing.
>
> Hmm, those files should be generated from .y and .l files. I'll check it.

I am pleased the two efforts can be joined. I like the idea of PL/JavaScript too.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Sat, Sep 18, 2010 at 6:45 AM, Joseph Adams wrote:
>> Why do we need to store the value in UTF8 encoding?
>
> because the JSON code needs to handle Unicode escapes like
> "\u266B", but there is no simple and efficient way (that I know of) to
> convert single characters to/from the server encoding.

Ah, we don't need UTF8 encoding just to store JSON data, but we should care about Unicode escapes when we support comparison and extraction of values from JSON, right? I see the worth of encoding to UTF8.

One of my proposals is that we don't have to keep the original input text. We store JSON data in an efficient internal format. If users want to get human-readable output, they can use stringify() with an indentation option.

> I think your patch is on a better footing than mine, so maybe I should
> start contributing to your code rather than the other way around.
> Before the next commitfest, I could merge the testcases from my patch
> in and identify parsing discrepancies (if any). Afterward, I could
> help merge the other features into the new JSON infrastructure.

Thanks! I'll contribute the code I developed for another project (PL/JavaScript), and let's merge our code into core.

> I can't compile your initial patch against the latest checkout because
> json_parser.h and json_scanner.h are missing.

Hmm, those files should be generated from the .y and .l files. I'll check it.

--
Itagaki Takahiro
Re: [HACKERS] Heartbeat between Primary and Standby replicas
Apologies. I'm new to Postgres and I didn't see that feature. It satisfies what I want to do. Thanks.

On Thu, Sep 16, 2010 at 7:34 PM, Fujii Masao wrote:
> On Fri, Sep 17, 2010 at 6:49 AM, fazool mein wrote:
> > I am designing a heartbeat system between replicas to know when a replica
> > goes down so that necessary measures can be taken. As I see, there are two
> > ways of doing it:
> >
> > 1) Creating a separate heartbeat process on replicas.
> > 2) Creating a heartbeat message, and sending it over the connection that is
> > already established between walsender and walreceiver.
> >
> > With 2, sending heartbeat from walsender to walreceiver seems trivial.
> > Sending a heartbeat from walreceiver to walsender seems tricky. Going
> > through the code, it seems that the walreceiver is always in the
> > PGASYNC_COPY_OUT mode (except in the beginning when handshaking is done).
> >
> > Can you recommend the right way of doing this?
>
> The existing keepalive feature doesn't help?
>
> Regards,
>
> --
> Fujii Masao
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
Re: [HACKERS] VACUUM produces odd freespace values
Thom Brown wrote:
> > VACUUM mvcc_demo;
> > VACUUM
> > SELECT pg_relation_size('mvcc_demo');
> >  pg_relation_size
> > ------------------
> >                 0
> > (1 row)
>
> That's odd. When I delete val 2, the freespace goes back up in 9.0rc1
> (attached).

Your numbers are odd too. With one row after INSERT you show:

	(0,8128)

but after the delete, with one row left, you show:

	(0,8096)

You also dip to (0,8032), which I don't see.

I now see we only track ranges of free space values. For example, freespace.c has:

 * We use just one byte to store the amount of free space on a page, so we
 * divide the amount of free space a page can have into 256 different
 * categories. The highest category, 255, represents a page with at least
 * MaxFSMRequestSize bytes of free space, and the second highest category
 * represents the range from 254 * FSM_CAT_STEP, inclusive, to
 * MaxFSMRequestSize, exclusive.
 *
 * MaxFSMRequestSize depends on the architecture and BLCKSZ, but assuming
 * default 8k BLCKSZ, and that MaxFSMRequestSize is 8164 bytes, the
 * categories look like this:
 *
 * Range       Category
 * 0    - 31      0
 * 32   - 63      1
 * ...  - ...    ...
 * 8096 - 8127   253
 * 8128 - 8163   254
 * 8164 - 8192   255

So, my guess is that the unused item pointers are causing the free space to fall into a smaller category than we had after the first INSERT. I bet if I pulled more columns from heap_page_items() I could see it. Anyway, I think I have my explanation now. Thanks.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
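The bucketing described in that freespace.c comment is simple enough to model. The sketch below assumes the constants from the quoted comment (FSM_CAT_STEP = 32, one category byte per page) and approximates the top categories, which are slightly irregular in the real code. It shows why all the values reported in this thread (8128, 8096, 8064) are multiples of 32: pg_freespace can only report category boundaries, not exact byte counts.

```python
FSM_CAT_STEP = 32    # bytes per category, per the freespace.c comment
MAX_CATEGORY = 255   # one byte per page in the free space map

def fsm_category(avail_bytes: int) -> int:
    # Bucket available free space into one of 256 categories of
    # 32 bytes each; anything at or above the top threshold is 255.
    return min(avail_bytes // FSM_CAT_STEP, MAX_CATEGORY)

def category_floor(category: int) -> int:
    # Lower bound of a category: the granularity at which free
    # space can be reported back.
    return category * FSM_CAT_STEP

print(fsm_category(8128), category_floor(254))  # 254 8128
print(fsm_category(8096), category_floor(253))  # 253 8096
```

A page whose free space shrinks by a few bytes of line-pointer overhead can thus drop a whole 32-byte category, which matches the guess above about the unused item pointers.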
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
I wrote:
> Oh, mystery explained upon comparing the cvs2git.options files.
> I was using Max's file which had this in it:
> ExcludeRegexpStrategyRule(r'unlabeled-.*'),
> I think I'll rerun without that just to convince myself of what it is
> we're dropping. But right now it seems that everything is pretty sane.

OK, I found out what we're dropping. Those "unlabeled" branches each have a manufactured creation commit that just deletes a lot of files, plus a single real commit, which all look like this:

+ commit 6c23cb5f371f273cade66408f3a888c2f94af293 refs/heads/unlabeled-1.51.2
+ Author: Tom Lane
+ Date:   Sat Sep 28 20:00:29 2002
+
+     Make the world at least somewhat safe for zero-column tables, and
+     remove the special case in ALTER DROP COLUMN to prohibit dropping a
+     table's last column.
+
+ M	src/test/regress/expected/alter_table.out

+ commit d554d56827e4337835b0792c2568e8e9fee2d950 refs/heads/unlabeled-1.87.2
+ Author: Tom Lane
+ Date:   Sat Sep 28 20:00:28 2002
+
+     Make the world at least somewhat safe for zero-column tables, and
+     remove the special case in ALTER DROP COLUMN to prohibit dropping a
+     table's last column.
+
+ M	src/backend/executor/nodeAgg.c

+ commit 7724eba3a121e5c84da3467fe6f0b176bf3f3d4b refs/heads/unlabeled-1.59.2
+ Author: Tom Lane
+ Date:   Sat Sep 28 20:00:27 2002
+
+     Make the world at least somewhat safe for zero-column tables, and
+     remove the special case in ALTER DROP COLUMN to prohibit dropping a
+     table's last column.
+
+ M	src/backend/executor/execTuples.c

+ commit 6924390f0cfa2aac1308719e51cb4cc2b24e2bc7 refs/heads/unlabeled-1.44.2
+ Author: Tom Lane
+ Date:   Sat Sep 28 20:00:26 2002
+
+     Make the world at least somewhat safe for zero-column tables, and
+     remove the special case in ALTER DROP COLUMN to prohibit dropping a
+     table's last column.
+
+ M	src/backend/commands/tablecmds.c

+ commit 83029b0525f3324c30d35fcc9c77d2c301bdf7cf refs/heads/unlabeled-1.90.2
+ Author: Tom Lane
+ Date:   Sat Sep 28 20:00:25 2002
+
+     Make the world at least somewhat safe for zero-column tables, and
+     remove the special case in ALTER DROP COLUMN to prohibit dropping a
+     table's last column.
+
+ M	src/backend/access/common/tupdesc.c
+ M	src/backend/parser/parse_target.c

  commit 6d0d15c451739396851d3f93f81c63a47535bf1e refs/tags/REL7_4_BETA1
  Author: Tom Lane
  Date:   Sat Sep 28 20:00:19 2002
+

If you go and look at the CVS history of these files, you'll see that indeed they all have branches sprouted on that date that are unreferenced. This commit occurred during the interval between where we'd initially branched 7.3 and where we moved it up to head. I think what happened was that I tried to back-patch a fix into what was then the 7.3 branch, and when Marc moved the branch point, these commits all ended up orphans. So indeed there's no harm discarding them. (I always suspected that what Marc had done wasn't entirely kosher, and yup, now the chickens are coming home to roost.)

regards, tom lane
Re: [HACKERS] VACUUM produces odd freespace values
On 18 September 2010 02:00, Bruce Momjian wrote:
> Bruce Momjian wrote:
>> Can anyone explain why VACUUM after INSERT shows steadily decreasing
>> freespace, while DELETE of the same rows does not decrease consistently?
>>
>> Specifically, after one row is inserted I see:
>>
>> SELECT pg_freespace('mvcc_demo');
>>  pg_freespace
>> --------------
>>  (0,8128)
>> (1 row)
>>
>> but after inserting two more rows and deleting those two rows, I see:
>>
>> SELECT pg_freespace('mvcc_demo');
>>  pg_freespace
>> --------------
>>  (0,8096)
>> (1 row)
>>
>> Seems that value should be '(0,8128)'. Is it the unused line pointers
>> that are causing this?
>>
>> Another odd thing --- if I change the second VACUUM to VACUUM FULL I
>> see:
>>
>> VACUUM FULL mvcc_demo;
>> VACUUM
>> SELECT pg_freespace('mvcc_demo');
>>  pg_freespace
>> --------------
>>  (0,0)
>> (1 row)
>>
>> There is still a row in the table, so why is there no free space
>> reported? I realize after VACUUM FULL that only the last page has
>> freespace --- do we assume that will be used as default for the next
>> addition and just not bother with the free space map? --- makes sense if
>> we do that. Does this happen because cluster creates a new relfilenode?
>>
>> I am attaching the init script, the SQL query script, and the results I
>> obtained against our CVS HEAD.
>
> Sorry. Attached is trimmed-down result file that shows just the
> problem.
>
> --
> Bruce Momjian  http://momjian.us
> EnterpriseDB   http://enterprisedb.com
>
> + It's impossible for everything to be true. +
>
> 00-init.sql
> -----------
> This script is designed to run in a database called test
> and requires installation of /contrib/pageinspect and
> /contrib/pg_freespacemap.
> You are now connected to database "test" as user "postgres".
> DROP TABLE IF EXISTS mvcc_demo;
> DROP TABLE
> CREATE TABLE mvcc_demo (val INTEGER);
> CREATE TABLE
> DROP VIEW IF EXISTS mvcc_demo_page0;
> DROP VIEW
> CREATE VIEW mvcc_demo_page0 AS
>     SELECT '(0,' || lp || ')' AS ctid,
>         CASE lp_flags
>             WHEN 0 THEN 'Unused'
>             WHEN 1 THEN 'Normal'
>             WHEN 2 THEN 'Redirect to ' || lp_off
>             WHEN 3 THEN 'Dead'
>         END,
>         t_xmin::text::int8 AS xmin,
>         t_xmax::text::int8 AS xmax,
>         t_ctid
>     FROM heap_page_items(get_raw_page('mvcc_demo', 0))
>     ORDER BY lp;
> CREATE VIEW
>
> 31-vacuum-freesp.sql
> --------------------
> TRUNCATE mvcc_demo;
> TRUNCATE TABLE
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
> (0 rows)
>
> INSERT INTO mvcc_demo VALUES (1);
> INSERT 0 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
>  (0,8128)
> (1 row)
>
> INSERT INTO mvcc_demo VALUES (2);
> INSERT 0 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
>  (0,8096)
> (1 row)
>
> INSERT INTO mvcc_demo VALUES (3);
> INSERT 0 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
>  (0,8064)
> (1 row)
>
> DELETE FROM mvcc_demo WHERE val = 3;
> DELETE 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
>  (0,8096)
> (1 row)
>
> DELETE FROM mvcc_demo WHERE val = 2;
> DELETE 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
>  (0,8096)
> (1 row)
>
> SELECT * FROM mvcc_demo_page0;
>  ctid  |  case  | xmin | xmax | t_ctid
> -------+--------+------+------+--------
>  (0,1) | Normal | 1339 |    0 | (0,1)
>  (0,2) | Unused |      |      |
>  (0,3) | Unused |      |      |
> (3 rows)
>
> DELETE FROM mvcc_demo WHERE val = 1;
> DELETE 1
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
> (0 rows)
>
> VACUUM mvcc_demo;
> VACUUM
> SELECT pg_relation_size('mvcc_demo');
>  pg_relation_size
> ------------------
>                 0
> (1 row)

That's odd. When I delete val 2, the freespace goes back up in 9.0rc1 (attached).
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

mvcctest=# INSERT INTO mvcc_demo VALUES (1);VACUUM mvcc_demo;SELECT pg_freespace('mvcc_demo');
INSERT 0 1
VACUUM
 pg_freespace
--------------
 (0,8128)
(1 row)

mvcctest=# INSERT INTO mvcc_demo VALUES (2);VACUUM mvcc_demo;SELECT pg_freespace('mvcc_demo');
INSERT 0 1
VACUUM
 pg_freespace
--------------
 (0,8064)
(1 row)

mvcctest=# INSERT INTO mvcc_demo VALUES (3);VACUUM mvcc_demo;SELECT pg_freespace('mvcc_demo');
INSERT 0 1
VACUUM
 pg_freespace
--------------
 (0,8032)
(1 row)

mvcctest=# DELETE FROM mvcc_demo WHERE val = 3;VACUUM mvcc_demo;SELECT pg_freespace('mvcc_demo');
DELETE 1
VACUUM
Re: [HACKERS] Bad cast priority for DATE?
> timestamptz is a preferred type, so no you probably can't change that
> without breaking a lot of stuff. It's not immediately clear to me why
> that's wrong anyway.

Just that having a value implicitly acquire time zone information it didn't originally have seems dangerous. But I can't come up with a specific example of breakage right now -- at least not one on a single server.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] VACUUM produces odd freespace values
Bruce Momjian wrote:
> Can anyone explain why VACUUM after INSERT shows steadily decreasing
> freespace, while DELETE of the same rows does not decrease consistently?
>
> Specifically, after one row is inserted I see:
>
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
>  (0,8128)
> (1 row)
>
> but after inserting two more rows and deleting those two rows, I see:
>
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
>  (0,8096)
> (1 row)
>
> Seems that value should be '(0,8128)'. Is it the unused line pointers
> that are causing this?
>
> Another odd thing --- if I change the second VACUUM to VACUUM FULL I
> see:
>
> VACUUM FULL mvcc_demo;
> VACUUM
> SELECT pg_freespace('mvcc_demo');
>  pg_freespace
> --------------
>  (0,0)
> (1 row)
>
> There is still a row in the table, so why is there no free space
> reported? I realize after VACUUM FULL that only the last page has
> freespace --- do we assume that will be used as default for the next
> addition and just not bother with the free space map? --- makes sense if
> we do that. Does this happen because cluster creates a new relfilenode?
>
> I am attaching the init script, the SQL query script, and the results I
> obtained against our CVS HEAD.

Sorry. Attached is a trimmed-down result file that shows just the problem.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +

00-init.sql
-----------
This script is designed to run in a database called test
and requires installation of /contrib/pageinspect and
/contrib/pg_freespacemap.
You are now connected to database "test" as user "postgres".
DROP TABLE IF EXISTS mvcc_demo;
DROP TABLE
CREATE TABLE mvcc_demo (val INTEGER);
CREATE TABLE
DROP VIEW IF EXISTS mvcc_demo_page0;
DROP VIEW
CREATE VIEW mvcc_demo_page0 AS
    SELECT '(0,' || lp || ')' AS ctid,
        CASE lp_flags
            WHEN 0 THEN 'Unused'
            WHEN 1 THEN 'Normal'
            WHEN 2 THEN 'Redirect to ' || lp_off
            WHEN 3 THEN 'Dead'
        END,
        t_xmin::text::int8 AS xmin,
        t_xmax::text::int8 AS xmax,
        t_ctid
    FROM heap_page_items(get_raw_page('mvcc_demo', 0))
    ORDER BY lp;
CREATE VIEW

31-vacuum-freesp.sql
--------------------
TRUNCATE mvcc_demo;
TRUNCATE TABLE
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace
--------------
(0 rows)

INSERT INTO mvcc_demo VALUES (1);
INSERT 0 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace
--------------
 (0,8128)
(1 row)

INSERT INTO mvcc_demo VALUES (2);
INSERT 0 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace
--------------
 (0,8096)
(1 row)

INSERT INTO mvcc_demo VALUES (3);
INSERT 0 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace
--------------
 (0,8064)
(1 row)

DELETE FROM mvcc_demo WHERE val = 3;
DELETE 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace
--------------
 (0,8096)
(1 row)

DELETE FROM mvcc_demo WHERE val = 2;
DELETE 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace
--------------
 (0,8096)
(1 row)

SELECT * FROM mvcc_demo_page0;
 ctid  |  case  | xmin | xmax | t_ctid
-------+--------+------+------+--------
 (0,1) | Normal | 1339 |    0 | (0,1)
 (0,2) | Unused |      |      |
 (0,3) | Unused |      |      |
(3 rows)

DELETE FROM mvcc_demo WHERE val = 1;
DELETE 1
VACUUM mvcc_demo;
VACUUM
SELECT pg_freespace('mvcc_demo');
 pg_freespace
--------------
(0 rows)

VACUUM mvcc_demo;
VACUUM
SELECT pg_relation_size('mvcc_demo');
 pg_relation_size
------------------
                0
(1 row)
[HACKERS] VACUUM produces odd freespace values
Can anyone explain why VACUUM after INSERT shows steadily decreasing freespace, while DELETE of the same rows does not decrease consistently?

Specifically, after one row is inserted I see:

	SELECT pg_freespace('mvcc_demo');
	 pg_freespace
	--------------
	 (0,8128)
	(1 row)

but after inserting two more rows and deleting those two rows, I see:

	SELECT pg_freespace('mvcc_demo');
	 pg_freespace
	--------------
	 (0,8096)
	(1 row)

Seems that value should be '(0,8128)'. Is it the unused line pointers that are causing this?

Another odd thing --- if I change the second VACUUM to VACUUM FULL I see:

	VACUUM FULL mvcc_demo;
	VACUUM
	SELECT pg_freespace('mvcc_demo');
	 pg_freespace
	--------------
	 (0,0)
	(1 row)

There is still a row in the table, so why is there no free space reported? I realize after VACUUM FULL that only the last page has freespace --- do we assume that will be used as default for the next addition and just not bother with the free space map? --- makes sense if we do that. Does this happen because cluster creates a new relfilenode?

I am attaching the init script, the SQL query script, and the results I obtained against our CVS HEAD.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +

\echo This script is designed to run in a database called 'test'
\echo and requires installation of /contrib/pageinspect and
\echo /contrib/pg_freespacemap.
\c test

DROP TABLE IF EXISTS mvcc_demo;
CREATE TABLE mvcc_demo (val INTEGER);

DROP VIEW IF EXISTS mvcc_demo_page0;
CREATE VIEW mvcc_demo_page0 AS
    SELECT '(0,' || lp || ')' AS ctid,
        CASE lp_flags
            WHEN 0 THEN 'Unused'
            WHEN 1 THEN 'Normal'
            WHEN 2 THEN 'Redirect to ' || lp_off
            WHEN 3 THEN 'Dead'
        END,
        t_xmin::text::int8 AS xmin,
        t_xmax::text::int8 AS xmax,
        t_ctid
    FROM heap_page_items(get_raw_page('mvcc_demo', 0))
    ORDER BY lp;

-- clear out heap file
TRUNCATE mvcc_demo;
VACUUM mvcc_demo;
SELECT pg_freespace('mvcc_demo');

INSERT INTO mvcc_demo VALUES (1);
VACUUM mvcc_demo;
SELECT pg_freespace('mvcc_demo');

INSERT INTO mvcc_demo VALUES (2);
VACUUM mvcc_demo;
SELECT pg_freespace('mvcc_demo');

INSERT INTO mvcc_demo VALUES (3);
VACUUM mvcc_demo;
SELECT pg_freespace('mvcc_demo');

DELETE FROM mvcc_demo WHERE val = 3;
VACUUM mvcc_demo;
SELECT pg_freespace('mvcc_demo');

DELETE FROM mvcc_demo WHERE val = 2;
VACUUM mvcc_demo;
SELECT pg_freespace('mvcc_demo');

SELECT * FROM mvcc_demo_page0;

DELETE FROM mvcc_demo WHERE val = 1;
VACUUM mvcc_demo;
SELECT pg_freespace('mvcc_demo');

VACUUM mvcc_demo;
SELECT pg_relation_size('mvcc_demo');

00-init.sql
-----------
This script is designed to run in a database called test
and requires installation of /contrib/pageinspect and
/contrib/pg_freespacemap.
You are now connected to database "test" as user "postgres".
DROP TABLE IF EXISTS mvcc_demo;
DROP TABLE
CREATE TABLE mvcc_demo (val INTEGER);
CREATE TABLE
DROP VIEW IF EXISTS mvcc_demo_page0;
DROP VIEW
CREATE VIEW mvcc_demo_page0 AS
    SELECT '(0,' || lp || ')' AS ctid,
        CASE lp_flags
            WHEN 0 THEN 'Unused'
            WHEN 1 THEN 'Normal'
            WHEN 2 THEN 'Redirect to ' || lp_off
            WHEN 3 THEN 'Dead'
        END,
        t_xmin::text::int8 AS xmin,
        t_xmax::text::int8 AS xmax,
        t_ctid
    FROM heap_page_items(get_raw_page('mvcc_demo', 0))
    ORDER BY lp;
CREATE VIEW

01-xmin_ins.sql
---------------
DELETE FROM mvcc_demo;
DELETE 0
INSERT INTO mvcc_demo VALUES (1);
INSERT 0 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val
------+------+-----
 1301 |    0 |   1
(1 row)

02-xmax_del.sql
---------------
DELETE FROM mvcc_demo;
DELETE 1
INSERT INTO mvcc_demo VALUES (1);
INSERT 0 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val
------+------+-----
 1303 |    0 |   1
(1 row)

BEGIN WORK;
BEGIN
DELETE FROM mvcc_demo;
DELETE 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val
------+------+-----
(0 rows)

SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val
------+------+-----
 1303 | 1304 |   1
(1 row)

COMMIT WORK;
COMMIT

03-xmax_upd.sql
---------------
DELETE FROM mvcc_demo;
DELETE 0
INSERT INTO mvcc_demo VALUES (1);
INSERT 0 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val
------+------+-----
 1305 |    0 |   1
(1 row)

BEGIN WORK;
BEGIN
UPDATE mvcc_demo SET val = 2;
UPDATE 1
SELECT xmin, xmax, * FROM mvcc_demo;
 xmin | xmax | val
------+------+-----
 1306 |    0 |   2
(1 row)
Re: [HACKERS] Bad cast priority for DATE?
Josh Berkus writes:
> I was noticing that, where we have a function which has two versions,
> timestamp and timestamptz (for example, date_trunc()), if I use a DATE
> timestamptz is the default cast. Shouldn't timestamp without time zone
> be the default? Is this something we can fix without an overhaul of the
> type casting system?

timestamptz is a preferred type, so no, you probably can't change that without breaking a lot of stuff. It's not immediately clear to me why that's wrong anyway.

regards, tom lane
[HACKERS] Bad cast priority for DATE?
Folks,

I was noticing that, where we have a function which has two versions,
timestamp and timestamptz (for example, date_trunc()), if I use a DATE
timestamptz is the default cast.  Shouldn't timestamp without time zone
be the default?  Is this something we can fix without an overhaul of the
type casting system?

--
 Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com
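The behavior Josh describes can be seen directly from the result type of the call; a minimal illustration (the resolution claim follows PostgreSQL's preferred-type rule for the datetime category):

```sql
-- date_trunc() exists for both timestamp and timestamptz; with a DATE
-- argument the timestamptz variant wins, because timestamptz is the
-- preferred type in the datetime category:
SELECT pg_typeof(date_trunc('day', DATE '2010-09-17'));
--> timestamp with time zone

-- An explicit cast selects the timestamp-without-time-zone variant:
SELECT pg_typeof(date_trunc('day', DATE '2010-09-17'::timestamp));
--> timestamp without time zone
```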
Re: [HACKERS] Progress indication prototype
On Fri, Sep 17, 2010 at 4:50 PM, Peter Eisentraut wrote:
> On tor, 2010-09-16 at 19:14 -0400, Robert Haas wrote:
>> I think that there should be a function which returns just this one
>> piece of data and is not automatically called as part of select * from
>> pg_stat_activity.  Then we could eventually decide to give backends a
>> way to know if that function had been invoked on them and how
>> recently.
>
> Displaying this as part of pg_stat_activity is completely trivial: it's
> just displaying the value of a float variable.
>
> It seems you are advocating a completely different architecture, where
> someone can find out on demand what the progress or status of another
> session is, without that other session having known about that request
> before it started its current command.  But that seems pretty outlandish
> to me, and I would ask for more details on what you have in mind.

What you just said is about what I had in mind.  I admit I can't
articulate a more detailed design right off the top of my head, but the
architecture you're proposing seems dead certain to never cover more
than 0.1% of what people actually do.  If there's not even an obvious
way of generalizing this to the case of a full-database VACUUM, let
alone actual queries, that seems like a strong hint that it might be
badly designed.  Leaving some parts of the problem for future
development is perfectly reasonable, but there should be some realistic
hope that the next guy will be able to make some further progress.

It seems to me that this is the sort of information that people will
normally never see, and therefore won't be willing to pay a performance
penalty for.  But when they need it (because something is running long)
they'll be happy to pay a modest penalty to get it.  Which is good,
because the chances that we'll be able to provide this information "for
free" seem very poor even for utility commands.  But it also means that
we shouldn't carve the "can get this for free" aspect of it into stone.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
Magnus Hagander writes:
> On Sat, Sep 18, 2010 at 00:06, Tom Lane wrote:
>> [ scratches head ... ]  That's weird.  We probably ought to figure out
>> why you and I are getting different results.  I wonder if there's some
>> other discrepancy in the anoncvs pull?

> Could be.

Oh, mystery explained upon comparing the cvs2git.options files.
I was using Max's file which had this in it:

    ExcludeRegexpStrategyRule(r'unlabeled-.*'),

I think I'll rerun without that just to convince myself of what it is
we're dropping.  But right now it seems that everything is pretty sane.

			regards, tom lane
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Sat, Sep 18, 2010 at 00:06, Tom Lane wrote:
> Magnus Hagander writes:
>> On Fri, Sep 17, 2010 at 23:01, Tom Lane wrote:
>>> git branch -D unlabeled-1.44.2
>>> git branch -D unlabeled-1.51.2
>>> git branch -D unlabeled-1.59.2
>>> git branch -D unlabeled-1.87.2
>>> git branch -D unlabeled-1.90.2
>>>
>>> You should not need any of the above; I don't see those being generated
>>> anymore with the cleanup script in place.
>
>> I see them - at least the script didn't give an error when it ran, but
>> said it had deleted them.
>
> [ scratches head ... ]  That's weird.  We probably ought to figure out
> why you and I are getting different results.  I wonder if there's some
> other discrepancy in the anoncvs pull?

Could be.

> Anyway, if what you have up at
> http://github.com/mhagander/pggit_migrate
> is current, I can try to reproduce your results here.

It is - I just double-checked that.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
Magnus Hagander writes:
> On Fri, Sep 17, 2010 at 23:01, Tom Lane wrote:
>> git branch -D unlabeled-1.44.2
>> git branch -D unlabeled-1.51.2
>> git branch -D unlabeled-1.59.2
>> git branch -D unlabeled-1.87.2
>> git branch -D unlabeled-1.90.2
>>
>> You should not need any of the above; I don't see those being generated
>> anymore with the cleanup script in place.

> I see them - at least the script didn't give an error when it ran, but
> said it had deleted them.

[ scratches head ... ]  That's weird.  We probably ought to figure out
why you and I are getting different results.  I wonder if there's some
other discrepancy in the anoncvs pull?

Anyway, if what you have up at
http://github.com/mhagander/pggit_migrate
is current, I can try to reproduce your results here.

			regards, tom lane
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Fri, Sep 17, 2010 at 8:32 AM, Itagaki Takahiro wrote:
> On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams wrote:
>> Updated patch: the JSON code has all been moved into core, so this
>> patch is now for a built-in data type.
>
> I have a question about the design of the JSON type.  Why do we need to
> store the value in UTF8 encoding?  It's true the RFC of JSON says the
> encoding SHALL be encoded in Unicode, but I don't understand why
> we should reject other encodings.

Actually, the code in my original patch should work with any server
encoding in PostgreSQL.  However, internally, it operates in UTF-8 and
converts to/from the server encoding when necessary.  I did it this way
because the JSON code needs to handle Unicode escapes like "\u266B", but
there is no simple and efficient way (that I know of) to convert single
characters to/from the server encoding.

I noticed that in your new patch, you sidestepped the encoding issue by
simply storing strings in their encoded form (right?).  This is nice and
simple, but in the future, JSON tree conversions and updates will still
need to deal with the encoding issue somehow.

> As I said before, I'd like to propose only 3 features in the commitfest:
>  * TYPE json data type
>  * text to json: FUNCTION json_parse(text)
>  * json to text: FUNCTION json_stringify(json, whitelist, space)

Although casting from JSON to TEXT does "stringify" it in my original
patch, I think json_stringify would be much more useful.  In addition to
the formatting options, if the internal format of the JSON type changes
and no longer preserves original formatting, then the behavior of the
following would change:

    $$"unnecessary\u0020escape" $$ :: JSON :: TEXT

json_stringify would be more predictable because it would re-encode the
whitespace (but not the \u0020, unless we went out of our way to make it
do that).

Also, json_parse is "unnecessary" if you allow casting from TEXT to JSON
(which my patch does), but I think having json_parse would be more
intuitive, for the same reason you do.

Long story short: I like it :-)

If you're keeping track, features from my patch not in the new code yet
are:
 * Programmatically validating JSON ( json_validate() )
 * Getting the type of a JSON value ( json_type() )
 * Converting scalar values to/from JSON
 * Converting arrays to JSON
 * JSONPath

> JSONPath will be re-implemented on the basic functionalities in the
> subsequent commitfest.  Do you have a plan to split your patch?
> Or, can I continue to develop my patch?  If so, JSONPath needs
> to be adjusted to the new infrastructure.

I think your patch is on a better footing than mine, so maybe I should
start contributing to your code rather than the other way around.
Before the next commitfest, I could merge the testcases from my patch in
and identify parsing discrepancies (if any).  Afterward, I could help
merge the other features into the new JSON infrastructure.

I can't compile your initial patch against the latest checkout because
json_parser.h and json_scanner.h are missing.  Is there a more recent
patch, or could you update the patch so it compiles?  I'd like to start
tinkering with the new code.

Thanks!
Joey Adams
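For concreteness, the interface under discussion would presumably be used along these lines. This is hypothetical: json_parse, json_stringify, json_validate, and json_type are the proposed/patch function names from this thread, not a committed PostgreSQL API, and the signatures follow Itagaki's sketch above:

```sql
-- Parsing and stringifying via the proposed functions:
SELECT json_parse('{"note": "\u266b"}');      -- text -> json, validates input
SELECT json_stringify('{"a":[1,2,3]}'::json,  -- json -> text
                      NULL,                   -- whitelist (none)
                      '  ');                  -- indent with two spaces

-- Features from the earlier patch, not yet in the new code:
SELECT json_validate('{"broken": ');          -- would report invalid JSON
SELECT json_type('[1,2,3]'::json);            -- e.g. 'array'
```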
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 23:21, Tom Lane wrote:
> Actually, the simplest way to handle this might be to just delete all
> five of those tags during the conversion, and then I'll put them back
> in the right places later when I add the other old-release tags.
> That way we won't have any tags getting moved after the repository is
> published.  (Or am I wrong about that being something to avoid?  But
> in any case we want to gc the manufactured commits for Release_2_0_0
> and Release_2_0.)

As long as nobody has cloned the repository, it's not a problem moving
them.  But it *is* something you should generally avoid, so let's do
that :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 23:01, Tom Lane wrote:
> BTW, on the cleanup steps:
>
> # Remove bogus branches
> git branch -D unlabeled-1.44.2
> git branch -D unlabeled-1.51.2
> git branch -D unlabeled-1.59.2
> git branch -D unlabeled-1.87.2
> git branch -D unlabeled-1.90.2
>
> You should not need any of the above; I don't see those being generated
> anymore with the cleanup script in place.

I see them - at least the script didn't give an error when it ran, but
said it had deleted them.

> git branch -D ecpg_big_bison
>
> I don't agree with removing this.  It is a legitimate part of the
> project history.  Yeah, it was a workaround, but we've had workarounds
> in place for other broken software.

Ok.  I was just going by Michael's email - I'll leave it in then.

> # Remove broken tags
> git tag -d SUPPORT
> git tag -d MANUAL_1_0
> git tag -d Release-1-6-0
>
> Also get rid of the "creation" tag, please.  Also, just for the record,
> these aren't really broken.  Maybe "Remove unwanted partial tags"?

Right.  I didn't really pay much attention to the comments, since it's
just a temporary thing.  "creation" added to the list.

> In addition to the above, we're going to want to clean up the
> Release_2_0_0 and Release_2_0 tags, but I'm not sure if there's
> a reasonable way to script those when the commit SHA1's aren't frozen
> yet.  I can give you timestamps for the commits they should point at,
> but I lack the git-fu to convert that into a git tag command.

Yeah, that's definitely easier to do after the fact.

> Similarly, we're going to want to repoint REL6_5, REL7_1, REL7_1_2
> tags at more appropriate places, but right now all I have for those is
> timestamps.  (I've also identified places to tag the other early
> releases, but those tags can certainly be added later.)

I like your downthread suggestion of removing them, and then putting
them back in manually later.  I'll add them all to the delete list.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Update comment for README.HOT
Tom Lane wrote:
> Bruce Momjian writes:
>> + This means that UPDATE, DELETE, and SELECT can trigger space
>> + reclamation, while INSERT ... VALUES cannot because it does not retrieve
>> + a row.
>
> I don't believe that's correct.  It might have happened to work that way
> for you in a particular test.  It's certainly not something I'd document
> as being intended long-term behavior.

Well, I would like to document something about this because I was
surprised that INSERT did not trigger a cleanup.  I realize we might
change the behavior, but then we would update the file too, hopefully.

How is the attached version using "often"?  I also clarified that it is
< 10% free.

I found this while doing tests for a new MVCC talk I will be delivering
at PG West.

--
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

  + It's impossible for everything to be true. +

Index: src/backend/access/heap/README.HOT
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/heap/README.HOT,v
retrieving revision 1.6
diff -c -c -r1.6 README.HOT
*** src/backend/access/heap/README.HOT	23 Apr 2010 23:21:44 -0000	1.6
--- src/backend/access/heap/README.HOT	17 Sep 2010 21:21:56 -0000
***************
*** 246,251 ****
--- 246,257 ----
  is arbitrarily capped at MaxHeapTuplesPerPage (the most tuples that could
  fit without HOT pruning).
  
+ Effectively, space reclamation happens during tuple retrieval when the
+ page is nearly full (<10% free) and a buffer cleanup lock can be
+ acquired.  This means that UPDATE, DELETE, and SELECT can trigger space
+ reclamation, but often not during INSERT ... VALUES because it does
+ not retrieve a row.
+ 
  VACUUM
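The behavior under debate can be observed directly with the pageinspect functions already used elsewhere in this thread. A sketch only: exactly which statements prune a given page is version-dependent, which is the point of the disagreement, so no particular outcome is asserted here:

```sql
-- Fill most of heap page 0, then create dead HOT tuples on it:
CREATE TABLE hot_demo (val int);
INSERT INTO hot_demo SELECT generate_series(1, 200);
UPDATE hot_demo SET val = val + 1;

-- A later scan of the nearly-full page may prune it; redirect, dead,
-- and unused line pointers then show up in heap_page_items:
SELECT lp, lp_flags, t_ctid
FROM heap_page_items(get_raw_page('hot_demo', 0))
WHERE lp_flags IN (0, 2, 3);   -- 0 = unused, 2 = redirect, 3 = dead
```

Repeating the query before and after a SELECT, UPDATE, or INSERT on the table is one way to test which statements actually trigger the cleanup Bruce describes.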
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
I wrote:
> In addition to the above, we're going to want to clean up the
> Release_2_0_0 and Release_2_0 tags, but I'm not sure if there's
> a reasonable way to script those when the commit SHA1's aren't frozen
> yet.  I can give you timestamps for the commits they should point at,
> but I lack the git-fu to convert that into a git tag command.

> Similarly, we're going to want to repoint REL6_5, REL7_1, REL7_1_2
> tags at more appropriate places, but right now all I have for those is
> timestamps.  (I've also identified places to tag the other early
> releases, but those tags can certainly be added later.)

Actually, the simplest way to handle this might be to just delete all
five of those tags during the conversion, and then I'll put them back
in the right places later when I add the other old-release tags.
That way we won't have any tags getting moved after the repository is
published.  (Or am I wrong about that being something to avoid?  But
in any case we want to gc the manufactured commits for Release_2_0_0
and Release_2_0.)

			regards, tom lane
Re: [HACKERS] Update comment for README.HOT
Thom Brown wrote:
> On 17 September 2010 20:52, Bruce Momjian wrote:
>> I would like to apply the attached patch to README.HOT to clarify when
>> single-page cleanup happens, e.g. not during INSERT.
>
> "... when the page is nearly full (<10%) ..."
>
> Shouldn't that be >90%?
>
> "... while INSERT ... VALUES cannot because it does not retrieve a row."
>
> Is this still true when it's used in conjunction with RETURNING?

I think RETURNING might cause a cleanup --- I have not tested that.

--
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

  + It's impossible for everything to be true. +
[HACKERS] compile/install of git
FYI, I have compiled/installed git 1.7.3.rc2 on my BSD/OS 4.3.1 machine
with the attached minor changes.

--
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

  + It's impossible for everything to be true. +

*** ./git-1.7.3.rc2/Makefile.orig	Fri Sep 17 14:01:17 2010
--- ./git-1.7.3.rc2/Makefile	Fri Sep 17 14:36:57 2010
***************
*** 1141,1146 ****
--- 1141,1152 ----
  endif
  endif
  
+ NO_NSEC = YesPlease
+ NO_PREAD = YesPlease
+ NEEDS_LIBGEN =
+ NO_TCLTK = YesPlease
+ NO_PYTHON = YesPlease
+ 
  -include config.mak.autogen
  -include config.mak
***************
*** 1244,1250 ****
  	PROGRAM_OBJS += daemon.o
  endif
  ifndef NO_OPENSSL
! 	OPENSSL_LIBSSL = -lssl
  ifdef OPENSSLDIR
  	BASIC_CFLAGS += -I$(OPENSSLDIR)/include
  	OPENSSL_LINK = -L$(OPENSSLDIR)/$(lib) $(CC_LD_DYNPATH)$(OPENSSLDIR)/$(lib)
--- 1250,1256 ----
  	PROGRAM_OBJS += daemon.o
  endif
  ifndef NO_OPENSSL
! 	OPENSSL_LIBSSL = -lssl -lcrypto
  ifdef OPENSSLDIR
  	BASIC_CFLAGS += -I$(OPENSSLDIR)/include
  	OPENSSL_LINK = -L$(OPENSSLDIR)/$(lib) $(CC_LD_DYNPATH)$(OPENSSLDIR)/$(lib)
***************
*** 1260,1266 ****
  	OPENSSL_LIBSSL =
  endif
  ifdef NEEDS_SSL_WITH_CRYPTO
! 	LIB_4_CRYPTO = $(OPENSSL_LINK) -lcrypto -lssl
  else
  	LIB_4_CRYPTO = $(OPENSSL_LINK) -lcrypto
  endif
--- 1266,1272 ----
  	OPENSSL_LIBSSL =
  endif
  ifdef NEEDS_SSL_WITH_CRYPTO
! 	LIB_4_CRYPTO = $(OPENSSL_LINK) -lssl -lcrypto
  else
  	LIB_4_CRYPTO = $(OPENSSL_LINK) -lcrypto
  endif
***************
*** 1274,1281 ****
  	EXTLIBS += $(ICONV_LINK) -liconv
  endif
  ifdef NEEDS_LIBGEN
! 	EXTLIBS += -lgen
  endif
  ifdef NEEDS_SOCKET
  	EXTLIBS += -lsocket
  endif
--- 1280,1289 ----
  	EXTLIBS += $(ICONV_LINK) -liconv
  endif
  ifdef NEEDS_LIBGEN
! #	EXTLIBS += -lgen
  endif
+ EXTLIBS += -lpoll
+ 
  ifdef NEEDS_SOCKET
  	EXTLIBS += -lsocket
  endif
***************
*** 1498,1504 ****
  ifndef V
  	QUIET_CC       = @echo '   ' CC $@;
  	QUIET_AR       = @echo '   ' AR $@;
! 	QUIET_LINK     = @echo '   ' LINK $@;
  	QUIET_BUILT_IN = @echo '   ' BUILTIN $@;
  	QUIET_GEN      = @echo '   ' GEN $@;
  	QUIET_LNCP     = @echo '   ' LN/CP $@;
--- 1506,1513 ----
  ifndef V
  	QUIET_CC       = @echo '   ' CC $@;
  	QUIET_AR       = @echo '   ' AR $@;
! #	QUIET_LINK     = @echo '   ' LINK $@;
! 	QUIET_LINK =
  	QUIET_BUILT_IN = @echo '   ' BUILTIN $@;
  	QUIET_GEN      = @echo '   ' GEN $@;
  	QUIET_LNCP     = @echo '   ' LN/CP $@;
*** ./git-1.7.3.rc2/git-compat-util.h.orig	Fri Sep 17 12:25:09 2010
--- ./git-1.7.3.rc2/git-compat-util.h	Fri Sep 17 12:35:23 2010
***************
*** 106,112 ****
  #include
  #include
  #include
! #include
  #if defined(__CYGWIN__)
  #undef _XOPEN_SOURCE
  #include
--- 106,119 ----
  #include
  #include
  #include
! 
! typedef u_int64_t uint64_t;
! typedef u_int32_t uint32_t;
! typedef u_int16_t uint16_t;
! typedef long int intptr_t;
! typedef unsigned long int uintptr_t;
! 
! //#include
  #if defined(__CYGWIN__)
  #undef _XOPEN_SOURCE
  #include
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
BTW, on the cleanup steps:

  # Remove bogus branches
  git branch -D unlabeled-1.44.2
  git branch -D unlabeled-1.51.2
  git branch -D unlabeled-1.59.2
  git branch -D unlabeled-1.87.2
  git branch -D unlabeled-1.90.2

You should not need any of the above; I don't see those being generated
anymore with the cleanup script in place.

  git branch -D ecpg_big_bison

I don't agree with removing this.  It is a legitimate part of the
project history.  Yeah, it was a workaround, but we've had workarounds
in place for other broken software.

  # Remove broken tags
  git tag -d SUPPORT
  git tag -d MANUAL_1_0
  git tag -d Release-1-6-0

Also get rid of the "creation" tag, please.  Also, just for the record,
these aren't really broken.  Maybe "Remove unwanted partial tags"?

In addition to the above, we're going to want to clean up the
Release_2_0_0 and Release_2_0 tags, but I'm not sure if there's
a reasonable way to script those when the commit SHA1's aren't frozen
yet.  I can give you timestamps for the commits they should point at,
but I lack the git-fu to convert that into a git tag command.

Similarly, we're going to want to repoint REL6_5, REL7_1, REL7_1_2
tags at more appropriate places, but right now all I have for those is
timestamps.  (I've also identified places to tag the other early
releases, but those tags can certainly be added later.)

			regards, tom lane
Re: [HACKERS] Progress indication prototype
On tor, 2010-09-16 at 19:14 -0400, Robert Haas wrote:
> I think that there should be a function which returns just this one
> piece of data and is not automatically called as part of select * from
> pg_stat_activity.  Then we could eventually decide to give backends a
> way to know if that function had been invoked on them and how
> recently.

Displaying this as part of pg_stat_activity is completely trivial: it's
just displaying the value of a float variable.

It seems you are advocating a completely different architecture, where
someone can find out on demand what the progress or status of another
session is, without that other session having known about that request
before it started its current command.  But that seems pretty outlandish
to me, and I would ask for more details on what you have in mind.
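Peter's point is that the prototype simply exposes a per-backend float as one more pg_stat_activity column, so reading it would look like any ordinary monitoring query. A sketch, using 9.0-era column names; the `progress` column is an assumption about the prototype's naming, not a released feature:

```sql
-- Show what each non-idle backend is doing and how far along it claims
-- to be ("progress" is the hypothetical column added by the prototype):
SELECT procpid, waiting, current_query, progress
FROM pg_stat_activity
WHERE current_query <> '<IDLE>';
```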
Re: [HACKERS] bg worker: general purpose requirements
Robert,

On 09/17/2010 05:52 PM, Robert Haas wrote:
> Technically, you could start an autonomous transaction from within an
> autonomous transaction, so I don't think there's a hard maximum of one
> per normal backend.  However, I agree that the expected case is to not
> have very many.

Thanks for pointing that out.  I somehow knew that was wrong..

> I guess it depends on what your goals are.

Agreed.

> If you're optimizing for ability to respond quickly to a sudden load,
> keeping idle backends will probably win even when the number of them
> you're keeping around is fairly high.  If you're optimizing for
> minimal overall resource consumption, though, you'll not be as happy
> about that.

What resources are we talking about here?  Are idle backends really that
resource hungry?  My feeling so far has been that idle processes are
relatively cheap (i.e. some 100 idle processes shouldn't hurt on a
modern server).

> What I'm struggling to understand is this: if there aren't any
> preforked workers around when the load hits, how much does it slow
> things down?

As the startup code is pretty much the same as for the current
avlauncher, the coordinator can only request one bgworker at a time.
This means the signal needs to reach the postmaster, which then forks a
bgworker process.  That new process starts up, connects to the requested
database and then sends an imessage to the coordinator to register.
Only after having received that registration can the coordinator request
another bgworker (note that this is an overall limitation, not a
per-database one).

I haven't measured the actual time it takes, but given the use case of a
connection pool, I so far thought it's obvious that this process takes
too long.  (It's exactly what apache pre-fork does, no?  Is anybody
concerned about the idle processes there?  Or do they consume much less
resources?)

> I would have thought that a few seconds to ramp up to speed after an
> extended idle period (5 minutes, say) would be acceptable for most of
> the applications you mention.

A few seconds?  That might be sufficient for autovacuum, but most
queries complete in less than one second.  So for parallel querying,
autonomous transactions and Postgres-R, I certainly don't think that a
few seconds are reasonable.  Especially considering the cost of idle
backends.

> Is the ramp-up time longer than that, or is even that much delay
> unacceptable for Postgres-R, or is there some other aspect to the
> problem I'm failing to grasp?  I can tell you have some experience
> tuning this so I'd like to try to understand where you're coming from.

I didn't ever compare to a max_spare_background_workers = 0
configuration, so I don't have any hard numbers, sorry.

> I think this is an interesting example, and worth some further
> thought.  I guess I don't really understand how Postgres-R uses these
> bgworkers.

The given example doesn't only apply to Postgres-R.  But with fewer
bgworkers in total, you are more likely to want to use them all for one
database, yes.

> Are you replicating one transaction at a time, or how does the data
> get sliced up?

Yes, one transaction at a time.  One transaction per backend (bgworker).
On a cluster of n nodes that only performs writing transactions, at an
average rate of m concurrent transactions per node, you ideally end up
having m normal backends and (n-1) * m bgworkers that concurrently apply
the remote transactions.

> I remember you mentioning sync/async/eager/other replication
> strategies previously - do you have a pointer to some good reading on
> that topic?

Postgres-R mainly is eager multi-master replication.  www.postgres-r.org
has some links; most up to date is my concept paper:
http://www.postgres-r.org/downloads/concept.pdf

> That seems like it would be useful, too.

Okay, will try to come up with something, soon(ish).

Thank you for your feedback and constructive criticism.

Regards

Markus Wanner
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 20:37, Magnus Hagander wrote:
> On Fri, Sep 17, 2010 at 20:32, Tom Lane wrote:
>> I wrote:
>>> Something in the rsync process thinks that ignoring subdirectories
>>> named "core" is a good idea.  I'm a bit surprised nobody ever noticed
>>> these were missing from anoncvs before ...
>>
>> That's because they aren't.  It's the -C switch in your rsync call
>> that's at fault.  (And this demonstrates why comparing against checkouts
>> from the same CVS repo isn't an end-to-end test :-()
>
> Yeah, I noticed that too.  Re-running now.

Ok, re-run off a correct rsync pushed.  How does it look now?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] ALTER TYPE extensions
On fre, 2010-09-17 at 18:15 +0900, KaiGai Kohei wrote: > * At the ATPrepAddColumn(), it seems to me someone added a check > to prevent adding a new column to typed table, as you try to > add in this patch. Good catch. Redundant checks removed. > * At the ATPrepAlterColumnType(), you enclosed an existing code > block by "if (tab->relkind == RELKIND_RELATION) { ... }", but > it is not indented to appropriate level. Yeah, just to keep the patch small. ;-) > * RENAME ATTRIBUTE ... TO ... > > Even if the composite type to be altered is in use, we can alter > the name of attribute. Is it intended? No. Added a check for it now. > BTW, is there any requirement from SQL standard about behavior > when we try to add/drop an attribute of composite type in use? > This patch always prohibit it, using find_typed_table_dependencies() > and find_composite_type_dependencies(). > However, it seems to me not difficult to alter columns of typed > tables subsequent with this ALTER TYPE, although it might be > not easy to alter definitions of embedded composite type already > in use. > Of course, it may be our future works. If so, it's good. The prohibition on altering types that are used in typed tables is actually from the SQL standard. But for now it's just because it's not implemented; I plan to work on extending that later. The restriction by find_composite_type_dependencies() was already there for altering tables, and I just kept it the same for now. New patch attached. diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml index e2fec32..ade300f 100644 --- a/doc/src/sgml/ref/alter_type.sgml +++ b/doc/src/sgml/ref/alter_type.sgml @@ -23,9 +23,17 @@ PostgreSQL documentation -ALTER TYPE name RENAME TO new_name +ALTER TYPE name action [, ... 
] ALTER TYPE name OWNER TO new_owner +ALTER TYPE name RENAME ATTRIBUTE attribute_name TO new_attribute_name +ALTER TYPE name RENAME TO new_name ALTER TYPE name SET SCHEMA new_schema + +where action is one of: + +ADD ATTRIBUTE attribute_name data_type +DROP ATTRIBUTE [ IF EXISTS ] attribute_name +ALTER ATTRIBUTE attribute_name [ SET DATA ] TYPE data_type @@ -34,6 +42,76 @@ ALTER TYPE name SET SCHEMA ALTER TYPE changes the definition of an existing type. + There are several subforms: + + + +ADD ATTRIBUTE + + + This form adds a new attribute to a composite type, using the same syntax as + . + + + + + +DROP ATTRIBUTE [ IF EXISTS ] + + + This form drops an attribute from a composite type. + If IF EXISTS is specified and the attribute + does not exist, no error is thrown. In this case a notice + is issued instead. + + + + + +SET DATA TYPE + + + This form changes the type of an attribute of a composite type. + + + + + +OWNER + + + This form changes the owner of the type. + + + + + +RENAME + + + This form changes the name of the type or the name of an + individual attribute of a composite type. + + + + + +SET SCHEMA + + + This form moves the type into another schema. + + + + + + + + The ADD ATTRIBUTE, DROP + ATTRIBUTE, and ALTER ATTRIBUTE actions + can be combined into a list of multiple alterations to apply in + parallel. For example, it is possible to add several attributes + and/or alter the type of several attributes in a single command. @@ -90,6 +168,34 @@ ALTER TYPE name SET SCHEMA + + attribute_name + + +The name of the attribute to add, alter, or drop. + + + + + + new_attribute_name + + +The new name of the attribute begin renamed. + + + + + + data_type + + +The data type of the attribute to add, or the new type of the +attribute to alter. 
+ + + + @@ -119,14 +225,21 @@ ALTER TYPE email OWNER TO joe; ALTER TYPE email SET SCHEMA customers; + + + To add a new attribute to a type: + +ALTER TYPE compfoo ADD ATTRIBUTE f3 int; + + Compatibility - There is no ALTER TYPE statement in the SQL - standard. + The variants to add and drop attributes are part of the SQL + standard; the other variants are PostgreSQL extensions. diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c index 788a4db..9da2690 100644 --- a/src/backend/commands/alter.c +++ b/src/backend/commands/alter.c @@ -89,6 +89,7 @@ ExecRenameStmt(RenameStmt *stmt) case OBJECT_VIEW: case OBJECT_INDEX: case OBJECT_COLUMN: + case OBJECT_ATTRIBUTE: case OBJECT_TRIGGER: { Oid relid; @@ -123,6 +124,7 @@ ExecRenameStmt(RenameStmt *stmt) break; } case OBJECT_COLUMN: + case OBJECT_ATTRIBUTE: renameatt(relid,
Re: [HACKERS] Update comment for README.HOT
Bruce Momjian writes:
> + This means that UPDATE, DELETE, and SELECT can trigger space
> + reclamation, while INSERT ... VALUES cannot because it does not retrieve
> + a row.

I don't believe that's correct.  It might have happened to work that way
for you in a particular test.  It's certainly not something I'd document
as being intended long-term behavior.

			regards, tom lane
Re: [HACKERS] Update comment for README.HOT
On 17 September 2010 20:52, Bruce Momjian wrote:
> I would like to apply the attached patch to README.HOT to clarify when
> single-page cleanup happens, e.g. not during INSERT.

"... when the page is nearly full (<10%) ..."

Shouldn't that be >90%?

"... while INSERT ... VALUES cannot because it does not retrieve a row."

Is this still true when it's used in conjunction with RETURNING?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
[HACKERS] Update comment for README.HOT
I would like to apply the attached patch to README.HOT to clarify when single-page cleanup happens, e.g. not during INSERT. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + Index: src/backend/access/heap/README.HOT === RCS file: /cvsroot/pgsql/src/backend/access/heap/README.HOT,v retrieving revision 1.6 diff -c -c -r1.6 README.HOT *** src/backend/access/heap/README.HOT 23 Apr 2010 23:21:44 - 1.6 --- src/backend/access/heap/README.HOT 17 Sep 2010 19:48:16 - *** *** 246,251 --- 246,257 is arbitrarily capped at MaxHeapTuplesPerPage (the most tuples that could fit without HOT pruning). + Effectively, space reclamation happens during tuple retrieval when the + page is nearly full (<10%) and a buffer cleanup lock can be acquired. + This means that UPDATE, DELETE, and SELECT can trigger space + reclamation, while INSERT ... VALUES cannot because it does not retrieve + a row. + VACUUM -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
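A minimal SQL sketch of the behavior the patch describes (the table and values are illustrative; whether pruning actually fires on a given page depends on how full it is and on acquiring the buffer cleanup lock):

```sql
CREATE TABLE hot_demo (id int PRIMARY KEY, val text);

INSERT INTO hot_demo VALUES (1, 'a');        -- plain INSERT: retrieves no row,
                                             -- so it never triggers pruning
UPDATE hot_demo SET val = 'b' WHERE id = 1;  -- leaves a dead HOT tuple behind

SELECT * FROM hot_demo;                      -- a later retrieval from a
                                             -- nearly-full page may perform
                                             -- single-page cleanup
```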
Re: [HACKERS] Configuring synchronous replication
Simon Riggs writes: > On Fri, 2010-09-17 at 21:20 +0900, Fujii Masao wrote: >> What synchronization level does each combination of sync_replication >> and sync_replication_service lead to? > > There are only 4 possible outcomes. There is no combination, so we don't > need a table like that above. > > The "service" specifies the highest request type available from that > specific standby. If someone requests a higher service than is currently > offered by this standby, they will either > a) get that service from another standby that does offer that level > b) automatically downgrade the sync rep mode to the highest available. I like the a) part, I can't say the same about the b) part. There's no reason to accept a COMMIT when the requested durability is known not to have been reached, unless the user has said so. > For example, if you request recv but there is only one standby and it > only offers async, then you get downgraded to async. If that's what you choose, fine, but with what I have in mind you get a net slowdown, as you're now hitting the timeout for each transaction, and I don't see how you can avoid that. Even if you set up the replication from the master, you can still mess it up the same way, right? Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
Simon Riggs writes: > On Fri, 2010-09-17 at 21:20 +0900, Fujii Masao wrote: >> According to what I heard, some people want to guarantee that all the >> transactions are *always* written in *all* the synchronous standbys. > > You don't need standby registration at all. You can do that with a > single parameter, already proposed: > > quorum_commit = N. I think you also need another parameter to control the behavior upon timeout. You received fewer than N votes, now what? Your current idea seems to be COMMIT, Aidan says ROLLBACK, and I say that should be a GUC settable at the transaction level. As far as registration goes, I see no harm in having the master maintain a list of known standby systems; it's just maintaining that list from the master that I don't understand the use case for. Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
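A sketch of how these knobs might look in postgresql.conf — quorum_commit is the parameter proposed upthread, while replication_timeout and replication_timeout_action are hypothetical names for the timeout and the timeout-behavior GUC being argued for here, not actual parameters:

```
quorum_commit = 2                      # proposed upthread: wait for acks
                                       # from N synchronous standbys
replication_timeout = 30s              # hypothetical: how long to wait
                                       # for the quorum before giving up
replication_timeout_action = rollback  # hypothetical: 'commit' or 'rollback'
                                       # on timeout, settable per transaction
```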
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
Magnus Hagander writes: > Will not doing the backbranches make it harder to backport patches? > Probably shouldn't, unless you're changing the very first line of the > file, right? The $PostgreSQL$ lines haven't been a backporting problem in the past, so I don't see why they'd be one now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 20:49, Tom Lane wrote: > I looked a bit more at your pggit_migrate stuff. I'm not terribly happy > with the proposed clean_keywords.pl script. I'd like it to reduce the > $PostgreSQL$ thingies to the full pathname of the file, rather than > try to remove all trace of them, eg > * $PostgreSQL: pgsql/src/port/unsetenv.c,v 1.12 2010/09/07 14:10:30 > momjian Exp $ > becomes > * src/port/unsetenv.c > > This would then be followed up by moving those pathname comments to > somewhere more sensible. I don't think that part can be managed with > a script like this, but leaving the data in place will make it easier > to do the moving. Some places, like the .sgml files, won't need any > additional changing to get to where I would like to be. > > Also, I'd be inclined to make these changes only in master, not in the > back branches. We don't for example run pg_indent against back branches. We discussed that before, and the consensus then was to do that, and backport it to *active* backbranches. Just not to remove the lines, but to replace them with empty ones so that line numbers would stay the same. But it's not too late to change that again... Shouldn't be too hard to change the script to change it the way you suggest, and it's *certainly* not hard to restrict it to just the master branch... Will not doing the backbranches make it harder to backport patches? Probably shouldn't, unless you're changing the very first line of the file, right? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On 09/17/2010 02:37 PM, Magnus Hagander wrote: On Fri, Sep 17, 2010 at 20:32, Tom Lane wrote: I wrote: Something in the rsync process thinks that ignoring subdirectories named "core" is a good idea. I'm a bit surprised nobody ever noticed these were missing from anoncvs before ... That's because they aren't. It's the -C switch in your rsync call that's at fault. (And this demonstrates why comparing against checkouts from the same CVS repo isn't an end-to-end test :-() Yeah, I noticed that too. Re-running now. FWIW, I got the command off the wiki's recommendations for how to use rsync. This is what I have been using for a very long time: rsync -avzH --delete --exclude-from=/home/cvsmirror/pg-exclude anoncvs.postgresql.org::pgsql-cvs /home/cvsmirror/pg The exclude file contains: /sup/ /CVSROOT/loginfo* /CVSROOT/commitinfo* /CVSROOT/config* /CVSROOT/passwd /CVSROOT/history cheers andrew
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
I looked a bit more at your pggit_migrate stuff. I'm not terribly happy with the proposed clean_keywords.pl script. I'd like it to reduce the $PostgreSQL$ thingies to the full pathname of the file, rather than try to remove all trace of them, eg *$PostgreSQL: pgsql/src/port/unsetenv.c,v 1.12 2010/09/07 14:10:30 momjian Exp $ becomes *src/port/unsetenv.c This would then be followed up by moving those pathname comments to somewhere more sensible. I don't think that part can be managed with a script like this, but leaving the data in place will make it easier to do the moving. Some places, like the .sgml files, won't need any additional changing to get to where I would like to be. Also, I'd be inclined to make these changes only in master, not in the back branches. We don't for example run pg_indent against back branches. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 20:32, Tom Lane wrote: > I wrote: >> Something in the rsync process thinks that ignoring subdirectories >> named "core" is a good idea. I'm a bit surprised nobody ever noticed >> these were missing from anoncvs before ... > > That's because they aren't. It's the -C switch in your rsync call > that's at fault. (And this demonstrates why comparing against checkouts > from the same CVS repo isn't an end-to-end test :-() Yeah, I noticed that too. Re-running now. FWIW, I got the command off the wiki's recommendations for how to use rsync. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
I wrote: > Something in the rsync process thinks that ignoring subdirectories > named "core" is a good idea. I'm a bit surprised nobody ever noticed > these were missing from anoncvs before ... That's because they aren't. It's the -C switch in your rsync call that's at fault. (And this demonstrates why comparing against checkouts from the same CVS repo isn't an end-to-end test :-() regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 20:24, Tom Lane wrote: > I wrote: >> Hmm. I didn't try rsync'ing from anoncvs ... I logged into the master >> and tar'd up the /cvsroot directory ;-). I wonder if there's something >> wrong with the anoncvs copy of that subdirectory? Will do the rsync >> and compare. > > Doh: > > Only in myrepo/pgsql/contrib/retep/uk/org/retep/xml: core > Only in myrepo/pgsql/src/interfaces/jdbc/org/postgresql: core > > Something in the rsync process thinks that ignoring subdirectories > named "core" is a good idea. I'm a bit surprised nobody ever noticed > these were missing from anoncvs before ... Oh FFS.. It's the adding of -C to the commandline to rsync. And it's not added when syncing to anoncvs - it's there on anoncvs. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
I wrote: > Hmm. I didn't try rsync'ing from anoncvs ... I logged into the master > and tar'd up the /cvsroot directory ;-). I wonder if there's something > wrong with the anoncvs copy of that subdirectory? Will do the rsync > and compare. Doh: Only in myrepo/pgsql/contrib/retep/uk/org/retep/xml: core Only in myrepo/pgsql/src/interfaces/jdbc/org/postgresql: core Something in the rsync process thinks that ignoring subdirectories named "core" is a good idea. I'm a bit surprised nobody ever noticed these were missing from anoncvs before ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
Magnus Hagander writes: > Just to confirm, you ran your patch against current cvs, right? So you > also got the "hunk succeeded at offset 1 line" a whole bunch of times? > Then it's not that that's broken. Right, the patch still applies fine, it's just off by a line or so in many places (probably because of the new REL9_0_0 tags). > The script I've run is on http://github.com/mhagander/pggit_migrate - > it's the "migrate_cvs.sh" script. The repository_fixups script is a > direct import of yours except I added a "set -e" at the start. Hmm. I didn't try rsync'ing from anoncvs ... I logged into the master and tar'd up the /cvsroot directory ;-). I wonder if there's something wrong with the anoncvs copy of that subdirectory? Will do the rsync and compare. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make CLUSTER VERBOSE more verbose
At 2010-09-16 21:22:54 +0900, itagaki.takah...@gmail.com wrote: > > Please read the thread. The patch is intended to be applied after > "sequence scan + sort for CLUSTER" patch. Sorry. I missed that. The patch looks fine, then. -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 19:20, Tom Lane wrote: > Magnus Hagander writes: >> On Fri, Sep 17, 2010 at 18:28, Robert Haas wrote: >>> Sorry, I must not have cleaned out the old state properly. > >> Turns out I did th esame thing from my box to the repo on git.postgresql.org. > >> So I've now wiped that repository and re-pushed mine. Can you give it >> another check? (from a fresh clone) > > I re-cloned but got the exact same state as before --- jdbc still wonky. That's weird. Just to confirm, you ran your patch against current cvs, right? So you also got the "hunk succeeded at offset 1 line" a whole bunch of times? Then it's not that that's broken. And I'm on cvs2git revision 5270. The script I've run is on http://github.com/mhagander/pggit_migrate - it's the "migrate_cvs.sh" script. The repository_fixups script is a direct import of yours except I added a "set -e" at the start. (the version pushed hasn't had the git gc step run, but that's the only one that differs) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system catalogs page
Robert Haas writes: > I was just noticing that, on this page here: > http://www.postgresql.org/docs/9.0/static/catalogs.html > ...all of the catalogs are in alphabetical order, except for > pg_db_role_setting, which is filed in the place in alphabetical order > where it would belong if the name were spelled pg_setting. > Is there any reason I shouldn't go fix this? Sounds like a mistake to me ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system catalogs page
On 17 September 2010 18:17, Robert Haas wrote: > I was just noticing that, on this page here: > > http://www.postgresql.org/docs/9.0/static/catalogs.html > > ...all of the catalogs are in alphabetical order, except for > pg_db_role_setting, which is filed in the place in alphabetical order > where it would belong if the name were spelled pg_setting. or pg_role_setting > Is there any reason I shouldn't go fix this? Looks wrong to me, so I'd say should be fixed. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
Magnus Hagander writes: > On Fri, Sep 17, 2010 at 18:28, Robert Haas wrote: >> Sorry, I must not have cleaned out the old state properly. > Turns out I did th esame thing from my box to the repo on git.postgresql.org. > So I've now wiped that repository and re-pushed mine. Can you give it > another check? (from a fresh clone) I re-cloned but got the exact same state as before --- jdbc still wonky. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] system catalogs page
I was just noticing that, on this page here: http://www.postgresql.org/docs/9.0/static/catalogs.html ...all of the catalogs are in alphabetical order, except for pg_db_role_setting, which is filed in the place in alphabetical order where it would belong if the name were spelled pg_setting. Is there any reason I shouldn't go fix this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 18:28, Robert Haas wrote: > On Fri, Sep 17, 2010 at 11:55 AM, Robert Haas wrote: >> On Fri, Sep 17, 2010 at 11:39 AM, Tom Lane wrote: >>> Robert Haas writes: Magnus posted an updated conversion this morning. >>> http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary >>> Evidently, however, he didn't do the same things you did, because there are DEFINITELY more than 9 manufactured commits in this one. >>> >>> Um ... I just did >>> git clone git://git.postgresql.org/git/postgresql-migration.git >>> and I only see nine. It's got some *other* problems though; compared >>> to a conversion I just finished locally, it's missing a whole lot of >>> history for some of the old jdbc files. >>> >>> Is there any possibility that "git clone" isn't very trustworthy? >>> It's a bit scary that we don't see identical views of this repository. >> >> *scratches head* >> >> I did a git-fetch into an existing copy of the old contents of that >> repository, rather than a fresh clone. Let me nuke it and start over. > > OK, the fresh clone does in fact show just 9 manufactured commits. > Sorry, I must not have cleaned out the old state properly. Turns out I did the same thing from my box to the repo on git.postgresql.org. So I've now wiped that repository and re-pushed mine. Can you give it another check? (from a fresh clone) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
Robert Haas writes: >> On Fri, Sep 17, 2010 at 11:39 AM, Tom Lane wrote: >>> and I only see nine. It's got some *other* problems though; compared >>> to a conversion I just finished locally, it's missing a whole lot of >>> history for some of the old jdbc files. > What's the problem with the old JDBC files? This is what I sent to Magnus off-list: There seems to be something wrong with this :-(. I pulled this down and compared the output of "git log --all --source --name-status" to what I got from a fresh conversion of my own. There seems to be a substantial loss of history around some of the jdbc files. See attached diffs from my log to yours. regards, tom lane *** revhist.git17 Fri Sep 17 10:44:50 2010 --- revhist.gitmha Fri Sep 17 11:24:03 2010 *** *** 236673,236679 From Donald Fraser. - M src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java M src/interfaces/jdbc/org/postgresql/fastpath/Fastpath.java M src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Connection.java --- 236673,236678 *** *** 249776,249786 jdbc/org/postgresql/test/jdbc2/ServerCursorTest.java M src/interfaces/jdbc/org/postgresql/Driver.java.in - M src/interfaces/jdbc/org/postgresql/core/BaseResultSet.java - M src/interfaces/jdbc/org/postgresql/core/BaseStatement.java - M src/interfaces/jdbc/org/postgresql/core/Field.java - M src/interfaces/jdbc/org/postgresql/core/PGStream.java - M src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java M src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1DatabaseMetaData.java M src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1ResultSet.java M src/interfaces/jdbc/org/postgresql/jdbc1/AbstractJdbc1Statement.java --- 249775,249780 *** *** 254896,254902 Per report from Hans Nather. 
- M src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java M src/interfaces/jdbc/org/postgresql/fastpath/Fastpath.java M src/interfaces/jdbc/org/postgresql/test/jdbc2/Jdbc2TestSuite.java A src/interfaces/jdbc/org/postgresql/test/jdbc2/NotifyTest.java --- 254890,254895 *** *** 256307,256321 D src/interfaces/jdbc/org/postgresql/PGNotification.java D src/interfaces/jdbc/org/postgresql/PGRefCursorResultSet.java D src/interfaces/jdbc/org/postgresql/PGStatement.java - D src/interfaces/jdbc/org/postgresql/core/BaseConnection.java - D src/interfaces/jdbc/org/postgresql/core/BaseResultSet.java - D src/interfaces/jdbc/org/postgresql/core/BaseStatement.java - D src/interfaces/jdbc/org/postgresql/core/Encoding.java - D src/interfaces/jdbc/org/postgresql/core/Field.java - D src/interfaces/jdbc/org/postgresql/core/Notification.java - D src/interfaces/jdbc/org/postgresql/core/PGStream.java - D src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java - D src/interfaces/jdbc/org/postgresql/core/StartupPacket.java D src/interfaces/jdbc/org/postgresql/errors.properties D src/interfaces/jdbc/org/postgresql/errors_de.properties D src/interfaces/jdbc/org/postgresql/errors_fr.properties --- 256300,256305 *** *** 261723,261729 M src/interfaces/jdbc/example/corba/StockServer.java M src/interfaces/jdbc/example/corba/stock.idl M src/interfaces/jdbc/example/corba/stock.sql - M src/interfaces/jdbc/org/postgresql/core/StartupPacket.java M src/interfaces/jdbc/org/postgresql/test/jdbc2/BlobTest.java M src/interfaces/jdbc/org/postgresql/test/jdbc2/ConnectionTest.java M src/interfaces/jdbc/org/postgresql/test/jdbc2/DatabaseMetaDataTest.java --- 261707,261712 *** *** 262635,262648 M src/interfaces/jdbc/org/postgresql/PGNotification.java M src/interfaces/jdbc/org/postgresql/PGRefCursorResultSet.java M src/interfaces/jdbc/org/postgresql/PGStatement.java - M src/interfaces/jdbc/org/postgresql/core/BaseConnection.java - M src/interfaces/jdbc/org/postgresql/core/BaseResultSet.java - M 
src/interfaces/jdbc/org/postgresql/core/BaseStatement.java - M src/interfaces/jdbc/org/postgresql/core/Encoding.java - M src/interfaces/jdbc/org/postgresql/core/Field.java - M src/interfaces/jdbc/org/postgresql/core/Notification.java - M src/interfaces/jdbc/org/postgresql/core/PGStream.java - M src/interfaces/jdbc/org/postgresql/core/QueryExecutor.java M src/interfaces/jdbc/org/postgresql/errors_de.properties M src/interfaces/jdbc/org/postgresql/fastpath/Fastpath.java M src/interfaces/jdbc/org/postgresql/fastpath/FastpathArg.java --- 262618,262623 *** *** 266110,266117 Patches from Oliver Jowett to fix CursorFetchTest, 7.4 now does not automatically delete cursors - M src/interfaces/jdbc/org/postgresql/core/BaseConnection.java - M src/interfaces/j
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 11:55 AM, Robert Haas wrote: > On Fri, Sep 17, 2010 at 11:39 AM, Tom Lane wrote: >> Robert Haas writes: >>> Magnus posted an updated conversion this morning. >> >>> http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary >> >>> Evidently, however, he didn't do the same things you did, because >>> there are DEFINITELY more than 9 manufactured commits in this one. >> >> Um ... I just did >> git clone git://git.postgresql.org/git/postgresql-migration.git >> and I only see nine. It's got some *other* problems though; compared >> to a conversion I just finished locally, it's missing a whole lot of >> history for some of the old jdbc files. >> >> Is there any possibility that "git clone" isn't very trustworthy? >> It's a bit scary that we don't see identical views of this repository. > > *scratches head* > > I did a git-fetch into an existing copy of the old contents of that > repository, rather than a fresh clone. Let me nuke it and start over. OK, the fresh clone does in fact show just 9 manufactured commits. Sorry, I must not have cleaned out the old state properly. What's the problem with the old JDBC files? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CVS to Git conversion: second try Monday
We will take another run at converting our CVS repository to Git on Monday (the 20th). Hopefully this won't interfere too much with the commitfest, since most work will still be on reviewing rather than actual committing at that point. As before, there will be a "quiet time" to let interested people capture final copies of the CVS repository before access is turned off. Please do not make any commits into CVS after 1400 UTC (10AM EDT, 7AM PDT) on Monday. ssh access to the master repository will be turned off at 1600 UTC or shortly after. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On 2010-09-17 10:09, Heikki Linnakangas wrote: I think it makes most sense to set sync vs. async in the master, and the level of synchronicity in the slave. Although I have sympathy for the argument that it's simpler if you configure it all from the master side as well. Just a comment as a sysadmin: it would be hugely beneficial if the master and slaves were all able to run from the "exact same" configuration file. This would remove any doubt about the configuration of the "complete cluster" when debugging. A slave would be able to just "copy" over the master's configuration, etc. I don't know if it is doable or has any big downsides. -- Jesper
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On 09/17/2010 11:39 AM, Tom Lane wrote: Is there any possibility that "git clone" isn't very trustworthy? It's a bit scary that we don't see identical views of this repository. I should have thought that very unlikely. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 16:09 +0300, Heikki Linnakangas wrote: > >> I don't expect any meaningful differences in terms of performance > >> between any of the discussed options. The big question right now is... > > > > This is the critical point. Politely, I would observe that *You* do not > > think there is a meaningful difference. *I* do, and evidence suggests > > that both Oracle and DRBD think so too. So we differ on what the "big > > question" is here. > > We must be talking about different things again. There's certainly big > differences in the different synchronization levels and configurations, > but I don't expect there to be big performance differences between > patches to implement those levels. Once we got rid of the polling loops, > I expect the network and disk latencies to dominate. So IIUC you seem to agree with * 4 levels of synchronous replication (specified on master) * transaction-controlled replication from the master * sending 3 LSN values back from standby Well, then that pretty much is my patch, except for the parameter UI. Did I misunderstand? We also agree that we need a standby to master protocol change; I used Zoltan's directly and I've had zero problems with it in testing. The only disagreement has been about * the need for standby registration (I understand "want") which seems to boil down to whether we wait for servers that *ought* to be there, but currently aren't. * whether to have wal writer active (I'm happy to add that later in this release, so we get the "recv" option also) * whether we have a parameter for quorum commit > 1 (happy to add later) Not sure if there is debate about whether quorum_commit = 1 is the default. * whether we provide replication_exceptions as core feature or as a plugin The only area of doubt is when we send replies, which you haven't thought about yet. So presumably you've no design-level objection to what I've proposed. 
Things we all seem to like are * different standbys can offer different sync levels * standby names * a set returning function which tells you current LSNs of all standbys * the rough idea of being able to specify a "service" and have that equate to a more complex config underneath the covers, without needing to have the application know the details - I think we need more details on that before we could say "we agree". So seems like a good days work. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Fri, Sep 17, 2010 at 11:39 AM, Tom Lane wrote: > Robert Haas writes: >> Magnus posted an updated conversion this morning. > >> http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary > >> Evidently, however, he didn't do the same things you did, because >> there are DEFINITELY more than 9 manufactured commits in this one. > > Um ... I just did > git clone git://git.postgresql.org/git/postgresql-migration.git > and I only see nine. It's got some *other* problems though; compared > to a conversion I just finished locally, it's missing a whole lot of > history for some of the old jdbc files. > > Is there any possibility that "git clone" isn't very trustworthy? > It's a bit scary that we don't see identical views of this repository. *scratches head* I did a git-fetch into an existing copy of the old contents of that repository, rather than a fresh clone. Let me nuke it and start over. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bg worker: general purpose requirements
On Fri, Sep 17, 2010 at 11:29 AM, Markus Wanner wrote: > autonomous transactions: max. one per normal backend (correct?), way fewer > should suffice in most cases, only control data to be passed around Technically, you could start an autonomous transaction from within an autonomous transaction, so I don't think there's a hard maximum of one per normal backend. However, I agree that the expected case is to not have very many. > All of the potential users of bgworkers benefit from a pre-connected > bgworker. Meaning having at least one spare bgworker around per database > could be beneficial, potentially more depending on how often spike loads > occur. As long as there are only few databases, it's easily possible to have > at least one spare process around per database, but with thousands of > databases, that might get prohibitively expensive (not sure where the > boundary between win vs loose is, though. Idle backends vs. connection > cost). I guess it depends on what your goals are. If you're optimizing for ability to respond quickly to a sudden load, keeping idle backends will probably win even when the number of them you're keeping around is fairly high. If you're optimizing for minimal overall resource consumption, though, you'll not be as happy about that. What I'm struggling to understand is this: if there aren't any preforked workers around when the load hits, how much does it slow things down? I would have thought that a few seconds to ramp up to speed after an extended idle period (5 minutes, say) would be acceptable for most of the applications you mention. Is the ramp-up time longer than that, or is even that much delay unacceptable for Postgres-R, or is there some other aspect to the problem I'm failing to grasp? I can tell you have some experience tuning this so I'd like to try to understand where you're coming from. > However, I feel like this gives less control over how the bgworkers are > used. 
For example, I'd prefer to be able to prevent the system from > allocating all bgworkers to a single database at once. I think this is an interesting example, and worth some further thought. I guess I don't really understand how Postgres-R uses these bgworkers. Are you replicating one transaction at a time, or how does the data get sliced up? I remember you mentioning sync/async/eager/other replication strategies previously - do you have a pointer to some good reading on that topic? > Hope that sheds some more light on how bgworkers could be useful. Maybe I > just need to describe the job handling features of the coordinator better as > well? (Simon also requested better documentation...) That seems like it would be useful, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 11:30 -0400, Aidan Van Dyk wrote: > * Robert Haas [100917 11:24]: > > On Fri, Sep 17, 2010 at 11:22 AM, Simon Riggs wrote: > > > On Fri, 2010-09-17 at 09:36 -0400, Aidan Van Dyk wrote: > > > > > >> I want to have them configured in a fsync WAL/style sync rep, I want to > > >> make sure that if the master comes up first after I get power back, it's > > >> not going to be claiming transactions are committed while the slave > > >> (which happens to have 4x the disks because it keeps PITR backups for a > > >> period too) it still chugging away on SCSI probes yet, not gotten to > > >> having PostgreSQL up yet... > > > > > > Nobody has mentioned the ability to persist the not-committed state > > > across a crash before, and I think it's an important discussion point. > > > > Eh? I think all Aidan is asking for is the ability to have a mode > > where sync rep is really always sync, or nothing commits. Rather than > > timing out and continuing merrily on its way... > > Right, I'm not asking for a "new" mode. I'm just hope that there will > be a way to guarantee my "sync rep" is actually replicating. Having it > "not replicate" simply because no slave has (yet) connected means I have > to dance jigs around pg_hba.conf so that it won't allow non-replication > connections until I've manual verified that the replication slave > is connected... I agree that aspect is a problem. One solution, to me, would be to have a directive included in the pg_hba.conf that says entries below it are only allowed if it passes the test. So your hba file looks like this:

    local   postgres       postgres
    host    replication    ...
    need    replication
    host    any            any

So the "need" test is an extra option in the first column. We might want additional "need" tests before we allow other rules also. Text following the "need" verb will be additional info for that test, sufficient to allow some kind of execution on the backend.
I definitely don't like the idea that anyone who commits will just sit there waiting until the standby comes up. That just sounds like an insane way of doing it. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
Robert Haas writes: > Magnus posted an updated conversion this morning. > http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary > Evidently, however, he didn't do the same things you did, because > there are DEFINITELY more than 9 manufactured commits in this one. Um ... I just did git clone git://git.postgresql.org/git/postgresql-migration.git and I only see nine. It's got some *other* problems though; compared to a conversion I just finished locally, it's missing a whole lot of history for some of the old jdbc files. Is there any possibility that "git clone" isn't very trustworthy? It's a bit scary that we don't see identical views of this repository. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
* Robert Haas [100917 11:24]: > On Fri, Sep 17, 2010 at 11:22 AM, Simon Riggs wrote: > > On Fri, 2010-09-17 at 09:36 -0400, Aidan Van Dyk wrote: > > > >> I want to have them configured in a fsync WAL/style sync rep, I want to >> make sure that if the master comes up first after I get power back, it's >> not going to be claiming transactions are committed while the slave >> (which happens to have 4x the disks because it keeps PITR backups for a >> period too) it still chugging away on SCSI probes yet, not gotten to >> having PostgreSQL up yet... > > > > Nobody has mentioned the ability to persist the not-committed state > > across a crash before, and I think it's an important discussion point. > > Eh? I think all Aidan is asking for is the ability to have a mode > where sync rep is really always sync, or nothing commits. Rather than > timing out and continuing merrily on its way... Right, I'm not asking for a "new" mode. I just hope that there will be a way to guarantee my "sync rep" is actually replicating. Having it "not replicate" simply because no slave has (yet) connected means I have to dance jigs around pg_hba.conf so that it won't allow non-replication connections until I've manually verified that the replication slave is connected... a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] bg worker: general purpose requirements
Hi, On 09/16/2010 07:47 PM, Robert Haas wrote: It would be nice if there were a way to create a general facility here that we could then build various applications on, but I'm not sure whether that's the case. We had some back-and-forth about what is best for replication vs. what is best for vacuum vs. what is best for parallel query. If we could somehow conceive of a system that could serve all of those needs without introducing any more configuration complexity than what we have now, that would of course be very interesting. Let's think about this again from a little distance. We have the existing autovacuum and the Postgres-R project. Then there are the potential features 'parallel querying' and 'autonomous transactions' that could in principle benefit from the bgworker infrastructure. For all of those, one could head for a multi-threaded, a multi-process or an async, event based approach. Multi-threading seems to be out of the question for Postgres. We don't have much of an async event framework anywhere, so at least for parallel querying that seems out of the question as well. Only the 'autonomous transactions' feature seems simple enough to be doable within a single process. That approach would still miss the isolation that a separate process features (not sure that's required, but 'autonomous' sounds like it could be a good thing to have). So assume we use the multi-process approach provided by bgworkers for both potential features. What are the requirements?

autovacuum: only very few jobs at a time, not very resource intensive, not passing around lots of data

Postgres-R: lots of concurrent jobs, easily more than normal backends, depending on the number of nodes in the cluster and read/write ratio, lots of data to be passed around

parallel querying: a couple dozen concurrent jobs (by number of CPUs or spindles available?), more doesn't help, lots of data to be passed around

autonomous transactions: max.
one per normal backend (correct?), way fewer should suffice in most cases, only control data to be passed around So, for both potential features as well as for autovacuum, a ratio of 1:10 (or even less) for max_bgworkers:max_connections would suffice. Postgres-R clearly seems to be the outlier here. It needs special configuration anyway, so I'd have no problem with defaults that target the other use cases. All of the potential users of bgworkers benefit from a pre-connected bgworker. Meaning having at least one spare bgworker around per database could be beneficial, potentially more depending on how often spike loads occur. As long as there are only a few databases, it's easily possible to have at least one spare process around per database, but with thousands of databases, that might get prohibitively expensive (not sure where the boundary between win vs lose is, though. Idle backends vs. connection cost). Nonetheless, bgworkers would make the above features easier to implement, as they provide the controlled background worker process infrastructure, including job handling (and even queuing) in the coordinator process. Having spare workers available is not a prerequisite for using bgworkers, it's just an optimization. Autovacuum could possibly benefit from bgworkers by enabling a finer-grained choice for what database and table to vacuum when. I didn't look too much into that, though. Regarding the additional configuration overhead of the bgworkers patch: max_autovacuum_workers gets turned into max_background_workers, so the only additional GUCs currently are: min_spare_background_workers and max_spare_background_workers (sorry, I thought I named them idle workers, looks like I've gone with spare workers for the GUCs). Those are used to control and limit (in both directions) the number of spare workers (per database). It's the simplest possible variant I could think of. But I'm open to other mechanisms, especially ones that require less configuration.
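As a toy illustration (plain Python, not PostgreSQL source; the function name and the exact policy are assumptions based on the description above), the per-database spare-worker policy that the two GUCs imply might look like this:

```python
def rebalance(idle, min_spare, max_spare, total_workers, max_background_workers):
    """Return (to_fork, to_terminate) so that one database's idle-worker
    count moves into [min_spare, max_spare] without exceeding the global cap."""
    if idle < min_spare:
        # Pre-fork spares, but never beyond max_background_workers overall.
        headroom = max_background_workers - total_workers
        return min(min_spare - idle, max(headroom, 0)), 0
    if idle > max_spare:
        # Reap the excess so other databases can use the worker slots.
        return 0, idle - max_spare
    return 0, 0
```

A database with no spares gets workers pre-forked up to the global cap, while one hoarding spares gives them back, which is the "limit in both directions" behavior described above.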
Simply keeping spare workers around for a given timeout *could* be a replacement and would save us one GUC. However, I feel like this gives less control over how the bgworkers are used. For example, I'd prefer to be able to prevent the system from allocating all bgworkers to a single database at once. And as mentioned above, it also makes sense to pre-fork some bgworkers in advance, if there are still enough available. The timeout approach doesn't take care of that, but assumes that the past is a good indicator of use for the future. Hope that sheds some more light on how bgworkers could be useful. Maybe I just need to describe the job handling features of the coordinator better as well? (Simon also requested better documentation...) Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On Fri, Sep 17, 2010 at 11:22 AM, Simon Riggs wrote: > On Fri, 2010-09-17 at 09:36 -0400, Aidan Van Dyk wrote: > >> I want to have them configured in a fsync WAL/style sync rep, I want to >> make sure that if the master comes up first after I get power back, it's >> not going to be claiming transactions are committed while the slave >> (which happens to have 4x the disks because it keeps PITR backups for a >> period too) it still chugging away on SCSI probes yet, not gotten to >> having PostgreSQL up yet... > > Nobody has mentioned the ability to persist the not-committed state > across a crash before, and I think it's an important discussion point. Eh? I think all Aidan is asking for is the ability to have a mode where sync rep is really always sync, or nothing commits. Rather than timing out and continuing merrily on its way... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 09:36 -0400, Aidan Van Dyk wrote: > I want to have them configured in a fsync WAL/style sync rep, I want to > make sure that if the master comes up first after I get power back, it's > not going to be claiming transactions are committed while the slave > (which happens to have 4x the disks because it keeps PITR backups for a > period too) it still chugging away on SCSI probes yet, not gotten to > having PostgreSQL up yet... Nobody has mentioned the ability to persist the not-committed state across a crash before, and I think it's an important discussion point. We already have it: it's called "two phase commit" (2PC). If you run 2PC on 3 servers and one goes down, you can just commit the in-flight transactions and continue. But it doesn't work on hot standby. It could: if we want that, we could prepare the transaction on the master and not allow commit until we get positive confirmation from the standby. All of the machinery is there. I'm not sure if that's a 5th sync rep mode, or whether that idea is actually good enough to replace all the ideas we've had up until now. I would say probably not, but we should think about this. A slightly modified idea would be to avoid writing the transaction prepare file as a separate file, just write the WAL for the prepare. We then remember the LSN of the prepare so we can re-access the WAL copy of it by re-reading the WAL files on master. Make sure we don't get rid of WAL that refers to waiting transactions. That would then give us the option to commit or abort depending upon whether we receive a reply within the timeout. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
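The prepare-then-wait-then-commit-or-abort flow sketched in that mail can be modeled roughly as follows (plain Python, all names hypothetical; the real machinery would live in PostgreSQL's 2PC and walsender code):

```python
import queue

def commit_with_standby_confirmation(write_prepare, standby_acks, commit, abort, timeout):
    """Prepare the transaction in WAL, then commit only once the standby
    confirms; abort the in-flight transaction if no reply arrives in time."""
    lsn = write_prepare()                   # write the PREPARE record, remember its LSN
    try:
        standby_acks.get(timeout=timeout)   # block for the standby's confirmation
    except queue.Empty:
        abort(lsn)                          # timeout: roll back the prepared transaction
        return False
    commit(lsn)                             # confirmed: complete the commit
    return True
```

This shows why remembering the prepare's LSN matters: both outcomes (commit or abort after timeout) need to refer back to the prepared record.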
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Tue, Sep 14, 2010 at 10:19 AM, Tom Lane wrote: > Attached is an updated repository.fixups script that inserts dead > revisions in every case where a new file was back-patched into an > existing branch. With that, we are down to a total of nine manufactured > commits, to wit: > [details] Magnus posted an updated conversion this morning. http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=summary Evidently, however, he didn't do the same things you did, because there are DEFINITELY more than 9 manufactured commits in this one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Serializable Snapshot Isolation
Tom Lane wrote: > That assumption is absolutely, totally not going to fly. Understood; I'm already working on it based on Heikki's input. >> This needs to work when the xid of a transaction is found in the >> MVCC data of a tuple for any overlapping serializable transaction >> -- even if that transaction has completed and its connection has >> been closed. It didn't look to me like >> SubTransGetTopmostTransaction() would work after the transaction >> was gone. > > Yes, it should work. If it doesn't, you are failing to manage the > TransactionXmin horizon correctly. So far I haven't wanted to mess with the global xmin values for fear of the possible impact on other transactions. It actually hasn't been that hard to maintain a SerializableGlobalXmin value, which is more efficient than the existing ones for predicate lock cleanup purposes. That still isn't exactly what I need to modify cleanup of the subtransaction information, though. Once I've got my head around the subtrans.c code, I think I'll need to maintain a minimum that includes the xids for serializable transactions which *overlap* SerializableGlobalXmin. That doesn't seem very hard to do; I just haven't needed it until now. Then I'll modify the subtransaction cleanup to only remove entries before the earlier of the global xmin of all transactions and the xmin of serializable transactions which overlap active serializable transactions. Does all that sound reasonable? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
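To make the proposed cleanup rule concrete, here is a small model (plain Python with hypothetical structures, not the actual patch) of "truncate subtransaction info only before the earlier of the global xmin and the xmin of serializable transactions which overlap an active serializable transaction":

```python
def overlaps(a, b):
    """a, b are (start, end) xid spans; end=None means still running."""
    (a_start, a_end), (b_start, b_end) = a, b
    return ((a_end is None or b_start < a_end) and
            (b_end is None or a_start < b_end))

def subtrans_truncate_horizon(global_xmin, serializable_xacts):
    """serializable_xacts: list of {'span': (start, end), 'xmin': xid}.
    Entries for any serializable transaction -- even a committed one --
    that overlaps a still-active serializable transaction must be kept,
    so its xid-to-parent lookups keep working after the backend is gone."""
    active = [x for x in serializable_xacts if x['span'][1] is None]
    keep = [x['xmin'] for x in serializable_xacts
            if any(overlaps(x['span'], a['span']) for a in active)]
    return min([global_xmin] + keep)
```

With no active serializable transactions the horizon falls back to the plain global xmin, matching the existing behavior.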
Re: [HACKERS] Serializable Snapshot Isolation
"Kevin Grittner" writes: > Heikki Linnakangas wrote: >> That sounds like it can eat through your shared memory very quickly >> if you have a lot of subtransactions. > Hmmm I've never explicitly used subtransactions, so I don't tend > to think of them routinely going too deep. And the struct is pretty > small. That assumption is absolutely, totally not going to fly. >> Why not use SubTransGetTopmostTransaction() ? > This needs to work when the xid of a transaction is found in the MVCC > data of a tuple for any overlapping serializable transaction -- even > if that transaction has completed and its connection has been > closed. It didn't look to me like SubTransGetTopmostTransaction() > would work after the transaction was gone. Yes, it should work. If it doesn't, you are failing to manage the TransactionXmin horizon correctly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
* Fujii Masao [100917 07:57]: > Synchronous replication is basically used to reduce the > downtime, and "wait forever" option opposes that. Hm... I'm not sure I'd agree with that. I'd rather have some downtime, and my data available, than have less downtime but find that I'm missing valuable data that was committed, but happened to not be replicated because no slave was available "yet". Sync rep is about "data availability", "data recoverability", *and* "downtime". The three are definitely related, but each use has its own tradeoffs. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] Configuring synchronous replication
* Robert Haas [100917 07:44]: > On Fri, Sep 17, 2010 at 7:31 AM, Simon Riggs wrote: > > The only thing standby registration allows you to do is know whether > > there was supposed to be a standby there, but yet it isn't there now. I > > don't see that point as being important because it seems strange to me > > to want to wait for a standby that ought to be there, but isn't anymore. > > What happens if it never comes back? Manual intervention required. > > The absence of registration in my patch makes some things easier and > > some things harder. For example, you can add a new standby without > > editing the config on the master. > > That's actually one of the reasons why I like the idea of > registration. It seems rather scary to add a new standby without > editing the config on the master. Actually, adding a new fully-async > slave without touching the master seems reasonable, but adding a new > sync slave without touching the master gives me the willies. The > behavior of the system could change quite sharply when you do this, > and it might not be obvious what has happened. (Imagine DBA #1 makes > the change and DBA #2 is then trying to figure out what's happened - > he checks the configs of all the machines he knows about and finds > them all unchanged... head-scratching ensues.) So, those both give me the willies too... I've had a rack lose all power. Now, let's say I've got two servers (plus trays of disks for each) in the same rack. Ya, I know, I should move them to separate racks, preferably in separate buildings on the same campus, but realistically... I want to have them configured in a fsync WAL/style sync rep, I want to make sure that if the master comes up first after I get power back, it's not going to be claiming transactions are committed while the slave (which happens to have 4x the disks because it keeps PITR backups for a period too) is still chugging away on SCSI probes and hasn't yet gotten PostgreSQL up...
And I want to make sure the dev box that another slave setup was being tested on, which is running in some test area by some other DBA, but not in the same rack, *can't* through some mis-configuration make my master think that its production slave has properly fsync'ed the replicated WAL. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] Configuring synchronous replication
On 17/09/10 15:56, Simon Riggs wrote: On Fri, 2010-09-17 at 13:41 +0300, Heikki Linnakangas wrote: On 17/09/10 12:49, Simon Riggs wrote: Without performance tests to demonstrate "why", these do sound hard to understand. But we should note that DRBD offers recv ("B") and fsync ("C") as separate options. And Oracle implements all 3 of recv, fsync and apply. Neither of them describe those options so simply and easily as the way we are proposing with a 4 valued enum (with async as the fourth option). If we have only one option for sync_rep = 'on' which of recv | fsync | apply would it implement? You don't mention that. Which do you choose? You would choose between recv, fsync and apply in the slave, with a GUC. So you would have both registration on the master and parameter settings on the standby? I doubt you mean that, so possibly need more explanation there for me to understand what you mean and also why you would do that. Yes, that's what I meant. No-one else seems to think that's a good idea :-). I don't expect any meaningful differences in terms of performance between any of the discussed options. The big question right now is... This is the critical point. Politely, I would observe that *You* do not think there is a meaningful difference. *I* do, and evidence suggests that both Oracle and DRBD think so too. So we differ on what the "big question" is here. We must be talking about different things again. There's certainly big differences in the different synchronization levels and configurations, but I don't expect there to be big performance differences between patches to implement those levels. Once we got rid of the polling loops, I expect the network and disk latencies to dominate. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 21:43 +0900, Fujii Masao wrote: > On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas > wrote: > > * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers > > can be seen as important special cases of this. > > I think that we should skip quorum commit at the first phase > because the design seems to be still poorly-thought-out. Agreed > I'm concerned about the case where the faster synchronous standby > goes down and the lagged synchronous one remains when n=1. In this > case, some transactions marked as committed in a client might not > be replicated to the remaining synchronous standby yet. What if > the master goes down at this point? How can we determine whether > promoting the remaining standby to the master causes data loss? In that config if the faster sync standby goes down then your application performance goes down dramatically. That would be fragile. So you would set up like this:

    master   - requests are > async
    standby1 - fast - so use recv | fsync | apply
    standby2 - async

So if standby1 goes down we don't wait for standby2, but we do continue to stream to it. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 13:41 +0300, Heikki Linnakangas wrote: > On 17/09/10 12:49, Simon Riggs wrote: > > Fujii has long talked about 4 levels of service also. Why change? I had > > thought that part was pretty much agreed between all of us. > > Now you lost me. I agree that we need 4 levels of service (at least > ultimately, not necessarily in the first phase). OK, good. > > Without performance tests to demonstrate "why", these do sound hard to > > understand. But we should note that DRBD offers recv ("B") and fsync > > ("C") as separate options. And Oracle implements all 3 of recv, fsync > > and apply. Neither of them describe those options so simply and easily > > as the way we are proposing with a 4 valued enum (with async as the > > fourth option). > > > > If we have only one option for sync_rep = 'on' which of recv | fsync | > > apply would it implement? You don't mention that. Which do you choose? > > You would choose between recv, fsync and apply in the slave, with a GUC. So you would have both registration on the master and parameter settings on the standby? I doubt you mean that, so possibly need more explanation there for me to understand what you mean and also why you would do that. > > I no longer seek to persuade by words alone. The existence of my patch > > means that I think that only measurements and tests will show why I have > > been saying these things. We need performance tests. > > I don't expect any meaningful differences in terms of performance > between any of the discussed options. The big question right now is... This is the critical point. Politely, I would observe that *You* do not think there is a meaningful difference. *I* do, and evidence suggests that both Oracle and DRBD think so too. So we differ on what the "big question" is here. It's sounding to me that if we don't know these things, then we're quite a long way from committing something. This is basic research. > what > features we provide and how they're configured. 
Performance will depend > primarily on the mode you use, and secondarily on the implementation of > the mode. It would be completely premature to do performance testing yet > IMHO. If a patch is "ready" then we should be able to performance test it *before* we commit it. From what you say it sounds like Fujii's patch might yet require substantial tuning, so it might even be the case that my patch is closer in terms of readiness to commit. Whatever the case, we have two patches and I can't see any benefit in avoiding performance tests. > >> Putting all of that together. I think Fujii-san's standby.conf is pretty > >> close. > > > >> What it needs is the additional GUC for transaction-level control. > > > > The difference between the patches is not a simple matter of a GUC. > > > > My proposal allows a single standby to provide efficient replies to > > multiple requested durability levels all at the same time. With > > efficient use of network resources. ISTM that because the other patch > > cannot provide that you'd like to persuade us that we don't need that, > > ever. You won't sell me on that point, cos I can see lots of uses for > > it. > > Simon, how the replies are sent is an implementation detail I haven't > given much thought yet. It seems clear we've thought about different details around these topics. Now I understand your work on latches, I see it is an important contribution and I very much respect that. IMHO, each of us has seen something important that the other has not. > The reason we delved into that discussion > earlier was that you seemed to contradict yourself with the claims that > you don't need to send more than one reply per transaction, and that the > standby doesn't need to know the synchronization level. Other than that > the curiosity about that contradiction, it doesn't seem like a very > interesting detail to me right now. It's not a question that drives the > rest of the design, but the other way round. There was no contradiction. 
You just didn't understand how it could be possible, so dismissed it. It's a detail, yes. Some are critical, some are not. (e.g. latches.) My view is that it is critical and drives the design. So I don't agree with you on "the other way around". > But FWIW, something like your proposal of sending 3 XLogRecPtrs in each > reply seems like a good approach. I'm not sure about using walwriter. I > can see that it helps with getting the 'recv' and 'replay' > acknowledgments out faster, but > I still have the scars from starting > bgwriter during recovery. I am happy to apologise for those problems. I was concentrating on HS at the time, not on that aspect. You sorted out those problems for me and I thank you for that. With that in mind, I will remove the aspect of my patch that relate to starting wal writer. Small amount of code only. That means we will effectively disable recv mode for now, but I definitely want to be able to put it back later. -- Simon Riggs
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 20:56 +0900, Fujii Masao wrote: > On Fri, Sep 17, 2010 at 7:41 PM, Heikki Linnakangas > wrote: > >> The question is do we want standby registration on master and if so, > >> why? > > > > Well, aside from how to configure synchronous replication, standby > > registration would help with retaining the right amount of WAL in the > > master. wal_keep_segments doesn't guarantee that enough is retained, and > > OTOH when all standbys are connected you retain much more than might be > > required. > > Yep. Setting wal_keep_segments is difficult, but it's not a tunable. The sysadmin needs to tell us what is the maximum number of files she'd like to keep. Otherwise we may fill up a disk, use space intended for use by another app, etc. The server cannot determine what limits the sysadmin may wish to impose. The only sane default is 0, because "store everything, forever" makes no sense. Similarly, if we register a server, it goes down, and we forget to deregister it, then we will attempt to store everything forever, and our system will go down. The bigger problem is base backups, not server restarts. We don't know how to get that right because we don't register base backups automatically. If we did dynamically alter the number of WALs we store then we'd potentially screw up new base backups. Server registration won't help with that at all, so you'd need to add a base backup registration scheme as well. But even if you had that, you'd still need a "max" setting defined by the sysadmin. So the only sane thing to do is to set wal_keep_segments as high as possible. And doing that doesn't need server reg. > And standby registration is required when we support "wait forever when > synchronous standby isn't connected at the moment" option that Heikki > explained upthread. Though I don't think that standby registration is > required in the first phase since "wait forever" option is not used in > basic use case.
Synchronous replication is basically used to reduce the > downtime, and "wait forever" option opposes that. Agreed, but I'd say "if" we support that. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
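For scale, a back-of-the-envelope calculation (WAL segments are 16 MB each by default) shows why the sysadmin has to cap wal_keep_segments at what the disk can actually hold:

```python
SEGMENT_BYTES = 16 * 1024 * 1024   # default WAL segment size

def wal_retention_bytes(wal_keep_segments):
    """Upper bound on disk space pinned in pg_xlog by wal_keep_segments."""
    return wal_keep_segments * SEGMENT_BYTES

# e.g. 64 segments pin up to 1 GiB; 640 segments pin up to 10 GiB
```

Any scheme that retains WAL "until every registered standby has it" removes this bound entirely, which is the failure mode described above.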
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 21:20 +0900, Fujii Masao wrote: > On Fri, Sep 17, 2010 at 8:31 PM, Simon Riggs wrote: > > The only thing standby registration allows you to do is know whether > > there was supposed to be a standby there, but yet it isn't there now. I > > don't see that point as being important because it seems strange to me > > to want to wait for a standby that ought to be there, but isn't anymore. > > According to what I heard, some people want to guarantee that all the > transactions are *always* written in *all* the synchronous standbys. > IOW, they want to keep the transaction waiting until it has been written > in all the synchronous standbys. Standby registration is required to > support such a use case. Without the registration, the master cannot > determine whether the transaction has been written in all the synchronous > standbys. You don't need standby registration at all. You can do that with a single parameter, already proposed: quorum_commit = N. But most people said they didn't want it. If they do we can put it back later. I don't think we're getting anywhere here. I just don't see any *need* to have it. Some people might *want* to set things up that way, and if that's true, that's enough for me to agree with them. The trouble is, I know some people have said they *want* to set it in the standby and we definitely *need* to set it somewhere. After this discussion, I think "both" is easily done and quite cool. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On Fri, Sep 17, 2010 at 8:43 AM, Fujii Masao wrote: > On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas > wrote: >> * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers >> can be seen as important special cases of this. > > I think that we should skip quorum commit at the first phase > because the design seems to be still poorly-thought-out. > > I'm concerned about the case where the faster synchronous standby > goes down and the lagged synchronous one remains when n=1. In this > case, some transactions marked as committed in a client might not > be replicated to the remaining synchronous standby yet. What if > the master goes down at this point? How can we determine whether > promoting the remaining standby to the master causes data loss? Yep. That issue has been raised before, and I think it's quite valid. That's not to say the feature isn't valid, but I think trying to include it in the first commit is going to lead to endless wrangling about design. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Configuring synchronous replication
On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas wrote: > * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers > can be seen as important special cases of this. I think that we should skip quorum commit at the first phase because the design seems to be still poorly-thought-out. I'm concerned about the case where the faster synchronous standby goes down and the lagged synchronous one remains when n=1. In this case, some transactions marked as committed in a client might not be replicated to the remaining synchronous standby yet. What if the master goes down at this point? How can we determine whether promoting the remaining standby to the master causes data loss? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
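[Editor's note: Fujii's n=1 failure mode can be made concrete with a toy model. This is illustrative only, not code from any patch; `acked_through` is an invented name.]

```python
# Toy model of the n=1 quorum concern described above (illustrative
# only). With quorum = 1 and two synchronous standbys, the master
# acknowledges a commit as soon as the *faster* standby confirms it,
# so the lagged standby can be missing acknowledged commits.

def acked_through(standby_lsns, quorum):
    """Highest LSN for which at least `quorum` standbys have confirmed."""
    return sorted(standby_lsns, reverse=True)[quorum - 1]

# The faster standby has received up to (toy) LSN 120, the lagged one 80.
fast, lagged = 120, 80

commit_ack_lsn = acked_through([fast, lagged], quorum=1)   # 120

# If the master and the fast standby are both lost, promoting the
# lagged standby silently loses every commit acknowledged above 80:
lost = commit_ack_lsn - lagged
```

With quorum=2 ("all servers"), `acked_through` would return 80 and nothing acknowledged could be lost by promoting either standby, which is exactly why the two cases behave so differently on failover.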
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 21:20 +0900, Fujii Masao wrote:
> What synchronization level does each combination of sync_replication
> and sync_replication_service lead to? I'd like to see something like
> the following table.
>
>  sync_replication | sync_replication_service | result
> ------------------+--------------------------+--------
>  async            | async                    | ???
>  async            | recv                     | ???
>  async            | fsync                    | ???
>  async            | apply                    | ???
>  recv             | async                    | ???
>  ...

Good question. There are only 4 possible outcomes. There is no combination, so we don't need a table like that above. The "service" specifies the highest request type available from that specific standby. If someone requests a higher service than is currently offered by this standby, they will either a) get that service from another standby that does offer that level, or b) automatically downgrade the sync rep mode to the highest available. For example, if you request recv but there is only one standby and it only offers async, then you get downgraded to async. In all cases, if you request async then we act the same as in 9.0.

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
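[Editor's note: Simon's downgrade rule can be sketched in a few lines. This is a toy reading of the proposal, not code from either patch; `effective_level` and the list ordering are invented.]

```python
# Sketch of the downgrade rule described above (my reading of the
# proposal; names are made up). Each standby advertises the highest
# service level it offers; a transaction requesting more than any
# connected standby can provide is downgraded to the best available.

LEVELS = ["async", "recv", "fsync", "apply"]   # weakest to strongest

def effective_level(requested, standby_offers):
    # Strongest level offered by any connected standby ("async" if none).
    best = max(standby_offers, key=LEVELS.index, default="async")
    # Grant the request if some standby offers that level or better,
    # otherwise downgrade to the strongest level actually on offer.
    if LEVELS.index(best) >= LEVELS.index(requested):
        return requested
    return best
```

This reproduces Simon's example: requesting recv against a single async-only standby yields async, while requesting recv with an apply-capable standby present yields recv.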
Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams wrote: > Updated patch: the JSON code has all been moved into core, so this > patch is now for a built-in data type. I have a question about the design of the JSON type. Why do we need to store the value in UTF8 encoding? It's true the JSON RFC says the text SHALL be encoded in Unicode, but I don't understand why we should reject other encodings. As I said before, I'd like to propose only 3 features in the commitfest: * TYPE json data type * text to json: FUNCTION json_parse(text) * json to text: FUNCTION json_stringify(json, whitelist, space) JSONPath will be re-implemented on the basic functionalities in the subsequent commitfest. Do you have a plan to split your patch? Or, can I continue to develop my patch? If so, JSONPath needs to be adjusted to the new infrastructure. I think json_parse() and json_stringify() are well-known APIs for JSON: https://developer.mozilla.org/En/Using_JSON_in_Firefox So, it'd be worth borrowing the names and signatures for our APIs. (I'll rename json_pretty in my previous patch to json_stringify.) -- Itagaki Takahiro
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion
On Tue, Sep 14, 2010 at 12:01:18PM -0400, Tom Lane wrote: > Well ... I guess the other attitude we could take is that that was a > private development branch of Michael's. If we'd been working in git Actually it wasn't. This branch was created when ecpg grew too big for the released version of bison. We had to use a development snapshot of bison to even compile ecpg back then and didn't want to put the ecpg changes into the main branch because this would have prevented most people from compiling PG. At least that's what I still remember. As far as converting this branch to git, I'd say forget about it. All changes to the parser went into CVS HEAD as soon as a suitable bison version was released. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at googlemail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] Configuring synchronous replication
On Fri, Sep 17, 2010 at 8:31 PM, Simon Riggs wrote: > The only thing standby registration allows you to do is know whether > there was supposed to be a standby there, but yet it isn't there now. I > don't see that point as being important because it seems strange to me > to want to wait for a standby that ought to be there, but isn't anymore. According to what I heard, some people want to guarantee that all the transactions are *always* written in *all* the synchronous standbys. IOW, they want to keep the transaction waiting until it has been written in all the synchronous standbys. Standby registration is required to support such a use case. Without the registration, the master cannot determine whether the transaction has been written in all the synchronous standbys. > What happens if it never comes back? Manual intervention required. Yep. > In the use cases we discussed we had simple 2 or 3 server configs. > > master > standby1 - preferred sync target - set to recv, fsync or apply > standby2 - non-preferred sync target, maybe test server - set to async > > So in the two cases you mention we might set > > "wait for ack from reporting slave" > master: sync_replication = 'recv' #as default, can be changed > reporting-slave: sync_replication_service = 'recv' #gives max level > > "wait until replayed in the server on the west coast" > master: sync_replication = 'recv' #as default, can be changed > west-coast: sync_replication_service = 'apply' #gives max level

What synchronization level does each combination of sync_replication and sync_replication_service lead to? I'd like to see something like the following table.

 sync_replication | sync_replication_service | result
------------------+--------------------------+--------
 async            | async                    | ???
 async            | recv                     | ???
 async            | fsync                    | ???
 async            | apply                    | ???
 recv             | async                    | ???
 ...
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Serializable Snapshot Isolation
Heikki Linnakangas wrote:
> On 17/09/10 14:56, Kevin Grittner wrote:
>> Heikki Linnakangas wrote:
>>> Why not use SubTransGetTopmostTransaction() ?
>>
>> This needs to work when the xid of a transaction is found in the
>> MVCC data of a tuple for any overlapping serializable transaction
>> -- even if that transaction has completed and its connection has
>> been closed. It didn't look to me like
>> SubTransGetTopmostTransaction() would work after the transaction
>> was gone.
>
> You're right, it doesn't retain those old transactions. But it could
> easily be modified to do so.

I shall look into it.

-Kevin
Re: [HACKERS] Serializable Snapshot Isolation
On 17/09/10 14:56, Kevin Grittner wrote:
> Heikki Linnakangas wrote:
>> Why not use SubTransGetTopmostTransaction() ?
>
> This needs to work when the xid of a transaction is found in the
> MVCC data of a tuple for any overlapping serializable transaction
> -- even if that transaction has completed and its connection has
> been closed. It didn't look to me like
> SubTransGetTopmostTransaction() would work after the transaction
> was gone.

You're right, it doesn't retain those old transactions. But it could easily be modified to do so.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Configuring synchronous replication
On Fri, Sep 17, 2010 at 7:41 PM, Heikki Linnakangas wrote: >> The question is do we want standby registration on master and if so, >> why? > > Well, aside from how to configure synchronous replication, standby > registration would help with retaining the right amount of WAL in the > master. wal_keep_segments doesn't guarantee that enough is retained, and > OTOH when all standbys are connected you retain much more than might be > required. Yep. And standby registration is required when we support the "wait forever when synchronous standby isn't connected at the moment" option that Heikki explained upthread. Though I don't think that standby registration is required in the first phase, since the "wait forever" option is not used in the basic use case. Synchronous replication is basically used to reduce the downtime, and the "wait forever" option opposes that. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Serializable Snapshot Isolation
Heikki Linnakangas wrote: > So, the purpose of SerializableXidHash is to provide quick access > to the SERIALIZABLEXACT struct of a top-level transaction, when you > know its transaction id or any of its subtransaction ids. Right. > To implement the "or any of its subtransaction ids" part, you need > to have a SERIALIZABLEXID struct for each subtransaction in shared > memory. Close -- each subtransaction which writes any tuples. > That sounds like it can eat through your shared memory very quickly > if you have a lot of subtransactions. Hmmm. I've never explicitly used subtransactions, so I don't tend to think of them routinely going too deep. And the struct is pretty small. > Why not use SubTransGetTopmostTransaction() ? This needs to work when the xid of a transaction is found in the MVCC data of a tuple for any overlapping serializable transaction -- even if that transaction has completed and its connection has been closed. It didn't look to me like SubTransGetTopmostTransaction() would work after the transaction was gone. I guess that's something I should mention in the comments. -Kevin
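[Editor's note: Kevin's requirement — that a subtransaction xid must stay resolvable to its top-level xid even after the writing backend has committed and disconnected — can be sketched with a toy retained map. Illustrative only; the real structure is the shared-memory SerializableXidHash in C, and `RetainedSubxidMap` is an invented name.]

```python
# Toy illustration (not PostgreSQL code) of why a dedicated map is kept
# instead of relying on SubTransGetTopmostTransaction(): the
# subxid -> top-level-xid association must remain resolvable for as
# long as any overlapping serializable transaction might still find one
# of those xids in a tuple's MVCC data.

class RetainedSubxidMap:
    def __init__(self):
        self._top_of = {}          # subxid -> top-level xid

    def record(self, subxid, top_xid):
        # Called when a (sub)transaction writes a tuple.
        self._top_of[subxid] = top_xid

    def topmost(self, xid):
        # A top-level xid maps to itself; a subxid to its top-level xid.
        # Crucially, this still works after the writing backend is gone.
        return self._top_of.get(xid, xid)

    def release(self, top_xid):
        # Only safe once no overlapping serializable transaction remains.
        self._top_of = {s: t for s, t in self._top_of.items()
                        if t != top_xid}
```

The `release` step is the part pg_subtrans does not provide: it truncates eagerly, whereas here entries survive until the last overlapping serializable transaction completes.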
Re: [HACKERS] Configuring synchronous replication
On Fri, Sep 17, 2010 at 7:31 AM, Simon Riggs wrote: > The only thing standby registration allows you to do is know whether > there was supposed to be a standby there, but yet it isn't there now. I > don't see that point as being important because it seems strange to me > to want to wait for a standby that ought to be there, but isn't anymore. > What happens if it never comes back? Manual intervention required. > > (We agree on how to handle a standby that *is* "connected", yet never > returns a reply or takes too long to do so). Doesn't Oracle provide a mode where it shuts down if this occurs? > The absence of registration in my patch makes some things easier and > some things harder. For example, you can add a new standby without > editing the config on the master. That's actually one of the reasons why I like the idea of registration. It seems rather scary to add a new standby without editing the config on the master. Actually, adding a new fully-async slave without touching the master seems reasonable, but adding a new sync slave without touching the master gives me the willies. The behavior of the system could change quite sharply when you do this, and it might not be obvious what has happened. (Imagine DBA #1 makes the change and DBA #2 is then trying to figure out what's happened - he checks the configs of all the machines he knows about and finds them all unchanged... head-scratching ensues.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Configuring synchronous replication
On Fri, Sep 17, 2010 at 6:41 AM, Heikki Linnakangas wrote: >>> So what should the user interface be like? Given the 1st and 2nd >>> requirement, we need standby registration. If some standbys are >>> important and others are not, the master needs to distinguish between >>> them to be able to determine that a transaction is safely delivered to >>> the important standbys. >> >> My patch provides those two requirements without standby registration, >> so we very clearly don't "need" standby registration. > > It's still not clear to me how you would configure things like "wait for ack > from reporting slave, but not other slaves" or "wait until replayed in the > server on the west coast" in your proposal. Maybe it's possible, but doesn't > seem very intuitive, requiring careful configuration in both the master and > the slaves. Agreed. I think this will be much simpler if all the configuration is in one place (on the master). > In your proposal, you also need to be careful not to connect e.g a test > slave with "synchronous_replication_service = apply" to the master, or it > will possible shadow a real production slave, acknowledging transactions > that are not yet received by the real slave. It's certainly possible to > screw up with standby registration too, but you have more direct control of > the master behavior in the master, instead of distributing it across all > slaves. Similarly agreed. >> The question is do we want standby registration on master and if so, >> why? > > Well, aside from how to configure synchronous replication, standby > registration would help with retaining the right amount of WAL in the > master. wal_keep_segments doesn't guarantee that enough is retained, and > OTOH when all standbys are connected you retain much more than might be > required. +1. > Giving names to slaves also allows you to view their status in the master in > a more intuitive format. 
Something like:

> postgres=# SELECT * FROM pg_slave_status ;
>     name    | connected | received |  fsyncd  |  applied
> ------------+-----------+----------+----------+------------
>  reporting  | t         | 0/2620   | 0/2620   | 0/25550020
>  ha-standby | t         | 0/2620   | 0/2620   | 0/2620
>  testserver | f         |          | 0/1520   |
> (3 rows)

+1. Having said all of the above, I am not in favor of your (Heikki's) proposal to configure sync/async on the slave and the level on the master. That seems like a somewhat bizarre division of labor, splitting what is essentially one setting across two machines. >>> For the control between async/recv/fsync/replay, I like to think in >>> terms of >>> a) asynchronous vs synchronous >>> b) if it's synchronous, how synchronous is it? recv, fsync or replay? >>> >>> I think it makes most sense to set sync vs. async in the master, and the >>> level of synchronicity in the slave. Although I have sympathy for the >>> argument that it's simpler if you configure it all from the master side >>> as well. >> >> I have catered for such requests by suggesting a plugin that allows you >> to implement that complexity without overburdening the core code. > Well, plugins are certainly one possibility, but then we need to design the > plugin API. I've been thinking along the lines of a proxy, which can > implement whatever logic you want to decide when to send the acknowledgment. > With a proxy as well, if we push any features that people want to a proxy or > plugin, we need to make sure that the proxy/plugin has all the necessary > information available. I'm not really sold on the proxy idea. That seems like it adds a lot of configuration complexity, not to mention additional hops. Of course, the plug-in idea also won't be suitable for any but the most advanced users. I think of the two I prefer the idea of a plug-in, slightly, but maybe this doesn't have to be done in version 1.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 13:41 +0300, Heikki Linnakangas wrote: > On 17/09/10 12:49, Simon Riggs wrote: > > This isn't just about UI, there are significant and important > > differences between the proposals in terms of the capability and control > > they offer. > > Sure. The point of focusing on the UI is that the UI demonstrates what > capability and control a proposal offers. My patch does not include server registration. It could be added later on top of my patch without any issues. The core parts of my patch are the fine grained transaction-level control and the ability to mix them dynamically with good performance. To me server registration is not a core issue. I'm not actively against it, I just don't see the need for it at all. Certainly not committed first, especially since its not actually needed by either of our patches. Standby registration doesn't provide *any* parameter that can't be supplied from standby recovery.conf. The only thing standby registration allows you to do is know whether there was supposed to be a standby there, but yet it isn't there now. I don't see that point as being important because it seems strange to me to want to wait for a standby that ought to be there, but isn't anymore. What happens if it never comes back? Manual intervention required. (We agree on how to handle a standby that *is* "connected", yet never returns a reply or takes too long to do so). > >> So what should the user interface be like? Given the 1st and 2nd > >> requirement, we need standby registration. If some standbys are > >> important and others are not, the master needs to distinguish between > >> them to be able to determine that a transaction is safely delivered to > >> the important standbys. > > > > My patch provides those two requirements without standby registration, > > so we very clearly don't "need" standby registration. 
> > It's still not clear to me how you would configure things like "wait for > ack from reporting slave, but not other slaves" or "wait until replayed > in the server on the west coast" in your proposal. Maybe it's possible, > but doesn't seem very intuitive, requiring careful configuration in both > the master and the slaves. In the use cases we discussed we had simple 2 or 3 server configs. master standby1 - preferred sync target - set to recv, fsync or apply standby2 - non-preferred sync target, maybe test server - set to async So in the two cases you mention we might set "wait for ack from reporting slave" master: sync_replication = 'recv' #as default, can be changed reporting-slave: sync_replication_service = 'recv' #gives max level "wait until replayed in the server on the west coast" master: sync_replication = 'recv' #as default, can be changed west-coast: sync_replication_service = 'apply' #gives max level The absence of registration in my patch makes some things easier and some things harder. For example, you can add a new standby without editing the config on the master. If you had 2 standbys, both offering the same level of protection, my proposal would *not* allow you to specify that you preferred one master over another. But we could add a priority parameter as well if that's an issue. > In your proposal, you also need to be careful not to connect e.g a test > slave with "synchronous_replication_service = apply" to the master, or > it will possible shadow a real production slave, acknowledging > transactions that are not yet received by the real slave. It's certainly > possible to screw up with standby registration too, but you have more > direct control of the master behavior in the master, instead of > distributing it across all slaves. > > > The question is do we want standby registration on master and if so, > > why? 
> > Well, aside from how to configure synchronous replication, standby > registration would help with retaining the right amount of WAL in the > master. wal_keep_segments doesn't guarantee that enough is retained, and > OTOH when all standbys are connected you retain much more than might be > required. > > Giving names to slaves also allows you to view their status in the > master in a more intuitive format. Something like:

We can give servers a name without registration. It actually makes more sense to set the name in the standby and it can be passed through from standby when we connect. I very much like the idea of server names and think this next SRF looks really cool.

> postgres=# SELECT * FROM pg_slave_status ;
>     name    | connected | received |  fsyncd  |  applied
> ------------+-----------+----------+----------+------------
>  reporting  | t         | 0/2620   | 0/2620   | 0/25550020
>  ha-standby | t         | 0/2620   | 0/2620   | 0/2620
>  testserver | f         |          | 0/1520   |
> (3 rows)

That could be added on top of my patch also.

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Configuring synchronous replication
Simon Riggs writes: > So far, I have added the point that if a user requests a level of > confirmation that is currently unavailable, then it will use the highest > level of confirmation available now. That stops us from waiting for > timeout for every transaction we run if standby goes down hard, which > just freezes the application for long periods to no real benefit. It > also prevents applications from requesting durability levels the cluster > cannot satisfy, in the opinion of the sysadmin, since the sysadmin > specifies the max level on each standby. That sounds like the commit-or-rollback when slaves are gone question. I think this behavior should be user-settable, again per-transaction. I agree with you that the general case looks like your proposed default, but we already know that some will need "don't ack if not replied before the timeout", and they even will go as far as asking for it to be reported as a serialisation error of some sort, I guess… Regards, -- Dimitri Fontaine PostgreSQL DBA, Architecte
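[Editor's note: Dimitri's point — that what happens when the sync-rep timeout expires should itself be a per-transaction choice — might look like this. A sketch with invented option names, not code from either patch.]

```python
# Sketch of the per-transaction timeout behavior discussed above
# (option names are made up; not from any proposed patch).

class ReplicationTimeout(Exception):
    """Stand-in for reporting the timeout as a serialisation-style error."""

def wait_for_ack(ack_received, on_timeout="downgrade"):
    """Return the durability level actually achieved for this commit."""
    if ack_received:
        return "sync"
    if on_timeout == "downgrade":
        # Simon's proposed default: the commit is durable locally,
        # and the client is acknowledged anyway after the timeout.
        return "local-only"
    if on_timeout == "error":
        # What "some will need": refuse to acknowledge, surface an error
        # so the application knows the standby never replied in time.
        raise ReplicationTimeout("no standby acknowledged in time")
    raise ValueError("unknown on_timeout setting: %r" % on_timeout)
```

Because `on_timeout` is a per-call argument rather than a global, the same cluster can serve both the "ack anyway" and the "error out" camps, which is the crux of Dimitri's request.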
Re: [HACKERS] Configuring synchronous replication
On 17/09/10 12:49, Simon Riggs wrote:
> This isn't just about UI, there are significant and important
> differences between the proposals in terms of the capability and
> control they offer.

Sure. The point of focusing on the UI is that the UI demonstrates what capability and control a proposal offers.

>> So what should the user interface be like? Given the 1st and 2nd
>> requirement, we need standby registration. If some standbys are
>> important and others are not, the master needs to distinguish between
>> them to be able to determine that a transaction is safely delivered to
>> the important standbys.
>
> My patch provides those two requirements without standby registration,
> so we very clearly don't "need" standby registration.

It's still not clear to me how you would configure things like "wait for ack from reporting slave, but not other slaves" or "wait until replayed in the server on the west coast" in your proposal. Maybe it's possible, but doesn't seem very intuitive, requiring careful configuration in both the master and the slaves.

In your proposal, you also need to be careful not to connect e.g a test slave with "synchronous_replication_service = apply" to the master, or it will possibly shadow a real production slave, acknowledging transactions that are not yet received by the real slave. It's certainly possible to screw up with standby registration too, but you have more direct control of the master behavior in the master, instead of distributing it across all slaves.

> The question is do we want standby registration on master and if so,
> why?

Well, aside from how to configure synchronous replication, standby registration would help with retaining the right amount of WAL in the master. wal_keep_segments doesn't guarantee that enough is retained, and OTOH when all standbys are connected you retain much more than might be required.

Giving names to slaves also allows you to view their status in the master in a more intuitive format.
Something like:

postgres=# SELECT * FROM pg_slave_status ;
    name    | connected | received |  fsyncd  |  applied
------------+-----------+----------+----------+------------
 reporting  | t         | 0/2620   | 0/2620   | 0/25550020
 ha-standby | t         | 0/2620   | 0/2620   | 0/2620
 testserver | f         |          | 0/1520   |
(3 rows)

>> For the control between async/recv/fsync/replay, I like to think in
>> terms of
>> a) asynchronous vs synchronous
>> b) if it's synchronous, how synchronous is it? recv, fsync or replay?
>>
>> I think it makes most sense to set sync vs. async in the master, and
>> the level of synchronicity in the slave. Although I have sympathy for
>> the argument that it's simpler if you configure it all from the master
>> side as well.
>
> I have catered for such requests by suggesting a plugin that allows you
> to implement that complexity without overburdening the core code.

Well, plugins are certainly one possibility, but then we need to design the plugin API. I've been thinking along the lines of a proxy, which can implement whatever logic you want to decide when to send the acknowledgment. With a proxy as well, if we push any features that people want to a proxy or plugin, we need to make sure that the proxy/plugin has all the necessary information available.

> This strikes me as an "ad absurdum" argument. Since the above
> over-complexity would doubtless be seen as insane by Tom et al, it
> attempts to persuade that we don't need recv, fsync and apply either.
> Fujii has long talked about 4 levels of service also. Why change? I had
> thought that part was pretty much agreed between all of us.

Now you lost me. I agree that we need 4 levels of service (at least ultimately, not necessarily in the first phase).

> Without performance tests to demonstrate "why", these do sound hard to
> understand. But we should note that DRBD offers recv ("B") and fsync
> ("C") as separate options. And Oracle implements all 3 of recv, fsync
> and apply. Neither of them describe those options so simply and easily
> as the way we are proposing with a 4 valued enum (with async as the
> fourth option).
> If we have only one option for sync_rep = 'on' which of recv | fsync |
> apply would it implement? You don't mention that. Which do you choose?

You would choose between recv, fsync and apply in the slave, with a GUC.

> I no longer seek to persuade by words alone. The existence of my patch
> means that I think that only measurements and tests will show why I
> have been saying these things. We need performance tests.

I don't expect any meaningful differences in terms of performance between any of the discussed options. The big question right now is what features we provide and how they're configured. Performance will depend primarily on the mode you use, and secondarily on the implementation of the mode. It would be completely premature to do performance testing yet IMHO.

Putting all of that together, I think Fujii-san's standby.conf is pretty close.
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 12:30 +0300, Heikki Linnakangas wrote: > If the synchronicity is configured in the standby, how does the master > know that there's a synchronous slave out there that it should wait for, > if that slave isn't connected at the moment? That isn't a question you need standby registration to answer. In my proposal, the user requests a certain level of confirmation and will wait until timeout to see if it is received. The standby can crash and restart, come back and provide the answer, and it will still work. So it is the user request that informs the master that there would normally be a synchronous slave out there it should wait for. So far, I have added the point that if a user requests a level of confirmation that is currently unavailable, then it will use the highest level of confirmation available now. That stops us from waiting for timeout for every transaction we run if standby goes down hard, which just freezes the application for long periods to no real benefit. It also prevents applications from requesting durability levels the cluster cannot satisfy, in the opinion of the sysadmin, since the sysadmin specifies the max level on each standby. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Configuring synchronous replication
Heikki Linnakangas writes: > If the synchronicity is configured in the standby, how does the master know > that there's a synchronous slave out there that it should wait for, if that > slave isn't connected at the moment? That's what quorum is trying to solve. The master knows how many votes per sync level the transaction needs. If no slave is acknowledging any vote, that's all you need to know to ROLLBACK (after the timeout), right? — if setup says so, on the master. > Yeah, the quorum stuff. That's all good, but doesn't change the way you > would do per-transaction control. That's when I bought in on the feature. It's all dynamic and distributed, and it offers per-transaction control. Regards, -- Dimitri Fontaine PostgreSQL DBA, Architecte
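[Editor's note: the quorum idea in miniature. Illustrative only; `quorum_commit` is the parameter name proposed earlier in the thread, the rest is invented.]

```python
# Miniature version of the quorum-voting idea (illustrative sketch;
# quorum_commit follows the earlier proposal, everything else is made
# up). The master counts acknowledgments at or above the level the
# transaction requested; the commit waits until enough votes arrive,
# or rolls back / downgrades at the timeout.

LEVELS = {"async": 0, "recv": 1, "fsync": 2, "apply": 3}

def quorum_met(acks, required_level, quorum_commit):
    """acks: list of (standby_name, level_acknowledged) received so far."""
    votes = sum(1 for _, lvl in acks
                if LEVELS[lvl] >= LEVELS[required_level])
    return votes >= quorum_commit

# A stronger ack counts toward a weaker quorum: an fsync ack satisfies
# a recv requirement, but an async-only standby never contributes votes.
```

This also shows why, as Dimitri notes, no registration is strictly needed for the mechanism itself: the master only has to count the votes that actually arrive before the timeout.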
[HACKERS] trigger failover with signal
On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas wrote: > That said, there's a few small things that can be progressed regardless of > the details of synchronous replication. There's the changes to trigger > failover with a signal, and it seems that we'll need some libpq changes to > allow acknowledgments to be sent back to the master regardless of the rest > of the design. We can discuss those in separate threads in parallel. At first, we should fix the problem that startup process cannot respond to SIGTERM and SIGHUP immediately by committing the patch that I submitted in the following post. http://archives.postgresql.org/pgsql-hackers/2010-09/msg00858.php > pg_ctl failover ? At the moment, the location of the trigger file is > configurable, but if we accept a constant location like "$PGDATA/failover" > pg_ctl could do the whole thing, create the file and send signal. pg_ctl on > Window already knows how to send the "signal" via the named pipe signal > emulation. http://archives.postgresql.org/pgsql-hackers/2010-09/msg00797.php This makes sense. The remaining problem is which signal we should send to postmaster. SIGHUP, SIGTERM, SIGINT, SIGQUIT and SIGUSR1 are already used for other purposes. SIGUSR2 is actually unused, but reserved for children according to the source code comment. We can use SIGHUP to wake up startup process since postmaster sends it to startup process when it arrives. But the reload of the configuration files happens as a side-effect. So I think that SIGHUP handler should check whether the default trigger file exists first, and then skip the reload and just wake up startup process if it does. Thought? Or we should track the pid of startup process and make pg_ctl send the signal directly to startup process? 
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 11:09 +0300, Heikki Linnakangas wrote: > (changed subject again.) > > On 17/09/10 10:06, Simon Riggs wrote: > > I don't think we can determine how far to implement without considering > > both approaches in detail. With regard to your points below, I don't > > think any of those points could be committed first. > > Yeah, I think we need to decide on the desired feature set first, before > we dig deeper into the patches. The design and implementation will > fall out of that. Well, we've discussed these things many times and talking hasn't got us very far on its own. We need measurements and neutral assessments. The patches are simple and we have time. This isn't just about UI; there are significant and important differences between the proposals in terms of the capability and control they offer. I propose we develop both patches further and performance test them. Many of the features I have proposed are performance related, and people need to be able to see what is important and what is not. But not through mere discussion; we need numbers to show which things matter and which don't, and those need to be derived objectively. > * Support multiple standbys with various synchronization levels. > > * What happens if a synchronous standby isn't connected at the moment? > Return immediately vs. wait forever. > > * Per-transaction control. Some transactions are important, others are not. > > * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all > servers can be seen as important special cases of this. > > * async, recv, fsync and replay levels of synchronization. That's a reasonable starting list of points; there may be others. > So what should the user interface be like? Given the 1st and 2nd > requirement, we need standby registration. If some standbys are > important and others are not, the master needs to distinguish between > them to be able to determine that a transaction is safely delivered to > the important standbys.
My patch provides those two requirements without standby registration, so we very clearly don't "need" standby registration. The question is do we want standby registration on master and if so, why? > For per-transaction control, ISTM it would be enough to have a simple > user-settable GUC like synchronous_commit. Let's call it > "synchronous_replication_commit" for now. If you wish to change the name of the GUC away from the one I have proposed, fine. Please note that aspect isn't important to me and I will happily concede all such points to the majority view. > For non-critical transactions, > you can turn it off. That's very simple for developers to understand and > use. I don't think we need more fine-grained control than that at > transaction level, in all the use cases I can think of you have a stream > of important transactions, mixed with non-important ones like log > messages that you want to finish fast in a best-effort fashion. Sounds like we're getting somewhere. See below. > I'm > actually tempted to tie that to the existing synchronous_commit GUC, the > use case seems exactly the same. http://archives.postgresql.org/pgsql-hackers/2008-07/msg01001.php Check the date! I think that particular point is going to confuse us. It will draw much bike shedding and won't help us decide between patches. It's a nicety that can be left to a time after we have the core feature committed. > OTOH, if we do want fine-grained per-transaction control, a simple > boolean or even an enum GUC doesn't really cut it. For truly > fine-grained control you want to be able to specify exceptions like > "wait until this is replayed in slave named 'reporting'" or 'don't wait > for acknowledgment from slave named 'uk-server'". With standby > registration, we can invent a syntax for specifying overriding rules in > the transaction. Something like SET replication_exceptions = > 'reporting=replay, uk-server=async'. 
> > For the control between async/recv/fsync/replay, I like to think in > terms of > a) asynchronous vs synchronous > b) if it's synchronous, how synchronous is it? recv, fsync or replay? > > I think it makes most sense to set sync vs. async in the master, and the > level of synchronicity in the slave. Although I have sympathy for the > argument that it's simpler if you configure it all from the master side > as well. I have catered for such requests by suggesting a plugin that allows you to implement that complexity without overburdening the core code. This strikes me as an "ad absurdum" argument. Since the above over-complexity would doubtless be seen as insane by Tom et al, it attempts to persuade that we don't need recv, fsync and apply either. Fujii has long talked about 4 levels of service also. Why change? I had thought that part was pretty much agreed between all of us. Without performance tests to demonstrate "why", these do sound hard to understand. But we should note that DRBD offers recv
Re: [HACKERS] Configuring synchronous replication
On 17/09/10 12:10, Dimitri Fontaine wrote: > Heikki Linnakangas writes: >> * Support multiple standbys with various synchronization levels. >> * What happens if a synchronous standby isn't connected at the moment? Return immediately vs. wait forever. >> * Per-transaction control. Some transactions are important, others are not. >> * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all servers can be seen as important special cases of this. >> * async, recv, fsync and replay levels of synchronization. >> So what should the user interface be like? Given the 1st and 2nd requirement, we need standby registration. If some standbys are important and others are not, the master needs to distinguish between them to be able to determine that a transaction is safely delivered to the important standbys. > Well the 1st point can be handled in a distributed fashion, where the sync level is setup at the slave. If the synchronicity is configured in the standby, how does the master know that there's a synchronous slave out there that it should wait for, if that slave isn't connected at the moment? >> OTOH, if we do want fine-grained per-transaction control, a simple boolean or even an enum GUC doesn't really cut it. For truly fine-grained control you want to be able to specify exceptions like "wait until this is replayed in slave named 'reporting'" or "don't wait for acknowledgment from slave named 'uk-server'". With standby registration, we can invent a syntax for specifying overriding rules in the transaction. Something like SET replication_exceptions = 'reporting=replay, uk-server=async'. > Then you want to be able to have more than one reporting server and need only one of them at the "replay" level, but you don't need to know which it is. Or on the contrary you have a failover server and you want to be sure this one is at the replay level whatever happens. Then you want topology flexibility: you need to be able to replace a reporting server with another, ditto for the failover one.
> Did I tell you my current thinking on how to tackle that yet? :) Using a distributed setup, where each slave has a weight (several votes per transaction) and a level offering would allow that I think. Yeah, the quorum stuff. That's all good, but doesn't change the way you would do per-transaction control. By specifying overrides on a per-transaction basis, you can have as fine-grained control as you possibly can. Anything you can specify in a configuration file can then also be specified per-transaction with overrides. The syntax just needs to be flexible enough. If we buy into the concept of per-transaction exceptions, we can put that issue aside for the moment, and just consider how to configure things in a config file. Anything you can express in the config file can also be expressed per-transaction with the exceptions GUC. > Now something similar to your idea that I can see a need for is being able to have a multi-part quorum target: when you currently say that you want 2 votes for sync, you would be able to say you want 2 votes for recv, 2 for fsync and 1 for replay. Remember that any slave is set up to offer only one level of synchronicity but can offer multiple votes. How would this look in the setup? Best would be to register the different service levels your application needs. Time to bikeshed a little? sync_rep_services = {critical: recv=2, fsync=2, replay=1; important: fsync=3; reporting: recv=2, apply=1} Well, you get the idea; it could maybe get stored in a catalog somewhere, with nice SQL commands etc. The goal is then to be able to handle a much simpler GUC in the application, sync_rep_service = important for example. A reserved label would be off, the default value. So ignoring the quorum stuff for a moment, the general idea is that you have predefined sets of configurations (or exceptions to the general config) specified in a config file, and in the application you just choose among those with "sync_rep_service=XXX".
> Yeah, I like that, it allows you to isolate the details of the topology from the application. If you add a weight to each slave and then a quorum commit, you don't change the implementation complexity and you offer a lot of setup flexibility. If the slave sync-level and weight are SIGHUP, then it even becomes rather easy to switch roles online, to add new servers, or to organise a maintenance window. The quorum to reach is a per-transaction GUC on the master, too, right? I haven't bought into the quorum idea yet, but yeah, if we have quorum support, then it would be configurable on a per-transaction basis too with the above mechanism. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] libpq changes for synchronous replication
On Fri, Sep 17, 2010 at 5:09 PM, Heikki Linnakangas wrote: > That said, there's a few small things that can be progressed regardless of > the details of synchronous replication. There's the changes to trigger > failover with a signal, and it seems that we'll need some libpq changes to > allow acknowledgments to be sent back to the master regardless of the rest > of the design. We can discuss those in separate threads in parallel. Agreed. The attached patch introduces a new function which is used to send an ACK back from the walreceiver. The function sends a message into the XLOG stream by calling PQputCopyData. I also changed PQputCopyData so that it can be called even during COPY OUT. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center Attachment: libpqrcv_send_v1.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Configuring synchronous replication
On Fri, 2010-09-17 at 09:15 +0100, Simon Riggs wrote: > On Fri, 2010-09-17 at 11:09 +0300, Heikki Linnakangas wrote: > > That said, there's a few small things that can be progressed > > regardless of the details of synchronous replication. There's the > > changes to trigger failover with a signal, and it seems that we'll > > need some libpq changes to allow acknowledgments to be sent back to > > the master regardless of the rest of the design. We can discuss those > > in separate threads in parallel. > > Agree to both of those points. But I don't agree that those things should be committed just yet. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE extensions
(2010/08/09 5:54), Peter Eisentraut wrote: > For the next review cycle, here is a patch that adds some ALTER TYPE > subcommands for composite types: > > ALTER TYPE ... ADD ATTRIBUTE > ALTER TYPE ... DROP ATTRIBUTE > ALTER TYPE ... ALTER ATTRIBUTE ... SET DATA TYPE > ALTER TYPE ... RENAME ATTRIBUTE > > These work similarly to the analogous ALTER TABLE / $ACTION COLUMN > commands. The first two above are from the SQL standard. I checked this patch and noticed some points:

* In ATPrepAddColumn(), it seems that someone has since added a check that prevents adding a new column to a typed table, which is what this patch tries to allow. Since the patch was submitted about one month ago, it might need a rebase onto the latest master.

* In ATPrepAlterColumnType(), you enclosed an existing code block in "if (tab->relkind == RELKIND_RELATION) { ... }", but the block is not re-indented to the appropriate level:

|if (tab->relkind == RELKIND_RELATION)
|{
|/*
| * Set up an expression to transform the old data value to the new type.
| * If a USING option was given, transform and use that expression, else
| * just take the old value and try to coerce it. We do this first so that
| * type incompatibility can be detected before we waste effort, and
| * because we need the expression to be parsed against the original table
| * rowtype.
| */
|if (cmd->transform)
|{
|RangeTblEntry *rte;
|
|/* Expression must be able to access vars of old table */
|rte = addRangeTableEntryForRelation(pstate,
|:

This violates the common coding style.

* RENAME ATTRIBUTE ... TO ...

We can rename an attribute even when the composite type to be altered is in use. Is that intended? In that case the renaming does not affect the column names of typed tables already in use. Should the renaming be prohibited, or should renameatt() also be called for the dependent typed tables?
postgres=# CREATE TYPE comp AS (a int, b text);
CREATE TYPE
postgres=# CREATE TABLE t OF comp;
CREATE TABLE
postgres=# SELECT * FROM t;
 a | b
---+---
(0 rows)

postgres=# ALTER TYPE comp RENAME ATTRIBUTE b TO bbb;
ALTER TYPE
postgres=# CREATE TABLE s OF comp;
CREATE TABLE
postgres=# SELECT * FROM t;
 a | b
---+---
(0 rows)

postgres=# SELECT * FROM s;
 a | bbb
---+-----
(0 rows)

BTW, does the SQL standard say anything about the behavior when we try to add/drop an attribute of a composite type that is in use? This patch always prohibits it, using find_typed_table_dependencies() and find_composite_type_dependencies(). However, it does not seem difficult to alter the columns of typed tables along with such an ALTER TYPE, although it might not be easy to alter the definition of an embedded composite type already in use. Of course, that may be future work; if so, that's fine. Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers