Re: [HACKERS] WITHIN GROUP patch
2013/10/11 Andrew Gierth > > "Pavel" == Pavel Stehule writes: > > >> I found so following error message is not too friendly (mainly because > >> this functionality will be new) > >> > >> postgres=# select dense_rank(3,3,2) within group (order by num desc, > odd) > >> from test4; > >> ERROR: Incorrect number of arguments for hypothetical set function > >> LINE 1: select dense_rank(3,3,2) within group (order by num desc, od... > >> ^ > >> > >> Probably some hint should be there? > > Hm, yeah. > > Anyone have ideas for better wording for the original error message, > or a DETAIL or HINT addition? The key point here is that the number of > _hypothetical_ arguments and the number of ordered columns must match, > but we don't currently exclude the possibility that a user-defined > hypothetical set function might have additional non-hypothetical args. > > The first alternative that springs to mind is: > > ERROR: Incorrect number of arguments for hypothetical set function > DETAIL: Number of hypothetical arguments (3) must equal number of ordered > columns (2) > > +1 Pavel > -- > Andrew (irc:RhodiumToad) >
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
Robert, >> The counter-proposal to "auto-tuning" is just to raise the default for >> work_mem to 4MB or 8MB. Given that Bruce's current formula sets it at >> 6MB for a server with 8GB RAM, I don't really see the benefit of going >> to a whole lot of code and formulas in order to end up at a figure only >> incrementally different from a new static default. > > Agreed. But what do you think the value SHOULD be on such a system? That's the problem: It Depends. One thing in particular which is an issue with calculating against max_connections is that users who don't need 100 connections seldom *reduce* max_connections. So that developer laptop which only needs 3 connections is still going to have a max_connections of 100, just like the DW server where m_c should probably be 30. > I guess the point I'm making here is that raising the default value is > not mutually exclusive with auto-tuning. We could quadruple the > current defaults for work_mem and maintenance_work_mem and be better > off right now, today. Then, we could improve things further in the > future if and when we agree on an approach to auto-tuning. And people > who don't use the auto-tuning will still have a better default. Seems fine to me. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: space reserved for WAL record does not match what was written: panic on windows
On Thu, Oct 10, 2013 at 03:23:30PM +0200, Andres Freund wrote: > On 2013-10-10 08:59:47 -0400, Robert Haas wrote: > > On Tue, Oct 8, 2013 at 6:24 PM, Andres Freund > > wrote: > > > Do you have a better alternative? Making the computation unconditionally > > > 64bit will have a runtime overhead and adding a StaticAssert in the > > > existing macro doesn't work because we use it in array sizes where gcc > > > balks. > > > We could try using inline functions, but that's not going to be pretty > > > either. > > > > > > I don't really see that many further usecases that will align 64bit > > > values on 32bit platforms, so I think we're ok for now. > > > > I'd be inclined to make the computation unconditionally 64-bit. I > > doubt the speed penalty is enough to worry about, and I think we're > > going to have more and more cases where optimizing for 32-bit > > platforms is just not the right decision. > > MAXALIGN is used in several of PG's hottest functions in many > scenarios. att_align_nominal is used in slot_deform_tuple, > heap_deform_tuple, nocachegetattr, etc. So I don't think that's viable > yet. At least not with much more benefit than this... Agreed. Besides performance, aligning a wider-than-pointer value is an unusual need; authors should think thrice before doing that. I might have even defined the MAXALIGN64 macro in xlog.c rather than a core header. Incidentally, why does MAXALIGN64 use unsigned math while MAXALIGN uses signed math? -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
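For reference in this thread: the two macros differ only in the cast used for the alignment arithmetic. A sketch of their shape, paraphrased from c.h of that era (check the tree for the exact text):

    /* round LEN up to the next multiple of ALIGNVAL (a power of 2) */
    #define TYPEALIGN(ALIGNVAL,LEN)  \
        (((intptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((intptr_t) ((ALIGNVAL) - 1)))
    #define MAXALIGN(LEN)      TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))

    /* same arithmetic, but carried out in unsigned 64-bit math */
    #define TYPEALIGN64(ALIGNVAL,LEN)  \
        (((uint64) (LEN) + ((ALIGNVAL) - 1)) & ~((uint64) ((ALIGNVAL) - 1)))
    #define MAXALIGN64(LEN)    TYPEALIGN64(MAXIMUM_ALIGNOF, (LEN))

The intptr_t cast keeps MAXALIGN cheap on 32-bit platforms but silently truncates values wider than a pointer, which is why aligning 64-bit XLOG positions needs the separate macro.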
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
On 11/10/13 17:33, Jaime Casanova wrote: > On Thu, Oct 10, 2013 at 5:32 PM, Mark Kirkwood wrote: >> Quietly replying to myself - looking at the code the sampler does 3000 >> random page reads... > FWIW, something that bothers me is that there are 3000 random page reads... I mean, why 3000? how do you get that number as absolute for good accuracy in every relation? why not a percentage, maybe an argument to the function? Right, looking at http://en.wikipedia.org/wiki/Sample_size_determination maybe it is not such a bad setting - though 400 or 1000 seem to be good magic numbers too (if we are going to punt on a single number, that is). Perhaps it should reuse (some of) the code from acquire_sample_rows in src/commands/analyze.c (we can't use exactly the same logic, as we need to keep block data together in this case). Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
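To put numbers on those magic values: Cochran's formula from the page above gives the sample size needed to estimate a proportion to a chosen precision, independent of table size. A back-of-envelope sketch in C (just the statistics, nothing from the patch):

    /*
     * Cochran's sample size for estimating a proportion: z is the
     * z-score for the desired confidence (1.96 for 95%), p the expected
     * proportion (0.5 is the conservative worst case), e the margin of
     * error.
     */
    static double
    cochran_n0(double z, double p, double e)
    {
        return (z * z) * p * (1.0 - p) / (e * e);
    }

cochran_n0(1.96, 0.5, 0.05) is about 384, which is roughly the "400" above; tightening e to about 0.031 pushes it to 1000.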
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
On 11/10/13 17:08, Satoshi Nagayasu wrote: > (2013/10/11 7:32), Mark Kirkwood wrote: >> On 11/10/13 11:09, Mark Kirkwood wrote: >>> On 16/09/13 16:20, Satoshi Nagayasu wrote: (2013/09/15 11:07), Peter Eisentraut wrote: > On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote: >> I'm looking forward to seeing more feedback on this approach, >> in terms of design and performance improvement. >> So, I have submitted this for the next CF. > Your patch fails to build: > > pgstattuple.c: In function ‘pgstat_heap_sample’: > pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in > this function) > pgstattuple.c:737:13: note: each undeclared identifier is reported > only once for each function it appears in Thanks for checking. Fixed to eliminate SnapshotNow. >>> This seems like a cool idea! I took a quick look, and initially >>> replicated the sort of improvement you saw: >>> >>> >>> bench=# explain analyze select * from pgstattuple('pgbench_accounts'); >>> QUERY PLAN >>> >>> >>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual >>> time=786.368..786.369 rows=1 loops=1) >>> Total runtime: 786.384 ms >>> (2 rows) >>> >>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts'); >>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, >>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00 >>> QUERY PLAN >>> >>> >>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual >>> time=12.004..12.005 rows=1 loops=1) >>> Total runtime: 12.019 ms >>> (2 rows) >>> >>> >>> >>> I wondered what sort of difference eliminating caching would make: >>> >>> $ sudo sysctl -w vm.drop_caches=3 >>> >>> Repeating the above queries: >>> >>> >>> bench=# explain analyze select * from pgstattuple('pgbench_accounts'); >>> QUERY PLAN >>> >>> >>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual >>> time=9503.774..9503.776 rows=1 loops=1) >>> Total runtime: 9504.523 ms >>> (2 rows) >>> >>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts'); >>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, >>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00 >>> QUERY PLAN >>> >>> >>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual >>> time=12330.630..12330.631 rows=1 loops=1) >>> Total runtime: 12331.353 ms >>> (2 rows) >>> >>> >>> So the sampling code seems *slower* when the cache is completely cold - >>> is that expected? (I have not looked at how the code works yet - I'll >>> dive in later if I get a chance)! > Thanks for testing that. It would be very helpful to improve the > performance. > >> Quietly replying to myself - looking at the code the sampler does 3000 >> random page reads... I guess this is slower than 163935 (number of pages >> in pgbench_accounts) sequential page reads thanks to os readahead on my >> type of disk (WD Velociraptor). Tweaking the number of random reads (i.e. >> the sample size) down helps - but obviously that can impact estimation >> accuracy. >> >> Thinking about this a bit more, I guess the elapsed runtime is not the >> *only* thing to consider - the sampling code will cause way less >> disruption to the os page cache (3000 pages vs possibly lots more than >> 3000 for reading an entire relation). >> >> Thoughts? > I think it could be improved by sorting sample block numbers > *before* physical block reads in order to eliminate random access > on the disk.
> > pseudo code: > -- > for (i = 0; i < sample_size; i++) > { > sample_block[i] = random(); > } > > qsort(sample_block); > > for (i = 0; i < sample_size; i++) > { > buf = ReadBuffer(rel, sample_block[i]); > > do_some_stats_stuff(buf); > } > -- > > I guess it would be helpful for reducing random access thing. > > Any comments? Ah yes - that's a good idea (rough patch to your patch attached)! bench=# explain analyze select * from pgstattuple2('pgbench_accounts'); NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00 QUERY PLAN Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual time=9968.318..9968.319 rows=1 loops=1) Total runtime: 9968.443 ms (2 rows) It would appear that we are now not any worse than a complete sampling... Cheers Mark *** pgstattuple.c.orig 2013-10-11 17:46:00.592666316 +1300 --- pgstattuple.c 2013-10-11 17:46:08.616450284 +1300 *** *** 100,105 --- 100,123 uint64 *tuple_count, uin
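For anyone skimming the thread, the sort-then-read idea above in a self-contained form (ReadBuffer is PostgreSQL's; the other names here are illustrative, not the actual patch code):

    #include <stdlib.h>

    typedef unsigned int BlockNumber;   /* stand-in for PostgreSQL's typedef */

    static int
    blkcmp(const void *a, const void *b)
    {
        BlockNumber ba = *(const BlockNumber *) a;
        BlockNumber bb = *(const BlockNumber *) b;

        return (ba > bb) - (ba < bb);
    }

    static void
    pick_sample_blocks(BlockNumber nblocks, BlockNumber *sample, int sample_size)
    {
        int     i;

        /* choose random blocks, then sort so reads happen in physical order */
        for (i = 0; i < sample_size; i++)
            sample[i] = (BlockNumber) (random() % nblocks);
        qsort(sample, sample_size, sizeof(BlockNumber), blkcmp);

        /*
         * The caller then loops over sample[] calling ReadBuffer(rel,
         * sample[i]) and collecting stats; visiting blocks in ascending
         * order lets OS readahead help rather than hurt.
         */
    }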
Re: [HACKERS] Patch: FORCE_NULL option for copy COPY in CSV mode
On Thu, Oct 10, 2013 at 6:45 PM, Andrew Dunstan wrote: > > On 10/09/2013 11:47 PM, Amit Kapila wrote: >> >> >> One of the advantages I could see of using "NULL For .." syntax is >> that already we have one syntax with which the user can specify what >> strings can be replaced with NULL, so just to handle a quoted empty >> string why add a different syntax. >> >> "FORCE_NULL" has the advantage that it can be used for some columns rather >> than all columns, but I think the existing syntax can also be >> modified to support that. >> >> >> > > > I think it's badly designed on its face. We don't need and shouldn't > provide a facility for different NULL markers. A general facility for that > would be an ugly and quite pointless addition to code complexity. What we > need is simply a way of altering one specific behaviour, namely the > treatment of quoted NULL markers. We should not do that by allowing munging > the NULL marker per column, I was thinking of it as similar in some sense to the "insert into tbl" statement. For example Create table tbl (c1 int, c2 int, c3 int, c4 int) insert into tbl (c2) values(1); Here, after the table name, the user can specify the column names for which he wants to provide specific values. > but by a syntactical mechanism that directly > addresses the change in behaviour. If you don't like "FORCE NULL" then let's > pick something else, but not this ugly and unnecessary "NULL FOR" gadget. If you don't like the idea of one generic syntax for NULLs in the COPY command, then "FORCE_QUOTED_NULL" or "QUOTED_NULL" would make more sense as compared to FORCE_NULL. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
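To make the behaviour being debated concrete, a minimal sketch of the decision FORCE_NULL would change (illustrative names, not the actual copy.c implementation):

    #include <stdbool.h>
    #include <string.h>

    /*
     * In CSV mode today a quoted field is never treated as NULL, even if
     * it matches the null string; FORCE_NULL would relax that for the
     * named columns only.
     */
    static bool
    csv_field_is_null(const char *value, bool was_quoted,
                      bool force_null, const char *null_string)
    {
        if (strcmp(value, null_string) != 0)
            return false;       /* doesn't match the NULL marker at all */
        if (!was_quoted)
            return true;        /* unquoted match is NULL, as today */
        return force_null;      /* quoted match is NULL only when forced */
    }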
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
On Thu, Oct 10, 2013 at 5:32 PM, Mark Kirkwood wrote: > > Quietly replying to myself - looking at the code the sampler does 3000 > random page reads... FWIW, something that bothers me is that there are 3000 random page reads... I mean, why 3000? how do you get that number as absolute for good accuracy in every relation? why not a percentage, maybe an argument to the function? also, the name pgstattuple2 doesn't convince me... maybe you can keep pgstattuple() and overload the function with a second argument (the sample percentage) -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: 24x7 support and training Phone: +593 4 5107566 Cell: +593 987171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for reserved connections for replication users
On Thu, Oct 10, 2013 at 10:06 PM, Gibheer wrote: > On Thu, 10 Oct 2013 09:55:24 +0530 > Amit Kapila wrote: > >> On Thu, Oct 10, 2013 at 3:17 AM, Gibheer >> wrote: >> > On Mon, 7 Oct 2013 11:39:55 +0530 >> > Amit Kapila wrote: >> >> Robert Haas wrote: >> >> On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund >> >> wrote: >> >> >>> Hmm. It seems like this patch is making MaxConnections no >> >> >>> longer mean the maximum number of connections, but rather the >> >> >>> maximum number of non-replication connections. I don't think >> >> >>> I support that definitional change, and I'm kinda surprised if >> >> >>> this is sufficient to implement it anyway (e.g. see >> >> >>> InitProcGlobal()). >> >> > >> >> >> I don't think the implementation is correct, but why don't you >> >> >> like the definitional change? The set of things you can do from >> >> >> replication connections are completely different from a normal >> >> >> connection. So using separate "pools" for them seems to make >> >> >> sense. That they end up allocating similar internal data seems >> >> >> to be an implementation detail to me. >> >> >> >> > Because replication connections are still "connections". If I >> >> > tell the system I want to allow 100 connections to the server, >> >> > it should allow 100 connections, not 110 or 95 or any other >> >> > number. >> >> >> >> I think that to reserve connections for replication, a mechanism >> >> similar to superuser_reserved_connections should be used rather than auto >> >> vacuum workers or background workers. >> >> This won't change the definition of MaxConnections. Another thing >> >> is that rather than introducing a new parameter for replication >> >> reserved connections, it is better to use max_wal_senders as it >> >> can serve the purpose. >> >> >> >> Review for replication_reserved_connections-v2.patch, considering >> >> we are going to use a mechanism similar to >> >> superuser_reserved_connections and won't allow the definition of >> >> MaxConnections to change. >> >> >> >> 1. /* the extra unit accounts for the autovacuum launcher */ >> >> MaxBackends = MaxConnections + autovacuum_max_workers + 1 + >> >> - + max_worker_processes; >> >> + + max_worker_processes + max_wal_senders; >> >> >> >> Above changes are not required. >> >> >> >> 2. >> >> + if ((!am_superuser && !am_walsender) && >> >> ReservedBackends > 0 && >> >> !HaveNFreeProcs(ReservedBackends)) >> >> >> >> Change the check as you have in patch-1 for >> >> ReserveReplicationConnections >> >> >> >> if (!am_superuser && >> >> (max_wal_senders > 0 || ReservedBackends > 0) && >> >> !HaveNFreeProcs(max_wal_senders + ReservedBackends)) >> >> ereport(FATAL, >> >> (errcode(ERRCODE_TOO_MANY_CONNECTIONS), >> >> errmsg("remaining connection slots are reserved for replication and >> >> superuser connections"))); >> >> >> >> 3. In guc.c, change the description of max_wal_senders similar to >> >> superuser_reserved_connections at below place: >> >> {"max_wal_senders", PGC_POSTMASTER, REPLICATION_SENDING, >> >> gettext_noop("Sets the maximum number of simultaneously running WAL >> >> sender processes."), >> >> >> >> 4. With this approach, there is no need to change >> >> InitProcGlobal(), as it is used to keep track of bgworkerFreeProcs >> >> and autovacFreeProcs; for others it uses freeProcs. >> >> >> >> 5. Below description in config.sgml needs to be changed for >> >> superuser_reserved_connections to include the effect of >> >> max_wal_senders in reserved connections.
>> >> Whenever the number of active concurrent connections is at least >> >> max_connections minus superuser_reserved_connections, new >> >> connections will be accepted only for superusers, and no new >> >> replication connections will be accepted. >> >> >> >> 6. Also a similar description should be added to the max_wal_senders >> >> configuration parameter. >> >> >> >> 7. Below comment needs to be updated, as it assumes only superuser >> >> reserved connections as part of the MaxConnections limit. >> >> /* >> >> * ReservedBackends is the number of backends reserved for >> >> superuser use. >> >> * This number is taken out of the pool size given by MaxBackends >> >> so >> >> * number of backend slots available to non-superusers is >> >> * (MaxBackends - ReservedBackends). Note what this really means >> >> is >> >> * "if there are <= ReservedBackends connections available, only >> >> superusers >> >> * can make new connections" --- pre-existing superuser connections >> >> don't >> >> * count against the limit. >> >> */ >> >> int ReservedBackends; >> >> >> >> 8. Also we can add a comment on top of the definition for max_wal_senders >> >> similar to ReservedBackends. >> >> >> >> >> >> With Regards, >> >> Amit Kapila. >> >> EnterpriseDB: http://www.enterprisedb.com >> >> >> > >> > Hi, >> > >> > I took the time and reworked the patch with the feedback till now. >> > Thank you very much Amit! >> >> Is there any specific reason why you moved this patch to the next >> CommitFest? >> >> With Regards, >> Ami
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
(2013/10/11 7:32), Mark Kirkwood wrote: > On 11/10/13 11:09, Mark Kirkwood wrote: >> On 16/09/13 16:20, Satoshi Nagayasu wrote: >>> (2013/09/15 11:07), Peter Eisentraut wrote: On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote: > I'm looking forward to seeing more feedback on this approach, > in terms of design and performance improvement. > So, I have submitted this for the next CF. Your patch fails to build: pgstattuple.c: In function ‘pgstat_heap_sample’: pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in this function) pgstattuple.c:737:13: note: each undeclared identifier is reported only once for each function it appears in >>> Thanks for checking. Fixed to eliminate SnapshotNow. >>> >> This seems like a cool idea! I took a quick look, and initially >> replicated the sort of improvement you saw: >> >> >> bench=# explain analyze select * from pgstattuple('pgbench_accounts'); >> QUERY PLAN >> >> >> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual >> time=786.368..786.369 rows=1 loops=1) >> Total runtime: 786.384 ms >> (2 rows) >> >> bench=# explain analyze select * from pgstattuple2('pgbench_accounts'); >> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, >> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00 >> QUERY PLAN >> >> >> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual >> time=12.004..12.005 rows=1 loops=1) >> Total runtime: 12.019 ms >> (2 rows) >> >> >> >> I wondered what sort of difference eliminating caching would make: >> >> $ sudo sysctl -w vm.drop_caches=3 >> >> Repeating the above queries: >> >> >> bench=# explain analyze select * from pgstattuple('pgbench_accounts'); >> QUERY PLAN >> >> >> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual >> time=9503.774..9503.776 rows=1 loops=1) >> Total runtime: 9504.523 ms >> (2 rows) >> >> bench=# explain analyze select * from pgstattuple2('pgbench_accounts'); >> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, >> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00 >> QUERY PLAN >> >> >> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual >> time=12330.630..12330.631 rows=1 loops=1) >> Total runtime: 12331.353 ms >> (2 rows) >> >> >> So the sampling code seems *slower* when the cache is completely cold - >> is that expected? (I have not looked at how the code works yet - I'll >> dive in later if I get a chance)! Thanks for testing that. It would be very helpful to improve the performance. > Quietly replying to myself - looking at the code the sampler does 3000 > random page reads... I guess this is slower than 163935 (number of pages > in pgbench_accounts) sequential page reads thanks to os readahead on my > type of disk (WD Velociraptor). Tweaking the number of random reads (i.e. > the sample size) down helps - but obviously that can impact estimation > accuracy. > > Thinking about this a bit more, I guess the elapsed runtime is not the > *only* thing to consider - the sampling code will cause way less > disruption to the os page cache (3000 pages vs possibly lots more than > 3000 for reading an entire relation). > > Thoughts? I think it could be improved by sorting sample block numbers *before* physical block reads in order to eliminate random access on the disk. pseudo code: -- for (i = 0; i < sample_size; i++) { sample_block[i] = random(); } qsort(sample_block); for (i = 0; i < sample_size; i++) { buf = ReadBuffer(rel, sample_block[i]); do_some_stats_stuff(buf); } -- I guess it would be helpful for reducing random access thing. Any comments? -- Satoshi Nagayasu Uptime Technologies, LLC.
http://www.uptime.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Fri, Oct 11, 2013 at 5:05 AM, Andres Freund wrote: > Hi, > On 2013-10-11 03:44:01 +0900, Fujii Masao wrote: >> I'm afraid that the patch has only limited effects in WAL reduction and >> performance improvement unless the database contains highly-compressible >> data like large blank characters column. It really depends on the contents >> of the database. So, obviously FPW compression should not be the default. >> Maybe we can treat it as just tuning knob. > > > Have you tried using lz4 (or snappy) instead of pglz? There's a patch > adding it to pg in > http://archives.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de > > If this really is only a benefit in scenarios with lots of such data, I > have to say I have my doubts about the benefits of the patch. I think it will be difficult to prove, for any compression algorithm, that it compresses well in most scenarios. In many cases the WAL will not be reduced and tps can even come down if the data is non-compressible, because any compression algorithm will have to try to compress the data and will burn some cpu for that, which in turn will reduce tps. As this patch gives the user a knob to turn compression on/off, users can decide if they want such a benefit. Some users will say that they have no idea what kind of data will be in their databases; such users should not use this option. But on the other side, some users know that they have a similar pattern of data, so they can get a benefit out of such optimisations. For example, in the telecom industry I have seen a lot of data stored as CDRs (call data records) in HLR databases, where the data records differ but follow the same pattern. That said, I think both this patch and my patch "WAL reduction for Update" (https://commitfest.postgresql.org/action/patch_view?id=1209) use the same technique for WAL compression and can lead to similar consequences in different ways. So I suggest having a unified method to enable WAL compression for both patches. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Fri, Oct 11, 2013 at 8:35 AM, Andres Freund wrote: > Hi, > On 2013-10-11 03:44:01 +0900, Fujii Masao wrote: >> I'm afraid that the patch has only limited effects in WAL reduction and >> performance improvement unless the database contains highly-compressible >> data like large blank characters column. It really depends on the contents >> of the database. So, obviously FPW compression should not be the default. >> Maybe we can treat it as just tuning knob. > > > Have you tried using lz4 (or snappy) instead of pglz? There's a patch > adding it to pg in > http://archives.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de Yeah, it's worth checking them! Will do that. > If this really is only a benefit in scenarios with lots of such data, I > have to say I have my doubts about the benefits of the patch. Yep, maybe the patch needs to be redesigned. Currently in the patch compression is performed per FPW, i.e., the size of data to compress is just 8KB. If we can increase the size of data to compress, we might be able to improve the compression ratio. For example, by storing all outstanding WAL data temporarily in local buffer, compressing them, and then storing the compressed WAL data to WAL buffers. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
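The "cancel compression if there's no gain" logic discussed in the review boils down to something like the following sketch, with a hypothetical compress_block() standing in for pglz/lz4/snappy:

    #include <stdint.h>

    /* assumed compressor: returns compressed length, or -1 on failure */
    extern int32_t compress_block(const char *src, int32_t slen, char *dst);

    /*
     * Compress a full-page image, falling back to the raw page whenever
     * compression would not actually save space; *compressed tells the
     * caller which buffer to write out.
     */
    static int32_t
    maybe_compress_fpw(const char *page, int32_t len, char *workbuf,
                       int *compressed)
    {
        int32_t     clen = compress_block(page, len, workbuf);

        if (clen > 0 && clen < len)
        {
            *compressed = 1;
            return clen;
        }
        *compressed = 0;
        return len;
    }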
Re: [HACKERS] Compression of full-page-writes
On Fri, Oct 11, 2013 at 3:44 AM, Fujii Masao wrote: > On Fri, Oct 11, 2013 at 1:20 AM, Dimitri Fontaine > wrote: >> Hi, >> >> I did a partial review of this patch, wherein I focused on the patch and >> the code itself, as I saw other contributors already did some testing on >> it, so that we know it applies cleanly and works to some good extent. > > Thanks a lot! > >> In full_page_writes_str() why are you returning "unrecognized" rather >> than doing an ELOG(ERROR, …) for this unexpected situation? > > It's because the similar functions 'wal_level_str' and 'dbState' also return > 'unrecognized' in the unexpected situation. I just implemented > full_page_writes_str() > in the same manner. > > If we do an elog(ERROR) in that case, pg_xlogdump would fail to dump > the 'broken' (i.e., unrecognized fpw is set) WAL file. I think that some > users want to use pg_xlogdump to investigate the broken WAL file, so > doing an elog(ERROR) seems not good to me. > >> The code switches to compression (or trying to) when the following >> condition is met: >> >> + if (fpw <= FULL_PAGE_WRITES_COMPRESS) >> + { >> + rdt->data = CompressBackupBlock(page, BLCKSZ - >> bkpb->hole_length, &(rdt->len)); >> >> We have >> >> + typedef enum FullPageWritesLevel >> + { >> + FULL_PAGE_WRITES_OFF = 0, >> + FULL_PAGE_WRITES_COMPRESS, >> + FULL_PAGE_WRITES_ON >> + } FullPageWritesLevel; >> >> + #define FullPageWritesIsNeeded(fpw) (fpw >= FULL_PAGE_WRITES_COMPRESS) >> >> I don't much like using the <= test against an ENUM and I'm not sure I >> understand the intention you have here. It somehow looks like a typo and >> disagrees with the macro. > > I thought that FPW should be compressed only when full_page_writes is > set to 'compress' or 'off'. That is, 'off' implies a compression. When it's > set > to 'off', FPW is basically not generated, so there is no need to call > CompressBackupBlock() in that case. But only during online base backup, > FPW is forcibly generated even when it's set to 'off'. So I used the check > "fpw <= FULL_PAGE_WRITES_COMPRESS" there. > >> What about using the FullPageWritesIsNeeded >> macro, and maybe rewriting the macro as >> >> #define FullPageWritesIsNeeded(fpw) \ >> (fpw == FULL_PAGE_WRITES_COMPRESS || fpw == FULL_PAGE_WRITES_ON) > > I'm OK to change the macro so that the <= test is not used. > >> Also, having "on" imply "compress" is a little funny to me. Maybe we >> should just finish our testing and be happy to always compress the full >> page writes. What would the downside be exactly (on a busy IO system >> writing less data even if needing more CPU will be the right trade-off). > > "on" doesn't imply "compress". When full_page_writes is set to "on", > FPW is not compressed at all. > >> I like that you're checking the savings of the compressed data with >> respect to the uncompressed data and cancel the compression if there's >> no gain. I wonder if your test accounts for enough padding and headers >> though given the results we saw in other tests made in this thread. > > I'm afraid that the patch has only limited effects in WAL reduction and > performance improvement unless the database contains highly-compressible > data like large blank characters column. It really depends on the contents > of the database. So, obviously FPW compression should not be the default. > Maybe we can treat it as just tuning knob.
> >> Why do we have both the static function full_page_writes_str() and the >> macro FullPageWritesStr, with two different implementations issuing >> either "true" and "false" or "on" and "off"? > > First I was thinking of using "on" and "off" because they are often used > as the setting value of boolean GUC. But unfortunately the existing > pg_xlogdump uses "true" and "false" to show the value of full_page_writes > in WAL. To avoid breaking the backward compatibility, I implemented > the "true/false" version of the function. I'm really not sure how many people > want such a compatibility of pg_xlogdump, though. > >> ! unsigned hole_offset:15, /* number of bytes before "hole" */ >> ! flags:2, /* state of a backup >> block, see below */ >> ! hole_length:15; /* number of bytes in "hole" >> */ >> >> I don't understand that. I wanted to use that patch as a leverage to >> smoothly discover the internals of our WAL system but won't have the >> time to do that here. > > We need the flag indicating whether each FPW is compressed or not. > If no such a flag exists in WAL, the standby cannot determine whether > it should decompress each FPW or not, and then cannot replay > the WAL containing FPW properly. That is, I just used a 'space' in > the header of FPW to have such a flag. > >> That said, I don't even know that C syntax. > > The struct 'ItemIdData' uses the same C syntax. > >> + #define BKPBLOCK_UNCOMPRESSED 0 /* uncompressed */ >> + #define BKP
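For reference, the ItemIdData declaration being pointed to uses exactly that bit-field syntax (from src/include/storage/itemid.h):

    typedef struct ItemIdData
    {
        unsigned    lp_off:15,      /* offset to tuple (from start of page) */
                    lp_flags:2,     /* state of item pointer, see below */
                    lp_len:15;      /* byte length of tuple */
    } ItemIdData;

Three bit-fields packed into one 32-bit word, which is the same 15/2/15 split the patch borrows for hole_offset, flags and hole_length.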
Re: [HACKERS] Bugfix and new feature for PGXS
On 10/10/2013 09:35 PM, Peter Eisentraut wrote: > On Tue, 2013-10-08 at 10:04 -0400, Andrew Dunstan wrote: >> On 10/07/2013 08:47 PM, Peter Eisentraut wrote: >>> I suspect this line >>> submake-libpq: $(libdir)/libpq.so ; >>> will cause problems on platforms with a different extension (e.g. OS X). >> suggested fix is below. > Hmm, this would duplicate information about shared library naming in a > place outside of Makefile.shlib. That doesn't look right. Please suggest an alternative. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bugfix and new feature for PGXS
On 10/10/2013 09:37 PM, Peter Eisentraut wrote: > On Mon, 2013-10-07 at 22:00 -0400, Andrew Dunstan wrote: >> The code has been sitting in HEAD for several months, and I committed >> on the back branches because it was wanted. > New features normally go through a full development cycle including > extensive beta testing, especially when they contain changes to external > interfaces. The fact that the patch author wanted his feature released > immediately (of course) doesn't warrant a free pass in this case, IMO. Perhaps you should have stated your objection when this was being discussed, and saved me some time. I frankly think we can be a bit more tolerant about build system features than we would be about the actual software it builds. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
> "Pavel" == Pavel Stehule writes: >> I found so following error message is not too friendly (mainly because >> this functionality will be new) >> >> postgres=# select dense_rank(3,3,2) within group (order by num desc, odd) >> from test4; >> ERROR: Incorrect number of arguments for hypothetical set function >> LINE 1: select dense_rank(3,3,2) within group (order by num desc, od... >> ^ >> >> Probably some hint should be there? Hm, yeah. Anyone have ideas for better wording for the original error message, or a DETAIL or HINT addition? The key point here is that the number of _hypothetical_ arguments and the number of ordered columns must match, but we don't currently exclude the possibility that a user-defined hypothetical set function might have additional non-hypothetical args. The first alternative that springs to mind is: ERROR: Incorrect number of arguments for hypothetical set function DETAIL: Number of hypothetical arguments (3) must equal number of ordered columns (2) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Replace duplicate_oids with Perl implementation
On Thu, Oct 10, 2013 at 8:23 PM, Peter Eisentraut wrote: > Replace duplicate_oids with Perl implementation > > It is more portable, more robust, and more readable. smilodon seems unhappy with this: cd ../../../src/include/catalog && ./duplicate_oids sh: ./duplicate_oids: not found -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME
On 2013-10-10 02:10, Robert Haas wrote: > I agree with these concerns, as well as those raised by Tom Lane and > Fabien COELHO, and I think they indicate that we shouldn't accept this > patch. So I'm marking this as Rejected. On 2013-10-11 06:48, Jim Nasby wrote: > I see a use case for disabling FKs and CHECKS but not PKs or UNIQUE > constraints: FKs and CHECKS don't depend on additional state information > (namely an index), so it's easy to just disable them temporarily and then > re-enable them. The same isn't true about a PK or UNIQUE constraint. > Of course we could decide to do something more complex to handle disabling > PK/UNIQUE... though at that point it'd be better to just allow temporarily > disabling any index. But I think there's an argument to be made for that > being beyond the scope of disabling "simple" constraints... it's a pretty > high bar to set that we won't accept a patch that disables simple > constraints but not those involving indexes. Thanks for your reply. I found my patch's weakness. I think the DISABLE/ENABLE patch is necessary. I will prepare a new patch covering all the constraints and submit it. Thanks again. Yours, Wang Shuo HighGo Software Co.,Ltd. October 11, 2013 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bugfix and new feature for PGXS
On Mon, 2013-10-07 at 22:00 -0400, Andrew Dunstan wrote: > The code has been sitting in HEAD for several months, and I > committed on the back branches because it was wanted. New features normally go through a full development cycle including extensive beta testing, especially when they contain changes to external interfaces. The fact that the patch author wanted his feature released immediately (of course) doesn't warrant a free pass in this case, IMO. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bugfix and new feature for PGXS
On Tue, 2013-10-08 at 10:04 -0400, Andrew Dunstan wrote: > On 10/07/2013 08:47 PM, Peter Eisentraut wrote: > > > > I suspect this line > > > > submake-libpq: $(libdir)/libpq.so ; > > > > will cause problems on platforms with a different extension (e.g. OS X). > > > suggested fix is below. Hmm, this would duplicate information about shared library naming in a place outside of Makefile.shlib. That doesn't look right. > diff --git a/src/Makefile.global.in b/src/Makefile.global.in > index bb732bb..b562378 100644 > --- a/src/Makefile.global.in > +++ b/src/Makefile.global.in > @@ -422,7 +422,11 @@ ifndef PGXS > submake-libpq: > $(MAKE) -C $(libpq_builddir) all > else > -submake-libpq: $(libdir)/libpq.so ; > +ifneq ($(PORTNAME),cygwin) > +submake-libpq: $(libdir)/libpq$(DLSUFFIX) ; > +else > +submake-libpq: $(libdir)/cygpq$(DLSUFFIX) ; > +endif > endif > > ifndef PGXS > > > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part 1: additional information
You linked to this email from the commitfest entry, but there is no patch here. You probably meant a different email. Check please. On Tue, 2013-10-08 at 21:48 +0300, Heikki Linnakangas wrote: > On 04.10.2013 14:13, Alexander Korotkov wrote: > > On Fri, Oct 4, 2013 at 12:28 PM, Heikki Linnakangas >> wrote: > > > >> In the attached patch, I in fact already did that for data leaf pages, but > >> didn't change the format of non-leaf pages yet. If we want to support > >> pg_upgrade, we might want to refrain from changing the non-leaf format. > > > > In GinDataLeafPageGetPostingList* you use sizeof(ItemPointerData) without > > MAXALIGN. Is it an error or you especially use 2 extra bytes on leaf page? > > I didn't even think of it. Now that I do think of it, I don't see a > reason to use MAXALIGN there. PostingItems only require 2-byte > alignment. It's a bit fragile and underdocumented though. It probably > would be good to have a struct to represent that layout. Something like: > > struct > { >ItemPointerData rightBound; >PostingItem postingItems[1]; /* variable length array */ > } PostingItemPageContent; > > And use that struct in the macros. > > Then again, we do currently use MAXALIGN there, so if we want to avoid > changing the on-disk format, we have to keep it... > > - Heikki > > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 6:27 PM, Josh Berkus wrote: >> More generally, Josh has made repeated comments that various proposed >> value/formulas for work_mem are too low, but obviously the people who >> suggested them didn't think so. So I'm a bit concerned that we don't >> all agree on what the end goal of this activity looks like. > > The counter-proposal to "auto-tuning" is just to raise the default for > work_mem to 4MB or 8MB. Given that Bruce's current formula sets it at > 6MB for a server with 8GB RAM, I don't really see the benefit of going > to a whole lot of code and formulas in order to end up at a figure only > incrementally different from a new static default. Agreed. But what do you think the value SHOULD be on such a system? I suggest that it's pretty reasonable to assume that even a developer's personal machine will likely have 8GB or so by the time PostgreSQL comes out, so tuning work_mem on that basis is not unreasonable. Plus, even if it has less, a developer probably won't have 100 connections. I guess the point I'm making here is that raising the default value is not mutually exclusive with auto-tuning. We could quadruple the current defaults for work_mem and maintenance_work_mem and be better off right now, today. Then, we could improve things further in the future if and when we agree on an approach to auto-tuning. And people who don't use the auto-tuning will still have a better default. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 6:36 PM, Bruce Momjian wrote: > Patch attached. ISTM that we have broad consensus that doing this at initdb time is more desirable than doing it in the server on the fly. Not everyone agrees with that (you don't, for instance) but there were many, many votes in favor of that option. Judging by the commit I just pushed to do initdb-time selection of the dynamic shared memory implementation to use, this is probably not hard to code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest progress
On Wed, Oct 9, 2013 at 2:03 PM, Robert Haas wrote: > I therefore propose that we start by marking all of the patches that > are currently Waiting on Author as Returned with Feedback. Most of > them have been that way for a long time. Hearing no objections, I went through and did this, but skipped some that had recent activity, and instead marked one as ready for committer on the basis that the reviewer thought it was in good shape except for needing test case revision. Together with today's rash of commits, this means that we've now disposed of half the patches in the CommitFest. IOW, we have a lot of work left to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Revive line type
On Thu, 2013-10-10 at 10:51 -0400, Robert Haas wrote: > On Wed, Oct 9, 2013 at 10:43 PM, Peter Eisentraut wrote: > > Revive line type > > Kevin just pointed out to me that there are a bunch of buildfarm > failures. I'm looking at the ones that are attributable to shared > memory, but there seem to be some problems with this patch as well. > Check out brolga, for example. OK, I fixed the obvious issue with the geometry test. The line test is now failing because of "negative zero", which could be addressed with an alternative expected file like in the case of geometry, and also because of a rounding issue. I'm not sure yet whether the latter is just another platform difference, an unfortunately example case, or an actual bug. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On Thu, Oct 10, 2013 at 7:59 PM, Josh Berkus wrote: >>> (5) Default to POSIX, and allow for SysV as a compile-time option for >>> platforms with poor POSIX memory support. >> >> OK, I did #5. Let's see how that works. > > Andrew pointed out upthread that, since platforms are unlikely to change > what they support dynamically, we could do this at initdb time instead. Oh, sorry, that's what I did. I missed the fact that your #5 and his #5 were different. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
>> (5) Default to POSIX, and allow for SysV as a compile-time option for >> platforms with poor POSIX memory support. > > OK, I did #5. Let's see how that works. Andrew pointed out upthread that, since platforms are unlikely to change what they support dynamically, we could do this at initdb time instead. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On Thu, Oct 10, 2013 at 5:14 PM, Josh Berkus wrote: >>> Doesn't #2 negate all advantages of this effort? Bringing sysv >>> management back on the table seems like a giant step backwards -- or >>> am I missing something? >> >> Not unless there's no difference between "the default" and "the only option". > > Well, per our earlier discussion about "the first 15 minutes", there > actually isn't a difference. Harsh. :-) > I can only see two reasonable alternatives: > > This one: >> (3) Add a new setting that auto-probes for a type of shared memory >> that works. Try POSIX first, then System V. Maybe even fall back to >> mmap'd files if neither of those works. > > Or: > > (5) Default to POSIX, and allow for SysV as a compile-time option for > platforms with poor POSIX memory support. OK, I did #5. Let's see how that works. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
Hi, On 2013-10-11 03:44:01 +0900, Fujii Masao wrote: > I'm afraid that the patch has only limited effects in WAL reduction and > performance improvement unless the database contains highly-compressible > data like large blank characters column. It really depends on the contents > of the database. So, obviously FPW compression should not be the default. > Maybe we can treat it as just tuning knob. Have you tried using lz4 (or snappy) instead of pglz? There's a patch adding it to pg in http://archives.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de If this really is only a benefit in scenarios with lots of such data, I have to say I have my doubts about the benefits of the patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 10/10/13 9:44 AM, MauMau wrote: From: "Robert Haas" On Thu, Oct 10, 2013 at 1:23 AM, Magnus Hagander wrote: I think it would be even simpler, and more reliable, to start with the parameter to initdb - I like that. But instead of having it set a new variable based on that and then autotune off that, just have *initdb* do these calculations you're suggesting, and write new defaults to the files (preferably with a comment). That way if the user *later* comes in and say changes shared_buffers, we don't dynamically resize the work_mem into a value that might cause his machine to die from swapping which would definitely violate the principle of least surprise.. +1 for all of that. I completely agree. I vote for this idea completely, too. It's nice to be able to specify usable RAM with something like "initdb --system-memory 8GB", because it provides flexibility for memory allocation --- use the whole machine for one PostgreSQL instance, or run multiple instances on one machine with 50% of RAM for instance-A and 25% of RAM for instance B and C, etc. But what is the default value of --system-memory? I would like it to be the whole RAM. I hope something like pgtune will be incorporated into the core, absorbing the ideas in: - pgtune - https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server - the book "PostgreSQL 9.0 High Performance" by Greg Smith Then initdb calls the tool. Of course, DBAs can use the tool later. Like pgtune, the tool would be nice if it and initdb can accept "--system-type" or "--workload" with arguments {OLTP | DW | mixed}. +1 on all of the above. If putting one-time magic in initdb works maybe then we can look at runtime or even completely dynamic magic. FWIW, I would be careful about allowing the tool to go completely crazy if --system-memory is set really high, including for things like work_mem. Frequently if you've got a lot of memory you're going to want a serious chunk of it used by the filesystem/kernel cache, and not to just vanish into a random sort (esp since last I knew there were diminishing returns on sort work_mem...) Of course, I'm in a world of 512G servers with 8GB shared buffers so... -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On 2013-10-10 12:13:20 -0400, Robert Haas wrote: > and on smew (Debian GNU/Linux 6.0), it > fails with "Function not implemented", which according to a forum > post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs > on that box. It would be nice to get confirmation of what the reason for that is - afaik Debian has mounted /dev/shm for ages. The most likely issue I can see is an incorrectly set up chroot. If it's just that, I wouldn't be too concerned about it. That's a scenario that won't happen to many users and is relatively easily fixed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
Hi Robert, On 2013-10-09 14:49:46 -0400, Robert Haas wrote: > I spent some time looking at the sample plugin (patch 9/12). Here are > some review comments: > > - I think that the decoding plugin interface should work more like the > foreign data wrapper interface. Instead of using pg_dlsym to look up > fixed names, I think there should be a struct of function pointers > that gets filled in and registered somehow. You mean something like CREATE OUTPUT PLUGIN registering a function with an INTERNAL return value returning a filled struct? I thought about that, but it seemed more complex. Happy to change it though if it's preferred. > - pg_decode_init() only warns when it encounters an unknown option. > An error seems more appropriate. Fine with me. I think I just made it a warning because I wanted to experiment with options. > - Still wondering how we'll use this from a bgworker. Simplified code to consume data: LogicalDecodingReAcquireSlot(NameStr(*name)); ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false /* not initial call */, MyLogicalDecodingSlot->confirmed_flush, options, logical_read_local_xlog_page, LogicalOutputPrepareWrite, LogicalOutputWrite); ... while (true) { XLogRecord *record; char *errm = NULL; record = XLogReadRecord(ctx->reader, startptr, &errm); ... DecodeRecordIntoReorderBuffer(ctx, &buf); } /* at the end or better ever commit or such */ LogicalConfirmReceivedLocation(/* whatever you consumed */); LogicalDecodingReleaseSlot(); > - The output format doesn't look very machine-parseable. I really > think we ought to provide something that is. Maybe a CSV-like format, > or maybe something else, but I don't see why someone who wants to do > change logging should be forced to write and install C code. If > something like Bucardo can run on an unmodified system and extract > change-sets this way without needing a .so file, that's going to be a > huge win for usability. We can change the current format but I really see little to no chance of agreeing on a replication format that's serviceable to several solutions short term. Once we've gained some experience - maybe even this cycle - that might be different. > More generally on this patch set, if I'm going to be committing any of > this, I'd prefer to start with what is currently patches 3 and 4, once > we reach agreement on those. Sounds like a reasonable start. > Are we hoping to get any of this committed for this CF? If so, let's > make a plan to get that done; time is short. If not, let's update the > CF app accordingly. I'd really like to do so. I am travelling atm, but I will be back tomorrow evening and will push an updated patch this weekend. The issue I know of in the latest patches at http://www.postgresql.org/message-id/20131007133232.ga15...@awork2.anarazel.de is renaming from http://www.postgresql.org/message-id/20131008194758.gb3718...@alap2.anarazel.de Do you know of anything else in the patches you're referring to? Thanks, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 03:27:17PM -0700, Josh Berkus wrote: > > > More generally, Josh has made repeated comments that various proposed > > value/formulas for work_mem are too low, but obviously the people who > > suggested them didn't think so. So I'm a bit concerned that we don't > > all agree on what the end goal of this activity looks like. > > The counter-proposal to "auto-tuning" is just to raise the default for > work_mem to 4MB or 8MB. Given that Bruce's current formula sets it at > 6MB for a server with 8GB RAM, I don't really see the benefit of going > to a whole lot of code and formulas in order to end up at a figure only > incrementally different from a new static default. Well, the plan was to auto-tune shared_buffers and effective_cache_size too. We could fall back to our existing code where effective_cache_size auto-tunes based on shared_buffers, and we just up work_mem's default, tell people to set shared_buffers properly, and call it a day. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME
On 10/9/13 1:10 PM, Robert Haas wrote: On Tue, Sep 24, 2013 at 10:40 PM, Peter Eisentraut wrote: On Tue, 2013-09-24 at 11:58 +0200, Bernd Helmle wrote: Hmm not sure i understand this argument either: this patch doesn't allow disabling a primary key. It only supports FKs and CHECK constraints explicitly. Well, as soon as the patch for cataloging not-null constraints as check constraints is available, it will be possible to create views that depend functionally on check constraints. Then you'll have the same problem there. It's also not clear why this patch only supports foreign keys and check constraints. Maybe that's what was convenient to implement, but it's not a principled solution to the general issue that constraints can be involved in dependencies. I agree with these concerns, as well as those raised by Tom Lane and Fabien COELHO, and I think they indicate that we shouldn't accept this patch. So I'm marking this as Rejected. I see a use case for disabling FKs and CHECKS but not PKs or UNIQUE constraints: FKs and CHECKS don't depend on additional state information (namely an index), so it's easy to just disable them temporarily and then re-enable them. The same isn't true about a PK or UNIQUE constraint. Of course we could decide to do something more complex to handle disabling PK/UNIQUE... though at that point it'd be better to just allow temporarily disabling any index. But I think there's an argument to be made for that being beyond the scope of disabling "simple" constraints... it's a pretty high bar to set that we won't accept a patch that disables simple constraints but not those involving indexes. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 03:40:17PM -0700, Josh Berkus wrote: > > >> I don't follow that. Why would using a connection pooler change the > >> multiples > >> of work_mem that a connection would use? > > > > I assume that a connection pooler would keep processes running longer, > > so even if they were not all using work_mem, they would have that memory > > mapped into the process, and perhaps swapped out. > > Yes, and then this is when it *really* matters what OS you're running, > and what release. FreeBSD and Solaris++ don't overallocate RAM, so > those long-running connections pin a lot of RAM eventually. And for > Linux, it's a question of how aggressive the OOM killer is, which kinda > depends on distro/version/sysadmin settings. > > When I configure pgbouncer for Illumos users, I specifically have it > rotate out old connections once an hour for this reason. Just as a point of education, this is a good example of why you want to allocate swap even if you expect your workload to fit in memory. Pushing unused memory to swap is a good use of swap. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 02:44:12PM -0400, Peter Eisentraut wrote: > On 10/10/13 11:31 AM, Bruce Momjian wrote: > > Let me walk through the idea of adding an available_mem setting, that > > Josh suggested, and which I think addresses Robert's concern about > > larger shared_buffers and Windows servers. > > I think this is a promising idea. available_mem could even be set > automatically by packages. And power users could just set available_mem > = -1 to turn off all the magic. Yes, I was thinking about that. Imagine we have an initdb parameter for available memory --- packagers could do something like: initdb -M $(awk '{print $2 * 1024; exit}' /proc/meminfo) to pass in the available memory of the server, or to use 90% of RAM, use: initdb -M $(awk '{printf "%.0f\n", $2 * 1024 * 0.9; exit}' /proc/meminfo) This allows us to externalize all the OS-specific information and allow the packagers to supply it. The packagers could even ask the user if they wish to control the percentage. FYI, I hope people are OK with me replying a lot in this thread --- I do think this is going to take a lot of discussion, but I think the end-result will be worth it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
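A side note on portability: the awk recipe above reads /proc/meminfo, which is Linux-only. A packager's wrapper could obtain the same byte count on several other Unixes via sysconf(); a minimal sketch, with the caveat that _SC_PHYS_PAGES is a widely-supported extension rather than core POSIX, so whether it is available is an assumption to verify per platform:

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	long		pages = sysconf(_SC_PHYS_PAGES);
	long		psize = sysconf(_SC_PAGE_SIZE);

	if (pages <= 0 || psize <= 0)
		return 1;				/* platform doesn't expose it this way */

	/* total physical memory in bytes, suitable for "initdb -M" */
	printf("%lld\n", (long long) pages * psize);
	return 0;
}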
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
>> I don't follow that. Why would using a connection pooler change the >> multiples >> of work_mem that a connection would use? > > I assume that a connection pooler would keep processes running longer, > so even if they were not all using work_mem, they would have that memory > mapped into the process, and perhaps swapped out. Yes, and then this is when it *really* matters what OS you're running, and what release. FreeBSD and Solaris++ don't overallocate RAM, so those long-running connections pin a lot of RAM eventually. And for Linux, it's a question of how aggressive the OOM killer is, which kinda depends on distro/version/sysadmin settings. When I configure pgbouncer for Illumos users, I specifically have it rotate out old connections once an hour for this reason. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 11:18:28AM -0700, Josh Berkus wrote: > Bruce, > > >> That's way low, and frankly it's not worth bothering with this if all > >> we're going to get is an incremental increase. In that case, let's just > >> set the default to 4MB like Robert suggested. > > > > Uh, well, 100 backends at 6MB gives us 600MB, and if each backend uses > > 3x work_mem, that gives us 1.8GB for total work_mem. This was based on > > Andrew's concerns about possible over-commit of work_mem. I can of > > course adjust that. > > That's worst-case-scenario planning -- the 3X work-mem per backend was: > a) Solaris and > b) data warehousing > > In a normal OLTP application each backend averages something like 0.25 * > work_mem, since many queries use no work_mem at all. > > It also doesn't address my point that, if we are worst-case-scenario > default-setting, we're going to end up with defaults which aren't > materially different from the current defaults. In which case, why even > bother with this whole exercise? OK, here is an updated patch that is less conservative. FYI, this thread has gone on for 80 messages, and I assume it will take many more until we are done:

test=> SHOW shared_buffers;
 shared_buffers
----------------
 128MB
(1 row)

test=> SHOW work_mem;
 work_mem
----------
 2621kB
(1 row)

test=> SHOW maintenance_work_mem;
 maintenance_work_mem
----------------------
 10922kB
(1 row)

---

test=> SHOW shared_buffers;
 shared_buffers
----------------
 2GB
(1 row)

test=> SHOW work_mem;
 work_mem
----------
 41943kB
(1 row)

test=> SHOW maintenance_work_mem;
 maintenance_work_mem
----------------------
 174762kB
(1 row)

---

test=> SHOW shared_buffers;
 shared_buffers
----------------
 8GB
(1 row)

test=> SHOW work_mem;
 work_mem
----------
 167772kB
(1 row)

test=> SHOW maintenance_work_mem;
 maintenance_work_mem
----------------------
 699050kB
(1 row)

Patch attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml new file mode 100644 index e8e8e6f..2f00c74 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** include 'filename' *** 1121,1127 Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The value ! defaults to one megabyte (1MB). Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary --- 1121,1128 Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The value ! defaults to 2 * shared_buffers / ! max_connections. Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary *** include 'filename' *** 1147,1153 Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. It defaults ! to 16 megabytes (16MB). Since only one of these operations can be executed at a time by a database session, and an installation normally doesn't have many of them running concurrently, it's safe to set this value significantly larger --- 1148,1155 Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. It defaults ! to shared_buffers / 4 / ! autovacuum_max_workers. 
Since only one of these operations can be executed at a time by a database session, and an installation normally doesn't have many of them running concurrently, it's safe to set this value significantly larger diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c new file mode 100644 index 33efb3c..68af1dc *** a/sr
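For readers checking Bruce's numbers: the two formulas in the documentation hunks above reproduce the SHOW output exactly under the shipped defaults. A minimal sketch of the arithmetic, assuming max_connections = 100 and autovacuum_max_workers = 3 (a reader's crib, not code from the patch):

#include <stdio.h>

int
main(void)
{
	long long	shared_buffers = 128LL * 1024 * 1024;	/* 128MB */
	int			max_connections = 100;
	int			autovacuum_max_workers = 3;

	long long	work_mem = 2 * shared_buffers / max_connections;
	long long	maint_work_mem = shared_buffers / 4 / autovacuum_max_workers;

	printf("work_mem = %lldkB\n", work_mem / 1024);		/* 2621kB */
	printf("maintenance_work_mem = %lldkB\n",
		   maint_work_mem / 1024);						/* 10922kB */
	return 0;
}

Substituting 2GB and 8GB for shared_buffers reproduces the 41943kB/174762kB and 167772kB/699050kB pairs from the other two runs as well.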
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
On 11/10/13 11:09, Mark Kirkwood wrote:
> On 16/09/13 16:20, Satoshi Nagayasu wrote:
>> (2013/09/15 11:07), Peter Eisentraut wrote:
>>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>>> I'm looking forward to seeing more feedback on this approach, in terms of design and performance improvement. So, I have submitted this for the next CF.
>>> Your patch fails to build:
>>>
>>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in this function)
>>> pgstattuple.c:737:13: note: each undeclared identifier is reported only once for each function it appears in
>> Thanks for checking. Fixed to eliminate SnapshotNow.
>>
> This seems like a cool idea! I took a quick look, and initially replicated the sort of improvement you saw:
>
> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>                           QUERY PLAN
> ------------------------------------------------------------------
>  Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual time=786.368..786.369 rows=1 loops=1)
>  Total runtime: 786.384 ms
> (2 rows)
>
> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>                           QUERY PLAN
> ------------------------------------------------------------------
>  Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual time=12.004..12.005 rows=1 loops=1)
>  Total runtime: 12.019 ms
> (2 rows)
>
> I wondered what sort of difference eliminating caching would make:
>
> $ sudo sysctl -w vm.drop_caches=3
>
> Repeating the above queries:
>
> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>                           QUERY PLAN
> ------------------------------------------------------------------
>  Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual time=9503.774..9503.776 rows=1 loops=1)
>  Total runtime: 9504.523 ms
> (2 rows)
>
> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>                           QUERY PLAN
> ------------------------------------------------------------------
>  Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual time=12330.630..12330.631 rows=1 loops=1)
>  Total runtime: 12331.353 ms
> (2 rows)
>
> So the sampling code seems *slower* when the cache is completely cold - is that expected? (I have not looked at how the code works yet - I'll dive in later if I get a chance)!
>

Quietly replying to myself - looking at the code, the sampler does 3000 random page reads... I guess this is slower than 163935 (number of pages in pgbench_accounts) sequential page reads thanks to OS readahead on my type of disk (WD Velociraptor). Tweaking the number of random reads (i.e. the sample size) down helps - but obviously that can impact estimation accuracy. Thinking about this a bit more, I guess the elapsed runtime is not the *only* thing to consider - the sampling code will cause way less disruption to the OS page cache (3000 pages vs possibly lots more than 3000 for reading an entire relation). Thoughts? Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
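Mark's readahead explanation holds up on a back-of-the-envelope basis: assuming roughly 4 ms per random 8kB read and ~135 MB/s of sequential throughput for this class of drive (assumed figures, not measurements beyond the timings quoted above), both cold-cache results fall out almost exactly:

#include <stdio.h>

int
main(void)
{
	/* assumed: ~4.1 ms per random 8kB read, ~135 MB/s sequential */
	double		random_ms = 3000 * 4.1;
	double		seq_mb = 163935 * 8192.0 / (1024 * 1024);
	double		seq_ms = seq_mb / 135.0 * 1000;

	printf("sampled:    %.0f ms\n", random_ms);	/* ~12300; observed 12331 */
	printf("sequential: %.0f ms\n", seq_ms);	/* ~9490; observed 9505 */
	return 0;
}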
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 11:14:27AM -0700, Jeff Janes wrote: > The assumption that each connection won't use lots of work_mem is also > false, I think, especially in these days of connection poolers. > > > I don't follow that. Why would using a connection pooler change the multiples > of work_mem that a connection would use? I assume that a connection pooler would keep processes running longer, so even if they were not all using work_mem, they would have that memory mapped into the process, and perhaps swapped out. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
> More generally, Josh has made repeated comments that various proposed > value/formulas for work_mem are too low, but obviously the people who > suggested them didn't think so. So I'm a bit concerned that we don't > all agree on what the end goal of this activity looks like. The counter-proposal to "auto-tuning" is just to raise the default for work_mem to 4MB or 8MB. Given that Bruce's current formula sets it at 6MB for a server with 8GB RAM, I don't really see the benefit of going to a whole lot of code and formulas in order to end up at a figure only incrementally different from a new static default. The core issue here is that there aren't good "generic" values for these settings for all users -- that's why we have the settings in the first place. Following a formula isn't going to change that. If we're serious about autotuning, then we should look at: a) admissions control for non-shared resources (e.g. work_mem) b) auto-feedback tuning loops (ala Heikki's checkpoint_segments and the bgwriter). We could certainly create an autofeedback tuning loop for work_mem. Just watch the database, record the amount of data spilled to disk for work (pg_stat_sorts), and record the total RAM pinned by backends. *Then* apply a formula and maybe bump up work_mem a bit depending on what comes out of it. And keep monitoring and keep readjusting. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ECPG FETCH readahead
Boszormenyi Zoltan escribió: > 2013-09-10 03:04 keltezéssel, Peter Eisentraut írta: > >You need to update the dblink regression tests. > > Done. Dude, this is a humongous patch. I was shocked by it initially, but on further reading, I observed that it's only a huge patch which also does some mechanical changes to test output. I think it'd be better to split the part that's responsible for the changed lines in test output mentioning "ecpg_process_output". That should be a reasonably small patch which changes ecpg_execute slightly and adds the new function, followed by the enormous resulting mechanical changes in test output. It should be possible to commit that relatively quickly. Then there's the rest of the patch, which adds a huge pile of new code. I think there are some very minor changes to backend code as well -- would it make sense to post that as a separate piece? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
On 16/09/13 16:20, Satoshi Nagayasu wrote:
> (2013/09/15 11:07), Peter Eisentraut wrote:
>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>> I'm looking forward to seeing more feedback on this approach,
>>> in terms of design and performance improvement.
>>> So, I have submitted this for the next CF.
>>
>> Your patch fails to build:
>>
>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
>> this function)
>> pgstattuple.c:737:13: note: each undeclared identifier is reported
>> only once for each function it appears in
>
> Thanks for checking. Fixed to eliminate SnapshotNow.
>

This seems like a cool idea! I took a quick look, and initially replicated the sort of improvement you saw:

bench=# explain analyze select * from pgstattuple('pgbench_accounts');
                          QUERY PLAN
------------------------------------------------------------------
 Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual time=786.368..786.369 rows=1 loops=1)
 Total runtime: 786.384 ms
(2 rows)

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
                          QUERY PLAN
------------------------------------------------------------------
 Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual time=12.004..12.005 rows=1 loops=1)
 Total runtime: 12.019 ms
(2 rows)

I wondered what sort of difference eliminating caching would make:

$ sudo sysctl -w vm.drop_caches=3

Repeating the above queries:

bench=# explain analyze select * from pgstattuple('pgbench_accounts');
                          QUERY PLAN
------------------------------------------------------------------
 Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual time=9503.774..9503.776 rows=1 loops=1)
 Total runtime: 9504.523 ms
(2 rows)

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
                          QUERY PLAN
------------------------------------------------------------------
 Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual time=12330.630..12330.631 rows=1 loops=1)
 Total runtime: 12331.353 ms
(2 rows)

So the sampling code seems *slower* when the cache is completely cold - is that expected? (I have not looked at how the code works yet - I'll dive in later if I get a chance!) Regards Mark
Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
We have this block:

+	{
+		/*
+		 * This is the window we want - but we have to tweak the
+		 * definition slightly (e.g. to support the IGNORE NULLS frame
+		 * option) as we're not using the default (i.e. parent) frame
+		 * options.
+		 *
+		 * We'll create a 'child' (using refname to inherit everything
+		 * from the parent) that just overrides the frame options
+		 * (assuming it doesn't already exist):
+		 */
+		WindowDef  *clone = makeNode(WindowDef);

... then it goes to populate the clone. When this is done, we use the clone to walk the list of existing WindowDefs, and if there's a match we free this one and use that one. Wouldn't it be better to walk the existing array first looking for a match, and only create a clone if none is found? This would avoid the memory leak problems; I originally pointed out that you're leaking the bits created by this makeNode() call above, but now that I look at it again, I think you're also leaking the bits created by the two copyObject() calls to create the clone. It appears to me that it's simpler to not allocate any memory in the first place, unless necessary.

Also, in parsenodes.h, you had the [MANDATORY] and such tags. Three things about that: 1) it looks a lot uglier than the original, so how about the modified version below? and 2) what does "MANDATORY value of NULL" mean? Maybe you mean "MANDATORY value or NULL" instead? 3) Exactly what case does the "in this case" phrase refer to? I think the comment should be more explicit. Also, I think this should be its own paragraph instead of being mixed with the "For entries in a" paragraph.

/*
 * WindowDef - raw representation of WINDOW and OVER clauses
 *
 * For entries in a WINDOW list, "name" is the window name being defined.
 * For OVER clauses, we use "name" for the "OVER window" syntax, or "refname"
 * for the "OVER (window)" syntax, which is subtly different --- the latter
 * implies overriding the window frame clause.
 *
 * In this case, the per-field indicators determine what the semantics
 * are:
 * [V]irtual
 *   If NULL, then the parent's (refname) value is used.
 * [M]andatory
 *   Never inherited from the parent, so must be specified; may be NULL.
 * [S]uper
 *   Always inherited from parent, any local version ignored.
 */
typedef struct WindowDef
{
	NodeTag		type;
	char	   *name;			/* [M] window's own name */
	char	   *refname;		/* [M] referenced window name, if any */
	List	   *partitionClause;	/* [V] PARTITION BY expression list */
	List	   *orderClause;	/* [M] ORDER BY (list of SortBy) */
	int			frameOptions;	/* [M] frame_clause options, see below */
	Node	   *startOffset;	/* [M] expression for starting bound, if any */
	Node	   *endOffset;		/* [M] expression for ending bound, if any */
	int			location;		/* parse location, or -1 if none/unknown */
} WindowDef;

In gram.y there are some spurious whitespaces at end-of-line. You should be able to see them with git diff --check. (I don't think we support running pgindent on .y files, which would have otherwise cleaned this up.)

A style issue. You have this:

+	/*
+	 * We can process a constant offset much more efficiently; initially
+	 * we'll scan through the first non-null rows, and store that
+	 * index. On subsequent rows we'll decide whether to push that index
+	 * forwards to the next non-null value, or just return it again.
+	 */
+	leadlag_const_context *context = WinGetPartitionLocalMemory(
+		winobj,
+		sizeof(leadlag_const_context));
+	int count_forward = 0;

I think it'd be better to put the declarations above the comment, and the assignment to "context" below the comment. This way, the indentation of the assignment is not so odd. So it'd look like

+	leadlag_const_context *context;
+	int count_forward = 0;
+
+	/*
+	 * We can process a constant offset much more efficiently; initially
+	 * we'll scan through the first non-null rows, and store that
+	 * index. On subsequent rows we'll decide whether to push that index
+	 * forwards to the next non-null value, or just return it again.
+	 */
+	context = WinGetPartitionLocalMemory(winobj,
+		sizeof(leadlag_const_context));

And a final style comment. You have a block like this:

if (ignore_nulls && !const_offset)
{
	long block;
}
else if (ignore_nulls /* && const_offset */)
{
	another long block;
}
else
{
	more stuff;
}

I t
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
Robert, >> Doesn't #2 negate all advantages of this effort? Bringing sysv >> management back on the table seems like a giant step backwards -- or >> am I missing something? > > Not unless there's no difference between "the default" and "the only option". Well, per our earlier discussion about "the first 15 minutes", there actually isn't a difference. We got rid of SHMMAX for the majority of our users for 9.3. We should NOT revert that just so we can support older platforms -- especially since, if anything, Kernel.org is going to cripple SysV support even further in the future. The platforms where it does work represent the vast majority of our users, and are only on the increase. I can only see two reasonable alternatives: This one: > (3) Add a new setting that auto-probes for a type of shared memory > that works. Try POSIX first, then System V. Maybe even fall back to > mmap'd files if neither of those works. Or: (5) Default to POSIX, and allow for SysV as a compile-time option for platforms with poor POSIX memory support. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 1:30 PM, Alvaro Herrera wrote: >> > Just noticed that you changed the timer to struct Instrumentation. Not >> > really sure about that change. Since you seem to be using only the >> > start time and counter, wouldn't it be better to store only those? >> > Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc(). >> >> Yeah, I was unsure about that too. >> >> The motivation was that I need one more piece of information in >> pgss_store (the absolute start time). I was going to widen the >> argument list, but it was looking pretty long, so instead I was >> thinking it'd be more concise to push the entire, typically extant >> Instrumentation struct pointer down. > > Would it work to define your own struct to pass around? Absolutely, I was just hoping to spare the code another abstraction if another was a precise superset. Looks like that isn't going to happen, though, so a pgss-oriented struct is likely what will have to be. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
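For the archives, the pgss-oriented struct being discussed might look like the sketch below; the struct and field names are hypothetical, not taken from any posted version of the patch:

#include "postgres.h"
#include "utils/timestamp.h"

/*
 * Hypothetical pgss-specific context: just the two pieces pgss_store()
 * needs, instead of passing a whole Instrumentation pointer down.
 */
typedef struct pgssExecContext
{
	TimestampTz	start;			/* absolute statement start time */
	double		total_time;		/* elapsed time, in msec */
} pgssExecContext;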
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On Thu, Oct 10, 2013 at 4:00 PM, Merlin Moncure wrote: >> (2) Default to using System V shared memory. If people want POSIX >> shared memory, let them change the default. > > Doesn't #2 negate all advantages of this effort? Bringing sysv > management back on the table seems like a giant step backwards -- or > am I missing something? Not unless there's no difference between "the default" and "the only option". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Pattern matching operators and indexes
Hi, I'm developing a new type for character strings, like varchar. I wrote operators for btree and so forth. I wonder how pattern matching operators can use a btree index, because the btree operator class only knows about the <, <=, =, >=, and > operators, while operators for pattern matching, such as LIKE, are not known to the btree access method. Now my question is: does Postgres use btree for pattern matching queries on varchar or other character string types? If it does, how do I implement it for my new type? Regards, Soroosh
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
The V7-Patch applied cleanly and I got no issues in my first tests. The change from column session_start to a function seems very reasonable to me. Concerning usability, I would like to suggest a minor change that massively increases the usefulness of the patch for beginners, who often use this view as a first approach to optimize index structure. The history of this tool contains a first version without normalization. That wasn't useful enough except for prepared queries. The current version has normalized queries, so calls get summarized to get a glimpse of bad queries. But the drawback of this approach is the impossibility of using EXPLAIN ANALYZE without further substitutions. The identification patch provides the possibility to summarize calls by query_id, so that the normalized query string itself no longer needs to be exposed in the view for everyone. I suggest adding a parameter to recover the possibility to display real queries. The following very minor change (based on V7) exposes the first real query getting this query_id if normalization of the exposed string is deactivated (the current behaviour remains the default). This new option is not always the best starting point to discover index shortfalls, but a huge gain for beginners because it serves the needs in more than 90% of the normal use cases. What do you think? Arne

Date: Mon Oct 7 17:54:08 2013 + Switch to disable normalized query strings

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e50dfba..6cc9244 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -234,7 +234,7 @@
 static int pgss_max; /* max # statements to track */
 static int pgss_track; /* tracking level */
 static bool pgss_track_utility; /* whether to track utility commands */
 static bool pgss_save; /* whether to save stats across shutdown */
-
+static bool pgss_normalize; /* whether to normalize the query representation shown in the view (otherwise show the first query executed with this query_id) */
 #define pgss_enabled() \
 	(pgss_track == PGSS_TRACK_ALL || \
@@ -356,6 +356,17 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	DefineCustomBoolVariable("pg_stat_statements.normalize",
+			   "Selects whether the view column contains the query strings in a normalized form.",
+							 NULL,
+							 &pgss_normalize,
+							 true,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
 	EmitWarningsOnPlaceholders("pg_stat_statements");
 
 	/*
@@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId,
 	query_len = strlen(query);
 
-	if (jstate)
+	if (jstate && pgss_normalize)
 	{
-		/* Normalize the string if enabled */
+		/* Normalize the string if jstate is not NULL and normalized query strings are enabled */
 		norm_query = generate_normalized_query(jstate, query,
 											   &query_len,
 											   key.encoding);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On Thu, Oct 10, 2013 at 12:13:20PM -0400, Robert Haas wrote: > Since, as has been previously discussed in this forum on multiple > occasions [citation needed], the default System V shared memory limits > are absurdly low on many systems, the dynamic shared memory patch > defaults to POSIX shared memory, which has often been touted as a > superior alternative [citation needed]. Unfortunately, the buildfarm > isn't entirely happy with this decision. On buildfarm member anole > (HP-UX B.11.31), allocation of dynamic shared memory fails with a > "Permission denied" error, and on smew (Debian GNU/Linux 6.0), it > fails with "Function not implemented", which according to a forum > post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs > on that box. > > What shall we do about this? I see a few options. > > (1) Define the issue as "not our problem". IOW, as of now, if you > want to use PostgreSQL, you've got to either make POSIX shared memory > work on your machine, or change the GUC that selects the type of > dynamic shared memory used. +1 for this. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On 10/10/2013 02:45 PM, Robert Haas wrote:
> On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan wrote:
>> On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas wrote:
>>> (2) Default to using System V shared memory. If people want POSIX shared memory, let them change the default.
>>> After some consideration, I think my vote is for option #2.
>> Wouldn't that become the call of packagers?
> Packagers can certainly override whatever we do, but we still need to make the buildfarm green again.

I really dislike throwing things over the wall to packagers like this, anyway. Quite apart from anything else, not everyone uses pre-built packages, and we should make things as easy as possible for those who don't, too. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Daniel Farina escribió: > Given that, perhaps a way to fix this is something like this patch-fragment: > > """ > { > PGSS_TUP_V1_0 = 1, > PGSS_TUP_V1_1, > - PGSS_TUP_LATEST > + PGSS_TUP_V1_2 > } pgssTupVersion; > > +#define PGSS_TUP_LATEST PGSS_TUP_V1_2 > """ This sounds good. I have seen other places that have the LATEST definition as part of the enum, assigning the previous value to it. I'm not really sure which of these is harder to miss when updating the code. I'm happy with either. Make sure to use the PGSS_TUP_V1_2 in the two places mentioned, in any case. > > Just noticed that you changed the timer to struct Instrumentation. Not > > really sure about that change. Since you seem to be using only the > > start time and counter, wouldn't it be better to store only those? > > Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc(). > > Yeah, I was unsure about that too. > > The motivation was that I need one more piece of information in > pgss_store (the absolute start time). I was going to widen the > argument list, but it was looking pretty long, so instead I was > thinking it'd be more concise to push the entire, typically extant > Instrumentation struct pointer down. Would it work to define your own struct to pass around? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
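The in-enum variant Alvaro mentions would look roughly like the sketch below (not from the patch); one wrinkle is that the assert-checking switch could then no longer have separate case arms for PGSS_TUP_V1_2 and PGSS_TUP_LATEST, since they would be the same value:

typedef enum pgssTupVersion
{
	PGSS_TUP_V1_0 = 1,
	PGSS_TUP_V1_1,
	PGSS_TUP_V1_2,

	/* alias for the newest member; must be updated when adding a version */
	PGSS_TUP_LATEST = PGSS_TUP_V1_2
} pgssTupVersion;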
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 10:49 AM, Alvaro Herrera wrote: > Daniel Farina escribió: >> On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao wrote: > >> > In my test, I found that pg_stat_statements.total_time always indicates a >> > zero. >> > I guess that the patch might handle pg_stat_statements.total_time wrongly. >> > >> > +values[i++] = DatumGetTimestamp( >> > +instr_get_timestamptz(pgss->session_start)); >> > +values[i++] = DatumGetTimestamp( >> > +instr_get_timestamptz(entry->introduced)); >> > >> > These should be executed only when detected_version >= PGSS_TUP_LATEST? >> >> Yes. Oversight. > > Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I mean, if > later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that > becomes latest, somebody running the current definition with the updated > .so will no longer get these values. Or is the intention that > PGSS_TUP_LATEST will never be updated again, and future versions will > get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on? > I mean, surely we can always come up with new symbols to use, but it > seems hard to follow. > > There's one other use of PGSS_TUP_LATEST here which I think should also > be changed to >= SOME_SPECIFIC_VERSION: > > + if (detected_version >= PGSS_TUP_LATEST) > + { > + uint64 qid = pgss->private_stat_session_key; > + > + qid ^= (uint64) entry->query_id; > + qid ^= ((uint64) entry->query_id) << 32; > + > + values[i++] = Int64GetDatumFast(qid); > + } I made some confusing mistakes here in using the newer tuple versioning. Let me try to explain what the motivation was: I was adding new fields to pg_stat_statements and was afraid that it'd be hard to get a very clear view that all the set fields are in alignment and there were no accidental overruns, with the problem getting more convoluted as more versions are added. The idea of PGSS_TUP_LATEST is useful to catch common programmer error by testing some invariants, and it'd be nice not to have to thrash the invariant checking code every release, which would probably defeat the point of such "oops" prevention code. But, the fact that I went on to rampantly do questionable things PGSS_TUP_LATEST is a bad sign. By example, here are the two uses that have served me very well: /* Perform version detection */ if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0) detected_version = PGSS_TUP_V1_0; else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1) detected_version = PGSS_TUP_V1_1; else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS) detected_version = PGSS_TUP_LATEST; else { /* * Couldn't identify the tuple format. Raise error. * * This is an exceptional case that may only happen in bizarre * situations, since it is thought that every released version * of pg_stat_statements has a matching schema. 
*/ ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("pg_stat_statements schema is not supported " "by its installed binary"))); } And #ifdef USE_ASSERT_CHECKING /* Check that every column appears to be filled */ switch (detected_version) { case PGSS_TUP_V1_0: Assert(i == PG_STAT_STATEMENTS_COLS_V1_0); break; case PGSS_TUP_V1_1: Assert(i == PG_STAT_STATEMENTS_COLS_V1_1); break; case PGSS_TUP_LATEST: Assert(i == PG_STAT_STATEMENTS_COLS); break; default: Assert(false); } #endif Given that, perhaps a way to fix this is something like this patch-fragment: """ { PGSS_TUP_V1_0 = 1, PGSS_TUP_V1_1, - PGSS_TUP_LATEST + PGSS_TUP_V1_2 } pgssTupVersion; +#define PGSS_TUP_LATEST PGSS_TUP_V1_2 """ This way when a programmer is making new tuple versions, they are much more likely to see the immediate need to teach those two sites about the new tuple size. But, the fact that one does not get the invariants updated in a completely compulsory way may push the value of this checking under water. I'd be sad to see it go, it has saved me a lot of effort when returning to the code after a while. What do you think? > The instr_time thingy being used for these times maps to > QueryPerformanceCounter() on Windows, and I'm not sure how useful this > is for long-term time tracking; see > http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163 > for instance. I think it'd be better to use TimestampTz and > GetCurrentTimestamp() for this. Ah. I was going to do that, but thought it'd be nice to merely push down the already-extant Instr struct in most cases, as to get the 'start' t
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On Thu, Oct 10, 2013 at 11:13 AM, Robert Haas wrote: > Since, as has been previously discussed in this forum on multiple > occasions [citation needed], the default System V shared memory limits > are absurdly low on many systems, the dynamic shared memory patch > defaults to POSIX shared memory, which has often been touted as a > superior alternative [citation needed]. Unfortunately, the buildfarm > isn't entirely happy with this decision. On buildfarm member anole > (HP-UX B.11.31), allocation of dynamic shared memory fails with a > "Permission denied" error, and on smew (Debian GNU/Linux 6.0), it > fails with "Function not implemented", which according to a forum > post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs > on that box. > > What shall we do about this? I see a few options. > > (1) Define the issue as "not our problem". IOW, as of now, if you > want to use PostgreSQL, you've got to either make POSIX shared memory > work on your machine, or change the GUC that selects the type of > dynamic shared memory used. > > (2) Default to using System V shared memory. If people want POSIX > shared memory, let them change the default. Doesn't #2 negate all advantages of this effort? Bringing sysv management back on the table seems like a giant step backwards -- or am I missing something? http://www.postgresql.org/docs/9.3/interactive/kernel-resources.html#SYSVIPC merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 3:41 PM, Christopher Browne wrote: > On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian wrote: >> How do we handle the Python dependency, or is this all to be done in >> some other language? I certainly am not ready to take on that job. > > I should think it possible to reimplement it in C. It was considerably > useful to start by implementing in Python, as that evades various sorts > of efforts needed in C (e.g. - memory allocation, picking a hash table > implementation), and allows someone to hack on it without needing to > run through a recompile every time something is touched. Also, the last time I saw that tool, it output recommendations for work_mem that I would never, ever recommend to anyone on a production server - they were VERY high. More generally, Josh has made repeated comments that various proposed value/formulas for work_mem are too low, but obviously the people who suggested them didn't think so. So I'm a bit concerned that we don't all agree on what the end goal of this activity looks like. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 10:12 AM, Fujii Masao wrote: > On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina wrote: >> Probably. >> >> The idea is that without those fields it's, to wit, impossible to >> explain non-monotonic movement in metrics of those queries for precise >> use in tools that insist on monotonicity of the fields, which are all >> cumulative to date I think. >> >> pg_stat_statements_reset() or crashing loses the session, so one >> expects "ncalls" may decrease. In addition, a query that is bouncing >> in and out of the hash table will have its statistics be lost, so its >> statistics will also decrease. This can cause un-resolvable artifact >> in analysis tools. >> >> The two fields allow for precise understanding of when the statistics >> for a given query_id are continuous since the last time a program >> inspected it. > > Thanks for elaborating them! Since 'introduced' is reset even when > the statistics is reset, maybe we can do without 'session_start' for > that purpose? There is a small loss of precision. The original reason was that if one wanted to know, given two samples of pg_stat_statements, when the query_id is going to remain stable for a given query. For example: If the session changes on account of a reset, then it is known that all query_ids one's external program is tracking are no longer going to be updated, and the program can take account for the fact that the same query text is going to have a new query_id. Given the new idea of mixing in the point release number: If the code is changed to instead mixing in the full version of Postgres, as you suggested recently, this can probably be removed. The caveat there is then the client is going to have to do something a bit weird like ask for the point release and perhaps even compile options of Postgres to know when the query_id is going to have a different value for a given query text. But, maybe this is an acceptable compromise to remove one field. >>> +/* >>> + * The role calling this function is unable to see >>> + * sensitive aspects of this tuple. >>> + * >>> + * Nullify everything except the "insufficient privilege" >>> + * message for this entry >>> + */ >>> +memset(nulls, 1, sizeof nulls); >>> + >>> +nulls[i] = 0; >>> +values[i] = CStringGetTextDatum(""); >>> >>> Why do we need to hide *all* the fields in pg_stat_statements, when >>> it's accessed by improper user? This is a big change of pg_stat_statements >>> behavior, and it might break the compatibility. >> >> It seems like an information leak that grows more major if query_id is >> exposed and, at any point, one can determine the query_id for a query >> text. > > So hiding only query and query_id is enough? Yeah, I think so. The other fields feel a bit weird to leave hanging around as well, so I thought I'd just "fix" it in one shot, but doing the minimum or only applying this idea to new fields is safer. It's shorter to hit all the fields, though, which is why I was tempted to do that. Perhaps not a good economy for potential subtle breaks in the next version. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian wrote: > How do we handle the Python dependency, or is this all to be done in > some other language? I certainly am not ready to take on that job. I should think it possible to reimplement it in C. It was considerably useful to start by implementing in Python, as that evades various sorts of efforts needed in C (e.g. - memory allocation, picking a hash table implementation), and allows someone to hack on it without needing to run through a recompile every time something is touched. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?" -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
* Robert Haas (robertmh...@gmail.com) wrote: > On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan wrote: > > On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas wrote: > >> (2) Default to using System V shared memory. If people want POSIX > >> shared memory, let them change the default. > > > >> After some consideration, I think my vote is for option #2. > > > > Wouldn't that become the call of packagers? > > Packagers can certainly override whatever we do, but we still need to > make the buildfarm green again. While I agree that making the buildfarm green is valuable, I really wonder about a system where /dev/shm is busted. Personally, I like Andrew's suggestion to test and set accordingly, with the default being POSIX if it's available and a fall-back to SysV (maybe with a warning..). Going back to the situation where our default set-up limits us to the ridiculously small SysV value would *really* suck; even if we don't have any users today, we're certainly going to have some soon and I don't think they'll be happy with a 24MB (or whatever) limit. Thanks, Stephen
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On 10/10/2013 02:35 PM, Robert Haas wrote:
> On Thu, Oct 10, 2013 at 2:21 PM, Andrew Dunstan wrote:
>>> Other votes? Other ideas?
>> 5) test and set it in initdb.
> Are you advocating for that option, or just calling out that it's possible? I'd say that's closely related to option #3, except at initdb time rather than run-time - and it might be preferable to #3 for some of the same reasons discussed on the thread about tuning work_mem, namely, that having it change from one postmaster lifetime to the next might lead to user astonishment.

Mainly just to throw it into the mix. But like you I think it's probably a better option than #3 for the reason you give. It also has the advantage of keeping any probing code out of the backend. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 8:46 PM, Robert Haas wrote: > On Thu, Oct 10, 2013 at 2:45 PM, Josh Berkus wrote: >> On 10/10/2013 11:41 AM, Robert Haas wrote: >>> tunedb --available-memory=32GB >>> >>> ...and it will print out a set of proposed configuration settings. If >>> we want a mode that rewrites the configuration file, we could have: >>> >>> tunedb --available-memory=32GB --rewrite-config-file=$PATH >>> >>> ...but that might be overkill, at least for version 1. >> >> Given that we are talking currently about ALTER SYSTEM SET *and* >> configuration directories, we should not be rewriting any existing >> config file. We should be adding an auto-generated one, or using ALTER >> SYSTEM SET. >> >> In fact, why don't we just do this though ALTER SYSTEM SET? add a >> plpgsql function called pg_tune(). > > That's another way to do it, for sure. It does require the ability to > log in to the database. I imagine that could be less convenient in > some scripting environments. I think that would also make it much harder to automate for packagers. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On Wed, 2013-09-04 at 14:35 +, Robert Haas wrote: > > On Fri, Aug 30, 2013 at 3:43 PM, Tom Lane wrote: > > I think it's entirely sensible to question whether we should reject > (not > > "hold up") RLS if it has major covert-channel problems. > > We've already had this argument before, about the security_barrier [ . . . ] Sorry for following up on this so late, I have just been trying to catch up with the mailing lists. I am the developer of Veil, which this thread mentioned a number of times. I wanted to state/confirm a number of things: Veil is not up to date wrt Postgres versions. I didn't release a new version for 9.2, and when no-one complained I figured no-one other than me was using it. I'll happily update it if anyone wants it. Veil makes no attempt to avoid covert channels. It can't. Veil is a low-level toolset designed for optimising queries about privileges. It allows you to build RLS with reasonable performance, but it is not in itself a solution for RLS. I wish the Postgres RLS project well and look forward to its release in Postgres 9.4. __ Marc
Re: [HACKERS] strange behavior of pg_trgm's similarity function
On Thu, Oct 10, 2013 at 11:00 PM, Merlin Moncure wrote: > On Thu, Oct 10, 2013 at 7:12 AM, Heikki Linnakangas > wrote: >> On 10.10.2013 15:03, Fujii Masao wrote: >>> >>> Hi, >>> >>> The behavior of pg_trgm's similarity function seems strange. Is this >>> intentional? >>> >>> I was thinking that the following three calls of the similarity function >>> return >>> the same number because the second argument is just the three characters >>> contained in the first argument in every calls. >>> >>> =# SELECT similarity('12345', '123'); >>> =# SELECT similarity('12345', '234'); >>> =# SELECT similarity('12345', '345'); >>> >>> But that's not true. Each returns the different number. >>> >>> =# SELECT similarity('12345', '123'); >>> similarity >>> >>> 0.428571 >>> (1 row) >>> >>> =# SELECT similarity('12345', '234'); >>> similarity >>> >>> 0.11 >>> (1 row) >>> >>> =# SELECT similarity('12345', '345'); >>> similarity >>> >>> 0.25 >>> (1 row) >>> >>> This happens because, for example, similarity('12345', '123') returns >>> the similarity number of '**12345*' and '**123*' (* means the blank >>> character), >>> NOT '12345' and '123'. IOW, two and one blank characters are added into >>> the heading and tailing of each argument, respectively. I wonder why >>> pg_trgm's similarity function works in this way. We should change this >>> so that no blank characters are added into the arguments? >> >> >> Well, you could also argue that "11" and "22" are quite similar, >> even though pg_trgm's similarity will not think so. It comes down to the >> definition of similarity, and how well that definition matches your >> intuition. >> >> FWIW, it feels right to me that a match in the beginning of a word is worth >> more than one in the middle of a string. -1 on changing that. Okay, understood. > I'm not so sure that the assumption that leading trigrams should > effectively weight > 3x is a good one to build into the library. > However, the behavior is clearly documented and can't be changed. I > think you'd need to improvise an alternate set of "trigram ops" if you > wanted to rig an alternate matching behavior. Yeah, this makes sense. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
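For anyone puzzling over the original numbers: with pg_trgm's padding (two leading blanks, one trailing), '12345' yields six trigrams and each three-character needle yields four, and similarity is the count of shared trigrams over the size of the union. A small sketch that reproduces all three results (my reading of the documented padding; show_trgm() will confirm the actual sets):

#include <stdio.h>

/* pg_trgm-style similarity: shared trigrams over the distinct union */
static double
sim(int shared, int na, int nb)
{
	return (double) shared / (na + nb - shared);
}

int
main(void)
{
	/* '12345' pads to 6 trigrams; '123', '234', '345' pad to 4 each */
	printf("'123': %f\n", sim(3, 6, 4));	/* shares "  1", " 12", "123" -> 0.428571 */
	printf("'234': %f\n", sim(1, 6, 4));	/* shares only "234"          -> 0.111111 */
	printf("'345': %f\n", sim(2, 6, 4));	/* shares "345", "45 "        -> 0.250000 */
	return 0;
}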
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 10/10/13 11:45 AM, Bruce Momjian wrote: > I think the big win for a tool would be to query the user about how they > are going to be using Postgres, and that can then spit out values the > user can add to postgresql.conf, or to a config file that is included at > the end of postgresql.conf. I think such a tool would actually make the initial experience worse for many people. Quick, how are you going to use your PostgreSQL server: - OLTP - web - mixed Uh, all of the above? This sort of thing can quickly turn the "first 15 minutes" into the first 2 hours, as users are forced to analyze the different settings, have second thoughts, wonder about how to change them back, etc. The fewer decisions people have to make initially, the better. The initdb phase already has too many required decisions that can cause regret later (e.g., locale, encoding, checksums). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Wed, Oct 9, 2013 at 10:21 PM, Magnus Hagander wrote: >> Well, the Postgres defaults won't really change, because the default >> vacuum_work_mem will be -1, which will have vacuum defer to >> maintenance_work_mem. Under this scheme, vacuum only *prefers* to get >> bound working memory size from vacuum_work_mem. If you don't like >> vacuum_work_mem, you can just ignore it. > While unrelated to the main topic of this thread, I think this is very > important as well. I often have to advice people to remember to cap > their maintenance_work_mem because of autovacuum, and to remember to > re-tune maintenance_wokr_mem when they change the number of autovacuum > workers. I'll code that up at some point, then. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
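The deferral rule Peter describes is tiny in code terms; a sketch (not his eventual patch) of the lookup vacuum would perform:

/*
 * vacuum_work_mem = -1 means "defer to maintenance_work_mem"; any other
 * value overrides it for vacuum only.  Sketch, not the actual patch.
 */
static int
vacuum_memory_limit(int vacuum_work_mem, int maintenance_work_mem)
{
	return (vacuum_work_mem == -1) ? maintenance_work_mem : vacuum_work_mem;
}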
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan wrote: > On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas wrote: >> (2) Default to using System V shared memory. If people want POSIX >> shared memory, let them change the default. > >> After some consideration, I think my vote is for option #2. > > Wouldn't that become the call of packagers? Packagers can certainly override whatever we do, but we still need to make the buildfarm green again. > Wasn't there already some > reason why it was advantageous for FreeBSD to continue to use System V > shared memory? Yes, but this code doesn't affect the main shared memory segment, so I think that's sort of a separate point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 10/10/2013 11:41 AM, Robert Haas wrote: > tunedb --available-memory=32GB > > ...and it will print out a set of proposed configuration settings. If > we want a mode that rewrites the configuration file, we could have: > > tunedb --available-memory=32GB --rewrite-config-file=$PATH > > ...but that might be overkill, at least for version 1. Given that we are talking currently about ALTER SYSTEM SET *and* configuration directories, we should not be rewriting any existing config file. We should be adding an auto-generated one, or using ALTER SYSTEM SET. In fact, why don't we just do this though ALTER SYSTEM SET? add a plpgsql function called pg_tune(). -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 8:41 PM, Robert Haas wrote: > On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian wrote: >> On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote: >>> * Bruce Momjian (br...@momjian.us) wrote: >>> > Well, I like the idea of initdb calling the tool, though the tool then >>> > would need to be in C probably as we can't require python for initdb. >>> > The tool would not address Robert's issue of someone increasing >>> > shared_buffers on their own. >>> >>> I'm really not impressed with this argument. Either the user is going >>> to go and modify the config file, in which case I would hope that they'd >>> at least glance around at what they should change, or they're going to >>> move off PG because it's not performing well enough for them- which is >>> really what I'm trying to avoid happening during the first 15m. >> >> Well, they aren't going around and looking at other parameters now or we >> would not feel a need to auto-tune many of our defaults. >> >> How do we handle the Python dependency, or is this all to be done in >> some other language? I certainly am not ready to take on that job. > > I don't see why it can't be done in C. The server is written in C, > and so is initdb. So no matter where we do this, it's gonna be in C. > Where does Python enter into it? > > What I might propose is that we have add a new binary tunedb, maybe > compiled out of the src/bin/initdb.c directory. So you can say: > > initdb --available-memory=32GB > > ...and it will initialize the cluster with appropriate settings. Or > you can say: > > tunedb --available-memory=32GB > > ...and it will print out a set of proposed configuration settings. If > we want a mode that rewrites the configuration file, we could have: > > tunedb --available-memory=32GB --rewrite-config-file=$PATH > > ...but that might be overkill, at least for version 1. I like this. And I agree that the edit-in-place might be overkill. But then, if/when we get the ability to programatically modify the config files, that's probably not a very complicated thing to add once the rest is done. >> One nice thing about a tool is that you can see your auto-tuned defaults >> right away, while doing this in the backend, you have to start the >> server to see the defaults. I am not even sure how I could allow users >> to preview their defaults for different available_mem settings. > > Yep, agreed. And agreed that not being able to preview settings is a problem. I'd even say it would be a *big* problem. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Oct 11, 2013 at 2:49 AM, Alvaro Herrera wrote: > Daniel Farina escribió: >> On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao wrote: > >> > In my test, I found that pg_stat_statements.total_time always indicates a >> > zero. >> > I guess that the patch might handle pg_stat_statements.total_time wrongly. >> > >> > +values[i++] = DatumGetTimestamp( >> > +instr_get_timestamptz(pgss->session_start)); >> > +values[i++] = DatumGetTimestamp( >> > +instr_get_timestamptz(entry->introduced)); >> > >> > These should be executed only when detected_version >= PGSS_TUP_LATEST? >> >> Yes. Oversight. > > Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I was just thinking the same thing. Agreed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 2:45 PM, Josh Berkus wrote: > On 10/10/2013 11:41 AM, Robert Haas wrote: >> tunedb --available-memory=32GB >> >> ...and it will print out a set of proposed configuration settings. If >> we want a mode that rewrites the configuration file, we could have: >> >> tunedb --available-memory=32GB --rewrite-config-file=$PATH >> >> ...but that might be overkill, at least for version 1. > > Given that we are talking currently about ALTER SYSTEM SET *and* > configuration directories, we should not be rewriting any existing > config file. We should be adding an auto-generated one, or using ALTER > SYSTEM SET. > > In fact, why don't we just do this through ALTER SYSTEM SET? Add a > plpgsql function called pg_tune(). That's another way to do it, for sure. It does require the ability to log in to the database. I imagine that could be less convenient in some scripting environments. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 10/10/13 11:31 AM, Bruce Momjian wrote: > Let me walk through the idea of adding an available_mem setting, that > Josh suggested, and which I think addresses Robert's concern about > larger shared_buffers and Windows servers. I think this is a promising idea. available_mem could even be set automatically by packages. And power users could just set available_mem = -1 to turn off all the magic. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 11:43 AM, Robert Haas wrote: > On Thu, Oct 10, 2013 at 1:37 PM, Josh Berkus wrote: >> So, the question is: can we reasonably determine, at initdb time, how >> much RAM the system has? > > As long as you are willing to write platform-dependent code, yes. That's why trying to give the responsibility to a packager is compelling. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 11:41 AM, Robert Haas wrote: > I don't see why it can't be done in C. The server is written in C, > and so is initdb. So no matter where we do this, it's gonna be in C. > Where does Python enter into it? I mentioned that pgtune was written in Python, but as you say that's wholly incidental. An equivalent C program would only be slightly more verbose. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Fri, Oct 11, 2013 at 1:20 AM, Dimitri Fontaine wrote: > Hi, > > I did a partial review of this patch, wherein I focused on the patch and > the code itself, as I saw other contributors already did some testing on > it, so that we know it applies cleanly and works to some good extent. Thanks a lot! > In full_page_writes_str() why are you returning "unrecognized" rather > than doing an ELOG(ERROR, …) for this unexpected situation? It's because the similar functions 'wal_level_str' and 'dbState' also return 'unrecognized' in the unexpected situation. I just implemented full_page_writes_str() in the same manner. If we do an elog(ERROR) in that case, pg_xlogdump would fail to dump the 'broken' (i.e., unrecognized fpw is set) WAL file. I think that some users want to use pg_xlogdump to investigate the broken WAL file, so doing an elog(ERROR) seems not good to me. > The code switches to compression (or trying to) when the following > condition is met: > > + if (fpw <= FULL_PAGE_WRITES_COMPRESS) > + { > + rdt->data = CompressBackupBlock(page, BLCKSZ - > bkpb->hole_length, &(rdt->len)); > > We have > > + typedef enum FullPageWritesLevel > + { > + FULL_PAGE_WRITES_OFF = 0, > + FULL_PAGE_WRITES_COMPRESS, > + FULL_PAGE_WRITES_ON > + } FullPageWritesLevel; > > + #define FullPageWritesIsNeeded(fpw) (fpw >= FULL_PAGE_WRITES_COMPRESS) > > I don't much like using the <= test against an ENUM and I'm not sure I > understand the intention you have here. It somehow looks like a typo and > disagrees with the macro. I thought that FPW should be compressed only when full_page_writes is set to 'compress' or 'off'. That is, 'off' implies a compression. When it's set to 'off', FPW is basically not generated, so there is no need to call CompressBackupBlock() in that case. But only during online base backup, FPW is forcibly generated even when it's set to 'off'. So I used the check "fpw <= FULL_PAGE_WRITES_COMPRESS" there. > What about using the FullPageWritesIsNeeded > macro, and maybe rewriting the macro as > > #define FullPageWritesIsNeeded(fpw) \ > (fpw == FULL_PAGE_WRITES_COMPRESS || fpw == FULL_PAGE_WRITES_ON) I'm OK with changing the macro so that the <= test is not used. > Also, having "on" imply "compress" is a little funny to me. Maybe we > should just finish our testing and be happy to always compress the full > page writes. What would the downside be exactly (on a busy IO system, > writing less data even if needing more CPU will be the right trade-off)? "on" doesn't imply "compress". When full_page_writes is set to "on", FPW is not compressed at all. > I like that you're checking the savings of the compressed data with > respect to the uncompressed data and cancel the compression if there's > no gain. I wonder if your test accounts for enough padding and headers > though given the results we saw in other tests made in this thread. I'm afraid that the patch has only limited effects in WAL reduction and performance improvement unless the database contains highly-compressible data like large blank-padded character columns. It really depends on the contents of the database. So, obviously FPW compression should not be the default. Maybe we can treat it as just a tuning knob. > Why do we have both the static function full_page_writes_str() and the > macro FullPageWritesStr, with two different implementations issuing > either "true" and "false" or "on" and "off"? At first I was thinking of using "on" and "off" because they are often used as the setting value of a boolean GUC.
But unfortunately the existing pg_xlogdump uses "true" and "false" to show the value of full_page_writes in WAL. To avoid breaking backward compatibility, I implemented the "true/false" version of the function. I'm really not sure how many people want such compatibility in pg_xlogdump, though. > ! unsigned hole_offset:15, /* number of bytes before "hole" */ > ! flags:2, /* state of a backup block, see below */ > ! hole_length:15; /* number of bytes in "hole" */ > > I don't understand that. I wanted to use that patch as leverage to > smoothly discover the internals of our WAL system but won't have the > time to do that here. We need a flag indicating whether each FPW is compressed or not. If no such flag exists in WAL, the standby cannot determine whether it should decompress each FPW or not, and then cannot replay the WAL containing FPW properly. That is, I just used a 'space' in the header of FPW to hold such a flag. > That said, I don't even know that C syntax. The struct 'ItemIdData' uses the same C syntax. > + #define BKPBLOCK_UNCOMPRESSED 0 /* uncompressed */ > + #define BKPBLOCK_COMPRESSED 1 /* comperssed */ > > There's a typo in the comment above. Yep. >> [time required to replay WAL generated during running pgbench] >> 61s (on) 120
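For readers who, like Dimitri, haven't met that C syntax: the quoted declaration uses bit-fields, the same construct ItemIdData relies on. A minimal standalone sketch, with the field widths copied from the quoted patch but the struct name and flag value chosen here for illustration only:

    #include <stdio.h>

    /*
     * Illustrative only: pack a 15-bit hole offset, a 2-bit flag field
     * and a 15-bit hole length into one 32-bit word, as the quoted
     * patch does for backup blocks.
     */
    typedef struct BackupBlockBits
    {
        unsigned    hole_offset:15; /* number of bytes before "hole" */
        unsigned    flags:2;        /* e.g. compressed vs. uncompressed */
        unsigned    hole_length:15; /* number of bytes in "hole" */
    } BackupBlockBits;

    int
    main(void)
    {
        BackupBlockBits b = {.hole_offset = 100, .flags = 1, .hole_length = 4000};

        printf("sizeof(BackupBlockBits) = %zu\n", sizeof(b)); /* typically 4 */
        printf("flags = %u\n", b.flags);
        return 0;
    }

The point of the layout is that the 2-bit flags field fits into bits that would otherwise be padding, so marking each FPW as compressed costs no extra WAL space.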
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 1:37 PM, Josh Berkus wrote: > So, the question is: can we reasonably determine, at initdb time, how > much RAM the system has? As long as you are willing to write platform-dependent code, yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian wrote: > On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote: >> * Bruce Momjian (br...@momjian.us) wrote: >> > Well, I like the idea of initdb calling the tool, though the tool then >> > would need to be in C probably as we can't require python for initdb. >> > The tool would not address Robert's issue of someone increasing >> > shared_buffers on their own. >> >> I'm really not impressed with this argument. Either the user is going >> to go and modify the config file, in which case I would hope that they'd >> at least glance around at what they should change, or they're going to >> move off PG because it's not performing well enough for them- which is >> really what I'm trying to avoid happening during the first 15m. > > Well, they aren't going around and looking at other parameters now or we > would not feel a need to auto-tune many of our defaults. > > How do we handle the Python dependency, or is this all to be done in > some other language? I certainly am not ready to take on that job. I don't see why it can't be done in C. The server is written in C, and so is initdb. So no matter where we do this, it's gonna be in C. Where does Python enter into it? What I might propose is that we add a new binary tunedb, maybe compiled out of the src/bin/initdb.c directory. So you can say: initdb --available-memory=32GB ...and it will initialize the cluster with appropriate settings. Or you can say: tunedb --available-memory=32GB ...and it will print out a set of proposed configuration settings. If we want a mode that rewrites the configuration file, we could have: tunedb --available-memory=32GB --rewrite-config-file=$PATH ...but that might be overkill, at least for version 1. > One nice thing about a tool is that you can see your auto-tuned defaults > right away, while doing this in the backend, you have to start the > server to see the defaults. I am not even sure how I could allow users > to preview their defaults for different available_mem settings. Yep, agreed. And agreed that not being able to preview settings is a problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas wrote: > (2) Default to using System V shared memory. If people want POSIX > shared memory, let them change the default. > After some consideration, I think my vote is for option #2. Wouldn't that become the call of packagers? Wasn't there already some reason why it was advantageous for FreeBSD to continue to use System V shared memory? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On Thu, Oct 10, 2013 at 2:21 PM, Andrew Dunstan wrote: >> Other votes? Other ideas? > > 5) test and set it in initdb. Are you advocating for that option, or just calling out that it's possible? I'd say that's closely related to option #3, except at initdb time rather than run-time - and it might be preferable to #3 for some of the same reasons discussed on the thread about tuning work_mem, namely, that having it change from one postmaster lifetime to the next might lead to user astonishment. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
> It also doesn't address my point that, if we are worst-case-scenario > default-setting, we're going to end up with defaults which aren't > materially different from the current defaults. In which case, why even > bother with this whole exercise? Oh, and let me reiterate: the way to optimize work_mem is through an admission control mechanism. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On 10/10/2013 12:13 PM, Robert Haas wrote:
> Since, as has been previously discussed in this forum on multiple
> occasions [citation needed], the default System V shared memory limits
> are absurdly low on many systems, the dynamic shared memory patch
> defaults to POSIX shared memory, which has often been touted as a
> superior alternative [citation needed].
>
> Unfortunately, the buildfarm isn't entirely happy with this decision.
> On buildfarm member anole (HP-UX B.11.31), allocation of dynamic shared
> memory fails with a "Permission denied" error, and on smew (Debian
> GNU/Linux 6.0), it fails with "Function not implemented", which
> according to a forum post[1] I found probably indicates that /dev/shm
> doesn't mount a tmpfs on that box.
>
> What shall we do about this? I see a few options.
>
> (1) Define the issue as "not our problem". IOW, as of now, if you want
> to use PostgreSQL, you've got to either make POSIX shared memory work
> on your machine, or change the GUC that selects the type of dynamic
> shared memory used.
>
> (2) Default to using System V shared memory. If people want POSIX
> shared memory, let them change the default.
>
> (3) Add a new setting that auto-probes for a type of shared memory that
> works. Try POSIX first, then System V. Maybe even fall back to mmap'd
> files if neither of those works.
>
> (4) Remove the option to use POSIX shared memory. System V FTW!
>
> After some consideration, I think my vote is for option #2. Option #1
> seems too user-hostile, especially for a facility that has no in-core
> users yet, though I can imagine we might want to go that way
> eventually, especially if we at some point try to dump System V shared
> memory altogether, as has been proposed. Option #4 seems sad; we know
> that System V shared memory limits are a problem for plenty of people,
> so it'd be a shame not to have a way to use the POSIX facilities if
> they're available. Option #3 is fine as far as it goes, but it adds a
> significant amount of complexity I'd rather not deal with.
>
> Other votes? Other ideas?

5) test and set it in initdb.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
Bruce, >> That's way low, and frankly it's not worth bothering with this if all >> we're going to get is an incremental increase. In that case, let's just >> set the default to 4MB like Robert suggested. > > Uh, well, 100 backends at 6MB gives us 600MB, and if each backend uses > 3x work_mem, that gives us 1.8GB for total work_mem. This was based on > Andrew's concerns about possible over-commit of work_mem. I can of > course adjust that. That's worst-case-scenario planning -- the 3X work-mem per backend was: a) Solaris and b) data warehousing In a normal OLTP application each backend averages something like 0.25 * work_mem, since many queries use no work_mem at all. It also doesn't address my point that, if we are worst-case-scenario default-setting, we're going to end up with defaults which aren't materially different from the current defaults. In which case, why even bother with this whole exercise? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions
On Thu, Oct 10, 2013 at 1:13 PM, Robert Haas wrote: > (1) Define the issue as "not our problem". IOW, as of now, if you > want to use PostgreSQL, you've got to either make POSIX shared memory > work on your machine, or change the GUC that selects the type of > dynamic shared memory used. > > (2) Default to using System V shared memory. If people want POSIX > shared memory, let them change the default. > > (3) Add a new setting that auto-probes for a type of shared memory > that works. Try POSIX first, then System V. Maybe even fall back to > mmap'd files if neither of those works. > > (4) Remove the option to use POSIX shared memory. System V FTW! > > After some consideration, I think my vote is for option #2. Option #1 > seems too user-hostile, especially for a facility that has no in-core > users yet, though I can imagine we might want to go that way > eventually, especially if we at some point try to dump System V shared > memory altogether, as has been proposed. Option #4 seems sad; we know > that System V shared memory limits are a problem for plenty of people, > so it'd be a shame not to have a way to use the POSIX facilities if > they're available. Option #3 is fine as far as it goes, but it adds a > significant amount of complexity I'd rather not deal with. > > Other votes? Other ideas? I believe option 2 is not only good for now, but also a necessary step on the way to option 3, which I believe should be the goal. So, ship a version with option 2, then implement 3, and make it the default when enough people (using option 2) have successfully tested pg's implementation of POSIX shared memory. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
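For a sense of what option #3's auto-probe would involve at the OS level, here is a rough sketch: try POSIX shm_open() first, then fall back to System V shmget(). The probe name and return strings are placeholders, and some platforms need -lrt at link time:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/mman.h>
    #include <sys/shm.h>
    #include <unistd.h>

    /* Probe POSIX shm first, then System V, else fall back to mmap. */
    static const char *
    probe_dynamic_shmem_type(void)
    {
        int         fd;
        int         shmid;

        fd = shm_open("/pg_dsm_probe", O_CREAT | O_RDWR, 0600);
        if (fd >= 0)
        {
            close(fd);
            shm_unlink("/pg_dsm_probe");
            return "posix";
        }

        shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (shmid >= 0)
        {
            shmctl(shmid, IPC_RMID, NULL);
            return "sysv";
        }

        return "mmap";
    }

    int
    main(void)
    {
        printf("dynamic_shared_memory_type = %s\n", probe_dynamic_shmem_type());
        return 0;
    }

On smew, where /dev/shm is not a tmpfs, the shm_open() call would fail and the probe would quietly land on System V, which is the behavior option #3 is after.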
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Wed, Oct 9, 2013 at 8:06 AM, Andrew Dunstan wrote: > > On 10/09/2013 10:45 AM, Bruce Momjian wrote: > >> On Wed, Oct 9, 2013 at 04:40:38PM +0200, Pavel Stehule wrote: >> >>> Effectively, if every session uses one full work_mem, you end up >>> with >>> total work_mem usage equal to shared_buffers. >>> >>> We can try a different algorithm to scale up work_mem, but it seems >>> wise >>> to auto-scale it up to some extent based on shared_buffers. >>> >>> >>> In my experience a optimal value of work_mem depends on data and load, >>> so I >>> prefer a work_mem as independent parameter. >>> >> But it still is an independent parameter. I am just changing the default. >> >> > The danger with work_mem especially is that setting it too high can lead > to crashing postgres or your system at some stage down the track, so > autotuning it is kinda dangerous, much more dangerous than autotuning > shared buffers. > Is this common to see? I ask because in my experience, having 100 connections all decide to do large sorts simultaneously is going to make the server fall over, regardless of whether it tries to do them in memory (OOM) or whether it does them with tape sorts (stuck spin locks, usually). > > The assumption that each connection won't use lots of work_mem is also > false, I think, especially in these days of connection poolers. > I don't follow that. Why would using a connection pooler change the multiples of work_mem that a connection would use? Cheers, Jeff
[HACKERS] Re: dynamic shared memory: wherein I am punished for good intentions
Robert Haas wrote: > Unfortunately, the buildfarm > isn't entirely happy with this decision. On buildfarm member anole > (HP-UX B.11.31), allocation of dynamic shared memory fails with a > "Permission denied" error, and on smew (Debian GNU/Linux 6.0), it > fails with "Function not implemented", which according to a forum > post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs > on that box. > > What shall we do about this? I see a few options. Is this something that rightly falls into being a distro/package-specific setting? If so then the first goal should be to ensure the maximum number of successful basic installation scenarios - namely someone installing PostgreSQL and connecting to the running postgres database without encountering an error. As a default I would presume the current System V behavior is sufficient to accomplish this goal. If package maintainers can then guarantee that changing the default will improve the user experience they should be supported and encouraged to do so but if they are at all unsure they should leave the default in place. As long as a new user is able to get a running database on their machine if/when they run up against the low defaults of System V memory they will at least be able to focus on that single problem as opposed to having a failed initial install and being unsure exactly what they may have done wrong. Thus option #2 seems sufficient. I do think that having some kind of shared-memory-manager utility could have value but I'd rather see that be a standalone utility as opposed to something magical done inside the bowels of the database. While probably harder to code and learn such a utility would provide for a much greater UX if implemented well. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/dynamic-shared-memory-wherein-I-am-punished-for-good-intentions-tp5774055p5774080.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
Bruce, * Bruce Momjian (br...@momjian.us) wrote: > On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote: > > I'm really not impressed with this argument. Either the user is going > > to go and modify the config file, in which case I would hope that they'd > > at least glance around at what they should change, or they're going to > > move off PG because it's not performing well enough for them- which is > > really what I'm trying to avoid happening during the first 15m. > > Well, they aren't going around and looking at other parameters now or we > would not feel a need to auto-tune many of our defaults. I think you're confusing things here. There's a huge difference between "didn't configure anything and got our defaults" and "went and changed only one thing in postgresql.conf". For one thing, we have a ton of the former. Perhaps there are some of the latter as well, but I would argue it's a pretty small group. > How do we handle the Python dependency, or is this all to be done in > some other language? I certainly am not ready to take on that job. I agree that we can't add a Python (or really, perl) dependency, but I don't think there's anything terribly complicated in what pgtune is doing that couldn't be pretty easily done in C. > One nice thing about a tool is that you can see your auto-tuned defaults > right away, while doing this in the backend, you have to start the > server to see the defaults. I am not even sure how I could allow users > to preview their defaults for different available_mem settings. Agreed. > > Actually, it *is* good, as Magnus pointed out. Changing a completely > > unrelated parameter shouldn't make all of your plans suddenly change. > > This is mollified, but only a bit, if you have a GUC that's explicitly > > "this changes other GUCs", but I'd much rather have a tool that can do a > > better job to begin with and which helps the user understand what > > parameters are available to change and why there's more than one. > > Well, the big question is how many users are going to use the tool, as > we are not setting this up for experts, but for novices. The goal would be to have the distros and/or initdb use it for the initial configuration. Perhaps by using debconf or similar to ask the user, perhaps by just running it and letting it do whatever it wants. > I think one big risk is someone changing shared_buffers and not having > an accurate available_memory. That might lead to some very inaccurate > defaults. Also, what happens if available_memory is not supplied at > all? Do we auto-tune just from shared_buffers, or not autotune at all, > and then what are the defaults? We could certainly throw an error if > shared_buffers > available_memory. We might just ship with > available_memory defaulting to 256MB and auto-tune everything from > there. These questions are less of an issue if we simply don't have this "available_memory" GUC (which strikes me as just adding more confusion for users anyway, not less). If no '--available-memory' (or whatever) option is passed to initdb then we should have it assume some default, yes, but my view on what that is depends on what the specific results are. It sounds like --avail-memory=256MB would end up setting things to about what we have now for defaults, which is alright for shared_buffers, imv, but not for a default work_mem (1MB is *really* small...). Thanks, Stephen
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Daniel Farina escribió: > On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao wrote: > > In my test, I found that pg_stat_statements.total_time always indicates a > > zero. > > I guess that the patch might handle pg_stat_statements.total_time wrongly. > > > > +values[i++] = DatumGetTimestamp( > > +instr_get_timestamptz(pgss->session_start)); > > +values[i++] = DatumGetTimestamp( > > +instr_get_timestamptz(entry->introduced)); > > > > These should be executed only when detected_version >= PGSS_TUP_LATEST? > > Yes. Oversight. Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I mean, if later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that becomes latest, somebody running the current definition with the updated .so will no longer get these values. Or is the intention that PGSS_TUP_LATEST will never be updated again, and future versions will get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on? I mean, surely we can always come up with new symbols to use, but it seems hard to follow. There's one other use of PGSS_TUP_LATEST here which I think should also be changed to >= SOME_SPECIFIC_VERSION: + if (detected_version >= PGSS_TUP_LATEST) + { + uint64 qid = pgss->private_stat_session_key; + + qid ^= (uint64) entry->query_id; + qid ^= ((uint64) entry->query_id) << 32; + + values[i++] = Int64GetDatumFast(qid); + } This paragraph reads a bit strange to me: + A statistics session is the time period when statistics are gathered by statistics collector + without being reset. So a statistics session continues across normal shutdowns, + but whenever statistics are reset, like during a crash or upgrade, a new time period + of statistics collection commences i.e. a new statistics session. + The query_id value generation is linked to statistics session to emphasize the fact + that whenever statistics are reset,the query_id for the same queries will also change. "time period when"? Shouldn't that be "time period during which"? Also, doesn't a new "statistics session" start when a stats reset is invoked by the user? The bit after "commences" appears correct (to me, not a native by any means) but seems also a bit strange. I just noticed that the table describing the output indicates that session_start and introduced are of type text, but the SQL defines timestamptz. The instr_time thingy being used for these times maps to QueryPerformanceCounter() on Windows, and I'm not sure how useful this is for long-term time tracking; see http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163 for instance. I think it'd be better to use TimestampTz and GetCurrentTimestamp() for this. Just noticed that you changed the timer to struct Instrumentation. Not really sure about that change. Since you seem to be using only the start time and counter, wouldn't it be better to store only those? Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc(). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
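Alvaro's versioning point, in sketch form: gate columns on the specific version that introduced them rather than on a moving PGSS_TUP_LATEST alias. A minimal standalone illustration — the enum members follow the discussion above, but the surrounding code is assumed, not the patch's actual source:

    #include <stdio.h>

    typedef enum pgssTupVersion
    {
        PGSS_TUP_V1_0 = 1,
        PGSS_TUP_V1_1,
        PGSS_TUP_V1_2,
        PGSS_TUP_LATEST = PGSS_TUP_V1_2     /* alias for the newest version */
    } pgssTupVersion;

    static int
    shows_v1_2_columns(pgssTupVersion detected_version)
    {
        /*
         * Test against the version that introduced the columns.  If this
         * said ">= PGSS_TUP_LATEST" instead, then once a future
         * PGSS_TUP_V1_3 moved the alias, backends still running the
         * 1.2-era SQL definition would silently stop seeing these values.
         */
        return detected_version >= PGSS_TUP_V1_2;
    }

    int
    main(void)
    {
        printf("v1.1 sees 1.2 columns: %d\n", shows_v1_2_columns(PGSS_TUP_V1_1));
        printf("v1.2 sees 1.2 columns: %d\n", shows_v1_2_columns(PGSS_TUP_V1_2));
        return 0;
    }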
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
All, We can't reasonably require user input at initdb time, because most users don't run initdb by hand -- their installer does it for them. So any "tuning" which initdb does needs to be fully automated. So, the question is: can we reasonably determine, at initdb time, how much RAM the system has? I also think this is where the much-debated ALTER SYSTEM SET suddenly becomes valuable. With it, it's reasonable to run a "tune-up" tool on the client side. I do think it's reasonable to tell a user: "Just installed PostgreSQL? Run this command to tune your system:" Mind you, the tuneup tool I'm working on makes use of Python, the configuration directory, and Jinja2, so it's not even relevant to the preceding. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Wed, Oct 9, 2013 at 1:35 PM, Haribabu kommi wrote:
> On 08 October 2013 18:42 KONDO Mitsumasa wrote:
>> (2013/10/08 20:13), Haribabu kommi wrote:
>>> I will test with sync_commit=on mode and provide the test results.
>> OK. Thanks!
>
> Pgbench test results with synchronous_commit mode as on.

Thanks!

>                      Thread-1                   Threads-2
>                      Head code    FPW compress  Head code    FPW compress
> Pgbench-org  5min    138(0.24GB)  131(0.04GB)   160(0.28GB)  163(0.05GB)
> Pgbench-1000 5min    140(0.29GB)  128(0.03GB)   160(0.33GB)  162(0.02GB)
> Pgbench-org  15min   141(0.59GB)  136(0.12GB)   160(0.65GB)  162(0.14GB)
> Pgbench-1000 15min   138(0.81GB)  134(0.11GB)   159(0.92GB)  162(0.18GB)
>
> Pgbench-org - original pgbench
> Pgbench-1000 - changed pgbench with a record size of 1000.

This means that you changed the data type of pgbench_accounts.filler to char(1000)?

Regards,

-- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
On Tue, Oct 8, 2013 at 10:07 PM, KONDO Mitsumasa wrote:
> Hi,
>
> I tested dbt-2 benchmark in single instance and synchronous replication.

Thanks!

> Unfortunately, my benchmark results did not show many differences...
>
> * Test server
>   Server: HP Proliant DL360 G7
>   CPU: Xeon E5640 2.66GHz (1P/4C)
>   Memory: 18GB(PC3-10600R-9)
>   Disk: 146GB(15k)*4 RAID1+0
>   RAID controller: P410i/256MB
>
> * Result
> ** Single instance **
>            | NOTPM   | 90%tile   | Average | S.Deviation
> -----------+---------+-----------+---------+------------
> no-patched | 3322.93 | 20.469071 | 5.882   | 10.478
> patched    | 3315.42 | 19.086105 | 5.669   | 9.108
>
> ** Synchronous Replication **
>            | NOTPM   | 90%tile   | Average | S.Deviation
> -----------+---------+-----------+---------+------------
> no-patched | 3275.55 | 21.332866 | 6.072   | 9.882
> patched    | 3318.82 | 18.141807 | 5.757   | 9.829
>
> ** Detail of result
> http://pgstatsinfo.projects.pgfoundry.org/DBT-2_Fujii_patch/
>
> I set full_page_writes = compress with Fujii's patch in DBT-2. But it does
> not seem to have much effect in eliminating WAL files.

Could you let me know how many WAL records were generated during each benchmark? I think that this benchmark result clearly means that the patch has only limited effects in the reduction of WAL volume and the performance improvement unless the database contains highly-compressible data like pgbench_accounts.filler. But if we can use another compression algorithm, maybe we can reduce WAL volume very much. I'm not sure what algorithm is good for WAL compression, though. It might be better to introduce a hook for compression of FPW so that users can freely use their own compression module, rather than just using pglz_compress(). Thought?

Regards,

-- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
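Such a hook could follow the usual PostgreSQL hook convention. A minimal sketch of the shape it might take — every name here is hypothetical, nothing like it exists in the tree, and the fallback merely copies bytes where real code would call pglz_compress():

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical hook signature: compress srclen bytes into dst,
     * returning the compressed length, or -1 to store uncompressed. */
    typedef int (*fpw_compress_hook_type) (const char *src, int srclen,
                                           char *dst, int dstlen);

    static fpw_compress_hook_type fpw_compress_hook = NULL;

    /* Stand-in for pglz_compress(): here it just copies the bytes. */
    static int
    default_fpw_compress(const char *src, int srclen, char *dst, int dstlen)
    {
        if (srclen > dstlen)
            return -1;
        memcpy(dst, src, srclen);
        return srclen;
    }

    static int
    compress_fpw(const char *src, int srclen, char *dst, int dstlen)
    {
        fpw_compress_hook_type fn =
            fpw_compress_hook ? fpw_compress_hook : default_fpw_compress;

        return fn(src, srclen, dst, dstlen);
    }

    int
    main(void)
    {
        static char page[8192];     /* pretend full-page image */
        static char out[8192];

        printf("stored %d bytes\n",
               compress_fpw(page, sizeof(page), out, sizeof(out)));
        return 0;
    }

An extension module could then set fpw_compress_hook in its _PG_init() to plug in, say, an LZ4 or snappy implementation without the core code having to choose an algorithm.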
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 10:20:02AM -0700, Josh Berkus wrote: > On 10/09/2013 02:15 PM, Bruce Momjian wrote: > > and for shared_buffers of 2GB: > > > > test=> show shared_buffers; > > shared_buffers > > > > 2GB > > (1 row) > > > > test=> SHOW work_mem; > > work_mem > > -- > > 6010kB > > (1 row) > > Huh? Only 6MB work_mem for 8GB RAM? How'd you get that? > That's way low, and frankly it's not worth bothering with this if all > we're going to get is an incremental increase. In that case, let's just > set the default to 4MB like Robert suggested. Uh, well, 100 backends at 6MB gives us 600MB, and if each backend uses 3x work_mem, that gives us 1.8GB for total work_mem. This was based on Andrew's concerns about possible over-commit of work_mem. I can of course adjust that. For 8GB of shared memory, that works out to 21MB. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
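To make the arithmetic concrete, a small sketch of that budget-style derivation; the divisor is a back-of-the-envelope assumption matching the numbers quoted in this thread, not the patch's actual formula (which evidently yields 6010kB rather than the ~6990kB below):

    #include <stdio.h>

    int
    main(void)
    {
        long    shared_buffers_kb = 2L * 1024 * 1024;   /* 2GB */
        int     max_connections = 100;
        int     concurrent_factor = 3;  /* assume ~3x work_mem per backend */

        /* keep total concurrent work_mem near shared_buffers */
        long    work_mem_kb =
            shared_buffers_kb / (max_connections * concurrent_factor);

        printf("work_mem = %ldkB\n", work_mem_kb);      /* ~6990kB */
        return 0;
    }

This also makes Josh's objection visible: the result is dominated by max_connections, so a developer laptop that never reduces max_connections from 100 gets the same tiny work_mem as a loaded production server.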
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 10/09/2013 02:15 PM, Bruce Momjian wrote: > and for shared_buffers of 2GB: > > test=> show shared_buffers; >shared_buffers > >2GB > (1 row) > > test=> SHOW work_mem; >work_mem > -- >6010kB > (1 row) Huh? Only 6MB work_mem for 8GB RAM? How'd you get that? That's way low, and frankly it's not worth bothering with this if all we're going to get is an incremental increase. In that case, let's just set the default to 4MB like Robert suggested. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
> Because 'maintenance' operations were rarer, so we figured we could use > more memory in those cases. Once we brought Autovacuum into core, though, we should have changed that. However, I agree with Magnus that the simple course is to have an autovacuum_worker_memory setting which overrides maint_work_mem if set. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 12:59:39PM -0400, Andrew Dunstan wrote: > > On 10/10/2013 12:45 PM, Bruce Momjian wrote: > >On Thu, Oct 10, 2013 at 12:39:04PM -0400, Andrew Dunstan wrote: > >>On 10/10/2013 12:28 PM, Bruce Momjian wrote: > >>>How do we handle the Python dependency, or is this all to be done in > >>>some other language? I certainly am not ready to take on that job. > >> > >>Without considering any wider question here, let me just note this: > >> > >>Anything that can be done in this area in Python should be doable in > >>Perl fairly simply. I don't think we should be adding any Python > >>dependencies. For good or ill Perl has been used for pretty much all > >>our complex scripting (pgindent, MSVC build system etc.) > >Yes, but this is a run-time requirement, not build-time, and we have not > >used Perl in that regard. > > > Nor Python. If we want to avoid added dependencies, we would need to use C. Yeah. :-( My crazy idea would be, because setting available_mem without a restart would not be supported, to allow the backend to output suggestions, e.g.: test=> SET available_mem = '24GB'; Auto-tuning values: shared_buffers = 6GB work_mem = 10MB ... ERROR: parameter "available_mem" cannot be changed without restarting the server -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina wrote: > Probably. > > The idea is that without those fields it's, to wit, impossible to > explain non-monotonic movement in metrics of those queries for precise > use in tools that insist on monotonicity of the fields, which are all > cumulative to date I think. > > pg_stat_statements_reset() or crashing loses the session, so one > expects "ncalls" may decrease. In addition, a query that is bouncing > in and out of the hash table will have its statistics be lost, so its > statistics will also decrease. This can cause un-resolvable artifact > in analysis tools. > > The two fields allow for precise understanding of when the statistics > for a given query_id are continuous since the last time a program > inspected it. Thanks for elaborating on them! Since 'introduced' is reset even when the statistics are reset, maybe we can do without 'session_start' for that purpose? >> +/* >> + * The role calling this function is unable to see >> + * sensitive aspects of this tuple. >> + * >> + * Nullify everything except the "insufficient privilege" >> + * message for this entry >> + */ >> +memset(nulls, 1, sizeof nulls); >> + >> +nulls[i] = 0; >> +values[i] = CStringGetTextDatum(""); >> >> Why do we need to hide *all* the fields in pg_stat_statements, when >> it's accessed by an improper user? This is a big change of pg_stat_statements >> behavior, and it might break the compatibility. > It seems like an information leak that grows more major if query_id is > exposed and, at any point, one can determine the query_id for a query > text. So hiding only query and query_id is enough? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 10/10/2013 12:45 PM, Bruce Momjian wrote: On Thu, Oct 10, 2013 at 12:39:04PM -0400, Andrew Dunstan wrote: On 10/10/2013 12:28 PM, Bruce Momjian wrote: How do we handle the Python dependency, or is this all to be done in some other language? I certainly am not ready to take on that job. Without considering any wider question here, let me just note this: Anything that can be done in this area in Python should be doable in Perl fairly simply. I don't think we should be adding any Python dependencies. For good or ill Perl has been used for pretty much all our complex scripting (pgindent, MSVC build system etc.) Yes, but this is a run-time requirement, not build-time, and we have not used Perl in that regard. Nor Python. If we want to avoid added dependencies, we would need to use C. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
On 09/19/2013 06:12 PM, Pavel Stehule wrote:
> 2013/9/16 Satoshi Nagayasu <sn...@uptime.jp>:
>> I'm looking at this patch, and I have a question here. Should "DROP
>> TRIGGER IF EXISTS" ignore the error for a non-existing trigger and a
>> non-existing table? Or only for a non-existing trigger?
>
> My opinion is that both variants should be ignored - it should be fully
> fault-tolerant in this use case.

This thread seems to have gone cold, but I'm inclined to agree with Pavel. If the table doesn't exist, neither does the trigger, and the whole point of the 'IF EXISTS' variants is to provide the ability to issue DROP commands that don't fail if their target is missing.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 12:39:04PM -0400, Andrew Dunstan wrote: > > On 10/10/2013 12:28 PM, Bruce Momjian wrote: > > > >How do we handle the Python dependency, or is this all to be done in > >some other language? I certainly am not ready to take on that job. > > > Without considering any wider question here, let me just note this: > > Anything that can be done in this area in Python should be doable in > Perl fairly simply. I don't think we should be adding any Python > dependencies. For good or ill Perl has been used for pretty much all > our complex scripting (pgindent, MSVC build system etc.) Yes, but this is a run-time requirement, not build-time, and we have not used Perl in that regard. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On 10/10/2013 12:28 PM, Bruce Momjian wrote: How do we handle the Python dependency, or is this all to be done in some other language? I certainly am not ready to take on that job. Without considering any wider question here, let me just note this: Anything that can be done in this area in Python should be doable in Perl fairly simply. I don't think we should be adding any Python dependencies. For good or ill Perl has been used for pretty much all our complex scripting (pgindent, MSVC build system etc.) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add record_image_ops opclass for matview concurrent refresh.
Kevin Grittner wrote: > Add record_image_ops opclass for matview concurrent refresh. The buildfarm pointed out that I had not handled pass-by-value data types correctly. Fixed based on advice from Robert. We'll see whether that clears up the part of the buildfarm breakage attributed to this patch. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Companydiff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c index 0bcbb5d..7925ce2 100644 --- a/src/backend/utils/adt/rowtypes.c +++ b/src/backend/utils/adt/rowtypes.c @@ -1464,9 +1464,8 @@ record_image_cmp(PG_FUNCTION_ARGS) } else if (tupdesc1->attrs[i1]->attbyval) { -cmpresult = memcmp(&(values1[i1]), - &(values2[i2]), - tupdesc1->attrs[i1]->attlen); +if (values1[i1] != values2[i2]) + cmpresult = (values1[i1] < values2[i2]) ? -1 : 1; } else { @@ -1695,9 +1694,7 @@ record_image_eq(PG_FUNCTION_ARGS) } else if (tupdesc1->attrs[i1]->attbyval) { -result = (memcmp(&(values1[i1]), - &(values2[i2]), - tupdesc1->attrs[i1]->attlen) == 0); +result = (values1[i1] == values2[i2]); } else { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for reserved connections for replication users
On Thu, 10 Oct 2013 09:55:24 +0530 Amit Kapila wrote: > On Thu, Oct 10, 2013 at 3:17 AM, Gibheer > wrote: > > On Mon, 7 Oct 2013 11:39:55 +0530 > > Amit Kapila wrote: > >> Robert Haas wrote: > >> On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund > >> wrote: > >> >>> Hmm. It seems like this patch is making MaxConnections no > >> >>> longer mean the maximum number of connections, but rather the > >> >>> maximum number of non-replication connections. I don't think > >> >>> I support that definitional change, and I'm kinda surprised if > >> >>> this is sufficient to implement it anyway (e.g. see > >> >>> InitProcGlobal()). > >> > > >> >> I don't think the implementation is correct, but why don't you > >> >> like the definitional change? The set of things you can do from > >> >> replication connections are completely different from a normal > >> >> connection. So using separate "pools" for them seems to make > >> >> sense. That they end up allocating similar internal data seems > >> >> to be an implementation detail to me. > >> > >> > Because replication connections are still "connections". If I > >> > tell the system I want to allow 100 connections to the server, > >> > it should allow 100 connections, not 110 or 95 or any other > >> > number. > >> > >> I think that to reserve connections for replication, mechanism > >> similar to superuser_reserved_connections be used rather than auto > >> vacuum workers or background workers. > >> This won't change the definition of MaxConnections. Another thing > >> is that rather than introducing a new parameter for replication > >> reserved connections, it is better to use max_wal_senders as it > >> can serve the purpose. > >> > >> Review for replication_reserved_connections-v2.patch, considering > >> we are going to use mechanism similar to > >> superuser_reserved_connections and won't allow definition of > >> MaxConnections to change. > >> > >> 1. /* the extra unit accounts for the autovacuum launcher */ > >> MaxBackends = MaxConnections + autovacuum_max_workers + 1 + > >> - + max_worker_processes; > >> + + max_worker_processes + max_wal_senders; > >> > >> Above changes are not required. > >> > >> 2. > >> + if ((!am_superuser && !am_walsender) && > >> ReservedBackends > 0 && > >> !HaveNFreeProcs(ReservedBackends)) > >> > >> Change the check as you have in patch-1 for > >> ReserveReplicationConnections > >> > >> if (!am_superuser && > >> (max_wal_senders > 0 || ReservedBackends > 0) && > >> !HaveNFreeProcs(max_wal_senders + ReservedBackends)) > >> ereport(FATAL, > >> (errcode(ERRCODE_TOO_MANY_CONNECTIONS), > >> errmsg("remaining connection slots are reserved for replication and > >> superuser connections"))); > >> > >> 3. In guc.c, change description of max_wal_senders similar to > >> superuser_reserved_connections at below place: > >>{"max_wal_senders", PGC_POSTMASTER, REPLICATION_SENDING, > >> gettext_noop("Sets the maximum number of simultaneously running WAL > >> sender processes."), > >> > >> 4. With this approach, there is no need to change > >> InitProcGlobal(), as it is used to keep track of bgworkerFreeProcs > >> and autovacFreeProcs; for others it uses freeProcs. > >> > >> 5. Below description in config.sgml needs to be changed for > >> superuser_reserved_connections to include the effect of > >> max_wal_senders in reserved connections. 
> >> Whenever the number of active concurrent connections is at least > >> max_connections minus superuser_reserved_connections, new > >> connections will be accepted only for superusers, and no new > >> replication connections will be accepted. > >> > >> 6. Also similar description should be added to max_wal_senders > >> configuration parameter. > >> > >> 7. Below comment needs to be updated, as it assumes only superuser > >> reserved connections as part of MaxConnections limit. > >>/* > >> * ReservedBackends is the number of backends reserved for > >> superuser use. > >> * This number is taken out of the pool size given by MaxBackends > >> so > >> * number of backend slots available to non-superusers is > >> * (MaxBackends - ReservedBackends). Note what this really means > >> is > >> * "if there are <= ReservedBackends connections available, only > >> superusers > >> * can make new connections" --- pre-existing superuser connections > >> don't > >> * count against the limit. > >> */ > >> int ReservedBackends; > >> > >> 8. Also we can add comment on top of definition for max_wal_senders > >> similar to ReservedBackends. > >> > >> > >> With Regards, > >> Amit Kapila. > >> EnterpriseDB: http://www.enterprisedb.com > >> > > > > Hi, > > > > I took the time and reworked the patch with the feedback till now. > > Thank you very much Amit! > >Is there any specific reason why you moved this patch to next > CommitFest? > > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com > Mike asked me about the status of the patch a couple days back and I didn't think I would be able to work on the pa
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote: > * Bruce Momjian (br...@momjian.us) wrote: > > Well, I like the idea of initdb calling the tool, though the tool then > > would need to be in C probably as we can't require python for initdb. > > The tool would not address Robert's issue of someone increasing > > shared_buffers on their own. > > I'm really not impressed with this argument. Either the user is going > to go and modify the config file, in which case I would hope that they'd > at least glance around at what they should change, or they're going to > move off PG because it's not performing well enough for them- which is > really what I'm trying to avoid happening during the first 15m. Well, they aren't going around and looking at other parameters now or we would not feel a need to auto-tune many of our defaults. How do we handle the Python dependency, or is this all to be done in some other language? I certainly am not ready to take on that job. One nice thing about a tool is that you can see your auto-tuned defaults right away, while doing this in the backend, you have to start the server to see the defaults. I am not even sure how I could allow users to preview their defaults for different available_mem settings. > > In fact, the other constants would still > > be hard-coded in from initdb, which isn't good. > > Actually, it *is* good, as Magnus pointed out. Changing a completely > unrelated parameter shouldn't make all of your plans suddenly change. > This is mollified, but only a bit, if you have a GUC that's explicitly > "this changes other GUCs", but I'd much rather have a tool that can do a > better job to begin with and which helps the user understand what > parameters are available to change and why there's more than one. Well, the big question is how many users are going to use the tool, as we are not setting this up for experts, but for novices. I think one big risk is someone changing shared_buffers and not having an accurate available_memory. That might lead to some very inaccurate defaults. Also, what happens if available_memory is not supplied at all? Do we auto-tune just from shared_buffers, or not autotune at all, and then what are the defaults? We could certainly throw an error if shared_buffers > available_memory. We might just ship with available_memory defaulting to 256MB and auto-tune everything from there. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add json_typeof() and json_is_*() functions.
On 08/06/2013 08:42 AM, Robert Haas wrote:
> On Fri, Aug 2, 2013 at 8:22 AM, Andrew Tipton wrote:
>> But without json_is_scalar(), the choice is one of these two forms:
>>
>> json_typeof() NOT IN ('object', 'array')
>> json_typeof() IN ('string', 'number', 'boolean', 'null')
>>
>> The first of those is what seemed to make sense to me.
>
> The user can always define their own convenience function if they so
> desire. I don't think we need to bloat the default contents of pg_proc
> for that.

I agree. I have committed a version with just the one function.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers