[HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
Starting a new thread to more narrowly address the topic. Attached is my reorganization of Ants's patch here:
http://www.postgresql.org/message-id/CA+CSw_vinyf-w45i=M1m__MpJZY=e8s4nt_knnpebtwjtoa...@mail.gmail.com

My changes:

* wrest the core FNV algorithm from the specific concerns of a data page
  - PageCalcChecksum16 mixes in the block number, reduces to 16 bits, and
    avoids the pd_checksum field
  - the checksum algorithm is just a pure block checksum with a 32-bit result
* moved the FNV checksum into a separate file, checksum.c
* added Ants's suggested compilation flags for better optimization
* slight update to the paragraph in the README that discusses concerns
  specific to a data page

I do have a couple questions/concerns about Ants's patch though:

* The README mentions a slight bias; does that come from the mod (2^16-1)?
  That's how I updated the README, so I wanted to make sure.
* How was the FNV_PRIME chosen?
* I can't match the individual algorithm step as described in the README to
  the actual code. And the comments in the README don't make it clear enough
  which one is right (or maybe they both are, and I'm just tired).

The README says:

    hash = (hash ^ value) * ((hash ^ value) >> 3)

(which is obviously missing the FNV_PRIME factor) and the code says:

+#define CHECKSUM_COMP(checksum, value) do {\
+	uint32 __tmp = (checksum) ^ (value);\
+	(checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
+} while (0)

I'm somewhat on the fence about the shift right. It was discussed in this part of the thread:
http://www.postgresql.org/message-id/99343716-5f5a-45c8-b2f6-74b9ba357...@phlo.org

I think we should be able to state with a little more clarity in the README why there is a problem with plain FNV-1a, and why this modification is both effective and safe. I'd lean toward simplicity and closer adherence to the published version of the algorithm rather than detecting a few more obscure error patterns. It looks like the modification slows down the algorithm, too.
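To make the division of labour concrete, here is a hypothetical sketch (not the attached patch; names and details are illustrative) of the split Jeff describes: a pure 32-bit block checksum, plus a page-level wrapper that mixes in the block number and reduces the result mod (2^16-1). It uses the CHECKSUM_COMP primitive discussed in this thread, and assumes the caller has zeroed pd_checksum in the buffer beforehand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ    8192
#define FNV_PRIME 16777619u

/* one round of the modified FNV-1a primitive discussed in the thread */
#define CHECKSUM_COMP(checksum, value) do { \
	uint32_t tmp_ = (checksum) ^ (value); \
	(checksum) = tmp_ * FNV_PRIME ^ (tmp_ >> 3); \
} while (0)

/* pure block checksum with a 32-bit result (scalar stand-in for the
 * vectorised version; hypothetical name per the discussion) */
static uint32_t
checksum_block(const char *data, size_t len)
{
	uint32_t	hash = 2166136261u;	/* FNV-1a offset basis */
	size_t		i;

	for (i = 0; i + 4 <= len; i += 4)
	{
		uint32_t	word;

		memcpy(&word, data + i, 4);
		CHECKSUM_COMP(hash, word);
	}
	return hash;
}

/* page-level concerns: mix in the block number, then reduce to 16 bits;
 * the "+ 1" keeps the stored checksum from ever being 0 */
static uint16_t
page_calc_checksum16(const char *page, uint32_t blkno)
{
	uint32_t	csum = checksum_block(page, BLCKSZ) ^ blkno;

	return (uint16_t) ((csum % 65535) + 1);
}
```

The slight bias the README mentions would come from the `% 65535` step: 2^32 inputs cannot map uniformly onto 65535 outputs, so some 16-bit values are hit one extra time.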
Regards,
	Jeff Davis

*** a/src/backend/storage/page/Makefile
--- b/src/backend/storage/page/Makefile
***************
*** 12,17 ****
  subdir = src/backend/storage/page
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
! OBJS = bufpage.o itemptr.o
  include $(top_srcdir)/src/backend/common.mk
--- 12,22 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
! OBJS = bufpage.o checksum.o itemptr.o
  include $(top_srcdir)/src/backend/common.mk
+ 
+ # important optimization flags for checksum.c
+ ifeq ($(GCC),yes)
+ checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
+ endif

*** a/src/backend/storage/page/README
--- b/src/backend/storage/page/README
***************
*** 61,63 ****
--- 61,109 ----
  checksums are enabled. Systems in Hot-Standby mode may benefit from hint bits
  being set, but with checksums enabled, a page cannot be dirtied after setting
  a hint bit (due to the torn page risk). So, it must wait for full-page images
  containing the hint bit updates to arrive from the master.
+ 
+ Checksum algorithm
+ ------------------
+ 
+ The algorithm used to checksum pages is chosen for very fast calculation.
+ Workloads where the database working set fits into OS file cache but not into
+ shared buffers can read in pages at a very fast pace and the checksum
+ algorithm itself can become the largest bottleneck.
+ 
+ The checksum algorithm itself is based on the FNV-1a hash (FNV is shorthand
+ for Fowler/Noll/Vo). The primitive of a plain FNV-1a hash folds in data 4
+ bytes at a time according to the formula:
+ 
+     hash = (hash ^ value) * FNV_PRIME       (FNV_PRIME = 16777619)
+ 
+ PostgreSQL doesn't use the FNV-1a hash directly because it has bad mixing of
+ high bits - high order bits in input data only affect high order bits in
+ output data. To resolve this we xor in the value prior to multiplication
+ shifted right by 3 bits. The number 3 was chosen as it is a small odd prime,
+ and experimentally provides enough mixing for the high order bits to
+ avalanche into lower positions.
+ The actual hash formula used as the basis is:
+ 
+     hash = (hash ^ value) * ((hash ^ value) >> 3)
+ 
+ The main bottleneck in this calculation is the multiplication latency. To
+ hide the latency and to make use of SIMD parallelism, multiple hash values
+ are calculated in parallel. Each hash function uses a different initial value
+ (offset basis in FNV terminology). The initial values actually used were
+ chosen randomly, as the values themselves don't matter as much as that they
+ are different and don't match anything in real data. The page is then treated
+ as a 32-wide array of 32-bit values and each column is aggregated according
+ to the above formula. Finally, one more iteration of the formula is performed
+ with value 0 to mix in the bits of the last value added.
+ 
+ The partial checksums are then aggregated together using xor to form a
+ 32-bit checksum. The caller can
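The column-wise aggregation described in this README hunk can be modelled in plain C. This is a scalar, unvectorised sketch of the data flow only; the offset-basis constants below are invented placeholders, not the randomly chosen ones in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ    8192
#define N_SUMS    32
#define FNV_PRIME 16777619u

/* one round of the modified FNV-1a primitive from the thread */
#define CHECKSUM_COMP(checksum, value) do { \
	uint32_t tmp_ = (checksum) ^ (value); \
	(checksum) = tmp_ * FNV_PRIME ^ (tmp_ >> 3); \
} while (0)

static uint32_t
checksum_block_parallel(const char *data)
{
	uint32_t	sums[N_SUMS];
	uint32_t	result = 0;
	int			i, j;

	/* distinct offset bases; placeholder values, not the patch's constants */
	for (j = 0; j < N_SUMS; j++)
		sums[j] = 2166136261u + (uint32_t) j * 2654435761u;

	/* treat the page as a 32-wide array of 32-bit values: column j
	 * aggregates every N_SUMS-th word, which a compiler can vectorise */
	for (i = 0; i < BLCKSZ / 4; i += N_SUMS)
		for (j = 0; j < N_SUMS; j++)
		{
			uint32_t	word;

			memcpy(&word, data + (i + j) * 4, 4);
			CHECKSUM_COMP(sums[j], word);
		}

	/* one extra round with value 0, then xor-fold into one 32-bit result */
	for (j = 0; j < N_SUMS; j++)
	{
		CHECKSUM_COMP(sums[j], 0);
		result ^= sums[j];
	}
	return result;
}
```

The per-column independence is what hides the multiplication latency: 32 multiplications are in flight at once instead of one long serial chain.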
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Apr 23, 2013, at 09:17, Jeff Davis <pg...@j-davis.com> wrote:
> I'd lean toward simplicity and closer adherence to the published version
> of the algorithm rather than detecting a few more obscure error patterns.
> It looks like the modification slows down the algorithm, too.

The patterns that plain FNV-1a misses are not that obscure, unfortunately. With plain FNV-1a, the n-th bit of an input word (i.e. 32-bit block) only affects bits n through 31 of the checksum. In particular, the most significant bit of every 32-bit block only affects the MSB of the checksum, making the algorithm miss any even number of flipped MSBs.

More generally, any form of data corruption that affects only the top N bits is missed at least once out of 2^N times, since changing only those bits cannot yield more than 2^N different checksum values.

Such corruption patterns may not be especially likely, given that we're mainly protecting against disk corruption, not memory corruption. But quantifying exactly how unlikely seems hard, thus providing at least some form of protection against such errors seems prudent.

In addition, even with the shift-induced slowdown, FNV1+SHIFT still performs similarly to hardware-accelerated CRC, reaching about 6 bytes/cycle on modern Intel CPUs. This really is plenty fast - if I'm not mistaken, it translates to well over 10 GB/s.

So overall -1 for removing the shift.

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
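Florian's argument is easy to verify mechanically. A tiny self-contained demonstration (plain FNV-1a with no shift; illustration only, not PostgreSQL code) shows a pair of flipped MSBs going completely undetected, while a single flip is still caught:

```c
#include <assert.h>
#include <stdint.h>

#define FNV_PRIME 16777619u

/* plain FNV-1a over 32-bit words, no shift mixing */
static uint32_t
fnv1a(const uint32_t *words, int n)
{
	uint32_t	hash = 2166136261u;	/* FNV-1a offset basis */
	int			i;

	for (i = 0; i < n; i++)
		hash = (hash ^ words[i]) * FNV_PRIME;
	return hash;
}
```

Because bit 31 of an input word can only ever reach bit 31 of the hash (multiplying by an odd constant mod 2^32 never carries out of the top bit), flipping the MSB of two different words toggles the checksum's MSB twice, and the corruption cancels out exactly. The `>> 3` in the patched primitive feeds bit 31 back down into bit 28 on every round, which is what breaks this failure mode.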
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Apr 23, 2013 10:17 AM, Jeff Davis <pg...@j-davis.com> wrote:
> Attached is my reorganization of Ants's patch here:
> http://www.postgresql.org/message-id/CA+CSw_vinyf-w45i=M1m__MpJZY=e8s4nt_knnpebtwjtoa...@mail.gmail.com

Thanks for your help. Some notes below.

> My changes:
>
> * wrest the core FNV algorithm from the specific concerns of a data page
>   - PageCalcChecksum16 mixes in the block number, reduces to 16 bits, and
>     avoids the pd_checksum field
>   - the checksum algorithm is just a pure block checksum with a 32-bit result
> * moved the FNV checksum into a separate file, checksum.c

I think the function should not be called checksum_fnv as it implies that we use the well-known straightforward implementation. Maybe checksum_block or some other generic name.

> * added Ants's suggested compilation flags for better optimization

-msse4.1 is not safe to use in default builds. On the other hand it doesn't hurt to just specify it in CFLAGS for the whole compile (possibly as -march=native). We should just omit it and mention somewhere that SSE4.1-enabled builds will have better checksum performance.

> * slight update to the paragraph in the README that discusses concerns
>   specific to a data page
>
> I do have a couple questions/concerns about Ants's patch though:
>
> * The README mentions a slight bias; does that come from the mod (2^16-1)?
>   That's how I updated the README, so I wanted to make sure.

Yes.

> * How was the FNV_PRIME chosen?

I still haven't found the actual source for this value. It's specified as the value to use for 32-bit FNV-1a.

> * I can't match the individual algorithm step as described in the README to
>   the actual code. And the comments in the README don't make it clear enough
>   which one is right (or maybe they both are, and I'm just tired).

I will try to reword.
> The README says:
>
>     hash = (hash ^ value) * ((hash ^ value) >> 3)
>
> (which is obviously missing the FNV_PRIME factor) and the code says:
>
> +#define CHECKSUM_COMP(checksum, value) do {\
> +	uint32 __tmp = (checksum) ^ (value);\
> +	(checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
> +} while (0)
>
> I'm somewhat on the fence about the shift right. It was discussed in this
> part of the thread:
> http://www.postgresql.org/message-id/99343716-5f5a-45c8-b2f6-74b9ba357...@phlo.org
>
> I think we should be able to state with a little more clarity in the README
> why there is a problem with plain FNV-1a, and why this modification is both
> effective and safe.

Florian already mentioned why it's effective. I have an intuition why it's safe, will try to come up with a well-reasoned argument.

Regards,
Ants Aasma
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On 2013-04-23 00:17:28 -0700, Jeff Davis wrote:
> + # important optimization flags for checksum.c
> + ifeq ($(GCC),yes)
> + checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
> + endif

I am pretty sure we can't do those unconditionally:
- -funroll-loops and -ftree-vectorize weren't always part of gcc afair, so we
  would need a configure check for those
- SSE4.1 looks like a total no-go, it's not available everywhere. We *can*
  add runtime detection of that with gcc fairly easily and one-time if we
  want to go there (later?) using 'ifunc's, but that needs a fair amount of
  infrastructure work.
- We can rely on SSE1/2 on amd64, but I think that's automatically enabled
  there.

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
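As a rough idea of what the runtime-detection route could look like (a simpler cousin of the gcc ifunc approach Andres mentions): a sketch using GCC's __builtin_cpu_supports(), available from GCC 4.8, to pick an implementation once at startup. All function names here are hypothetical, and the "SSE4.1" variant is a placeholder that merely calls the portable one so the sketch stays runnable everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*checksum_fn) (const char *data, size_t len);

/* portable fallback (byte-wise FNV-1a, purely for illustration) */
static uint32_t
checksum_block_portable(const char *data, size_t len)
{
	uint32_t	hash = 2166136261u;
	size_t		i;

	for (i = 0; i < len; i++)
		hash = (hash ^ (unsigned char) data[i]) * 16777619u;
	return hash;
}

/* stand-in for an object compiled with -msse4.1; here it just calls the
 * portable version so this sketch runs on any machine */
static uint32_t
checksum_block_sse41(const char *data, size_t len)
{
	return checksum_block_portable(data, len);
}

/* choose an implementation once, e.g. at postmaster start */
static checksum_fn
resolve_checksum_impl(void)
{
#if defined(__x86_64__) && defined(__GNUC__) && \
	(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
	if (__builtin_cpu_supports("sse4.1"))
		return checksum_block_sse41;
#endif
	return checksum_block_portable;
}
```

A real build would compile the accelerated translation unit with -msse4.1 and only ever call it behind the CPU check; the function-pointer indirection is the simple, compiler-agnostic alternative to ifunc relocation.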
Re: [HACKERS] 9.3 Beta1 status report
On 21 April 2013 06:02, Bruce Momjian <br...@momjian.us> wrote:
> I am not sure if Tom shared yet, but we are planning to package 9.3 beta1
> on April 29, with a release on May 2. Those dates might change, but that is
> the current plan.
>
> I have completed a draft 9.3 release notes, which you can view here:
>
>         http://momjian.us/pgsql_docs/release-9-3.html
>
> I will be working on polishing them for the next ten days, so any feedback,
> patches, or commits are welcome. I still need to add lots of SGML markup.

E.1.3.4.4. VIEWs:

* Make simple views auto-updatable (Dean Rasheed)

  INSTEAD rules are still available for complex views.

I think this should refer to INSTEAD OF triggers for complex views, since that is now the recommended way to implement updatable views. I think it should also expand a little on what "simple" means in this context, without going into all the gory details, and mention that there is a behaviour change. So perhaps something like this for the second paragraph:

  Simple views that select some or all columns from a single base table are
  now updatable by default. More complex views can be made updatable using
  INSTEAD OF triggers or INSTEAD rules.

Regards,
Dean
[HACKERS] Couple of issues with pg_xlogdump
Hello,

I was playing with pg_xlogdump in the HEAD and found a few issues.

1. Tried compiling pg_xlogdump via the PGXS mechanism and it fails with the following error:

make: *** No rule to make target `/home/pavan.deolasee/work/pgsql/postgresql/install/lib/pgxs/src/makefiles/../../src/backend/access/transam/xlogreader.c', needed by `xlogreader.c'.  Stop.

There are no issues if the sources are compiled directly inside the contrib module.

2. I created a fresh database cluster, created a table, COPYed in a million records and then stopped the server. I then tried to dump the xlog files from the pg_xlog directory.

[pavan.deolasee@puppetserver pg_xlogdump]$ ls ~/db93head/pg_xlog/
00010004  00010005  00010006  00010007  00010008  archive_status
[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump ~/db93head/pg_xlog/00010005
pg_xlogdump: FATAL:  could not find a valid record after 0/500
[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump ~/db93head/pg_xlog/00010006
pg_xlogdump: FATAL:  could not find a valid record after 0/600
[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump ~/db93head/pg_xlog/00010007
pg_xlogdump: FATAL:  could not find a valid record after 0/700
[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump ~/db93head/pg_xlog/00010008
pg_xlogdump: FATAL:  could not find a valid record after 0/800

So pg_xlogdump gives an error for all WAL files except the first one. Should it not have printed the WAL records from these files?
The first file prints ok, with this at the end:

[pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump ~/db93head/pg_xlog/00010004 | tail -n 2
pg_xlogdump: FATAL:  error in WAL record at 0/4C7F208: record with zero length at 0/4C7F270
rmgr: XLOG  len (rec/tot): 72/ 104, tx: 0, lsn: 0/04C7F1A0, prev 0/04C7F170, bkp: , desc: checkpoint: redo 0/4400D70; tli 1; prev tli 1; fpw true; xid 0/1807; oid 24576; multi 1; offset 0; oldest xid 1795 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online
rmgr: XLOG  len (rec/tot): 72/ 104, tx: 0, lsn: 0/04C7F208, prev 0/04C7F1A0, bkp: , desc: checkpoint: redo 0/4C7F208; tli 1; prev tli 1; fpw true; xid 0/1807; oid 16387; multi 1; offset 0; oldest xid 1795 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown

But even this is a bit surprising, because I would have expected the shutdown checkpoint WAL record in the last WAL file. Just to note, I stopped the server only once, at the end of loading the data. I hope I am not doing something terribly wrong here.

3. The usage of pg_xlogdump shows this:

Usage:
  pg_xlogdump [OPTION] [STARTSEG [ENDSEG]]

Looking at the usage, one might feel that STARTSEG and ENDSEG are both optional. But if I try to invoke pg_xlogdump without any argument, it fails.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Couple of issues with pg_xlogdump
On 2013-04-23 14:51:05 +0530, Pavan Deolasee wrote:
> I was playing with pg_xlogdump in the HEAD and found a few issues.
>
> 1. Tried compiling pg_xlogdump via the PGXS mechanism and it fails with the
> following error:
>
> make: *** No rule to make target `/home/pavan.deolasee/work/pgsql/postgresql/install/lib/pgxs/src/makefiles/../../src/backend/access/transam/xlogreader.c', needed by `xlogreader.c'.  Stop.
>
> There are no issues if the sources are compiled directly inside the contrib
> module.

Yes, it's not supposed to work. In some previous thread I was suggesting to write out an explicit error, but the reactions were mixed, so I didn't pursue it further. I guess I should submit something more than a POC patch then...

> 2. I created a fresh database cluster, created a table, COPYed in a million
> records and then stopped the server. I then tried to dump the xlog files
> from the pg_xlog directory.
>
> [pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump ~/db93head/pg_xlog/00010005
> pg_xlogdump: FATAL:  could not find a valid record after 0/500
> [...]
>
> So pg_xlogdump gives an error for all WAL files except the first one. Should
> it not have printed the WAL records from these files?

Probably not. Those are likely renamed WAL files that do not yet contain valid data.
> The first file prints ok, with this at the end:
>
> [pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump ~/db93head/pg_xlog/00010004 | tail -n 2
> pg_xlogdump: FATAL:  error in WAL record at 0/4C7F208: record with zero length at 0/4C7F270

Which this confirms. This is likely the current end of WAL. If you look at pg_current_xlog_location() after starting the server again, it should show an address nearby?

> But even this is a bit surprising, because I would have expected the
> shutdown checkpoint WAL record in the last WAL file. Just to note, I stopped
> the server only once, at the end of loading the data.

Same reasoning: we rename/recycle WAL files into place before they are used. I think it's RemoveOldXlogFiles or so that does that.

> 3. The usage of pg_xlogdump shows this:
>
> Usage:
>   pg_xlogdump [OPTION] [STARTSEG [ENDSEG]]
>
> Looking at the usage, one might feel that STARTSEG and ENDSEG are both
> optional. But if I try to invoke pg_xlogdump without any argument, it fails.

It works without either if you use explicit options like -s STARTADDR and -p PATH, which is frequently useful to just print a few records at the correct point. I am not sure how I could put that in there without making it too complicated. Any suggestions?

Thanks for looking at this!
Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)
On Tue, Apr 23, 2013 at 11:47 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-04-23 00:17:28 -0700, Jeff Davis wrote:
>> + # important optimization flags for checksum.c
>> + ifeq ($(GCC),yes)
>> + checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
>> + endif
>
> I am pretty sure we can't do those unconditionally:
> - -funroll-loops and -ftree-vectorize weren't always part of gcc afair, so
>   we would need a configure check for those

-funroll-loops is available from at least GCC 2.95. -ftree-vectorize is GCC 4.0+. From what I read of the documentation, on ICC -axSSE4.1 should generate both a plain and an accelerated version and do a runtime check. I don't know if ICC vectorizes the specific loop in the patch, but I would expect it to, given that Intel's vectorization has generally been better than GCC's and the loop is about as simple as it gets. I don't know the relevant options for other compilers.

> - SSE4.1 looks like a total no-go, it's not available everywhere. We *can*
>   add runtime detection of that with gcc fairly easily and one-time if we
>   want to go there (later?) using 'ifunc's, but that needs a fair amount of
>   infrastructure work.
> - We can rely on SSE1/2 on amd64, but I think that's automatically enabled
>   there.

This is why I initially went for the lower-strength 16-bit checksum calculation - requiring only SSE2 would have made supporting the vectorized version on amd64 trivial. By now my feeling is that it's not prudent to compromise on quality to save some infrastructure complexity. If we set a hypothetical VECTORIZATION_FLAGS variable at configure time, the performance is still there for those who need it and can afford CPU-specific builds.

Regards,
Ants Aasma

-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Re: [HACKERS] Couple of issues with pg_xlogdump
On Tue, Apr 23, 2013 at 3:00 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> Yes, it's not supposed to work. In some previous thread I was suggesting to
> write out an explicit error, but the reactions were mixed, so I didn't
> pursue it further. I guess I should submit something more than a POC patch
> then...

Yeah, I think we can print a user-friendly error if USE_PGXS is set. Or at least remove its handling from the Makefile.

> Which this confirms. This is likely the current end of WAL. If you look at
> pg_current_xlog_location() after starting the server again, it should show
> an address nearby?

Oh yes, you are right. Again, could there be a better way to report empty WAL files? A general tendency would be to look at the last few WAL files in case of failures or crashes, and they are likely to be empty.

> It works without either if you use explicit options like -s STARTADDR and
> -p PATH, which is frequently useful to just print a few records at the
> correct point. I am not sure how I could put that in there without making
> it too complicated. Any suggestions?

Ah, ok. Can we mention these details in the documentation?
http://www.postgresql.org/docs/devel/static/pgxlogdump.html

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] 9.3 Beta1 status report
On 22.04.2013 23:06, Bruce Momjian wrote:
> On Mon, Apr 22, 2013 at 10:11:48PM +0300, Heikki Linnakangas wrote:
>> E.1.3.2.1. Write-Ahead Log (WAL)
>>
>> Store WAL in a continuous stream, rather than skipping the last 16MB
>> segment every 4GB (Heikki Linnakangas) BACKWARD COMPATIBLE BREAK
>>
>> Restructure WAL files to better handle timeline changes during recovery
>> (Heikki Linnakangas)
>>
>> Restructure WAL files to use a more compact storage format (Heikki
>> Linnakangas)
>>
>> Can you clarify which commits these came from? The first one is clear
>> (dfda6eba), and I think the 3rd covers commits 20ba5ca6 and 061e7efb1.
>> But what is that second entry?
>
> Frankly, I found the WAL and timeline commits all over the place and could
> hardly make sense of it. I tried to collapse entries into meaningful items,
> but I need help. Can you suggest changes?

Ok:

* Don't skip the last 16 MB WAL segment every 4 GB, with filename ending in FF (Heikki Linnakangas) BACKWARD COMPATIBLE BREAK

* Change WAL record format, allowing the record header to be split across pages. The new format is slightly more compact. (Heikki Linnakangas)

In the Source Code section:

* Use a 64-bit integer to represent WAL positions (XLogRecPtr), instead of two 32-bit integers. (Heikki Linnakangas)

Do we usually repeat the changes listed in the backwards-compatibility section later, in the Changes section? If not, then instead of the first two items above, let's just have these in the backwards-compatibility section:

* The WAL file numbering has changed to not skip WAL files ending with FF. If you have e.g. backup/restore scripts that took the skipping into account, they need to be adjusted.

* The WAL format has changed. Any external tools that read the WAL files need to be adjusted to understand the new format. The new xlogreader facility helps with writing such tools.

- Heikki
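The two backwards-compatibility items can be illustrated with a small sketch. The helper names below are invented for illustration, not PostgreSQL's own macros: a 64-bit WAL position now divides evenly into 16 MB segments with nothing skipped, so segment names ending in FF occur for the first time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define XLOG_SEG_SIZE   (16 * 1024 * 1024)			/* 16 MB segments */
#define SEGS_PER_XLOGID (0x100000000ULL / XLOG_SEG_SIZE)	/* 256 per 4 GB */

/* 9.3 style: a WAL position is one 64-bit integer; no segment is skipped */
static uint64_t
lsn_to_segno(uint64_t lsn)
{
	return lsn / XLOG_SEG_SIZE;
}

/* file name: timeline, then the segment number split into two 32-bit
 * fields, matching the traditional 24-hex-digit layout; buf >= 25 bytes */
static void
segno_to_filename(uint32_t tli, uint64_t segno, char *buf)
{
	sprintf(buf, "%08X%08X%08X",
			tli,
			(uint32_t) (segno / SEGS_PER_XLOGID),
			(uint32_t) (segno % SEGS_PER_XLOGID));
}
```

Under the old numbering, a segment whose name would end in FF (segno % 256 == 255) was skipped; with the change above it is a perfectly ordinary segment, which is exactly what backup/restore scripts must now tolerate.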
Re: [HACKERS] Couple of issues with pg_xlogdump
On 2013-04-23 15:16:05 +0530, Pavan Deolasee wrote:
> Oh yes, you are right. Again, could there be a better way to report empty
> WAL files? A general tendency would be to look at the last few WAL files in
> case of failures or crashes, and they are likely to be empty.

Hm. I don't really see what we could sensibly and easily do, but perhaps I just lived with it for too long ;)

> Ah, ok. Can we mention these details in the documentation?
> http://www.postgresql.org/docs/devel/static/pgxlogdump.html

So something like: At least one of STARTSEG, --start and --path, or --rmgr=list has to be specified.

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert
Hi Tom,

Since we are close to release, we should not have crashes like this. Please have a look. My patch may not be correct as I haven't looked closely.

Thanks

On Mon, Apr 22, 2013 at 7:28 PM, Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote:
> On Mon, Apr 22, 2013 at 6:41 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> On 2013-04-22 18:35:04 +0530, Jeevan Chalke wrote:
>>> Hi,
>>>
>>> I have observed that the following sequence is causing a server crash:
>>>
>>> CREATE MATERIALIZED VIEW temp_class_mv AS
>>>   SELECT * FROM pg_class WITH NO DATA;
>>>
>>> CREATE OR REPLACE FUNCTION test_refresh_mv() RETURNS int AS $$
>>> BEGIN
>>>   REFRESH MATERIALIZED VIEW temp_class_mv;
>>>   return 1;
>>> END; $$ LANGUAGE plpgsql;
>>>
>>> SELECT test_refresh_mv();
>>>
>>> I had a quick look over the crash and it is hitting the following Assert
>>> in spi.c:
>>>
>>>     else if (IsA(stmt, RefreshMatViewStmt))
>>>     {
>>>         Assert(strncmp(completionTag, "REFRESH MATERIALIZED VIEW ", 23) == 0);
>>>         _SPI_current->processed = strtoul(completionTag + 23, NULL, 10);
>>>     }
>>>
>>> It seems like we are missing the expected value for completionTag in
>>> ExecRefreshMatView()
>>
>> Possibly independent from this issue, but where did that 23 come from?
>> ISTM we're strtoul()ing "EW <somenumber>" here.
>
> 23 is also bogus here. It should be 26, i.e. the length of
> "REFRESH MATERIALIZED VIEW ".
>
> BTW, attached is the patch which works well for me, but needs detailed
> review.
>
> Thanks

-- 
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500
Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.
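For readers following the Assert in that thread: the prefix "REFRESH MATERIALIZED VIEW " is 26 characters, so offset 23 lands on the 'E' of VIEW and strtoul() ends up parsing "EW <n>", which yields 0. A minimal sketch of the corrected parsing (hypothetical helper, not the actual patch; it derives the length instead of hard-coding it):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REFRESH_TAG_PREFIX "REFRESH MATERIALIZED VIEW "

/* parse the row count out of a completion tag like
 * "REFRESH MATERIALIZED VIEW 42" */
static unsigned long
parse_refresh_processed(const char *completionTag)
{
	size_t		prefixlen = strlen(REFRESH_TAG_PREFIX);	/* 26, not 23 */

	assert(strncmp(completionTag, REFRESH_TAG_PREFIX, prefixlen) == 0);
	return strtoul(completionTag + prefixlen, NULL, 10);
}
```

With the hard-coded 23, strtoul() stops at the leading 'E' and silently returns 0, which is why the bogus offset went unnoticed until the Assert fired for a different reason.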
Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert
Hi,

On 2013-04-23 17:48:49 +0530, Jeevan Chalke wrote:
> Hi Tom,
>
> Since we are close to release, we should not have crashes like this.
> Please have a look. My patch may not be correct as I haven't looked
> closely.

Isn't that Kevin's department?

Andres

-- 
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert
Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-04-23 17:48:49 +0530, Jeevan Chalke wrote:
>> Since we are close to release, we should not have crashes like this.
>> Please have a look. My patch may not be correct as I haven't looked
>> closely.
>
> Isn't that Kevin's department?

I'll gladly pick this up. Tom had moved some code around and dropped a couple of comments that made it sound like he was still working on it, so I was trying to stay out of his way; but if I've misinterpreted that, I can jump in here.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert
On Tue, Apr 23, 2013 at 7:18 AM, Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote:
> Hi Tom,
>
> Since we are close to release, we should not have crashes like this.

huh? we are not even in beta yet

merlin
Re: [HACKERS] Performance with the new security release?
On 13-04-22 11:46 PM, Anne Rosset wrote:
> Thanks Steve.
> I have read that a fix has been put in release 9.2.3 for this issue. Is
> that right?
> Thanks,
> Anne

No, this issue is present in 9.0.13, 9.1.9 and 9.2.4 (as well as 9.2.3). There is talk about fixing this for the next set of minor releases, but I haven't yet seen a patch.

-----Original Message-----
From: Steve Singer [mailto:st...@ssinger.info]
Sent: Monday, April 22, 2013 4:35 PM
To: Anne Rosset
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Performance with the new security release?

On 13-04-22 04:41 PM, Anne Rosset wrote:
> Hi Steve,
> Yes I see these messages in our log. Is there a solution to this?
> Thanks,
> Anne

A manual analyze of the affected tables should work and give you updated statistics. If your problem is just statistics, then that should help.

A manual vacuum will, unfortunately, behave like the auto-vacuum. The only way to get vacuum past this (until this issue is fixed) is for vacuum to be able to get that exclusive lock. If there are times of the day your database is less busy, you might have some luck turning off auto-vacuum on these tables and doing manual vacuums during those times.

-----Original Message-----
From: Steve Singer [mailto:st...@ssinger.info]
Sent: Monday, April 22, 2013 1:26 PM
To: Anne Rosset
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Performance with the new security release?

On 13-04-22 04:15 PM, Anne Rosset wrote:
> Hi Steve,
> Thanks for your reply. We are now running 9.0.13. Before it was 9.0.7.
> How can I find out if we are running into this issue: ie if statistics are
> no longer being updated because analyze can't get the exclusive lock for
> truncation?

This issue is discussed in the thread
http://www.postgresql.org/message-id/CAMkU=1xYXOJp=jLAASPdSAqab-HwhA_tnRhy+JUe=4=b=v3...@mail.gmail.com

If you're seeing messages in your logs of the form "automatic vacuum of table XXX.YYY cannot (re)acquire exclusive lock for truncate scan" then you might be hitting this issue.
I will dig into our logs to see for the query times. Thanks, Anne -Original Message- From: Steve Singer [mailto:st...@ssinger.info] Sent: Monday, April 22, 2013 12:59 PM To: Anne Rosset Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Performance with the new security release? On 13-04-22 01:38 PM, Anne Rosset wrote: Hi, We are seeing some overall performance degradation in our application since we installed the security release. Other commits were also done at the same time in the application so we don't know yet if the degradation has any relationship with the security release. While we are digging into this, I would like to know if it is possible that the release has some impact on performance. After reading this It was created as a side effect of a refactoring effort to make establishing new connections to a PostgreSQL server faster, and the associated code more maintainable., I am thinking it is quite possible. Please let me know. Thanks, Exactly which version of PostgreSQL are you running? (we released security update releases for multiple PG versions). Also which version were you running before? There were some changes to analyze/vacuum in the previous set of minor releases that could cause performance issues in some cases (ie if statistics are no longer being updated because analyze can't get the exclusive lock for truncation). There might be other unintended performance related changes. Are all queries taking longer or only some? Can you find any sort of pattern that might help narrow the issue? Steve Anne -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] putting a bgworker to rest
Hi all, I noticed the need to simply stop a bgworker after its work is done but still have it restart in unusual circumstances like a crash. Obviously I can just have it enter a loop where it checks its latch and such, but that seems a bit pointless. Would it make sense to add an extra return value or such for that? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert
On Tue, Apr 23, 2013 at 7:01 PM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Apr 23, 2013 at 7:18 AM, Jeevan Chalke jeevan.cha...@enterprisedb.com wrote: Hi Tom, Since we are close to release, we should not have crashes like this. huh? we are not even in beta yet I mean, beta release. BTW, as per Bruce's mail we are planning to package 9.3 beta1 on *April 29*, with a release on May 2, we are close enough. Thanks merlin -- Jeevan B Chalke Senior Software Engineer, RD EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: +91 20 30589500 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.
Re: [HACKERS] Performance with the new security release?
Thanks Steve. I found this: http://www.postgresql.org/docs/current/static/release-9-2-3.html

Fix performance problems with autovacuum truncation in busy workloads (Jan Wieck) Truncation of empty pages at the end of a table requires exclusive lock, but autovacuum was coded to fail (and release the table lock) when there are conflicting lock requests. Under load, it is easily possible that truncation would never occur, resulting in table bloat. Fix by performing a partial truncation, releasing the lock, then attempting to re-acquire the lock and continue. This fix also greatly reduces the average time before autovacuum releases the lock after a conflicting request arrives.

So that is not the fix? (Sorry to ask a second time but I really need to make sure). Thanks, Anne

-Original Message- From: Steve Singer [mailto:st...@ssinger.info] Sent: Tuesday, April 23, 2013 6:33 AM To: Anne Rosset Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Performance with the new security release?

On 13-04-22 11:46 PM, Anne Rosset wrote: Thanks Steve. I have read that a fix has been put in release 9.2.3 for this issue. Is that right? Thanks, Anne

No this issue is present in 9.0.13, 9.1.9 and 9.2.4 (as well as 9.2.3). There is talk about fixing this for the next set of minor releases but I haven't yet seen a patch.
Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert
On 2013-04-23 19:33:24 +0530, Jeevan Chalke wrote: On Tue, Apr 23, 2013 at 7:01 PM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Apr 23, 2013 at 7:18 AM, Jeevan Chalke jeevan.cha...@enterprisedb.com wrote: Hi Tom, Since we are close to release, we should not have crashes like this. huh? we are not even in beta yet I mean, beta release. BTW, as per Bruce's mail we are planning to package 9.3 beta1 on *April 29*, with a release on May 2, we are close enough. The 2nd May date is for the release of packaged beta, not overall release unless I missed something major like months of testing ;) Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] high io BUT huge amount of free memory
On 04/22/2013 05:12 PM, Merlin Moncure wrote:

    free -g
                 total       used       free     shared    buffers     cached
    Mem:           378        250        128          0          0        229
    -/+ buffers/cache:          20        357

This is most likely a NUMA issue. There really seems to be some kind of horrible flaw in the Linux kernel when it comes to properly handling NUMA on large memory systems. What does this say:

    numactl --hardware

-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-676-8870 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On 23 April 2013 02:35, Jeff Davis pg...@j-davis.com wrote: On Tue, 2013-04-23 at 01:08 +0300, Ants Aasma wrote: A slight delay, but here it is. I didn't lift the checksum part into a separate file as I didn't have a great idea what I would call it. The code is reasonably compact so I don't see a great need for this right now. It would be more worth the effort when/if we add non-generic variants. I'm not particularly attached to the method I used to mask out pd_checksum field, this could be improved if someone has a better idea how to structure the code. Thank you. A few initial comments: I have attached (for illustration purposes only) a patch on top of yours that divides the responsibilities a little more cleanly. * easier to move into a separate file, and use your recommended compiler flags without affecting other routines in bufpage.c * makes the checksum algorithm itself simpler * leaves the data-page-specific aspects (mixing in the page number, ignoring pd_checksum, reducing to 16 bits) to PageCalcChecksum16 * overall easier to review and understand I'm not sure what we should call the separate file or where we should put it, though. How about src/backend/utils/checksum/checksum_fnv.c? Is there a clean way to override the compiler flags for a single file so we don't need to put it in its own directory? OK, I like that a lot better and it seems something I could commit. I suggest the following additional changes... * put the README stuff directly in the checksum.c file * I think we need some external links that describe this algorithm and we need comments that explain what we know about this in terms of detection capability and why it was chosen against others * We need some comments/analysis about whether the coding causes a problem if vectorization is *not* available * make the pg_control.data_checksums field into a version number, for future flexibility... 
patch attached * rename the routine away from checksum_fnv so it's simply a generic checksum call - more modular. That way all knowledge of the algorithm is in one file only. If we do need to change the algorithm in the future we can more easily support multiple versions. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services checksums_version.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal to add --single-row to psql
psql currently collects the query result rows in memory before writing them to a file and can cause out of memory problems for large results in low memory environments like ec2. I can't use COPY TO STDOUT or FETCH_COUNT since I'm using Redshift and it doesn't support [writing to STDOUT]( http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) or [CURSOR]( https://forums.aws.amazon.com/thread.jspa?threadID=122664). [Single Row Mode is available in Postgres 9.2]( http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html) but [it doesn't look like]( http://www.postgresql.org/docs/9.2/static/app-psql.html) you can tell psql to use single row mode when writing to a file (using --output). I'm proposing to add a --single-row option to psql that would allow the result rows of a query to be streamed to a file without collecting them in memory first. I'm new to the postgres source, but I was considering doing this by adding an elseif at [this line in bin/psql/common.c]( https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L955) that would call [PQsetSingleRowMode]( https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1581) and ideally use something very similar to [ExecQueryUsingCursor]( https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L1081 ) Please let me know if that would be an acceptable addition and if there's anything in particular I should be aware of when adding the feature. Thank you, Christopher
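For context, a rough libpq-level sketch of what single-row mode looks like follows. This is not the proposed psql patch; the connection string and query are placeholders, and it needs a running server to execute:

```c
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	if (!PQsendQuery(conn, "SELECT generate_series(1, 1000000)"))
	{
		fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
		return 1;
	}
	PQsetSingleRowMode(conn);	/* must follow PQsendQuery immediately */

	/* Each PGRES_SINGLE_TUPLE result holds one row, so rows can be
	 * written out as they arrive instead of being buffered. */
	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
			printf("%s\n", PQgetvalue(res, 0, 0));
		PQclear(res);
	}

	PQfinish(conn);
	return 0;
}
```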
Re: [HACKERS] Performance with the new security release?
On 13-04-23 10:04 AM, Anne Rosset wrote: Thanks Steve. I found this: http://www.postgresql.org/docs/current/static/release-9-2-3.html Fix performance problems with autovacuum truncation in busy workloads (Jan Wieck) Truncation of empty pages at the end of a table requires exclusive lock, but autovacuum was coded to fail (and release the table lock) when there are conflicting lock requests. Under load, it is easily possible that truncation would never occur, resulting in table bloat. Fix by performing a partial truncation, releasing the lock, then attempting to re-acquire the lock and continue. This fix also greatly reduces the average time before autovacuum releases the lock after a conflicting request arrives. So that is not the fix?

No, that is the change that caused this problem. That fix addresses a slightly different set of symptoms where the truncate as part of an auto-vacuum doesn't happen because the lock gets pre-empted. An unintended/undesirable consequence of that fix was that if vacuum can't do the truncate stage because it can't obtain the lock in the first place, then statistics don't get updated.

(Sorry to ask a second time but I really need to make sure). Thanks, Anne
Re: [HACKERS] Performance with the new security release?
Anne Rosset wrote: Thanks Steve. I found this: http://www.postgresql.org/docs/current/static/release-9-2-3.html

Fix performance problems with autovacuum truncation in busy workloads (Jan Wieck) Truncation of empty pages at the end of a table requires exclusive lock, but autovacuum was coded to fail (and release the table lock) when there are conflicting lock requests. Under load, it is easily possible that truncation would never occur, resulting in table bloat. Fix by performing a partial truncation, releasing the lock, then attempting to re-acquire the lock and continue. This fix also greatly reduces the average time before autovacuum releases the lock after a conflicting request arrives.

So that is not the fix? (Sorry to ask a second time but I really need to make sure).

That's the commit that created the bug, AFAIU. It's a fix for a serious problem, but we overlooked that it introduced some other problems, which is what you're now seeing.

-- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Couple of issues with pg_xlogdump
Andres Freund escribió:

On 2013-04-23 15:16:05 +0530, Pavan Deolasee wrote: It works without either if you use explicit options like -s STARTADDR and -p PATH which is frequently useful to just print a few records at the correct point. I am not sure how I could put that in there without making it too complicated. Any suggestions? Ah ok. Can we mention these details in the documentation? http://www.postgresql.org/docs/devel/static/pgxlogdump.html So something like: At least one of STARTSEG, --start and --path or --rmgr=list has to be specified.

I think we need more than one synopsis line. Maybe:

    Usage: pg_xlogdump [OPTION] --path=PATH --start=STARTPOS
           pg_xlogdump [OPTION] [STARTSEG [ENDSEG]]

And then, under options, do not list --path and --start (because that'd imply they can be used when STARTSEG is specified, which I understand they cannot).

Do we have any other possible operation mode?

IIRC those are the only two possible modes.

-- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] putting a bgworker to rest
Andres Freund wrote: Hi all, I noticed the need to simply stop a bgworker after its work is done but still have it restart in unusual circumstances like a crash. Obviously I can just have it enter a loop where it checks its latch and such, but that seems a bit pointless. Would it make sense to add an extra return value or such for that?

KaiGai also requested some more flexibility in the stop timing and shutdown sequence. I understand the current design that workers are always on can be a bit annoying. How would postmaster know when to restart a worker that stopped?

-- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal to add --single-row to psql
Hello

It is redundant with the current FETCH_COUNT implementation, so I don't see the sense in using them together. Maybe we can drop FETCH_COUNT and replace it with --single-row mode, which could probably simplify the code.

Regards Pavel

2013/4/23 Christopher Manning c...@christophermanning.org: psql currently collects the query result rows in memory before writing them to a file and can cause out of memory problems for large results in low memory environments like ec2.
Re: [HACKERS] Proposal to add --single-row to psql
Christopher Manning c...@christophermanning.org writes: I'm proposing to add a --single-row option to psql that would allow the result rows of a query to be streamed to a file without collecting them in memory first. Isn't there already a way to set FETCH_COUNT from the command line? (ie, I think there's a generic variable-assignment facility that could do this) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
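Indeed, psql's generic variable assignment is reachable from the command line via -v/--variable, so FETCH_COUNT can already be set without a patch. A command fragment for illustration (database, table, and file names are placeholders):

```shell
# Fetch in 1000-row batches via a cursor instead of buffering the whole result
psql -v FETCH_COUNT=1000 -o big_result.txt \
     -c 'SELECT * FROM big_table' mydb
```

This relies on psql's cursor-based FETCH_COUNT machinery rather than libpq single-row mode, which is why it doesn't help on a server (like Redshift) that lacks cursor support.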
Re: [HACKERS] Couple of issues with pg_xlogdump
On 2013-04-23 11:51:06 -0300, Alvaro Herrera wrote: And then, under options, do not list --path and --start (because that'd imply they can be used when STARTSEG is specified, which I understand they cannot).

Both can be used. If you specify --start and STARTSEG the address has to be contained in the file:

    if (XLogRecPtrIsInvalid(private.startptr))
        XLogSegNoOffsetToRecPtr(segno, 0, private.startptr);
    else if (!XLByteInSeg(private.startptr, segno))
    {
        fprintf(stderr, "%s: start log position %X/%X is not inside file \"%s\"\n",
                progname, (uint32) (private.startptr >> 32),
                (uint32) private.startptr, fname);
        goto bad_argument;
    }

--path and STARTSEG/ENDSEG also makes sense; it will be used to locate the file.

Do we have any other possible operation mode? IIRC those are the only two possible modes.

I think so as well.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
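To make the invocation modes concrete, some illustrative command lines (paths, segment names, and WAL positions below are placeholders, not taken from the thread):

```shell
# explicit options only: locate segments via --path, start at --start
pg_xlogdump --path=/pgdata/pg_xlog --start=0/01000000

# segment file(s) given directly
pg_xlogdump 000000010000000000000001 000000010000000000000002

# combined: the start position must fall inside the given segment
pg_xlogdump --start=0/01000028 000000010000000000000001
```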
Re: [HACKERS] putting a bgworker to rest
On 2013-04-23 11:59:43 -0300, Alvaro Herrera wrote: KaiGai also requested some more flexibility in the stop timing and shutdown sequence. I understand the current design that workers are always on can be a bit annoying. How would postmaster know when to restart a worker that stopped?

I had imagined we would assign some return codes special meaning. Currently 0 basically means restart immediately, 1 means crashed, wait for some time, and everything else results in a postmaster restart. It seems we can just assign return code 2 as "done", probably with some enum or such hiding the numbers.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bug Fix: COLLATE with multiple ORDER BYs in aggregates
Folks, While testing the upcoming FILTER clause for aggregates, Erik Rijkers uncovered a long-standing bug in $subject, namely that this case wasn't handled. Please find attached a patch by Andrew Gierth and myself which fixes this issue and adds a regression test to ensure it remains fixed.

Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate

diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c
index 7ed50cd..f432360 100644
--- a/src/backend/parser/parse_collate.c
+++ b/src/backend/parser/parse_collate.c
@@ -319,86 +319,6 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 				}
 			}
 			break;
-		case T_CaseExpr:
-			{
-				/*
-				 * CaseExpr is a special case because we do not want to
-				 * recurse into the test expression (if any).  It was already
-				 * marked with collations during transformCaseExpr, and
-				 * furthermore its collation is not relevant to the result of
-				 * the CASE --- only the output expressions are.  So we can't
-				 * use expression_tree_walker here.
-				 */
-				CaseExpr   *expr = (CaseExpr *) node;
-				Oid			typcollation;
-				ListCell   *lc;
-
-				foreach(lc, expr->args)
-				{
-					CaseWhen   *when = (CaseWhen *) lfirst(lc);
-
-					Assert(IsA(when, CaseWhen));
-
-					/*
-					 * The condition expressions mustn't affect the CASE's
-					 * result collation either; but since they are known to
-					 * yield boolean, it's safe to recurse directly on them
-					 * --- they won't change loccontext.
-					 */
-					(void) assign_collations_walker((Node *) when->expr,
-													&loccontext);
-					(void) assign_collations_walker((Node *) when->result,
-													&loccontext);
-				}
-				(void) assign_collations_walker((Node *) expr->defresult,
-												&loccontext);
-
-				/*
-				 * Now determine the CASE's output collation.  This is the
-				 * same as the general case below.
-				 */
-				typcollation = get_typcollation(exprType(node));
-				if (OidIsValid(typcollation))
-				{
-					/* Node's result is collatable; what about its input? */
-					if (loccontext.strength > COLLATE_NONE)
-					{
-						/* Collation state bubbles up from children. */
-						collation = loccontext.collation;
-						strength = loccontext.strength;
-						location = loccontext.location;
-					}
-					else
-					{
-						/*
-						 * Collatable output produced without any collatable
-						 * input.  Use the type's collation (which is usually
-						 * DEFAULT_COLLATION_OID, but might be different for a
-						 * domain).
-						 */
-						collation = typcollation;
-						strength = COLLATE_IMPLICIT;
-						location
Re: [HACKERS] putting a bgworker to rest
Andres Freund wrote: I had imagined we would assign some return codes special meaning. Currently 0 basically means restart immediately, 1 means crashed, wait for some time, everything else results in a postmaster restart. It seems we can just assign return code 2 as "done", probably with some enum or such hiding the numbers.

So a "done" worker would never be restarted, until postmaster sees a crash or is itself restarted? I guess that'd be useful for workers running during recovery, which terminate when recovery completes. Is that your use case?

-- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] putting a bgworker to rest
On 2013-04-23 14:11:26 -0300, Alvaro Herrera wrote: Andres Freund wrote: On 2013-04-23 11:59:43 -0300, Alvaro Herrera wrote: Andres Freund wrote: Hi all, I noticed the need to simply stop a bgworker after its work is done but still have it restart in unusual circumstances like a crash. Obviously I can just have it enter a loop where it checks its latch and such, but that seems a bit pointless. Would it make sense to add an extra return value or such for that? KaiGai also requested some more flexibility in the stop timing and shutdown sequence. I understand the current design that workers are always on can be a bit annoying. How would postmaster know when to restart a worker that stopped? I had imagined we would assign some return codes special meaning. Currently 0 basically means restart immediately, 1 means crashed, wait for some time, everything else results in a postmaster restart. It seems we can just assign return code 2 as done, probably with some enum or such hiding the numbers. So a done worker would never be restarted, until postmaster sees a crash or is itself restarted? I guess that'd be useful for workers running during recovery, which terminate when recovery completes. Is that your use case? Well, it's not actual postgres recovery, but something similar in the context of logical replication. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] minimizing the target list for foreign data wrappers
In case anyone is interested, I tried it and it doesn't seem to work. It looks like some other plan element already has the target-list tuple baked. Now I'm trying to decide whether to give up on FDW. It's a shame because it's such a sweet facility, but at this point, I just don't think that it's mature enough for what I need to do. Regards, David Gudeman On Mon, Apr 22, 2013 at 11:27 AM, David Gudeman dave.gude...@gmail.com wrote: Re-reading my first email I thought it was a little confusing, so here is some clarification. In GetForeignPlan, tlist seems to be a target list for a basic select * from the foreign table. For the ith TargetEntry te in tlist, it seems that te->expr is a Var with varattno=i. I was mis-remembering and calling varattno attrno in the original email. My assumption is that the plan elements that use the output of the FDW plan node will access columns indirectly using tlist. In other words, I'm assuming that if there is a reference to a column c of the foreign table, this column will be represented as a Var with varattno being an offset into tlist. So if c is column number 3, for example, you get its value by looking up TargetEntry number 3 in tlist and evaluating the expr column for that TargetEntry. So if I change the Var in the expr column so the varattno points to a different column in the output tuple, then everything will work. The two risky assumptions I'm making are 1. that it actually uses this indirect way of looking up columns in a foreign table and 2. that it actually uses the tlist that I pass in when I call make_foreignscan(). Can anyone confirm or deny these assumptions? Thanks. On Sun, Apr 21, 2013 at 6:57 PM, David Gudeman dave.gude...@gmail.com wrote: A few years ago I wrote a roll-your-own foreign-data-wrapper system for Postgres because Postgres didn't have one at the time (some details here (http://unobtainabol.blogspot.com/2013/04/dave-foreign-data-introuction.html) if anyone is interested). 
Now I'm being tasked to move it to Postgres 9.2.x and I'd like to use FDW if possible. One of the problems I'm having is that in my application, the foreign tables typically have hundreds of columns while typical queries only access a dozen or so (the foreign server is a columnar SQL database). Furthermore, there is no size optimization for NULL values passed back from the foreign server, so if I return all of the columns from the table --even as NULLs-- the returned data size will be several times the size that it needs to be. My application cannot tolerate this level of inefficiency, so I need to return minimal columns from the foreign table. The documentation doesn't say how to do this, but looking at the code I think it is possible. In GetForeignPlan() you have to pass on the tlist argument, which I presume means that the query plan will use the tlist that I pass in, right? If so, then it should be possible for me to write a function that takes tlist and baserel->reltargetlist and returns a version of tlist that knows which foreign-table columns are actually used, and replaces the rest with a NULL constant. For example, suppose the original tlist is this: [VAR(attrno=1), VAR(attrno=2), VAR(attrno=3)] and reltargetlist says that I only need args 1 and 3. Then the new tlist would look like this: [VAR(attrno=1), CONST(val=NULL), VAR(attrno=2)] where the attrno of the last VAR has been reduced by one because column 2 is no longer there. I did something very much like this in my roll-your-own version of FDW so I know basically how to do it, but I did it at the pre-planning stage and I'm not sure how much is already packed into the other plan nodes at this point. Maybe it's too late to change the target list? Can anyone give me some advice or warnings on this? I'd hate to go to the trouble of implementing and testing it only to find that I'm making some bogus assumptions. 
Thanks, David Gudeman
Re: [HACKERS] Couple of issues with pg_xlogdump
On Tue, Apr 23, 2013 at 2:30 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-23 14:51:05 +0530, Pavan Deolasee wrote: [pavan.deolasee@puppetserver pg_xlogdump]$ ./pg_xlogdump ~/db93head/pg_xlog/00010008 pg_xlogdump: FATAL: could not find a valid record after 0/800 So pg_xlogdump gives an error for all WAL files except the first one. Should it not have printed the WAL records from these files? Probably not. Those are likely renamed WAL files that do not yet contain valid data. But they do contain valid data, just not for the name the file currently has. If you can guess what the pre-recycle name was, you can rename the file (as an out-of-tree copy of course) and then get the dump out of it, which I've found can be quite useful. Perhaps there should be an option for it to press on regardless and dump the contents as if it were the pre-recycled file. Or maybe just have the error message report to you what filename the data it sees would be valid for, so you can know what to rename it to without needing to guess. I don't really know how this would interact with the STARTSEG and ENDSEG range, though, so perhaps it would only apply when just one file is given, not a range. Cheers, Jeff
Re: [HACKERS] minimizing the target list for foreign data wrappers
David, Please post your patch(es) and some demo of how things broke so others can improve future versions--possibly even 9.3 versions if it turns out you've discovered a bug in the implementation. Thanks very much for your hard work and insights into this. Cheers, David. On Tue, Apr 23, 2013 at 11:08:04AM -0700, David Gudeman wrote: In case anyone is interested, I tried it and it doesn't seem to work. It looks like some other plan element already has the target-list tuple baked. Now I'm trying to decide whether to give up on FDW. It's a shame because it's such a sweet facility, but at this point, I just don't think that it's mature enough for what I need to do. Regards, David Gudeman On Mon, Apr 22, 2013 at 11:27 AM, David Gudeman dave.gude...@gmail.com wrote: Re-reading my first email I thought it was a little confusing, so here is some clarification. In GetForeignPlan, tlist seems to be a target list for a basic select * from the foreign table. For the ith TargetEntry te in tlist, it seems that te->expr is a Var with varattno=i. I was mis-remembering and calling varattno attrno in the original email. My assumption is that the plan elements that use the output of the FDW plan node will access columns indirectly using tlist. In other words, I'm assuming that if there is a reference to a column c of the foreign table, this column will be represented as a Var with varattno being an offset into tlist. So if c is column number 3, for example, you get its value by looking up TargetEntry number 3 in tlist and evaluating the expr column for that TargetEntry. So if I change the Var in the expr column so the varattno points to a different column in the output tuple, then everything will work. The two risky assumptions I'm making are 1. that it actually uses this indirect way of looking up columns in a foreign table and 2. that it actually uses the tlist that I pass in when I call make_foreignscan(). Can anyone confirm or deny these assumptions? Thanks. 
On Sun, Apr 21, 2013 at 6:57 PM, David Gudeman dave.gude...@gmail.com wrote: A few years ago I wrote a roll-your-own foreign-data-wrapper system for Postgres because Postgres didn't have one at the time (some details here (http://unobtainabol.blogspot.com/2013/04/dave-foreign-data-introuction.html) if anyone is interested). Now I'm being tasked to move it to Postgres 9.2.x and I'd like to use FDW if possible. One of the problems I'm having is that in my application, the foreign tables typically have hundreds of columns while typical queries only access a dozen or so (the foreign server is a columnar SQL database). Furthermore, there is no size optimization for NULL values passed back from the foreign server, so if I return all of the columns from the table --even as NULLs-- the returned data size will be several times the size that it needs to be. My application cannot tolerate this level of inefficiency, so I need to return minimal columns from the foreign table. The documentation doesn't say how to do this, but looking at the code I think it is possible. In GetForeignPlan() you have to pass on the tlist argument, which I presume means that the query plan will use the tlist that I pass in, right? If so, then it should be possible for me to write a function that takes tlist and baserel->reltargetlist and returns a version of tlist that knows which foreign-table columns are actually used, and replaces the rest with a NULL constant. For example, suppose the original tlist is this: [VAR(attrno=1), VAR(attrno=2), VAR(attrno=3)] and reltargetlist says that I only need args 1 and 3. Then the new tlist would look like this: [VAR(attrno=1), CONST(val=NULL), VAR(attrno=2)] where the attrno of the last VAR has been reduced by one because column 2 is no longer there. 
I did something very much like this in my roll-your-own version of FDW so I know basically how to do it, but I did it at the pre-planning stage and I'm not sure how much is already packed into the other plan nodes at this point. Maybe it's too late to change the target list? Can anyone give me some advice or warnings on this? I'd hate to go to the trouble of implementing and testing it only to find that I'm making some bogus assumptions. Thanks, David Gudeman -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] GSOC Student Project Idea
Michael Schuh wrote: http://www.cs.montana.edu/~timothy.wylie/files/bncod13.pdf Um. From the paper I get the impression that there's yet much to learn in order for this indexing method to be really effective? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 9.3 Beta1 status report
On Tue, Apr 23, 2013 at 10:12:58AM +0100, Dean Rasheed wrote: On 21 April 2013 06:02, Bruce Momjian br...@momjian.us wrote: I am not sure if Tom shared yet, but we are planning to package 9.3 beta1 on April 29, with a release on May 2. Those dates might change, but that is the current plan. I have completed a draft of the 9.3 release notes, which you can view here: http://momjian.us/pgsql_docs/release-9-3.html I will be working on polishing them for the next ten days, so any feedback, patches, or commits are welcome. I still need to add lots of SGML markup. E.1.3.4.4. VIEWs: * Make simple views auto-updatable (Dean Rasheed) INSTEAD rules are still available for complex views. I think this should refer to INSTEAD OF triggers for complex views, since that is now the recommended way to implement updatable views. I think it should also expand a little on what simple means in this context, without going into all the gory details, and mention that there is a behaviour change. So perhaps something like this for the second paragraph: Simple views that select some or all columns from a single base table are now updatable by default. More complex views can be made updatable using INSTEAD OF triggers or INSTEAD rules. I like it! Done. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] 9.3 Beta1 status report
On Tue, Apr 23, 2013 at 05:36:03PM +0800, Jov wrote: E.1.3.1.4: Improve performance of the CREATE TABLE ... ON COMMIT DELETE ROWS clause by only issuing delete if the temporary table was accessed (Heikki Linnakangas) should be: CREATE TEMP TABLE ... ON COMMIT DELETE ROWS or CREATE { TEMPORARY | TEMP } TABLE ... ON COMMIT DELETE ROWS because there is no syntax for CREATE TABLE ... ON COMMIT DELETE ROWS Oh, good point. Fixed. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] 9.3 Beta1 status report
I just spotted some more small stuff: s/IF NOT EXIST /IF NOT EXISTS /g # 2 x It actually had me doubting, but yes that -S should be there... Thanks, Erik Rijkers
Re: [HACKERS] 9.3 Beta1 status report
On Tue, Apr 23, 2013 at 02:25:08PM +0300, Heikki Linnakangas wrote: On 22.04.2013 23:06, Bruce Momjian wrote: On Mon, Apr 22, 2013 at 10:11:48PM +0300, Heikki Linnakangas wrote: E.1.3.2.1. Write-Ahead Log (WAL) Store WAL in a continuous stream, rather than skipping the last 16MB segment every 4GB (Heikki Linnakangas) BACKWARD COMPATIBLE BREAK Restructure WAL files to better handle timeline changes during recovery (Heikki Linnakangas) Restructure WAL files to use a more compact storage format (Heikki Linnakangas) Can you clarify which commits these came from? The first one is clear (dfda6eba), and I think the 3rd covers commits 20ba5ca6 and 061e7efb1. But what is that second entry? Frankly, I found the WAL and timeline commits all over the place and could hardly make sense of it. I tried to collapse entries into meaningful items, but I need help. Can you suggest changes? Ok: * Don't skip the last 16 MB WAL segment every 4GB, with filename ending in FF (Heikki Linnakangas) BACKWARD COMPATIBLE BREAK Uh, I am unclear why this is better than the original text. We don't normally explain things like WAL file patterns in the release notes. * Change WAL record format, allowing the record header to be split across pages. The new format is slightly more compact. (Heikki Linnakangas) Done. In Source Code section: * Use a 64-bit integer to represent WAL positions (XLogRecPtr), instead of two 32-bit integers. (Heikki Linnakangas) Added. Do we usually repeat the changes listed in the backwards compatibility section later, in the Changes section? If not, then instead of the first two items above, let's just have these in the backwards-compatibility section: We do not repeat the incompatible items later in release notes. I have added some of your text to one of the items, and added a mention that tooling will need adjustment. Please check the post-commit version and let me know about adjustments. 
-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] 9.3 Beta1 status report
On Tue, Apr 23, 2013 at 11:00:31PM +0200, Erikjan Rijkers wrote: I just spotted some more small stuff: s/IF NOT EXIST /IF NOT EXISTS /g # 2 x It actually had me doubting, but yes that -S should be there... Fixed, thanks. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] 9.3 Beta1 status report
On Tue, Apr 23, 2013 at 05:04:15PM -0400, Bruce Momjian wrote: Do we usually repeat the changes listed in the backwards compatibility section later, in the Changes section? If not, then instead of the first two items above, let's just have these in the backwards-compatibility section: We do not repeat the incompatible items later in release notes. I have added some of your text to one of the items, and added a mention that tooling will need adjustment. Please check the post-commit version and let me know about adjustments. Let me clarify --- changes to our WAL binary format and source code changes are not really incompatibilities from a user perspective as we never promise to do our best to minimize such changes --- meaning the fact the WAL format changed is something that would be only in the source code section and not in the incompatibilities section --- incompatibilities are related to change in user experience or documented-API changes. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] GSOC Student Project Idea
Michael, On Mon, Apr 22, 2013 at 4:28 PM, Michael Schuh schuh.m...@gmail.com wrote: Although I do not have a lot of experience with PostgreSQL development, I am eager to learn and commit my summer to enabling another fantastic feature for the community. Since iDistance is a non-recursive, data-driven, space-based partitioning strategy which builds directly onto a B+-tree, I believe the implementation should be possible using only GiST support. Please let me know if this is of any interest, or if you have any additional questions. Unfortunately, I will be unavailable most of the day, but I plan to fill out the GSOC application later this evening. I've taken a brief look at the paper and implementation. As far as I can see, iDistance implements a global building strategy. I mean, for example, it selects some point, calculates distances from the selected point to all points in the dataset, etc. So, it uses the whole dataset at the same time. GiST and SP-GiST behave differently. They have interface functions which operate on at most a very small portion of data, one that fits on a disk page. You can try to fit iDistance into the GiST or SP-GiST interface. But will it be that efficient in this case? I've experimented with fitting the VP-tree (which looks similar to iDistance at first glance) into SP-GiST. My results were depressing: it was useless with levenshtein distance, and it was worse than the R-tree with multidimensional points. So I think we really need a global index building algorithm in order to reveal the power of such data structures. But GiST and SP-GiST currently don't support it. However, you can try to implement global index building in GiST or SP-GiST. In this case I think you should carefully estimate your capabilities during a single GSoC project. You would need to extend the GiST or SP-GiST interface and write a completely new implementation of the tree building algorithm. 
The question of how exactly to extend the GiST or SP-GiST interface for this case could turn out to be very hard, even theoretically. In short, my opinion is this: don't expect a miracle from just implementing GiST or SP-GiST interface functions. -- With best regards, Alexander Korotkov.
Re: [HACKERS] REFRESH MATERIALIZED VIEW command in PL block hitting Assert
Jeevan Chalke jeevan.cha...@enterprisedb.com wrote: On Mon, Apr 22, 2013 at 6:41 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-22 18:35:04 +0530, Jeevan Chalke wrote: I have observed that the following sequence is causing a server crash. CREATE MATERIALIZED VIEW temp_class_mv AS SELECT * FROM pg_class WITH NO DATA; CREATE OR REPLACE FUNCTION test_refresh_mv() RETURNS int AS $$ BEGIN REFRESH MATERIALIZED VIEW temp_class_mv; return 1; END; $$ LANGUAGE plpgsql; SELECT test_refresh_mv(); I had a quick look over the crash and it is hitting the following Assert in spi.c:

else if (IsA(stmt, RefreshMatViewStmt))
{
	Assert(strncmp(completionTag, "REFRESH MATERIALIZED VIEW ", 23) == 0);
	_SPI_current->processed = strtoul(completionTag + 23, NULL, 10);
}

It seems like we are missing the expected value for completionTag in ExecRefreshMatView(). Possibly independent from this issue, but where did that 23 come from? When the consensus developed to change the syntax from LOAD MATERIALIZED VIEW I failed to notice the length here when making the changes for that. BTW, attached is the patch which works well for me, but needs detailed review. I suggest that we just rip out this section of code. Trying to provide a number here is probably all wrong, anyway. As the features evolve, there may not be a readily accessible rowcount for this command in all cases. Any objections to the attached to fix this issue? 
-- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index cc7764d..de8d59a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2122,13 +2122,6 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 			if (((CreateTableAsStmt *) stmt)->is_select_into)
 				res = SPI_OK_SELINTO;
 		}
-		else if (IsA(stmt, RefreshMatViewStmt))
-		{
-			Assert(strncmp(completionTag,
-						   "REFRESH MATERIALIZED VIEW ", 23) == 0);
-			_SPI_current->processed = strtoul(completionTag + 23,
-											  NULL, 10);
-		}
 		else if (IsA(stmt, CopyStmt))
 		{
 			Assert(strncmp(completionTag, "COPY ", 5) == 0);
Re: [HACKERS] 9.3 Beta1 status report
Bruce Momjian wrote: On Tue, Apr 23, 2013 at 05:04:15PM -0400, Bruce Momjian wrote: Do we usually repeat the changes listed in the backwards compatibility section later, in the Changes section? If not, then instead of the first two items above, let's just have these in the backwards-compatibility section: We do not repeat the incompatible items later in release notes. I have added some of your text to one of the items, and added a mention that tooling will need adjustment. Please check the post-commit version and let me know about adjustments. Let me clarify --- changes to our WAL binary format and source code changes are not really incompatibilities from a user perspective as we never promise to do our best to minimize such changes --- meaning the fact the WAL format changed is something that would be only in the source code section and not in the incompatibilities section --- incompatibilities are related to change in user experience or documented-API changes. These guidelines make sense, except I think the change in naming standard of xlog segments is important to document clearly because, even if we have not promised compatibility, people could be relying on it in scripts. I think it makes sense to waste a couple of lines documenting this change, even if we expect the number of people to be bitten by it to be very low. Unrelated: I think this item Add configuration variable lock_timeout to limit lock wait duration (Zoltán Böszörményi) should go in the locking section. What's of interest here is the new feature to set maximum lock waits. The fact that this is done using a configuration variable is not that important. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Enabling Checksums
On Tue, 2013-04-23 at 16:28 +0100, Simon Riggs wrote: * make the pg_control.data_checksums field into a version number, for future flexibility... patch attached Commenting on this separately because it's a separate issue. I'd prefer that it was some kind of a checksum ID code -- e.g. 0 for no checksum, 1 for FNV-1a-SR3, etc. That would allow us to release 9.4 with a new algorithm without forcing existing users to change. initdb would have to take the code as an option, probably in string form. What do you think? Regards, Jeff Davis
[HACKERS] a patch for row-at-a-time execution for table functions
The primary change we made to Postgres in order to support our own version of foreign data wrappers was row-at-a-time execution for table functions. In standard Postgres, when you execute a table function, it gathers all of the rows at once and stuffs them into a buffer in order to support cursors, even if it is just a vanilla forward scan. We modified the code so that when you do vanilla forward scans it executes the function one row at a time. This wasn't a big change since the support for executing functions that way was always in there but it was bypassed in the plan execution code. Probably someone always intended to do this, but never got around to it. We also encountered what I think is a bug in the way that the cleanup callback functions are called at the end of a table function. I'm not sure I remember the details correctly, but I believe it was freeing the memory region used for the private state before calling the cleanup function. If that was the bug though, why didn't I just use malloc for the private state? I'll have to review my notes on that one... Anyway, my company has agreed to let me post a patch for these changes. Before I go to the work of getting the patches ready, I'd like to take the temperature of the committers and find out if there is any likelihood of getting these patches accepted. Anyone?
Re: [HACKERS] a patch for row-at-a-time execution for table functions
On 04/23/2013 06:09 PM, David Gudeman wrote: The primary change we made to Postgres in order to support our own version of foreign data wrappers was row-at-a-time execution for table functions. In standard Postgres, when you execute a table function, it gathers all of the rows at once and stuffs them into a buffer in order to support cursors, even if it is just a vanilla forward scan. We modified the code so that when you do vanilla forward scans it executes the function one row at a time. This wasn't a big change since the support for executing functions that way was always in there but it was bypassed in the plan execution code. Probably someone always intended to do this, but never got around to it. Guilty as charged :-( We also encountered what I think is a bug in the way that the cleanup callback functions are called at the end of a table function. I'm not sure I remember the details correctly, but I believe it was freeing the memory region used for the private state before calling the cleanup function. If that was the bug though, why didn't I just use malloc for the private state? I'll have to review my notes on that one... Anyway, my company has agreed to let me post a patch for these changes. Before I go to the work of getting the patches ready, I'd like to take the temperature of the committers and find out if there is any likelihood of getting these patches accepted. Anyone? +1 I have always been in favor of having this option -- in fact that was the way it was originally coded IIRC. Something caused us to switch to materialized mode and we decided we would wait a while to see how that went before worrying again about value-per-call mode. At this point you're looking at 9.4 for this, but I'll try to help see it through since it has long been on my TODO list. 
Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support
Re: [HACKERS] 9.3 Beta1 status report
On Tue, Apr 23, 2013 at 06:56:34PM -0300, Alvaro Herrera wrote: Bruce Momjian wrote: On Tue, Apr 23, 2013 at 05:04:15PM -0400, Bruce Momjian wrote: Do we usually repeat the changes listed in the backwards compatibility section later, in the Changes section? If not, then instead of the first two items above, let's just have these in the backwards-compatibility section: We do not repeat the incompatible items later in release notes. I have added some of your text to one of the items, and added a mention that tooling will need adjustment. Please check the post-commit version and let me know about adjustments. Let me clarify --- changes to our WAL binary format and source code changes are not really incompatibilities from a user perspective as we never promise to do our best to minimize such changes --- meaning the fact the WAL format changed is something that would be only in the source code section and not in the incompatibilities section --- incompatibilities are related to change in user experience or documented-API changes. These guidelines make sense, except I think the change in naming standard of xlog segments is important to document clearly because, even if we have not promised compatibility, people could be relying on it in scripts. I think it makes sense to waste a couple of lines documenting this change, even if we expect the number of people to be bitten by it to be very low. Agreed. Here is the new text: Store WAL in a continuous stream, rather than skipping the last 16MB segment every 4GB (Heikki Linnakangas) BACKWARD COMPATIBLE BREAK Previously, WAL files ending in FF were not used. If you have WAL backup or restore scripts that took that skipping into account, they need to be adjusted. Unrelated: I think this item Add configuration variable lock_timeout to limit lock wait duration (Zoltán Böszörményi) should go in the locking section. What's of interest here is the new feature to set maximum lock waits. 
The fact that this is done using a configuration variable is not that important. Agreed. Moved. Thanks. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
[HACKERS] 9.3 release notes suggestions
Thanks for the many suggestions on improving the 9.3 release notes. There were many ideas I would have never thought of. Please keep the suggestions coming. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +