Re: [HACKERS] Minmax indexes
On Wed, September 25, 2013 00:14, Alvaro Herrera wrote: [minmax-4-incr.patch]

After a --data-checksums initdb (successful), the following error came up after the statement:

  create index t_minmax_idx on t using minmax (r);

  WARNING:  page verification failed, calculated checksum 25951 but expected 0
  ERROR:  invalid page in block 1 of relation base/21324/26267_vm

It happens reliably, every time I run the program. Below is the whole program that I used.

Thanks,

Erik Rijkers

#!/bin/sh
t=t
if [[ 1 -eq 1 ]]; then
  echo "
    drop table if exists $t;
    create table $t as
      select i, cast(random() * 10^9 as integer) as r
      from generate_series(1, 100) as f(i);
    analyze $t;
    table $t limit 5;
    select count(*) from $t;
    explain analyze select min(r), max(r) from $t;
    select min(r), max(r) from $t;
    create index ${t}_minmax_idx on $t using minmax (r);
    analyze $t;
    explain analyze select min(r), max(r) from $t;
    select min(r), max(r) from $t;
  " | psql
fi
Re: [HACKERS] pg_dump/restore encoding woes
On 27 August 2013 20:06, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 26.08.2013 18:59, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes: The pg_dump -E option just sets client_encoding, but I think it would be better for -E to only set the encoding used in the dump, and for the PGCLIENTENCODING env variable (if set) to be used to determine the encoding of the command-line arguments. Opinions?

I think this is going to be a lot easier said than done, but feel free to see if you can make it work. (As you point out, we don't have any client-side encoding conversion infrastructure, but I don't see how you're going to make this work without it.)

First set client_encoding to PGCLIENTENCODING (ie. let libpq do its default thing), and run the queries that fetch the OIDs of any matching tables. Only after doing that, set client_encoding to that specified by -E. That's actually pretty easy to do, as pg_dump already issues all the queries that include user-given strings first. There's one small wrinkle in that: if the dump fails because the server sends an error, the error would come from the server in the -E encoding, because that's used as the client_encoding after the initial queries. Logically, the errors should be printed in PGCLIENTENCODING, but I think we can live with that.

The pg_restore part is more difficult, as pg_restore needs to work without a database connection at all. There the conversion has to be done client-side.

A second issue is whether we should divorce -E and PGCLIENTENCODING like that, when they have always meant the same thing. You mentioned the alternative of looking at pg_dump's locale environment to determine the command-line encoding --- would that be better?

Hmm. I wasn't thinking of looking at the locale environment as an alternative to the divorce, just as a way to determine the default for the client encoding if PGCLIENTENCODING is not set. It would be good for pg_dump to be consistent with other tools (reindexdb etc.). If we say that LC_CTYPE determines the encoding of command-line options, regardless of PGCLIENTENCODING, we should do the same in the other tools too, and that would be weird for them. So no, I don't think we should do that.

My plan is to assume that the command-line options are in the client encoding, and the client encoding is determined by the following things, in this order:

1. client_encoding setting given explicitly in the command line, as part of the connection string.
2. PGCLIENTENCODING
3. LC_CTYPE
4. same as server_encoding

The encoding to be used in the pg_dump output is a different concept, and there are reasons to *not* want the dump to be in the client encoding. Hence using server_encoding for that would be a better default than the current client encoding. This isn't so visible today, as client_encoding defaults to server_encoding anyway, but it will be if we set client_encoding='auto' (ie. look at LC_CTYPE) by default.

There are certainly cases where you'd want to use the client encoding as the dump output encoding. For example, if you pipe the pg_dump output to some custom script that mangles it, or if you just want to look at the output, ie. if the output goes to a terminal. But for the usual case of taking a backup, using the server encoding makes more sense. One option is to check if the output is a tty (like psql does), and output the dump in client or server encoding depending on that (if -E is not given).

So yeah, I think we should divorce -E and PGCLIENTENCODING.
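For illustration, the four-step precedence above amounts to a simple first-match-wins resolution, sketched here in standalone C (a hypothetical helper, not pg_dump's actual code; each argument is NULL when that source is absent):

#include <stddef.h>

static const char *
resolve_client_encoding(const char *connstr_encoding,  /* 1. connection string */
                        const char *pgclientencoding,  /* 2. PGCLIENTENCODING */
                        const char *lc_ctype_encoding, /* 3. derived from LC_CTYPE */
                        const char *server_encoding)   /* 4. fallback */
{
    if (connstr_encoding != NULL)
        return connstr_encoding;
    if (pgclientencoding != NULL)
        return pgclientencoding;
    if (lc_ctype_encoding != NULL)
        return lc_ctype_encoding;
    return server_encoding;
}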
It feels that using PGCLIENTENCODING to specify the output encoding was more accidental than on purpose. Looking at the history, the implementation has been that way forever, but the docs were adjusted by commit b524cb36 in 2005 to document that fact.

Here is a set of three patches that I've been working on:

0001-Divorce-pg_dump-E-option-from-PGCLIENTENCODING.patch

Separates pg_dump -E from PGCLIENTENCODING. Since AH->encoding is now used only to represent the dump encoding, we should rename it to AH->dump_encoding.

- The standard_conforming_strings parameter is currently set in setup_connection(). You have moved it into setup_dump_encoding(). Is it in any way related to encoding? If not, I think we should keep it in setup_connection().

- If a user supplies pg_dump --role=ROLENAME, we do the SET ROLE in setup_connection(). The role name is in the client encoding. In the case of parallel pg_dump, the setup_connection() in the parent process does it right, but in the worker processes the client_encoding is already set to the dump_encoding when -E is given (the parent has already updated its client_encoding to the dump_encoding, and I think the child inherits client_encoding from the parent), and then when SET ROLE is issued through setup_connection() in the child, it does not work, because the role name is in the client encoding, not the dump encoding.

$ pg_dump -t pöö -E
[HACKERS] invalid regexp crashes the server on windows or 9.3
Hi,

This regexp call freezes with almost 100% CPU usage, so I guess it creates an infinite loop. With Postgres 9.1 on Linux, I can kill the backend cleanly and get the following message:

cic_db=# select regexp_matches ('aa bb aa ba baa x','(^((?!aa)))+','gx');
ERROR:  invalid regular expression: nfa has too many states

On Windows and Postgres 9.3, I used the terminate icon from pgAdmin, which resulted in a FATAL error, killed all other processes and finally stopped the server. I guess this is a Windows issue, but we do not have a Postgres 9.3 installation on Linux yet, so I can't test it.

Here are the log entries from Windows 9.3 (translated from German):

2013-09-25 09:02:40 CEST LOG:  server process (PID 5952) was terminated by exception 0xC0FD
2013-09-25 09:02:40 CEST DETAIL:  Failed process was running: select regexp_matches ('aa bb aa ba baa x','(^((?!aa)))+','gx')
2013-09-25 09:02:40 CEST HINT:  See C include file ntstatus.h for a description of the hexadecimal value.
2013-09-25 09:02:40 CEST LOG:  terminating any other active server processes
2013-09-25 09:02:40 CEST WARNING:  terminating connection because of crash of another server process
2013-09-25 09:02:40 CEST DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-09-25 09:02:40 CEST HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2013-09-25 09:02:40 CEST WARNING:  terminating connection because of crash of another server process
2013-09-25 09:02:40 CEST DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-09-25 09:02:40 CEST HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2013-09-25 09:02:40 CEST LOG:  all server processes terminated; reinitializing
2013-09-25 09:02:50 CEST FATAL:  pre-existing shared memory block is still in use
2013-09-25 09:02:50 CEST HINT:  Check if there are any old server processes still running, and terminate them.

regards,

Marc Mamin
Re: [HACKERS] Minmax indexes
On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

Hi,

Here's a reviewable version of what I've dubbed Minmax indexes. Some people said they would like to use some other name for this feature, but I have yet to hear usable ideas, so for now I will keep calling them this way. I'm open to proposals, but if you pick something that cannot be abbreviated "mm" I might have you prepare a rebased version which renames the files and structs.

The implementation here has been simplified from what I originally proposed at 20130614222805.gz5...@eldon.alvh.no-ip.org -- in particular, I noticed that there's no need to involve aggregate functions at all; we can just use inequality operators. So the pg_amproc entries are gone; only the pg_amop entries are necessary.

I've somewhat punted on the question of doing resummarization separately from vacuuming. Right now, resummarization (as well as other necessary index cleanup) takes place in amvacuumcleanup. This is not optimal; I have stated elsewhere that I'd like to create separate maintenance actions that can be carried out by autovacuum. That would be useful both for Minmax indexes and GIN indexes (pending insertion list); maybe others. That's not part of this patch, however.

The design of this stuff is in the file minmax-proposal at the top of the tree. That file is up to date, though it still contains some open questions that were present in the original proposal. (I have not fixed some bogosities pointed out by Noah, for instance. I will do that shortly.) In a final version, that file would be applied as src/backend/access/minmax/README, most likely.

One area in which I needed to modify core code is IndexBuildHeapScan. I needed a version that was able to scan only a certain range of pages, not the entire table, so I introduced a new IndexBuildHeapRangeScan, and added a quick heap_scansetlimits function. I haven't tested that this works outside of the HeapRangeScan thingy, so it's probably completely bogus; I'm open to suggestions if people think this should be implemented differently. In any case, keeping that implementation together with vanilla IndexBuildHeapScan makes a lot of sense.

One thing still to tackle is when to mark ranges as unsummarized. Right now, any new tuple on a page range would cause a new index entry to be created and a new revmap update. This would cause huge index bloat if, say, a page is emptied and vacuumed and filled with new tuples with increasing values outside the original range; each new tuple would create a new index tuple. I have two ideas about this (1. mark range as unsummarized if it's the 3rd time we touch the same page range;

Why only at the 3rd time? Doesn't it need to be precise? Like, if someone inserts a row having a value greater than the max value of the corresponding index tuple, then that index tuple's max value needs to be updated, and I think it's updated with the help of the validity map. For example, considering we need to store the below info for each index tuple:

In each index tuple (corresponding to one page range), we store:
- first block this tuple applies to
- last block this tuple applies to
- for each indexed column:
  * min() value across all tuples in the range
  * max() value across all tuples in the range

Assume the first and last block for an index tuple are the same (assume block no. 'x') and the min value is 5 and the max is 10. Now a user inserts/updates a value in block 'x' such that the max value of the indexed column is 11; if we don't update the corresponding index tuple or at least invalidate it, won't it lead to wrong results?

2. vacuum the affected index page if it's full, so we can maintain the index always up to date without causing undue bloat), but I haven't implemented anything yet.

The amcostestimate routine is completely bogus; right now it returns constant 0, meaning the index is always chosen if it exists.

I think for the first version you might want to keep things simple, but there should be some way for the optimizer to select this index. So rather than choosing it whenever it is present, we can make the optimizer choose it when someone sets enable_minmax index to true.

How about keeping this up-to-date during foreground operations? Vacuum/maintainer tasks maintaining things usually have problems of bloat, and then we need to optimize or work around issues. Lots of people have raised this or a similar point previously, and from what I read you are of the opinion that it would be too slow. I really don't think it can be so slow that adding so much handling to get it up-to-date by some maintainer task is useful. Currently there are systems like Oracle where index clean-up is mainly done during foreground operation, so this alone cannot be a reason for slowness. Comparing the logic with IOS is also not completely right, as for IOS we need to know each tuple's visibility, which is not the case here. Now it can so happen
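As a reading aid, the per-range index tuple described in the quoted proposal boils down to something like the following (illustrative standalone C for a single integer column, not the patch's actual on-disk MMTuple format):

#include <stdint.h>

/* Conceptual summary tuple for one heap page range. */
typedef struct MinmaxSummary
{
    uint32_t first_block;   /* first heap block this tuple applies to */
    uint32_t last_block;    /* last heap block this tuple applies to */
    int64_t  min_value;     /* min() across all tuples in the range */
    int64_t  max_value;     /* max() across all tuples in the range */
} MinmaxSummary;

/*
 * A scan can skip the whole range only when the quals exclude
 * [min_value, max_value]; e.g. for "WHERE r = K", skip when
 * K < min_value || K > max_value.
 */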
Re: [HACKERS] invalid regexp crashes the server on windows or 9.3
On Wed, September 25, 2013 09:33, Marc Mamin wrote:

Hi, This regexp call freezes with almost 100% CPU usage, so I guess it creates an infinite loop. With Postgres 9.1 on Linux, I can kill the backend cleanly and get the following message:

cic_db=# select regexp_matches ('aa bb aa ba baa x','(^((?!aa)))+','gx');
ERROR:  invalid regular expression: nfa has too many states

On Windows and Postgres 9.3, I used the terminate icon from pgAdmin, which resulted in a FATAL error, killed all other processes and finally stopped the server. I guess this is a Windows issue, but we do not have a Postgres 9.3 installation on Linux yet, so I can't test it.

On 9.3 and 9.4devel, on Linux (CentOS 6.4), that statement / regex returns after ~2 minutes (on a modest desktop) with that same error, but without crashing.

Erik Rijkers
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Wed, Sep 25, 2013 at 1:20 AM, Robert Haas robertmh...@gmail.com wrote:

On Tue, Sep 24, 2013 at 5:04 AM, David Rowley dgrowle...@gmail.com wrote:

So... I guess the question that I'd ask is, if you write a PL/pgsql function that does RAISE NOTICE in a loop a large number of times, can you measure any difference in how fast that function executes on the patched and unpatched code? If so, how much?

I do see a 15-18% slowdown with the patched version, so perhaps I'll need to look to see if I can speed it up a bit, although I do feel this benchmark is not quite a normal workload.

Ouch! That's pretty painful. I agree that's not a normal workload, but I don't think it's an entirely unfair benchmark, either. There certainly are people who suffer because of the cost of logging as things are; for example, log_min_duration_statement is commonly used and can produce massive amounts of output on a busy system. I wouldn't mind too much if the slowdown you are seeing only occurred when the feature is actually used, but taking a 15-18% hit on logging even when the new formatting features aren't being used seems too expensive to me.

Ok, I think I've managed to narrow the performance gap to just about nothing but noise, though to do this the code is now a bit bigger. I've added a series of tests to check whether the padding is 0, doing things the old way when no padding is requested. I've also added some code which does a fast test to see if it is worthwhile calling the padding processing function. This is just a simple if (*p <= '9'); I'm not completely happy with that, as it does look a bit weird, but to compensate I've added a good comment to explain what it is doing.

Please find attached the new patch, version v0.5, and also updated benchmark results.

Regards

David

-- Robert Haas  EnterpriseDB: http://www.enterprisedb.com  The Enterprise PostgreSQL Company

Attachments:
  log_line_prefix_benchmark_stresslog_v0.5.xls (MS-Excel spreadsheet)
  log_line_formatting_v0.5.patch (binary data)
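To make the trick concrete, here is a standalone-C sketch of how such a cheap first-byte filter can work (an assumed shape for illustration, not the code from the patch): in ASCII, '-' (0x2D) and '0'..'9' (0x30..0x39) all compare <= '9', while every plain format letter compares > '9', so a single byte comparison skips the parsing work in the common unpadded case.

#include <stdbool.h>

/* Parse an optional printf-style padding spec such as "-5". */
static const char *
parse_padding(const char *p, int *padding)
{
    bool negative = false;
    int  val = 0;

    *padding = 0;
    if (*p > '9')               /* fast path: cannot start a padding spec */
        return p;
    if (*p == '-')
    {
        negative = true;
        p++;
    }
    while (*p >= '0' && *p <= '9')
        val = val * 10 + (*p++ - '0');
    *padding = negative ? -val : val;
    return p;
}

Note that the filter is conservative: a few non-padding bytes (e.g. '%') also compare <= '9', but they fall through the parser harmlessly with padding 0.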
Re: [HACKERS] [PATCH] Revive line type
Hi,

I had a look over this patch and here are my review points:

1. Patch applies cleanly.
2. make, make install and make check are good.
3. I did a lot of random testing and didn't find any issue.
4. Test coverage is very good. It has all scenarios, and all operators are tested with line. That's really great.

So no issues from my side. However, do we still need this in close_pl()?

#ifdef NOT_USED
    if (FPeq(line->A, -1.0) && FPzero(line->B))
    {
        /* vertical */
    }
#endif

Also close_sl, close_lb and dist_lb are NOT yet implemented. It would be good if we had those, but I don't think we should wait for those functions to be implemented. We can go ahead with this.

Please confirm the above concerns so that I can mark it as Ready for Committer.

Thanks

On Tue, Sep 17, 2013 at 7:34 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

David Fetter escribió: Should the things you tried and others be in the regression tests? If so, should we start with whatever had been in the regression tests when the line type was dropped?

Actually, the patch does include a regression test for the revived type (and it passes). I don't think more than that is needed.

-- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services

-- Jeevan B Chalke  Principal Software Engineer, Product Development  EnterpriseDB Corporation
Re: [HACKERS] Freezing without write I/O
On 18.09.2013 22:55, Jeff Janes wrote:

On Mon, Sep 16, 2013 at 6:59 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Here's a rebased version of the patch, including the above-mentioned fixes. Nothing else new.

I've applied this to 0892ecbc015930d, the last commit to which it applies cleanly. When I test this by repeatedly incrementing a counter in a randomly chosen row, then querying the whole table and comparing the results to what my driver knows they should be, I get discrepancies.

Ok, I found the bug. The problem was that when a HOT chain begins with a dead tuple, when the page was frozen the dead tuple was not removed, but the xmin of the live tuple in the chain was replaced with FrozenXid. That breaks the HOT-chain following code, which checks that the xmin of the next tuple in the chain matches the xmax of the previous tuple.

I fixed that by simply not freezing a page which contains any dead tuples. That's OK, because the page will be visited by vacuum before it becomes old enough to be mature. However, it means that the invariant that a page can only contain XIDs within one XID-LSN range, determined by the LSN, is no longer true. AFAICS everything still works, but I would feel more comfortable if we could uphold that invariant, for debugging reasons if nothing else. Will have to give that some more thought..

Thanks for the testing! I just posted an updated version of the patch elsewhere in this thread.

- Heikki
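The chain-following rule Heikki describes can be shown with a toy model in standalone C (simplified for illustration only; real HOT chains are followed via line pointers on the heap page):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FROZEN_XID 2            /* PostgreSQL's FrozenTransactionId */

/* Toy tuple: just the fields needed to model chain following. */
typedef struct
{
    uint32_t xmin;              /* inserting XID */
    uint32_t xmax;              /* updating XID, 0 if none */
    int      next;              /* index of next tuple in chain, -1 at end */
} ToyTuple;

/* Each link is valid only if the next tuple's xmin equals this xmax. */
static bool
chain_intact(const ToyTuple *t, int root)
{
    for (int i = root; t[i].next != -1; i = t[i].next)
        if (t[t[i].next].xmin != t[i].xmax)
            return false;
    return true;
}

int
main(void)
{
    /* dead root updated by XID 100; the successor's xmin was then frozen */
    ToyTuple chain[] = { {90, 100, 1}, {FROZEN_XID, 0, -1} };

    printf("chain intact: %s\n", chain_intact(chain, 0) ? "yes" : "no");
    return 0;
}

Freezing the live tuple's xmin to FrozenXid while its dead predecessor still carries the old xmax makes the walk stop early, which is exactly the bug fixed by not freezing pages that contain dead tuples.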
[HACKERS] Cube extension split algorithm fix
Hello, hackers.

I've fixed the split algorithm that was implemented in the cube extension. I've changed it to follow the original Guttman paper (the old version was a simpler algorithm), and also ported Alexander Korotkov's algorithm from box datatype indexing, which works faster and better on low dimensions. On my test dataset (1M records, 7 dimensions, a real-world database of goods) the numbers were as follows:

Building index over table (on expression):
  old: 67.296058 seconds
  new: 48.842391 seconds

Cube point search, mean over 100 queries:
  old: 0.001025 seconds
  new: 0.000427 seconds

Index on field, size:
  old: 562 MB
  new: 283 MB

Stas.

Attachment: split.diff (binary data)
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Peter Eisentraut pete...@gmx.net

On Tue, 2013-09-24 at 21:04 +0900, MauMau wrote:

4. I guess some users really want to continue to use ShiftJIS or EUC_JP for the database encoding, and use NCHAR for a limited set of columns to store international text in Unicode:
- to avoid code conversion between the server and the client, for performance
- because ShiftJIS and EUC_JP require a smaller amount of storage (2 bytes for most Kanji) than UTF-8 (3 bytes)
This use case is described in chapter 6 of the Oracle Database Globalization Support Guide.

But your proposal wouldn't address the first point, because data would have to go client -> server -> NCHAR. The second point is valid, but it's going to be an awful amount of work for that limited result.

I (or, Oracle's use case) meant the following, for example:

initdb -E EUC_JP

CREATE DATABASE mydb ENCODING EUC_JP NATIONAL ENCODING UTF-8;

CREATE TABLE mytable (
    col1 char(10),   -- EUC_JP text
    col2 Nchar(10)   -- UTF-8 text
);

client encoding = EUC_JP

That is:

1. Currently, the user is only handling Japanese text. To avoid unnecessary conversion, he uses EUC_JP for both client and server.
2. He needs to store some limited amount of international (non-Japanese) text in a few columns for a new feature of the system. But the international text is limited, so for those few columns he is willing to accept the cost of code conversion and the extra bytes per character.

Regards

MauMau
Re: [HACKERS] Freezing without write I/O
On 9/25/13 5:31 AM, Heikki Linnakangas wrote:

Attached is a new version, which adds that field to HeapTupleData. Most of the issues you listed above have been fixed, plus a bunch of other bugs I found myself. The bug that Jeff ran into with his count.pl script has also been fixed.

This patch doesn't apply at all:

Reversed (or previously applied) patch detected!  Assuming -R.
1 out of 39 hunks FAILED -- saving rejects to file src/backend/access/heap/heapam.c.rej
5 out of 9 hunks FAILED -- saving rejects to file src/backend/access/heap/rewriteheap.c.rej
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
10 out of 12 hunks FAILED -- saving rejects to file src/backend/access/transam/xlog.c.rej
13 out of 18 hunks FAILED -- saving rejects to file src/backend/commands/cluster.c.rej
Reversed (or previously applied) patch detected!  Assuming -R.
15 out of 17 hunks FAILED -- saving rejects to file src/backend/commands/vacuum.c.rej
5 out of 24 hunks FAILED -- saving rejects to file src/backend/commands/vacuumlazy.c.rej
3 out of 9 hunks FAILED -- saving rejects to file src/backend/postmaster/autovacuum.c.rej
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
1 out of 4 hunks FAILED -- saving rejects to file src/backend/utils/misc/guc.c.rej
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
1 out of 2 hunks FAILED -- saving rejects to file src/include/commands/cluster.h.rej
1 out of 1 hunk FAILED -- saving rejects to file src/include/commands/vacuum.h.rej
Reversed (or previously applied) patch detected!  Assuming -R.
Re: [HACKERS] GIN improvements part 1: additional information
On 9/23/13 5:36 PM, Alexander Korotkov wrote:

In the attached version of patch double finding of ItemPointer during insert is avoided. Overhead becomes lower as expected.

Fails cpluspluscheck:

./src/include/access/gin_private.h: In function ‘char* ginDataPageLeafReadItemPointer(char*, ItemPointer)’:
./src/include/access/gin_private.h:797:2: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Re: [HACKERS] Cube extension kNN support
On 9/22/13 7:38 PM, Stas Kelvich wrote:

Here is the patch that introduces kNN search for cubes with euclidean, taxicab and chebyshev distances.

cube and earthdistance regression tests fail.
Re: [HACKERS] invalid regexp crashes the server on windows or 9.3
Erik Rijkers e...@xs4all.nl writes:

On Wed, September 25, 2013 09:33, Marc Mamin wrote: This regexp call freezes with almost 100% CPU usage, so I guess it creates an infinite loop.

cic_db=# select regexp_matches ('aa bb aa ba baa x','(^((?!aa)))+','gx');

I poked into this a bit and decided that it's a variant of a problem we've seen before. I filed a report with the Tcl crew to see if they have any opinion about it: https://core.tcl.tk/tcl/tktview?name=8f245009b0 but prior experience suggests they won't be a lot of help.

On 9.3 and 9.4devel, on linux (centos 6.4), that statement / regex returns after ~2 minutes (on a modest desktop) with that same error, but not crashing.

Yeah, it should in principle fail eventually with a too-many-states error, since the loop is creating more NFA states than it deletes. But on a Mac laptop with an assert-enabled PG build, I got tired of waiting for that after about an hour ...

regards, tom lane
Re: [HACKERS] Freezing without write I/O
On 25.09.2013 15:48, Peter Eisentraut wrote:

On 9/25/13 5:31 AM, Heikki Linnakangas wrote: Attached is a new version, which adds that field to HeapTupleData. Most of the issues you listed above have been fixed, plus a bunch of other bugs I found myself. The bug that Jeff ran into with his count.pl script has also been fixed.

This patch doesn't apply at all:

Huh, that's strange. Merged with the latest changes from master; here's a new version that applies.

- Heikki

Attachment: xid-lsn-ranges-5.patch.gz (GNU Zip compressed data)
Re: [HACKERS] Some interesting news about Linux 3.12 OOM
On Wed, Sep 25, 2013 at 12:15 AM, Daniel Farina dan...@heroku.com wrote:

Enable the memcg OOM killer only for user faults, where it's really the only option available.

Is this really a big deal? I would expect most faults to be user faults. It's certainly a big deal that we need to ensure we can handle ENOMEM from syscalls and library functions we weren't expecting to return it. But I don't expect it to actually reduce the OOM killing sprees by much.

-- greg
Re: [HACKERS] logical changeset generation v6
On 09/17/2013 10:31 AM, Andres Freund wrote:

This patch set now fails to apply because of the commit "Rename various freeze multixact variables". And I am even partially guilty for that patch... Rebased patches attached.

While testing the logical replication changes against my WIP logical slony, I am sometimes getting error messages from the WAL sender of the form:

unexpected duplicate for tablespace X relfilenode X

The call stack is:

HeapTupleSatisfiesMVCCDuringDecoding
heap_hot_search_buffer
index_fetch_heap
index_getnext
systable_getnext
RelidByRelfilenode
ReorderBufferCommit
DecodeCommit
...

I am working off something based on your version e0acfeace6d695c229efd5d78041a1b734583431

Any ideas?

Greetings, Andres Freund
[HACKERS] Re: UTF8 national character data type support WIP patch and list of open issues.
On Tue, Sep 24, 2013 at 1:04 PM, MauMau maumau...@gmail.com wrote:

Yes, so Tatsuo san suggested to restrict the server encoding / NCHAR encoding combination to those with lossless conversion.

If it's not lossy then what's the point? From the client's point of view it'll be functionally equivalent to text then.

-- greg
Re: [HACKERS] logical changeset generation v6
On 2013-09-25 11:01:44 -0400, Steve Singer wrote:

On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit "Rename various freeze multixact variables". And I am even partially guilty for that patch... Rebased patches attached.

While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form: unexpected duplicate for tablespace X relfilenode X

Any chance you could provide a setup to reproduce the error?

Any ideas?

I'll look into it. Could you provide any context to what you're doing that's being decoded?

Greetings,

Andres Freund

-- Andres Freund  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] SSI freezing bug
On 22.09.2013 00:12, Hannu Krosing wrote:

On 09/21/2013 10:46 PM, Andres Freund wrote:

Heikki Linnakangas hlinnakan...@vmware.com wrote:

Kevin Grittner kgri...@ymail.com wrote:

Andres Freund and...@2ndquadrant.com wrote:

On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote: When a tuple is predicate-locked, the key of the lock is ctid+xmin. However, when a tuple is frozen, its xmin is changed to FrozenXid. That effectively invalidates any predicate lock on the tuple, as checking for a lock on the same tuple later won't find it, as the xmin is different. Attached is an isolationtester spec to demonstrate this.

Do you have any idea to fix that besides keeping the xmin horizon below the lowest of the xids that are predicate locked? Which seems nasty to compute and is probably not trivial to fit into the procarray.c machinery? A better solution probably is to promote tuple-level locks, if they exist, to a relation-level one upon freezing, I guess?

That would work. A couple of other ideas would be to use the oldest serializable xmin, which is being calculated in predicate.c, to either prevent freezing of newer transactions, or to summarize predicate locks (using the existing SLRU-based summarization functions) which use xmin values for xids which are being frozen. The transactions must already be committed, and so are eligible for summarization.

What's the point of using xmin as part of the lock key in the first place; wouldn't ctid alone suffice? If the tuple was visible to the reading transaction, it cannot be vacuumed away until the transaction commits. No other tuple can appear with the same ctid.

SSI locks can live longer than one TX/snapshot.

But the only way that ctid can change without xmin changing is when you update the tuple in the same TX that created it.

I don't think that's relevant. We're not talking about the ctid field of a tuple, ie. HeapTupleHeader.t_ctid. We're talking about its physical location, ie. HeapTuple->t_self.

- Heikki
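For readers following along, the tuple-level lock key being debated looks conceptually like this (an illustrative standalone-C sketch; field names are simplified and this is not the actual PREDICATELOCKTARGETTAG from predicate.c):

#include <stdint.h>

/*
 * The physical location comes from HeapTuple->t_self; xmin is
 * included so a recycled line pointer holding an unrelated tuple
 * doesn't match an old lock.  Freezing rewrites xmin to FrozenXid,
 * which is exactly what invalidates the lookup described above.
 */
typedef struct TuplePredicateLockKey
{
    uint32_t dbOid;         /* database */
    uint32_t relOid;        /* relation */
    uint32_t blockNumber;   /* heap page */
    uint16_t offsetNumber;  /* line pointer within the page */
    uint32_t xmin;          /* inserting XID; FrozenXid after freezing */
} TuplePredicateLockKey;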
Re: [HACKERS] record identical operator
Kevin Grittner kgri...@ymail.com wrote:

Here is v2 of the patch, which changes from the universally disliked operator names v1 used. It also fixes bugs in the row comparisons for pass-by-reference types, fixes a couple of nearby comments, and adds regression tests for a matview containing a box column.

I accidentally omitted the regression tests for a matview with a box column from v2. Rather than redoing the whole thing, here is a v2a patch to add just that omitted part.

-- Kevin Grittner  EDB: http://www.enterprisedb.com  The Enterprise PostgreSQL Company

*** a/src/test/regress/expected/matview.out
--- b/src/test/regress/expected/matview.out
***************
*** 412,418 ****
  ERROR:  new data for mv contains duplicate rows without any NULL columns
  DETAIL:  Row: (1,10)
  DROP TABLE foo CASCADE;
  NOTICE:  drop cascades to materialized view mv
! -- make sure that all indexes covered by unique indexes works
  CREATE TABLE foo(a, b, c) AS VALUES(1, 2, 3);
  CREATE MATERIALIZED VIEW mv AS SELECT * FROM foo;
  CREATE UNIQUE INDEX ON mv (a);
--- 412,418 ----
  DETAIL:  Row: (1,10)
  DROP TABLE foo CASCADE;
  NOTICE:  drop cascades to materialized view mv
! -- make sure that all columns covered by unique indexes works
  CREATE TABLE foo(a, b, c) AS VALUES(1, 2, 3);
  CREATE MATERIALIZED VIEW mv AS SELECT * FROM foo;
  CREATE UNIQUE INDEX ON mv (a);
***************
*** 424,426 ****
  REFRESH MATERIALIZED VIEW mv;
--- 424,446 ----
  REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
  DROP TABLE foo CASCADE;
  NOTICE:  drop cascades to materialized view mv
+ -- make sure that types with unusual equality tests work
+ CREATE TABLE boxes (id serial primary key, b box);
+ INSERT INTO boxes (b) VALUES
+   ('(32,32),(31,31)'),
+   ('(2.004,2.004),(1,1)'),
+   ('(1.996,1.996),(1,1)');
+ CREATE MATERIALIZED VIEW boxmv AS SELECT * FROM boxes;
+ CREATE UNIQUE INDEX boxmv_id ON boxmv (id);
+ UPDATE boxes SET b = '(2,2),(1,1)' WHERE id = 2;
+ REFRESH MATERIALIZED VIEW CONCURRENTLY boxmv;
+ SELECT * FROM boxmv ORDER BY id;
+  id |          b
+ ----+---------------------
+   1 | (32,32),(31,31)
+   2 | (2,2),(1,1)
+   3 | (1.996,1.996),(1,1)
+ (3 rows)
+
+ DROP TABLE boxes CASCADE;
+ NOTICE:  drop cascades to materialized view boxmv

*** a/src/test/regress/sql/matview.sql
--- b/src/test/regress/sql/matview.sql
***************
*** 143,149 ****
  REFRESH MATERIALIZED VIEW mv;
  REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
  DROP TABLE foo CASCADE;
! -- make sure that all indexes covered by unique indexes works
  CREATE TABLE foo(a, b, c) AS VALUES(1, 2, 3);
  CREATE MATERIALIZED VIEW mv AS SELECT * FROM foo;
  CREATE UNIQUE INDEX ON mv (a);
--- 143,149 ----
  REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
  DROP TABLE foo CASCADE;
! -- make sure that all columns covered by unique indexes works
  CREATE TABLE foo(a, b, c) AS VALUES(1, 2, 3);
  CREATE MATERIALIZED VIEW mv AS SELECT * FROM foo;
  CREATE UNIQUE INDEX ON mv (a);
***************
*** 154,156 ****
  INSERT INTO foo VALUES(3, 4, 5);
--- 154,169 ----
  REFRESH MATERIALIZED VIEW mv;
  REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
  DROP TABLE foo CASCADE;
+
+ -- make sure that types with unusual equality tests work
+ CREATE TABLE boxes (id serial primary key, b box);
+ INSERT INTO boxes (b) VALUES
+   ('(32,32),(31,31)'),
+   ('(2.004,2.004),(1,1)'),
+   ('(1.996,1.996),(1,1)');
+ CREATE MATERIALIZED VIEW boxmv AS SELECT * FROM boxes;
+ CREATE UNIQUE INDEX boxmv_id ON boxmv (id);
+ UPDATE boxes SET b = '(2,2),(1,1)' WHERE id = 2;
+ REFRESH MATERIALIZED VIEW CONCURRENTLY boxmv;
+ SELECT * FROM boxmv ORDER BY id;
+ DROP TABLE boxes CASCADE;
Re: [HACKERS] SSI freezing bug
On 21.09.2013 23:46, Andres Freund wrote:

Heikki Linnakangas hlinnakan...@vmware.com wrote:

Kevin Grittner kgri...@ymail.com wrote:

Andres Freund and...@2ndquadrant.com wrote:

On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote: When a tuple is predicate-locked, the key of the lock is ctid+xmin. However, when a tuple is frozen, its xmin is changed to FrozenXid. That effectively invalidates any predicate lock on the tuple, as checking for a lock on the same tuple later won't find it, as the xmin is different. Attached is an isolationtester spec to demonstrate this.

Do you have any idea to fix that besides keeping the xmin horizon below the lowest of the xids that are predicate locked? Which seems nasty to compute and is probably not trivial to fit into the procarray.c machinery? A better solution probably is to promote tuple-level locks, if they exist, to a relation-level one upon freezing, I guess?

That would work. A couple of other ideas would be to use the oldest serializable xmin, which is being calculated in predicate.c, to either prevent freezing of newer transactions, or to summarize predicate locks (using the existing SLRU-based summarization functions) which use xmin values for xids which are being frozen. The transactions must already be committed, and so are eligible for summarization.

What's the point of using xmin as part of the lock key in the first place; wouldn't ctid alone suffice? If the tuple was visible to the reading transaction, it cannot be vacuumed away until the transaction commits. No other tuple can appear with the same ctid.

SSI locks can live longer than one TX/snapshot.

Hmm, true. Another idea: we could vacuum away predicate locks when the corresponding tuples are vacuumed from the heap. Before the second phase of vacuum, before removing the dead tuples, we could scan the predicate lock hash and remove any locks belonging to the tuples we're about to remove. And not include xmin in the lock key.

- Heikki
Re: [HACKERS] Cube extension types support
On 9/24/13 11:44 AM, Stas Kelvich wrote:

In this patch I've implemented support for different storage types for cubes.

Doesn't build:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o cubeparse.o cubeparse.c -MMD -MP -MF .deps/cubeparse.Po
cubeparse.y:35:13: warning: ‘check_coord’ used but never defined [enabled by default]
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -shared -o cube.so cube.o cubeparse.o -L../../src/port -L../../src/common -L/usr/lib -Wl,--as-needed -lm
/usr/bin/ld: cubeparse.o: relocation R_X86_64_PC32 against symbol `check_coord' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [cube.so] Error 1
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On 9/25/13 4:46 AM, David Rowley wrote:

Please find attached the new patch, version v0.5, and also updated benchmark results.

Please fix compiler warnings:

elog.c: In function ‘log_line_prefix.isra.3’:
elog.c:2436:22: warning: ‘padding’ may be used uninitialized in this function [-Wmaybe-uninitialized]
elog.c:2172:6: note: ‘padding’ was declared here
Re: [HACKERS] Cube extension split algorithm fix
On 9/25/13 7:14 AM, Stas Kelvich wrote:

I've fixed the split algorithm that was implemented in the cube extension.

This patch creates a bunch of new compiler warnings. Please fix those.
Re: [HACKERS] logical changeset generation v6
On 09/25/2013 11:08 AM, Andres Freund wrote:

On 2013-09-25 11:01:44 -0400, Steve Singer wrote:

On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit "Rename various freeze multixact variables". And I am even partially guilty for that patch... Rebased patches attached.

While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form: unexpected duplicate for tablespace X relfilenode X

Any chance you could provide a setup to reproduce the error?

The steps to build a setup that should reproduce this error are:

1. I had to apply the attached patch on top of your logical replication branch so my pg_decode_init would know if it was being called as part of an INIT_REPLICATION or START_REPLICATION. Unless I have misunderstood something, you probably will want to merge this fix in.

2. Get my WIP for adding logical support to slony from g...@github.com:ssinger/slony1-engine.git, branch logical_repl (4af1917f8418a). (My code changes to slony are more prototype-level code quality than production code quality.)

3. Build and install:

   cd slony1-engine
   ./configure --with-pgconfigdir=/usr/local/pg94wal/bin   (or whatever)
   make
   make install

4. Grab the clustertest framework JAR from https://github.com/clustertest/clustertest-framework and build up a clustertest jar file.

5. Create a file slony1-engine/clustertest/conf/java.conf that contains the path to the above JAR file as a shell variable assignment, ie:

   CLUSTERTESTJAR=/home/ssinger/src/clustertest/clustertest_git/build/jar/clustertest-coordinator.jar

6. cp clustertest/conf/disorder.properties.sample clustertest/conf/disorder.properties and edit disorder.properties to have the proper values for your environment. All 6 databases can point at the same postgres instance; this test will only actually use 2 of them (so far).

7. Run the test:

   cd clustertest
   ./run_all_disorder_tests.sh

This involves having the slon connect to the walsender on the database test1 and replicate the data into test2 (which is a different database on the same postmaster).

If this setup seems like too much effort I can request one of the commitfest VMs from Josh and get everything set up there for you.

Steve

Any ideas?

I'll look into it. Could you provide any context to what you're doing that's being decoded?

Greetings, Andres Freund

From 9efae980c357d3b75c6d98204ed75b8d29ed6385 Mon Sep 17 00:00:00 2001
From: Steve Singer ssin...@ca.afilias.info
Date: Mon, 16 Sep 2013 10:21:39 -0400
Subject: [PATCH] set the init parameter to true when performing an INIT REPLICATION

---
 src/backend/replication/walsender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2187d96..1b8f289 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -743,7 +743,7 @@ InitLogicalReplication(InitLogicalReplicationCmd *cmd)
 	sendTimeLine = ThisTimeLineID;
 	initStringInfo(&output_message);
-	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true, InvalidXLogRecPtr,
 					   NIL, replay_read_page,
 					   WalSndPrepareWrite, WalSndWriteData);
-- 
1.7.10.4
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Thu, Sep 26, 2013 at 4:57 AM, Peter Eisentraut pete...@gmx.net wrote:

On 9/25/13 4:46 AM, David Rowley wrote: Please find attached the new patch, version v0.5, and also updated benchmark results.

Please fix compiler warnings:

elog.c: In function ‘log_line_prefix.isra.3’:
elog.c:2436:22: warning: ‘padding’ may be used uninitialized in this function [-Wmaybe-uninitialized]
elog.c:2172:6: note: ‘padding’ was declared here

Fixed in attached version.

Regards

David Rowley

Attachment: log_line_formatting_v0.6.patch (binary data)
Re: [HACKERS] Minmax indexes
Amit Kapila escribió:

On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

One thing still to tackle is when to mark ranges as unsummarized. Right now, any new tuple on a page range would cause a new index entry to be created and a new revmap update. This would cause huge index bloat if, say, a page is emptied and vacuumed and filled with new tuples with increasing values outside the original range; each new tuple would create a new index tuple. I have two ideas about this (1. mark range as unsummarized if it's the 3rd time we touch the same page range;

Why only at the 3rd time? Doesn't it need to be precise? Like, if someone inserts a row having a value greater than the max value of the corresponding index tuple, then that index tuple's max value needs to be updated, and I think it's updated with the help of the validity map.

Of course. Note I no longer have the concept of a validity map; I have switched things to use a reverse range map, or revmap for short. The revmap is responsible for mapping each page number to an individual index TID. If the TID stored in the revmap is InvalidTid, that means the range is not summarized. Unsummarized ranges are always considered to match query quals, and thus all tuples in them are returned in the bitmap for heap recheck.

The way it works currently is that any tuple insert (that's outside the bounds of the current index tuple) causes a new index tuple to be created, and the revmap is updated to point to the new index tuple. The old index tuple is orphaned and will be deleted at the next vacuum. This works fine. However, the problem is excess orphaned tuples; I don't want a long series of updates to create many orphaned dead tuples. Instead I would like the system to, at some point, stop creating new index tuples and instead set the revmap to InvalidTid. That would stop the index bloat.

For example, considering we need to store the below info for each index tuple:

In each index tuple (corresponding to one page range), we store:
- first block this tuple applies to
- last block this tuple applies to
- for each indexed column:
  * min() value across all tuples in the range
  * max() value across all tuples in the range

Assume the first and last block for an index tuple are the same (assume block no. 'x') and the min value is 5 and the max is 10. Now a user inserts/updates a value in block 'x' such that the max value of the indexed column is 11; if we don't update the corresponding index tuple or at least invalidate it, won't it lead to wrong results?

Sure, that would result in wrong results. Fortunately that's not how I am suggesting to do it. I note you're reading an old version of the design. I realize now that this is my mistake, because instead of posting the new design in the cover letter for the patch, I only put it in the minmax-proposal file. Please give that file a read to see how the design differs from the design I originally posted in the old thread.

The amcostestimate routine is completely bogus; right now it returns constant 0, meaning the index is always chosen if it exists.

I think for the first version you might want to keep things simple, but there should be some way for the optimizer to select this index. So rather than choosing it whenever it is present, we can make the optimizer choose it when someone sets enable_minmax index to true.

Well, enable_bitmapscan already disables minmax indexes, just like it disables other indexes.

How about keeping this up-to-date during foreground operations? Vacuum/maintainer tasks maintaining things usually have problems of bloat, and then we need to optimize or work around issues. Lots of people have raised this or a similar point previously, and from what I read you are of the opinion that it would be too slow.

Well, the current code does keep the index up to date -- I did choose to implement what people suggested :-)

Now it can so happen that the min and max values are sometimes not right because the operation is later rolled back, but I think such cases will be rare and we can find some way to handle them, maybe in a maintainer task only, but the handling will be quite simple.

Agreed.

On Windows, the patch gives the below compilation errors:

src\backend\access\minmax\mmtuple.c(96): error C2057: expected constant expression

I have fixed all these compile errors (fix attached). Thanks for reporting them. I'll post a new version shortly.

-- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services

*** a/src/backend/access/minmax/mmtuple.c
--- b/src/backend/access/minmax/mmtuple.c
***************
*** 93,117 ****
  MMTuple *
  minmax_form_tuple(TupleDesc idxDsc, TupleDesc diskDsc, DeformedMMTuple *tuple,
  				  Size *size)
  {
! 	Datum		values[diskDsc->natts];
! 	bool		nulls[diskDsc->natts];
  	bool		anynulls = false;
  	MMTuple	   *rettuple;
  	int			keyno;
  	uint16		phony_infomask;
! 	bits8
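To visualize the revmap Alvaro describes, here is a conceptual standalone-C model (illustrative only, not the patch's on-disk format, which lives in a separate relation fork):

#include <stdbool.h>
#include <stdint.h>

/* Toy TID: block number plus offset; offset 0 is never valid. */
typedef struct
{
    uint32_t block;
    uint16_t offset;
} ToyTid;

/*
 * One index-tuple TID slot per heap page range.  An invalid TID
 * means the range is unsummarized, so the scan must return the
 * whole range for recheck instead of consulting min/max values.
 */
static bool
revmap_lookup(const ToyTid *revmap, uint32_t range_no, ToyTid *index_tid)
{
    *index_tid = revmap[range_no];
    return index_tid->offset != 0;      /* false => range not summarized */
}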
Re: [HACKERS] Minmax indexes
Erik Rijkers wrote:

After a --data-checksums initdb (successful), the following error came up after the statement:

create index t_minmax_idx on t using minmax (r);

WARNING:  page verification failed, calculated checksum 25951 but expected 0
ERROR:  invalid page in block 1 of relation base/21324/26267_vm

It happens reliably, every time I run the program.

Thanks for the report. That's fixed with the attached.

Below is the whole program that I used.

Hmm, this test program shows that you're trying to use the index to optimize min() and max() queries, but that's not what these indexes do. You will need to use operators such as <=, >= and = (or BETWEEN, which is the same thing) to see your index in action.

-- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services

commit 762ebb8f6ecfb36b9976bb9aa87d0a7f8fb601b4
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Date:   Wed Sep 25 17:03:28 2013 -0300

    Set checksum to new revmap pages per Erik Rijkers

diff --git a/src/backend/access/minmax/mmrevmap.c b/src/backend/access/minmax/mmrevmap.c
index 43bff95..5ff5ca2 100644
--- a/src/backend/access/minmax/mmrevmap.c
+++ b/src/backend/access/minmax/mmrevmap.c
@@ -323,6 +323,7 @@ mmRevmapExtend(mmRevmapAccess *rmAccess, BlockNumber blkno)
 	while (blkno >= rmAccess->physPagesInRevmap)
 	{
 		extended = true;
+		PageSetChecksumInplace(page, blkno);
 		smgrextend(rmAccess->idxrel->rd_smgr, MM_REVMAP_FORKNUM,
 				   rmAccess->physPagesInRevmap, page, false);
 		rmAccess->physPagesInRevmap++;
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Greg Stark st...@mit.edu

If it's not lossy then what's the point? From the client's point of view it'll be functionally equivalent to text then.

Sorry, what Tatsuo san suggested meant "same or compatible", not lossy. I quote the relevant part below. This is enough for the use case I mentioned in my previous mail several hours ago (actually, that is what the Oracle manual describes...).

http://www.postgresql.org/message-id/20130920.085853.1628917054830864151.t-is...@sraoss.co.jp

[Excerpt]

What about limiting the use of NCHAR to a database which has the same encoding or a compatible encoding (one for which an encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the database encoding on the server side, and thus we can treat NCHAR exactly the same as CHAR afterward. I suppose what encoding is used for NCHAR should be defined at initdb time or at creation of the database (if we allow this, we need to add a new column to record what encoding is used for NCHAR).

For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8 and the database encoding is UTF-8. It will even succeed if NCHAR is SHIFT-JIS and the database encoding is UTF-8, because there is a conversion between UTF-8 and SHIFT-JIS. However, it will not succeed if NCHAR is SHIFT-JIS and the database encoding is ISO-8859-1, because there's no conversion between them.

Regards

MauMau
Re: [HACKERS] [PATCH] bitmap indexes
Hi,

Here are some quick items while skimming this patch. I am looking at commit 6448de29d from your github repo, branch bmi.

What's with the pg_bitmapindex stuff in pg_namespace.h? It doesn't seem to be used anywhere. This led me to research how these indexes are stored. I note that what we're doing here is to create another regular table and a btree index on top of it, and those are the ones that actually store the index data. This seems grotty and completely unlike the way we do things elsewhere (compare GIN indexes, which have rbtrees inside them). I think this should be stored in separate forks, or separate kinds of pages intermixed in the index's main fork.

I don't like that the files are named bitmap.c/.h. We already have bitmap scans, so having the generic concept (bitmap) show up as a file name is confusing. To cite an example, see the name of the bmgetbitmap function (which returns a TID bitmap from a bitmap index. How is that not confusing?). I think I would be more comfortable with the files being called bmi or maybe bitmapidx, or something like that. (For sure do not change the AM's name. I mean, CREATE INDEX .. USING bmi would suck.)

Not happy with (the contents of) src/include/access/bitmap.h. There's way too much stuff in a single file. I think a three-file split would work nicely: one for the AM routine declarations (bitmap.h), one for xlog stuff (bitmap_xlog.h), one for internal routines and struct declarations (bitmap_priv.h, bitmap_internals.h, or something like that). Also, I think macros and structs should be defined in as narrow a scope as possible; for example, macros such as HEADER_SET_FILL_BIT_ON should be defined in bitmaputil.c, not bitmap.h (that macro is missing parens BTW). For macros that are defined in headers, it would be better to have prefixes that scope them to bitmaps; for example, IS_FILL_WORD should maybe have a BM_ prefix or something similar.

I don't think it's necessary to renumber relopt_kind. Just stash the new one at the end of the enum.

bmoptions's DESCR entry in pg_proc.h says "btree".

contrib/bmfuncs.c defines CHECK_PAGE_OFFSET_RANGE but doesn't seem to use it anywhere. The same file defines CHECK_RELATION_BLOCK_RANGE using a bare { } block. Our style is to have these multi-statement macros use do {} while (false); see the sketch after this review.

#include lines in source files are normally alphabetically sorted. The new code fails to meet this expectation in many places.

The first four lines of _bitmap_init seem gratuitous ..

All those #ifdef DEBUG_BMI lines sprinkled all over the place look pretty bad; they interrupt the indentation flow. I would suggest defining a macro or some macros to emit debugging messages, which are enabled or disabled in a single place depending on DEBUG_BMI. Something simple such as DO_DB() in fd.c would suffice.

I don't like this comment one bit: /* misteriously, MemSet segfaults... :( */ I think that's probably a bug that should be investigated rather than papered over.

I don't understand the cur_bmbuild thingy .. I think that stuff should be passed down as arguments to whatever does the index build, instead of being a magical global var that also signals failure to find hash functions for some datatypes. Above the definition of BMBuildHashData there's a comment referring us to execGrouping.c, but I cannot understand what it refers to.

"block unfound"? In general I think it's poor style to spell out the function name in error messages. I mean, ereport() already reports the function name. Also, please don't split long error messages across multiple lines; better to let the line run off the screen.

I'm unsure about distinguishing errors in the recovery routines that raise ERROR from those that PANIC. I mean, they would both cause the server to PANIC.

The bitmap_xlog_cleanup routine talks about an uninitialized reln. That's a large point about xlog recovery -- you don't have relations. You only have relfilenodes. I think you need to shuffle some routines so that they can work with only a relfilenode.

Generally, the xlog stuff needs a lot of comments.

A pgindent run would be beneficial.

-- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services
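The do {} while (false) point above, illustrated with a made-up macro (a hypothetical example, not one from the patch): wrapping the body turns the macro into a single statement, so its trailing semicolon composes correctly with if/else, which a bare { } block does not.

#include <stdio.h>

/* Expands to one statement, so "if (x) LOG_TWICE(m); else ..." parses. */
#define LOG_TWICE(msg) \
    do { \
        fprintf(stderr, "%s\n", (msg)); \
        fprintf(stderr, "%s\n", (msg)); \
    } while (0)

int
main(void)
{
    int broken = 0;

    if (broken)
        LOG_TWICE("unexpected");    /* legal only because of do-while */
    else
        LOG_TWICE("as expected");
    return 0;
}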
Re: [HACKERS] Minmax indexes
On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch]

I have the impression it's not quite working correctly. The attached program returns different results for different values of enable_bitmapscan (consistently).

(Btw, I had to raise max_locks_per_transaction even for not-so-large tables -- is that expected? For a 100M row table, max_locks_per_transaction=1024 was not enough; I set it to 2048. Might be worth some documentation, eventually.)

From eyeballing the results it looks like the minmax result (i.e. the result set with enable_bitmapscan = 1) yields only the last part: only the 'last' rows seem to be present (see the values in column i in table tmm in the attached program).

Thanks, Erikjan Rijkers

[attachment: test.sh, application/shellscript]
Re: [HACKERS] pgbench progress report improvements - split 2
On Tue, Sep 24, 2013 at 08:42:15PM +0200, Fabien COELHO wrote:

meet all those goals simultaneously with simpler code, can we not?

    int64 wait = (int64) (throttle_delay *
        Min(7.0, -log(1 - pg_erand48(thread->random_state))));

If you roughly truncate the multiplier, as is done here with Min, you necessarily create a bias; my random guess would be a 0.5% underestimation, but maybe it is more... Thus you would have to compute and apply the correcting factor as well. Its computation is a little bit less easy than with the quantized formula, where I just used a simple SUM; you have to really do the maths here. So I would keep the simple solution, but it is fine with me if you do the maths!

Ah, true; I guess each approach has different advantages. I've committed your latest version.

-- Noah Misch EnterpriseDB http://www.enterprisedb.com
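As it happens, the bias from the Min(7.0, ...) truncation can be worked out exactly: for a unit-mean exponential variable truncated at c, the expected value is 1 - exp(-c), so c = 7 underestimates the mean by exp(-7), roughly 0.09% -- smaller than the 0.5% guessed above. A minimal standalone demonstration (my own scaffolding, not the pgbench source):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        const double target_mean = 1000.0;  /* throttle_delay stand-in (usec) */
        const double cap = 7.0;             /* truncation multiplier */
        double      sum = 0.0;
        long        n = 10000000;

        srand48(42);
        for (long i = 0; i < n; i++)
        {
            /* inverse transform sampling, truncated at cap * mean */
            double  u = drand48();
            double  wait = target_mean * fmin(cap, -log(1.0 - u));

            sum += wait;
        }
        printf("observed mean:     %.3f\n", sum / n);
        printf("exact mean:        %.3f\n", target_mean * (1.0 - exp(-cap)));
        printf("correcting factor: %.8f\n", 1.0 / (1.0 - exp(-cap)));
        return 0;
    }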
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Sat, Sep 21, 2013 at 05:07:11PM -0700, Peter Geoghegan wrote:

In the average/uncontended case, the subxact example bloats less than all alternatives to my design proposed to date (including the unborn heap tuple idea Robert mentioned in passing to me in person the other day, which I think is somewhat similar to a suggestion of Heikki's [1]). The average case is very important, because in general contention usually doesn't happen.

This thread has had a lot of discussion about bloating. I wonder, does the code check to see if there is a matching row _before_ adding any data? Our test-and-set code first checks to see if the lock is free; then, if it is, it locks the bus and does a test-and-set. Couldn't we easily check the indexes for matches before doing any locking? It seems that would avoid bloat in most cases, and allow for a simpler implementation.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
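The pattern Bruce describes is test-then-test-and-set. A minimal sketch using C11 atomics (a generic illustration, not PostgreSQL's actual s_lock code):

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct
    {
        atomic_bool locked;
    } spinlock;

    static spinlock lock;       /* zero-initialized: unlocked */

    static void
    spin_acquire(spinlock *l)
    {
        for (;;)
        {
            /* cheap "test": plain read, no bus locking while lock is held */
            while (atomic_load_explicit(&l->locked, memory_order_relaxed))
                ;               /* spin */

            /* looks free: now the expensive atomic "test-and-set" */
            if (!atomic_exchange_explicit(&l->locked, true,
                                          memory_order_acquire))
                return;         /* got it */
        }
    }

    static void
    spin_release(spinlock *l)
    {
        atomic_store_explicit(&l->locked, false, memory_order_release);
    }

    int
    main(void)
    {
        spin_acquire(&lock);
        /* ... critical section ... */
        spin_release(&lock);
        return 0;
    }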
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Wed, Sep 25, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote:

This thread has had a lot of discussion about bloating. I wonder, does the code check to see if there is a matching row _before_ adding any data?

That's pretty much what the patch does.

Our test-and-set code first checks to see if the lock is free; then, if it is, it locks the bus and does a test-and-set. Couldn't we easily check the indexes for matches before doing any locking? It seems that would avoid bloat in most cases, and allow for a simpler implementation.

The value locks are only really necessary for getting consensus across unique indexes on whether or not to go forward, and to ensure that insertion can *finish* unhindered once we're sure that's appropriate. Once we've committed to insertion, we hold them across heap tuple insertion and release each value lock as part of something close to conventional btree index tuple insertion (with an index tuple with an ordinary heap pointer inserted). I believe that all schemes proposed to date have some variant of what could be described as value locking, such as ordinary index tuples inserted speculatively.

Value locks are *not* held during row locking, and an attempt at row locking is essentially opportunistic for various reasons (it boils down to the fact that re-verifying uniqueness outside of the btree code is very unappealing, and in any case would naturally sometimes be insufficient - what happens if key values change across row versions?). This might sound a bit odd, but is in a sense no different to the current state of affairs, where the first waiter on a blocking xact that inserted a would-be duplicate is not guaranteed to be the first to get a second chance at inserting. I don't believe that there are any significant additional lock starvation hazards.

In the simple case where there is a conflicting tuple that's already committed, value locks above and beyond what the btree code does today are unnecessary (provided the attempt to acquire a row lock is eventually successful, which mostly means that no one else has updated/deleted - otherwise we try again).

-- Peter Geoghegan
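Read as pseudocode, the control flow described here might look roughly like the sketch below. Every helper name is hypothetical (stubbed so it compiles); this is one reading of the scheme, not the patch's actual functions or any real PostgreSQL API:

    #include <stdbool.h>
    #include <stdio.h>

    /* Stubs so the sketch compiles; the real operations live deep in the
     * btree and heap code. */
    static void acquire_value_locks(void) {}
    static void release_value_locks(void) {}
    static bool duplicate_exists(void) { return false; }
    static void insert_heap_tuple(void) {}
    static void insert_index_tuples_releasing_value_locks(void) {}
    static bool try_lock_conflicting_row(void) { return true; }

    typedef enum { INSERTED, FOUND_DUPLICATE } upsert_outcome;

    static upsert_outcome
    speculative_upsert(void)
    {
        for (;;)
        {
            /* consensus across unique indexes on whether to go forward */
            acquire_value_locks();
            if (!duplicate_exists())
            {
                /* committed to insertion: locks held across heap insert,
                 * then released during near-conventional index insertion */
                insert_heap_tuple();
                insert_index_tuples_releasing_value_locks();
                return INSERTED;
            }

            /* value locks are NOT held while row locking */
            release_value_locks();
            if (try_lock_conflicting_row())     /* opportunistic */
                return FOUND_DUPLICATE;
            /* conflicting row was updated/deleted meanwhile; retry */
        }
    }

    int
    main(void)
    {
        printf("%s\n", speculative_upsert() == INSERTED ?
               "inserted" : "found duplicate");
        return 0;
    }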
Re: [HACKERS] pgbench progress report improvements - split 3 v2
On Tue, Sep 24, 2013 at 10:41:17PM +0200, Fabien COELHO wrote:

These changes are coupled because measures are changed, and their reporting as well. Submitting separate patches for these different features would result in conflicting or dependent patches, so I wish to avoid that if possible.

My feelings on the patch split haven't changed; your three bullet points call for four separate patches. Conflicting patches are bad, but dependent patches are okay; just disclose the dependency order.

How about this: as a next step, please extract just this feature that I listed last Saturday:

Patch (4): Redefine latency as reported by pgbench and report lag more.

Once that's committed, we can move on to others.

nm

-- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] pgbench - exclude pthread_create() from connection start timing
Concerning one of the eventual would-be split patches...

On Tue, Sep 24, 2013 at 10:41:17PM +0200, Fabien COELHO wrote:

- Take thread start time at the beginning of the thread (!) Otherwise it includes pretty slow thread/fork system start times in the measurements. May help with bug #8326. This also helps with throttling, as the start time is used to adjust the rate: if it is not the actual start time, there is a bit of a rush at the beginning in order to catch up. Taking the start time when the thread actually starts is the sane thing to do to avoid surprises at possibly strange measures on short runs.

On Sun, Sep 22, 2013 at 10:08:54AM +0200, Fabien COELHO wrote:

I really spent *hours* debugging stupid measures on the previous round of pgbench changes, when adding the throttling stuff. Having the start time taken when the thread really starts is just sanity, and I needed that just to rule out that it was the source of the strange measures.

I don't get it; why is taking the time just after pthread_create() more sane than taking it just before pthread_create()? We report separate TPS values including and excluding connection establishment because establishing the database connection is expensive, typically far more expensive than pthread_create(). The patch for threaded pgbench made the decision to account for pthread_create() as though it were part of establishing the connection. You're proposing to not account for it at all. Both of those designs are reasonable to me, but I do not comprehend the benefit you anticipate from switching from one to the other.

-j 800 vs -j 100: ITM that if you create more threads, the time delay incurred is cumulative, so the strangeness of the result should worsen.

Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before calling pthread_create(). However, thread 0 is a special case; we set its start time first and actually start it last. Your observation of cumulative delay fits those facts. Initializing the thread-0 start time later, just before calling its threadRun(), should clear this anomaly without changing other aspects of the measurement.

While pondering this area of the code, it occurs to me -- shouldn't we initialize the throttle rate trigger later, after establishing connections and sending startup queries? As it stands, we build up a schedule deficit during those tasks. Was that deliberate?

Thanks, nm

-- Noah Misch EnterpriseDB http://www.enterprisedb.com
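To make the thread-0 anomaly concrete, here is a minimal standalone sketch using plain pthreads and clock_gettime (my own scaffolding, not the pgbench source): each worker's start time is stamped immediately before it actually starts, including the main-process worker 0, which runs last.

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define NTHREADS 4

    typedef struct
    {
        struct timespec start_time;
        pthread_t   thread;
    } WorkerState;

    static void *
    worker_run(void *arg)
    {
        (void) arg;             /* per-thread benchmark loop would go here */
        return NULL;
    }

    int
    main(void)
    {
        WorkerState workers[NTHREADS];

        /* workers 1..n: stamp the start time just before pthread_create() */
        for (int i = 1; i < NTHREADS; i++)
        {
            clock_gettime(CLOCK_MONOTONIC, &workers[i].start_time);
            pthread_create(&workers[i].thread, NULL, worker_run, &workers[i]);
        }

        /* worker 0 runs in the main process and starts last, so stamp its
         * start time here -- not before the loop above, which would bake
         * all the pthread_create() overhead into its measurement */
        clock_gettime(CLOCK_MONOTONIC, &workers[0].start_time);
        worker_run(&workers[0]);

        for (int i = 1; i < NTHREADS; i++)
            pthread_join(workers[i].thread, NULL);
        return 0;
    }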