Re: [HACKERS] [COMMITTERS] pgsql: Add pg_audit, an auditing extension
Noah,

* Noah Misch (n...@leadboat.com) wrote:
> On Thu, May 14, 2015 at 01:38:49PM -0400, Tom Lane wrote:
> > * The comments in the code betray utter ignorance of how logging
> >   actually works, in particular this:
> >
> >    * Administrators can choose which log level the audit log is to be logged
> >    * at. The default level is LOG, which goes into the server log but does
> >    * not go to the client. Set to NOTICE in the regression tests.
> >
> > All the user has to do is change client_min_messages and he'll see all the
> > reports, which means if you think that letting the user see the audit
> > reports is a security problem then you have a hole a mile wide.
>
> That indicates the patch's general readiness:

While I agree that the comment was poorly worded and agreed with Tom that it should be changed, I do not agree that it's indicative of the patch's readiness, nor do I feel it's an issue that clients can see the auditing information, provided that it's clearly documented. I'm planning to commit a number of documentation updates (along with other updates) to pg_audit to address that. Using ereport() for query information isn't anything new- we do that in auto_explain, and we also report the query in the CONTEXT lines of error messages. We should (and I plan to) document that in the appropriate places and note that there are risks associated with it (eg: with security definer functions).

> +	/* These are DDL, unless they are ROLE */
> +	case LOGSTMT_DDL:
> +		className = CLASS_DDL;
> +		class = LOG_DDL;
> +
> +		/* Identify role statements */
> +		switch (stackItem->auditEvent.commandTag)
> +		{
> +			/* We know these are all role statements */
> +			case T_GrantStmt:
> +			case T_GrantRoleStmt:
> +			case T_CreateRoleStmt:
> +			case T_DropRoleStmt:
> +			case T_AlterRoleStmt:
> +			case T_AlterRoleSetStmt:
> +				className = CLASS_ROLE;
> +				class = LOG_ROLE;
> +				break;
>
> Not T_AlterDefaultPrivilegesStmt?

Agreed, that would be more sensible under ROLE than DDL.
> +static void
> +pg_audit_ProcessUtility_hook(Node *parsetree,
> +							 const char *queryString,
> +							 ProcessUtilityContext context,
> +							 ParamListInfo params,
> +							 DestReceiver *dest,
> +							 char *completionTag)
> +{
> +	AuditEventStackItem *stackItem = NULL;
> +	int64 stackId = 0;
> +
> +	/*
> +	 * Don't audit substatements. All the substatements we care about should
> +	 * be covered by the event triggers.
> +	 */
> +	if (context <= PROCESS_UTILITY_QUERY && !IsAbortedTransactionBlockState())
>
> They aren't covered. A GRANT inside CREATE SCHEMA escapes auditing:

Actually, they are covered. David and I discussed this extensively, prior to your review, and concluded that this approach works because the items not under the EventTrigger charter are shared catalogs, updates to which wouldn't ever make sense to happen under a CREATE SCHEMA. I do hope that we are able to improve on EventTriggers by supporting them for shared catalogs one day, but that is a discussion for another thread.

The issue which you discovered here is that GRANTs were categorized under CLASS_ROLE, but we have a check at the top of the DDL event trigger which does an early exit if DDL isn't selected for inclusion. That clearly needs to be changed, so that the GRANT under CREATE SCHEMA is caught and logged properly.

We put a great deal of thought into any and all places where we filter data to do our best to prevent this from happening, taking steps beyond what a simple module would do to capture information and make sure that something is logged, even when we don't have all of the information available. That isn't to say there aren't bugs- certainly issues have been found through the various reviews and comments provided by multiple individuals now, and I don't pretend that there are no others, but I don't believe that this module is particularly more bug-ridden than other contrib modules or even parts of core.

> I'm wary of the ease of forgetting to run CREATE EXTENSION.
> One gets much auditing from GUCs alone; for example, we audit CREATE
> TABLE t () with or without the extension, but only with the extension do
> we audit the inner CREATE TABLE of CREATE SCHEMA s CREATE TABLE t ().
> A user that creates a database without creating the extension might look
> at the audit messages and mistakenly think the database is all set.

I agree with this concern,
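Noah's GUC-versus-extension distinction can be made concrete. The following untested sketch uses the GUC and class names from the patch under discussion (pg_audit.log, with classes such as DDL and ROLE); these names, the need for shared_preload_libraries, and the database name are assumptions that may not match the module as finally shipped:

```shell
# Untested sketch: enabling pg_audit auditing. GUC and extension names
# are taken from the patch under discussion and may have changed.
# The module must be preloaded before its GUCs exist.
echo "shared_preload_libraries = 'pg_audit'" >> "$PGDATA/postgresql.conf"
pg_ctl restart -D "$PGDATA"

psql -d mydb <<'EOF'
-- Session-level auditing via GUCs alone:
SET pg_audit.log = 'ddl, role';
CREATE TABLE t ();                  -- audited with or without the extension

-- Event-trigger-based auditing needs the extension in each database:
CREATE EXTENSION pg_audit;
CREATE SCHEMA s CREATE TABLE t2 (); -- the inner CREATE TABLE is audited only now
EOF
```

This is exactly the trap Noah describes: without the CREATE EXTENSION step, the GUC-only auditing looks complete while substatements go unlogged.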
Re: [HACKERS] Run pgindent now?
On 2015-05-26 20:25:24 -0400, Peter Eisentraut wrote:
> On 5/25/15 7:15 PM, Andres Freund wrote:
> > On 2015-05-25 19:01:28 -0400, Bruce Momjian wrote:
> > > A longer-term fix would be to make pgindent less stupid about this
> > > sort of usage, but nobody's yet volunteered to dig into the guts of
> > > that code.
> > >
> > > I assume a typedefs list is going to be a requirement of any decent C
> > > indenting tool.
> >
> > Maybe I'm missing something major here, but why? Afaict it's just only
> > used for formatting decisions that could be made without it just as well?
>
> AFAICT, the main reason is to decide whether * and & are binary infix or
> unary prefix operators. Otherwise, it wouldn't know whether to write
>
>     char * foo;
>
> or the more customary
>
>     char *foo;
>
> Now, running pgindent without a typedefs list also makes it do things like
>
>     static int32
>     -makepol(QPRS_STATE *state)
>     +makepol(QPRS_STATE * state)
>
> which, one might argue, it could figure out without a typedefs list. But
> then the formatting would be inconsistent between prototypes and variable
> declarations, which might drive people crazy. I don't know whether there
> is a better way than living with it, one way or the other (i.e.,
> requiring a types list, or accepting slightly odd formatting).

I actually think both are relatively easy to figure out without a typedef list. There's harder cases though, e.g. (char *) foo in an expression is already more complicated.

But really, the typedef list is the minor part of what annoys me about pgindent. That it completely butchers so many constructs (e.g. function pointer typedefs, inline asm as extreme examples) is much worse. It's also nigh on impossible to predict/keep the indentation pgindent will use in many cases. Having to try to write code in a way that doesn't break the re-indentation tool, even if it'd otherwise be fine, is just absurd.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] psql po translation update
Peter,

Here is a translation file for src/bin/psql/ja.po of 9.4.2 from Koizumi-san. According to him, there are some missing translations in the po file and he has fixed them. Shall I commit into the REL9_4_STABLE tree or would you like to do it yourself?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

# translation of psql.
# HOTTA Michihide ho...@net-newbie.com, 2010.
#
msgid ""
msgstr ""
"Project-Id-Version: PostgreSQL 9.4.2\n"
"Report-Msgid-Bugs-To: pgsql-b...@postgresql.org\n"
"POT-Creation-Date: 2015-05-23 22:59+0900\n"
"PO-Revision-Date: 2015-05-27 08:21+0900\n"
"Last-Translator: KOIZUMI Satoru koizumi...@minos.ocn.ne.jp\n"
"Language-Team: jpug-doc jpug-...@ml.postgresql.jp\n"
"Language: ja\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"

#: ../../common/exec.c:127 ../../common/exec.c:241 ../../common/exec.c:284
#, c-format
msgid "could not identify current directory: %s"
msgstr "カレントディレクトリを識別できませんでした。: %s"

#: ../../common/exec.c:146
#, c-format
msgid "invalid binary \"%s\""
msgstr "\"%s\" は有効なバイナリファイルではありません。"

#: ../../common/exec.c:195
#, c-format
msgid "could not read binary \"%s\""
msgstr "バイナリファイル \"%s\" を読み込めませんでした。"

#: ../../common/exec.c:202
#, c-format
msgid "could not find a \"%s\" to execute"
msgstr "実行に必要な \"%s\" が見つかりません。"

#: ../../common/exec.c:257 ../../common/exec.c:293
#, c-format
msgid "could not change directory to \"%s\": %s"
msgstr "ディレクトリ\"%s\"に移動できませんでした: %s"

#: ../../common/exec.c:272
#, c-format
msgid "could not read symbolic link \"%s\""
msgstr "シンボリックリンク \"%s\" を読み込めませんでした。"

#: ../../common/exec.c:523
#, c-format
msgid "pclose failed: %s"
msgstr "pcloseが失敗しました: %s"

#: ../../common/fe_memutils.c:33 ../../common/fe_memutils.c:60
#: ../../common/fe_memutils.c:83 command.c:321 input.c:205 mainloop.c:72
#: mainloop.c:234
#, c-format
msgid "out of memory\n"
msgstr "メモリ不足です\n"

#: ../../common/fe_memutils.c:77
#, c-format
msgid "cannot duplicate null pointer (internal error)\n"
msgstr "null ポインタを複製できません(内部エラー)。\n"

#: ../../common/username.c:45
#, c-format
msgid "could not look up effective user ID %ld: %s"
msgstr "実効ユーザID %ld が見つかりませんでした: %s"

#: ../../common/username.c:47 command.c:276
msgid "user does not exist"
msgstr "ユーザが存在しません"

#: ../../common/username.c:62
#, c-format
msgid "user name lookup failure: error code %lu"
msgstr "ユーザ名の検索に失敗: エラーコード %lu"

#: ../../common/wait_error.c:47
#, c-format
msgid "command not executable"
msgstr "コマンドは実行形式ではありません"

#: ../../common/wait_error.c:51
#, c-format
msgid "command not found"
msgstr "コマンドが見つかりません"

#: ../../common/wait_error.c:56
#, c-format
msgid "child process exited with exit code %d"
msgstr "子プロセスが終了コード %d で終了しました。"

#: ../../common/wait_error.c:63
#, c-format
msgid "child process was terminated by exception 0x%X"
msgstr "子プロセスが例外 0x%X で終了させられました。"

#: ../../common/wait_error.c:73
#, c-format
msgid "child process was terminated by signal %s"
msgstr "子プロセスがシグナル %s で終了させられました。"

#: ../../common/wait_error.c:77
#, c-format
msgid "child process was terminated by signal %d"
msgstr "子プロセスがシグナル %d で終了させられました。"

#: ../../common/wait_error.c:82
#, c-format
msgid "child process exited with unrecognized status %d"
msgstr "子プロセスが不明な状態%dにより終了しました。"

#: command.c:117
#, c-format
msgid "Invalid command \\%s. Try \\? for help.\n"
msgstr "\\%sコマンドは無効です。\\? でヘルプを参照してください。\n"

#: command.c:119
#, c-format
msgid "invalid command \\%s\n"
msgstr "\\%sは無効なコマンドです:\n"

#: command.c:130
#, c-format
msgid "\\%s: extra argument \"%s\" ignored\n"
msgstr "\\%s: 余分な引数 \"%s\" は無視されました。\n"

#: command.c:274
#, c-format
msgid "could not get home directory for user ID %ld: %s\n"
msgstr "ユーザID %ld のホームディレクトリ \"%s\" の位置を特定できません。\n"

#: command.c:292
#, c-format
msgid "\\%s: could not change directory to \"%s\": %s\n"
msgstr "\\%s: ディレクトリを \"%s\" に変更できません:%s\n"

#: command.c:307 common.c:446 common.c:886
#, c-format
msgid "You are currently not connected to a database.\n"
msgstr "現在データベースには接続していません。\n"

#: command.c:334
#, c-format
msgid "You are connected to database \"%s\" as user \"%s\" via socket in \"%s\" at port \"%s\".\n"
msgstr "データベース\"%s\"にユーザ\"%s\"でソケット\"%s\"経由のポート\"%s\"で接続しています。\n"

#: command.c:337
#, c-format
msgid "You are connected to database \"%s\" as user \"%s\" on host \"%s\" at port \"%s\".\n"
msgstr "ホスト\"%3$s\"上のポート\"%4$s\"のデータベース\"%1$s\"にユーザ\"%2$s\"で接続しています\n"

#: command.c:538 command.c:608 command.c:1403
#, c-format
msgid "no query buffer\n"
msgstr "問い合わせバッファがありません。\n"

#: command.c:571 command.c:3035
#, c-format
msgid "invalid line number: %s\n"
msgstr "無効な行番号です: %s\n"

#: command.c:602
#, c-format
msgid "The server (version %d.%d) does not support editing function source.\n"
msgstr "このサーバーのバージョン (%d.%d) は関数のソース編集をサポートしていません\n"

#: command.c:682
msgid "No changes"
msgstr "変更なし"

#: command.c:736
#, c-format
msgid "%s: invalid encoding name or conversion procedure not found\n"
msgstr "%s: 符号化方式名が無効、または変換用プロシージャが見つかりません。\n"

#: command.c:833 command.c:883 command.c:897 command.c:914 command.c:1021
#: command.c:1180 command.c:1383 command.c:1414
#, c-format
msgid
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
At 2015-05-26 19:07:20 +0200, and...@anarazel.de wrote:
> Abhijit, do you recall why the code was changed to follow all symlinks
> in contrast to explicitly going through the tablespaces as initdb -S
> does? I'm pretty sure early versions of the patch pretty much had a
> verbatim copy of the initdb logic?

Yes, earlier versions of the patch did follow symlinks only in pg_tblspc. I changed this because Álvaro pointed out that this was likely to miss important stuff, e.g. in 20150403163232.ga28...@eldon.alvh.no-ip.org, 20150406131636.gd4...@alvh.no-ip.org

-- Abhijit
Re: [HACKERS] [COMMITTERS] pgsql: Add pg_audit, an auditing extension
On Tue, May 26, 2015 at 8:10 PM, Stephen Frost sfr...@snowman.net wrote:
> I certainly welcome review from others and if there is not another
> committer-level formal review before we get close on 9.5 (say, end of
> August), then I'll revert it. There is certainly no concern that doing
> so would be difficult to do, as it is entirely self-contained.

Like Noah, I'm unhappy that this patch went in. I stated a few days before it was committed that I felt the code quality and documentation were well below the level we normally expect. Your response was to do a little more work on the patch and then commit it without so much as reposting it. Clearly, we have completely different ideas about what constitutes an appropriate response to the sort of comments I sent.

A near-record number of committers have objected to this patch in all kinds of different ways. There have been concerns about process, concerns about code quality, and concerns about whether this is really something that we want. The normal way that works is that you either (a) revise the patch or (b) try to convince the person raising the concern that it's OK. The concern is addressed when the person who raised it says it is, and not just because you thought about it and decided it was OK. This can be taken too far in the other direction, but we're not close to that in this instance.

I am particularly troubled by the fact that what has happened with this patch is very much like what happened with row-level security: a patch that clearly wasn't finished and clearly had not had adequate review got abruptly committed - by you - without any consensus so to do. It seems likely to me that by now row-level security has had enough review that it is solid enough to be in the tree - in particular, I am encouraged by Dean Rasheed's work to improve the patch. However, it is absolutely not the community process for that stuff to happen after the code is already in the tree.
It is the community process for that stuff to happen before the code is in the tree.

It will be impossible for our community to continue delivering quality software if you persist in taking this approach. Either the other committers will have to spend an inordinate effort fixing (or convincing you to fix) the stuff you break - instead of working on our own projects - and other people's patches - or we will have to ignore your commits, and the things that are broken in those commits will become bugs in our releases. Either way, the community loses.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Run pgindent now?
On 5/25/15 7:15 PM, Andres Freund wrote:
> On 2015-05-25 19:01:28 -0400, Bruce Momjian wrote:
> > A longer-term fix would be to make pgindent less stupid about this
> > sort of usage, but nobody's yet volunteered to dig into the guts of
> > that code.
> >
> > I assume a typedefs list is going to be a requirement of any decent C
> > indenting tool.
>
> Maybe I'm missing something major here, but why? Afaict it's just only
> used for formatting decisions that could be made without it just as well?

AFAICT, the main reason is to decide whether * and & are binary infix or unary prefix operators. Otherwise, it wouldn't know whether to write

    char * foo;

or the more customary

    char *foo;

Now, running pgindent without a typedefs list also makes it do things like

    static int32
    -makepol(QPRS_STATE *state)
    +makepol(QPRS_STATE * state)

which, one might argue, it could figure out without a typedefs list. But then the formatting would be inconsistent between prototypes and variable declarations, which might drive people crazy. I don't know whether there is a better way than living with it, one way or the other (i.e., requiring a types list, or accepting slightly odd formatting).
Re: [HACKERS] Run pgindent now?
On Tue, May 26, 2015 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
> > Realistically, with merge.conflictstyle = diff3 (why is this not the
> > default?), resolving whitespace conflicts that occur when you try to
> > cherry-pick is typically not very difficult.
>
> Really? The problems I have generally come from places where pgindent
> has changed the line breaks, not just horizontal spacing. I haven't
> seen anything that copes with this, certainly not git.

Well, it's not fully automated, but if you set the setting above, and then cherry-pick, your merge conflicts will look something like this:

<<<<<<<
side A
|||||||
original version
=======
side B
>>>>>>>

Either side A or side B will be what changed in the patch you cherry-picked, and the other will be what changed in the branch in the meantime. I forget which is which. So you notice that one of the two sides differs from the original version only in terms of whitespace and delete that side, keeping the other side. Done.

In general, the way you resolve these conflicts is by choosing the side that has fewer changes from the original version, noting how it differs from the original version, modifying the other side accordingly, and then deleting the other two versions. For example:

<<<<<<<
here we renamed the function!
|||||||
original version
=======
here we added an additional parameter to the function call!
>>>>>>>

So you either change side B to the new function name and remove side A, or else you change side A to pass the extra parameter and remove side B. In either case you remove the original version.

This is obviously not zero effort. At the same time, it's not much effort, either. I resolve these kinds of mechanical conflicts all the time, and they don't take up much time or effort. If you have to deal with this kind of crap using patch, it bites. If you use git but with the default conflictstyle, you don't get the original version part of the conflict, so it still bites.
But after a modest amount of practice, resolving trivial conflicts with merge.conflictstyle=diff3 is pretty darn easy. Or at least, I have found it so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
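The workflow Robert describes is easy to try in a throwaway repository; the repository, file, and branch names below are invented for illustration:

```shell
# Throwaway demo of merge.conflictstyle=diff3 (all names are made up).
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.email demo@example.com
git config user.name demo
git config merge.conflictstyle diff3   # include the "original version" section

echo 'original version' > f.c
git add f.c
git commit -qm 'base'

git checkout -qb side-b                # one line of history changes the file...
echo 'side B' > f.c
git commit -qam 'change on side B'

git checkout -q -                      # ...and so does the other
echo 'side A' > f.c
git commit -qam 'change on side A'

git merge side-b || true               # conflicts, as expected
cat f.c                                # shows the <<<<<<< / ||||||| / ======= markers
```

The middle `|||||||` section is the base version Robert refers to; with the default conflict style it is simply absent.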
Re: [HACKERS] [COMMITTERS] pgsql: Row-Level Security Policies (RLS)
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> > > What do we need RowSecurityPolicy->policy_id for? It seems to me that
> > > it is only used to determine whether the policy is the default deny
> > > one, so that it can later be removed if a hook adds a different one.
> > > This seems contrived as well as under-documented. Why isn't a boolean
> > > flag sufficient?
> >
> > Thanks for taking a look! It's also used during relcache updates (see
> > equalPolicy()).
>
> Hmm, but the policy name is unique also, right? So the policy_id check
> is redundant ...

I don't disagree with that, but surely checking if it's the same OID and exiting immediately is going to be faster than comparing the policy names. Now, looking at the code, I'm actually failing to see a case where we use the RowSecurityPolicy->policy_name.. Perhaps *that's* what we should be looking to remove?

Thanks!

Stephen
Re: [HACKERS] optimizing vacuum truncation scans
On Tue, May 26, 2015 at 12:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
> On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby jim.na...@bluetreble.com wrote:
> > On 4/20/15 1:50 AM, Jeff Janes wrote:
> > > For that matter, why do we scan backwards anyway? The comments don't
> > > explain it, and we have nonempty_pages as a starting point, so why
> > > don't we just scan forward?
> >
> > I suspect that eons ago we didn't have that and just blindly
> > reverse-scanned until we finally hit a non-empty buffer...
>
> nonempty_pages is not concurrency safe, as the pages could become used
> after vacuum passed them over but before the access exclusive lock was
> grabbed before the truncation scan. But maybe the combination of the
> two? If it is above nonempty_pages, then anyone who wrote into the page
> after vacuum passed it must have cleared the VM bit. And currently I
> think no one but vacuum ever sets VM bit back on, so once cleared it
> would stay cleared.

Right. In any event nonempty_pages could be used to set the guess as to how many pages (if any) might be worth prefetching, as that is not needed for correctness.

Yeah, but I think we'd do a LOT better with the VM idea, because we could immediately truncate without scanning anything.

Right now all the interlocks to make this work seem to be in place (only vacuum and startup can set visibility map bits, and only one vacuum can be in a table at a time). But as far as I can tell, those assumptions are not baked in and we have pondered loosening them before. For example, letting HOT cleanup mark a page as all-visible if it finds it to be such. Now in that specific case it would be OK, as HOT cleanup would not cause a page to become empty (or could it? If an insert on a table with no indexes was rolled back, and HOT cleanup found it and cleaned it up, it could conceptually become empty--unless we make special code to prevent it), and so the page would have to be below nonempty_pages. But there may be other cases.
And I know other people have mentioned making VACUUM concurrent (although I don't see the value in that myself). So doing it this way would be hard to beat (scanning a bitmap vs the table itself), but it would also introduce a modularity violation that I am not sure is worth it. Of course this could always be reverted if its requirements became a problem for a more important change (assuming of course that we detected the problem).

The fatal problem here is that nonempty_pages is unreliable. If vacuum skips all-visible pages, it doesn't necessarily increment nonempty_pages beyond that skippage. So if you just rely on nonempty_pages, you will truncate away pages that were already all visible but are not empty. If we changed it so that it did increment nonempty_pages past the skipped ones, then pages which were all empty and got marked as all visible without being truncated (say, because a lock could not be acquired, or because there was a non-empty page after them which later became empty), then those pages would never get truncated away.

As it is currently, it is not clear what purpose nonempty_pages serves. It is a guardian value which doesn't seem to actually guard anything. At best it prevents you from needing to inspect one page (the page of the guardian value itself) to see if that page is actually empty, and finding that it is not. That hardly seems worthwhile.

We could adopt two nonempty_pages counters, one that fails low on skipped all-visible pages, and one that fails high on them. And then fast truncate down to the high one, and do the current page-by-page scan between the low and high. That seems rather grotesque.

Cheers,

Jeff
Re: [HACKERS] Run pgindent now?
On 5/25/15 5:51 PM, Alvaro Herrera wrote:
> Tom Lane wrote:
> > A longer-term fix would be to make pgindent less stupid about this
> > sort of usage, but nobody's yet volunteered to dig into the guts of
> > that code.
>
> We've discussed in the past that we could use something other than BSD's
> indent -- astyle has been mentioned. It seems that with suitable
> options, we could make the result very close to what we have now, and
> not be forced to deal with typedef lists and other nonsense.

astyle looks like a decent tool, but it seems to me that it tends to leave things alone that it doesn't have an explicit rule about. So that would leave a lot of room for formatting randomness from different authors.
Re: [HACKERS] Construction of Plan-node by CSP (RE: Custom/Foreign-Join-APIs)
On Mon, May 25, 2015 at 5:08 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
> I updated the patch to fix up this problem towards the latest master
> branch. [ ... ] Instead of this approach, Tom suggested to add a list
> of child Paths on CustomPath node, then createplan.c calls
> create_plan_recurse() for each entry of the list, without this function
> getting exported.

Tom, do you want to review this patch and figure out how to solve the underlying problem? If not, I will take care of it. But I will be unhappy if I put time and effort into this and then you insist on changing everything afterwards, again.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
At 2015-05-26 22:44:03 +0200, and...@anarazel.de wrote:
> So, this was discussed in the following thread, starting at:
> http://archives.postgresql.org/message-id/20150403163232.GA28444%40eldon.alvh.no-ip.org

Sorry, I didn't see this before replying.

> There are no other places where it's allowed to introduce symlinks and
> we have refuted bugreports of people having problems after doing that.

OK.

> So what I propose is:
>
> 1) Remove the automatic symlink following
> 2) Follow pg_tblspc/*, pg_xlog if it's a symlink, fix the latter in
>    initdb -S
> 3) Add an elevel argument to walkdir(), return if AllocateDir() fails,
>    continue for stat() failures in the readdir() loop.
> 4) Add elevel argument to pre_sync_fname, fsync_fname, return after
>    errors.
> 5) Accept EACCES, ETXTBSY (if defined) when open()ing the files. By
>    virtue of not following symlinks we should not need to worry about
>    EROFS

Makes sense. I've got most of that done, I'll just remove the symlink following (or rather, restrict it to the cases mentioned), test a bit, and post.

> I'm inclined to think that 4) is a big enough compat break that a
> fsync_fname_ext with the new argument is a good idea.

Agreed.

-- Abhijit
[HACKERS] [Postgresql NLS support] : Help on using NLS , Custom dictionary to enhance our website search functionality
Hi All,

We are new to PostgreSQL.

Background: Our site is hosted on Ruby on Rails with a PostgreSQL database. It is an eCommerce site, for which we need to provide NLS-supported search functionality to help end users while searching, using synonyms, related words, plurals and singulars, stop words, etc.

Problem: As per our analysis, we found that with PostgreSQL NLS it is possible to do the following without any custom changes:

1. Participles (help, helps, helped, helping)
2. Multi-word (search results)
3. Single word (results)
4. Plurals and singular (s, es)
5. Punctuation (., ;, !, :, ')
6. Stop words (and, or, as, an, but)

Reading the documentation for custom dictionaries, we tried to add a custom dictionary under postgresql/9.3/tsearch_data with the following contents to test:

buy purchase
big enormous
mammoth elephant
indices index*

But we found that when we search for the word "purchase" we get search results for both "purchase" and "buy", but when we search for "buy" we don't get results for "purchase". We are using the following query, with which we found that transitive synonym search is not working for us:

SELECT products.*
FROM products
WHERE to_tsvector('english', coalesce(products.description::TEXT, ''))
      @@ to_tsquery('english', ''' ' || 'purchase' || ' ''');

We think transitive support is a basic function that the PostgreSQL database should provide. Please let us know if some setting or configuration changes are needed to enable this feature.

Also, as per our findings, there seems to be no direct way in PostgreSQL for the following:

1. Related words (pushchair, buggies)
2. Nearby words (pen, pencil, eraser)
3. Synonyms (buy=purchase, big=enormous)
4. Spelling mistakes

We look forward to feedback/suggestions on our problem, or any alternative solution for how this can be done. Thanks in advance.

Thanks and regards,
Nivedita Kulkarni
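One likely explanation, worth verifying: the query above uses the stock 'english' configuration, which never consults a file dropped into tsearch_data, and synonym dictionaries rewrite only the left-hand word of each line (they are not transitive). An untested sketch of wiring a synonym dictionary into a custom configuration follows; the dictionary name, configuration name, file name, and database name are all assumptions:

```shell
# Untested sketch (requires a running server and write access to sharedir).
# All object and file names below are made up for illustration.

# 1. Synonym mappings are one-way: the left word is rewritten to the right
#    word at both index and query time. So map every variant, including the
#    canonical word itself, to one canonical lexeme rather than expecting
#    transitivity.
cat > "$(pg_config --sharedir)/tsearch_data/my_synonyms.syn" <<'EOF'
buy purchase
purchase purchase
big enormous
enormous enormous
EOF

# 2. Create the dictionary, wire it into a configuration, and query with
#    THAT configuration instead of the stock 'english' one.
psql -d mydb <<'EOF'
CREATE TEXT SEARCH DICTIONARY my_synonyms
    (TEMPLATE = synonym, SYNONYMS = my_synonyms);

CREATE TEXT SEARCH CONFIGURATION my_english (COPY = english);
ALTER TEXT SEARCH CONFIGURATION my_english
    ALTER MAPPING FOR asciiword WITH my_synonyms, english_stem;

-- 'buy' and 'purchase' now normalize to the same lexeme, so matching
-- should work in both directions:
SELECT to_tsvector('my_english', 'buy')
       @@ to_tsquery('my_english', 'purchase');
EOF
```

For multi-word related terms (pushchair/buggies and the like), a thesaurus dictionary is the closer fit, but even there the expansion has to be written out in the file; neither dictionary type infers transitive relationships on its own.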
Re: [HACKERS] Supporting TAP tests with MSVC and Windows
On Fri, Apr 24, 2015 at 11:26 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Apr 20, 2015 at 9:01 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Sun, Apr 19, 2015 at 10:01 PM, Michael Paquier michael.paqu...@gmail.com wrote: Note as well that this patch uses the following patches fixing independent issues: ... Attached is v4. I added a switch in config.pl to be consistent with ./configure and --enable-tap-tests. Attached is v5, rebased on HEAD (2c47fe16) after conflicts with dcae5fac and 54a16df0. Here is v6, a rebased version on HEAD (79f2b5d). There were some conflicts with the indentation and some other patches related to pg_rewind and initdb's tests. -- Michael From 5879da14bbf20b5ac66b78f8beab09114bb2fa96 Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Mon, 20 Apr 2015 04:57:37 -0700 Subject: [PATCH 1/2] Add support for TAP tests on Windows Nodes initialized by the TAP tests use SSPI to securely perform the tests, and test scripts are patched in a couple of places to support Windows grammar. 
In the case of MSVC, tests can be run with this command: vcregress tapcheck --- doc/src/sgml/install-windows.sgml | 1 + src/Makefile.global.in | 2 +- src/bin/initdb/t/001_initdb.pl | 1 + src/bin/pg_basebackup/t/010_pg_basebackup.pl | 67 -- src/bin/pg_controldata/t/001_pg_controldata.pl | 5 +- src/bin/pg_ctl/t/001_start_stop.pl | 14 -- src/bin/pg_ctl/t/002_status.pl | 12 - src/bin/pg_rewind/RewindTest.pm| 66 +++-- src/bin/scripts/t/020_createdb.pl | 3 ++ src/test/perl/TestLib.pm | 18 --- src/tools/msvc/Solution.pm | 1 + src/tools/msvc/config_default.pl | 1 + src/tools/msvc/vcregress.pl| 48 +- 13 files changed, 175 insertions(+), 64 deletions(-) diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml index d154b44..2047790 100644 --- a/doc/src/sgml/install-windows.sgml +++ b/doc/src/sgml/install-windows.sgml @@ -439,6 +439,7 @@ $ENV{CONFIG}=Debug; userinputvcregress modulescheck/userinput userinputvcregress ecpgcheck/userinput userinputvcregress isolationcheck/userinput +userinputvcregress tapcheck/userinput userinputvcregress upgradecheck/userinput /screen diff --git a/src/Makefile.global.in b/src/Makefile.global.in index c583b44..563d1d1 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -336,7 +336,7 @@ cd $(srcdir) TESTDIR='$(CURDIR)' PATH=$(bindir):$$PATH PGPORT='6$(DEF_PGPOR endef define prove_check -cd $(srcdir) TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl +cd $(srcdir) TESTDIR='$(CURDIR)' TESTREGRESS='$(top_builddir)/src/test/regress/pg_regress' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl endef else diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl index 299dcf5..095cbf3 100644 --- a/src/bin/initdb/t/001_initdb.pl +++ b/src/bin/initdb/t/001_initdb.pl @@ -4,6 +4,7 @@ use strict; use warnings; +use Config; use TestLib; use Test::More 
tests = 14; diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 3476ea6..112f3a1 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -1,8 +1,9 @@ use strict; use warnings; use Cwd; +use Config; use TestLib; -use Test::More tests = 35; +use Test::More tests = ($Config{osname} eq MSWin32) ? 25 : 35; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -17,10 +18,20 @@ command_fails( [ 'pg_basebackup', '-D', $tempdir/backup ], 'pg_basebackup fails because of hba'); +# Use SSPI on Windows, node has been initialized already accordingly +# by pg_regress --config-auth. open HBA, $tempdir/pgdata/pg_hba.conf; -print HBA local replication all trust\n; -print HBA host replication all 127.0.0.1/32 trust\n; -print HBA host replication all ::1/128 trust\n; +if ($Config{osname} ne MSWin32) +{ + print HBA local replication all trust\n; + print HBA host replication all 127.0.0.1/32 trust\n; + print HBA host replication all ::1/128 trust\n; +} +else +{ + print HBA host replication all 127.0.0.1/32 sspi include_realm=1 map=regress\n; + print HBA host replication all ::1/128 sspi include_realm=1 map=regress\n; +} close HBA; system_or_bail 'pg_ctl', '-s', '-D', $tempdir/pgdata, 'reload'; @@ -56,6 +67,33 @@ command_fails([ 'pg_basebackup', '-D', $tempdir/tarbackup_l1, '-Ft' ], 'pg_basebackup tar with long name fails'); unlink $tempdir/pgdata/$superlongname; +command_fails( + [ 'pg_basebackup', '-D', $tempdir/backup_foo, '-Fp', -T=/foo ], + '-T with
Re: [HACKERS] [PATCH] Generalized JSON output functions
On Sat, May 23, 2015 at 3:03 AM, Ryan Pedela rped...@datalanche.com wrote: On Fri, May 22, 2015 at 10:51 AM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, May 22, 2015 at 9:43 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andrew Dunstan wrote: On 05/20/2015 09:16 AM, Shulgin, Oleksandr wrote: Attached is a patch against master to generalize the JSON-producing functions in utils/adt/json.c and to provide a set of callbacks which can be overridden the same way that is already provided for *parsing* JSON. I'm not necessarily opposed to this, but it sure seems like a lot of changes, and moderately invasive ones, to support something that could be done, at the cost of reparsing, with a simple loadable extension that I could create in a few hours of programming. But this seems like a pretty reasonable change to make, no? Doesn't the total amount of code decrease after this patch? JSON stuff is pretty new so some refactoring and generalization of what we have is to be expected. Yeah. Also, there have been a few previous gripes about this, for example, http://www.postgresql.org/message-id/cahbvmpzs+svr+y-ugxjrq+xw4dqtevl-cozc69zffwmxjck...@mail.gmail.com . As noted, I definitely prefer 'space free' by default for efficiency reasons, but standardizing the output has definitely got to be a reasonable goal. Every JSON implementation I have ever used defaults to the minified version of JSON (no whitespace) when printed. Hashing of arrays seems to be an important issue: we'd rather make sure to produce the same output in every code path. That would also mean: no special logic to add the line feeds in json_agg either. Is it reasonable to add this patch to CommitFest now? -- Alex
Re: [HACKERS] fsync bug faq for publication?
On May 26, 2015 07:31, Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: We need to get a notice out to our users who might update their servers and get stuck behind the fsync bug. As such, I've prepared a FAQ. Please read, correct and improve this FAQ so that it's fit for us to announce to users as soon as possible: https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug Judging by Ross Boylan's report at http://www.postgresql.org/message-id/f1f13e14a610474196571953929c02096d0...@ex08.net.ucsf.edu it's not sufficient to just recommend changing permissions on the problematic files. It's not entirely clear from here whether there is a solution that both allows fsync on referenced files and keeps OpenSSL happy; but if there is, it probably requires making the cert files be owned by the postgres user, as well as adjusting their permissions to be 0640 or thereabouts. I'm worried about whether that breaks other services using the same cert files. It almost certainly will. I think the recommendation has to be that if it's a symlink, it should be replaced with a copy of the file, and that copy be chown and chmod the right way. /Magnus
[HACKERS] Why does txid_current() assign new transaction-id?
Hi, hackers! I have a question about txid_current(): why does it assign a new transaction-id? When we execute txid_current() outside of a transaction block, it assigns a new transaction-id. I don't think it needs to assign a new txid, because txid_current() is just a read-only function. Walking through the code, I found a suitable replacement: GetStableLatestTransactionId(void). I have attached a patch that changes just one line. Could you please check the code? Regards, Naoya --- Naoya Anzai Engineering Department NEC Solution Innovators, Ltd. E-Mail: nao-an...@xc.jp.nec.com --- txid_current.patch Description: txid_current.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] problems on Solaris
On Mon, May 25, 2015 at 1:07 AM, Andres Freund and...@anarazel.de wrote: On 2015-05-24 19:44:37 -0400, Andrew Dunstan wrote: Buildfarm members casteroides and protosciurus have been having some problems that seem puzzling. These animals both run on the same machine, but with different compilers. casteroides runs with the Sun Studio 12 compiler, and has twice in the last 3 days demonstrated this error: [5561ce0c.51b7:25] LOG: starting background worker process test_shm_mq [5561ce1e.5287:9] PANIC: stuck spinlock (100cb77f4) detected at atomics.c:30 [5561ce1e.5287:10] STATEMENT: SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,27)), 200, 3); [5561ce0c.51b7:26] LOG: server process (PID 21127) was terminated by signal 6 [5561ce0c.51b7:27] DETAIL: Failed process was running: SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,27)), 200, 3); [5561ce0c.51b7:28] LOG: terminating any other active server processes It's not constant - between the two failures was a success. That's indeed rather odd. For one the relevant code does nothing but lock/unlock a spinlock. For another, there's been no recent change to this and casteroides has been running happily for a long time. protociurus runs with gcc 3.4.3 and gets this error: gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -I/usr/local/include -m64 -I. 
-I../../../src/interfaces/libpq -I./../regress -I../../../src/include -c -o specparse.o specparse.c

In file included from /usr/include/sys/vnode.h:47,
 from /usr/include/sys/stream.h:22,
 from /usr/include/netinet/in.h:66,
 from /usr/include/netdb.h:98,
 from ../../../src/include/port.h:17,
 from ../../../src/include/c.h:1114,
 from ../../../src/include/postgres_fe.h:25,
 from specparse.y:13:
/usr/include/sys/kstat.h:439: error: syntax error before numeric constant
/usr/include/sys/kstat.h:463: error: syntax error before '}' token
/usr/include/sys/kstat.h:464: error: syntax error before '}' token
In file included from /usr/include/sys/stream.h:22,
 from /usr/include/netinet/in.h:66,
 from /usr/include/netdb.h:98,
 from ../../../src/include/port.h:17,
 from ../../../src/include/c.h:1114,
 from ../../../src/include/postgres_fe.h:25,
 from specparse.y:13:
/usr/include/sys/vnode.h:105: error: syntax error before kstat_named_t

I'd noticed this one as well. This sounds like an installation problem, not really ours. Dave, any chance you could look into this, or give somebody an account to test what's up? I'm not going to be able to look at this, at least this week. I can give someone on the EDB team access - Robert, can one of your guys take a look? -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
With PostgreSQL 9.3.5 on Ubuntu 12.04, I'm getting the error: ERROR: MultiXactId 1934308693 has not been created yet -- apparent wraparound on doing various queries on our database. I don't think it is a wraparound - I think the tuple has mistakenly decided it has a MultiXactId related to it. Looking back through the logs, it looks as if this suddenly started happening at 2:00 (UTC+0100) on 23 May. (Nothing was happening at that time as far as we can tell - the server didn't crash/restart or anything). The logs were all really small, until the ones from then onwards, which are now full of things like: 2015-05-23 10:44:54 BST ERROR: MultiXactId 1129406840 has not been created yet -- apparent wraparound 2015-05-23 10:44:54 BST CONTEXT: automatic analyze of table x.messages.msgdata 2015-05-23 10:45:16 BST ERROR: MultiXactId 1575170 has not been created yet -- apparent wraparound 2015-05-23 10:45:16 BST STATEMENT: select sum(size) from messages.msgdata 2015-05-23 10:45:54 BST ERROR: MultiXactId 1129406840 has not been created yet -- apparent wraparound 2015-05-23 10:45:54 BST CONTEXT: automatic analyze of table x.messages.msgdata 2015-05-23 10:46:54 BST ERROR: MultiXactId 1129406840 has not been created yet -- apparent wraparound (There are several incorrect MultiXactIds in there). This was NOT an upgrade. It has been running with 9.3 for ages. 
The pg_controldata output is:

pg_control version number:            937
Catalog version number:               201306121
Database system identifier:           5990773948116871611
Database cluster state:               in production
pg_control last modified:             Tue 26 May 2015 09:50:25 BST
Latest checkpoint location:           7C6/8F863440
Prior checkpoint location:            7C6/8E8576D8
Latest checkpoint's REDO location:    7C6/8E8745F8
Latest checkpoint's REDO WAL file:    000107C6008E
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/24005839
Latest checkpoint's NextOID:          1802564
Latest checkpoint's NextMultiXactId:  216
Latest checkpoint's NextMultiOffset:  439
Latest checkpoint's oldestXID:        710
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Tue 26 May 2015 09:45:57 BST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            minimal
Current max_connections setting:      200
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

The pg_multixact directory contains two subdirectories: members/ and offsets/. There are no locked-up transactions in pg_stat_activity. Any ideas? I can't even delete the records with the bad multixactid in them (which would be acceptable, as we can recover those individual records from backup).
We don't want to restore fully from backup if possible as that will lose everything since the last good backup. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 release notes may need ON CONFLICT DO NOTHING compatibility notice for FDW authors
Peter Geoghegan wrote: In any case, third-party foreign data wrappers that target other database systems will totally ignore ON CONFLICT DO NOTHING when built against the master branch (unless they consider these questions). They should perhaps make a point of rejecting DO NOTHING outright where it makes sense that support could exist but just doesn't. Or they could just add support (I imagine that this would be very easy for mysql_fdw, for example -- MySQL has INSERT IGNORE). I feel a compatibility item in the release notes is in order so that the question is considered, but there seems to be no place to do that on the wiki, and the original commit message does not have a note like this. +1 I wouldn't have become aware of that if I hadn't read your message. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
It was fixed in 9.3.7. Unfortunately, 9.3.7 has a new bug which is unrelated to this one: http://www.postgresql.org/message-id/20150525142657.4686.35...@wrigleys.postgresql.org I'm not sure whether Ubuntu 12.04 is affected by that bug or not, though. As far as I know, the developers plan to release 9.3.8 etc. soon. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp With PostgreSQL 9.3.5 on Ubuntu 12.04, I'm getting the error: ERROR: MultiXactId 1934308693 has not been created yet -- apparent wraparound on doing various queries on our database. I don't think it is a wraparound - I think the tuple has mistakenly decided it has a MultiXactId related to it. Looking back through the logs, it looks as if this suddenly started happening at 2:00 (UTC+0100) on 23 May. (Nothing was happening at that time as far as we can tell - the server didn't crash/restart or anything). The logs were all really small, until the ones from then onwards, which are now full of things like: 2015-05-23 10:44:54 BST ERROR: MultiXactId 1129406840 has not been created yet -- apparent wraparound 2015-05-23 10:44:54 BST CONTEXT: automatic analyze of table x.messages.msgdata 2015-05-23 10:45:16 BST ERROR: MultiXactId 1575170 has not been created yet -- apparent wraparound 2015-05-23 10:45:16 BST STATEMENT: select sum(size) from messages.msgdata 2015-05-23 10:45:54 BST ERROR: MultiXactId 1129406840 has not been created yet -- apparent wraparound 2015-05-23 10:45:54 BST CONTEXT: automatic analyze of table x.messages.msgdata 2015-05-23 10:46:54 BST ERROR: MultiXactId 1129406840 has not been created yet -- apparent wraparound (There are several incorrect MultiXactIds in there). This was NOT an upgrade. It has been running with 9.3 for ages.
The pg_controldata output is: pg_control version number:937 Catalog version number: 201306121 Database system identifier: 5990773948116871611 Database cluster state: in production pg_control last modified: Tue 26 May 2015 09:50:25 BST Latest checkpoint location: 7C6/8F863440 Prior checkpoint location:7C6/8E8576D8 Latest checkpoint's REDO location:7C6/8E8745F8 Latest checkpoint's REDO WAL file:000107C6008E Latest checkpoint's TimeLineID: 1 Latest checkpoint's PrevTimeLineID: 1 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/24005839 Latest checkpoint's NextOID: 1802564 Latest checkpoint's NextMultiXactId: 216 Latest checkpoint's NextMultiOffset: 439 Latest checkpoint's oldestXID:710 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 0 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 1 Time of latest checkpoint:Tue 26 May 2015 09:45:57 BST Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 0/0 Min recovery ending loc's timeline: 0 Backup start location:0/0 Backup end location: 0/0 End-of-backup record required:no Current wal_level setting:minimal Current max_connections setting: 200 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment:16777216 Maximum length of identifiers:64 Maximum columns in an index: 32 Maximum size of a TOAST chunk:1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 The pg_multixact directory contains two files members/ offsets/ There are no locked up transactions is pg_stat_activity; Any ideas? I can't even delete the records with the bad multixactid in them (which would be acceptable, as we can recover those individual records from backup). 
We don't want to restore fully from backup if possible as that will lose everything since the last good backup. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
On 26/05/2015 10:23, Tatsuo Ishii wrote: It was fixed in 9.3.7. OK. Do you know what the bug number was or what it was called? (I've tried searching for it before posting, but found nothing which seemed to be the same problem as ours.) Do you know whether, if we upgrade to 9.3.7/8, it will fix the problem in the current database, or is the data damage 'permanent', so we'll need to restore from backup and somehow merge in the latest data? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Order of columns in query is important?!
You're right. I misread the problem description. On Tue, May 26, 2015 at 3:13 AM, Petr Jelinek p...@2ndquadrant.com wrote: On 26/05/15 11:59, CK Tan wrote: It has to do with the implementation of slot_getattr, which tries to do the deform on-demand lazily. if you do select a,b,c, the execution would do slot_getattr(1) and deform a, and then slot_getattr(2) which reparse the tuple to deform b, and finally slot_getattr(3), which parse the tuple yet again to deform c. Where as if you do select c, b, a, it would do slot_getattr(3) to deform c, and in the process deform a and b in one pass. Subsequent calls to slot_getattr 1 and 2 would find the attribute ready and available, and return it (without parsing the tuple again). If this was the case, changing column order would lead to performance increase, not decrease as reported. My guess would be same as Amits, it's most likely the additional projection step. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
* Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: At 2015-05-26 03:54:51 +0200, and...@anarazel.de wrote: Another thing is whether we should handle a recursive symlink in pgdata? I personally think not, but... I think not too. Yikes.. That's definitely the kind of thing that's why I worry about the whole 'fsync everything' idea- what if I symlink to /? I've certainly done that before from my home directory for ease of use and I imagine there are people out there who have similar setups where they sftp as the PG user and use the symlink to more easily navigate somewhere else. We have to realize that, on at least some systems, PGDATA could be the postgres user's home directory too. That's not the case on Debian-based systems today, but I think it might have been before Debian had the multi-cluster tooling. It's also not just as simple as making fsync_fname fail gracefully upon EACCESS - the opendir() could fail just as well. I'll post a proposed patch shortly. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Order of columns in query is important?!
It has to do with the implementation of slot_getattr, which does the deforming on demand, lazily. If you do select a,b,c, the execution would do slot_getattr(1) and deform a, then slot_getattr(2), which reparses the tuple to deform b, and finally slot_getattr(3), which parses the tuple yet again to deform c. Whereas if you do select c,b,a, it would do slot_getattr(3) to deform c, and in the process deform a and b in one pass. Subsequent calls to slot_getattr for 1 and 2 would find the attributes ready and available, and return them without parsing the tuple again. For Vitesse X, we mark all columns that the query requires during JIT compilation, and deform them in one shot. PG should be able to do the same. -cktan On Mon, May 25, 2015 at 2:26 AM, Colin 't Hart co...@sharpheart.org wrote: Hi, I hope this is the best place to report this or should I be on pgsql-general or pgsql-bugs? It seems that the order of columns in a query can make a difference in execution times. In my brief investigation, queries on table(a,b,c,d,e,f,g,h) of the form select * from table order by non-indexed-column limit 25; select a,b,c,d,e,f,g,h from table order by non-indexed-column limit 25; performed the same (approx 1.5 seconds on our customers table -- rows=514431 width=215), while the query select h,g,f,e,d,c,b,a from table order by non-indexed-column limit 25; was about 50% slower (approx 2.2 seconds on our customers table). I had expected these to perform the same -- to my mind, column ordering in a query should be purely presentational -- as far as I'm concerned, the DBMS can retrieve the columns in a different order as long as it displays them in the order I've asked for. Although, again, the order of columns in a resultset in Java or Python is mostly irrelevant, though when displayed in psql I'd want the columns in the order I asked for them. Is there really something strange happening here? Or perfectly explainable and expected?
Regards, Colin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Order of columns in query is important?!
On 26/05/15 11:59, CK Tan wrote: It has to do with the implementation of slot_getattr, which tries to do the deform on-demand lazily. if you do select a,b,c, the execution would do slot_getattr(1) and deform a, and then slot_getattr(2) which reparse the tuple to deform b, and finally slot_getattr(3), which parse the tuple yet again to deform c. Where as if you do select c, b, a, it would do slot_getattr(3) to deform c, and in the process deform a and b in one pass. Subsequent calls to slot_getattr 1 and 2 would find the attribute ready and available, and return it (without parsing the tuple again). If this was the case, changing column order would lead to performance increase, not decrease as reported. My guess would be same as Amits, it's most likely the additional projection step. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Order of columns in query is important?!
On 25 May 2015 at 11:48, Amit Langote langote_amit...@lab.ntt.co.jp wrote: On 2015-05-25 PM 06:26, Colin 't Hart wrote: That means the specified order of columns in a query does matter which would have to match the defined order in order to avoid extra processing (that is only when specified columns *exactly* matches the tuple descriptor). ...and it matters a lot in those cases because we are sorting all of the data scanned, not just 25 rows. -- Simon Riggshttp://www.2ndQuadrant.com/ http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
[HACKERS] Selectivity estimation for intarray with @@
Hello. The attached patch is based on http://www.postgresql.org/message-id/capphfdssy+qepdcovxx-b4lp3ybr+qs04m6-arggknfk3fr...@mail.gmail.com and adds a selectivity estimation function for @@ (ported from tsquery). Now we support &&, @>, <@ and @@. In addition, a migration to intarray version 1.1 was written. Because of that, this patch requires my other patch: http://www.postgresql.org/message-id/14346041.DNcb5Y1inS@dinodell Alexander Korotkov knows about this patch. Thanks. -- Uriy Zhuravlev Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/contrib/intarray/Makefile b/contrib/intarray/Makefile
index 920c5b1..16c829c 100644
--- a/contrib/intarray/Makefile
+++ b/contrib/intarray/Makefile
@@ -2,10 +2,10 @@
 MODULE_big = _int
 OBJS = _int_bool.o _int_gist.o _int_op.o _int_tool.o \
-	_intbig_gist.o _int_gin.o $(WIN32RES)
+	_intbig_gist.o _int_gin.o _int_selfuncs.o $(WIN32RES)

 EXTENSION = intarray
-DATA = intarray--1.0.sql intarray--unpackaged--1.0.sql
+DATA = intarray--1.0.sql intarray--1.1.sql intarray--1.0--1.1.sql intarray--unpackaged--1.0.sql
 PGFILEDESC = "intarray - functions and operators for arrays of integers"

 REGRESS = _int
diff --git a/contrib/intarray/_int.h b/contrib/intarray/_int.h
index d524f0f..739c3c0 100644
--- a/contrib/intarray/_int.h
+++ b/contrib/intarray/_int.h
@@ -140,6 +140,7 @@ typedef struct QUERYTYPE
 #define COMPUTESIZE(size)	( HDRSIZEQT + (size) * sizeof(ITEM) )
 #define QUERYTYPEMAXITEMS	((MaxAllocSize - HDRSIZEQT) / sizeof(ITEM))
 #define GETQUERY(x)  ( (x)->items )
+#define GETRQUERY(x) ( (x)->items + ((x)->size - 1) )

 /* type codes for ITEM */
 #define END	0
diff --git a/contrib/intarray/_int_selfuncs.c b/contrib/intarray/_int_selfuncs.c
new file mode 100644
index 000..fd80668
--- /dev/null
+++ b/contrib/intarray/_int_selfuncs.c
@@ -0,0 +1,346 @@
+/*-
+ *
+ * _int_selfuncs.c
+ *	  Functions for selectivity estimation of intarray operators
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/intarray/_int_selfuncs.c
+ *
+ *-
+ */
+#include "postgres.h"
+#include "_int.h"
+
+#include "access/htup_details.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_statistic.h"
+#include "catalog/pg_type.h"
+#include "utils/selfuncs.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+#include "miscadmin.h"
+
+PG_FUNCTION_INFO_V1(_int_contsel);
+PG_FUNCTION_INFO_V1(_int_contjoinsel);
+PG_FUNCTION_INFO_V1(_int_matchsel);
+
+Datum	_int_contsel(PG_FUNCTION_ARGS);
+Datum	_int_contjoinsel(PG_FUNCTION_ARGS);
+Datum	_int_matchsel(PG_FUNCTION_ARGS);
+
+/* lookup table type for binary searching through MCELEMs */
+typedef struct
+{
+	int32		element;
+	float4		frequency;
+} Int4Freq;
+
+
+static Oid	transformOperator(Oid oprOid);
+static Selectivity int_querysel(VariableStatData *vardata, Datum constval);
+static Selectivity int_query_opr_selec(ITEM *item, Int4Freq *lookup,
+					int length, float4 minfreq);
+static Selectivity mcelem_int_query_selec(QUERYTYPE *query,
+					Datum *mcelem, int nmcelem,
+					float4 *numbers, int nnumbers);
+
+static int
+compare_val_int4freq(const void *a, const void *b);
+
+#define int_query_opr_selec_no_stats(query) \
+	int_query_opr_selec(GETRQUERY(query), NULL, 0, 0)
+
+
+
+static Oid
+transformOperator(Oid oprOid)
+{
+	HeapTuple	tup;
+	Form_pg_operator op;
+	Oid			result = InvalidOid;
+
+	tup = SearchSysCache1(OPEROID, ObjectIdGetDatum(oprOid));
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "Invalid operator: %u", oprOid);
+
+	op = (Form_pg_operator) GETSTRUCT(tup);
+
+	if (!strcmp(op->oprname.data, "&&"))
+		result = OID_ARRAY_OVERLAP_OP;
+	else if (!strcmp(op->oprname.data, "@>"))
+		result = OID_ARRAY_CONTAINS_OP;
+	else if (!strcmp(op->oprname.data, "<@"))
+		result = OID_ARRAY_CONTAINED_OP;
+
+	ReleaseSysCache(tup);
+
+	if (!OidIsValid(result))
+		elog(ERROR, "Invalid operator: %u", oprOid);
+
+	return result;
+}
+
+/*
+ * _int_contsel -- restriction selectivity for
+ * array @>, &&, <@ operators
+ */
+Datum
+_int_contsel(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(DirectFunctionCall4(arraycontsel,
+					PG_GETARG_DATUM(0),
+					ObjectIdGetDatum(transformOperator(PG_GETARG_OID(1))),
+					PG_GETARG_DATUM(2),
+					PG_GETARG_DATUM(3)));
+}
+
+/*
+ * _int_contjoinsel -- join selectivity for array @>, &&, <@ operators
+ */
+Datum
+_int_contjoinsel(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(DirectFunctionCall5(arraycontjoinsel,
+					PG_GETARG_DATUM(0),
+					ObjectIdGetDatum(transformOperator(PG_GETARG_OID(1))),
+					PG_GETARG_DATUM(2),
+					PG_GETARG_DATUM(3),
+					PG_GETARG_DATUM(4)));
+}
+
+
+/*
+ * _int_matchsel -- restriction selectivity function for intarray @@ int_query
+ */
+Datum
+_int_matchsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *)
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On Mon, May 25, 2015 at 9:54 PM, Andres Freund and...@anarazel.de wrote: On 2015-05-25 21:33:03 -0400, Robert Haas wrote: On Mon, May 25, 2015 at 2:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Perhaps, but if we didn't have permission to write the file, it's hard to argue that it's our responsibility to fsync it. So this seems like it's adding complexity without really adding any safety. I agree. I think ignoring fsync failures is a very sensible approach. If the files are not writable, they're probably not ours. The reason we started discussing this is because Tom had the - quite reasonable - concern that this might not solely be a problem of EACCESS, but that there could be other errors that we need to ignore to not fail spuriously. Say a symlink goes to a binary, which is currently being executed: ETXTBSY. Or the file is in a readonly filesystem: EROFS. So we'd need to ignore a lot of errors, possibly ignoring valid ones. But ignoring those errors wouldn't compromise data integrity, either. So let's just ignore (but log) all errors: then we'll be demonstrably no worse off than we were before this patch went in. If it turns out that there's some particular case where ignoring errors DOES compromise data integrity, then we can plug that hole surgically when somebody reports it. Anything we do short of making all errors in this area non-fatal is going to leave behind startup-failure cases that exist today, and we have no evidence at this time that such startup failures would be justified by any actual data loss risk. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PostgreSQL 8.3 index page count clarification
Hi, For the user-created indexes in PostgreSQL 8.3.6, I would like to know which file or function populates the index data structure. Specifically, where in the code are the index->tuples and index->pages values fetched and updated? Regards and thanks in advance, Srinivas Karthik
Re: [HACKERS] PostgreSQL 8.3 index page count clarification
On 05/26/2015 08:45 AM, Srinivas Karthik V wrote: Hi, For the user created indexes in PostgreSQL 8.3.6, I would like to know which file or function populates the index data structure. Specifically, where in the code is index-tuples and index-pages values are fetched and updated. 8.3 is long out of support. Please move to a modern supported version of PostgreSQL. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Row-Level Security Policies (RLS)
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: What do we need RowSecurityPolicy->policy_id for? It seems to me that it is only used to determine whether the policy is the default deny one, so that it can later be removed if a hook adds a different one. This seems contrived as well as under-documented. Why isn't a boolean flag sufficient? Thanks for taking a look! It's also used during relcache updates (see equalPolicy()). That wasn't originally the case (I had missed adding the necessary bits to relcache in the original patch), but I wouldn't want to remove that piece now and, given that it's there, using InvalidOid to indicate when it's the default-deny policy (and therefore there is no actual Oid) seems sensible. Thanks again! Stephen signature.asc Description: Digital signature
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On Mon, May 25, 2015 at 9:54 PM, Stephen Frost sfr...@snowman.net wrote: I certainly see your point, but Tom also pointed out that it's not great to ignore failures during this phase: * Tom Lane (t...@sss.pgh.pa.us) wrote: Greg Stark st...@mit.edu writes: What exactly is failing? Is it that fsync is returning -1 ? According to the original report from Christoph Berg, it was open() not fsync() that was failing, at least in permissions-based cases. I'm not sure if we should just uniformly ignore all failures in this phase. That would have the merit of clearly not creating any new startup failure cases compared to the previous code, but as you say sometimes it might mean ignoring real problems. If we accept this, then we still have to have the lists, to decide what to fail on and what to ignore. If we're going to have said lists tho, I don't really see the point in fsync'ing things we're pretty confident aren't ours. No, that's not right at all. The idea, at least as I understand it, would be decide which errors to ignore by looking at the error code, not by looking at which file is involved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why does txid_current() assign new transaction-id?
Christoph Berg m...@debian.org writes: I think the OP's point was (or should have been), to make txid_current not draw a new xid when run outside a transaction block, though it's questionable if that wouldn't just add a POLA-violating layer. Well, the patch as proposed failed to do that, but in any case I don't see much argument for changing the long-established behavior of that function. Better to add another one if you want a read-only monitoring probe. regards, tom lane
Re: [HACKERS] Run pgindent now?
On Mon, May 25, 2015 at 5:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: On Mon, May 25, 2015 at 04:52:38PM -0300, Alvaro Herrera wrote: Something is wrong. See aclchk.c changes. Yes, this is what I was concerned about. aclitem was a typedef in 9.0 and 9.1, and the use of that as a typedef in 9.4 is certainly odd: - aclitem.ai_grantor = grantorId; + aclitem.ai_grantor = grantorId; Yeah. I think we might've gotten rid of that typedef partially in order to fix this. A different strategy we could consider is use HEAD's typedef list even in the back branches. This would in some situations lead to inferior-looking results in the back branches, but that's probably better than inferior results in HEAD. (In any case, we want the same typedef list across all branches. Then anyplace where the results diverge, there must have been non-pgindent code changes, so that back-patching would require manual fixups anyway.) This is kind of why I think that reindenting the back branches is unlikely to be productive: it only helps if you can get pgindent to do the same thing on all branches, and I bet that's going to be tough. Realistically, with merge.conflictstyle = diff3 (why is this not the default?), resolving whitespace conflicts that occur when you try to cherry-pick is typically not very difficult. But every time we pgindent, especially with slightly different settings, we cause tools like 'git blame' to return less useful answers. And that sucks. We also risk breaking private patchsets that people are carrying - of course, Advanced Server is one (very large) such patchset, but it's hardly the only place where people are compiling a non-standard distribution that has to be reconciled with upstream changes. I'm not going to put up a huge fuss if we decide to go ahead with this, but I still think it's a bad plan, especially with regard to existing branches that have not been re-indented in years.
I bet it won't save that much back-patching effort - maybe not any, on net - and I bet it will inconvenience users and developers in various subtle ways that we may not even hear about but which are still quite real. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
Paul Smith wrote: No, nothing like that. It was just running fine, and then suddenly (at 2am on 23 May) it started throwing up loads of these errors. The DB server wasn't even restarted at that point. It was just working fine, then suddenly wasn't. (The first error was at 02:00:32 BST, then every few minutes after that there's another one). Another crazy thought is that the host system messed up and overwrote part of the table with pages from some other guest system. I have never seen a report of such a thing happening. It's running in a Hyper-V guest. We had taken a backup of the VM at 00:34 on 23 May and that looks to be absolutely fine. What I have done now is restore that backup and import the new data which arrived since that backup was made, and it seems OK now. I still have the 'broken' installation in case more information is needed from it. I'd try to get a raw dump of the damaged tuple data if I knew how to find where it is in the relation file... Well, you could try a binary search to locate the (or a) borked page within the table. Or you could write a plpgsql function with an EXCEPTION block that reads each tuple, and in case of error reports the CTID of the tuple being read. (select * from tab where ctid=generated_ctid; EXCEPTION WHEN OTHERS THEN raise notice ' .. ', generated_ctid). Once you have located the problem tuple, a pageinspect dump would be the starting point. Another idea is to attach a debugger to the backend process, and set a breakpoint on function errfinish. Once the error is hit the breakpoint will stop the execution before the error is reported; at that point you can request for a backtrace from the debugger. (Make sure to examine errordata to see whether the error being hit is the multixact one.) I suppose it's possible that it was disk or memory corruption, but I've seen that before, and it hasn't looked like this. Yeah, same here.
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
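Alvaro's binary-search suggestion generalizes nicely: if reading any prefix of the table that includes the damaged page errors out, the first bad page can be located in O(log n) probes. In this Python sketch, prefix_ok(n) is a hypothetical stand-in for a client-side check that scans only the first n pages (e.g. a query wrapped in a savepoint so the error can be trapped); it is not a real client, just the search logic.

```python
def first_bad_page(npages, prefix_ok):
    """Binary-search for the first page that makes a prefix scan fail.

    prefix_ok(n) must return True iff reading pages 0..n-1 succeeds.
    Assumes a prefix scan fails exactly when it includes the first
    damaged page.  Returns None if the whole table reads cleanly.
    """
    if prefix_ok(npages):
        return None
    lo, hi = 0, npages  # invariant: prefix_ok(lo) and not prefix_ok(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prefix_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi - 1  # pages 0..hi-2 read fine; page hi-1 breaks the scan
```

Each probe reads at most one prefix of the table, so even a large relation needs only a few dozen probes; once the block number is known, a pageinspect dump of that block is the natural next step, as Alvaro says.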
Re: [HACKERS] fsync bug faq for publication?
On 05/25/2015 11:09 PM, Magnus Hagander wrote: On May 26, 2015 07:31, Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: We need to get a notice out to our users who might update their servers and get stuck behind the fsync bug. As such, I've prepared a FAQ. Please read, correct and improve this FAQ so that it's fit for us to announce to users as soon as possible: https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug Judging by Ross Boylan's report at http://www.postgresql.org/message-id/f1f13e14a610474196571953929c02096d0...@ex08.net.ucsf.edu it's not sufficient to just recommend changing permissions on the problematic files. It's not entirely clear from here whether there is a solution that both allows fsync on referenced files and keeps OpenSSL happy; but if there is, it probably requires making the cert files be owned by the postgres user, as well as adjusting their permissions to be 0640 or thereabouts. I'm worried about whether that breaks other services using the same cert files. It almost certainly will. I think the recommendation has to be that if it's a symlink, it should be replaced with a copy of the file, and that copy be chown and chmod the right way. Where did we get the idea that this issue only affects symlinked files?
On testing, any file which postgres doesn't have write permissions on is affected:

root@d623471b11ee:/var/lib/postgresql/9.3/main# touch root_file.txt
root@d623471b11ee:/var/lib/postgresql/9.3/main# ls -l
total 60
-rw------- 1 postgres postgres    4 May 26 17:46 PG_VERSION
drwx------ 5 postgres postgres 4096 May 26 17:46 base
drwx------ 2 postgres postgres 4096 May 26 17:46 global
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_clog
drwx------ 4 postgres postgres 4096 May 26 17:46 pg_multixact
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_notify
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_serial
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_snapshots
drwx------ 2 postgres postgres 4096 May 26 17:47 pg_stat
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_stat_tmp
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_subtrans
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_tblspc
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_twophase
drwx------ 3 postgres postgres 4096 May 26 17:46 pg_xlog
-rw------- 1 postgres postgres  133 May 26 17:46 postmaster.opts
-rw-r--r-- 1 root     root        0 May 26 17:49 root_file.txt
root@d623471b11ee:/var/lib/postgresql/9.3/main# service postgresql start
 * Starting PostgreSQL 9.3 database server    [ OK ]
root@d623471b11ee:/var/lib/postgresql/9.3/main# ps aux | grep postgres
postgres  4627  0.2  0.4 244880 16100 ?  S   17:49 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
postgres  4629  0.0  0.0 244880  1868 ?  Ss  17:49 0:00 postgres: checkpointer process
postgres  4630  0.0  0.0 244880  1872 ?  Ss  17:49 0:00 postgres: writer process
postgres  4631  0.0  0.0 244880  1648 ?  Ss  17:49 0:00 postgres: wal writer process
postgres  4632  0.0  0.0 245632  2956 ?  Ss  17:49 0:00 postgres: autovacuum launcher process
postgres  4633  0.0  0.0 100556  1768 ?  Ss  17:49 0:00 postgres: stats collector process
root      4647  0.0  0.0   8860   648 ?  S+  17:49 0:00 grep --color=auto postgres
root@d623471b11ee:/var/lib/postgresql/9.3/main# kill -9 4627
root@d623471b11ee:/var/lib/postgresql/9.3/main# service postgresql start
 * Starting PostgreSQL 9.3 database server
 * Removed stale pid file.
The PostgreSQL server failed to start. Please check the log output:
2015-05-26 17:49:36 UTC [4676-1] LOG: database system was interrupted; last known up at 2015-05-26 17:49:16 UTC
2015-05-26 17:49:36 UTC [4676-2] FATAL: could not open file /var/lib/postgresql/9.3/main/root_file.txt: Permission denied
2015-05-26 17:49:36 UTC [4675-1] LOG: startup process (PID 4676) exited with exit code 1
2015-05-26 17:49:36 UTC [4675-2] LOG: aborting startup due to startup process failure
-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Why does txid_current() assign new transaction-id?
Re: Tom Lane 2015-05-26 18863.1432661...@sss.pgh.pa.us Christoph Berg m...@debian.org writes: Still, exposing GetStableLatestTransactionId() on the SQL level would make sense for monitoring transaction throughput. Perhaps, though I wonder why we should expose that and not just report the result of ReadNewTransactionId() --- or in txid.c's case, the result of GetNextXidAndEpoch(). Whatever is most suitable, yes. In either case it would have to be a new function, not unilaterally redefining what txid_current() does. Sure. I think the OP's point was (or should have been), to make txid_current not draw a new xid when run outside a transaction block, though it's questionable if that wouldn't just add a POLA-violating layer. Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] fsync bug faq for publication?
Josh Berkus j...@agliodbs.com writes: Where did we get the idea that this issue only affects symlinked files? Nobody said any such thing. My point was that permissions and ownership both have to be looked at. The Debian situation is that there are symlinks in $PGDATA pointing at root-owned files, and those files are (we think) also used by other services; so Magnus' point was that you'd probably better copy those files not modify their ownership/permissions in situ. regards, tom lane
Re: [HACKERS] fsync bug faq for publication?
On 05/26/2015 10:57 AM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Where did we get the idea that this issue only affects symlinked files? Nobody said any such thing. My point was that permissions and ownership both have to be looked at. The Debian situation is that there are symlinks in $PGDATA pointing at root-owned files, and those files are (we think) also used by other services; so Magnus' point was that you'd probably better copy those files not modify their ownership/permissions in situ. Updated, please make further corrections so I can get an announcement out ASAP. Thanks! https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On 05/26/2015 08:05 AM, Robert Haas wrote: On Mon, May 25, 2015 at 9:54 PM, Stephen Frost sfr...@snowman.net wrote: I certainly see your point, but Tom also pointed out that it's not great to ignore failures during this phase: * Tom Lane (t...@sss.pgh.pa.us) wrote: Greg Stark st...@mit.edu writes: What exactly is failing? Is it that fsync is returning -1 ? According to the original report from Christoph Berg, it was open() not fsync() that was failing, at least in permissions-based cases. I'm not sure if we should just uniformly ignore all failures in this phase. That would have the merit of clearly not creating any new startup failure cases compared to the previous code, but as you say sometimes it might mean ignoring real problems. If we accept this, then we still have to have the lists, to decide what to fail on and what to ignore. If we're going to have said lists tho, I don't really see the point in fsync'ing things we're pretty confident aren't ours. No, that's not right at all. The idea, at least as I understand it, would be to decide which errors to ignore by looking at the error code, not by looking at which file is involved. OK, I'm late to the party. But why exactly are we syncing absolutely everything? That seems over-broad. And might it be better to check that we can open each file using access() than calling open() and looking at the error code? cheers andrew
Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
On 26/05/2015 16:01, Alvaro Herrera wrote: Paul Smith wrote: With PostgreSQL 9.3.5 on Ubuntu 12.04, I'm getting the error: ERROR: MultiXactId 1934308693 has not been created yet -- apparent wraparound on doing various queries on our database. I don't think it is a wraparound - I think the tuple has mistakenly decided it has a MultiXactId related to it. Yeah, that looks like the case. According to your pg_controldata output, you haven't used many multixacts at all: Yep, that's what I thought as well. so it doesn't seem plausible that the single bit HEAP_XMAX_IS_MULTI was turned on accidentally (something which I've never seen happen). It doesn't look like a randomly corrupt page either; normally you would see errors about mismatching page headers before you get to the point where Xmax is read. I wonder if the data page came from elsewhere. Maybe you copied a data file from another database? No, nothing like that. It was just running fine, and then suddenly (at 2am on 23 May) it started throwing up loads of these errors. The DB server wasn't even restarted at that point. It was just working fine, then suddenly wasn't. (The first error was at 02:00:32 BST, then every few minutes after that there's another one). It's running in a Hyper-V guest. We had taken a backup of the VM at 00:34 on 23 May and that looks to be absolutely fine. What I have done now is restore that backup and import the new data which arrived since that backup was made, and it seems OK now. I still have the 'broken' installation in case more information is needed from it. I'd try to get a raw dump of the damaged tuple data if I knew how to find where it is in the relation file... I suppose it's possible that it was disk or memory corruption, but I've seen that before, and it hasn't looked like this. (There are several PostgreSQL guests running on the same Hyper-V server, and none of the others seem to have this problem which may suggest that it's less likely to be a hardware issue). 
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On 05/26/2015 11:58 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: OK, I'm late to the party. But why exactly are we syncing absolutely everything? That seems over-broad. If we try to be selective, we risk errors of omission, which no one would ever notice until someone's data got eaten in a low-probability crash scenario. It seems more robust (at least to me) to fsync everything we can find. That does require more thought about error cases than went into the original patch ... but I think that we need more thought about error cases even if we do try to be selective. One thing perhaps we *should* be selective about, though, is which symlinks we try to follow. I think that a good case could be made for ignoring symlinks everywhere except in the pg_tablespace directory. If we did, that would all by itself take care of the Debian scenario, if I understand that case correctly. People have symlinked the xlog directory. I've done it myself in the past. A better rule might be to ignore symlinks unless either they are in pg_tblspc or they are in the data directory and their name starts with pg_. cheers andrew
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
Andrew Dunstan and...@dunslane.net writes: On 05/26/2015 11:58 AM, Tom Lane wrote: One thing perhaps we *should* be selective about, though, is which symlinks we try to follow. I think that a good case could be made for ignoring symlinks everywhere except in the pg_tablespace directory. If we did, that would all by itself take care of the Debian scenario, if I understand that case correctly. People have symlinked the xlog directory. Good point, we need to cover that case. regards, tom lane
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
Andrew Dunstan and...@dunslane.net writes: OK, I'm late to the party. But why exactly are we syncing absolutely everything? That seems over-broad. If we try to be selective, we risk errors of omission, which no one would ever notice until someone's data got eaten in a low-probability crash scenario. It seems more robust (at least to me) to fsync everything we can find. That does require more thought about error cases than went into the original patch ... but I think that we need more thought about error cases even if we do try to be selective. One thing perhaps we *should* be selective about, though, is which symlinks we try to follow. I think that a good case could be made for ignoring symlinks everywhere except in the pg_tablespace directory. If we did, that would all by itself take care of the Debian scenario, if I understand that case correctly. And might it be better to check that we can open each file using access() than calling open() and looking at the error code? Don't really see the point; that's just an extra step, and access() won't exactly prove you can open the file, anyway. regards, tom lane
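The symlink rule being converged on in this subthread (ignore symlinks everywhere except the tablespace entries in pg_tblspc, plus a possibly-symlinked top-level pg_xlog) amounts to a constrained directory walk. A Python sketch of that policy only, not the backend's actual walkdir() code; the pg_xlog special case follows Andrew's suggestion rather than any committed behavior:

```python
import os

def walk_files(dirpath, link_ok):
    """Yield regular files under dirpath, following a symlink only
    when link_ok(parent_dir, entry_name) approves it."""
    for name in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, name)
        if os.path.islink(path) and not link_ok(dirpath, name):
            continue  # stray admin-created symlink: leave it alone
        if os.path.isdir(path):
            yield from walk_files(path, link_ok)
        elif os.path.isfile(path):
            yield path

def files_to_fsync(pgdata):
    """List every file the fsync pass would touch under this policy."""
    pgdata = os.path.abspath(pgdata)
    def link_ok(parent, name):
        # PostgreSQL itself only creates symlinks as tablespace entries
        # in pg_tblspc; admins also commonly symlink pg_xlog itself.
        return (parent == os.path.join(pgdata, "pg_tblspc")
                or (parent == pgdata and name == "pg_xlog"))
    return list(walk_files(pgdata, link_ok))
```

Under this rule, a root-owned certificate symlinked into $PGDATA (the Debian scenario) is simply never visited, while tablespaces and a relocated xlog directory still get synced.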
[HACKERS] why does txid_current() assign new transaction-id?
Hi, hackers! I have a question about txid_current(): why does txid_current() assign a new transaction-id? When we execute txid_current() outside of a transaction block, it assigns a new transaction-id. I guess it doesn't need to assign a new txid because txid_current() is just a read-only function. Walking through the code, I found a possible replacement: GetStableLatestTransactionId(void). I attached a patch which changes just one line. Could you please check the code? Regards, Naoya --- Naoya Anzai Engineering Department NEC Solution Innovators, Ltd. E-Mail: nao-an...@xc.jp.nec.com ---
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: Peter Geoghegan p...@heroku.com writes: I meant to get around to looking into it, but FWIW I see BRIN-related Valgrind issues. e.g.: Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact so it does not explain whatever is happening on chipmunk. Ah, thanks for diagnosing that. The chipmunk failure is strange -- notice it only references the = operators, except for type box for which it's ~= that fails. The test includes a lot of operators ... Also, we have quite a number of ARM boxes: apart from chipmunk we have gull, hamster, mereswine, dangomushi, axolotl, grison. (hamster and chipmunk report hostname -m as armv6l, the others armv7l). All of them are running Linux, either Fedora or Debian. Most are using gcc, compilation flags look pretty standard. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Why does txid_current() assign new transaction-id?
Naoya Anzai nao-an...@xc.jp.nec.com writes: I have a question about txid_current(). it is Why does txid_current() assign new transaction-id?. Consider begin; select txid_current(); insert into my_table ...; commit; If we changed the code as you propose, the result of the SELECT would no longer have anything to do with the XID used for the insertion. regards, tom lane
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
Robert Haas robertmh...@gmail.com writes: Anything we do short of making all errors in this area non-fatal is going to leave behind startup-failure cases that exist today, and we have no evidence at this time that such startup failures would be justified by any actual data loss risk. Yeah. Perhaps I missed it, but was the original patch motivated by actual failures that had been seen in the field, or was it just a hypothetical concern? Certainly, any actual failures of that sort are few and far between compared to the number of problems we now realize the patch introduced. Also, we need to discuss how hard walkdir() needs to try to avoid throwing errors of its own. regards, tom lane
Re: [HACKERS] Run pgindent now?
Robert Haas wrote: But every time we pgindent, especially with slightly different settings, we cause tools like 'git blame' to return less useful answers. And that sucks. I've wondered a few times whether there's a way to make pgindent commits transparent to git blame, i.e. blame their modified lines to whatever commits modified them immediately before. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
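For what it's worth, git later grew exactly this feature: git blame --ignore-rev, and the blame.ignoreRevsFile setting for a checked-in list of reindent commits (git 2.23 and later). A small self-contained demo, driving git from Python; it assumes a sufficiently new git on PATH, and the repository contents are invented for the example.

```python
import os
import subprocess
import tempfile

def git(repo, *args, author="alice"):
    """Run git in repo with a fixed identity (demo helper)."""
    cmd = ["git", "-c", "user.name=%s" % author,
           "-c", "user.email=%s@example.com" % author] + list(args)
    return subprocess.run(cmd, cwd=repo, check=True,
                          capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
git(repo, "init", "-q")
path = os.path.join(repo, "f.c")
with open(path, "w") as f:
    f.write("int  x = 1;\n")      # original line, committed by alice
git(repo, "add", "f.c")
git(repo, "commit", "-q", "-m", "real change")

with open(path, "w") as f:
    f.write("int x = 1;\n")       # whitespace-only "pgindent" edit
git(repo, "add", "f.c")
git(repo, "commit", "-q", "-m", "pgindent run", author="pgindent")
indent_rev = git(repo, "rev-parse", "HEAD").strip()

plain = git(repo, "blame", "f.c")                            # blames pgindent
fixed = git(repo, "blame", "--ignore-rev", indent_rev, "f.c")  # blames alice
print(plain)
print(fixed)
```

With blame.ignoreRevsFile pointing at a committed list of pgindent revisions, every developer's git blame skips the reindent commits automatically, which is precisely the "transparent" behavior wished for here.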
[HACKERS] jsonb_set
Here is the latest version of the refinement of jsonb_replace into jsonb_set. All known bugs have been fixed, and the only issue is the default value of the fourth parameter. Currently it's set to false, but I gather from the previous responses that the consensus is to make it true. cheers andrew diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 89a609f..a33f03d 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10323,6 +10323,15 @@ table2-mapping /tgroup /table + note + para +The literal||/ operator concatenates the elements at the top level of +each of its operands. It does not operate recursively. For example, if +both operands are objects with a common key field name, the value of the +field in the result will just be the value from the right hand operand. + /para + /note + para xref linkend=functions-json-creation-table shows the functions that are available for creating typejson/type and typejsonb/type values. @@ -10830,17 +10839,24 @@ table2-mapping entryliteral[{f1:1},2,null,3]/literal/entry /row row - entryparaliteraljsonb_replace(target jsonb, path text[], replacement jsonb)/literal + entryparaliteraljsonb_set(target jsonb, path text[], new_value jsonboptional, parametercreate_missing/parameter typeboolean/type/optional)/literal /para/entry entryparatypejsonb/type/para/entry entry Returns replaceabletarget/replaceable - with the section designated by replaceablepath/replaceable - replaced by replaceablereplacement/replaceable. - /entry - entryliteraljsonb_replace('[{f1:1,f2:null},2,null,3]', '{0,f1}','[2,3,4]')/literal/entry - entryliteral[{f1:[2,3,4],f2:null},2,null,3]/literal -/entry + with the section designated by replaceablepath/replaceable + replaced by replaceablenew_value/replaceable, or with + replaceablenew_value/replaceable added if + replaceablecreate_missing/replaceable is true ( default is + literalfalse/) and the item + designated by replaceablepath/replaceable does not exist. 
+ /entry + entryparaliteraljsonb_set('[{f1:1,f2:null},2,null,3]', '{0,f1}','[2,3,4]')/literal + /paraparaliteraljsonb_set('[{f1:1,f2:null},2]', '{0,f3}','[2,3,4]', true)/literal + /para/entry + entryparaliteral[{f1:[2,3,4],f2:null},2,null,3]/literal + /paraparaliteral[{f1: 1, f2: null, f3: [2, 3, 4]}, 2]/literal +/para/entry /row row entryparaliteraljsonb_pretty(from_json jsonb)/literal @@ -10893,6 +10909,27 @@ table2-mapping note para + All the items of the literalpath/ parameter of literaljsonb_set/ + must be present in the literaltarget/, except when + literalcreate_missing/ is true, in which case all but the last item + must be present. If these conditions are not met the literaltarget/ + is returned unchanged. +/para +para + If the last path item is an object key, it will be created if it + is absent and given the new value. If the last path item is an array + index, if it is positive the item to set is found by counting from + the left, and if negative by counting from the right - literal-1/ + designates the rightmost element, and so on. + If the item is out of the range -array_length .. array_length -1, + and create_missing is true, the new value is added at the beginning + of the array if the item is negative, and at the end of the array if + it is positive. +/para + /note + + note +para The literaljson_typeof/ function's literalnull/ return value should not be confused with a SQL NULL. 
While calling literaljson_typeof('null'::json)/ will diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 18921c4..3ff1437 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -922,3 +922,10 @@ RETURNS interval LANGUAGE INTERNAL STRICT IMMUTABLE AS 'make_interval'; + +CREATE OR REPLACE FUNCTION + jsonb_set(jsonb_in jsonb, path text[] , replacement jsonb, create_if_missing boolean DEFAULT false) +RETURNS jsonb +LANGUAGE INTERNAL +STRICT IMMUTABLE +AS 'jsonb_set'; diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c index 5143493..e41c869 100644 --- a/src/backend/utils/adt/jsonfuncs.c +++ b/src/backend/utils/adt/jsonfuncs.c @@ -124,18 +124,20 @@ static JsonbValue *findJsonbValueFromContainerLen(JsonbContainer *container, char *key, uint32 keylen); -/* functions supporting jsonb_delete, jsonb_replace and jsonb_concat */ +/* functions supporting jsonb_delete, jsonb_set and jsonb_concat */ static JsonbValue *IteratorConcat(JsonbIterator **it1, JsonbIterator **it2, JsonbParseState **state); -static JsonbValue *replacePath(JsonbIterator **it,
Re: [HACKERS] optimizing vacuum truncation scans
On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby jim.na...@bluetreble.com wrote: On 4/20/15 1:50 AM, Jeff Janes wrote: For that matter, why do we scan backwards anyway? The comments don't explain it, and we have nonempty_pages as a starting point, so why don't we just scan forward? I suspect that eons ago we didn't have that and just blindly reverse-scanned until we finally hit a non-empty buffer... nonempty_pages is not concurrency safe, as the pages could become used after vacuum passed them over but before the access exclusive lock was grabbed before the truncation scan. But maybe the combination of the two? If it is above nonempty_pages, then anyone who wrote into the page after vacuum passed it must have cleared the VM bit. And currently I think no one but vacuum ever sets VM bit back on, so once cleared it would stay cleared. Right. In any event nonempty_pages could be used to set the guess as to how many pages (if any) might be worth prefetching, as that is not needed for correctness. Yeah, but I think we'd do a LOT better with the VM idea, because we could immediately truncate without scanning anything. Right now all the interlocks to make this work seem to be in place (only vacuum and startup can set visibility map bits, and only one vacuum can be in a table at a time). But as far as I can tell, those assumptions are not baked in and we have pondered loosening them before. For example, letting HOT clean up mark a page as all-visible if it finds it to be such. Now in that specific case it would be OK, as HOT cleanup would not cause a page to become empty (or could it? If an insert on a table with no indexes was rolled back, and HOT clean up found it and cleaned it up, it could conceptually become empty--unless we make special code to prevent it), and so the page would have to be below nonempty_pages. But there may be other cases. And I know other people have mentioned making VACUUM concurrent (although I don't see the value in that myself).
So doing it this way would be hard to beat (scanning a bitmap vs the table itself), but it would also introduce a modularity violation that I am not sure is worth it. Of course this could always be reverted if its requirements became a problem for a more important change (assuming of course that we detected the problem). Cheers, Jeff
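The "scan a bitmap instead of the table" trade-off can be made concrete: given per-page emptiness bits (the kind of summary a visibility-map-style structure could supply), the truncation point is just the position after the last non-empty page. This is an abstract sketch of the idea under discussion, not code from vacuumlazy.c.

```python
def truncation_point(page_is_empty):
    """Return how many pages to keep, given a per-page 'known empty'
    bitmap.  Everything past the last non-empty page is truncatable;
    scanning this bitmap replaces re-reading the table's tail pages
    from disk under the access-exclusive lock."""
    keep = 0
    for pageno, empty in enumerate(page_is_empty):
        if not empty:
            keep = pageno + 1
    return keep
```

The cost is one pass over a bitmap measured in bits per page rather than a backward read of 8 KB pages, which is why it is "hard to beat"; the catch, as noted above, is that correctness rests on only vacuum and startup ever setting the bits.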
Re: [HACKERS] Run pgindent now?
On 05/25/2015 05:34 PM, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Mon, May 25, 2015 at 04:52:38PM -0300, Alvaro Herrera wrote: Something is wrong. See aclchk.c changes. Yes, this is what I was concerned about. aclitem was a typedef in 9.0 and 9.1, and the use of that as a typedef in 9.4 is certainly odd: - aclitem.ai_grantor = grantorId; + aclitem.ai_grantor = grantorId; Yeah. I think we might've gotten rid of that typedef partially in order to fix this. A different strategy we could consider is use HEAD's typedef list even in the back branches. This would in some situations lead to inferior-looking results in the back branches, but that's probably better than inferior results in HEAD. (In any case, we want the same typedef list across all branches. Then anyplace where the results diverge, there must have been non-pgindent code changes, so that back-patching would require manual fixups anyway.) A longer-term fix would be to make pgindent less stupid about this sort of usage, but nobody's yet volunteered to dig into the guts of that code. It looks to me like this says we should use the typedefs for each branch in any pgindent run for that branch, with the list being fetched just before each run, so it reflects any backpatches, bug fixes etc. The buildfarm collects these lists for all live branches now (or at least my animals do) and keeps them up to date. Anything else is likely to lead to false positives with results like that above. cheers andrew
Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
On 05/26/2015 11:47 AM, Alvaro Herrera wrote: Paul Smith wrote: No, nothing like that. It was just running fine, and then suddenly (at 2am on 23 May) it started throwing up loads of these errors. The DB server wasn't even restarted at that point. It was just working fine, then suddenly wasn't. (The first error was at 02:00:32 BST, then every few minutes after that there's another one). Another crazy thought is that the host system messed up and overwrote part of the table with pages from some other guest system. I have never seen a report of such a thing happening. I have. Linux kernel version of the host? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Run pgindent now?
Robert Haas robertmh...@gmail.com writes: Realistically, with merge.conflictstyle = diff3 (why is this not the default?), resolving whitespace conflicts that occur when you try to cherry-pick is typically not very difficult. Really? The problems I have generally come from places where pgindent has changed the line breaks, not just horizontal spacing. I haven't seen anything that copes with this, certainly not git. regards, tom lane
Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
On 26/05/2015 19:47, Alvaro Herrera wrote: Paul Smith wrote: No, nothing like that. It was just running fine, and then suddenly (at 2am on 23 May) it started throwing up loads of these errors. The DB server wasn't even restarted at that point. It was just working fine, then suddenly wasn't. (The first error was at 02:00:32 BST, then every few minutes after that there's another one). Another crazy thought is that the host system messed up and overwrote part of the table with pages from some other guest system. I have never seen a report of such a thing happening. Yes, that seems unlikely to me, but you never know... While I was trying to work out how to fix it (after my original email), I did see a pgsql server crash: 2015-05-26 12:26:30 BST LOG: server process (PID 35493) was terminated by signal 11: Segmentation fault It's running in a Hyper-V guest. We had taken a backup of the VM at 00:34 on 23 May and that looks to be absolutely fine. What I have done now is restore that backup and import the new data which arrived since that backup was made, and it seems OK now. I still have the 'broken' installation in case more information is needed from it. I'd try to get a raw dump of the damaged tuple data if I knew how to find where it is in the relation file... Well, you could try a binary search to locate the (or a) borked page within the table. Or you could write a plpgsql function with an EXCEPTION block that reads each tuple, and in case of error reports the CTID of the tuple being read. (select * from tab where ctid=generated_ctid; EXCEPTION WHEN OTHERS THEN raise notice ' .. ', generated_ctid).
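The EXCEPTION-block approach sketched above could look something like this in PL/pgSQL (a hedged sketch, not a tested recovery tool: the function name is made up, 291 is MaxHeapTuplesPerPage for 8 kB pages, and scanning every possible line pointer will be slow on a large table):

```sql
-- Report the CTID of every tuple whose read raises an error.
CREATE OR REPLACE FUNCTION find_broken_ctids(tab regclass)
RETURNS SETOF tid LANGUAGE plpgsql AS $$
DECLARE
    npages bigint;
    blk    bigint;
    off    int;
    c      tid;
BEGIN
    SELECT pg_relation_size(tab) / current_setting('block_size')::bigint
      INTO npages;
    FOR blk IN 0 .. npages - 1 LOOP
        FOR off IN 1 .. 291 LOOP  -- MaxHeapTuplesPerPage on 8 kB pages
            c := format('(%s,%s)', blk, off)::tid;
            BEGIN
                -- deforming the tuple is what trips over corrupted headers
                EXECUTE format('SELECT * FROM %s WHERE ctid = $1', tab)
                    USING c;
            EXCEPTION WHEN OTHERS THEN
                RETURN NEXT c;
            END;
        END LOOP;
    END LOOP;
END;
$$;
```

Once a broken CTID is known, its page number can be fed straight into get_raw_page()/heap_page_items(), as done below.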
In the backup, for the row which will become broken, I get: ctid| xmin | xmax +--+-- (18722,29) | 23862661 | 23862661 Assuming it hadn't changed in the broken version (from application knowledge, it's unlikely to change), then on the broken version, select heap_page_items(get_raw_page('messages.msgdata', 18722)); gives: (1,2,2,0,) (2,7816,1,376,21857691,21857691,11,(18722,2),32784,10646,24,,) (3,4,2,0,) (4,7568,1,248,21858126,21858126,11,(18722,4),32784,10646,24,,) (5,6,2,0,) (6,7216,1,352,22823803,23871405,41,(18722,30),49168,9478,24,,) (7,8,2,0,) (8,6968,1,248,22825206,22825206,11,(18722,8),32784,10646,24,,) (9,10,2,0,) (10,6720,1,248,22826427,22826427,11,(18722,10),32784,10646,24,,) (11,12,2,0,) (12,6336,1,380,22826899,23935627,1,(18722,31),49168,9478,24,,) (13,0,3,0,) (14,15,2,0,) (15,5400,1,932,23862314,23862314,11,(18722,15),32784,10642,24,,) (16,17,2,0,) (17,5168,1,228,23862333,23862333,11,(18722,17),32784,10646,24,,) (18,19,2,0,) (19,4800,1,368,23862365,23862365,11,(18722,19),32784,10646,24,,) (20,24,2,0,) (21,3112,1,812,1332175474,894779760,223823413,(1443526721,25142),12141,18516,57,,) (22,23,2,0,) (23,4448,1,352,728903474,1466642284,1869042033,(1262568257,16743),16751,26455,65,011010101101001010101010001001101010110100101010100010100111011011001100100010110110111011000010101100101011001011011000110010110010110011001011,) (24,4184,1,264,619804,228,8000,(956969542,26694),22857,29251,98,110001101110011001100110011111001010001000101110010011010010100101100010110010010110011000100110101011100100010101100010111011000110101000101101011111000101011001001100100111001101001010011010011001101100011010001010001101101001011001101000101000110110011010101100001010100110010010100111011000101110010010100101101011010110101000101100101100101000101011111010111000101100110001110010010100101010010100100100101001101100111010110101111001101100101000110110101101101001101001100110,) (25,26,2,0,) 
(26,3928,1,252,1212837441,1232350037,1668772674,(728910135,11093),31284,16687,70,01101010001010001010010011001100110010101101001001110111011011100110011001101011011010011100100001001010100011000101001000101110110101000110101100101011011010110010001100100010101011101100111001001100110010110101101101100100101101101110101010100010011011100100101010110110001100100101011001001110001101101100011001100110,1717791858) (27,21,2,0,) (28,29,2,0,) (29,2296,1,812,1229271346,1934308693,1733443957,(1848337707,21358),31337,21592,88,,1178686785) (30,1944,1,352,947155032,1380218998,792031842,(1211650927,22579),30585,20532,80,,) (31,1560,1,380,23935627,23935627,1,(18722,31),40976,9510,24,,) (31 rows) The data for index 29 has 1934308693 in the header (which is the multixactid reported in the error message when trying to retrieve that row). I'm struggling to find the
definitions for the data returned by heap_page_items.
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On 2015-05-26 19:07:20 +0200, Andres Freund wrote: It is somewhat interesting that similar code has been used in pg_upgrade, via initdb -S, for a while now, without, to my knowledge, it causing reported problems. I think the relevant difference is that that code doesn't follow symlinks. It's obviously also less exercised, and people might just have fixed up permissions when encountering troubles. Abhijit, do you recall why the code was changed to follow all symlinks, in contrast to explicitly going through the tablespaces as initdb -S does? I'm pretty sure early versions of the patch pretty much had a verbatim copy of the initdb logic? That logic is missing pg_xlog btw, which is bad for pg_upgrade. So, this was discussed in the following thread, starting at: http://archives.postgresql.org/message-id/20150403163232.GA28444%40eldon.alvh.no-ip.org Actually, since surely we must follow symlinks everywhere, why do we have to do this separately for pg_tblspc? Shouldn't that link-following occur automatically when walking PGDATA in the first place? I don't think it's true that we must follow symlinks everywhere. I think, as argued upthread, that it's sufficient to recurse through PGDATA, follow the symlinks in pg_tblspc, and, if it is a symlink, also go through pg_xlog separately. There are no other places where it's allowed to introduce symlinks, and we have refuted bug reports from people who had problems after doing that. So what I propose is: 1) Remove the automatic symlink following 2) Follow pg_tblspc/*, and pg_xlog if it's a symlink; fix the latter in initdb -S 3) Add an elevel argument to walkdir(), return if AllocateDir() fails, continue for stat() failures in the readdir() loop. 4) Add elevel argument to pre_sync_fname, fsync_fname, return after errors. 5) Accept EACCES, ETXTBSY (if defined) when open()ing the files.
By virtue of not following symlinks we should not need to worry about EROFS. I'm inclined to think that 4) is a big enough compat break that a fsync_fname_ext with the new argument is a good idea. Arguments for/against?
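The shape of the five-point proposal can be sketched in Python (a hedged illustration only, not the actual C implementation: the real code lives in walkdir()/fsync_fname in the backend and in initdb, and the names and error handling here are illustrative assumptions):

```python
import os

def fsync_one(path):
    """Open and fsync one file or directory, tolerating the failure
    modes named in points 3-5: permission errors (EACCES) and entries
    that vanish mid-walk are skipped rather than treated as fatal."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except (PermissionError, FileNotFoundError):
        return
    try:
        os.fsync(fd)
    except OSError:
        pass  # e.g. fsync on a directory fd can fail on some platforms
    finally:
        os.close(fd)

def fsync_pgdata(pgdata):
    """Point 1: recurse through PGDATA without automatically following
    symlinks.  Point 2: then explicitly follow the tablespace links
    under pg_tblspc (pg_xlog-as-symlink omitted for brevity)."""
    for root, dirs, files in os.walk(pgdata, followlinks=False):
        for name in files:
            fsync_one(os.path.join(root, name))
        fsync_one(root)  # sync the directory entry itself too
    tblspc = os.path.join(pgdata, "pg_tblspc")
    if os.path.isdir(tblspc):
        for link in os.listdir(tblspc):
            target = os.path.realpath(os.path.join(tblspc, link))
            if os.path.isdir(target):
                fsync_pgdata(target)  # recurse into the tablespace
```

The key property is that a root-owned, unreadable file reached through a stray symlink no longer aborts recovery; it is simply skipped.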
Re: [HACKERS] Run pgindent now?
On Tue, May 26, 2015 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Realistically, with merge.conflictstyle = diff3 (why is this not the default?), resolving whitespace conflicts that occur when you try to cherry-pick is typically not very difficult. Really? The problems I have generally come from places where pgindent has changed the line breaks, not just horizontal spacing. I haven't seen anything that copes with this, certainly not git. If pgindent were easy to run, committers could start complaining if patch submissions don't abide by PG coding style conventions. Part of submitting a patch would be making sure that a pgindent run after the patch has been applied is still a no-op... A reviewer could easily check it, and a committer could easily squash the pgindent run result in if they wanted to be nice to a first-time submitter... If every patch were pgindent-clean, then you would never end up with these huge pgindent commits causing pain... a.
Re: [HACKERS] Run pgindent now?
On 2015-05-26 16:32:42 -0300, Alvaro Herrera wrote: I've wondered a few times whether there's a way to make pgindent commits transparent to git blame, i.e. blame their modified lines to whatever commits modified them immediately before. You can make blame ignore whitespace changes with -w -- but that obviously doesn't help with rewrapped lines and such.
Re: [HACKERS] hstore_plpython regression test does not work on Python 3
On 22.05.2015, 09:44, Christian Ullrich wrote: * Peter Eisentraut wrote: On 5/16/15 12:06 PM, Tom Lane wrote: As exhibited for instance here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2015-05-16%2011%3A00%3A07 I've been able to replicate this on a Fedora 21 box: works fine with Python 2, fails with Python 3. Seems like we still have an issue with reliance on a system-provided sort method. Pushed a fix, tested with 2.3 .. 3.4. There is still a sorting problem (of sorts). jaguarundi [1] keeps failing intermittently like this: *** 47,53 return len(val) $$; SELECT test1arr(array['aa=>bb, cc=>NULL'::hstore, 'dd=>ee']); ! INFO: [{'aa': 'bb', 'cc': None}, {'dd': 'ee'}] CONTEXT: PL/Python function test1arr test1arr -- --- 47,53 return len(val) $$; SELECT test1arr(array['aa=>bb, cc=>NULL'::hstore, 'dd=>ee']); ! INFO: [{'cc': None, 'aa': 'bb'}, {'dd': 'ee'}] CONTEXT: PL/Python function test1arr test1arr -- I cannot find any other animal that does the same, but I doubt it's due to CCA this time. Should dict tests perhaps output sorted(thedict.items()) instead? Testing dict formatting could be done with single-item dicts. [1] http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=jaguarundi&br=HEAD Looks like that animal uses Python 3.4. Python 3.3 and newer versions default to using a random seed for hashing objects into dicts, which makes the order of dict elements random; see https://docs.python.org/3/using/cmdline.html#cmdoption-R The test case could be changed to use sorted(dict.items()) always, but there are multiple places where it would need to be applied. Setting the PYTHONHASHSEED environment variable to a stable value would probably be easier. / Oskari
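The fix Oskari describes is easy to see in isolation (keys and values below are taken from the failing regression test; everything else is illustrative):

```python
# On CPython 3.3-3.6, string hashing is randomized per process
# (PYTHONHASHSEED), so the iteration order of a dict such as the
# converted hstore value is not stable across runs; a regression test
# that compares repr(val) textually can therefore fail intermittently.
val = {'aa': 'bb', 'cc': None}

# Sorting the items yields the same text on every run and every seed:
stable = sorted(val.items())
print(stable)   # [('aa', 'bb'), ('cc', None)]
```

The alternative mentioned above, pinning PYTHONHASHSEED in the test environment, stabilizes repr() output without touching the test cases themselves.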
Re: [HACKERS] Redesigning checkpoint_segments
On Thu, May 21, 2015 at 8:40 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, May 21, 2015 at 3:53 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Mar 16, 2015 at 11:05 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Feb 23, 2015 at 8:56 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Everyone seems to be happy with the names and behaviour of the GUCs, so committed. The docs suggest that max_wal_size will be respected during archive recovery (causing restartpoints and recycling), but I'm not seeing that happening. Is this a doc bug or an implementation bug? I think the old behavior, where restartpoints were driven only by time and not by volume, was a misfeature. But not a bug, because it was documented. One of the points of max_wal_size and its predecessor is to limit how big pg_xlog can grow. But running out of disk space on pg_xlog is no more fun during archive recovery than it is during normal operations. So why shouldn't max_wal_size be active during recovery? The following message of commit 7181530 explains why. In standby mode, respect checkpoint_segments in addition to checkpoint_timeout to trigger restartpoints. We used to deliberately only do time-based restartpoints, because if checkpoint_segments is small we would spend time doing restartpoints more often than really necessary. But now that restartpoints are done in bgwriter, they're not as disruptive as they used to be. Secondly, because streaming replication stores the streamed WAL files in pg_xlog, we want to clean it up more often to avoid running out of disk space when checkpoint_timeout is large and checkpoint_segments small. Previously users were more likely to fall into this trouble (i.e., too frequent occurrence of restartpoints) because the default value of checkpoint_segments was very small, I guess. But we increased the default of max_wal_size, so now the risk of that trouble seems to be smaller than before, and maybe we can allow max_wal_size to trigger restartpoints. I see. 
The old behavior was present for the same reason we decided to split checkpoint_segments into max_wal_size and min_wal_size. That is, the default checkpoint_segments was small, and it had to be small because increasing it would cause more space to be used even when that extra space was not helpful. So perhaps we can consider this change a completion of the max_wal_size work, rather than a new feature? Cheers, Jeff
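For reference, a sketch of the settings under discussion (the values shown are the 9.5 defaults; whether max_wal_size should also drive restartpoints during archive recovery is exactly the open question above):

```
# postgresql.conf
max_wal_size = 1GB          # soft cap on pg_xlog; triggers checkpoints by volume
min_wal_size = 80MB         # floor down to which recycled WAL is kept
checkpoint_timeout = 5min   # time-based trigger; also drives restartpoints
```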
Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
Paul Smith wrote: In the backup, for the row which will become broken, I get: ctid| xmin | xmax +--+-- (18722,29) | 23862661 | 23862661 Okay. Assuming it hadn't changed in the broken version (from application knowledge, it's unlikely to change), then on the broken version, select heap_page_items(get_raw_page('messages.msgdata', 18722)); gives: (21,3112,1,812,1332175474,894779760,223823413,(1443526721,25142),12141,18516,57,,) (23,4448,1,352,728903474,1466642284,1869042033,(1262568257,16743),16751,26455,65,011010101101001010101010001001101010110100101010100010100111011011001100100010110110111011000010101100101011001011011000110010110010110011001011,) (24,4184,1,264,619804,228,8000,(956969542,26694),22857,29251,98,110001101110011001100110011111001010001000101110010011010010100101100010110010010110011000100110101011100100010101100010111011000110101000101101011111000101011001001100100111001101001010011010011001101100011010001010001101101001011001101000101000110110011010101100001010100110010010100111011000101110010010100101101011010110101000101100101100101000101011111010111000101100110001110010010100101010010100100100101001101100111010110101111001101100101000110110101101101001101001100110,) (26,3928,1,252,1212837441,1232350037,1668772674,(728910135,11093),31284,16687,70,01101010001010001010010011001100110010101101001001110111011011100110011001101011011010011100100001001010100011000101001000101110110101000110101100101011011010110010001100100010101011101100111001001100110010110101101101100100101101101110101010100010011011100100101010110110001100100101011001001110001101101100011001100110,1717791858) (29,2296,1,812,1229271346,1934308693,1733443957,(1848337707,21358),31337,21592,88,,1178686785) (30,1944,1,352,947155032,1380218998,792031842,(1211650927,22579),30585,20532,80,,) (31,1560,1,380,23935627,23935627,1,(18722,31),40976,9510,24,,) (31 rows) The data for index 29 has 1934308693 in the header (which is the multixactid reported in the error message when trying to retrieve 
that row). Right. The ctids on items 21, 23, 24, 26, 29 and 30 are pretty obviously corrupted (page numbers are too large, and offset numbers are way above normal offset numbers), as is probably everything else about them. Note xmin should be a valid xid, but for example on item 21 it's about 1.3 billion, which obviously is not a valid xid on your system. Items 26 and 29 even have an OID, which the others do not (indicating a wrongly set HEAP_HASOID flag). Note the lp_off fields. It seems that all valid items have offsets above 4096, and corrupted items have offsets below that. The theory that somehow the system wrote a bad filesystem page (4096 bytes) on the lower half of the Postgres page (8192 bytes) sounds rather more probable now. I'm struggling to find the definitions for the data returned by heap_page_items, Try \df+ heap_page_items; or more easily, use this query instead: select * from heap_page_items(get_raw_page('messages.msgdata', 18722)); gives: If I look at the raw page data, it contains data relating to this installation, so it doesn't look like a page from a different Hyper-V guest. It also doesn't look like just random data from a memory error. It actually looks like legitimate data which should possibly have been written elsewhere. (I don't want to post it here because it contains possibly confidential data from one of our customers). Sure. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
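The flag analysis above can be checked mechanically. Here is a hedged Python sketch that decodes the t_infomask column of heap_page_items() into flag names (the bit values are transcribed from src/include/access/htup_details.h as of 9.3; treat them as an assumption and verify against your source tree):

```python
# t_infomask bits, per 9.3-era src/include/access/htup_details.h
INFOMASK_FLAGS = {
    0x0001: "HEAP_HASNULL",
    0x0002: "HEAP_HASVARWIDTH",
    0x0004: "HEAP_HASEXTERNAL",
    0x0008: "HEAP_HASOID",
    0x0010: "HEAP_XMAX_KEYSHR_LOCK",
    0x0020: "HEAP_COMBOCID",
    0x0040: "HEAP_XMAX_EXCL_LOCK",
    0x0080: "HEAP_XMAX_LOCK_ONLY",
    0x0100: "HEAP_XMIN_COMMITTED",
    0x0200: "HEAP_XMIN_INVALID",
    0x0400: "HEAP_XMAX_COMMITTED",
    0x0800: "HEAP_XMAX_INVALID",
    0x1000: "HEAP_XMAX_IS_MULTI",
    0x2000: "HEAP_UPDATED",
    0x4000: "HEAP_MOVED_OFF",
    0x8000: "HEAP_MOVED_IN",
}

def decode_infomask(mask):
    """Return the names of all flag bits set in a t_infomask value."""
    return [name for bit, name in sorted(INFOMASK_FLAGS.items()) if mask & bit]

# Item 29's t_infomask from the dump above:
print(decode_infomask(21592))
```

For item 29's infomask of 21592 this yields HEAP_HASOID, HEAP_XMAX_KEYSHR_LOCK, HEAP_XMAX_EXCL_LOCK, HEAP_XMAX_COMMITTED, HEAP_XMAX_IS_MULTI and HEAP_MOVED_OFF, a nonsensical combination, consistent with the diagnosis that the line pointer is garbage: HEAP_XMAX_IS_MULTI explains why 1934308693 was read as a MultiXactId at all.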
Re: [HACKERS] unaddressable bytes in BRIN
Alvaro Herrera wrote: Andres Freund just forwarded me a valgrind error report that Peter Geoghegan noticed: ==29892== Unaddressable byte(s) found during client check request ==29892== at 0x7D1317: PageAddItem (bufpage.c:314) Fixed by Tom, http://git.postgresql.org/pg/commitdiff/79f2b5d583e2e2a7ccd13e31d0e20a900c8f2f61 -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Order of columns in query is important?!
CK Tan ck...@vitessedata.com writes: For Vitesse X, we mark all columns that were required in the query during JIT compile, and deform it in one shot. PG should be able to do the same. See ExecProject(). regards, tom lane
Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
Paul Smith wrote: With PostgreSQL 9.3.5 on Ubuntu 12.04, I'm getting the error: ERROR: MultiXactId 1934308693 has not been created yet -- apparent wraparound on doing various queries on our database. I don't think it is a wraparound - I think the tuple has mistakenly decided it has a MultiXactId related to it. Yeah, that looks like the case. According to your pg_controldata output, you haven't used many multixacts at all: Latest checkpoint's NextMultiXactId: 216 Latest checkpoint's NextMultiOffset: 439 and the fact that you only have these files: The pg_multixact directory contains two files members/ offsets/ confirms this. Your XID counter is also rather low, only 24 million: Latest checkpoint's NextXID: 0/24005839 so it doesn't seem plausible that the single bit HEAP_XMAX_IS_MULTI was turned on accidentally (something which I've never seen happen). It doesn't look like a randomly corrupt page either; normally you would see errors about mismatching page headers before you get to the point where Xmax is read. I wonder if the data page came from elsewhere. Maybe you copied a data file from another database? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] [COMMITTERS] pgsql: Row-Level Security Policies (RLS)
Stephen Frost wrote: Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: What do we need RowSecurityPolicy->policy_id for? It seems to me that it is only used to determine whether the policy is the default deny one, so that it can later be removed if a hook adds a different one. This seems contrived as well as under-documented. Why isn't a boolean flag sufficient? Thanks for taking a look! It's also used during relcache updates (see equalPolicy()). Hmm, but the policy name is unique also, right? So the policy_id check is redundant ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
[HACKERS] Run a test instance from the source directory
Hi, I'd like to edit, compile, run, profile and debug in a loop. To do that I put pg_sleep(3600) in the regression test suite. That way I can easily change a file, 'make check', and then run some SQL from another terminal. Is there a 'better' way to run postgres after building? In a way that doesn't break 'make check'? Thanks, Regards, Arjen
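One common approach is to install the build into a throwaway prefix and run a private cluster on a non-default port and socket directory; 'make check' spins up its own temporary instance on its own port, so the two do not collide. A sketch (paths, port, and flags are illustrative assumptions, and this requires a PostgreSQL source tree):

```
# configure into a scratch prefix so nothing system-wide is touched
./configure --prefix=/tmp/pgdev --enable-debug --enable-cassert
make -j4 && make install

# init and start a private cluster on a non-default port/socket dir
/tmp/pgdev/bin/initdb -D /tmp/pgdata
/tmp/pgdev/bin/pg_ctl -D /tmp/pgdata -l /tmp/pg.log \
    -o "-p 5433 -k /tmp" -w start

# connect from another terminal
/tmp/pgdev/bin/psql -h /tmp -p 5433 postgres
```

After an edit, 'make && make install' plus 'pg_ctl restart' picks up the new binaries without disturbing the regression suite.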
Re: [HACKERS] about lob(idea)
On 25/05/15 at 06:13, alex2010 wrote: Maybe it makes sense to add the ability to store large objects in the same tablespace as the table. Or an opportunity to specify a tablespace for a large object. Do you have anything in todo lists about it? This is something which has popped up on me more than once when giving talks about storing files in PostgreSQL (last PgDay Argentina there was quite a debate about it, particularly when bringing up the bytea vs. LO comparison). The concerns the people raised had different end goals. One of the main concerns was the fact that all LOs live in a common catalog table (pg_largeobject). If the LOs were stored per-database, with a schema similar to pg_largeobject, then they could be placed on any tablespace available, and even get dumped on a normal DB dump, which makes administration much simpler. Cheers, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] a few thoughts on the schedule
On Tue, May 19, 2015 at 10:25:49AM -0400, Robert Haas wrote: Unfortunately, I don't have a lot of good ideas here. I know that I spend as much time reviewing other people's patches as I can manage to find in my schedule, and I know a lot of people would probably like to see me do more of that. I'm sure there are also some people who would like to see me do less of that, and at least a few who would like to see me die in a fire. Ultimately, this is about money. I have a job where I can devote some time to reviewing other people's patches, which is great: many people aren't that lucky. Nobody has offered me a job where I can spend a higher percentage of my time doing that than I spend now. Unless talented reviewers can get such job offers, we are going to continue to have trouble making ends meet. I think this comes down to how many companies care about the health of the community vs. how many care about getting their specific patches committed. In some sense this is the free rider problem: http://en.wikipedia.org/wiki/Free_rider_problem Historically employees who are Postgres community members have been good at telling employers that they cannot be free riders, but there has been some diminishment of that as Postgres has figured more prominently in company success. However, I have also heard that there is increased concern among employers that the free rider problem is causing structural problems in the community, which might lend support to free rider-resisting employees. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On 2015-05-26 10:41:12 -0400, Tom Lane wrote: Yeah. Perhaps I missed it, but was the original patch motivated by actual failures that had been seen in the field, or was it just a hypothetical concern? I'd mentioned that it might be a good idea to do this while investigating a bug with unlogged tables that several parties had reported. That bug has since been fixed in a more granular fashion. From there it took on kind of a life of its own. It is somewhat interesting that similar code has been used in pg_upgrade, via initdb -S, for a while now, without, to my knowledge, it causing reported problems. I think the relevant difference is that that code doesn't follow symlinks. It's obviously also less exercised, and people might just have fixed up permissions when encountering troubles. Abhijit, do you recall why the code was changed to follow all symlinks, in contrast to explicitly going through the tablespaces as initdb -S does? I'm pretty sure early versions of the patch pretty much had a verbatim copy of the initdb logic? That logic is missing pg_xlog btw, which is bad for pg_upgrade. I *can* reproduce corrupted clusters without this without trying too hard. I yesterday wrote a test for the crash testing infrastructure I've on and off worked on (since 2011. Uhm) and I could reproduce a bunch of corrupted clusters without this patch. When randomly triggering crash restarts shortly afterwards followed by a simulated hw restart (stop accepting writes via linux device mapper) while concurrent COPYs are running, I can trigger a bunch of data corruptions. Since then my computer in berlin has done 440 testruns with the patch, and 450 without. I've gotten 7 errors without, 0 with. But the instrumentation for detecting errors is really shitty (pkey lookup for every expected row) and doesn't yet keep meaningful diagnostics. So this isn't particularly bulletproof either way.
I can't tell whether the patch is just masking yet another problem due to different timings caused by the fsync, or whether it's fixing the problem that we can forget to sync WAL segments. Greetings, Andres Freund
Re: [HACKERS] Why does txid_current() assign new transaction-id?
Re: Tom Lane 2015-05-26 14207.1432650...@sss.pgh.pa.us Naoya Anzai nao-an...@xc.jp.nec.com writes: I have a question about txid_current(). It is "Why does txid_current() assign a new transaction-id?". Consider begin; select txid_current(); insert into my_table ...; commit; If we changed the code as you propose, the result of the SELECT would no longer have anything to do with the XID used for the insertion. Still, exposing GetStableLatestTransactionId() on the SQL level would make sense for monitoring transaction throughput. Currently if you do that with txid_current(), you'll generate a (low, but measurable) transaction load of 1/monitoring interval. Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On Tue, May 26, 2015 at 6:16 PM, Andrew Dunstan and...@dunslane.net wrote: On 05/26/2015 11:58 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: OK, I'm late to the party. But why exactly are we syncing absolutely everything? That seems over-broad. If we try to be selective, we risk errors of omission, which no one would ever notice until someone's data got eaten in a low-probability crash scenario. It seems more robust (at least to me) to fsync everything we can find. That does require more thought about error cases than went into the original patch ... but I think that we need more thought about error cases even if we do try to be selective. One thing perhaps we *should* be selective about, though, is which symlinks we try to follow. I think that a good case could be made for ignoring symlinks everywhere except in the pg_tablespace directory. If we did, that would all by itself take care of the Debian scenario, if I understand that case correctly. People have symlinked the xlog directory. I've done it myself in the past. A better rule might be to ignore symlinks unless either they are in pg_tblspc or they are in the data directory and their name starts with pg_. Not just people. initdb will symlink the xlog directory if you use -x. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously
On 2015-05-26 19:07:20 +0200, Andres Freund wrote: Abhijit, do you recall why the code was changed to follow all symlinks in contrast to explicitly going through the tablespaces as initdb -S does? I'm pretty sure early versions of the patch pretty much had a verbatim copy of the initdb logic? Forgot to add Abhijit to CC list, sorry. That [initdb's] logic is missing pg_xlog btw, which is bad for pg_upgrade. On second thought it's probably not actually bad for pg_upgrade's specific usecase. It'll always end with a proper shutdown, so it'll never need that WAL. But it'd be bad if anybody ever relied on initdb -S in a different scenario.
Re: [HACKERS] Why does txid_current() assign new transaction-id?
Christoph Berg m...@debian.org writes: Still, exposing GetStableLatestTransactionId() on the SQL level would make sense for monitoring transaction throughput. Perhaps, though I wonder why we should expose that and not just report the result of ReadNewTransactionId() --- or in txid.c's case, the result of GetNextXidAndEpoch(). In either case it would have to be a new function, not unilaterally redefining what txid_current() does. regards, tom lane