Re: [HACKERS] [COMMITTERS] pgsql: Add pg_audit, an auditing extension

2015-05-26 Thread Stephen Frost
Noah,

* Noah Misch (n...@leadboat.com) wrote:
 On Thu, May 14, 2015 at 01:38:49PM -0400, Tom Lane wrote:
  * The comments in the code betray utter ignorance of how logging actually
  works, in particular this:
  
   * Administrators can choose which log level the audit log is to be logged
   * at.  The default level is LOG, which goes into the server log but does
   * not go to the client.  Set to NOTICE in the regression tests.
  
  All the user has to do is change client_min_messages and he'll see all the
  reports, which means if you think that letting the user see the audit
  reports is a security problem then you have a hole a mile wide.
 
 That indicates the patch's general readiness:

While I agree that the comment was poorly worded and agreed with Tom
that it should be changed, I do not agree that it's indicative of the
patch's readiness, nor do I feel it's an issue that clients can see the
auditing information, provided that it's clearly documented.

I'm planning to commit a number of documentation updates (along with
other updates) to pg_audit to address that.  Using ereport() for query
information isn't anything new - we do that in auto_explain, and we also
report the query in the CONTEXT lines of error messages.  We should (and
I plan to) document that in the appropriate places and note that there
are risks associated with it (e.g. with security definer functions).
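To make Tom's point above concrete: whether a client sees a LOG-level audit record depends only on client_min_messages, which any user can change for their own session. A toy model (Python for illustration; the severity ordering is a simplification, not elog.c's actual table):

```python
# Simplified client-side severity ordering -- an assumption for illustration,
# not PostgreSQL's exact internal ordering.
CLIENT_ORDER = ["debug", "log", "notice", "warning", "error"]

def client_sees(msg_level, client_min_messages):
    """A message reaches the client when its level is at or above the
    client_min_messages threshold."""
    return CLIENT_ORDER.index(msg_level) >= CLIENT_ORDER.index(client_min_messages)

# At the default threshold (NOTICE), LOG-level audit messages stay server-only...
assert not client_sees("log", "notice")
# ...but the user just has to lower the threshold to see every audit record.
assert client_sees("log", "log")
```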

  +   /* These are DDL, unless they are ROLE */
  +   case LOGSTMT_DDL:
  +   className = CLASS_DDL;
  +   class = LOG_DDL;
  +
  +   /* Identify role statements */
  +   switch (stackItem->auditEvent.commandTag)
  +   {
  +   /* We know these are all role statements */
  +   case T_GrantStmt:
  +   case T_GrantRoleStmt:
  +   case T_CreateRoleStmt:
  +   case T_DropRoleStmt:
  +   case T_AlterRoleStmt:
  +   case T_AlterRoleSetStmt:
  +   className = CLASS_ROLE;
  +   class = LOG_ROLE;
  +   break;
 
 Not T_AlterDefaultPrivilegesStmt?

Agreed, that would be more sensible under ROLE than DDL.
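The agreed change amounts to adding one more tag to the role bucket. A sketch of the classification (a Python stand-in for the C switch quoted above; the tag names mirror the patch, the function name is mine):

```python
CLASS_DDL, CLASS_ROLE = "DDL", "ROLE"

# Command tags classified as ROLE rather than DDL, per the switch quoted
# above, plus the tag Noah points out.
ROLE_TAGS = {
    "T_GrantStmt", "T_GrantRoleStmt", "T_CreateRoleStmt", "T_DropRoleStmt",
    "T_AlterRoleStmt", "T_AlterRoleSetStmt",
    "T_AlterDefaultPrivilegesStmt",   # the addition agreed to here
}

def classify_ddl_statement(command_tag):
    """Statements arriving as LOGSTMT_DDL default to DDL unless they are
    role/privilege commands."""
    return CLASS_ROLE if command_tag in ROLE_TAGS else CLASS_DDL
```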

  +static void
  +pg_audit_ProcessUtility_hook(Node *parsetree,
  +                             const char *queryString,
  +                             ProcessUtilityContext context,
  +                             ParamListInfo params,
  +                             DestReceiver *dest,
  +                             char *completionTag)
  +{
  +   AuditEventStackItem *stackItem = NULL;
  +   int64 stackId = 0;
  +
  +   /*
  +    * Don't audit substatements.  All the substatements we care about should
  +    * be covered by the event triggers.
  +    */
  +   if (context <= PROCESS_UTILITY_QUERY && !IsAbortedTransactionBlockState())
 
 They aren't covered.  A GRANT inside CREATE SCHEMA escapes auditing:

Actually, they are covered.  David and I discussed this extensively,
prior to your review, and concluded that this approach works because the
items not under the EventTrigger charter are shared catalogs, updates to
which wouldn't ever make sense to happen under a CREATE SCHEMA.  I do
hope that we are able to improve on EventTriggers by supporting them for
shared catalogs one day, but that is a discussion for another thread.

The issue which you discovered here is that GRANTs were categorized
under CLASS_ROLE, but we have a check at the top of the DDL event
trigger which does an early-exit if DDL isn't selected for inclusion.
That clearly needs to be changed, so that the GRANT under CREATE SCHEMA
is caught and logged properly.

We put a great deal of thought into any and all places where we filter
data to do our best to prevent this from happening, taking steps beyond
what a simple module would do to capture information and make sure that
something is logged, even when we don't have all of the information
available.  That isn't to say there aren't bugs - certainly issues have
been found through the various reviews and comments provided by multiple
individuals now, and I don't pretend that there are no others, but I
don't believe that this module is particularly more bug-ridden than
other contrib modules or even parts of core.

 I'm wary of the ease of forgetting to run CREATE EXTENSION.  One gets much
 auditing from GUCs alone; for example, we audit CREATE TABLE t () with or
 without the extension, but only with the extension do we audit the inner
 CREATE TABLE of CREATE SCHEMA s CREATE TABLE t ().  A user that creates a
 database without creating the extension might look at the audit messages and
 mistakenly think the database is all set.

I agree with this concern, 

Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Andres Freund
On 2015-05-26 20:25:24 -0400, Peter Eisentraut wrote:
 On 5/25/15 7:15 PM, Andres Freund wrote:
  On 2015-05-25 19:01:28 -0400, Bruce Momjian wrote:
  A longer-term fix would be to make pgindent less stupid about this sort
  of usage, but nobody's yet volunteered to dig into the guts of that code.
 
  I assume a typedefs list is going to be a requirement of any decent C
  indenting tool.
  
  Maybe I'm missing something major here, but why? Afaict it's just only
  used for formatting decisions that could be made without it just as well?
 
 AFAICT, the main reason is to decide whether * and & are binary infix or
 unary prefix operators.  Otherwise, it wouldn't know whether to write
 
 char * foo;
 
 or the more customary
 
 char *foo;
 
 Now, running pgindent without a typedefs list also makes it do things like
 
  static int32
 -makepol(QPRS_STATE *state)
 +makepol(QPRS_STATE * state)
 
 which, one might argue, it could figure out without a typedefs list.
 But then the formatting would be inconsistent between prototypes and
 variable declarations, which might drive people crazy.  I don't know
 whether there is a better way than living with it, one way or the other
 (i.e., requiring a types list, or accepting slightly odd formatting).

I actually think both are relatively easy to figure out without a
typedef list. There's harder cases though, e.g. (char *) foo in an
expression is already more complicated.


But really, the typedef list is the minor part of what annoys me about
pgindent. That it completely butchers so many constructs (function pointer
typedefs and inline asm, to pick extreme examples) is much worse. It's
also nigh on impossible to predict/keep the indentation pgindent will
use in many cases.  Having to write code in a way that doesn't
break the re-indentation tool, even when it'd otherwise be fine, is just
absurd.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql po translation update

2015-05-26 Thread Tatsuo Ishii
Peter,

Here is a translation file for src/bin/psql/ja.po of 9.4.2 from
Koizumi-san. According to him, there were some missing translations in
the po file, which he has fixed. Shall I commit it into the REL9_4_STABLE
tree, or would you like to do it yourself?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
# translation of psql.
# HOTTA Michihide <ho...@net-newbie.com>, 2010.
#
msgid ""
msgstr ""
"Project-Id-Version: PostgreSQL 9.4.2\n"
"Report-Msgid-Bugs-To: pgsql-b...@postgresql.org\n"
"POT-Creation-Date: 2015-05-23 22:59+0900\n"
"PO-Revision-Date: 2015-05-27 08:21+0900\n"
"Last-Translator: KOIZUMI Satoru <koizumi...@minos.ocn.ne.jp>\n"
"Language-Team: jpug-doc <jpug-...@ml.postgresql.jp>\n"
"Language: ja\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"

#: ../../common/exec.c:127 ../../common/exec.c:241 ../../common/exec.c:284
#, c-format
msgid "could not identify current directory: %s"
msgstr "カレントディレクトリを識別できませんでした。: %s"

#: ../../common/exec.c:146
#, c-format
msgid "invalid binary \"%s\""
msgstr "\"%s\" は有効なバイナリファイルではありません。"

#: ../../common/exec.c:195
#, c-format
msgid "could not read binary \"%s\""
msgstr "バイナリファイル \"%s\" を読み込めませんでした。"

#: ../../common/exec.c:202
#, c-format
msgid "could not find a \"%s\" to execute"
msgstr "実行に必要な \"%s\" が見つかりません。"

#: ../../common/exec.c:257 ../../common/exec.c:293
#, c-format
msgid "could not change directory to \"%s\": %s"
msgstr "ディレクトリ\"%s\"に移動できませんでした: %s"

#: ../../common/exec.c:272
#, c-format
msgid "could not read symbolic link \"%s\""
msgstr "シンボリックリンク \"%s\" を読み込めませんでした。"

#: ../../common/exec.c:523
#, c-format
msgid "pclose failed: %s"
msgstr "pcloseが失敗しました: %s"

#: ../../common/fe_memutils.c:33 ../../common/fe_memutils.c:60
#: ../../common/fe_memutils.c:83 command.c:321 input.c:205 mainloop.c:72
#: mainloop.c:234
#, c-format
msgid "out of memory\n"
msgstr "メモリ不足です\n"

#: ../../common/fe_memutils.c:77
#, c-format
msgid "cannot duplicate null pointer (internal error)\n"
msgstr "null ポインタを複製できません(内部エラー)。\n"

#: ../../common/username.c:45
#, c-format
msgid "could not look up effective user ID %ld: %s"
msgstr "実効ユーザID %ld が見つかりませんでした: %s"

#: ../../common/username.c:47 command.c:276
msgid "user does not exist"
msgstr "ユーザが存在しません"

#: ../../common/username.c:62
#, c-format
msgid "user name lookup failure: error code %lu"
msgstr "ユーザ名の検索に失敗: エラーコード %lu"

#: ../../common/wait_error.c:47
#, c-format
msgid "command not executable"
msgstr "コマンドは実行形式ではありません"

#: ../../common/wait_error.c:51
#, c-format
msgid "command not found"
msgstr "コマンドが見つかりません"

#: ../../common/wait_error.c:56
#, c-format
msgid "child process exited with exit code %d"
msgstr "子プロセスが終了コード %d で終了しました。"

#: ../../common/wait_error.c:63
#, c-format
msgid "child process was terminated by exception 0x%X"
msgstr "子プロセスが例外 0x%X で終了させられました。"

#: ../../common/wait_error.c:73
#, c-format
msgid "child process was terminated by signal %s"
msgstr "子プロセスがシグナル %s で終了させられました。"

#: ../../common/wait_error.c:77
#, c-format
msgid "child process was terminated by signal %d"
msgstr "子プロセスがシグナル %d で終了させられました。"

#: ../../common/wait_error.c:82
#, c-format
msgid "child process exited with unrecognized status %d"
msgstr "子プロセスが不明な状態%dにより終了しました。"

#: command.c:117
#, c-format
msgid "Invalid command \\%s. Try \\? for help.\n"
msgstr "\\%sコマンドは無効です。\\? でヘルプを参照してください。\n"

#: command.c:119
#, c-format
msgid "invalid command \\%s\n"
msgstr "\\%sは無効なコマンドです:\n"

#: command.c:130
#, c-format
msgid "\\%s: extra argument \"%s\" ignored\n"
msgstr "\\%s: 余分な引数 \"%s\" は無視されました。\n"

#: command.c:274
#, c-format
msgid "could not get home directory for user ID %ld: %s\n"
msgstr "ユーザID %ld のホームディレクトリ \"%s\" の位置を特定できません。\n"

#: command.c:292
#, c-format
msgid "\\%s: could not change directory to \"%s\": %s\n"
msgstr "\\%s: ディレクトリを \"%s\" に変更できません:%s\n"

#: command.c:307 common.c:446 common.c:886
#, c-format
msgid "You are currently not connected to a database.\n"
msgstr "現在データベースには接続していません。\n"

#: command.c:334
#, c-format
msgid ""
"You are connected to database \"%s\" as user \"%s\" via socket in \"%s\" at "
"port \"%s\".\n"
msgstr "データベース\"%s\"にユーザ\"%s\"でソケット\"%s\"経由のポート\"%s\"で接続しています。\n"

#: command.c:337
#, c-format
msgid ""
"You are connected to database \"%s\" as user \"%s\" on host \"%s\" at port "
"\"%s\".\n"
msgstr "ホスト\"%3$s\"上のポート\"%4$s\"のデータベース\"%1$s\"にユーザ\"%2$s\"で接続しています\n"

#: command.c:538 command.c:608 command.c:1403
#, c-format
msgid "no query buffer\n"
msgstr "問い合わせバッファがありません。\n"

#: command.c:571 command.c:3035
#, c-format
msgid "invalid line number: %s\n"
msgstr "無効な行番号です: %s\n"

#: command.c:602
#, c-format
msgid "The server (version %d.%d) does not support editing function source.\n"
msgstr "このサーバーのバージョン (%d.%d) は関数のソース編集をサポートしていません\n"

#: command.c:682
msgid "No changes"
msgstr "変更なし"

#: command.c:736
#, c-format
msgid "%s: invalid encoding name or conversion procedure not found\n"
msgstr "%s: 符号化方式名が無効、または変換用プロシージャが見つかりません。\n"

#: command.c:833 command.c:883 command.c:897 command.c:914 command.c:1021
#: command.c:1180 command.c:1383 command.c:1414
#, c-format
msgid 

Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Abhijit Menon-Sen
At 2015-05-26 19:07:20 +0200, and...@anarazel.de wrote:

 Abhijit, do you recall why the code was changed to follow all symlinks
 in contrast to explicitly going through the tablespaces as initdb -S
 does? I'm pretty sure early versions of the patch pretty much had a
 verbatim copy of the initdb logic?

Yes, earlier versions of the patch did follow symlinks only in
pg_tblspc. I changed this because Álvaro pointed out that this
was likely to miss important stuff, e.g. in

20150403163232.ga28...@eldon.alvh.no-ip.org,
20150406131636.gd4...@alvh.no-ip.org

-- Abhijit




Re: [HACKERS] [COMMITTERS] pgsql: Add pg_audit, an auditing extension

2015-05-26 Thread Robert Haas
On Tue, May 26, 2015 at 8:10 PM, Stephen Frost sfr...@snowman.net wrote:
 I certainly welcome review from others and if there is not another
 committer-level formal review before we get close on 9.5 (say, end of
 August), then I'll revert it.  There is certainly no concern that doing
 so would be difficult to do, as it is entirely self-contained.

Like Noah, I'm unhappy that this patch went in.  I stated a few days
before it was committed that I felt the code quality and documentation
were well below the level we normally expect.  Your response was to do
a little more work on the patch and then commit it without so much as
reposting it.  Clearly, we have completely different ideas about what
constitutes an appropriate response to the sort of comments I sent.

A near-record number of committers have objected to this patch in all
kinds of different ways.  There have been concerns about process,
concerns about code quality, and concerns about whether this is really
something that we want.  The normal way that works is that you either
(a) revise the patch or (b) try to convince the person raising the
concern that it's OK.  The concern is addressed when the person who
raised it says it is, and not just because you thought about it and
decided it was OK.  This can be taken too far in the other direction,
but we're not close to that in this instance.

I am particularly troubled by the fact that what has happened with
this patch is very much like what happened with row-level security: a
patch that clearly wasn't finished and clearly had not had adequate
review got abruptly committed - by you - without any consensus to do
so.  It seems likely to me that by now row-level security has had
enough review that it is solid enough to be in the tree - in
particular, I am encouraged by Dean Rasheed's work to improve the
patch.  However, it is absolutely not the community process for that
stuff to happen after the code is already in the tree.  It is the
community process for that stuff to happen before the code is in the
tree.

It will be impossible for our community to continue delivering quality
software if you persist in taking this approach.  Either the other
committers will have to spend an inordinate effort fixing (or
convincing you to fix) the stuff you break - instead of working on our
own projects - and other people's patches - or we will have to ignore
your commits, and the things that are broken in those commits will
become bugs in our releases.  Either way, the community loses.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Peter Eisentraut
On 5/25/15 7:15 PM, Andres Freund wrote:
 On 2015-05-25 19:01:28 -0400, Bruce Momjian wrote:
 A longer-term fix would be to make pgindent less stupid about this sort
 of usage, but nobody's yet volunteered to dig into the guts of that code.

 I assume a typedefs list is going to be a requirement of any decent C
 indenting tool.
 
 Maybe I'm missing something major here, but why? Afaict it's just only
 used for formatting decisions that could be made without it just as well?

AFAICT, the main reason is to decide whether * and & are binary infix or
unary prefix operators.  Otherwise, it wouldn't know whether to write

char * foo;

or the more customary

char *foo;

Now, running pgindent without a typedefs list also makes it do things like

 static int32
-makepol(QPRS_STATE *state)
+makepol(QPRS_STATE * state)

which, one might argue, it could figure out without a typedefs list.
But then the formatting would be inconsistent between prototypes and
variable declarations, which might drive people crazy.  I don't know
whether there is a better way than living with it, one way or the other
(i.e., requiring a types list, or accepting slightly odd formatting).





Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Robert Haas
On Tue, May 26, 2015 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Realistically, with merge.conflictstyle = diff3 (why is this not the
 default?), resolving whitespace conflicts that occur when you try to
 cherry-pick is typically not very difficult.

 Really?  The problems I have generally come from places where pgindent
 has changed the line breaks, not just horizontal spacing.  I haven't
 seen anything that copes with this, certainly not git.

Well, it's not fully automated, but if you set the setting above, and
then cherry-pick, your merge conflicts will look something like this:


<<<<<<<
side A
|||||||
original version
=======
side B
>>>>>>>

Either side A or side B will be what changed in the patch you
cherry-picked, and the other will be what changed in the branch in the
meantime.  I forget which is which.  So you notice that one of the two
sides differs from the original version only in terms of whitespace
and delete that side, keeping the other side.  Done.

In general, the way you resolve these conflicts is by choosing the
side that has fewer changes from the original version, noting how it
differs from the original version, modifying the other side
accordingly, and then deleting the other two versions.  For example:


<<<<<<<
here we renamed the function!
|||||||
original version
=======
here we added an additional parameter to the function call!
>>>>>>>

So you either change side B to the new function name and remove side
A, or else you change side A to pass the extra parameter and remove
side B.  In either case you remove the original version.
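For the whitespace-only case the procedure can even be mechanized. A rough sketch (Python; a real tool would diff hunk-by-hunk rather than normalize whole sides, so treat this as an illustration of the decision rule, not a merge driver):

```python
import re

def resolve(original, side_a, side_b):
    """Keep the side that carries a real change; a side that differs from the
    original only in whitespace was just reindented (e.g. by pgindent)."""
    norm = lambda s: re.sub(r"\s+", "", s)   # compare ignoring all whitespace
    if norm(side_a) == norm(original):
        return side_b
    if norm(side_b) == norm(original):
        return side_a
    return None  # both sides changed something real: resolve by hand

# pgindent reindented one side; the cherry-picked change wins:
assert resolve("char *foo;", "char * foo;", "char *bar;") == "char *bar;"
# Both sides made real changes; no automatic answer:
assert resolve("f(a)", "g(a)", "f(a, b)") is None
```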

This is obviously not zero effort.  At the same time, it's not much
effort, either.  I resolve these kinds of mechanical conflicts all the
time, and they don't take up much time or effort.  If you have to deal
with this kind of crap using patch, it bites.  If you use git but
with the default conflictstyle, you don't get the original version
part of the conflict, so it still bites.  But after a modest amount of
practice, resolving trivial conflicts with merge.conflictstyle=diff3
is pretty darn easy.  Or at least, I have found it so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Row-Level Security Policies (RLS)

2015-05-26 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
  * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
   What do we need RowSecurityPolicy->policy_id for?  It seems to me that
   it is only used to determine whether the policy is the default deny
   one, so that it can later be removed if a hook adds a different one.
   This seems contrived as well as under-documented.  Why isn't a boolean
   flag sufficient?
  
  Thanks for taking a look!
  
  It's also used during relcache updates (see equalPolicy()).
 
 Hmm, but the policy name is unique also, right?  So the policy_id check
 is redundant ...

I don't disagree with that, but surely checking if it's the same OID and
exiting immediately is going to be faster than comparing the policy
names.

Now, looking at the code, I'm actually failing to see a case where we
use RowSecurityPolicy->policy_name at all.  Perhaps *that's* what we
should be looking to remove?
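The speed argument is the usual cheap-key-first pattern: compare the integer OID before any string comparison. Sketched in Python (field names follow the thread; the real equalPolicy() compares more fields than this, so this is only the shape of the argument):

```python
class RowSecurityPolicy:
    def __init__(self, policy_id, policy_name):
        self.policy_id = policy_id      # OID: one integer compare
        self.policy_name = policy_name  # unique too, but costlier to compare

def equal_policy(a, b):
    """Bail out on the integer key first; only then bother with anything
    more expensive (the real function also compares quals, roles, etc.)."""
    if a.policy_id != b.policy_id:
        return False
    return a.policy_name == b.policy_name

assert not equal_policy(RowSecurityPolicy(1, "p1"), RowSecurityPolicy(2, "p1"))
assert equal_policy(RowSecurityPolicy(1, "p1"), RowSecurityPolicy(1, "p1"))
```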

Thanks!

Stephen




Re: [HACKERS] optimizing vacuum truncation scans

2015-05-26 Thread Jeff Janes
On Tue, May 26, 2015 at 12:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby jim.na...@bluetreble.com
 wrote:

 On 4/20/15 1:50 AM, Jeff Janes wrote:


  For that matter, why do we scan backwards anyway? The comments don't
 explain it, and we have nonempty_pages as a starting point, so why
 don't we just scan forward? I suspect that eons ago we didn't have
 that and just blindly reverse-scanned until we finally hit a
 non-empty buffer...


 nonempty_pages is not concurrency safe, as the pages could become used
 after vacuum passed them over but before the access exclusive lock was
 grabbed before the truncation scan.  But maybe the combination of the
 two?  If it is above nonempty_pages, then anyone who wrote into the page
 after vacuum passed it must have cleared the VM bit. And currently I
 think no one but vacuum ever sets VM bit back on, so once cleared it
 would stay cleared.


 Right.

  In any event nonempty_pages could be used to set the guess as to how
 many pages (if any) might be worth prefetching, as that is not needed
 for correctness.


 Yeah, but I think we'd do a LOT better with the VM idea, because we could
 immediately truncate without scanning anything.


 Right now all the interlocks to make this work seem to be in place (only
 vacuum and startup can set visibility map bits, and only one vacuum can be
 in a table at a time).  But as far as I can tell, those assumption are not
 baked in and we have pondered loosening them before.

 For example, letting HOT clean up mark a page as all-visible if it finds
 it be such.  Now in that specific case it would be OK, as HOT cleanup would
 not cause a page to become empty (or could it?  If an insert on a table
 with no indexes was rolled back, and hot clean up found it and cleaned it
 up, it could conceptually become empty--unless we make special code to
 prevent it) , and so the page would have to be below nonempty_pages.  But
 there may be other cases.

 And I know other people have mentioned making VACUUM concurrent (although
 I don't see the value in that myself).

 So doing it this way would be hard to beat (scanning a bitmap vs the table
 itself), but it would also introduce a modularity violation that I am not
 sure is worth it.

 Of course this could always be reverted if its requirements became a
 problem for a more important change (assuming of course that we detected
 the problem)


The fatal problem here is that nonempty_pages is unreliable.  If vacuum
skips all-visible pages, it doesn't necessarily increment nonempty_pages
past the skipped pages.  So if you just rely on nonempty_pages, you will
truncate away pages that were already all-visible but are not empty.  If we
changed it so that it did increment nonempty_pages past the skipped ones,
then pages which were all empty and got marked as all-visible without being
truncated (say, because a lock could not be acquired, or because there was
a non-empty page after them which later became empty) would never get
truncated away.
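The failure mode is easy to demonstrate with a toy model of the scan (Python, illustrative; the field names and the skip rule are simplifications of the real lazy-vacuum logic, not PostgreSQL code):

```python
def vacuum_scan(pages, skip_all_visible=True):
    """Toy model of how nonempty_pages is maintained: it only advances for
    pages the scan actually visits, so skipped all-visible pages never move
    the watermark."""
    nonempty_pages = 0
    for blkno, page in enumerate(pages):
        if skip_all_visible and page["all_visible"]:
            continue                      # skipped: watermark not advanced
        if page["tuples"] > 0:
            nonempty_pages = blkno + 1    # would-be truncation point
    return nonempty_pages

pages = [
    {"all_visible": False, "tuples": 10},
    {"all_visible": True,  "tuples": 5},   # skipped, but certainly not empty
    {"all_visible": False, "tuples": 0},
]
# Trusting nonempty_pages alone says "truncate from block 1" -- destroying data.
assert vacuum_scan(pages) == 1
```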

As it is currently, it is not clear what purpose nonempty_pages serves.  It
is a guardian value which doesn't seem to actually guard anything.  At best
it prevents you from needing to inspect one page (the page of the guardian
value itself) to see if that page is actually empty, and finding that it is
not.  That hardly seems worthwhile.

We could adopt two nonempty_pages counters, one that fails low on skipped
all-visible pages and one that fails high on them, and then fast-truncate
down to the high one and do the current page-by-page scan between the low
and the high.  That seems rather grotesque.

Cheers,

Jeff


Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Peter Eisentraut
On 5/25/15 5:51 PM, Alvaro Herrera wrote:
 Tom Lane wrote:
 
 A longer-term fix would be to make pgindent less stupid about this sort
 of usage, but nobody's yet volunteered to dig into the guts of that code.
 
 We've discussed in the past that we could use something other than BSD's
 indent -- astyle has been mentioned.  It seems that with suitable
 options, we could make the result very close to what we have now, and
 not be forced to deal with typedef lists and other nonsense.

astyle looks like a decent tool, but it seems to me that it tends to
leave things alone that it doesn't have an explicit rule about.  So that
would leave a lot of room for formatting randomness from different authors.





Re: [HACKERS] Construction of Plan-node by CSP (RE: Custom/Foreign-Join-APIs)

2015-05-26 Thread Robert Haas
On Mon, May 25, 2015 at 5:08 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 I updated the patch to fix up this problem towards the latest master
 branch.
[ ... ]
 Instead of this approach, Tom suggested to add a list of child Paths
 on CustomPath node, then createplan.c calls create_plan_recurse() for
 each entry of the list, without this function getting exported.

Tom, do you want to review this patch and figure out how to solve the
underlying problem?  If not, I will take care of it.  But I will be
unhappy if I put time and effort into this and then you insist on
changing everything afterwards, again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Abhijit Menon-Sen
At 2015-05-26 22:44:03 +0200, and...@anarazel.de wrote:

 So, this was discussed in the following thread, starting at:
 http://archives.postgresql.org/message-id/20150403163232.GA28444%40eldon.alvh.no-ip.org

Sorry, I didn't see this before replying.

 There are no other places where it's allowed to introduce symlinks, and
 we have refuted bug reports from people having problems after doing that.

OK.

 So what I propose is:
 1) Remove the automatic symlink following
 2) Follow pg_tblspc/*, and pg_xlog if it's a symlink; fix the latter in
    initdb -S
 3) Add an elevel argument to walkdir(), return if AllocateDir() fails,
    continue for stat() failures in the readdir() loop.
 4) Add an elevel argument to pre_sync_fname, fsync_fname, return after
    errors.
 5) Accept EACCES, ETXTBSY (if defined) when open()ing the files. By
    virtue of not following symlinks we should not need to worry about
    EROFS

Makes sense. I've got most of that done, I'll just remove the symlink
following (or rather, restrict it to the cases mentioned), test a bit,
and post.
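The proposed error handling can be sketched as follows (Python standing in for the C; the function and argument names are mine, not the committed PostgreSQL API, and lstat() models point 1, no symlink following):

```python
import os, stat

def walkdir(path, action, report):
    """Report and return when the directory can't be opened (point 3);
    report and continue past entries that can't be stat()ed, rather than
    aborting recovery outright."""
    try:
        entries = sorted(os.listdir(path))
    except OSError as e:
        report(f'could not open directory "{path}": {e}')
        return
    for name in entries:
        full = os.path.join(path, name)
        try:
            st = os.lstat(full)          # lstat: do not follow symlinks
        except OSError as e:
            report(f'could not stat file "{full}": {e}')
            continue
        if stat.S_ISDIR(st.st_mode):
            walkdir(full, action, report)
        elif stat.S_ISREG(st.st_mode):
            action(full)   # stand-in for pre_sync_fname()/fsync_fname()
```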

 I'm inclined to think that 4) is a big enough compat break that a
 fsync_fname_ext with the new argument is a good idea.

Agreed.

-- Abhijit




[HACKERS] [Postgresql NLS support] : Help on using NLS , Custom dictionary to enhance our website search functionality

2015-05-26 Thread Nivedita Kulkarni
Hi All,
We are new to PostgreSQL.

Background:
We have a site hosted on Ruby on Rails using a PostgreSQL database. It is an
eCommerce site, for which we need to provide NLS-supported search
functionality to help end users while searching, using synonyms, related
words, plurals and singulars, stop words, etc.

Problem:
As per our analysis, we found that with PostgreSQL NLS it is possible to do
the following without any custom changes:

1. Participles (help, helps, helped, helping)
2. Multi-word (search results)
3. Single word (results)
4. Plurals and singular (s, es)
5. Punctuation (. , ; ! : ')
6. Stop words (and, or, as, an, but)




Reading the documentation for custom dictionaries, we tried to use the custom
dictionary directory postgresql/9.3/tsearch_data and added the following to test:


buy purchase

big enormous

mammoth elephant

indices index*


But we found that when we search for the word Purchase, we get search results
for both Purchase and Buy.

But when we search for Buy, we don't get search results for Purchase.

We are using the following query, from which we found that transitive synonym
search is not working for us.

SELECT products.* FROM products WHERE (((to_tsvector('english',
coalesce(products.description::TEXT, ''))) @@ (to_tsquery('english', ''' '
|| 'purchase' || ' '''))));
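For what it's worth, a synonym dictionary of this kind is a single one-way rewrite to a canonical term, not a closure over the synonym graph. A toy model (Python; this mimics the behaviour of tsearch synonym files as we understand it, and is not PostgreSQL's code):

```python
def load_synonyms(lines):
    """Each line 'input output' folds the input token into the output token."""
    mapping = {}
    for line in lines:
        src, dst = line.split()
        mapping[src.rstrip("*")] = dst.rstrip("*")
    return mapping

def normalize(token, synonyms):
    return synonyms.get(token, token)

syn = load_synonyms(["buy purchase", "big enormous"])
# Both the indexed document token and the query token must pass through the
# same dictionary for matching to be symmetric:
assert normalize("buy", syn) == normalize("purchase", syn) == "purchase"
# And the rewrite is not transitive: chaining entries does not chain lookups.
syn2 = load_synonyms(["buy purchase", "purchase acquire"])
assert normalize("buy", syn2) == "purchase"   # not "acquire"
```

The asymmetry reported above appears whenever one side (the query or the indexed document) bypasses the dictionary, for example by quoting a literal lexeme.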




We think transitive synonym support is a basic function that the PostgreSQL
database should provide.  Please let us know if some setting or configuration
change is needed to enable this feature.




Also as per our findings, the following are points for which it seems there
is no direct way in PostgreSQL:
1. Related words (pushchair, buggies)
2. Nearby words (pen, pencil, eraser)
3. Synonyms (buy=purchase, big=enormous)
4. Spelling mistakes


We look forward to your feedback / suggestions on this problem, or any
alternative solution for how this can be done.

Thanks in advance.

Thanks and regards,
Nivedita Kulkarni


Re: [HACKERS] Supporting TAP tests with MSVC and Windows

2015-05-26 Thread Michael Paquier
On Fri, Apr 24, 2015 at 11:26 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Apr 20, 2015 at 9:01 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Sun, Apr 19, 2015 at 10:01 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Note as well that this patch uses the following patches fixing
 independent issues:
 ...

 Attached is v4. I added a switch in config.pl to be consistent with
 ./configure and --enable-tap-tests.

 Attached is v5, rebased on HEAD (2c47fe16) after conflicts with
 dcae5fac and 54a16df0.

Here is v6, a rebased version on HEAD (79f2b5d). There were some
conflicts with the indentation and some other patches related to
pg_rewind and initdb's tests.
-- 
Michael
From 5879da14bbf20b5ac66b78f8beab09114bb2fa96 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 20 Apr 2015 04:57:37 -0700
Subject: [PATCH 1/2] Add support for TAP tests on Windows

Nodes initialized by the TAP tests use SSPI to securely perform the
tests, and test scripts are patched in a couple of places to support
Windows grammar. In the case of MSVC, tests can be run with this
command:
vcregress tapcheck
---
 doc/src/sgml/install-windows.sgml  |  1 +
 src/Makefile.global.in |  2 +-
 src/bin/initdb/t/001_initdb.pl |  1 +
 src/bin/pg_basebackup/t/010_pg_basebackup.pl   | 67 --
 src/bin/pg_controldata/t/001_pg_controldata.pl |  5 +-
 src/bin/pg_ctl/t/001_start_stop.pl | 14 --
 src/bin/pg_ctl/t/002_status.pl | 12 -
 src/bin/pg_rewind/RewindTest.pm| 66 +++--
 src/bin/scripts/t/020_createdb.pl  |  3 ++
 src/test/perl/TestLib.pm   | 18 ---
 src/tools/msvc/Solution.pm |  1 +
 src/tools/msvc/config_default.pl   |  1 +
 src/tools/msvc/vcregress.pl| 48 +-
 13 files changed, 175 insertions(+), 64 deletions(-)

diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index d154b44..2047790 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -439,6 +439,7 @@ $ENV{CONFIG}="Debug";
 <userinput>vcregress modulescheck</userinput>
 <userinput>vcregress ecpgcheck</userinput>
 <userinput>vcregress isolationcheck</userinput>
+<userinput>vcregress tapcheck</userinput>
 <userinput>vcregress upgradecheck</userinput>
 </screen>
 
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index c583b44..563d1d1 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -336,7 +336,7 @@ cd $(srcdir) && TESTDIR='$(CURDIR)' PATH=$(bindir):$$PATH PGPORT='6$(DEF_PGPOR
 endef
 
 define prove_check
-cd $(srcdir) && TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
+cd $(srcdir) && TESTDIR='$(CURDIR)' TESTREGRESS='$(top_builddir)/src/test/regress/pg_regress' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
 endef
 
 else
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 299dcf5..095cbf3 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -4,6 +4,7 @@
 
 use strict;
 use warnings;
+use Config;
 use TestLib;
 use Test::More tests => 14;
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 3476ea6..112f3a1 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -1,8 +1,9 @@
 use strict;
 use warnings;
 use Cwd;
+use Config;
 use TestLib;
-use Test::More tests => 35;
+use Test::More tests => ($Config{osname} eq "MSWin32") ? 25 : 35;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -17,10 +18,20 @@ command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup" ],
 	'pg_basebackup fails because of hba');
 
+# Use SSPI on Windows, node has been initialized already accordingly
+# by pg_regress --config-auth.
 open HBA, ">>$tempdir/pgdata/pg_hba.conf";
-print HBA "local replication all trust\n";
-print HBA "host replication all 127.0.0.1/32 trust\n";
-print HBA "host replication all ::1/128 trust\n";
+if ($Config{osname} ne "MSWin32")
+{
+	print HBA "local replication all trust\n";
+	print HBA "host replication all 127.0.0.1/32 trust\n";
+	print HBA "host replication all ::1/128 trust\n";
+}
+else
+{
+	print HBA "host replication all 127.0.0.1/32 sspi include_realm=1 map=regress\n";
+	print HBA "host replication all ::1/128 sspi include_realm=1 map=regress\n";
+}
 close HBA;
 system_or_bail 'pg_ctl', '-s', '-D', "$tempdir/pgdata", 'reload';
 
@@ -56,6 +67,33 @@ command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft' ],
 	'pg_basebackup tar with long name fails');
 unlink "$tempdir/pgdata/$superlongname";
 
+command_fails(
+	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T=/foo" ],
+	'-T with 

Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-05-26 Thread Shulgin, Oleksandr
On Sat, May 23, 2015 at 3:03 AM, Ryan Pedela rped...@datalanche.com wrote:

 On Fri, May 22, 2015 at 10:51 AM, Merlin Moncure mmonc...@gmail.com
 wrote:

 On Fri, May 22, 2015 at 9:43 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Andrew Dunstan wrote:
 
  On 05/20/2015 09:16 AM, Shulgin, Oleksandr wrote:
 
  Attached is a patch against master to generalize the JSON-producing
  functions in utils/adt/json.c and to provide a set of callbacks which
 can
  be overridden the same way that is already provided for *parsing*
 JSON.
 
  I'm not necessarily opposed to this, but it sure seems like a lot of
  changes, and moderately invasive ones, to support something that could
 be
  done, at the cost of reparsing, with a simple loadable extension that I
  could create in a few hours of programming.
 
  But this seems like a pretty reasonable change to make, no?  Doesn't the
  total amount of code decrease after this patch?  JSON stuff is pretty
  new so some refactoring and generalization of what we have is to be
  expected.

 Yeah.  Also, there have been a few previous gripes about this, for
 example,
 http://www.postgresql.org/message-id/cahbvmpzs+svr+y-ugxjrq+xw4dqtevl-cozc69zffwmxjck...@mail.gmail.com
 .
 As noted, I definitely prefer 'space free' by default for efficiency
 reasons, but standardizing the output has definitely got to be a
 reasonable goal.


 Every JSON implementation I have ever used defaults to the minified
 version of JSON (no whitespace) when printed.


Hashing of arrays seems to be an important issue: we'd rather make sure to
produce the same output in every code path.  That would also mean: no
special logic to add the line feeds in json_agg either.
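To make the whitespace point concrete, here is what "space-free" versus pretty-printed output looks like in a generic JSON library (Python is used purely as an illustration; it is not the code under discussion):

```python
import json

doc = {"a": [1, 2, 3], "b": {"c": True}}

# Minified: no whitespace at all, the compact form most implementations
# default to when serializing.
compact = json.dumps(doc, separators=(",", ":"))

# Human-readable: indentation plus spaces after separators.
pretty = json.dumps(doc, indent=2)

print(compact)  # {"a":[1,2,3],"b":{"c":true}}
print(len(compact) < len(pretty))  # True
```

The efficiency argument upthread is simply that the compact form is strictly shorter for the same data.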

Is it reasonable to add this patch to CommitFest now?

--
Alex


Re: [HACKERS] fsync bug faq for publication?

2015-05-26 Thread Magnus Hagander
On May 26, 2015 07:31, Tom Lane t...@sss.pgh.pa.us wrote:

 Josh Berkus j...@agliodbs.com writes:
  We need to get a notice out to our users who might update their servers
  and get stuck behind the fsync bug.  As such, I've prepared a FAQ.
  Please read, correct and improve this FAQ so that it's fit for us to
  announce to users as soon as possible:

  https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug

 Judging by Ross Boylan's report at

http://www.postgresql.org/message-id/f1f13e14a610474196571953929c02096d0...@ex08.net.ucsf.edu
 it's not sufficient to just recommend changing permissions on the
 problematic files.  It's not entirely clear from here whether there is a
 solution that both allows fsync on referenced files and keeps OpenSSL
 happy; but if there is, it probably requires making the cert files be
 owned by the postgres user, as well as adjusting their permissions to
 be 0640 or thereabouts.  I'm worried about whether that breaks other
 services using the same cert files.


It almost certainly will.

I think the recommendation has to be that if it's a symlink, it should be
replaced with a copy of the file, and that copy chowned and chmodded the
right way.
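That recommendation can be sketched in a few lines of Python (the paths, ownership values, and 0640 mode are illustrative, following the suggestion upthread; this is not official project tooling):

```python
import os
import shutil

def replace_symlink_with_copy(link_path, uid, gid, mode=0o640):
    """If link_path is a symlink, replace it with a private copy of its
    target, owned and permissioned so the database server can fsync it
    without disturbing other services using the original file."""
    if not os.path.islink(link_path):
        return  # a regular file needs no replacement
    target = os.path.realpath(link_path)  # resolve the link's target
    os.unlink(link_path)                  # drop the symlink itself
    shutil.copy2(target, link_path)       # copy the referenced file in place
    os.chown(link_path, uid, gid)         # e.g. the postgres user/group
    os.chmod(link_path, mode)             # 0640, as suggested upthread
```

The original file is left untouched, so anything else referencing it keeps working.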

/Magnus


[HACKERS] Why does txid_current() assign new transaction-id?

2015-05-26 Thread Naoya Anzai
Hi,hackers!

I have a question about txid_current():
why does txid_current() assign a new transaction-id?

When we execute txid_current() outside of a transaction block, it assigns a new 
transaction-id.
I guess it doesn't need to assign a new txid, because txid_current() is just a 
read-only function.

Walking through the code, I found a function that could replace it: 
GetStableLatestTransactionId(void).

I attached a patch which changing just 1-line.
Could you please check the code?

Regards,

Naoya

---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: nao-an...@xc.jp.nec.com
---



txid_current.patch
Description: txid_current.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] problems on Solaris

2015-05-26 Thread Dave Page
On Mon, May 25, 2015 at 1:07 AM, Andres Freund and...@anarazel.de wrote:
 On 2015-05-24 19:44:37 -0400, Andrew Dunstan wrote:

 Buildfarm members casteroides and protosciurus have been having some
 problems that seem puzzling. These animals both run on the same machine, but
 with different compilers.

 casteroides runs with the Sun Studio 12 compiler, and has twice in the last
 3 days demonstrated this error:

[5561ce0c.51b7:25] LOG:  starting background worker process test_shm_mq
[5561ce1e.5287:9] PANIC:  stuck spinlock (100cb77f4) detected at 
 atomics.c:30
[5561ce1e.5287:10] STATEMENT:  SELECT test_shm_mq_pipelined(16384, 
 (select string_agg(chr(32+(random()*95)::int), '') from 
 generate_series(1,27)), 200, 3);
[5561ce0c.51b7:26] LOG:  server process (PID 21127) was terminated by 
 signal 6
[5561ce0c.51b7:27] DETAIL:  Failed process was running: SELECT 
 test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), 
 '') from generate_series(1,27)), 200, 3);
[5561ce0c.51b7:28] LOG:  terminating any other active server processes

 It's not constant - between the two failures was a success.

 That's indeed rather odd. For one the relevant code does nothing but
 lock/unlock a spinlock. For another, there's been no recent change to
 this and casteroides has been running happily for a long time.
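For readers unfamiliar with the PANIC above: PostgreSQL's spinlock code retries a bounded number of times and then reports "stuck spinlock" rather than spinning forever. A toy Python model of that policy (the constant and structure are illustrative, not the real s_lock.c values):

```python
import threading

MAX_SPINS = 1000  # illustrative bound; the real s_lock.c limit differs


class SpinLockStuck(RuntimeError):
    """Raised where PostgreSQL would PANIC with 'stuck spinlock'."""


def spin_acquire(lock: threading.Lock) -> int:
    """Try to take the lock without blocking, giving up with an error
    after a bounded number of failed attempts."""
    for spins in range(MAX_SPINS):
        if lock.acquire(blocking=False):
            return spins  # number of failed spins before success
    raise SpinLockStuck("stuck spinlock detected")
```

The point of the report is that hitting this limit on a lock that is only ever taken briefly suggests a lost unlock or memory-ordering problem, not ordinary contention.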

 protociurus runs with gcc 3.4.3 and gets this error:

gcc -Wall -Wmissing-prototypes -Wpointer-arith 
 -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
 -Wformat-security -fno-strict-aliasing -fwrapv 
 -Wno-unused-command-line-argument -g -I/usr/local/include -m64 -I. 
 -I../../../src/interfaces/libpq -I./../regress -I../../../src/include   -c 
 -o specparse.o specparse.c
In file included from /usr/include/sys/vnode.h:47,
  from /usr/include/sys/stream.h:22,
  from /usr/include/netinet/in.h:66,
  from /usr/include/netdb.h:98,
  from ../../../src/include/port.h:17,
  from ../../../src/include/c.h:1114,
  from ../../../src/include/postgres_fe.h:25,
  from specparse.y:13:
/usr/include/sys/kstat.h:439: error: syntax error before numeric constant
/usr/include/sys/kstat.h:463: error: syntax error before '}' token
/usr/include/sys/kstat.h:464: error: syntax error before '}' token
In file included from /usr/include/sys/stream.h:22,
  from /usr/include/netinet/in.h:66,
  from /usr/include/netdb.h:98,
  from ../../../src/include/port.h:17,
  from ../../../src/include/c.h:1114,
  from ../../../src/include/postgres_fe.h:25,
  from specparse.y:13:
/usr/include/sys/vnode.h:105: error: syntax error before kstat_named_t

 I'd noticed this one as well. This sounds like a installation problem,
 not really ours. Dave, any chance you could look into this, or give
 somebody an account to test what's up?

I'm not going to be able to look at this, at least this week. I can
give someone on the EDB team access - Robert; can one of your guys
take a look?

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

2015-05-26 Thread Paul Smith

With PostgreSQL 9.3.5 on Ubuntu 12.04, I'm getting the error:

ERROR:  MultiXactId 1934308693 has not been created yet -- apparent 
wraparound


on doing various queries on our database. I don't think it is a 
wraparound - I think the tuple has mistakenly decided it has a 
MultiXactId related to it.


Looking back through the logs, it looks as if this suddenly started 
happening at 2:00 (UTC+0100) on 23 May. (Nothing was happening at that 
time as far as we can tell - the server didn't crash/restart or anything).


The logs were all really small, until the ones from then onwards, which 
are now full of things like:


2015-05-23 10:44:54 BST ERROR:  MultiXactId 1129406840 has not been 
created yet -- apparent wraparound
2015-05-23 10:44:54 BST CONTEXT:  automatic analyze of table 
x.messages.msgdata
2015-05-23 10:45:16 BST ERROR:  MultiXactId 1575170 has not been created 
yet -- apparent wraparound

2015-05-23 10:45:16 BST STATEMENT:  select sum(size) from messages.msgdata
2015-05-23 10:45:54 BST ERROR:  MultiXactId 1129406840 has not been 
created yet -- apparent wraparound
2015-05-23 10:45:54 BST CONTEXT:  automatic analyze of table 
x.messages.msgdata
2015-05-23 10:46:54 BST ERROR:  MultiXactId 1129406840 has not been 
created yet -- apparent wraparound


(There are several incorrect MultiXactIds in there).

This was NOT an upgrade. It has been running with 9.3 for ages.

The pg_controldata output is:

pg_control version number:937
Catalog version number:   201306121
Database system identifier:   5990773948116871611
Database cluster state:   in production
pg_control last modified: Tue 26 May 2015 09:50:25 BST
Latest checkpoint location:   7C6/8F863440
Prior checkpoint location:7C6/8E8576D8
Latest checkpoint's REDO location:7C6/8E8745F8
Latest checkpoint's REDO WAL file:000107C6008E
Latest checkpoint's TimeLineID:   1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  0/24005839
Latest checkpoint's NextOID:  1802564
Latest checkpoint's NextMultiXactId:  216
Latest checkpoint's NextMultiOffset:  439
Latest checkpoint's oldestXID:710
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:Tue 26 May 2015 09:45:57 BST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline:   0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
Current wal_level setting:minimal
Current max_connections setting:  200
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value
Data page checksum version:   0

The pg_multixact directory contains two files
members/
offsets/

There are no locked-up transactions in pg_stat_activity.

Any ideas? I can't even delete the records with the bad multixactid in 
them (which would be acceptable, as we can recover those individual 
records from backup). We don't want to restore fully from backup if 
possible as that will lose everything since the last good backup.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5 release notes may need ON CONFLICT DO NOTHING compatibility notice for FDW authors

2015-05-26 Thread Albe Laurenz
Peter Geoghegan wrote:
 In any case, third party foreign data wrappers that target other
 database system will totally ignore ON CONFLICT DO NOTHING when built
 against the master branch (unless they consider these questions). They
 should perhaps make a point of rejecting DO NOTHING outright where it
 makes sense that support could exist, but it just doesn't. Or they
 could just add support (I imagine that this would be very easy for
 mysql_fdw, for example -- MySQL has INSERT IGNORE). I feel a
 compatibility item in the release notes is in order so the question is
 considered, but there seems to be no place to do that on the Wiki, and
 the original commit message does not have a note like this.

+1

I wouldn't have become aware of that if I hadn't read your message.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

2015-05-26 Thread Tatsuo Ishii
It was fixed in 9.3.7.

Unfortunately, 9.3.7 has a new bug which is unrelated to this one.
http://www.postgresql.org/message-id/20150525142657.4686.35...@wrigleys.postgresql.org

I'm not sure whether Ubuntu 12.04 is affected by that bug, though.

As far as I know, the developers plan to release 9.3.8 etc. soon.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

 With PostgreSQL 9.3.5 on Ubuntu 12.04, I'm getting the error:
 
 ERROR: MultiXactId 1934308693 has not been created yet -- apparent
 wraparound
 
 on doing various queries on our database. I don't think it is a
 wraparound - I think the tuple has mistakenly decided it has a
 MultiXactId related to it.
 
 Looking back through the logs, it looks as if this suddenly started
 happening at 2:00 (UTC+0100) on 23 May. (Nothing was happening at that
 time as far as we can tell - the server didn't crash/restart or
 anything).
 
 The logs were all really small, until the ones from then onwards,
 which are now full of things like:
 
 2015-05-23 10:44:54 BST ERROR: MultiXactId 1129406840 has not been
 created yet -- apparent wraparound
 2015-05-23 10:44:54 BST CONTEXT: automatic analyze of table
 x.messages.msgdata
 2015-05-23 10:45:16 BST ERROR: MultiXactId 1575170 has not been
 created yet -- apparent wraparound
 2015-05-23 10:45:16 BST STATEMENT: select sum(size) from
 messages.msgdata
 2015-05-23 10:45:54 BST ERROR: MultiXactId 1129406840 has not been
 created yet -- apparent wraparound
 2015-05-23 10:45:54 BST CONTEXT: automatic analyze of table
 x.messages.msgdata
 2015-05-23 10:46:54 BST ERROR: MultiXactId 1129406840 has not been
 created yet -- apparent wraparound
 
 (There are several incorrect MultiXactIds in there).
 
 This was NOT an upgrade. It has been running with 9.3 for ages.
 
 The pg_controldata output is:
 
 pg_control version number:937
 Catalog version number:   201306121
 Database system identifier:   5990773948116871611
 Database cluster state:   in production
 pg_control last modified: Tue 26 May 2015 09:50:25 BST
 Latest checkpoint location:   7C6/8F863440
 Prior checkpoint location:7C6/8E8576D8
 Latest checkpoint's REDO location:7C6/8E8745F8
 Latest checkpoint's REDO WAL file:000107C6008E
 Latest checkpoint's TimeLineID:   1
 Latest checkpoint's PrevTimeLineID:   1
 Latest checkpoint's full_page_writes: on
 Latest checkpoint's NextXID:  0/24005839
 Latest checkpoint's NextOID:  1802564
 Latest checkpoint's NextMultiXactId:  216
 Latest checkpoint's NextMultiOffset:  439
 Latest checkpoint's oldestXID:710
 Latest checkpoint's oldestXID's DB:   1
 Latest checkpoint's oldestActiveXID:  0
 Latest checkpoint's oldestMultiXid:   1
 Latest checkpoint's oldestMulti's DB: 1
 Time of latest checkpoint:Tue 26 May 2015 09:45:57 BST
 Fake LSN counter for unlogged rels:   0/1
 Minimum recovery ending location: 0/0
 Min recovery ending loc's timeline:   0
 Backup start location:0/0
 Backup end location:  0/0
 End-of-backup record required:no
 Current wal_level setting:minimal
 Current max_connections setting:  200
 Current max_prepared_xacts setting:   0
 Current max_locks_per_xact setting:   64
 Maximum data alignment:   8
 Database block size:  8192
 Blocks per segment of large relation: 131072
 WAL block size:   8192
 Bytes per WAL segment:16777216
 Maximum length of identifiers:64
 Maximum columns in an index:  32
 Maximum size of a TOAST chunk:1996
 Date/time type storage:   64-bit integers
 Float4 argument passing:  by value
 Float8 argument passing:  by value
 Data page checksum version:   0
 
 The pg_multixact directory contains two files
 members/
 offsets/
 
 There are no locked up transactions is pg_stat_activity;
 
 Any ideas? I can't even delete the records with the bad multixactid in
 them (which would be acceptable, as we can recover those individual
 records from backup). We don't want to restore fully from backup if
 possible as that will lose everything since the last good backup.
 
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

2015-05-26 Thread Paul Smith



On 26/05/2015 10:23, Tatsuo Ishii wrote:

It was fixed in 9.3.7.


OK.

Do you know what the bug number was, or what it was called? (I tried 
searching for it before posting, but found nothing which seemed to be 
the same problem as ours.)


Do you know whether, if we upgrade to 9.3.7/8, it will fix the problem 
in the current database, or is the data damage 'permanent', so we'll 
need to restore from backup and somehow merge in the latest data?






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Order of columns in query is important?!

2015-05-26 Thread CK Tan
You're right. I misread the problem description.

On Tue, May 26, 2015 at 3:13 AM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 26/05/15 11:59, CK Tan wrote:

 It has to do with the implementation of slot_getattr, which tries to do
 the deform on-demand lazily.

 if you do select a,b,c, the execution would do slot_getattr(1) and
 deform a, and then slot_getattr(2) which reparse the tuple to deform b,
 and finally slot_getattr(3), which parse the tuple yet again to deform c.

 Where as if you do select c, b, a, it would do slot_getattr(3) to deform
 c, and in the process deform a and b in one pass. Subsequent calls to
 slot_getattr 1 and 2 would find the attribute ready and available, and
 return it (without parsing the tuple again).


 If this was the case, changing column order would lead to performance
 increase, not decrease as reported.

 My guess would be same as Amits, it's most likely the additional
 projection step.

 --
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Stephen Frost
* Abhijit Menon-Sen (a...@2ndquadrant.com) wrote:
 At 2015-05-26 03:54:51 +0200, and...@anarazel.de wrote:
  Another thing is whether we should handle a recursive symlink in
  pgdata? I personally think not, but...
 
 I think not too.

Yikes..  That's definitely the kind of thing that's why I worry about
the whole 'fsync everything' idea- what if I symlink to /?  I've
certainly done that before from my home directory for ease of use and I
imagine there are people out there who have similar setups where they
sftp as the PG user and use the symlink to more easily navigate
somewhere else.  We have to realize that, on at least some systems,
PGDATA could be the postgres user's home directory too.  That's not the
case on Debian-based systems today, but I think it might have been
before Debian had the multi-cluster tooling.
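The failure mode being worried about here is easy to demonstrate: a naive recursive fsync that follows symlinks will wander out of the data directory entirely. A hedged Python sketch of the safer walk, which simply refuses to follow links (error handling is discussed separately in this thread):

```python
import os

def files_to_fsync(pgdata):
    """Yield regular files under pgdata without following symlinked
    directories, so a link pointing at '/' (or at a home directory)
    is never chased outside the data directory."""
    for root, dirs, names in os.walk(pgdata, followlinks=False):
        for name in names:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # skip symlinked files too
                yield path
```

With followlinks=False, a symlinked directory still appears in the listing but is never descended into.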

  It's also not just as simple as making fsync_fname fail gracefully
  upon EACCESS - the opendir() could fail just as well.
 
 I'll post a proposed patch shortly.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Order of columns in query is important?!

2015-05-26 Thread CK Tan
It has to do with the implementation of slot_getattr, which tries to do the
deform on-demand lazily.

if you do select a,b,c, the execution would do slot_getattr(1) and deform
a, and then slot_getattr(2) which reparse the tuple to deform b, and
finally slot_getattr(3), which parse the tuple yet again to deform c.

Where as if you do select c, b, a, it would do slot_getattr(3) to deform c,
and in the process deform a and b in one pass. Subsequent calls to
slot_getattr 1 and 2 would find the attribute ready and available, and
return it (without parsing the tuple again).

For Vitesse X, we mark all columns that were required in the query during
JIT compile, and deform it in one shot. PG should be able to do the same.
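The lazy-deform behaviour described above can be mimicked in a few lines of Python (a toy model of the access pattern, not the real slot code): asking for attribute n forces attributes 1..n to be decoded, while a later request for a lower-numbered attribute is served from cache.

```python
class ToySlot:
    """Toy model of lazy tuple deformation: slot_getattr(n) decodes
    attributes 1..n and caches them; later calls for attributes <= n
    are free, while a higher n walks the raw tuple again."""

    def __init__(self, raw_values):
        self.raw = raw_values    # the undeformed tuple
        self.cache = []          # attributes deformed so far
        self.parse_passes = 0    # how many times we walked the raw tuple

    def slot_getattr(self, attnum):
        if attnum > len(self.cache):
            self.parse_passes += 1
            self.cache = self.raw[:attnum]  # deform everything up to attnum
        return self.cache[attnum - 1]
```

Accessing attributes in ascending order (a, b, c) costs one pass per attribute in this model, while accessing the last attribute first (c, b, a) deforms everything in a single pass, which is the effect described above.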

-cktan




On Mon, May 25, 2015 at 2:26 AM, Colin 't Hart co...@sharpheart.org wrote:

 Hi,

 I hope this is the best place to report this or should I be on
 pgsql-general or pgsql-bugs?


 It seems that the order of columns in a query can make a difference in
 execution times.

 In my brief investigation, queries on table(a,b,c,d,e,f,g,h) of the form

 select * from table order by non-indexed-column limit 25;
 select a,b,c,d,e,f,g,h from table order by non-indexed-column limit 25;

 performed the same (approx 1.5 seconds on our customers table --
 rows=514431 width=215), while the query

 select h,g,f,e,d,c,b,a from table order by non-indexed-column limit 25;

 was about 50% slower (approx 2.2 seconds on our customers table).


 I had expected these to perform the same -- to my mind column ordering
 in a query should be purely presentation -- as far as I'm concerned,
 the DBMS can retrieve the columns in a different order as long as it
 displays it in the order I've asked for them. Although, again, the
 order of columns in a resultset in a Java or Python is mostly
 irrelevant, though when displayed in psql I'd want the columns in the
 order I asked for them.


 Is there really something strange happening here? Or perfectly
 explainable and expected?


 Regards,

 Colin


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Order of columns in query is important?!

2015-05-26 Thread Petr Jelinek

On 26/05/15 11:59, CK Tan wrote:

It has to do with the implementation of slot_getattr, which tries to do
the deform on-demand lazily.

if you do select a,b,c, the execution would do slot_getattr(1) and
deform a, and then slot_getattr(2) which reparse the tuple to deform b,
and finally slot_getattr(3), which parse the tuple yet again to deform c.

Where as if you do select c, b, a, it would do slot_getattr(3) to deform
c, and in the process deform a and b in one pass. Subsequent calls to
slot_getattr 1 and 2 would find the attribute ready and available, and
return it (without parsing the tuple again).



If this was the case, changing column order would lead to performance 
increase, not decrease as reported.


My guess would be same as Amits, it's most likely the additional 
projection step.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Order of columns in query is important?!

2015-05-26 Thread Simon Riggs
On 25 May 2015 at 11:48, Amit Langote langote_amit...@lab.ntt.co.jp wrote:

 On 2015-05-25 PM 06:26, Colin 't Hart wrote:



 That means the specified order of columns in a query does matter which
 would
 have to match the defined order in order to avoid extra processing (that is
 only when specified columns *exactly* matches the tuple descriptor).


...and it matters a lot in those cases because we are sorting all of the
data scanned, not just 25 rows.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


[HACKERS] Selectivity estimation for intarray with @@

2015-05-26 Thread Uriy Zhuravlev
Hello.

Attached patch based on:
http://www.postgresql.org/message-id/capphfdssy+qepdcovxx-b4lp3ybr+qs04m6-arggknfk3fr...@mail.gmail.com

and adds selectivity estimation functions to @@ (ported from tsquery). Now we 
support &&, @>, <@ and @@.
In addition, a migration to intarray version 1.1 was written, which is why 
this patch requires my other patch:
http://www.postgresql.org/message-id/14346041.DNcb5Y1inS@dinodell
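For background, most-common-element (MCELEM) based estimation of the kind being ported here works from per-element frequency statistics: the estimated fraction of rows whose array contains a given element is that element's stored frequency, and an element absent from the list is assumed rarer than the minimum stored frequency. A toy Python version under those assumptions (the real code also handles query-tree recursion, nulls, and clamping):

```python
def contains_selectivity(query_elems, mcelem_freqs, min_freq):
    """Toy MCELEM estimate for 'column contains all of query_elems':
    treat element containment probabilities as independent and multiply
    them. Elements not in the most-common list get half the minimum
    stored frequency, mirroring the usual selfuncs.c convention."""
    sel = 1.0
    for elem in query_elems:
        sel *= mcelem_freqs.get(elem, min_freq / 2.0)
    return sel
```

For example, with frequencies {1: 0.5, 2: 0.2} and min_freq 0.2, requiring both 1 and 2 gives 0.1, and requiring an unseen element gives 0.1 as well.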

Alexander Korotkov knows about this patch.

Thanks. 

-- 
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/intarray/Makefile b/contrib/intarray/Makefile
index 920c5b1..16c829c 100644
--- a/contrib/intarray/Makefile
+++ b/contrib/intarray/Makefile
@@ -2,10 +2,10 @@
 
 MODULE_big = _int
 OBJS = _int_bool.o _int_gist.o _int_op.o _int_tool.o \
-	_intbig_gist.o _int_gin.o $(WIN32RES)
+	_intbig_gist.o _int_gin.o _int_selfuncs.o $(WIN32RES)
 
 EXTENSION = intarray
-DATA = intarray--1.0.sql intarray--unpackaged--1.0.sql
+DATA = intarray--1.0.sql intarray--1.1.sql intarray--1.0--1.1.sql intarray--unpackaged--1.0.sql
 PGFILEDESC = intarray - functions and operators for arrays of integers
 
 REGRESS = _int
diff --git a/contrib/intarray/_int.h b/contrib/intarray/_int.h
index d524f0f..739c3c0 100644
--- a/contrib/intarray/_int.h
+++ b/contrib/intarray/_int.h
@@ -140,6 +140,7 @@ typedef struct QUERYTYPE
 #define COMPUTESIZE(size)	( HDRSIZEQT + (size) * sizeof(ITEM) )
 #define QUERYTYPEMAXITEMS	((MaxAllocSize - HDRSIZEQT) / sizeof(ITEM))
 #define GETQUERY(x)  ( (x)-items )
+#define GETRQUERY(x)  ( (x)->items + ((x)->size - 1) )
 
 /* type codes for ITEM */
 #define END		0
diff --git a/contrib/intarray/_int_selfuncs.c b/contrib/intarray/_int_selfuncs.c
new file mode 100644
index 000..fd80668
--- /dev/null
+++ b/contrib/intarray/_int_selfuncs.c
@@ -0,0 +1,346 @@
+/*-
+ *
+ * _int_selfuncs.c
+ *	  Functions for selectivity estimation of intarray operators
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/intarray/_int_selfuncs.c
+ *
+ *-
+ */
+#include "postgres.h"
+#include "_int.h"
+
+#include "access/htup_details.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_statistic.h"
+#include "catalog/pg_type.h"
+#include "utils/selfuncs.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+#include "miscadmin.h"
+
+PG_FUNCTION_INFO_V1(_int_contsel);
+PG_FUNCTION_INFO_V1(_int_contjoinsel);
+PG_FUNCTION_INFO_V1(_int_matchsel);
+
+Datum _int_contsel(PG_FUNCTION_ARGS);
+Datum _int_contjoinsel(PG_FUNCTION_ARGS);
+Datum _int_matchsel(PG_FUNCTION_ARGS);
+
+/* lookup table type for binary searching through MCELEMs */
+typedef struct
+{
+	int32	   element;
+	float4		frequency;
+} Int4Freq;
+
+
+static Oid transformOperator(Oid oprOid);
+static Selectivity int_querysel(VariableStatData *vardata, Datum constval);
+static Selectivity int_query_opr_selec(ITEM *item, Int4Freq *lookup,
+	   int length, float4 minfreq);
+static Selectivity mcelem_int_query_selec(QUERYTYPE *query,
+	 Datum *mcelem, int nmcelem,
+	 float4 *numbers, int nnumbers);
+
+static int
+compare_val_int4freq(const void *a, const void *b);
+
+#define int_query_opr_selec_no_stats(query) \
+	int_query_opr_selec(GETRQUERY(query), NULL, 0, 0)
+
+
+
+static Oid
+transformOperator(Oid oprOid)
+{
+	HeapTuple			tup;
+	Form_pg_operator	op;
+	Oid	result = InvalidOid;
+
+	tup = SearchSysCache1(OPEROID, ObjectIdGetDatum(oprOid));
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "Invalid operator: %u", oprOid);
+
+	op = (Form_pg_operator) GETSTRUCT(tup);
+
+	if (!strcmp(op->oprname.data, "&&"))
+		result = OID_ARRAY_OVERLAP_OP;
+	else if (!strcmp(op->oprname.data, "@>"))
+		result = OID_ARRAY_CONTAINS_OP;
+	else if (!strcmp(op->oprname.data, "<@"))
+		result = OID_ARRAY_CONTAINED_OP;
+
+	ReleaseSysCache(tup);
+
+	if (!OidIsValid(result))
+		elog(ERROR, "Invalid operator: %u", oprOid);
+
+	return result;
+}
+
+/*
+ * _int_contsel -- restriction selectivity for array @>, &&, <@ operators
+ */
+Datum
+_int_contsel(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(DirectFunctionCall4(arraycontsel,
+			PG_GETARG_DATUM(0),
+			ObjectIdGetDatum(transformOperator(PG_GETARG_OID(1))),
+			PG_GETARG_DATUM(2),
+			PG_GETARG_DATUM(3)));
+}
+
+/*
+ * _int_contjoinsel -- join selectivity for array @>, &&, <@ operators
+ */
+Datum
+_int_contjoinsel(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(DirectFunctionCall5(arraycontjoinsel,
+			PG_GETARG_DATUM(0),
+			ObjectIdGetDatum(transformOperator(PG_GETARG_OID(1))),
+			PG_GETARG_DATUM(2),
+			PG_GETARG_DATUM(3),
+			PG_GETARG_DATUM(4)));
+}
+
+
+/*
+ * _int_matchsel -- restriction selectivity function for intarray @@ int_query
+ */
+Datum
+_int_matchsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) 

Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Robert Haas
On Mon, May 25, 2015 at 9:54 PM, Andres Freund and...@anarazel.de wrote:
 On 2015-05-25 21:33:03 -0400, Robert Haas wrote:
 On Mon, May 25, 2015 at 2:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Perhaps, but if we didn't have permission to write the file, it's hard to
  argue that it's our responsibility to fsync it.  So this seems like it's
  adding complexity without really adding any safety.

 I agree.  I think ignoring fsync failures is a very sensible approach.
 If the files are not writable, they're probably not ours.

 The reason we started discussing this is because Tom had the - quite
 reasonable - concern that this might not solely be a problem of EACCESS,
 but that there could be other errors that we need to ignore to not fail
 spuriously.  Say a symlink goes to a binary, which is currently being
 executed: ETXTBSY. Or the file is in a readonly filesystem: EROFS.  So
 we'd need to ignore a lot of errors, possibly ignoring valid ones.

But ignoring those errors wouldn't compromise data integrity, either.
So let's just ignore (but log) all errors: then we'll be demonstrably
no worse off than we were before this patch went in.  If it turns out
that there's some particular case where ignoring errors DOES
compromise data integrity, then we can plug that hole surgically when
somebody reports it.
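The "ignore (but log) all errors" behavior can be sketched roughly as follows. This is a hypothetical illustration, not the actual patch: `fsync_fname_lenient` is an invented helper name, and it simply assumes POSIX open()/fsync() semantics.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch of the "log but ignore" policy: attempt to open
 * and fsync one file, report any failure at a non-fatal level, and
 * never abort startup because of it.  Returns 0 on success, -1 on any
 * (logged and ignored) failure.
 */
static int
fsync_fname_lenient(const char *fname)
{
	int			fd = open(fname, O_RDONLY, 0);

	if (fd < 0)
	{
		/* e.g. EACCES on a root-owned file: log it and move on */
		fprintf(stderr, "could not open \"%s\": %s (ignored)\n",
				fname, strerror(errno));
		return -1;
	}
	if (fsync(fd) != 0)
	{
		/* e.g. EROFS or EINVAL on odd files: log it and move on */
		fprintf(stderr, "could not fsync \"%s\": %s (ignored)\n",
				fname, strerror(errno));
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```

The caller would walk the data directory and invoke this on every file, treating the return value as purely informational.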

Anything we do short of making all errors in this area non-fatal is
going to leave behind startup-failure cases that exist today, and we
have no evidence at this time that such startup failures would be
justified by any actual data loss risk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PostgreSQL 8.3 index page count clarification

2015-05-26 Thread Srinivas Karthik V
Hi,
   For the user-created indexes in PostgreSQL 8.3.6, I would like to
know which file or function populates the index data structure.
Specifically, where in the code are the index-tuples and index-pages
values fetched and updated?

Regards and Thanks in Advance,
Srinivas Karthik


Re: [HACKERS] PostgreSQL 8.3 index page count clarification

2015-05-26 Thread Andrew Dunstan


On 05/26/2015 08:45 AM, Srinivas Karthik V wrote:

Hi,
   For the user-created indexes in PostgreSQL 8.3.6, I would like
to know which file or function populates the index data structure.
Specifically, where in the code are the index-tuples and index-pages
values fetched and updated?






8.3 is long out of support. Please move to a modern supported version of 
PostgreSQL.


cheers

andrew




Re: [HACKERS] [COMMITTERS] pgsql: Row-Level Security Policies (RLS)

2015-05-26 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 What do we need RowSecurityPolicy->policy_id for?  It seems to me that
 it is only used to determine whether the policy is the default deny
 one, so that it can later be removed if a hook adds a different one.
 This seems contrived as well as under-documented.  Why isn't a boolean
 flag sufficient?

Thanks for taking a look!

It's also used during relcache updates (see equalPolicy()).  That wasn't
originally the case (I had missed adding the necessary bits to relcache
in the original patch), but I wouldn't want to remove that piece now
and, given that it's there, using InvalidOid to indicate when it's the
default-deny policy (and therefore this is no actual Oid) seems
sensible.

Thanks again!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Robert Haas
On Mon, May 25, 2015 at 9:54 PM, Stephen Frost sfr...@snowman.net wrote:
 I certainly see your point, but Tom also pointed out that it's not great
 to ignore failures during this phase:

 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 Greg Stark st...@mit.edu writes:
  What exactly is failing?
  Is it that fsync is returning -1 ?
 According to the original report from Christoph Berg, it was open()
 not fsync() that was failing, at least in permissions-based cases.

 I'm not sure if we should just uniformly ignore all failures in this
 phase.  That would have the merit of clearly not creating any new
 startup failure cases compared to the previous code, but as you say
 sometimes it might mean ignoring real problems.

 If we accept this, then we still have to have the lists, to decide what
 to fail on and what to ignore.  If we're going to have said lists tho, I
 don't really see the point in fsync'ing things we're pretty confident
 aren't ours.

No, that's not right at all.  The idea, at least as I understand it,
would be decide which errors to ignore by looking at the error code,
not by looking at which file is involved.
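A rough illustration of deciding by error code rather than by file: the helper name is invented, and the cases listed are only the ones named in this thread (EACCES, EROFS, ETXTBSY) — the discussion above is precisely about how hard it would be to enumerate such codes completely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical classifier: is an errno from open()/fsync() during the
 * pre-recovery fsync pass safe to ignore?  Illustrative only.
 */
static bool
sync_error_is_ignorable(int err)
{
	switch (err)
	{
		case EACCES:			/* file is probably not ours */
		case EROFS:				/* symlink into a read-only filesystem */
		case ETXTBSY:			/* symlink to a binary being executed */
			return true;
		default:
			return false;
	}
}
```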

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why does txid_current() assign new transaction-id?

2015-05-26 Thread Tom Lane
Christoph Berg m...@debian.org writes:
 I think the OP's point was (or should have been), to make txid_current
 not draw a new xid when run outside a transaction block, though it's
 questionable if that wouldn't just add a POLA-violating layer.

Well, the patch as proposed failed to do that, but in any case I don't
see much argument for changing the long-established behavior of that
function.  Better to add another one if you want a read-only monitoring
probe.

regards, tom lane




Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Robert Haas
On Mon, May 25, 2015 at 5:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Mon, May 25, 2015 at 04:52:38PM -0300, Alvaro Herrera wrote:
 Something is wrong.  See aclchk.c changes.

 Yes, this is what I was concerned about.  aclitem was a typedef in 9.0
 and 9.1, and the use of that as a typedef in 9.4 is certainly odd:

   -   aclitem.ai_grantor = grantorId;
   +   aclitem.ai_grantor = grantorId;

 Yeah.  I think we might've gotten rid of that typedef partially in order
 to fix this.

 A different strategy we could consider is use HEAD's typedef list
 even in the back branches.  This would in some situations lead to
 inferior-looking results in the back branches, but that's probably better
 than inferior results in HEAD.  (In any case, we want the same typedef
 list across all branches.  Then anyplace where the results diverge, there
 must have been non-pgindent code changes, so that back-patching would
 require manual fixups anyway.)

This is kind of why I think that reindenting the back branches is
unlikely to be productive: it only helps if you can get pgindent to do
the same thing on all branches, and I bet that's going to be tough.

Realistically, with merge.conflictstyle = diff3 (why is this not the
default?), resolving whitespace conflicts that occur when you try to
cherry-pick is typically not very difficult.  But every time we
pgindent, especially with slightly different settings, we cause tools
like 'git blame' to return less useful answers.  And that sucks.

We also risk breaking private patchsets that people are carrying - of
course, Advanced Server is one (very large) such patchset, but it's
hardly the only place where people are compiling a non-standard
distribution that has to be reconciled with upstream changes.

I'm not going to put up a huge fuss if we decide to go ahead with
this, but I still think it's a bad plan, especially with regard to
existing branches that have not been re-indented in years.  I bet it
won't save that much back-patching effort - maybe not any, on net -
and I bet it will inconvenience users and developers in various subtle
ways that we may not even hear about but which are still quite real.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

2015-05-26 Thread Alvaro Herrera
Paul Smith wrote:

 No, nothing like that. It was just running fine, and then suddenly (at 2am
 on 23 May) it started throwing up loads of these errors. The DB server
 wasn't even restarted at that point. It was just working fine, then suddenly
 wasn't. (The first error was at 02:00:32 BST, then every few minutes after
 that there's another one).

Another crazy thought is that the host system messed up and overwrote
part of the table with pages from some other guest system.  I have never
seen a report of such a thing happening.

 It's running in a Hyper-V guest. We had taken a backup of the VM at 00:34 on
 23 May and that looks to be absolutely fine. What I have done now is restore
 that backup and import the new data which arrived since that backup was
 made, and it seems OK now. I still have the 'broken' installation in case
 more information is needed from it. I'd try to get a raw dump of the damaged
 tuple data if I knew how to find where it is in the relation file...

Well, you could try a binary search to locate the (or a) borked page
within the table.  Or you could write a plpgsql function with an
EXCEPTION block that reads each tuple, and in case of error reports the
CTID of the tuple being read. (select * from tab where
ctid=generated_ctid; EXCEPTION WHEN OTHERS THEN raise notice ' .. ',
generated_ctid).  Once you have located the problem tuple, a pageinspect
dump would be the starting point.

Another idea is to attach a debugger to the backend process, and set a
breakpoint on function errfinish.  Once the error is hit the breakpoint
will stop the execution before the error is reported; at that point you
can request for a backtrace from the debugger.  (Make sure to examine
errordata to see whether the error being hit is the multixact one.)

 I suppose it's possible that it was disk or memory corruption, but I've seen
 that before, and it hasn't looked like this.

Yeah, same here.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] fsync bug faq for publication?

2015-05-26 Thread Josh Berkus
On 05/25/2015 11:09 PM, Magnus Hagander wrote:
 
 On May 26, 2015 07:31, Tom Lane t...@sss.pgh.pa.us wrote:

 Josh Berkus j...@agliodbs.com writes:
  We need to get a notice out to our users who might update their servers
  and get stuck behind the fsync bug.  As such, I've prepared a FAQ.
  Please read, correct and improve this FAQ so that it's fit for us to
  announce to users as soon as possible:

  https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug

 Judging by Ross Boylan's report at

 http://www.postgresql.org/message-id/f1f13e14a610474196571953929c02096d0...@ex08.net.ucsf.edu
 it's not sufficient to just recommend changing permissions on the
 problematic files.  It's not entirely clear from here whether there is a
 solution that both allows fsync on referenced files and keeps OpenSSL
 happy; but if there is, it probably requires making the cert files be
 owned by the postgres user, as well as adjusting their permissions to
 be 0640 or thereabouts.  I'm worried about whether that breaks other
 services using the same cert files.

 
 It almost certainly will.
 
 I think the recommendation has to be that if it's a symlink, it should
 be replaced with a copy of the file, and that copy be chown and chmod
 the right way.

Where did we get the idea that this issue only affects symlinked files?
 On testing, any file which postgres doesn't have write permissions on
is affected:

root@d623471b11ee:/var/lib/postgresql/9.3/main# touch root_file.txt
root@d623471b11ee:/var/lib/postgresql/9.3/main# ls -l
total 60
-rw------- 1 postgres postgres    4 May 26 17:46 PG_VERSION
drwx------ 5 postgres postgres 4096 May 26 17:46 base
drwx------ 2 postgres postgres 4096 May 26 17:46 global
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_clog
drwx------ 4 postgres postgres 4096 May 26 17:46 pg_multixact
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_notify
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_serial
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_snapshots
drwx------ 2 postgres postgres 4096 May 26 17:47 pg_stat
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_stat_tmp
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_subtrans
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_tblspc
drwx------ 2 postgres postgres 4096 May 26 17:46 pg_twophase
drwx------ 3 postgres postgres 4096 May 26 17:46 pg_xlog
-rw------- 1 postgres postgres  133 May 26 17:46 postmaster.opts
-rw-r--r-- 1 root     root        0 May 26 17:49 root_file.txt
root@d623471b11ee:/var/lib/postgresql/9.3/main# service postgresql start
 * Starting PostgreSQL 9.3 database server

[ OK ]
root@d623471b11ee:/var/lib/postgresql/9.3/main# ps aux | grep postgres
postgres  4627  0.2  0.4 244880 16100 ?  S    17:49   0:00
/usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c
config_file=/etc/postgresql/9.3/main/postgresql.conf
postgres  4629  0.0  0.0 244880  1868 ?  Ss   17:49   0:00
postgres: checkpointer process
postgres  4630  0.0  0.0 244880  1872 ?  Ss   17:49   0:00
postgres: writer process
postgres  4631  0.0  0.0 244880  1648 ?  Ss   17:49   0:00
postgres: wal writer process
postgres  4632  0.0  0.0 245632  2956 ?  Ss   17:49   0:00
postgres: autovacuum launcher process
postgres  4633  0.0  0.0 100556  1768 ?  Ss   17:49   0:00
postgres: stats collector process
root      4647  0.0  0.0   8860   648 ?  S+   17:49   0:00 grep
--color=auto postgres
root@d623471b11ee:/var/lib/postgresql/9.3/main# kill -9 4627
root@d623471b11ee:/var/lib/postgresql/9.3/main# service postgresql start
 * Starting PostgreSQL 9.3 database server

 * Removed stale pid file.
The PostgreSQL server failed to start. Please check the log output:
2015-05-26 17:49:36 UTC [4676-1] LOG:  database system was interrupted;
last known up at 2015-05-26 17:49:16 UTC
2015-05-26 17:49:36 UTC [4676-2] FATAL:  could not open file
/var/lib/postgresql/9.3/main/root_file.txt: Permission denied
2015-05-26 17:49:36 UTC [4675-1] LOG:  startup process (PID 4676) exited
with exit code 1
2015-05-26 17:49:36 UTC [4675-2] LOG:  aborting startup due to startup
process failure

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why does txid_current() assign new transaction-id?

2015-05-26 Thread Christoph Berg
Re: Tom Lane 2015-05-26 18863.1432661...@sss.pgh.pa.us
 Christoph Berg m...@debian.org writes:
  Still, exposing GetStableLatestTransactionId() on the SQL level would
  make sense for monitoring transaction throughput.
 
 Perhaps, though I wonder why we should expose that and not just report the
 result of ReadNewTransactionId() --- or in txid.c's case, the result of
 GetNextXidAndEpoch().

Whatever is most suitable, yes.

 In either case it would have to be a new function,
 not unilaterally redefining what txid_current() does.

Sure.

I think the OP's point was (or should have been), to make txid_current
not draw a new xid when run outside a transaction block, though it's
questionable if that wouldn't just add a POLA-violating layer.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] fsync bug faq for publication?

2015-05-26 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Where did we get the idea that this issue only affects symlinked files?

Nobody said any such thing.  My point was that permissions and ownership
both have to be looked at.  The Debian situation is that there are symlinks
in $PGDATA pointing at root-owned files, and those files are (we think)
also used by other services; so Magnus' point was that you'd probably
better copy those files not modify their ownership/permissions in situ.

regards, tom lane




Re: [HACKERS] fsync bug faq for publication?

2015-05-26 Thread Josh Berkus
On 05/26/2015 10:57 AM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 Where did we get the idea that this issue only affects symlinked files?
 
 Nobody said any such thing.  My point was that permissions and ownership
 both have to be looked at.  The Debian situation is that there are symlinks
 in $PGDATA pointing at root-owned files, and those files are (we think)
 also used by other services; so Magnus' point was that you'd probably
 better copy those files not modify their ownership/permissions in situ.

Updated, please make further corrections so I can get an announcement
out ASAP.  Thanks!

https://wiki.postgresql.org/wiki/May_2015_Fsync_Permissions_Bug

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Andrew Dunstan


On 05/26/2015 08:05 AM, Robert Haas wrote:

On Mon, May 25, 2015 at 9:54 PM, Stephen Frost sfr...@snowman.net wrote:

I certainly see your point, but Tom also pointed out that it's not great
to ignore failures during this phase:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

Greg Stark st...@mit.edu writes:

What exactly is failing?
Is it that fsync is returning -1 ?

According to the original report from Christoph Berg, it was open()
not fsync() that was failing, at least in permissions-based cases.

I'm not sure if we should just uniformly ignore all failures in this
phase.  That would have the merit of clearly not creating any new
startup failure cases compared to the previous code, but as you say
sometimes it might mean ignoring real problems.

If we accept this, then we still have to have the lists, to decide what
to fail on and what to ignore.  If we're going to have said lists tho, I
don't really see the point in fsync'ing things we're pretty confident
aren't ours.

No, that's not right at all.  The idea, at least as I understand it,
would be decide which errors to ignore by looking at the error code,
not by looking at which file is involved.




OK, I'm late to the party. But why exactly are we syncing absolutely 
everything? That seems over-broad.


And might it be better to check that we can open each file using 
access() than calling open() and looking at the error code?


cheers

andrew





Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

2015-05-26 Thread Paul Smith



On 26/05/2015 16:01, Alvaro Herrera wrote:

Paul Smith wrote:

With PostgreSQL 9.3.5 on Ubuntu 12.04, I'm getting the error:

ERROR:  MultiXactId 1934308693 has not been created yet -- apparent
wraparound

on doing various queries on our database. I don't think it is a wraparound -
I think the tuple has mistakenly decided it has a MultiXactId related to it.

Yeah, that looks like the case.  According to your pg_controldata
output, you haven't used many multixacts at all:

Yep, that's what I thought as well.

so it doesn't seem plausible that the single bit HEAP_XMAX_IS_MULTI 
was turned on accidentally (something which I've never seen happen). 
It doesn't look like a randomly corrupt page either; normally you 
would see errors about mismatching page headers before you get to the 
point where Xmax is read. I wonder if the data page came from 
elsewhere. Maybe you copied a data file from another database? 


No, nothing like that. It was just running fine, and then suddenly (at 
2am on 23 May) it started throwing up loads of these errors. The DB 
server wasn't even restarted at that point. It was just working fine, 
then suddenly wasn't. (The first error was at 02:00:32 BST, then every 
few minutes after that there's another one).


It's running in a Hyper-V guest. We had taken a backup of the VM at 
00:34 on 23 May and that looks to be absolutely fine. What I have done 
now is restore that backup and import the new data which arrived since 
that backup was made, and it seems OK now. I still have the 'broken' 
installation in case more information is needed from it. I'd try to get 
a raw dump of the damaged tuple data if I knew how to find where it is 
in the relation file...


I suppose it's possible that it was disk or memory corruption, but I've 
seen that before, and it hasn't looked like this. (There are several 
PostgreSQL guests running on the same Hyper-V server, and none of the 
others seem to have this problem which may suggest that it's less likely 
to be a hardware issue).








Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Andrew Dunstan


On 05/26/2015 11:58 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

OK, I'm late to the party. But why exactly are we syncing absolutely
everything? That seems over-broad.

If we try to be selective, we risk errors of omission, which no one would
ever notice until someone's data got eaten in a low-probability crash
scenario.  It seems more robust (at least to me) to fsync everything we
can find.  That does require more thought about error cases than went
into the original patch ... but I think that we need more thought about
error cases even if we do try to be selective.

One thing perhaps we *should* be selective about, though, is which
symlinks we try to follow.  I think that a good case could be made
for ignoring symlinks everywhere except in the pg_tablespace directory.
If we did, that would all by itself take care of the Debian scenario,
if I understand that case correctly.


People have symlinked the xlog directory. I've done it myself in the 
past. A better rule might be to ignore symlinks unless either they are 
in pg_tblspc or they are in the data directory and their name starts 
with pg_.
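Andrew's proposed rule can be sketched as a small path predicate. This is purely illustrative (`follow_symlink_p` is an invented name, and paths are assumed to be relative to the data directory); the real code would apply it while walking the directory tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: follow a symlink only if it is under pg_tblspc,
 * or is a top-level entry of the data directory whose name starts with
 * "pg_" (covering a symlinked pg_xlog).
 */
static bool
follow_symlink_p(const char *path)
{
	if (strncmp(path, "pg_tblspc/", 10) == 0)
		return true;
	/* top-level entry (no '/') named pg_*? */
	if (strchr(path, '/') == NULL && strncmp(path, "pg_", 3) == 0)
		return true;
	return false;
}
```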


cheers

andrew






Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 05/26/2015 11:58 AM, Tom Lane wrote:
 One thing perhaps we *should* be selective about, though, is which
 symlinks we try to follow.  I think that a good case could be made
 for ignoring symlinks everywhere except in the pg_tablespace directory.
 If we did, that would all by itself take care of the Debian scenario,
 if I understand that case correctly.

 People have symlinked the xlog directory.

Good point, we need to cover that case.

regards, tom lane




Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 OK, I'm late to the party. But why exactly are we syncing absolutely 
 everything? That seems over-broad.

If we try to be selective, we risk errors of omission, which no one would
ever notice until someone's data got eaten in a low-probability crash
scenario.  It seems more robust (at least to me) to fsync everything we
can find.  That does require more thought about error cases than went
into the original patch ... but I think that we need more thought about
error cases even if we do try to be selective.

One thing perhaps we *should* be selective about, though, is which
symlinks we try to follow.  I think that a good case could be made
for ignoring symlinks everywhere except in the pg_tablespace directory.
If we did, that would all by itself take care of the Debian scenario,
if I understand that case correctly.

 And might it be better to check that we can open each file using 
 access() than calling open() and looking at the error code?

Don't really see the point; that's just an extra step, and access()
won't exactly prove you can open the file, anyway.
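Tom's point can be illustrated with a minimal sketch: access() checks against the real uid (not the effective one) and is racy against concurrent changes, so attempting the open() and inspecting errno is both simpler and authoritative. `try_open_directly` is a hypothetical helper, not PostgreSQL code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch: skip the access() pre-check and just try the
 * open().  Returns 0 on success, or the errno from the failed open().
 */
static int
try_open_directly(const char *fname)
{
	int			fd = open(fname, O_RDONLY, 0);

	if (fd < 0)
		return errno;			/* the authoritative answer */
	close(fd);
	return 0;
}
```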

regards, tom lane




[HACKERS] why does txid_current() assign new transaction-id?

2015-05-26 Thread Naoya Anzai
Hi, hackers!

I have a question about txid_current():
why does txid_current() assign a new transaction-id?

When we execute txid_current() outside of a transaction block, it assigns a new
transaction-id.
I guess it doesn't need to assign a new txid, because txid_current() is just a
read-only function.

I found a replacement function by walking through the code:
GetStableLatestTransactionId(void).

I attached a patch changing just one line.
Could you please check the code?

Regards,

Naoya

---
Naoya Anzai
Engineering Department
NEC Solution Innovators, Ltd.
E-Mail: nao-an...@xc.jp.nec.com
---



txid_current.patch
Description: txid_current.patch



Re: [HACKERS] brin regression test intermittent failures

2015-05-26 Thread Alvaro Herrera
Tom Lane wrote:
 Peter Geoghegan p...@heroku.com writes:
  I meant to get around to looking into it, but FWIW I see BRIN-related
  Valgrind issues. e.g.:
 
 Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact
 so it does not explain whatever is happening on chipmunk.

Ah, thanks for diagnosing that.

The chipmunk failure is strange -- notice it only references the
= operators, except for type box for which it's ~= that fails.  The test
includes a lot of operators ...

Also, we have quite a number of ARM boxes: apart from chipmunk we have
gull, hamster, mereswine, dangomushi, axolotl, grison.  (hamster and
chipmunk report hostname -m as armv6l, the others armv7l).  All of
them are running Linux, either Fedora or Debian.  Most are using gcc,
compilation flags look pretty standard.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Why does txid_current() assign new transaction-id?

2015-05-26 Thread Tom Lane
Naoya Anzai nao-an...@xc.jp.nec.com writes:
 I have a question about txid_current().
 it is "Why does txid_current() assign new transaction-id?".

Consider

begin;
select txid_current();
insert into my_table ...;
commit;

If we changed the code as you propose, the result of the SELECT would
no longer have anything to do with the XID used for the insertion.

regards, tom lane




Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Anything we do short of making all errors in this area non-fatal is
 going to leave behind startup-failure cases that exist today, and we
 have no evidence at this time that such startup failures would be
 justified by any actual data loss risk.

Yeah.  Perhaps I missed it, but was the original patch motivated by
actual failures that had been seen in the field, or was it just a
hypothetical concern?  Certainly, any actual failures of that sort
are few and far between compared to the number of problems we now
realize the patch introduced.

Also, we need to discuss how hard walkdir() needs to try to avoid
throwing errors of its own.

regards, tom lane




Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Alvaro Herrera
Robert Haas wrote:

 But every time we pgindent, especially with slightly different
 settings, we cause tools like 'git blame' to return less useful
 answers.  And that sucks.

I've wondered a few times whether there's a way to make pgindent commits
transparent to git blame, i.e. blame their modified lines to whatever
commits modified them immediately before.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] jsonb_set

2015-05-26 Thread Andrew Dunstan
Here is the latest version of the refinement of jsonb_replace into 
jsonb_set. All known bugs have been fixed, and the only issue is the 
default value of the fourth parameter. Currently it's set to false, but 
I gather from the previous responses that the consensus is to make it true.


cheers

andrew
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 89a609f..a33f03d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10323,6 +10323,15 @@ table2-mapping
  /tgroup
/table
 
+  <note>
+   <para>
+    The <literal>||</literal> operator concatenates the elements at the top level of
+    each of its operands. It does not operate recursively. For example, if
+    both operands are objects with a common key field name, the value of the
+    field in the result will just be the value from the right hand operand.
+   </para>
+  </note>
+
   para
xref linkend=functions-json-creation-table shows the functions that are
available for creating typejson/type and typejsonb/type values.
@@ -10830,17 +10839,24 @@ table2-mapping
entryliteral[{f1:1},2,null,3]/literal/entry
/row
   row
-   entryparaliteraljsonb_replace(target jsonb, path text[], replacement jsonb)/literal
+   entryparaliteraljsonb_set(target jsonb, path text[], new_value jsonboptional, parametercreate_missing/parameter typeboolean/type/optional)/literal
  /para/entry
entryparatypejsonb/type/para/entry
entry
  Returns replaceabletarget/replaceable
- with the section designated by  replaceablepath/replaceable
- replaced by replaceablereplacement/replaceable.
-   /entry
-   entryliteraljsonb_replace('[{f1:1,f2:null},2,null,3]', '{0,f1}','[2,3,4]')/literal/entry
-   entryliteral[{f1:[2,3,4],f2:null},2,null,3]/literal
-/entry
+ with the section designated by replaceablepath/replaceable
+ replaced by replaceablenew_value/replaceable, or with
+ replaceablenew_value/replaceable added if
+ replaceablecreate_missing/replaceable is true ( default is
+ literalfalse/) and the item
+ designated by replaceablepath/replaceable does not exist.
+   /entry
+   entryparaliteraljsonb_set('[{f1:1,f2:null},2,null,3]', '{0,f1}','[2,3,4]')/literal
+ /paraparaliteraljsonb_set('[{f1:1,f2:null},2]', '{0,f3}','[2,3,4]', true)/literal
+ /para/entry
+   entryparaliteral[{f1:[2,3,4],f2:null},2,null,3]/literal
+ /paraparaliteral[{f1: 1, f2: null, f3: [2, 3, 4]}, 2]/literal
+/para/entry
/row
   row
entryparaliteraljsonb_pretty(from_json jsonb)/literal
@@ -10893,6 +10909,27 @@ table2-mapping
 
   <note>
    <para>
+  All the items of the <literal>path</> parameter of <literal>jsonb_set</>
+  must be present in the <literal>target</>, except when
+  <literal>create_missing</> is true, in which case all but the last item
+  must be present. If these conditions are not met the <literal>target</>
+  is returned unchanged.
+    </para>
+    <para>
+  If the last path item is an object key, it will be created if it
+  is absent and given the new value. If the last path item is an array
+  index, if it is positive the item to set is found by counting from
+  the left, and if negative by counting from the right - <literal>-1</>
+  designates the rightmost element, and so on.
+  If the item is out of the range -array_length .. array_length -1,
+  and create_missing is true, the new value is added at the beginning
+  of the array if the item is negative, and at the end of the array if
+  it is positive.
+    </para>
+  </note>
+
+  <note>
+    <para>
   The <literal>json_typeof</> function's <literal>null</> return value
   should not be confused with a SQL NULL.  While
   calling <literal>json_typeof('null'::json)</> will
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 18921c4..3ff1437 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -922,3 +922,10 @@ RETURNS interval
 LANGUAGE INTERNAL
 STRICT IMMUTABLE
 AS 'make_interval';
+
+CREATE OR REPLACE FUNCTION
+  jsonb_set(jsonb_in jsonb, path text[] , replacement jsonb, create_if_missing boolean DEFAULT false)
+RETURNS jsonb
+LANGUAGE INTERNAL
+STRICT IMMUTABLE
+AS 'jsonb_set';
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 5143493..e41c869 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -124,18 +124,20 @@ static JsonbValue *findJsonbValueFromContainerLen(JsonbContainer *container,
 			   char *key,
 			   uint32 keylen);
 
-/* functions supporting jsonb_delete, jsonb_replace and jsonb_concat */
+/* functions supporting jsonb_delete, jsonb_set and jsonb_concat */
 static JsonbValue *IteratorConcat(JsonbIterator **it1, JsonbIterator **it2,
 			   JsonbParseState **state);
-static JsonbValue *replacePath(JsonbIterator **it, 
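The jsonb_set semantics documented in the hunk above (every path item but the last must exist, create_missing controls creation of the final item, negative array indexes count from the right, and out-of-range indexes prepend or append) can be modelled in a few lines. This is a hedged illustration on plain Python dicts and lists, not the C implementation in jsonfuncs.c:

```python
import copy


def jsonb_set(target, path, new_value, create_missing=False):
    """Toy model of the documented jsonb_set behavior (illustrative only)."""
    if not path:
        return target
    result = copy.deepcopy(target)
    node = result
    # Every path item but the last must already exist, otherwise the
    # target is returned unchanged.
    for key in path[:-1]:
        if isinstance(node, dict):
            if key not in node:
                return target
            node = node[key]
        elif isinstance(node, list):
            i = int(key)
            if not -len(node) <= i < len(node):
                return target
            node = node[i]
        else:
            return target
    last = path[-1]
    if isinstance(node, dict):
        # An absent object key is only created when create_missing is true.
        if last in node or create_missing:
            node[last] = new_value
        else:
            return target
    elif isinstance(node, list):
        i = int(last)
        if -len(node) <= i < len(node):
            node[i] = new_value  # -1 designates the rightmost element
        elif create_missing:
            # Out-of-range index: prepend if negative, append if positive.
            if i < 0:
                node.insert(0, new_value)
            else:
                node.append(new_value)
        else:
            return target
    return result
```

Running the two examples from the documentation table through this model reproduces the documented results.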

Re: [HACKERS] optimizing vacuum truncation scans

2015-05-26 Thread Jeff Janes
On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby jim.na...@bluetreble.com
wrote:

 On 4/20/15 1:50 AM, Jeff Janes wrote:


  For that matter, why do we scan backwards anyway? The comments don't
 explain it, and we have nonempty_pages as a starting point, so why
 don't we just scan forward? I suspect that eons ago we didn't have
 that and just blindly reverse-scanned until we finally hit a
 non-empty buffer...


 nonempty_pages is not concurrency safe, as the pages could become used
 after vacuum passed them over but before the access exclusive lock was
 grabbed before the truncation scan.  But maybe the combination of the
 two?  If it is above nonempty_pages, then anyone who wrote into the page
 after vacuum passed it must have cleared the VM bit. And currently I
 think no one but vacuum ever sets VM bit back on, so once cleared it
 would stay cleared.


 Right.

  In any event nonempty_pages could be used to set the guess as to how
 many pages (if any) might be worth prefetching, as that is not needed
 for correctness.


 Yeah, but I think we'd do a LOT better with the VM idea, because we could
 immediately truncate without scanning anything.


Right now all the interlocks to make this work seem to be in place (only
vacuum and startup can set visibility map bits, and only one vacuum can be
in a table at a time).  But as far as I can tell, those assumptions are not
baked in and we have pondered loosening them before.

For example, letting HOT cleanup mark a page as all-visible if it finds it
to be such.  Now in that specific case it would be OK, as HOT cleanup would
not cause a page to become empty (or could it?  If an insert on a table
with no indexes was rolled back, and hot clean up found it and cleaned it
up, it could conceptually become empty--unless we make special code to
prevent it) , and so the page would have to be below nonempty_pages.  But
there may be other cases.

And I know other people have mentioned making VACUUM concurrent (although I
don't see the value in that myself).

So doing it this way would be hard to beat (scanning a bitmap vs the table
itself), but it would also introduce a modularity violation that I am not
sure is worth it.

Of course this could always be reverted if its requirements became a
problem for a more important change (assuming of course that we detected
the problem).

Cheers,

Jeff


Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Andrew Dunstan


On 05/25/2015 05:34 PM, Tom Lane wrote:

Bruce Momjian br...@momjian.us writes:

On Mon, May 25, 2015 at 04:52:38PM -0300, Alvaro Herrera wrote:

Something is wrong.  See aclchk.c changes.

Yes, this is what I was concerned about.  aclitem was a typedef in 9.0
and 9.1, and the use of that as a typedef in 9.4 is certainly odd:
-   aclitem.ai_grantor = grantorId;
+   aclitem.ai_grantor = grantorId;

Yeah.  I think we might've gotten rid of that typedef partially in order
to fix this.

A different strategy we could consider is use HEAD's typedef list
even in the back branches.  This would in some situations lead to
inferior-looking results in the back branches, but that's probably better
than inferior results in HEAD.  (In any case, we want the same typedef
list across all branches.  Then anyplace where the results diverge, there
must have been non-pgindent code changes, so that back-patching would
require manual fixups anyway.)

A longer-term fix would be to make pgindent less stupid about this sort
of usage, but nobody's yet volunteered to dig into the guts of that code.





It looks to me like this says we should use the typedefs for each branch 
in any pgindent run for that branch, with the list being fetched just 
before each run, so it reflects any backpatches, bug fixes etc.


The buildfarm collects these lists for all live branches now (or at 
least my animals do) and keeps them up to date.


Anything else is likely to lead to false positives with results like 
that above.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

2015-05-26 Thread Josh Berkus
On 05/26/2015 11:47 AM, Alvaro Herrera wrote:
 Paul Smith wrote:
 
 No, nothing like that. It was just running fine, and then suddenly (at 2am
 on 23 May) it started throwing up loads of these errors. The DB server
 wasn't even restarted at that point. It was just working fine, then suddenly
 wasn't. (The first error was at 02:00:32 BST, then every few minutes after
 that there's another one).
 
 Another crazy thought is that the host system messed up and overwrote
 part of the table with pages from some other guest system.  I have never
 seen a report of such a thing happening.

I have.  Linux kernel version of the host?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Realistically, with merge.conflictstyle = diff3 (why is this not the
 default?), resolving whitespace conflicts that occur when you try to
 cherry-pick is typically not very difficult.

Really?  The problems I have generally come from places where pgindent
has changed the line breaks, not just horizontal spacing.  I haven't
seen anything that copes with this, certainly not git.

regards, tom lane




Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

2015-05-26 Thread Paul Smith

On 26/05/2015 19:47, Alvaro Herrera wrote:

Paul Smith wrote:


No, nothing like that. It was just running fine, and then suddenly (at 2am
on 23 May) it started throwing up loads of these errors. The DB server
wasn't even restarted at that point. It was just working fine, then suddenly
wasn't. (The first error was at 02:00:32 BST, then every few minutes after
that there's another one).

Another crazy thought is that the host system messed up and overwrote
part of the table with pages from some other guest system.  I have never
seen a report of such a thing happening.

Yes, that seems unlikely to me, but you never know...

While I was trying to work out how to fix it (after my original email), 
I did see a pgsql server crash:


2015-05-26 12:26:30 BST LOG:  server process (PID 35493) was terminated 
by signal 11: Segmentation fault





It's running in a Hyper-V guest. We had taken a backup of the VM at 00:34 on
23 May and that looks to be absolutely fine. What I have done now is restore
that backup and import the new data which arrived since that backup was
made, and it seems OK now. I still have the 'broken' installation in case
more information is needed from it. I'd try to get a raw dump of the damaged
tuple data if I knew how to find where it is in the relation file...

Well, you could try a binary search to locate the (or a) borked page
within the table.  Or you could write a plpgsql function with an
EXCEPTION block that reads each tuple, and in case of error reports the
CTID of the tuple being read. (select * from tab where
ctid=generated_ctid; EXCEPTION WHEN OTHERS THEN raise notice ' .. ',
generated_ctid).
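Alvaro's suggestion - read each tuple individually and trap errors to find the damaged ones - can also be sketched driver-side. A hedged outline with the database access abstracted into a fetch callback (in real use, fetch would run SELECT * FROM tab WHERE ctid = '(page,off)' and raise on failure; the function and its parameters are hypothetical):

```python
def find_broken_ctids(fetch, pages, tuples_per_page):
    """Probe every (page, offset) slot; collect the ctids whose read fails.

    fetch(page, off) is expected to read one tuple by ctid and raise on
    corruption; pages and tuples_per_page bound the scan.
    """
    broken = []
    for page in range(pages):
        for off in range(1, tuples_per_page + 1):
            try:
                fetch(page, off)
            except Exception:
                # Equivalent of the plpgsql EXCEPTION block: note the
                # ctid and keep scanning instead of aborting.
                broken.append((page, off))
    return broken
```

A probe against a nonexistent ctid simply returns zero rows, so only genuinely broken tuples raise.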


In the backup, for the row which will become broken, I get:

ctid|   xmin   |   xmax
+--+--
 (18722,29) | 23862661 | 23862661


Assuming it hadn't changed in the broken version (from application 
knowledge, it's unlikely to change), then on the broken version,

select heap_page_items(get_raw_page('messages.msgdata', 18722)); gives:

 (1,2,2,0,)
 (2,7816,1,376,21857691,21857691,11,(18722,2),32784,10646,24,,)
 (3,4,2,0,)
 (4,7568,1,248,21858126,21858126,11,(18722,4),32784,10646,24,,)
 (5,6,2,0,)
 (6,7216,1,352,22823803,23871405,41,(18722,30),49168,9478,24,,)
 (7,8,2,0,)
 (8,6968,1,248,22825206,22825206,11,(18722,8),32784,10646,24,,)
 (9,10,2,0,)
 (10,6720,1,248,22826427,22826427,11,(18722,10),32784,10646,24,,)
 (11,12,2,0,)
 (12,6336,1,380,22826899,23935627,1,(18722,31),49168,9478,24,,)
 (13,0,3,0,)
 (14,15,2,0,)
 (15,5400,1,932,23862314,23862314,11,(18722,15),32784,10642,24,,)
 (16,17,2,0,)
 (17,5168,1,228,23862333,23862333,11,(18722,17),32784,10646,24,,)
 (18,19,2,0,)
 (19,4800,1,368,23862365,23862365,11,(18722,19),32784,10646,24,,)
 (20,24,2,0,)
 
(21,3112,1,812,1332175474,894779760,223823413,(1443526721,25142),12141,18516,57,,)
 (22,23,2,0,)
 
(23,4448,1,352,728903474,1466642284,1869042033,(1262568257,16743),16751,26455,65,011010101101001010101010001001101010110100101010100010100111011011001100100010110110111011000010101100101011001011011000110010110010110011001011,)
 
(24,4184,1,264,619804,228,8000,(956969542,26694),22857,29251,98,110001101110011001100110011111001010001000101110010011010010100101100010110010010110011000100110101011100100010101100010111011000110101000101101011111000101011001001100100111001101001010011010011001101100011010001010001101101001011001101000101000110110011010101100001010100110010010100111011000101110010010100101101011010110101000101100101100101000101011111010111000101100110001110010010100101010010100100100101001101100111010110101111001101100101000110110101101101001101001100110,)
 (25,26,2,0,)
 
(26,3928,1,252,1212837441,1232350037,1668772674,(728910135,11093),31284,16687,70,01101010001010001010010011001100110010101101001001110111011011100110011001101011011010011100100001001010100011000101001000101110110101000110101100101011011010110010001100100010101011101100111001001100110010110101101101100100101101101110101010100010011011100100101010110110001100100101011001001110001101101100011001100110,1717791858)
 (27,21,2,0,)
 (28,29,2,0,)
 
(29,2296,1,812,1229271346,1934308693,1733443957,(1848337707,21358),31337,21592,88,,1178686785)
 
(30,1944,1,352,947155032,1380218998,792031842,(1211650927,22579),30585,20532,80,,)
 (31,1560,1,380,23935627,23935627,1,(18722,31),40976,9510,24,,)
(31 rows)


The data for index 29 has 1934308693 in the header (which is the 
multixactid reported in the error message when trying to retrieve that row).


I'm struggling to find the definitions for the data returned by heap_page_items.

Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Andres Freund
On 2015-05-26 19:07:20 +0200, Andres Freund wrote:
 It is somewhat interesting that similar code has been used in
 pg_upgrade, via initdb -S, for a while now, without, to my knowledge, it
 causing reported problems. I think the relevant difference is that that
 code doesn't follow symlinks.  It's obviously also less exercised and
 people might just have fixed up permissions when encountering troubles.
 
 Abhijit, do you recall why the code was changed to follow all symlinks
 in contrast to explicitly going through the tablespaces as initdb -S
 does? I'm pretty sure early versions of the patch pretty much had a
 verbatim copy of the initdb logic?  That logic is missing pg_xlog btw,
 which is bad for pg_upgrade.

So, this was discussed in the following thread, starting at:
http://archives.postgresql.org/message-id/20150403163232.GA28444%40eldon.alvh.no-ip.org

Actually, since surely we must follow symlinks everywhere, why do we
have to do this separately for pg_tblspc?  Shouldn't that link-following
occur automatically when walking PGDATA in the first place?

I don't think it's true that we must follow symlinks everywhere. I
think, as argued upthread, that it's sufficient to recurse through
PGDATA, follow the symlinks in pg_tblspc, and if a symlink, also go
through pg_xlog separately.  There are no other places where it's allowed
to introduce symlinks, and we have rejected bug reports from people having
problems after doing that.

So what I propose is:
1) Remove the automatic symlink following
2) Follow pg_tblspc/*, pg_xlog if it's a symlink, fix the latter in
   initdb -S
3) Add an elevel argument to walkdir(), return if AllocateDir() fails,
   continue for stat() failures in the readdir() loop.
4) Add an elevel argument to pre_sync_fname, fsync_fname, return after
   errors.
5) Accept EACCES, ETXTBSY (if defined) when open()ing the files. By
   virtue of not following symlinks we should not need to worry about
   EROFS

I'm inclined to think that 4) is a big enough compat break that a
fsync_fname_ext with the new argument is a good idea.

Arguments for/against?
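For illustration, the proposed traversal (points 1 and 2, with the error tolerance of points 3 and 5) might look like the following Python sketch. This is a rough model only, under stated assumptions: the real code is C, and it must also fsync directories and call pre_sync_fname, both of which this omits:

```python
import errno
import os


def fsync_file(path):
    """fsync one file, tolerating EACCES/ETXTBSY as proposed in (5)."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        if e.errno in (errno.EACCES, getattr(errno, "ETXTBSY", -1)):
            return  # skip unreadable/busy files instead of failing
        raise
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def walkdir(root):
    """Recurse WITHOUT following symlinks (1); log and continue on
    directory read errors rather than aborting (3)."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False,
                                                onerror=print):
        for name in filenames:
            fsync_file(os.path.join(dirpath, name))


def fsync_pgdata(pgdata):
    # (2): walk PGDATA itself, then explicitly follow the only two
    # places where symlinks are legitimate: pg_tblspc entries, and
    # pg_xlog if it is a symlink.
    walkdir(pgdata)
    tblspc_dir = os.path.join(pgdata, "pg_tblspc")
    if os.path.isdir(tblspc_dir):
        for entry in os.listdir(tblspc_dir):
            link = os.path.join(tblspc_dir, entry)
            if os.path.islink(link):
                walkdir(os.path.realpath(link))
    xlog = os.path.join(pgdata, "pg_xlog")
    if os.path.islink(xlog):
        walkdir(os.path.realpath(xlog))
```

Because the walk never follows symlinks on its own, a stray symlink elsewhere in the data directory is simply skipped rather than traversed onto a read-only or inaccessible filesystem.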




Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Aidan Van Dyk
On Tue, May 26, 2015 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  Realistically, with merge.conflictstyle = diff3 (why is this not the
  default?), resolving whitespace conflicts that occur when you try to
  cherry-pick is typically not very difficult.

 Really?  The problems I have generally come from places where pgindent
 has changed the line breaks, not just horizontal spacing.  I haven't
 seen anything that copes with this, certainly not git.


If pgindent were easy to run, committers could start complaining if patch
submissions don't abide by pg coding style conventions.

Part of submitting a patch would be making sure that a pgindent run
after the patch has been applied is still a no-op...  A reviewer could
easily check it, and a committer could easily squash the pgindent run
result in if they wanted to be nice to a 1st time submitter...

If every patch were pgindent clean, then you would never end up with
these huge pgindent commits causing pain...

a.


Re: [HACKERS] Run pgindent now?

2015-05-26 Thread Andres Freund
On 2015-05-26 16:32:42 -0300, Alvaro Herrera wrote:
 I've wondered a few times whether there's a way to make pgindent commits
 transparent to git blame, i.e. blame their modified lines to whatever
 commits modified them immediately before.

You can make blame ignore whitespace changes with -w -- but that
obviously doesn't help with rewrapped lines and such.




Re: [HACKERS] hstore_plpython regression test does not work on Python 3

2015-05-26 Thread Oskari Saarenmaa
On 22.05.2015 at 09:44, Christian Ullrich wrote:
 * Peter Eisentraut wrote:
 On 5/16/15 12:06 PM, Tom Lane wrote:
 As exhibited for instance here:

 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbilldt=2015-05-16%2011%3A00%3A07


 I've been able to replicate this on a Fedora 21 box: works fine with
 Python 2, fails with Python 3.  Seems like we still have an issue
 with reliance on a system-provided sort method.

 Pushed a fix, tested with 2.3 .. 3.4.
 
 There is still a sorting problem (of sorts). jaguarundi [1] keeps
 failing intermittently like this:
 
 *** 47,53 
   return len(val)
   $$;
   SELECT test1arr(array['aa=bb, cc=NULL'::hstore, 'dd=ee']);
 ! INFO:  [{'aa': 'bb', 'cc': None}, {'dd': 'ee'}]
   CONTEXT:  PL/Python function test1arr
test1arr
   --
 --- 47,53 
   return len(val)
   $$;
   SELECT test1arr(array['aa=bb, cc=NULL'::hstore, 'dd=ee']);
 ! INFO:  [{'cc': None, 'aa': 'bb'}, {'dd': 'ee'}]
   CONTEXT:  PL/Python function test1arr
test1arr
   --
 
 I cannot find any other animal that does the same, but I doubt it's due
 to CCA this time.
 
 Should dict tests perhaps output sorted(thedict.items()) instead?
 Testing dict formatting could be done with single-item dicts.
 
 [1] http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=jaguarundibr=HEAD

Looks like that animal uses Python 3.4.  Python 3.3 and newer versions
default to using a random seed for hashing objects into dicts which
makes the order of dict elements random; see
https://docs.python.org/3/using/cmdline.html#cmdoption-R

The test case could be changed to use sorted(dict.items()) always, but
there are multiple places where it would need to be applied.  Setting
the PYTHONHASHSEED environment variable to a stable value would probably
be easier.
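The first option in miniature: the test's INFO output only becomes order-stable if dicts are rendered from sorted items. A hedged sketch (not the actual hstore_plpython test code):

```python
# The textual form of a dict depends on element order, which under a
# randomized hash seed (Python 3.3+ default) is not stable across runs.
# Rendering from sorted items gives deterministic output.
def stable_repr(d):
    return "{%s}" % ", ".join("%r: %r" % kv for kv in sorted(d.items()))

a = {"aa": "bb", "cc": None}
b = {"cc": None, "aa": "bb"}  # same mapping, different insertion order
assert a == b
assert stable_repr(a) == stable_repr(b) == "{'aa': 'bb', 'cc': None}"
```

Setting PYTHONHASHSEED to a fixed value achieves the same stability without touching the test cases, which is why it is the easier fix here.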

/ Oskari




Re: [HACKERS] Redesigning checkpoint_segments

2015-05-26 Thread Jeff Janes
On Thu, May 21, 2015 at 8:40 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Thu, May 21, 2015 at 3:53 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  On Mon, Mar 16, 2015 at 11:05 PM, Jeff Janes jeff.ja...@gmail.com
 wrote:
 
  On Mon, Feb 23, 2015 at 8:56 AM, Heikki Linnakangas
  hlinnakan...@vmware.com wrote:
 
 
  Everyone seems to be happy with the names and behaviour of the GUCs, so
  committed.
 
 
 
  The docs suggest that max_wal_size will be respected during archive
  recovery (causing restartpoints and recycling), but I'm not seeing that
  happening.  Is this a doc bug or an implementation bug?
 
 
  I think the old behavior, where restartpoints were driven only by time
 and
  not by volume, was a misfeature.  But not a bug, because it was
 documented.
 
  One of the points of max_wal_size and its predecessor is to limit how big
  pg_xlog can grow.  But running out of disk space on pg_xlog is no more
 fun
  during archive recovery than it is during normal operations.  So why
  shouldn't max_wal_size be active during recovery?

 The following message of commit 7181530 explains why.

 In standby mode, respect checkpoint_segments in addition to
 checkpoint_timeout to trigger restartpoints. We used to deliberately
 only
 do time-based restartpoints, because if checkpoint_segments is small we
 would spend time doing restartpoints more often than really necessary.
 But now that restartpoints are done in bgwriter, they're not as
 disruptive as they used to be. Secondly, because streaming replication
 stores the streamed WAL files in pg_xlog, we want to clean it up more
 often to avoid running out of disk space when checkpoint_timeout is
 large
 and checkpoint_segments small.

 Previously users were more likely to fall into this trouble (i.e., too
 frequent
 occurrence of restartpoints) because the default value of
 checkpoint_segments
 was very small, I guess. But we increased the default of max_wal_size, so
 now
 the risk of that trouble seems to be smaller than before, and maybe we can
 allow max_wal_size to trigger restartpoints.


I see.  The old behavior was present for the same reason we decided to split
checkpoint_segments into max_wal_size and min_wal_size.

That is, the default checkpoint_segments was small, and it had to be small
because increasing it would cause more space to be used even when that
extra space was not helpful.

So perhaps we can consider this change a completion of the max_wal_size
work, rather than a new feature?

Cheers,

Jeff


Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

2015-05-26 Thread Alvaro Herrera
Paul Smith wrote:

 In the backup, for the row which will become broken, I get:
 
 ctid|   xmin   |   xmax
 +--+--
  (18722,29) | 23862661 | 23862661

Okay.

 Assuming it hadn't changed in the broken version (from application
 knowledge, it's unlikely to change), then on the broken version,
 select heap_page_items(get_raw_page('messages.msgdata', 18722)); gives:
 
  
 (21,3112,1,812,1332175474,894779760,223823413,(1443526721,25142),12141,18516,57,,)
  
 (23,4448,1,352,728903474,1466642284,1869042033,(1262568257,16743),16751,26455,65,011010101101001010101010001001101010110100101010100010100111011011001100100010110110111011000010101100101011001011011000110010110010110011001011,)
  
 (24,4184,1,264,619804,228,8000,(956969542,26694),22857,29251,98,110001101110011001100110011111001010001000101110010011010010100101100010110010010110011000100110101011100100010101100010111011000110101000101101011111000101011001001100100111001101001010011010011001101100011010001010001101101001011001101000101000110110011010101100001010100110010010100111011000101110010010100101101011010110101000101100101100101000101011111010111000101100110001110010010100101010010100100100101001101100111010110101111001101100101000110110101101101001101001100110,)
  
 (26,3928,1,252,1212837441,1232350037,1668772674,(728910135,11093),31284,16687,70,01101010001010001010010011001100110010101101001001110111011011100110011001101011011010011100100001001010100011000101001000101110110101000110101100101011011010110010001100100010101011101100111001001100110010110101101101100100101101101110101010100010011011100100101010110110001100100101011001001110001101101100011001100110,1717791858)
  
 (29,2296,1,812,1229271346,1934308693,1733443957,(1848337707,21358),31337,21592,88,,1178686785)
  
 (30,1944,1,352,947155032,1380218998,792031842,(1211650927,22579),30585,20532,80,,)
  (31,1560,1,380,23935627,23935627,1,(18722,31),40976,9510,24,,)
 (31 rows)
 
 
 The data for index 29 has 1934308693 in the header (which is the multixactid
 reported in the error message when trying to retrieve that row).

Right.  The ctids on items 21, 23, 24, 26,29 and 30 are pretty obviously
corrupted (page numbers are too large, and offset numbers are way above
normal offset numbers), as is probably everything else about them.  Note
xmin should be a valid xid, but for example on item 21 it's about 1.3 billion
which obviously is not a valid xid on your system.  Items 26 and 29 even
have an OID, which the others do not (indicating wrong HEAP_HASOID
flag).

Note the lp_off fields.  It seems that all valid items have offsets
above 4096, and corrupted items have offsets below that.  The theory
that somehow the system wrote a bad filesystem page (4096 bytes) on the
lower half of the Postgres page (8192 bytes) sounds rather more probable
now.

 I'm struggling to find the definitions for the data returned by
 heap_page_items,

Try \df+ heap_page_items; or more easily, use this query instead:

 select * from heap_page_items(get_raw_page('messages.msgdata', 18722));

 If I look at the raw page data, it contains data relating to this
 installation, so it doesn't look like a page from a different Hyper-V guest.
 It also doesn't look like just random data from a memory error. It actually
 looks like legitimate data which should possibly have been written
 elsewhere. (I don't want to post it here because it contains possibly
 confidential data from one of our customers).

Sure.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] unaddressable bytes in BRIN

2015-05-26 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Andres Freund just forwarded me a valgrind error report that Peter
 Geoghegan noticed:
 
 ==29892== Unaddressable byte(s) found during client check request
 ==29892==at 0x7D1317: PageAddItem (bufpage.c:314)

Fixed by Tom,
http://git.postgresql.org/pg/commitdiff/79f2b5d583e2e2a7ccd13e31d0e20a900c8f2f61

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Order of columns in query is important?!

2015-05-26 Thread Tom Lane
CK Tan ck...@vitessedata.com writes:
 For Vitesse X, we mark all columns that were required in the query during
 JIT compile, and deform it in one shot. PG should be able to do the same.

See ExecProject().

regards, tom lane




Re: [HACKERS] ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

2015-05-26 Thread Alvaro Herrera
Paul Smith wrote:
 With PostgreSQL 9.3.5 on Ubuntu 12.04, I'm getting the error:
 
 ERROR:  MultiXactId 1934308693 has not been created yet -- apparent
 wraparound
 
 on doing various queries on our database. I don't think it is a wraparound -
 I think the tuple has mistakenly decided it has a MultiXactId related to it.

Yeah, that looks like the case.  According to your pg_controldata
output, you haven't used many multixacts at all:

 Latest checkpoint's NextMultiXactId:  216
 Latest checkpoint's NextMultiOffset:  439

and the fact that you only have these files:

 The pg_multixact directory contains two files
 members/
 offsets/

confirms this.  Your XID counter is also rather low, only 24 million:

 Latest checkpoint's NextXID:  0/24005839

so it doesn't seem plausible that the single bit HEAP_XMAX_IS_MULTI was
turned on accidentally (something which I've never seen happen).

It doesn't look like a randomly corrupt page either; normally you would
see errors about mismatching page headers before you get to the point
where Xmax is read.  I wonder if the data page came from elsewhere.
Maybe you copied a data file from another database?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Row-Level Security Policies (RLS)

2015-05-26 Thread Alvaro Herrera
Stephen Frost wrote:
 Alvaro,
 
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
  What do we need RowSecurityPolicy-policy_id for?  It seems to me that
  it is only used to determine whether the policy is the default deny
  one, so that it can later be removed if a hook adds a different one.
  This seems contrived as well as under-documented.  Why isn't a boolean
  flag sufficient?
 
 Thanks for taking a look!
 
 It's also used during relcache updates (see equalPolicy()).

Hmm, but the policy name is unique also, right?  So the policy_id check
is redundant ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Run a test instance from the source directory

2015-05-26 Thread Arjen Nienhuis
Hi,

I'd like to edit, compile, run, profile and debug in a loop. To do
that I put pg_sleep(3600) in the regression test suite.

That way I can easily change a file, 'make check', and then run some
SQL from another terminal.

Is there a 'better' way to run postgres after building? In a way that
doesn't break 'make check'?

Thanks,

Groeten, Arjen




Re: [HACKERS] about lob(idea)

2015-05-26 Thread Martín Marqués
On 25/05/15 at 06:13, alex2010 wrote:
  Maybe it makes sense to add ability to store large objects in the same table 
 space as the table. 
 Or an opportunity - to specify table space for a large object.
 Do you have anything in todolists about it? 

This is something which has come up for me more than once when giving
talks about storing files in PostgreSQL (last PgDay Argentina there was
quite a debate about it, particularly when bringing up the bytea vs. LO
comparison). The concerns people raised had different end goals.

One of the main concerns was the fact that all LO live in a common
catalog table (pg_largeobjects).

If large objects were stored per-database, with a schema similar to
pg_largeobjects, then they could be placed on any available tablespace,
and even get dumped in a normal DB dump, which makes administration much
simpler.

Cheers,

-- 
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] a few thoughts on the schedule

2015-05-26 Thread Bruce Momjian
On Tue, May 19, 2015 at 10:25:49AM -0400, Robert Haas wrote:
 Unfortunately, I don't have a lot of good ideas here.  I know that I
 spend as much time reviewing other people's patches as I can manage to
 find in my schedule, and I know a lot of people would probably like to
 see me do more of that.  I'm sure there are also some people who would
 like to see me do less of that, and at least a few who would like to
 see me die in a fire.  Ultimately, this is about money.  I have a job
 where I can devote some time to reviewing other people's patches,
 which is great: many people aren't that lucky.  Nobody has offered me
 a job where I can spend a higher percentage of my time doing that than
 I spend now.  Unless talented reviewers can get such job offers, we
 are going to continue to have trouble making ends meet.

I think this comes down to how many companies care about the health of
the community vs. how many care about getting their specific patches
committed.  In some sense this is the free rider problem:

http://en.wikipedia.org/wiki/Free_rider_problem

Historically employees who are Postgres community members have been good
at telling employers that they cannot be free riders, but there has been
some diminishment of that as Postgres has figured more prominently in
company success.  However, I have also heard that there is increased
concern among employers that the free rider problem is causing
structural problems in the community, which might lend support to free
rider-resisting employees.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Andres Freund
On 2015-05-26 10:41:12 -0400, Tom Lane wrote:
 Yeah.  Perhaps I missed it, but was the original patch motivated by
 actual failures that had been seen in the field, or was it just a
 hypothetical concern?

I'd mentioned that it might be a good idea to do this while
investigating a bug with unlogged tables that several parties had
reported. That bug has since been fixed in a more granular fashion. From
there this patch took on kind of a life of its own.

It is somewhat interesting that similar code has been used in
pg_upgrade, via initdb -S, for a while now without, to my knowledge,
causing any reported problems. I think the relevant difference is that that
code doesn't follow symlinks.  It's obviously also less exercised, and
people might just have fixed up permissions when encountering trouble.

Abhijit, do you recall why the code was changed to follow all symlinks
in contrast to explicitly going through the tablespaces as initdb -S
does? I'm pretty sure early versions of the patch pretty much had a
verbatim copy of the initdb logic?  That logic is missing pg_xlog btw,
which is bad for pg_upgrade.



I *can* reproduce corrupted clusters without this patch, without trying too
hard. Yesterday I wrote a test for the crash-testing infrastructure I've
worked on, on and off, since 2011, and I could reproduce a bunch of
corrupted clusters without this patch.  When randomly triggering crash
restarts shortly followed by a simulated hardware restart (stop
accepting writes via the Linux device mapper) while concurrent COPYs are
running, I can trigger a bunch of data corruptions.

Since then my computer in Berlin has done 440 test runs with the patch,
and 450 without. I've gotten 7 errors without it, 0 with. But the
instrumentation for detecting errors is really crude (a pkey lookup for
every expected row) and doesn't yet keep meaningful diagnostics. So
this isn't particularly bulletproof either way.

I can't tell whether the patch is just masking yet another problem due
to different timings caused by the fsync, or whether it's fixing the
problem that we can forget to sync WAL segments.

Greetings,

Andres Freund




Re: [HACKERS] Why does txid_current() assign new transaction-id?

2015-05-26 Thread Christoph Berg
Re: Tom Lane 2015-05-26 14207.1432650...@sss.pgh.pa.us
 Naoya Anzai nao-an...@xc.jp.nec.com writes:
  I have a question about txid_current().
  it is Why does txid_current() assign new transaction-id?.
 
 Consider
 
   begin;
   select txid_current();
   insert into my_table ...;
   commit;
 
 If we changed the code as you propose, the result of the SELECT would
 no longer have anything to do with the XID used for the insertion.

Still, exposing GetStableLatestTransactionId() at the SQL level would
make sense for monitoring transaction throughput. Currently, if you do
that with txid_current(), you generate a (low, but measurable)
transaction load of one extra transaction per monitoring interval.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Magnus Hagander
On Tue, May 26, 2015 at 6:16 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 05/26/2015 11:58 AM, Tom Lane wrote:

 Andrew Dunstan and...@dunslane.net writes:

 OK, I'm late to the party. But why exactly are we syncing absolutely
 everything? That seems over-broad.

 If we try to be selective, we risk errors of omission, which no one would
 ever notice until someone's data got eaten in a low-probability crash
 scenario.  It seems more robust (at least to me) to fsync everything we
 can find.  That does require more thought about error cases than went
 into the original patch ... but I think that we need more thought about
 error cases even if we do try to be selective.

 One thing perhaps we *should* be selective about, though, is which
 symlinks we try to follow.  I think that a good case could be made
 for ignoring symlinks everywhere except in the pg_tablespace directory.
 If we did, that would all by itself take care of the Debian scenario,
 if I understand that case correctly.


 People have symlinked the xlog directory. I've done it myself in the past.
 A better rule might be to ignore symlinks unless either they are in
 pg_tblspc or they are in the data directory and their name starts with
 pg_.


Not just people. initdb will symlink the xlog directory if you use -x.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] fsync-pgdata-on-recovery tries to write to more files than previously

2015-05-26 Thread Andres Freund
On 2015-05-26 19:07:20 +0200, Andres Freund wrote:
 Abhijit, do you recall why the code was changed to follow all symlinks
 in contrast to explicitly going through the tablespaces as initdb -S
 does? I'm pretty sure early versions of the patch pretty much had a
 verbatim copy of the initdb logic?

Forgot to add Abhijit to CC list, sorry.

 That [initdb's] logic is missing pg_xlog btw, which is bad for pg_upgrade.

On second thought, it's probably not actually bad for pg_upgrade's
specific use case. It'll always end with a proper shutdown, so it'll
never need that WAL. But it'd be bad if anybody ever relied on initdb -S
in a different scenario.




Re: [HACKERS] Why does txid_current() assign new transaction-id?

2015-05-26 Thread Tom Lane
Christoph Berg m...@debian.org writes:
 Still, exposing GetStableLatestTransactionId() on the SQL level would
 make sense for monitoring transaction throughput.

Perhaps, though I wonder why we should expose that and not just report the
result of ReadNewTransactionId() --- or in txid.c's case, the result of
GetNextXidAndEpoch().  In either case it would have to be a new function,
not unilaterally redefining what txid_current() does.

regards, tom lane

