Re: [HACKERS] sepgsql: label regression test failed

2014-05-16 Thread Heikki Linnakangas

On 05/14/2014 07:33 AM, Sergey Muraviov wrote:

I've got this compiler warning:
  relation.c: In function ‘sepgsql_relation_drop’:
relation.c:472:25: warning: ‘tclass’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
   sepgsql_avc_check_perms(object,
  ^


KaiGai, could you take a look at this warning, too? It looks like a
genuine bug to me, but I'm not sure what we should do there instead.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Buffer manager scalability and correlated reference period

2014-05-16 Thread Peter Geoghegan
I have performed a new benchmark related to my ongoing experimentation
around caching and buffer manager scalability. The benchmark tests a
minor refinement of the prototype patch previously posted [1]. The
patch itself is still very much a prototype, and does not
significantly differ from what I originally posted.  The big
difference is that usage_count starts at 6 and saturates at 30, plus I've
tried to reduce the impact of the prior prototype's gettimeofday()
calls by using clock_gettime() + CLOCK_MONOTONIC_COARSE. I previously
posted some numbers for a patch with just the former change.
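
For concreteness, the cheaper timestamp source described above boils down
to something like this (Linux-specific; the helper is mine, not from the
patch):

#include <time.h>

/*
 * Read a cached, tick-granularity monotonic clock.  On Linux,
 * CLOCK_MONOTONIC_COARSE avoids most of the cost of gettimeofday() at
 * the price of a few milliseconds of resolution, which is plenty for
 * tracking a correlated reference period.
 */
static long
coarse_clock_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);
	return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}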

I effectively disabled the background writer entirely here, since it
never helps. These are unlogged tables, so as to not have the outcome
obscured by checkpoint spikes during the sync phase that are more or
less inevitable here (I believe this is particularly true given the
hardware I'm using). Multiple client counts are tested, giving some
indication of the impact on scalability. The same gains previously
demonstrated in both transaction throughput and latency are once again
clearly in evidence.

I should emphasize that although I've talked a lot about LRU-K and
other more sophisticated algorithms, this proof of concept still only
adds a correlated reference period (while allowing usage_count to span
a larger range). I have yet to come up with something really
interesting, such as a patch that makes an inference about the
frequency of access of a page based on the recency of its penultimate
access (that is, a scheme that is similar to LRU-K, a scheme known to
be used in other systems [2] and thought to be widely influential).

The benchmark results are available from:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/collector-correlate

This report was built using pgbench-collector, my fork of
pgbench-tools. It is currently under very active development. The
largest difference between it and pgbench-tools is that it offers more
advanced reporting of operating system information, which can be
correlated with pgbench latency and TPS figures. It's hosted on
GitHub: https://github.com/petergeoghegan/pgbench-collector . Patches
are very much welcome.

This benchmark doesn't show very much new information. I thought that
it might be useful to have detailed operating system statistics to
work off of for each test. I believe that better benchmarking tools
will help the planned improvements to buffer manager scalability.
There are likely to be multiple angles of attack.

[1] 
http://www.postgresql.org/message-id/cam3swzrm7-qmogmczix5u49ndfbls_wwtx6vjsja+bn_li5...@mail.gmail.com

[2] http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf, Page 215
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.0 PDF build broken

2014-05-16 Thread Andres Freund
On 2014-05-15 22:27:42 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  Mysteriously, commit 6b2a1445ec8a631060c4cbff3f172bf31d3379b9 has broken
  the PDF build (openjade + pdfjadetex) in the 9.0 branch only.  The error
  is
 
  [256.0.28
! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than
\pdfstartlink.

  I have reproduced this on two different platforms, and it affects only
  this branch.  I guess this change might have caused the page boundaries
  to shift in an unfortunate way.  I seem to recall we have had similar
  problems before.  Does anyone remember?
 
 Yeah.  This is caused by a hyperlink whose displayed text crosses a page
 boundary.  The only known fix is to change the text enough so the link
 no longer runs across a page boundary.  Unfortunately, pdfTeX is pretty
 unhelpful about identifying exactly where the problem is.  I seem to
 recall having posted a recipe about finding such problems.

Hm. Could a nonbreaking space be used inside such links to prevent the
issue (i.e. &nbsp;)? That seems slightly more robust than trying to
rephrase sentences to avoid the issue.
It'd be even nicer if those could be added automagically in the tex
conversion. I have no idea how that works right now though.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication woes

2014-05-16 Thread Heikki Linnakangas

On 05/15/2014 11:58 PM, Andres Freund wrote:

On 2014-05-15 22:30:53 +0300, Heikki Linnakangas wrote:

Attached patch fixes things, but I want to add some regression tests
before commit.


Looks good to me.


Attached are two patches. One for the uninitialized dbId/tsId issue; one
for the decoding bug. The former should be backpatched.


Thanks, committed.


Should you wonder about the slight reordering of the assignments in
RecordTransactionCommitPrepared(): I've made it more similar to
RecordTransactionCommit() to make it easier to see that they are
equivalent.


Makes sense.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-05-16 Thread Craig Ringer
I'm rebasing another implementation of this against current HEAD at the
moment. It was well tested but has bitrotted a bit, in particular it
needs merging with the multixact changes (eep).

That should provide a useful basis for comparison and a chance to share
ideas.

I'll follow up with the patch and a git tree when it's ready, hopefully
tonight.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trigger concurrent execution

2014-05-16 Thread Blagoj Petrushev
Hi,

Thanks all for being patient, apparently I didn't quite understand the
norms of trigger execution.

On 16 May 2014 07:55, Craig Ringer cr...@2ndquadrant.com wrote:
 On 05/16/2014 08:06 AM, Blagoj Petrushev wrote:
 Hi,

 I'm thinking of an extension to trigger functionality like this:

 CREATE TRIGGER trigger_name
 AFTER event
 ON table
 CONCURRENTLY EXECUTE PROCEDURE trigger_fc

 This would call the trigger after the end of the transaction.

 If "after the end of the transaction" is what you mean by
 "concurrently", then that's the wrong word to choose.

 AFTER COMMIT ?

You're right, 'concurrently' is the wrong word.


 The concept of running a trigger concurrently just doesn't make sense
 in PostgreSQL, because the backend is single threaded. You wouldn't be
 able to run any SQL commands until the trigger finished.

 It isn't possible to do anything useful without a transaction, so
 PostgreSQL would need to start a transaction for the trigger and commit
 the transaction at the end, as if you'd run "SELECT my_procedure();".
 Because it's outside the scope of the transaction it probably wouldn't
 be possible to do FOR EACH ROW with a NEW and OLD var,

Right. Didn't think of this.

 unless you
 stashed them as materialized rows in the queue of pending AFTER COMMIT
 triggers.

 Finally, because it's after transaction commit, you couldn't easily
 guarantee that the trigger would really run. If the backend crashed /
 the server was shut down  / etc after the commit but before your trigger
 finished, you'd have a committed transaction but the trigger would not
 run. To fix that you'd need to somehow make the trigger queue WAL-logged
 and run it during replay, which from my rather limited understanding of
 this area would be ... interesting to do. It'd also mean the trigger
 couldn't have any session context.

 This isn't easy, if it's practical at all.

 I have a big table with big text column article and a nullable
 tsvector column fts_article. On each insert or update that changes the
 article, I trigger-issue 'NOTIFY article_changed row_id', then, with a
 daemon listener, I catch the notification and update fts_article
 accordingly with my_fts_fc(article). The reason I don't do this
 directly in my trigger is because my_fts_fc is slow for big articles,
 fts_article has a gin index, and also, on heavy load, my listener can
 do these updates concurrently. Now, with a concurrent execution of
 triggers, I can just call my_fts_fc inside the trigger instead of the
 notify roundtrip.

 I don't think that really fits.

 It seems like you want to run the trigger procedure in the background on
 another back-end. That'd be quite cool, but also not trivial to do,
 especially if you wanted to guarantee that it happened reliably and in a
 crash-safe manner.



 --
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

I'll also try to reply to David G Johnston answer here, since I didn't
actually get the email.

Conceptually, trigger actions run in-transaction and can cause it to
ROLLBACK; so how would "after the end of the transaction" work?  Since the
easy way is to have "COMMIT;" block until all the AFTER event concurrent
triggers fire, I presume you would want something more like a task queue for
background workers where, at commit, the function call is placed in a FIFO
queue and the calling session is allowed to move on to other activity.

As I see it, for my problem it would be great if there were a way to put a
function call in a queue in a background worker. I don't know how to
do this, however. But the crash handling and NEW/OLD vars passing
would remain a problem nonetheless.

It is not clear what you mean by "my listener can do these updates
concurrently"? Concurrently with each other, or concurrently with other DML
action on the table? I assume you have multiple listeners since the potential
rate of insert of the documents is likely much greater than the rate of
update/indexing.

I meant concurrently with additional inserts to the table, as well as
the fact that my listener is able to receive new notifications while
updating some record.

Also, it would seem you'd typically want the GIN index to be updated once
the corresponding transaction commits and makes the rest of the data
available.  Or does your use case allow for some delay between the article
being in the database physically and it being available in the index?

The latter, namely, the search feature can 'lag' a few seconds after
the content update.
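
For reference, the listener-daemon side of this pattern reduces to roughly
the following libpq sketch (channel name from the trigger described above;
the actual UPDATE and error handling are elided):

#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

int
main(void)
{
	/* connection parameters are taken from the environment (PGHOST etc.) */
	PGconn	   *conn = PQconnectdb("");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}
	PQclear(PQexec(conn, "LISTEN article_changed"));

	for (;;)
	{
		int			sock = PQsocket(conn);
		fd_set		mask;
		PGnotify   *n;

		FD_ZERO(&mask);
		FD_SET(sock, &mask);
		select(sock + 1, &mask, NULL, NULL, NULL);	/* sleep until traffic */

		PQconsumeInput(conn);
		while ((n = PQnotifies(conn)) != NULL)
		{
			/*
			 * n->extra carries the payload (the row id); this is where the
			 * slow UPDATE ... SET fts_article = my_fts_fc(article) would
			 * run, outside the original transaction.
			 */
			printf("article changed: %s\n", n->extra);
			PQfreemem(n);
		}
	}
}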

Thanks,
Blagoj Petrushev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] btree_gist macaddr valgrind woes

2014-05-16 Thread Andres Freund
Hi,

After Heikki fixed the btree_gist bit bugs, my valgrind run shows a
bug in macaddr.

Presumably it's because the type is a fixed-width type with a length of
6. I guess it's allocating stuff with sizeof(macaddr) but then passing the
result around as a Datum, which doesn't work well.

==14219== Invalid read of size 8
==14219==at 0x4C2BB50: memcpy@@GLIBC_2.14 (mc_replace_strmem.c:882)
==14219==by 0x45FA01: heap_fill_tuple (heaptuple.c:248)
==14219==by 0x4629A7: index_form_tuple (indextuple.c:132)
==14219==by 0x48ED05: gistFormTuple (gistutil.c:611)
==14219==by 0x499801: gistBuildCallback (gistbuild.c:476)
==14219==by 0x530412: IndexBuildHeapScan (index.c:2460)
==14219==by 0x4991D4: gistbuild (gistbuild.c:209)
==14219==by 0x8D813B: OidFunctionCall3Coll (fmgr.c:1650)
==14219==by 0x52F816: index_build (index.c:1962)
==14219==by 0x52E79D: index_create (index.c:1083)
==14219==by 0x5E9253: DefineIndex (indexcmds.c:600)
==14219==by 0x7A4DED: ProcessUtilitySlow (utility.c:1149)
==14219==  Address 0x67d37a0 is 8 bytes inside a block of size 12 client-defined
==14219==at 0x8FB83A: palloc0 (mcxt.c:698)
==14219==by 0x6B6C3C4: gbt_num_compress (btree_utils_num.c:31)
==14219==by 0x6B74BF1: gbt_macad_compress (btree_macaddr.c:113)
==14219==by 0x8D73E2: FunctionCall1Coll (fmgr.c:1298)
==14219==by 0x48EB39: gistcentryinit (gistutil.c:576)
==14219==by 0x48ECA1: gistFormTuple (gistutil.c:603)
==14219==by 0x499801: gistBuildCallback (gistbuild.c:476)
==14219==by 0x530412: IndexBuildHeapScan (index.c:2460)
==14219==by 0x4991D4: gistbuild (gistbuild.c:209)
==14219==by 0x8D813B: OidFunctionCall3Coll (fmgr.c:1650)
==14219==by 0x52F816: index_build (index.c:1962)
==14219==by 0x52E79D: index_create (index.c:1083)
==14219== 
==14219== Invalid read of size 8
==14219==at 0x4C2BA70: memcpy@@GLIBC_2.14 (mc_replace_strmem.c:882)
==14219==by 0x45FA01: heap_fill_tuple (heaptuple.c:248)
==14219==by 0x4629A7: index_form_tuple (indextuple.c:132)
==14219==by 0x48ED05: gistFormTuple (gistutil.c:611)
==14219==by 0x48C755: gistSplit (gist.c:1314)
==14219==by 0x488ED4: gistplacetopage (gist.c:242)
==14219==by 0x48C025: gistinserttuples (gist.c:1134)
==14219==by 0x48BFAA: gistinserttuple (gist.c:1093)
==14219==by 0x48AC56: gistdoinsert (gist.c:750)
==14219==by 0x49985B: gistBuildCallback (gistbuild.c:490)
==14219==by 0x530412: IndexBuildHeapScan (index.c:2460)
==14219==by 0x4991D4: gistbuild (gistbuild.c:209)
==14219==  Address 0x67dba58 is 8 bytes inside a block of size 12 client-defined
==14219==at 0x8FB6D2: palloc (mcxt.c:678)
==14219==by 0x6B74D04: gbt_macad_union (btree_macaddr.c:145)
==14219==by 0x8D74D0: FunctionCall2Coll (fmgr.c:1324)
==14219==by 0x48D7C8: gistMakeUnionItVec (gistutil.c:198)
==14219==by 0x496E8E: gistunionsubkeyvec (gistsplit.c:64)
==14219==by 0x496F05: gistunionsubkey (gistsplit.c:91)
==14219==by 0x498981: gistSplitByKey (gistsplit.c:689)
==14219==by 0x48C423: gistSplit (gist.c:1270)
==14219==by 0x488ED4: gistplacetopage (gist.c:242)
==14219==by 0x48C025: gistinserttuples (gist.c:1134)
==14219==by 0x48BFAA: gistinserttuple (gist.c:1093)
==14219==by 0x48AC56: gistdoinsert (gist.c:750)
==14219== 
==14250== Uninitialised byte(s) found during client check request
==14250==at 0x794E9F: PageAddItem (bufpage.c:314)
==14250==by 0x48CC95: gistfillbuffer (gistutil.c:42)
==14250==by 0x489CBC: gistplacetopage (gist.c:451)
==14250==by 0x48C025: gistinserttuples (gist.c:1134)
==14250==by 0x48C31C: gistfinishsplit (gist.c:1240)
==14250==by 0x48C0F5: gistinserttuples (gist.c:1159)
==14250==by 0x48BFAA: gistinserttuple (gist.c:1093)
==14250==by 0x48AC56: gistdoinsert (gist.c:750)
==14250==by 0x49985B: gistBuildCallback (gistbuild.c:490)
==14250==by 0x530412: IndexBuildHeapScan (index.c:2460)
==14250==by 0x4991D4: gistbuild (gistbuild.c:209)
==14250==by 0x8D813B: OidFunctionCall3Coll (fmgr.c:1650)
==14250==  Address 0x6efbcfb is 19 bytes inside a block of size 40 
client-defined
==14250==at 0x8FB83A: palloc0 (mcxt.c:698)
==14250==by 0x462954: index_form_tuple (indextuple.c:129)
==14250==by 0x48ED05: gistFormTuple (gistutil.c:611)
==14250==by 0x48C618: gistSplit (gist.c:1292)
==14250==by 0x488ED4: gistplacetopage (gist.c:242)
==14250==by 0x48C025: gistinserttuples (gist.c:1134)
==14250==by 0x48BFAA: gistinserttuple (gist.c:1093)
==14250==by 0x48AC56: gistdoinsert (gist.c:750)
==14250==by 0x49985B: gistBuildCallback (gistbuild.c:490)
==14250==by 0x530412: IndexBuildHeapScan (index.c:2460)
==14250==by 0x4991D4: gistbuild (gistbuild.c:209)
==14250==by 0x8D813B: OidFunctionCall3Coll (fmgr.c:1650)
==14250==  Uninitialised value was created by a heap allocation
==14250==at 0x8FB6D2: palloc (mcxt.c:678)
==14250==by 0x6B6CFCE: gbt_var_key_copy 

Re: [HACKERS] Priority table or Cache table

2014-05-16 Thread Sameer Thakur
Hello,
I applied the patch to current HEAD. There was one failure (attached), 
freelist.rej
http://postgresql.1045698.n5.nabble.com/file/n5804200/freelist.rej  

Compiled the provided pgbench.c and added the following in .conf:
shared_buffers = 128MB  # min 128kB
Shared_buffers=64MB
Priority_buffers=128MB

I was planning to do performance testing later, hence the different values.

But while executing pgbench the following assertion occurs

LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
TRAP: FailedAssertion("!(strategy_delta >= 0)", File: "bufmgr.c", Line:
1435)
LOG:  background writer process (PID 10274) was terminated by signal 6:
Aborted
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.

Is there a way to avoid it? Am I making some mistake?
regards
Sameer



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Priority-table-or-Cache-table-tp5792831p5804200.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sepgsql: label regression test failed

2014-05-16 Thread Kohei KaiGai
2014-05-16 16:26 GMT+09:00 Heikki Linnakangas hlinnakan...@vmware.com:
 On 05/14/2014 07:33 AM, Sergey Muraviov wrote:

 I've got this compiler warning:
   relation.c: In function ‘sepgsql_relation_drop’:
 relation.c:472:25: warning: ‘tclass’ may be used uninitialized in this
 function [-Wmaybe-uninitialized]
sepgsql_avc_check_perms(object,
   ^

 KaiGai, could you take a look at this warning, too? It looks like a genuine
 bug to me, but I'm not sure what we should do there instead.

This warning is harmless, because the code path that does not initialize
the tclass variable (the case where the dropped relation is an index) never
reaches the code path that references tclass.
That path just checks the schema's {remove_name} permission, then jumps to
a separate code path for indexes and never comes back.
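
Given that analysis, the warning can be silenced without any behavioural
change by a dummy initialization at the declaration (a minimal sketch only;
the attached patch may take a different approach):

	uint16		tclass = 0;		/* quiets -Wmaybe-uninitialized; the
								 * uninitialized path is unreachable,
								 * as explained above */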

BTW, I could not reproduce this warning in my environment with -Wall
(Fedora 20, gcc-4.8.2). Is it a newer compiler's wisdom?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


sepgsql-fixup-maybe-uninitialized-warnning.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Priority table or Cache table

2014-05-16 Thread Hans-Jürgen Schönig

On 20 Feb 2014, at 01:38, Tom Lane t...@sss.pgh.pa.us wrote:

 Haribabu Kommi kommi.harib...@gmail.com writes:
 I want to propose a new feature called priority table or cache table.
 This is same as regular table except the pages of these tables are having
 high priority than normal tables. These tables are very useful, where a
 faster query processing on some particular tables is expected.
 
 Why exactly does the existing LRU behavior of shared buffers not do
 what you need?
 
 I am really dubious that letting DBAs manage buffers is going to be
 an improvement over automatic management.
 
   regards, tom lane



the reason for a feature like that is to define an area of the application
which needs more predictable runtime behaviour.
not all tables are created equal in terms of importance.

example: user authentication should always be supersonic fast while some
reporting tables might gladly be forgotten even if they happened to be in use
recently.

i am not saying that we should have this feature.
however, there are definitely use cases which would justify some more control
here.
otherwise people will fall back and use dirty tricks such as “SELECT count(*)”
or so to emulate what we got here.

many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-05-16 Thread Craig Ringer
On 05/16/2014 04:46 PM, Craig Ringer wrote:
 
 I'll follow up with the patch and a git tree when it's ready, hopefully
 tonight.

Here's a rebased version of Simon's original patch that runs on current
master.

I still need to get the isolation tests for it merged and sorted out,
and after re-reading it I'd like to change waitMode into an enum, not
just some #defines.
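
For what it's worth, the enum would presumably look something like this
(member names follow the WAITMODE_* defines in the patch below; the type
name is only a guess):

typedef enum LockWaitMode
{
	WAITMODE_NORMAL,		/* block until the row lock is acquired */
	WAITMODE_NOWAIT,		/* ereport(ERROR) if the lock is unavailable */
	WAITMODE_SKIP_LOCKED	/* silently skip rows that cannot be locked */
} LockWaitMode;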

Hope it's useful for comparison and ideas.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From c9532344cdabcde4d1992659ec8be8ad4b0041ce Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Wed, 31 Jul 2013 18:37:46 +0800
Subject: [PATCH] implement FOR UPDATE SKIP LOCKED

---
 src/backend/access/heap/heapam.c  | 54 ---
 src/backend/executor/execMain.c   |  2 +-
 src/backend/executor/nodeLockRows.c   | 20 +++-
 src/backend/nodes/copyfuncs.c |  6 ++--
 src/backend/nodes/equalfuncs.c|  4 +--
 src/backend/nodes/outfuncs.c  |  6 ++--
 src/backend/nodes/readfuncs.c |  2 +-
 src/backend/optimizer/plan/planner.c  |  4 +--
 src/backend/optimizer/prep/prepsecurity.c |  8 ++---
 src/backend/optimizer/prep/prepunion.c|  2 +-
 src/backend/parser/analyze.c  | 17 +-
 src/backend/parser/gram.y | 23 -
 src/backend/rewrite/rewriteHandler.c  | 18 +--
 src/backend/utils/adt/ruleutils.c |  4 ++-
 src/include/access/heapam.h   |  2 +-
 src/include/nodes/execnodes.h |  5 ++-
 src/include/nodes/parsenodes.h|  4 +--
 src/include/nodes/plannodes.h |  2 +-
 src/include/parser/analyze.h  |  2 +-
 src/include/parser/kwlist.h   |  2 ++
 20 files changed, 118 insertions(+), 69 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b77c32c..8e7f2bf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -56,6 +56,7 @@
 #include "catalog/namespace.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "nodes/execnodes.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
@@ -4081,7 +4082,7 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
  *	cid: current command ID (used for visibility test, and stored into
  *		tuple's cmax if lock is successful)
  *	mode: indicates if shared or exclusive tuple lock is desired
- *	nowait: if true, ereport rather than blocking if lock not available
+ *	waitMode: mode describes handling of lock waits
  *	follow_updates: if true, follow the update chain to also lock descendant
  *		tuples.
  *
@@ -4105,7 +4106,7 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
  */
 HTSU_Result
 heap_lock_tuple(Relation relation, HeapTuple tuple,
-				CommandId cid, LockTupleMode mode, bool nowait,
+				CommandId cid, LockTupleMode mode, int waitMode,
 				bool follow_updates,
 				Buffer *buffer, HeapUpdateFailureData *hufd)
 {
@@ -4208,16 +4209,21 @@ l3:
 		 */
 		if (!have_tuple_lock)
 		{
-			if (nowait)
+			if (waitMode == WAITMODE_NORMAL)
+				LockTupleTuplock(relation, tid, mode);
+			else
 			{
 				if (!ConditionalLockTupleTuplock(relation, tid, mode))
-					ereport(ERROR,
+				{
+					if (waitMode == WAITMODE_SKIP_LOCKED)
+						return result;
+					else if (waitMode == WAITMODE_NOWAIT)
+						ereport(ERROR,
 							(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-					errmsg("could not obtain lock on row in relation \"%s\"",
-						   RelationGetRelationName(relation))));
+							 errmsg("could not obtain lock on row in relation \"%s\"",
+									RelationGetRelationName(relation))));
+				}
 			}
-			else
-				LockTupleTuplock(relation, tid, mode);
 			have_tuple_lock = true;
 		}
 
@@ -4419,21 +4425,26 @@ l3:
 					elog(ERROR, "invalid lock mode in heap_lock_tuple");
 
 				/* wait for multixact to end */
-				if (nowait)
+				if (waitMode == WAITMODE_NORMAL)
+					MultiXactIdWait((MultiXactId) xwait, status, infomask,
+									relation, &tuple->t_data->t_ctid,
+									XLTW_Lock, NULL);
+				else
 				{
 					if (!ConditionalMultiXactIdWait((MultiXactId) xwait,
 													status, infomask, relation,
 													&tuple->t_data->t_ctid,
 													XLTW_Lock, NULL))
-						ereport(ERROR,
+					{
+						if (waitMode == WAITMODE_SKIP_LOCKED)
+							return result;
+						else if (waitMode == WAITMODE_NOWAIT)
+							ereport(ERROR,
 								(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
 								 errmsg("could not obtain lock on row in relation \"%s\"",
 										RelationGetRelationName(relation))));
+					}
 				}
-				else
-					MultiXactIdWait((MultiXactId) xwait, status, infomask,
-									relation, &tuple->t_data->t_ctid,
-									XLTW_Lock, NULL);
 
 				/* if there are updates, follow the update chain */
 				if (follow_updates &&
@@ -4479,17 +4490,22 @@ l3:
 			else
 			{
 				/* wait for regular 

Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-16 Thread Heikki Linnakangas

On 05/16/2014 01:28 PM, Andres Freund wrote:

Hi,

After Heikki fixed the btree_gist bit bugs, my valgrind run shows a
bug in macaddr.

Presumably it's because the type is a fixed-width type with a length of
6. I guess it's allocating stuff with sizeof(macaddr) but then passing the
result around as a Datum, which doesn't work well.


The complaints seem to be coming from all the varlen data types, too, 
not just macaddr:



...
==14219==by 0x6B74BF1: gbt_macad_compress (btree_macaddr.c:113)
...
==14250==by 0x6B75C45: gbt_text_picksplit (btree_text.c:215)
...
==14280==by 0x6B76063: gbt_bytea_picksplit (btree_bytea.c:143)
...
==14292==by 0x6B76591: gbt_bit_union (btree_bit.c:173)
...


I'm not seeing those valgrind warnings when I run it. Can you give more 
details on how to reproduce?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-16 Thread Andres Freund
On 2014-05-16 17:08:40 +0300, Heikki Linnakangas wrote:
 On 05/16/2014 01:28 PM, Andres Freund wrote:
 Hi,
 
 After Heikki fixed the btree_gist bit bugs, my valgrind run shows a
 bug in macaddr.
 
 Presumably it's because the type is a fixed-width type with a length of
 6. I guess it's allocating stuff with sizeof(macaddr) but then passing the
 result around as a Datum, which doesn't work well.
 
 The complaints seem to be coming from all the varlen data types, too, not
 just macaddr:

Hm, right. Didn't look very far yet.

 ...
 ==14219==by 0x6B74BF1: gbt_macad_compress (btree_macaddr.c:113)
 ...
 ==14250==by 0x6B75C45: gbt_text_picksplit (btree_text.c:215)
 ...
 ==14280==by 0x6B76063: gbt_bytea_picksplit (btree_bytea.c:143)
 ...
 ==14292==by 0x6B76591: gbt_bit_union (btree_bit.c:173)
 ...
 
 I'm not seeing those valgrind warnings when I run it. Can you give more
 details on how to reproduce?

I've added a bit more support for valgrind, but I don't see that being
relevant in those. Are you compiling with USE_VALGRIND?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Joshua D. Drake

Hello,

Can we get that fixed please? It seems rather bad behavior for 
pg_basebackup to fatal out because of the permissions on a backup file 
of all things. Instead, we should do WARNING and say skipped.


JD


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Andres Freund
Hi,

On 2014-05-16 07:28:42 -0700, Joshua D. Drake wrote:
 Can we get that fixed please? It seems rather bad behavior for pg_basebackup
 to fatal out because of the permissions on a backup file of all things.
 Instead, we should do WARNING and say skipped.

Doesn't sound like a good idea to me. We'd need to have a catalog of
common unimportant file endings and such. We surely *do* want to error
out when we fail to copy an important file.
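
To make the point concrete: even a minimal version of such a catalog would
look like the sketch below, and the suffix list (purely illustrative here)
could never be complete:

#include <string.h>
#include <stdbool.h>

/* Illustrative only: suffixes someone might deem "unimportant". */
static const char *skip_suffixes[] = {"~", ".bak", ".swp", NULL};

static bool
is_unimportant_file(const char *name)
{
	size_t		namelen = strlen(name);
	int			i;

	for (i = 0; skip_suffixes[i] != NULL; i++)
	{
		size_t		suflen = strlen(skip_suffixes[i]);

		if (namelen >= suflen &&
			strcmp(name + namelen - suflen, skip_suffixes[i]) == 0)
			return true;
	}
	return false;
}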

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-05-16 Thread Amit Kapila
On Thu, May 15, 2014 at 11:11 AM, Amit Kapila amit.kapil...@gmail.comwrote:


 Data with LWLOCK_STATS
 --
 BufMappingLocks

 PID 7245 lwlock main 38: shacq 41117 exacq 34561 blk 36274 spindelay 101
 PID 7310 lwlock main 39: shacq 40257 exacq 34219 blk 25886 spindelay 72
 PID 7308 lwlock main 40: shacq 41024 exacq 34794 blk 20780 spindelay 54
 PID 7314 lwlock main 40: shacq 41195 exacq 34848 blk 20638 spindelay 60
 PID 7288 lwlock main 41: shacq 84398 exacq 34750 blk 29591 spindelay 128
 PID 7208 lwlock main 42: shacq 63107 exacq 34737 blk 20133 spindelay 81
 PID 7245 lwlock main 43: shacq 278001 exacq 34601 blk 53473 spindelay 503
 PID 7307 lwlock main 44: shacq 85155 exacq 34440 blk 19062 spindelay 71
 PID 7301 lwlock main 45: shacq 61999 exacq 34757 blk 13184 spindelay 46
 PID 7235 lwlock main 46: shacq 41199 exacq 34622 blk 9031 spindelay 30
 PID 7324 lwlock main 46: shacq 40906 exacq 34692 blk 8799 spindelay 14
 PID 7292 lwlock main 47: shacq 41180 exacq 34604 blk 8241 spindelay 25
 PID 7303 lwlock main 48: shacq 40727 exacq 34651 blk 7567 spindelay 30
 PID 7230 lwlock main 49: shacq 60416 exacq 34544 blk 9007 spindelay 28
 PID 7300 lwlock main 50: shacq 44591 exacq 34763 blk 6687 spindelay 25
 PID 7317 lwlock main 50: shacq 44349 exacq 34583 blk 6861 spindelay 22
 PID 7305 lwlock main 51: shacq 62626 exacq 34671 blk 7864 spindelay 29
 PID 7301 lwlock main 52: shacq 60646 exacq 34512 blk 7093 spindelay 36
 PID 7324 lwlock main 53: shacq 39756 exacq 34359 blk 5138 spindelay 22

 This data shows that after the patch, there is no contention
 for BufFreeListLock; rather, there is huge contention around
 BufMappingLocks.  I have checked that HEAD also has contention
 around BufMappingLocks.

 As per my analysis till now, I think reducing contention around
 BufFreelistLock is not sufficient to improve scalability, we need
 to work on reducing contention around BufMappingLocks as well.


To reduce the contention around BufMappingLocks, I have tried the patch
with just an increased number of buffer partitions, and it actually shows
a really significant increase in scalability, due to reduced contention
around both BufFreeListLock and BufMappingLocks.  The real effect of reducing
contention around BufFreeListLock was hidden because the whole contention
had shifted to BufMappingLocks.  I have taken performance data for both
HEAD+increase_buf_part and Patch+increase_buf_part to clearly see the
benefit of reducing contention around BufFreeListLock.  This data has been
taken using a pgbench read-only load (SELECT).
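
Assuming the change is just the constant bump that the legend below
describes, the modified builds amount to this in
src/include/storage/lwlock.h (stock 9.4 values shown in the comments):

/* Number of partitions of the shared buffer mapping hashtable */
#define NUM_BUFFER_PARTITIONS		64	/* stock: 16 */

/* Number of partitions the shared lock tables are divided into */
#define LOG2_NUM_LOCK_PARTITIONS	6	/* stock: 4 */
#define NUM_LOCK_PARTITIONS			(1 << LOG2_NUM_LOCK_PARTITIONS)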

Performance Data
---
HEAD + 64 = HEAD + (NUM_BUFFER_PARTITIONS(64) +
   LOG2_NUM_LOCK_PARTITIONS(6))
V1 + 64  = PATCH + (NUM_BUFFER_PARTITIONS(64) +
   LOG2_NUM_LOCK_PARTITIONS(6))
Similarly, 128 means 128 buffer partitions

shared_buffers= 8GB
scale factor   = 3000
RAM - 64GB


             Thrds (64)   Thrds (128)
HEAD              45562         17128
HEAD + 64         57904         32810
V1 + 64          105557         81011
HEAD + 128        58383         32997
V1 + 128         110705        114544

shared_buffers= 8GB
scale factor   = 1000
RAM - 64GB


             Thrds (64)   Thrds (128)
HEAD              92142         31050
HEAD + 64        108120         86367
V1 + 64          117454        123429
HEAD + 128       107762         86902
V1 + 128         123641        124822


Observations
-
1. There is an increase of up to 5 times in performance for data that can fit
in memory but not in shared buffers.
2. Though there is an increase in performance just from increasing the number
of buffer partitions, it doesn't scale well (especially see the case
where partitions have increased to 128 from 64).


I have verified that contention has reduced around BufMappingLocks
by running the patch with LWLOCK_STATS

BufFreeListLock
PID 17894 lwlock main 0: shacq 0 exacq 171 blk 27 spindelay 1


BufMappingLocks

PID 17902 lwlock main 38: shacq 12770 exacq 10104 blk 282 spindelay 0
PID 17924 lwlock main 39: shacq 11409 exacq 10257 blk 243 spindelay 0
PID 17929 lwlock main 40: shacq 13120 exacq 10739 blk 239 spindelay 0
PID 17940 lwlock main 41: shacq 11865 exacq 10373 blk 262 spindelay 0
..
..
PID 17831 lwlock main 162: shacq 12706 exacq 10267 blk 199 spindelay 0
PID 17826 lwlock main 163: shacq 11081 exacq 10256 blk 168 spindelay 0
PID 17903 lwlock main 164: shacq 11494 exacq 10375 blk 176 spindelay 0
PID 17899 lwlock main 165: shacq 12043 exacq 10485 blk 216 spindelay 0

We can clearly notice that the number for *blk* has reduced significantly,
which shows that contention has reduced.


The patch is still only in good enough shape to prove the merit of the
idea; I have just changed the number of partitions so that anyone who
wants to verify the performance for a similar load can do so by simply
applying the patch.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


scalable_buffer_eviction_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

[HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Tom Lane
Quite some time ago, we made the chr() function accept Unicode code points
up to U+1FFFFF, which is the largest value that will fit in a 4-byte UTF8
string.  It was pointed out to me though that RFC3629 restricted the
original definition of UTF8 to only allow code points up to U+10FFFF (for
compatibility with UTF16).  While that might not be something we feel we
need to follow exactly, pg_utf8_islegal implements the checking algorithm
specified by RFC3629, and will therefore reject points above U+10FFFF.

This means you can use chr() to create values that will be rejected on
dump and reload:

u8=# create table tt (f1 text);
CREATE TABLE
u8=# insert into tt values(chr('x001fffff'::bit(32)::int));
INSERT 0 1
u8=# select * from tt;
 f1 

 
(1 row)

u8=# \copy tt to 'junk' 
COPY 1
u8=# \copy tt from 'junk'
ERROR:  22021: invalid byte sequence for encoding UTF8: 0xf7 0xbf 0xbf 0xbf
CONTEXT:  COPY tt, line 1
LOCATION:  report_invalid_encoding, wchar.c:2011

I think this probably means we need to change chr() to reject code points
above 10FFFF.  Should we back-patch that, or just do it in HEAD?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Joshua D. Drake

On 05/16/2014 07:30 AM, Andres Freund wrote:


Hi,

On 2014-05-16 07:28:42 -0700, Joshua D. Drake wrote:

Can we get that fixed please? It seems rather bad behavior for pg_basebackup
to fatal out because of the permissions on a backup file of all things.
Instead, we should do WARNING and say skipped.


Doesn't sound like a good idea to me. We'd need to have a catalog of
common unimportant file endings and such. We surely *do* want to error
out when we fail to copy an important file.



pg_hba.conf~ is not an important file.

We know what files are important, especially in $PGDATA; they aren't
variable, so why is pg_basebackup failing on a file it should know or
care nothing about?


JD




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Tom Lane
Joshua D. Drake j...@commandprompt.com writes:
 On 05/16/2014 07:30 AM, Andres Freund wrote:
 On 2014-05-16 07:28:42 -0700, Joshua D. Drake wrote:
 Can we get that fixed please? It seems rather bad behavior for pg_basebackup
 to fatal out because of the permissions on a backup file of all things.
 Instead, we should do WARNING and say skipped.

 Doesn't sound like a good idea to me. We'd need to have a catalog of
 common unimportant file endings and such. We surely *do* want to error
 out when we fail to copy an important file.

 pg_hba.conf~ is not an important file.

Rather than blaming the messenger, you should be asking why there are
files in $PGDATA that the server can't read.  That's a recipe for trouble
no matter what.

Or in words of one syllable: this is a bug in your editor, not in Postgres.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Andres Freund
On 2014-05-16 08:13:04 -0700, Joshua D. Drake wrote:
 On 05/16/2014 07:30 AM, Andres Freund wrote:
 
 Hi,
 
 On 2014-05-16 07:28:42 -0700, Joshua D. Drake wrote:
 Can we get that fixed please? It seems rather bad behavior for pg_basebackup
 to fatal out because of the permissions on a backup file of all things.
 Instead, we should do WARNING and say skipped.
 
 Doesn't sound like a good idea to me. We'd need to have a catalog of
 common unimportant file endings and such. We surely *do* want to error
 out when we fail to copy an important file.
 
 
 pg_hba.conf~ is not an important file.

Where do we know that from?

 We know what files are important, especially in $PGDATA, they aren't
 variable, so why is pg_basebackup failing on a file it should know or care
 nothing about?

No, we don't necessarily. It'd e.g. be bad to succeed if postgresql.conf
includes another file and we fail when backing that up, even though it's
in the data directory.
But even otherwise it'd be a non-negligible amount of code to enumerate
possibly important files (which wouldn't be fully reliable; we can't access
the catalogs). Code that's only there to work around a user doing
something bad that's trivially fixable. Nah.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-16 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 05/16/2014 01:28 PM, Andres Freund wrote:
 Presumably it's because the type is a fixed-width type with a length of
 6. I guess it's allocating stuff with sizeof(macaddr) but then passing the
 result around as a Datum, which doesn't work well.

I've not tried valgrinding to check, but for macaddr I suspect the blame
can be laid at the feet of gbt_num_compress, which is creating a datum of
actual size 2 * tinfo->size (ie, 12 bytes for macaddr).  This is later
stored into an index column of declared type gbtreekey16, so the tuple
assembly code is expecting the datum to have size 16.

AFAICS there is no direct connection between the sizes the C code knows
about and the sizes of the datatypes declared as STORAGE options in
btree_gist--1.0.sql.  Probably a reasonable fix would be to add a field
to gbtree_ninfo that specifies the index datum size, and then make
gbt_num_compress palloc0 that size rather than 2 * tinfo->size.
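
In code form the suggestion amounts to roughly this (a sketch; the field
name indexsize is invented here, and the other members of gbtree_ninfo
are elided):

typedef struct
{
	enum gbtree_type t;			/* data type */
	int32		size;			/* size of the element type, e.g. 6 for
								 * macaddr */
	int32		indexsize;		/* NEW: declared size of the index key
								 * type, e.g. 16 for gbtreekey16 */
	/* ... support-function pointers as before ... */
} gbtree_ninfo;

	/* and in gbt_num_compress(), allocate the full declared key size: */
	r = (GBT_NUMKEY *) palloc0(tinfo->indexsize);	/* was 2 * tinfo->size */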

 The complaints seem to be coming from all the varlen data types, too, 
 not just macaddr:

Dunno what the problem is for the varlena types, but that's a completely
separate code path.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Joshua D. Drake

On 05/16/2014 08:19 AM, Tom Lane wrote:


pg_hba.conf~ is not an important file.


Rather than blaming the messenger, you should be asking why there are
files in $PGDATA that the server can't read.  That's a recipe for trouble
no matter what.

Or in words of one syllable: this is a bug in your editor, not in Postgres.


Hardly, and that shows a distinct lack of user-space experience. It also
shows how useless pg_basebackup can be. Basically you are saying, "Well
yeah, there is this rogue file that doesn't belong, fark it... we will
blow away a 2TB base backup and make you start over because.. meh,
pg_basebackup is lazy."


Software is supposed to make our lives easier, not harder. I should be 
able to evaluate the errors for the conditions they create. This is why 
rsync is and for the foreseeable future will be king for creating base
backups.


JD




regards, tom lane






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Joshua D. Drake

At a minimum:

Check to see if there is going to be a permission error BEFORE the base 
backup begins:


starting basebackup:
  checking perms: ERROR no access to pg_hba.conf~ base backup will fail

JD


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Andres Freund
Hi,

On 2014-05-16 08:45:12 -0700, Joshua D. Drake wrote:
 Software is supposed to make our lives easier, not harder. I should be able
 to evaluate the errors for the conditions they create. This is why rsync is
 and for the foreseeable future will be king for creating base backups.

It's dangerous to ignore rsync errors other than 'file
vanished'. This hardly is an argument for your position.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Joshua D. Drake

On 05/16/2014 08:48 AM, Andres Freund wrote:


Hi,

On 2014-05-16 08:45:12 -0700, Joshua D. Drake wrote:

Software is supposed to make our lives easier, not harder. I should be able
to evaluate the errors for the conditions they create. This is why rsync is
and for the foreseeable future will be king for creating base backups.


It's dangerous to ignore rsync errors other than 'file
vanished'. This hardly is an argument for your position.


Are you reading what I write?

I said I should be able to evaluate the errors for the conditions they
create.


I never suggested ignoring anything. The point is RSYNC gives me a 
chance at success, pg_basebackup does not (with respect to this specific 
condition).


JD



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Magnus Hagander
On Fri, May 16, 2014 at 5:46 PM, Joshua D. Drake j...@commandprompt.comwrote:

 At a minimum:

 Check to see if there is going to be a permission error BEFORE the base
 backup begins:

 starting basebackup:
   checking perms: ERROR no access to pg_hba.conf~ base backup will fail


That's pretty much what it does if you enable progress meter. I realize you
don't necessarily want that one, but we could have a switch that still
tells the server to measure the size, but not actually print the output?
While it costs a bit of overhead to do that, that's certainly something
that's a lot more safe than ignoring errors.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Andres Freund
On 2014-05-16 18:20:35 +0200, Magnus Hagander wrote:
 On Fri, May 16, 2014 at 5:46 PM, Joshua D. Drake 
 j...@commandprompt.comwrote:
 
  At a minimum:
 
  Check to see if there is going to be a permission error BEFORE the base
  backup begins:
 
  starting basebackup:
checking perms: ERROR no access to pg_hba.conf~ base backup will fail
 
 
 That's pretty much what it does if you enable progress meter. I realize you
 don't necessarily want that one, but we could have a switch that still
 tells the server to measure the size, but not actually print the output?
 While it costs a bit of overhead to do that, that's certainly something
 that's a lot more safe than ignoring errors.

Don't think it'll show you that error - that mode only stats() files,
right? So you'd need to add access() or open()s.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Magnus Hagander
On Fri, May 16, 2014 at 6:25 PM, Andres Freund and...@2ndquadrant.comwrote:

 On 2014-05-16 18:20:35 +0200, Magnus Hagander wrote:
  On Fri, May 16, 2014 at 5:46 PM, Joshua D. Drake j...@commandprompt.com
 wrote:
 
   At a minimum:
  
   Check to see if there is going to be a permission error BEFORE the base
   backup begins:
  
   starting basebackup:
 checking perms: ERROR no access to pg_hba.conf~ base backup will fail
 
 
  That's pretty much what it does if you enable progress meter. I realize
 you
  don't necessarily want that one, but we could have a switch that still
  tells the server to measure the size, but not actually print the output?
  While it costs a bit of overhead to do that, that's certainly something
  that's a lot more safe than ignoring errors.

 Don't think it'll show you that error - that mode only stats() files,
 right? So you'd need to add access() or open()s.


You're right, we don't. I thought we did, but was clearly remembering wrong.

I guess we could add an access() call to that codepath though. Not sure if
that's going to cause any real overhead compared to the rest of what we're
doing anyway?
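
For concreteness, the probe in the size-estimation walk could be as simple
as this (a sketch; pathbuf stands for the walker's current file path):

#include <unistd.h>

	/*
	 * Probe readability up front so permission problems surface before
	 * any data has been streamed to the client.
	 */
	if (access(pathbuf, R_OK) != 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not read file \"%s\": %m", pathbuf)));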

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Andres Freund
On 2014-05-16 18:29:25 +0200, Magnus Hagander wrote:
 On Fri, May 16, 2014 at 6:25 PM, Andres Freund and...@2ndquadrant.comwrote:
 
  On 2014-05-16 18:20:35 +0200, Magnus Hagander wrote:
   On Fri, May 16, 2014 at 5:46 PM, Joshua D. Drake j...@commandprompt.com
  wrote:
  
At a minimum:
   
Check to see if there is going to be a permission error BEFORE the base
backup begins:
   
starting basebackup:
  checking perms: ERROR no access to pg_hba.conf~ base backup will fail
  
  
   That's pretty much what it does if you enable progress meter. I realize
  you
   don't necessarily want that one, but we could have a switch that still
   tells the server to measure the size, but not actually print the output?
   While it costs a bit of overhead to do that, that's certainly something
   that's a lot more safe than ignoring errors.
 
  Don't think it'll show you that error - that mode only stats() files,
  right? So you'd need to add access() or open()s.
 
 
 You're right, we don't. I thought we did, but was clearly remembering wrong.
 
 I guess we could add an access() call to that codepath though. Not sure if
 that's going to cause any real overhead compared to the rest of what we're
 doing anyway?

It's not free. But I don't think it'd seriously matter in comparison.

But it doesn't protect you if the file is created during the backup -
which as you know can take a long time. For example because somebody
felt the need to increase wal_keep_segments.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Heikki Linnakangas

On 05/16/2014 06:05 PM, Tom Lane wrote:

Quite some time ago, we made the chr() function accept Unicode code points
up to U+1FFFFF, which is the largest value that will fit in a 4-byte UTF8
string.  It was pointed out to me though that RFC3629 restricted the
original definition of UTF8 to only allow code points up to U+10FFFF (for
compatibility with UTF16).  While that might not be something we feel we
need to follow exactly, pg_utf8_islegal implements the checking algorithm
specified by RFC3629, and will therefore reject points above U+10FFFF.

This means you can use chr() to create values that will be rejected on
dump and reload:

u8=# create table tt (f1 text);
CREATE TABLE
u8=# insert into tt values(chr('x001fffff'::bit(32)::int));
INSERT 0 1
u8=# select * from tt;
  f1


(1 row)

u8=# \copy tt to 'junk'
COPY 1
u8=# \copy tt from 'junk'
ERROR:  22021: invalid byte sequence for encoding UTF8: 0xf7 0xbf 0xbf 0xbf
CONTEXT:  COPY tt, line 1
LOCATION:  report_invalid_encoding, wchar.c:2011

I think this probably means we need to change chr() to reject code points
above 10FFFF.  Should we back-patch that, or just do it in HEAD?


+1 for back-patching. A value that cannot be restored is bad, and I 
can't imagine any legitimate use case for producing a Unicode character 
larger than U+10FFFF with chr(x), when the rest of the system doesn't 
handle it. Fully supporting such values might be useful, but that's a 
different story.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Joshua D. Drake

On 05/16/2014 09:20 AM, Magnus Hagander wrote:


On Fri, May 16, 2014 at 5:46 PM, Joshua D. Drake j...@commandprompt.com
mailto:j...@commandprompt.com wrote:

At a minimum:

Check to see if there is going to be a permission error BEFORE the
base backup begins:

starting basebackup:
   checking perms: ERROR no access to pg_hba.conf~ base backup will fail


That's pretty much what it does if you enable progress meter. I realize
you don't necessarily want that one, but we could have a switch that
still tells the server to measure the size, but not actually print the
output? While it costs a bit of overhead to do that, that's certainly
something that's a lot more safe than ignoring errors.


That seems reasonable.

JD


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Andrew Dunstan


On 05/16/2014 12:43 PM, Heikki Linnakangas wrote:

On 05/16/2014 06:05 PM, Tom Lane wrote:
Quite some time ago, we made the chr() function accept Unicode code 
points
up to U+1FFFFF, which is the largest value that will fit in a 4-byte 
UTF8

string.  It was pointed out to me though that RFC3629 restricted the
original definition of UTF8 to only allow code points up to U+10FFFF 
(for

compatibility with UTF16).  While that might not be something we feel we
need to follow exactly, pg_utf8_islegal implements the checking 
algorithm

specified by RFC3629, and will therefore reject points above U+10FFFF.

This means you can use chr() to create values that will be rejected on
dump and reload:

u8=# create table tt (f1 text);
CREATE TABLE
u8=# insert into tt values(chr('x001fffff'::bit(32)::int));
INSERT 0 1
u8=# select * from tt;
  f1


(1 row)

u8=# \copy tt to 'junk'
COPY 1
u8=# \copy tt from 'junk'
ERROR:  22021: invalid byte sequence for encoding UTF8: 0xf7 0xbf 
0xbf 0xbf

CONTEXT:  COPY tt, line 1
LOCATION:  report_invalid_encoding, wchar.c:2011

I think this probably means we need to change chr() to reject code 
points

above 10FFFF.  Should we back-patch that, or just do it in HEAD?


+1 for back-patching. A value that cannot be restored is bad, and I 
can't imagine any legitimate use case for producing a Unicode 
character larger than U+10FFFF with chr(x), when the rest of the 
system doesn't handle it. Fully supporting such values might be 
useful, but that's a different story.





My understanding is that U+10FFFF is the highest legal Unicode code 
point anyway. So this is really just tightening our routines to make 
sure we don't produce an invalid value. We won't be disallowing anything 
that is legal Unicode.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread David G Johnston
Andres Freund-3 wrote
 On 2014-05-16 18:29:25 +0200, Magnus Hagander wrote:
 On Fri, May 16, 2014 at 6:25 PM, Andres Freund <andres@...> wrote:
 
  On 2014-05-16 18:20:35 +0200, Magnus Hagander wrote:
  On Fri, May 16, 2014 at 5:46 PM, Joshua D. Drake <jd@...> wrote:
  
At a minimum:
   
Check to see if there is going to be a permission error BEFORE the
 base
backup begins:
   
starting basebackup:
  checking perms: ERROR no access to pg_hba.conf~ base backup will
 fail
  
  
   That's pretty much what it does if you enable progress meter. I
 realize
  you
   don't necessarily want that one, but we could have a switch that
 still
   tells the server to measure the size, but not actually print the
 output?
   While it costs a bit of overhead to do that, that's certainly
 something
   that's a lot more safe than ignoring errors.
 
  Don't think it'll show you that error - that mode only stats() files,
  right? So you'd need to add access() or open()s.
 
 
 You're right, we don't. I thought we did, but was clearly remembering
 wrong.
 
 I guess we could add an access() call to that codepath though. Not sure
 if
 that's going to cause any real overhead compared to the rest of what
 we're
 doing anyway?
 
 It's not free. But I don't think it'd seriously matter in comparison.
 
 But it doesn't protect you if the file is created during the backup -
 which as you know can take a long time. For example because somebody
 felt the need to increase wal_keep_segments.
 
 Greetings,
 
 Andres Freund

Can we simply back up the non-data parts of $PGDATA first, then move on to
the data parts?  For the files we'd be dealing with, it would be
sufficiently quick to just try and fail immediately, rather than checking
all possible preconditions first.  The main issue seems to be the case where
the 2TB of data get backed up and then a small 1k file blows away all that
work.  Let's do those 1k files first.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pg-basebackup-could-not-get-transaction-log-end-position-from-server-FATAL-could-not-open-file-pg-hbd-tp5804225p5804257.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 05/16/2014 06:05 PM, Tom Lane wrote:
 I think this probably means we need to change chr() to reject code points
 above 10FFFF.  Should we back-patch that, or just do it in HEAD?

 +1 for back-patching. A value that cannot be restored is bad, and I 
 can't imagine any legitimate use case for producing a Unicode character 
 larger than U+10FFFF with chr(x), when the rest of the system doesn't 
 handle it. Fully supporting such values might be useful, but that's a 
 different story.

Well, AFAICT the rest of the system does handle any code point up to
U+1FFFFF.  It's only pg_utf8_islegal that's being picky.  So another
possible answer is to weaken the check in pg_utf8_islegal.  However,
that could create interoperability concerns with other software, and
as you say the use-case for larger values seems pretty thin.

Actually, after re-reading the spec there's more to it than this:
chr() will allow creating utf8 sequences that correspond to the
surrogate-pair codes, which are expressly disallowed in UTF8 by
the RFCs.  Maybe we should apply pg_utf8_islegal to the result
string rather than duplicating its checks?
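
A minimal sketch of that approach (not a worked-out patch; unicode_to_utf8(),
pg_utf_mblen(), and pg_utf8_islegal() are the existing helpers, and "cvalue"
stands in for chr()'s code-point argument):

    unsigned char buf[6];
    int           len;

    /* encode the code point, then let the reload-side check judge it */
    unicode_to_utf8((pg_wchar) cvalue, buf);
    len = pg_utf_mblen(buf);
    if (!pg_utf8_islegal(buf, len))
        ereport(ERROR,
                (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
                 errmsg("requested character not valid for encoding: %d",
                        cvalue)));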

BTW, there are various places that have comments or ifdefd-out code
anticipating possible future support of 5- or 6-byte UTF8 sequences,
which were specified in RFC2279 but then rescinded by RFC3629.
I guess as a matter of cleanup we should think about removing that
stuff.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Noah Misch
On Fri, May 16, 2014 at 11:05:08AM -0400, Tom Lane wrote:
 I think this probably means we need to change chr() to reject code points
 above 10FFFF.  Should we back-patch that, or just do it in HEAD?

The compatibility risks resemble those associated with the fixes for bug
#9210, so I recommend HEAD only:

http://www.postgresql.org/message-id/flat/20140220043940.ga3064...@tornado.leadboat.com

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Fri, May 16, 2014 at 11:05:08AM -0400, Tom Lane wrote:
 I think this probably means we need to change chr() to reject code points
 above 10FFFF.  Should we back-patch that, or just do it in HEAD?

 The compatibility risks resemble those associated with the fixes for bug
 #9210, so I recommend HEAD only:

 http://www.postgresql.org/message-id/flat/20140220043940.ga3064...@tornado.leadboat.com

While I'd be willing to ignore that risk so far as code points above
10FFFF go, if we want pg_utf8_islegal to be happy then we will also
have to reject surrogate-pair code points.  It's not beyond the realm
of possibility that somebody is intentionally generating such code
points with chr(), despite the dump/reload hazard.  So now I agree
that this is sounding more like a major-version-only behavioral change.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Andres Freund
Hi,

elog.c's log_line_prefix() processes %d with:
			case 'd':
				if (MyProcPort)
				{
					const char *dbname = MyProcPort->database_name;

					if (dbname == NULL || *dbname == '\0')
						dbname = _("[unknown]");
					if (padding != 0)
						appendStringInfo(buf, "%*s", padding, dbname);
					else
						appendStringInfoString(buf, dbname);
				}
				else if (padding != 0)
					appendStringInfoSpaces(buf,
										   padding > 0 ? padding : -padding);
write_csvlog() uses similar logic.

Unfortunately MyProcPort only exists in user initiated backends.

It's imo pretty annoying that neither bgworkers nor autovacuum workers
show the proper database in the log. Why don't we just populate a global
variable in InitPostgres() once we're sure which database the backend is
connected to? We could fill fake MyProcPorts, but that doesn't seem like
a good idea to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 elog.c's log_line_prefix() processes %d with:
 case 'd':
 if (MyProcPort)
 {
 	const char *dbname = MyProcPort->database_name;

 	if (dbname == NULL || *dbname == '\0')
 		dbname = _("[unknown]");
 	if (padding != 0)
 		appendStringInfo(buf, "%*s", padding, dbname);

Not directly related to your gripe, but: where did this padding logic
come from, and what prevents it from creating invalidly-encoded output by
means of truncating multibyte characters in the middle?  Not to mention
that if glibc has a different idea of the prevailing encoding than we do,
it is quite likely to mess up completely.  We've been burnt badly enough
by use of this coding technique that it should never have been accepted in
someplace as critical as elog.c.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Andres Freund
On 2014-05-16 14:02:44 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  elog.c's log_line_prefix() processes %d with:
  case 'd':
  if (MyProcPort)
  {
  	const char *dbname = MyProcPort->database_name;
 
  	if (dbname == NULL || *dbname == '\0')
  		dbname = _("[unknown]");
  	if (padding != 0)
  		appendStringInfo(buf, "%*s", padding, dbname);
 
 Not directly related to your gripe, but: where did this padding logic
 come from, and what prevents it from creating invalidly-encoded output by
 means of truncating multibyte characters in the middle?

Isn't that syntax just the *minimal* width?
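
For reference, a quick plain-C illustration of the two specs (hypothetical
snippet, nothing PostgreSQL-specific):

    #include <stdio.h>

    int
    main(void)
    {
        /* width (%*s) only pads to a minimum width, never truncates */
        printf("[%*s]\n", 6, "abc");       /* prints "[   abc]" */
        printf("[%*s]\n", 6, "abcdefgh");  /* prints "[abcdefgh]" */

        /* precision (%.*s) is the spec that truncates */
        printf("[%.*s]\n", 6, "abcdefgh"); /* prints "[abcdef]" */
        return 0;
    }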

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread David G Johnston
Tom Lane-2 wrote:
 Noah Misch <noah@...> writes:
 On Fri, May 16, 2014 at 11:05:08AM -0400, Tom Lane wrote:
 I think this probably means we need to change chr() to reject code
 points above 10FFFF.  Should we back-patch that, or just do it in HEAD?
 
 The compatibility risks resemble those associated with the fixes for bug
 #9210, so I recommend HEAD only:
 
 http://www.postgresql.org/message-id/flat/20140220043940.GA3064539@.leadboat

 
 While I'd be willing to ignore that risk so far as code points above
 10FFFF go, if we want pg_utf8_islegal to be happy then we will also
 have to reject surrogate-pair code points.  It's not beyond the realm
 of possibility that somebody is intentionally generating such code
 points with chr(), despite the dump/reload hazard.  So now I agree
 that this is sounding more like a major-version-only behavioral change.

I would tend to agree on principle - though since this does fall in a
grey area, does 9.4 qualify for this bug-fix?

David J.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/chr-is-still-too-loose-about-UTF8-code-points-tp5804232p5804270.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-16 14:02:44 -0400, Tom Lane wrote:
 Not directly related to your gripe, but: where did this padding logic
 come from, and what prevents it from creating invalidly-encoded output by
 means of truncating multibyte characters in the middle?

 Isn't that syntax just the *minimal* width?

Ah, you're right, so sprintf shouldn't attempt to truncate the data
anywhere.  Nonetheless, this has created a hazard that wasn't there
before: with any padding spec, sprintf has to determine the
width-in-characters of the supplied string.  If glibc thinks the data
is invalid according to *its* idea of the prevailing encoding, it will
do something we won't like.  My recollection is it refuses to print
anything at all.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Tom Lane
David G Johnston david.g.johns...@gmail.com writes:
 Tom Lane-2 wrote
 While I'd be willing to ignore that risk so far as code points above
 10FFFF go, if we want pg_utf8_islegal to be happy then we will also
 have to reject surrogate-pair code points.  It's not beyond the realm
 of possibility that somebody is intentionally generating such code
 points with chr(), despite the dump/reload hazard.  So now I agree
 that this is sounding more like a major-version-only behavioral change.

 I would tend to agree on principle - though since this does fall in a
 grey-area does 9.4 qualify for this bug-fix.

I don't think it's too late to change this in 9.4.  The discussion was
about whether to back-patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Andres Freund
On 2014-05-16 14:51:01 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-05-16 14:02:44 -0400, Tom Lane wrote:
  Not directly related to your gripe, but: where did this padding logic
  come from, and what prevents it from creating invalidly-encoded output by
  means of truncating multibyte characters in the middle?
 
  Isn't that syntax just the *minimal* width?
 
 Ah, you're right, so sprintf shouldn't attempt to truncate the data
 anywhere.  Nonetheless, this has created a hazard that wasn't there
 before: with any padding spec, sprintf has to determine the
 width-in-characters of the supplied string.

Shouldn't we already have set up the correct locale at that point, besides
a couple of lines inside InitPostgres()?

 If glibc thinks the data is invalid according to *its* idea of the
 prevailing encoding, it will do something we won't like.  My
 recollection is it refuses to print anything at all.

Since this is just about things like database, user, application name,
not the actual log message, it's probably not too bad if that
happens. I.e. still debuggable. The message itself is separately
appended, so it should still go through unhindered.

Note that I haven't been involved in the feature and haven't thought
much about related issues...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-16 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 After Heikki has fixed the bit btree_gist bugs my valgrind run shows a
 bug in macaddr.

AFAICT none of these traces represent live bugs, but I've pushed fixes
for them into HEAD.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-16 Thread Heikki Linnakangas

On 05/16/2014 06:43 PM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 05/16/2014 01:28 PM, Andres Freund wrote:

Presumably it's because the type is a fixed width type with a length of
6. I guess it's allocating stuff with sizeof(macaddr) but then passing the
result around as a Datum, which doesn't work well.


I've not tried valgrinding to check, but for macaddr I suspect the blame
can be laid at the feet of gbt_num_compress, which is creating a datum of
actual size 2 * tinfo->size (ie, 12 bytes for macaddr).  This is later
stored into an index column of declared type gbtreekey16, so the tuple
assembly code is expecting the datum to have size 16.
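
To make the mismatch concrete (macKEY is the struct btree_macaddr.c actually
uses; the comments here are mine):

    /* gbt_num_compress pallocs 2 * tinfo->size = 12 bytes and fills this: */
    typedef struct
    {
        macaddr    lower;       /* 6 bytes */
        macaddr    upper;       /* 6 bytes */
    } macKEY;

    /*
     * ...but the opclass declares STORAGE gbtreekey16, so index tuple
     * assembly copies 16 bytes: the trailing 4 were never allocated or
     * initialized.
     */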


Ah, I see.


AFAICS there is no direct connection between the sizes the C code knows
about and the sizes of the datatypes declared as STORAGE options in
btree_gist--1.0.sql.  Probably a reasonable fix would be to add a field
to gbtree_ninfo that specifies the index datum size, and then make
gbt_num_compress palloc0 that size rather than 2 * tinfo->size.


ISTM the correct fix would be to define a gbtreekey12 data type and 
use that. That's not back-patchable though, and introducing a whole new 
type to save a few bytes is hardly worth it. What you did makes sense.



The complaints seem to be coming from all the varlen data types, too,
not just macaddr:


Dunno what's the problem for the varlena types, but that's a completely
separate code path.


It seems to be a similar issue to what I fixed in the bit type earlier. 
When the lower+upper keys are stored as one varlen Datum, the padding 
bytes between them are not zeroed. With these datatypes (i.e. not 
varbit) it doesn't lead to any real errors, however, because the padding 
bytes are never read. Valgrind is just complaining that we're storing 
uninitialized data on disk, even though it's never looked at.


Nevertheless, seems best to fix that, if only to silence Valgrind.

(And you just beat me to it. Thanks!)

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-16 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 05/16/2014 06:43 PM, Tom Lane wrote:
 Dunno what's the problem for the varlena types, but that's a completely
 separate code path.

 It seems to be a similar issue to what I fixed in the bit type earlier. 
 When the lower+upper keys are stored as one varlen Datum, the padding 
 bytes between them are not zeroed. With these datatypes (i.e. not 
 varbit) it doesn't lead to any real errors, however, because the padding 
 bytes are never read. Valgrind is just complaining that we're storing 
 uninitialized data on disk, even though it's never looked at.

Yeah, I came to the same conclusions.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-16 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 ISTM the correct fix would be to define a gbtreekey12 data type and 
 use that. That's not back-patchable though, and introducing a whole new 
 type to save a few bytes is hardly worth it. What you did makes sense.

BTW, the *real* problem with all this stuff is that the gbtreekeyNN types
are declared as having int alignment, even though some of the opclasses
store double-aligned types in them.  I imagine it's possible to provoke
bus errors on machines that are picky about alignment.  The first column
of an index is safe enough because index tuples will be double-aligned
anyway, but it seems like there's a hazard for lower-order columns.
This is something we cannot fix compatibly :-(
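
For example (float8KEY is the struct btree_float8.c stores; the hazard
comments are mine):

    /* stored in an index column of SQL type gbtreekey16: */
    typedef struct
    {
        float8     lower;
        float8     upper;
    } float8KEY;

    /*
     * gbtreekey16 is declared with int (4-byte) alignment, so in a
     * lower-order index column this struct can start on a 4-byte
     * boundary; dereferencing the float8 fields there can fault on
     * strict-alignment machines.
     */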

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_basebackup: could not get transaction log end position from server: FATAL: could not open file ./pg_hba.conf~: Permission denied

2014-05-16 Thread Heikki Linnakangas

On 05/16/2014 08:11 PM, David G Johnston wrote:

Can we simply backup the non-data parts of $PGDATA first then move onto the
data-parts?  For the files that we'd be dealing with it would be
sufficiently quick to just try and fail, immediately, then check for all
possible preconditions first.  The main issue seems to be the case where the
2TB of data get backed-up and then a small 1k file blows away all that work.
Lets do those 1k files first.


You'll still need to distinguish data and non-data parts somehow. 
One idea would be to backup any files in the top directory first, before 
recursing into the subdirectories. That would've caught the OP's case, 
and probably many other typical cases where you drop something 
unexpected into $PGDATA. You could still have something funny nested 
deep in the data directory, but that's much less common.
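
A minimal sketch of that ordering (sendFile()/sendDir() are assumed
stand-ins; basebackup.c's real senders are structured differently):

    #include <dirent.h>
    #include <string.h>

    extern void sendFile(const char *dir, const char *name);
    extern void sendDir(const char *dir, const char *name);

    static void
    send_data_dir(const char *path)
    {
        DIR        *dir = opendir(path);
        struct dirent *de;

        if (dir == NULL)
            return;

        /* pass 1: plain top-level files - a stray pg_hba.conf~ fails early */
        while ((de = readdir(dir)) != NULL)
        {
            if (de->d_type == DT_REG)
                sendFile(path, de->d_name);
        }

        rewinddir(dir);

        /* pass 2: recurse into subdirectories (base/, pg_xlog/, ...) */
        while ((de = readdir(dir)) != NULL)
        {
            if (de->d_type == DT_DIR &&
                strcmp(de->d_name, ".") != 0 &&
                strcmp(de->d_name, "..") != 0)
                sendDir(path, de->d_name);
        }

        closedir(dir);
    }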


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] btree_gist macaddr valgrind woes

2014-05-16 Thread Andres Freund
Hi,

On 2014-05-16 15:31:52 -0400, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
  On 05/16/2014 06:43 PM, Tom Lane wrote:
  Dunno what's the problem for the varlena types, but that's a completely
  separate code path.
 
  It seems to be a similar issue to what I fixed in the bit type earlier. 
  When the lower+upper keys are stored as one varlen Datum, the padding 
  bytes between them are not zeroed. With these datatypes (i.e. not 
  varbit) it doesn't lead to any real errors, however, because the padding 
  bytes are never read. Valgrind is just complaining that we're storing 
  uninitialized data on disk, even though it's never looked at.
 
 Yeah, I came to the same conclusions.

Thanks for fixing, looks good here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-16 14:51:01 -0400, Tom Lane wrote:
 Ah, you're right, so sprintf shouldn't attempt to truncate the data
 anywhere.  Nonetheless, this has created a hazard that wasn't there
 before: with any padding spec, sprintf has to determine the
 width-in-characters of the supplied string.

  Shouldn't we already have set up the correct locale at that point, besides
  a couple of lines inside InitPostgres()?

Define "correct locale".  Keep in mind that names in shared catalogs
might be in any encoding.  But according to previous research, we don't
really guarantee that glibc thinks the encoding is what we think, anyway.
The commit messages for 54cd4f04576833abc394e131288bf3dd7dcf4806 and
ed437e2b27c48219a78f3504b0d05c17c2082d02 are relevant here.  The second
one suggests that only %.*s is at risk not %*s, but I'm not really
convinced right now.  My recollection is that glibc will abandon
processing either format spec if it thinks the string is wrongly encoded.

  Since this is just about things like database, user, application name,
  not the actual log message, it's probably not too bad if that
 happens.

[ shrug... ]  People *will* complain if data disappears from their logs.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Andres Freund
On 2014-05-16 15:44:53 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-05-16 14:51:01 -0400, Tom Lane wrote:
  Ah, you're right, so sprintf shouldn't attempt to truncate the data
  anywhere.  Nonetheless, this has created a hazard that wasn't there
  before: with any padding spec, sprintf has to determine the
  width-in-characters of the supplied string.
 
   Shouldn't we already have set up the correct locale at that point, besides
   a couple of lines inside InitPostgres()?
 
  Define "correct locale".  Keep in mind that names in shared catalogs
 might be in any encoding.

Even the name of the database connected to? I've never checked but I had
assumed it to be valid in that database's encoding. But evidently not...

Why doesn't name_text() have to do an encoding check btw? As there
doesn't seem to be much input checking in namein, it's a pretty easy way
to get bogus data in.

 But according to previous research, we don't
 really guarantee that glibc thinks the encoding is what we think, anyway.
 The commit messages for 54cd4f04576833abc394e131288bf3dd7dcf4806 and
 ed437e2b27c48219a78f3504b0d05c17c2082d02 are relevant here.  The second
 one suggests that only %.*s is at risk not %*s, but I'm not really
 convinced right now.  My recollection is that glibc will abandon
 processing either format spec if it thinks the string is wrongly encoded.

I've tested it here. From what it looks like it prints just fine but
misjudges the width for padding.

  Since this is just about things like database, user, application name,
  not the actual log message, it's probably not too bad if that
  happens.
 
 [ shrug... ]  People *will* complain if data disappears from their logs.

You propose to revert?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-16 15:44:53 -0400, Tom Lane wrote:
 But according to previous research, we don't
 really guarantee that glibc thinks the encoding is what we think, anyway.
 The commit messages for 54cd4f04576833abc394e131288bf3dd7dcf4806 and
 ed437e2b27c48219a78f3504b0d05c17c2082d02 are relevant here.  The second
 one suggests that only %.*s is at risk not %*s, but I'm not really
 convinced right now.  My recollection is that glibc will abandon
 processing either format spec if it thinks the string is wrongly encoded.

 I've tested it here. From what it looks like it prints just fine but
 misjudges the width for padding.

Oh, good.  We can live with that.

 You propose to revert?

Not if the data gets printed.  I was afraid it wouldn't be.

(In any case, we could retain the feature and just do the padding
computations ourselves.  But I now think we don't have to.)
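
A sketch of that fallback, using the server's own encoding-aware helpers
(pg_mbstrlen() and appendStringInfoSpaces() exist today; right-alignment
only, for brevity):

    int    nchars = pg_mbstrlen(dbname);   /* characters, not bytes */

    /* emulate a positive %*s width ourselves, bypassing glibc */
    if (padding > nchars)
        appendStringInfoSpaces(buf, padding - nchars);
    appendStringInfoString(buf, dbname);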

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Andres Freund
On 2014-05-16 16:19:28 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-05-16 15:44:53 -0400, Tom Lane wrote:
  But according to previous research, we don't
  really guarantee that glibc thinks the encoding is what we think, anyway.
  The commit messages for 54cd4f04576833abc394e131288bf3dd7dcf4806 and
  ed437e2b27c48219a78f3504b0d05c17c2082d02 are relevant here.  The second
  one suggests that only %.*s is at risk not %*s, but I'm not really
  convinced right now.  My recollection is that glibc will abandon
  processing either format spec if it thinks the string is wrongly encoded.
 
  I've tested it here. From what it looks like it prints just fine but
  misjudges the width for padding.
 
 Oh, good.  We can live with that.

Yea, seems pretty harmless. It's a pretty new glibc version though, so
things might have been different in the past. On the other hand I can
see more justification to stop processing on input than output...

 (In any case, we could retain the feature and just do the padding
 computations ourselves.  But I now think we don't have to.)

Might actually make the code somewhat less ugly... But I am not
volunteering.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] CREATE REPLICATION SLOT fails on a timeout

2014-05-16 Thread Steve Singer
I am finding that my logical walsender connections are being terminated
due to a timeout on the CREATE REPLICATION SLOT command, with "terminating
walsender process due to replication timeout".


Below is the stack trace when this happens

#3  0x0067df28 in WalSndCheckTimeOut (now=now@entry=453585463823871)
    at walsender.c:1748
#4  0x0067eedc in WalSndWaitForWal (loc=691727888) at walsender.c:1216
#5  logical_read_xlog_page (state=<optimized out>, targetPagePtr=691724288,
    reqLen=<optimized out>, targetRecPtr=<optimized out>,
    cur_page=0x18476e0 "}\320\005", pageTLI=<optimized out>)
    at walsender.c:741
#6  0x004f41bf in ReadPageInternal (state=state@entry=0x17aa140,
    pageptr=pageptr@entry=691724288, reqLen=reqLen@entry=3600)
    at xlogreader.c:525
#7  0x004f454a in XLogReadRecord (state=0x17aa140, RecPtr=691727856,
    RecPtr@entry=0, errormsg=errormsg@entry=0x7fff7f668c28)
    at xlogreader.c:228
#8  0x00675e5c in DecodingContextFindStartpoint (ctx=ctx@entry=0x17a0358)
    at logical.c:460
#9  0x00680f16 in CreateReplicationSlot (cmd=0x1798b50) at walsender.c:800
#10 exec_replication_command (
    cmd_string=cmd_string@entry=0x17f1478 "CREATE_REPLICATION_SLOT \"slon_4_1\" LOGICAL \"slony1_funcs.2.2.0\"")
    at walsender.c:1291
#11 0x006bf4a1 in PostgresMain (argc=<optimized out>,
    argv=argv@entry=0x177db50, dbname=0x177db30 "test1",


(gdb) p last_reply_timestamp
$1 = 0


I propose the attached patch, which sets last_reply_timestamp to now()
when it starts processing a command, since receiving a command is hearing
something from the client.



Steve

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
new file mode 100644
index 5c11d68..56a2f10
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*** exec_replication_command(const char *cmd
*** 1276,1281 
--- 1276,1282 
    parse_rc;
  
  	cmd_node = replication_parse_result;
+ 	last_reply_timestamp = GetCurrentTimestamp();
  
  	switch (cmd_node->type)
  	{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE REPLICATION SLOT fails on a timeout

2014-05-16 Thread Andres Freund
Hi,

On 2014-05-16 16:37:16 -0400, Steve Singer wrote:
 I am finding that my logical walsender connections are being terminated due
 to a timeout on the CREATE REPLICATION SLOT command, with "terminating
 walsender process due to replication timeout"
 
 Below is the stack trace when this happens
 
 #3  0x0067df28 in WalSndCheckTimeOut (now=now@entry=453585463823871)
 at walsender.c:1748
 #4  0x0067eedc in WalSndWaitForWal (loc=691727888) at
 walsender.c:1216
 ...
 #9  0x00680f16 in CreateReplicationSlot (cmd=0x1798b50) at
 walsender.c:800
 #10 exec_replication_command ()  at walsender.c:1291
 #11 0x006bf4a1 in PostgresMain (argc=<optimized out>,
 argv=argv@entry=0x177db50, dbname=0x177db30 "test1",
 
 (gdb) p last_reply_timestamp
 $1 = 0
 
 
 I propose the attached patch, which sets last_reply_timestamp to now() when
 it starts processing a command, since receiving a command is hearing
 something from the client.

Hm. Yes, that's a problem.

 diff --git a/src/backend/replication/walsender.c 
 b/src/backend/replication/walsender.c
 new file mode 100644
 index 5c11d68..56a2f10
 *** a/src/backend/replication/walsender.c
 --- b/src/backend/replication/walsender.c
 *** exec_replication_command(const char *cmd
 *** 1276,1281 
 --- 1276,1282 
 parse_rc;
   
   cmd_node = replication_parse_result;
 + last_reply_timestamp = GetCurrentTimestamp();
   
  	switch (cmd_node->type)
   {

I don't think that's going to cut it though. The creation can take
longer than whatever wal_sender_timeout is set to (when there's lots of
longrunning transactions). I think checking whether last_reply_timestamp
== 0 during timeout checking is more robust.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE REPLICATION SLOT fails on a timeout

2014-05-16 Thread Steve Singer

On 05/16/2014 04:43 PM, Andres Freund wrote:

Hi,

I don't think that's going to cut it though. The creation can take
longer than whatever wal_sender_timeout is set to (when there's lots of
longrunning transactions). I think checking whether last_reply_timestamp
== 0 during timeout checking is more robust.

Greetings,

Andres Freund





That makes sense.
A patch that does that is attached.

Steve
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
new file mode 100644
index 5c11d68..d23f06b
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*** WalSndCheckTimeOut(TimestampTz now)
*** 1738,1744 
  	timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
  		  wal_sender_timeout);
  
! 	if (wal_sender_timeout > 0 && now >= timeout)
  	{
  		/*
  		 * Since typically expiration of replication timeout means
--- 1738,1744 
  	timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
  		  wal_sender_timeout);
  
! 	if (last_reply_timestamp > 0 && wal_sender_timeout > 0 && now >= timeout)
  	{
  		/*
  		 * Since typically expiration of replication timeout means

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-05-16 Thread Thomas Munro
On 16 May 2014 13:21, Craig Ringer cr...@2ndquadrant.com wrote:

 On 05/16/2014 04:46 PM, Craig Ringer wrote:
 
  I'll follow up with the patch and a git tree when it's ready, hopefully
  tonight.

 Here's a rebased version of Simon's original patch that runs on current
 master.

 I still need to merge the isolation tests for it merged and sorted out,
 and after re-reading it I'd like to change waitMode into an enum, not
 just some #defines .

 Hope it's useful for comparison and ideas.


Thank you!  At first glance they're sort of similar which is reassuring.
 I'm especially interested in the buffer release semantics which I was
confused about and will look into that (to resolve the TODO notes in my
patch).

I noticed that in applyLockingClause, Simon has rc->waitMode |= waitMode.
Is that right? The values are 0, 1, and 2, but if you had both NOWAIT and
SKIP LOCKED somewhere in your query you could end up with rc->waitMode == 3
NOWAIT, though looking at it again it could be simpler. (One reviewer
pointed out, that it should really be a single unified enum. In fact I have
been a bit unsure about what scope such an enumeration should have in the
application -- could it even be used in parser code? I tried to follow
existing examples which is why I used #define macros in gram.y).

From a bikeshed colour point of view:
* I used SKIP LOCKED DATA like DB2, and Simon used SKIP LOCKED like
Oracle, and I guess shorter is sweeter
* I used the term wait_policy and an enum, Simon used waitMode and an int
* I had noWait and skipLocked travelling separately in some places, Simon
had a single parameter, which is much better

Best regards,
Thomas Munro


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Andres Freund
Hi,

On 2014-05-16 19:54:56 +0200, Andres Freund wrote:
 Hi,
 
 elog.c's log_line_prefix() processes %d with:
 case 'd':
 if (MyProcPort)
 {
  const char *dbname = MyProcPort->database_name;
  
  if (dbname == NULL || *dbname == '\0')
  dbname = _("[unknown]");
  if (padding != 0)
  appendStringInfo(buf, "%*s", padding, dbname);
  else
  appendStringInfoString(buf, dbname);
  }
  else if (padding != 0)
  appendStringInfoSpaces(buf,
 padding > 0 ? padding : -padding);
 write_csvlog() uses similar logic.
 
 Unfortunately MyProcPort only exists in user initiated backends.
 
 It's imo pretty annoying that neither bgworkers nor autovacuum workers
 show the proper database in the log. Why don't we just populate a global
 variable in InitPostgres() once we're sure which database the backend is
 connected to? We could fill fake MyProcPorts, but that doesn't seem like
 a good idea to me.

The attached simple patch implements the former idea.

We could probably replace a couple of get_database_name(MyDatabaseId)
calls by it, but that doesn't look that important.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e0b5bae6323f22494ba8aad79a50a8fd69a0d21c Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Fri, 16 May 2014 22:34:47 +0200
Subject: [PATCH] Support showing the database in log messages originating in a
 background process.

---
 src/backend/utils/error/elog.c| 11 +--
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c |  3 +++
 src/include/miscadmin.h   |  1 +
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 0d92dcd..db6cc43 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2320,9 +2320,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 		   padding > 0 ? padding : -padding);
 break;
 			case 'd':
-if (MyProcPort)
+if (MyProcPort || MyDatabaseName)
 {
-	const char *dbname = MyProcPort->database_name;
+	const char *dbname;
+
+	if (MyProcPort)
+		dbname = MyProcPort->database_name;
+	else
+		dbname = MyDatabaseName;
 
 	if (dbname == NULL || *dbname == '\0')
 		dbname = _("[unknown]");
@@ -2577,6 +2582,8 @@ write_csvlog(ErrorData *edata)
 	/* database name */
 	if (MyProcPort)
 		appendCSVLiteral(buf, MyProcPort->database_name);
+	else if (MyDatabaseName)
+		appendCSVLiteral(buf, MyDatabaseName);
 	appendStringInfoChar(buf, ',');
 
 	/* Process id  */
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index be74835..46d4cf4 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -62,6 +62,7 @@ char		postgres_exec_path[MAXPGPATH];		/* full path to backend */
 BackendId	MyBackendId = InvalidBackendId;
 
 Oid			MyDatabaseId = InvalidOid;
+char	   *MyDatabaseName = NULL;
 
 Oid			MyDatabaseTableSpace = InvalidOid;
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ed936d7..0f5aaa8 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -51,6 +51,7 @@
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/portal.h"
 #include "utils/ps_status.h"
@@ -795,6 +796,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 		MyDatabaseTableSpace = dbform-dattablespace;
 		/* take database name from the caller, just for paranoia */
 		strlcpy(dbname, in_dbname, sizeof(dbname));
+		MyDatabaseName = MemoryContextStrdup(TopMemoryContext, in_dbname);
 	}
 	else
 	{
@@ -815,6 +817,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 		/* pass the database name back to the caller */
 		if (out_dbname)
 			strcpy(out_dbname, dbname);
+		MyDatabaseName = MemoryContextStrdup(TopMemoryContext, dbname);
 	}
 
 	/* Now we can mark our PGPROC entry with the database ID */
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index c2b786e..91710d3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -165,6 +165,7 @@ extern char postgres_exec_path[];
  * extern BackendIdMyBackendId;
  */
 extern PGDLLIMPORT Oid MyDatabaseId;
+extern PGDLLIMPORT char *MyDatabaseName;
 
 extern PGDLLIMPORT Oid MyDatabaseTableSpace;
 
-- 
2.0.0.rc2.4.g1dc51c6.dirty


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.4 checksum error in recovery with btree index

2014-05-16 Thread Jeff Janes
More fun with my torn page injection test program on 9.4.

24171  2014-05-16 14:00:44.934 PDT:WARNING:  01000: page verification
failed, calculated checksum 21100 but expected 3356
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  PageIsVerified, bufpage.c:145
24171  2014-05-16 14:00:44.934 PDT:FATAL:  XX001: invalid page in block
34666 of relation base/16384/16405
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  ReadBuffer_common,
bufmgr.c:483


I've seen this twice now, the checksum failure was both times for the block
labelled next in the redo record.  Is this another case where the block
needs to be reinitialized upon replay?

Cheers,

Jeff


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 It's imo pretty annoying that neither bgworkers nor autovacuum workers
 show the proper database in the log. Why don't we just populate a global
 variable in InitPostgres() once we're sure which database the backend is
 connected to? We could fill fake MyProcPorts, but that doesn't seem like
 a good idea to me.

 The attached simple patch implements the former idea.

This is basically reintroducing a variable that used to exist and was
intentionally gotten rid of, on the grounds that it'd fail to track
any renaming of the session's current database (cf commit b256f24264).
I'm not sure how much effort ought to be put into dealing with that corner
case; but let's not reintroduce old bugs just for failure to remember
them.

The existing implementation of log line %d has the same issue of course,
so this is not a very good argument not to change %d.  It *is* a reason
not to blindly change get_database_name(MyDatabaseId) calls, which were
coded that way precisely for this reason.  It might also be a reason
not to blithely expose a global variable like this, which would encourage
making of the same mistake in places that might be more critical than %d.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Andres Freund
On 2014-05-16 17:48:28 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  It's imo pretty annoying that neither bgworkers nor autovacuum workers
  show the proper database in the log. Why don't we just populate a global
  variable in InitPostgres() once we're sure which database the backend is
  connected to? We could fill fake MyProcPorts, but that doesn't seem like
  a good idea to me.
 
  The attached simple patch implements the former idea.
 
 This is basically reintroducing a variable that used to exist and was
 intentionally gotten rid of, on the grounds that it'd fail to track
 any renaming of the session's current database (cf commit b256f24264).
 I'm not sure how much effort ought to be put into dealing with that corner
 case; but let's not reintroduce old bugs just for failure to remember
 them.

I did look whether there's a race making MyDatabaseName out of date. I
didn't see any. There's a window where it could be renamed between
where I'd done the MyDatabaseName = ... and the LockSharedObject() a few
lines below. But directly afterwards there's a check that the database name is
still correct.
We could move the filling of MyDatabaseName to lessen the amount of
comments that need to be added.

It looks like the interlock around database names didn't exist back in
b256f24264 - so the situation simply has changed. Looks like that has
happened in 2667d56a3b489e5645f0.

 The existing implementation of log line %d has the same issue of course,
 so this is not a very good argument not to change %d.  It *is* a reason
 not to blindly change get_database_name(MyDatabaseId) calls, which were
 coded that way precisely for this reason.  It might also be a reason
 not to blithely expose a global variable like this, which would encourage
 making of the same mistake in places that might be more critical than %d.

I think exposing the variable is somewhat reasonable. It allows us to print
the database name in situations where we'd normally not dare because of
the syscache lookup... And given that it doesn't look racy anymore the
danger seems pretty low.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] %d in log_line_prefix doesn't work for bg/autovacuum workers

2014-05-16 Thread Andres Freund
On 2014-05-17 00:17:46 +0200, Andres Freund wrote:
 On 2014-05-16 17:48:28 -0400, Tom Lane wrote:
  This is basically reintroducing a variable that used to exist and was
  intentionally gotten rid of, on the grounds that it'd fail to track
  any renaming of the session's current database (cf commit b256f24264).
  I'm not sure how much effort ought to be put into dealing with that corner
  case; but let's not reintroduce old bugs just for failure to remember
  them.
 
 I did look whether there's a race making MyDatabaseName out of date. I
 didn't see any. There's a windows where it could be renamed between
 where I'd done the MyDatabaseName = ... and the LockSharedObject() a few
 lines below. But directly afterwards there's a check that the database name is
 still correct.
 We could move the filling of MyDatabaseName to lessen the amount of
 comments that need to be added.

Moving it also provides the name in bootstrap mode. No idea whether
there'll ever be a use in that but given the variable may later replace
some get_database_name(MyDatabaseId) calls it seems to be a benefit.

I'd briefly changed elog.c to only use MyDatabaseName, thinking that
there seems to be little reason to continue using database_name in
elog.c once the variable is introduced. But that's not a good idea:
MyProcPort->database_name exists earlier.
It's imo a bit debatable whether it's a good idea to log database names
in log_line_prefix that don't exist. But that's a separate change.

I've changed it though so that MyDatabaseName is preferred over
MyProcPort. It's imo easier to see that it has the correct value. And it'll
be correct if we ever support mapping authentication and real database
names or something.

Revised patch attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 3dd4063e607b0800fe74d33188c3198961ada073 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Fri, 16 May 2014 22:34:47 +0200
Subject: [PATCH] Support showing the database in log entries issued by
 background processes.

Until now %d in log_line_prefix and the database_name column in csvlog
didn't display the database for autovacuum and background worker
processes. Because logging had better not perform catalog accesses,
MyProcPort->database_name was used - the database name specified by the
user when connecting. As that structure isn't used in background
processes that aren't connected to clients, log messages were using an
empty or NULL database name respectively.

Add a new global variable MyDatabaseName that contains the name of the
database a process is connected to. Such a variable even used
to exist, back in b256f24264, but it was removed due to concerns about
it being out of date because of database renames. But since then
2667d56a3b added locking that prevents a database with connected users
from being renamed.
---
 src/backend/utils/error/elog.c| 13 ++---
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c | 11 +++
 src/include/miscadmin.h   |  1 +
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 0d92dcd..41b93f3 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2320,9 +2320,14 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 		   padding > 0 ? padding : -padding);
 break;
 			case 'd':
-if (MyProcPort)
+if (MyProcPort || MyDatabaseName)
 {
-	const char *dbname = MyProcPort->database_name;
+	const char *dbname;
+
+	if (MyDatabaseName)
+		dbname = MyDatabaseName;
+	else
+		dbname = MyProcPort->database_name;
 
 	if (dbname == NULL || *dbname == '\0')
 		dbname = _("[unknown]");
@@ -2575,7 +2580,9 @@ write_csvlog(ErrorData *edata)
 	appendStringInfoChar(buf, ',');
 
 	/* database name */
-	if (MyProcPort)
+	if (MyDatabaseName)
+		appendCSVLiteral(buf, MyDatabaseName);
+	else if (MyProcPort)
 		appendCSVLiteral(buf, MyProcPort->database_name);
 	appendStringInfoChar(buf, ',');
 
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index be74835..46d4cf4 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -62,6 +62,7 @@ char		postgres_exec_path[MAXPGPATH];		/* full path to backend */
 BackendId	MyBackendId = InvalidBackendId;
 
 Oid			MyDatabaseId = InvalidOid;
+char	   *MyDatabaseName = NULL;
 
 Oid			MyDatabaseTableSpace = InvalidOid;
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ed936d7..ca11ae4 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -51,6 +51,7 @@
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/portal.h"
 #include "utils/ps_status.h"
@@ -862,6 

Re: [HACKERS] CREATE REPLICATION SLOT fails on a timeout

2014-05-16 Thread Andres Freund
On 2014-05-16 17:02:33 -0400, Steve Singer wrote:
 I don't think that's going to cut it though. The creation can take
 longer than whatever wal_sender_timeout is set to (when there's lots of
 longrunning transactions). I think checking whether last_reply_timestamp
 = 0 during timeout checking is more robust.

 That makes sense.
 A patch that does that is attached.

I've attached a heavily revised version of that patch. Didn't touch all
the necessary places...

Survives creation of logical replication slots under 'intensive
pressure', with a wal_sender_timeout=10ms.
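
For anyone wanting to reproduce, sender settings along these lines should
make the race easy to hit (illustrative values, not the exact harness used):

    # postgresql.conf
    wal_level = logical
    max_wal_senders = 4
    max_replication_slots = 4
    wal_sender_timeout = 10ms    # absurdly low, to force the timeout path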

Thanks for the bugreport!

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 324d74e16dd8ee2a0fa977d92a269fd43a746196 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sat, 17 May 2014 01:22:01 +0200
Subject: [PATCH] Don't pay heed to wal_sender_timeout while creating a
 decoding slot.

Sometimes CREATE_REPLICATION_SLOT ... LOGICAL ... needs to wait for
futher WAL using WalSndWaitForWal(). That used to always respect
wal_sender_timeout and kill the session when waiting long enough
because no feedback/ping messages can be sent while the slot is still
being created.
Add the notion that last_reply_timestamp = 0 means that the walsender
currently doesn't need timeout processing and add checks for it in a
couple of places.

Bugreport and initial patch by Steve Singer, revised by Andres Freund.
---
 src/backend/replication/walsender.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5c11d68..90394ce 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -148,9 +148,10 @@ static StringInfoData reply_message;
 static StringInfoData tmpbuf;
 
 /*
- * Timestamp of the last receipt of the reply from the standby.
+ * Timestamp of the last receipt of the reply from the standby. Set to 0 if
+ * the process currently shouldn't be killed by wal_sender_timeout.
  */
-static TimestampTz last_reply_timestamp;
+static TimestampTz last_reply_timestamp = 0;
 
 /* Have we sent a heartbeat message asking for reply, since last reply? */
 static bool waiting_for_ping_response = false;
@@ -796,6 +797,14 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		logical_read_xlog_page,
 		WalSndPrepareWrite, WalSndWriteData);
 
+		/*
+		 * Signal that we don't need the timeout mechanism. We're just
+		 * creating the replication slot and don't yet accept feedback
+		 * messages or send keepalives. As we possibly need to wait for
+		 * further WAL, the walsender would possibly be killed too soon.
+		 */
+		last_reply_timestamp = 0;
+
 		/* build initial snapshot, might take a while */
 		DecodingContextFindStartpoint(ctx);
 
@@ -1693,7 +1702,7 @@ WalSndComputeSleeptime(TimestampTz now)
 {
 	long		sleeptime = 10000;		/* 10 s */
 
-	if (wal_sender_timeout > 0)
+	if (wal_sender_timeout > 0 && last_reply_timestamp > 0)
 	{
 		TimestampTz wakeup_time;
 		long		sec_to_timeout;
@@ -1735,6 +1744,10 @@ WalSndCheckTimeOut(TimestampTz now)
 {
 	TimestampTz timeout;
 
+	/* don't bail out if we're doing something that doesn't require timeouts */
+	if (last_reply_timestamp <= 0)
+		return;
+
 	timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
 		  wal_sender_timeout);
 
@@ -2879,7 +2892,11 @@ WalSndKeepaliveIfNecessary(TimestampTz now)
 {
 	TimestampTz ping_time;
 
-	if (wal_sender_timeout <= 0)
+	/*
+	 * Don't send keepalive messages if timeouts are globally disabled or
+	 * we're doing something not partaking in timeouts.
+	 */
+	if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0)
 		return;
 
 	if (waiting_for_ping_response)
-- 
2.0.0.rc2.4.g1dc51c6.dirty


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-05-16 Thread Robert Haas
On Fri, May 16, 2014 at 10:51 AM, Amit Kapila amit.kapil...@gmail.comwrote:



               Thrds (64)   Thrds (128)
  HEAD              45562        17128
  HEAD + 64         57904        32810
  V1 + 64          105557        81011
  HEAD + 128        58383        32997
  V1 + 128         110705       114544


I haven't actually reviewed the code, but this sort of thing seems like
good evidence that we need your patch, or something like it.  The fact that
the patch produces little performance improvement on its own (though it
does produce some) shouldn't be held against it - the fact that the
contention shifts elsewhere when the first bottleneck is removed is not
your patch's fault.

In terms of ameliorating contention on the buffer mapping locks, I think it
would be better to replace the whole buffer mapping table with something
different.  I started working on that almost 2 years ago, building a
hash-table that can be read without requiring any locks and written with,
well, less locking than what we have right now:

http://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/chash

I never got quite as far as trying to hook that up to the buffer mapping
machinery, but maybe that would be worth doing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Scaling shared buffer eviction

2014-05-16 Thread Peter Geoghegan
On Fri, May 16, 2014 at 7:51 AM, Amit Kapila amit.kapil...@gmail.comwrote:

 shared_buffers= 8GB
 scale factor   = 3000
 RAM - 64GB


                Thrds (64)   Thrds (128)
   HEAD              45562        17128
   HEAD + 64         57904        32810
   V1 + 64          105557        81011
   HEAD + 128        58383        32997
   V1 + 128         110705       114544

 shared_buffers= 8GB
 scale factor   = 1000
 RAM - 64GB


                Thrds (64)   Thrds (128)
   HEAD              92142        31050
   HEAD + 64        108120        86367
   V1 + 64          117454       123429
   HEAD + 128       107762        86902
   V1 + 128         123641       124822



I'm having a little trouble following this. These figures are transactions
per second for a 300 second pgbench tpc-b run? What does Thrds denote?

-- 
Peter Geoghegan


Re: [HACKERS] Scaling shared buffer eviction

2014-05-16 Thread Amit Kapila
On Sat, May 17, 2014 at 6:29 AM, Peter Geoghegan p...@heroku.com wrote:


 On Fri, May 16, 2014 at 7:51 AM, Amit Kapila amit.kapil...@gmail.com
wrote:

 shared_buffers= 8GB
 scale factor   = 3000
 RAM - 64GB

 I'm having a little trouble following this. These figures are transactions
per second for a 300 second pgbench tpc-b run?

Yes, the figures are tps for a 300 second run.
It is for select-only transactions.

What does Thrds denote?
It denotes the number of threads (-j in the pgbench run).

I have used below statements to take data
./pgbench -c 64 -j 64 -T 300 -S  postgres
./pgbench -c 128 -j 128 -T 300 -S  postgres

The reason for posting the numbers for 64/128 threads is that we mainly
have a concurrency bottleneck when the number of connections is higher
than the CPU cores; I am using a 16-core, 64-hardware-thread m/c.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Scaling shared buffer eviction

2014-05-16 Thread Amit Kapila
On Sat, May 17, 2014 at 6:02 AM, Robert Haas robertmh...@gmail.com wrote:

 I haven't actually reviewed the code, but this sort of thing seems like
good evidence that we need your patch, or something like it.  The fact that
the patch produces little performance improvement on it's own (though it
does produce some) shouldn't be held against it - the fact that the
contention shifts elsewhere when the first bottleneck is removed is not
your patch's fault.

 In terms of ameliorating contention on the buffer mapping locks, I think
it would be better to replace the whole buffer mapping table with something
different.

Is there anything bad, except maybe an increase in LWLocks, with scaling
the hash partitions w.r.t. shared buffers, either by auto-tuning or by
having a configuration knob?  I understand that it would be a bit difficult
for users to estimate the correct value of such a parameter; we could
provide info about its usage in the docs, such that if a user increases
shared buffers to 'X' (20 times) the default value (128M), then they should
consider increasing such partitions (always keeping it a power of 2), or we
could do something similar to the above internally in the code.
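
For reference, the fan-out being discussed is currently hard-coded (from
lwlock.h / buf_internals.h in the 9.4-era tree):

    #define NUM_BUFFER_PARTITIONS  16

    #define BufTableHashPartition(hashcode) \
        ((hashcode) % NUM_BUFFER_PARTITIONS)

The idea above would derive that constant from shared_buffers (keeping it a
power of two) instead of fixing it at 16.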


I agree that maybe even with a reasonably good estimate of the number of
partitions w.r.t. shared buffers, we might not be able to eliminate the
contention around BufMappingLocks, but I think the scalability we get by
doing that is not bad either.

 I started working on that almost 2 years ago, building a hash-table that
can be read without requiring any locks and written with, well, less
locking than what we have right now:

I have still not read the complete code, but just going through the initial
file header, it seems to me that it will be much better than the current
implementation in terms of concurrency. By the way, could such an
implementation be extended to improve scalability for hash indexes as well?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-05-16 Thread Craig Ringer
On 05/17/2014 05:24 AM, Thomas Munro wrote:

 I noticed that in applyLockingClause, Simon has rc->waitMode |=
 waitMode. Is that right? The values are 0, 1, and 2, but if you had
 both NOWAIT and SKIP LOCKED somewhere in your query you could end up with
 rc->waitMode == 3 unless I'm mistaken.

I do not think that |= is correct there.

It may be that no case can arise where you get the bogus value, but
since in all other places the values are tested for equality, not as bit
fields (waitMode == NOWAIT, not waitMode & NOWAIT), it doesn't make
sense to |= it there.

 In my patch I have code that
 would give precedence to NOWAIT, though looking at it again it could be
 simpler.

I agree; if NOWAIT is present anywhere it should be preferred to
SKIP_LOCKED, as it's OK to apply NOWAIT where SKIP LOCKED appears, but
possibly semantically incorrect to apply SKIP LOCKED where only NOWAIT
was expected.

 (One reviewer pointed out, that it should really be a single
 unified enum. In fact I have been a bit unsure about what scope such an
 enumeration should have in the application -- could it even be used in
 parser code? I tried to follow existing examples which is why I used
 #define macros in gram.y).

Not sure there.

 From a bikeshed colour point of view:
 * I used SKIP LOCKED DATA like DB2, and Simon used SKIP LOCKED like
 Oracle, and I guess shorter is sweeter

We have a long tradition of trying to allow noise keywords where it's
harmless.

So the clause should probably be

 SKIP LOCKED [DATA]

in much the same way we have

BEGIN [ WORK | TRANSACTION ] ...

There won't be any ambiguity there.
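
For illustration, typical queue-consumer usage under the proposed syntax
(hypothetical table; DATA as the optional noise word):

    -- grab one unclaimed job, ignoring rows other workers have locked
    SELECT id, payload
    FROM job_queue
    WHERE done = false
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;    -- equivalently: ... SKIP LOCKED DATA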

 * I used the term wait_policy and an enum, Simon used waitMode and an int

I prefer an enum and intended to change Simon's patch but didn't have
the time.

I have some isolation tester and regression tests that are still to follow.

 * I had noWait and skipLocked travelling separately in some places,
 Simon had a single parameter, which is much better

Yes, I strongly prefer that.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers