Re: [HACKERS] Patch to support SEMI and ANTI join removal

2015-02-12 Thread Marko Tiikkaja

On 2/13/15 8:52 AM, Michael Paquier wrote:

On Sun, Nov 23, 2014 at 8:23 PM, David Rowley  wrote:

As the patch stands there's still a couple of FIXMEs in there, so there's
still a bit of work to do yet.
Comments are welcome



Hm, if there is still work to do, we may as well mark this patch as
rejected as-is, also because it has been sitting in this state for a couple of months.


I didn't bring this up before, but I'm pretty sure this patch should be 
marked "returned with feedback".  From what I've understood, "rejected" 
means "we don't want this thing, not in this form or any other".  That 
doesn't seem to be the case for this patch, nor for a few others marked 
"rejected" in the currently in-progress commit fest.




.m




Re: [HACKERS] Logical Decoding follows timelines

2015-02-12 Thread Michael Paquier
On Wed, Dec 17, 2014 at 5:35 PM, Simon Riggs  wrote:

> On 16 December 2014 at 21:17, Simon Riggs  wrote:
>
> >>> This patch is a WIP version of doing that, but only currently attempts
>
> >> With the patch, XLogSendLogical uses the same logic to calculate
> >> SendRqstPtr that XLogSendPhysical does. It would be good to refactor
> >> that into a common function, rather than copy-paste.
> >
> > Some of the logic is similar, but not all.
> >
> >> SendRqstPtr isn't actually used for anything in XLogSendLogical.
> >
> > It exists to allow the call which resets TLI.
> >
> > I'll see if I can make it exactly identical; I didn't think so when I
> > first looked, will look again.
>
> Yes, that works. New version attached
>

Moved the patch to CF 2015-02 so as not to lose track of it, and because it
does not seem to have received a proper review.
-- 
Michael


Re: [HACKERS] PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching

2015-02-12 Thread Michael Paquier
On Thu, Jan 15, 2015 at 5:02 AM, Tomas Vondra 
wrote:

> Maybe we can try later again, but there's no point in keeping this in the
> current CF.
>
> Any objections?
>

None, marked as rejected.
-- 
Michael


Re: [HACKERS] Patch to support SEMI and ANTI join removal

2015-02-12 Thread Michael Paquier
On Sun, Nov 23, 2014 at 8:23 PM, David Rowley  wrote:

>
> As the patch stands there's still a couple of FIXMEs in there, so there's
> still a bit of work to do yet.
> Comments are welcome
>

Hm, if there is still work to do, we may as well mark this patch as
rejected as-is, also because it has been sitting in this state for a couple of months.
-- 
Michael


Re: [HACKERS] Logical Replication Helpers WIP for discussion

2015-02-12 Thread Michael Paquier
On Mon, Dec 22, 2014 at 10:26 PM, Robert Haas  wrote:

> On Fri, Dec 19, 2014 at 8:40 AM, Petr Jelinek 
> wrote:
> > What I hope to get from this is agreement on the general approach and
> > protocol so that we can have common base which will both make it easier
> to
> > create external logical replication solutions and also eventually lead to
> > full logical replication inside core PostgreSQL.
>
> The protocol is a really important topic which deserves its own
> discussion.  Andres has mentioned some of the ideas he has in mind -
> which I think are similar to what you did here - but there hasn't
> really been a thread devoted to discussing that topic specifically.  I
> think that would be a good idea: lay out what you have in mind, and
> why, and solicit feedback.
>

Looking at this patch, I don't see what we actually gain here except a
decoder plugin that speaks a special protocol for a special background
worker that has not been presented yet. What is the value of defining that
as a contrib/ module in-core? Note that we already have test_decoding to
basically test the logical decoding facility, used at least at the SQL
level to get logical changes decoded.

For those reasons I am planning to mark this as rejected (it has no
documentation as well). So please speak up if you think the contrary, but
it seems to me that this could live happily out of core.
-- 
Michael


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2015-02-12 Thread Michael Paquier
On Tue, Dec 23, 2014 at 9:43 AM, Peter Geoghegan  wrote:

> On Mon, Dec 22, 2014 at 4:34 PM, Peter Geoghegan  wrote:
> > To put it another way, creating a separate object obfuscates
> > scanRTEForColumn(), since it's the only client of
> > updateFuzzyAttrMatchState().
>
>
> Excuse me. I mean *not* creating a separate object -- having a unified
> state representation for the entire range-table, rather than having
> one per RTE and merging them one by one into an overall/final range
> table object.
>

Patch moved to CF 2015-02.
-- 
Michael


Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-02-12 Thread Michael Paquier
On Tue, Jan 6, 2015 at 10:51 PM, Kouhei Kaigai  wrote:

> Jim, Thanks for your reviewing the patch.
>
> The attached patch is revised one according to your suggestion,
> and also includes bug fix I could found.
>
> * Definitions of TIDOperator was moved to pg_operator.h
>   as other operator doing.
> * Support the case of "ctid (operator) Param" expression.
> * Add checks if commutator of TID was not found.
> * Qualifiers gets extracted on plan stage, not path stage.
> * Adjust cost estimation logic to fit SeqScan manner.
> * Add some new test cases for regression test.
>
> > On 12/24/14, 7:54 PM, Kouhei Kaigai wrote:
> > >>> On Mon, Dec 15, 2014 at 4:55 PM, Kouhei Kaigai
> > >>> 
> > >> wrote:
> >  I'm not certain whether we should have this functionality in
> >  contrib from the perspective of workload that can help, but its
> >  major worth is for an example of custom-scan interface.
> > >>> worker_spi is now in src/test/modules. We may add it there as well,
> > no?
> > >>>
> > >> Hmm, it makes sense for me. Does it also make sense to add a
> > >> test-case to the core regression test cases?
> > >>
> > > The attached patch adds ctidscan module at test/module instead of
> contrib.
> > > Basic portion is not changed from the previous post, but file
> > > locations and test cases in regression test are changed.
> >
> > First, I'm glad for this. It will be VERY valuable for anyone trying to
> > clean up the end of a majorly bloated table.
> >
> > Here's a partial review...
> >
> > +++ b/src/test/modules/ctidscan/ctidscan.c
> >
> > +/* missing declaration in pg_proc.h */
> > +#ifndef TIDGreaterOperator
> > +#define TIDGreaterOperator   2800
> > ...
> > If we're calling that out here, should we have a corresponding comment in
> > pg_proc.h, in case these ever get renumbered?
> >
> It was a quick hack when I moved this module out of the tree.
> Yep, I should revert when I submit this patch again.
>
> > +CTidQualFromExpr(Node *expr, int varno)
> > ...
> > + if (IsCTIDVar(arg1, varno))
> > + other = arg2;
> > + else if (IsCTIDVar(arg2, varno))
> > + other = arg1;
> > + else
> > + return NULL;
> > + if (exprType(other) != TIDOID)
> > + return NULL;/* should not happen */
> > + /* The other argument must be a pseudoconstant */
> > + if (!is_pseudo_constant_clause(other))
> > + return NULL;
> >
> > I think this needs some additional blank lines...
> >
> OK. And I also noticed the coding style around this function is
> different from other built-in plans, so I redefined the role of
> this function to just check whether the supplied RestrictInfo is
> an OpExpr that involves the TID inequality operator, or not.
>
> The expression node shall be extracted when the Plan node is created
> from the Path node.
>
> > + if (IsCTIDVar(arg1, varno))
> > + other = arg2;
> > + else if (IsCTIDVar(arg2, varno))
> > + other = arg1;
> > + else
> > + return NULL;
> > +
> > + if (exprType(other) != TIDOID)
> > + return NULL;/* should not happen */
> > +
> > + /* The other argument must be a pseudoconstant */
> > + if (!is_pseudo_constant_clause(other))
> > + return NULL;
> >
> > + * CTidEstimateCosts
> > + *
> > + * It estimates cost to scan the target relation according to the given
> > + * restriction clauses. Its logic to scan relations are almost same as
> > + * SeqScan doing, because it uses regular heap_getnext(), except for
> > + * the number of tuples to be scanned if restriction clauses work well.
> >
> > That should read "same as what SeqScan is doing"... however, what
> > actual function are you talking about? I couldn't find
> > SeqScanEstimateCosts (or anything ending in EstimateCosts).
> >
> It is cost_seqscan(). But I don't put raw function names in the source
> code comments in other portions, because nobody can guarantee they stay
> up to date in the future...
>
> > BTW, there are other grammar issues, but it'd be best to handle those
> > all at once after all the code stuff is done.
> >
> Yep. Help from a native English speaker is very helpful for us.
>
> > + opno = get_commutator(op->opno);
> > What happens if there's no commutator? Perhaps best to Assert(opno !=
> > InvalidOid) at the end of that if block.
> >
> Usually, the commutator operator of TID is defined at initdb time; however,
> nobody can guarantee a mad superuser doesn't drop it.
> So, I added elog(ERROR,...) here.
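
A minimal sketch of the safeguard being described, using the real
get_commutator() and OidIsValid() helpers; the variable names simply follow
the quoted review:

	/*
	 * get_commutator() returns InvalidOid when no commutator is defined
	 * (e.g. because a superuser dropped it), so fail loudly rather than
	 * continuing with a bogus operator OID.
	 */
	opno = get_commutator(op->opno);
	if (!OidIsValid(opno))
		elog(ERROR, "commutator of operator %u not found", op->opno);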
>
>
> > Though, it seems things will fall apart anyway if ctid_quals isn't
> > exactly what we're expecting; I don't know if that's OK or not.
> >
> No worry, it was already checked at the planning stage.
>
> > + /* disk costs --- assume each tuple on a different page */
> > + run

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-02-12 Thread Michael Paquier
On Thu, Jan 15, 2015 at 8:02 AM, Kouhei Kaigai  wrote:

> > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai 
> wrote:
> > > When a custom-scan node replaces a join-plan, it shall have at least two
> > > child plan-nodes. The callback handler of PlanCustomPath needs to be
> > > able to call create_plan_recurse() to transform the underlying
> > > path-nodes to plan-nodes, because this custom-scan node may take other
> > > built-in scan or sub-join nodes as its inner/outer input.
> > > In case of FDW, it shall kick any underlying scan relations to the remote
> > > side, thus we may not expect ForeignScan to have underlying plans...
> >
> > Do you have an example of this?
> >
> Yes, even though full code set is too large for patch submission...
>
> https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880
>
> This create_gpuhashjoin_plan() is the PlanCustomPath callback of GpuHashJoin.
> It takes a GpuHashJoinPath inherited from CustomPath that has multiple
> underlying scan/join paths.
> Once it is called back from the backend, it also calls create_plan_recurse()
> to make inner/outer plan nodes according to the paths.
>
> As a result, we can see the following query execution plan, in which
> CustomScan takes underlying scan plans.
>
> postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
>                                    QUERY PLAN
> ------------------------------------------------------------------------------
>  Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
>    Hash clause 1: (aid = aid)
>    Hash clause 2: (bid = bid)
>    Bulkload: On
>    ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=409 width=77)
>    ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=4 width=37)
>          hash keys: aid
>          nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
>          ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=37)
>    ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=4 width=37)
>          hash keys: bid
>          nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
>          ->  Seq Scan on t2  (cost=0.00..734.00 rows=4 width=37)
> (13 rows)
>
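
To make those mechanics concrete, here is a heavily hedged sketch of a
PlanCustomPath callback recursing into its child paths, in the shape this
thread discusses. The exact callback signature was still under discussion at
the time, and MyJoinPath with its fields is hypothetical; makeNode(),
create_plan_recurse() and extract_actual_clauses() are meant to be the real
helpers:

	/* Hypothetical custom path carrying two child paths. */
	typedef struct MyJoinPath
	{
		CustomPath	cpath;
		Path	   *outer_path;
		Path	   *inner_path;
	} MyJoinPath;

	static Plan *
	create_myjoin_plan(PlannerInfo *root, RelOptInfo *rel,
					   CustomPath *best_path, List *tlist, List *clauses)
	{
		MyJoinPath *jpath = (MyJoinPath *) best_path;
		CustomScan *cscan = makeNode(CustomScan);

		/* Turn the underlying outer/inner paths into plan nodes. */
		outerPlan(cscan) = create_plan_recurse(root, jpath->outer_path);
		innerPlan(cscan) = create_plan_recurse(root, jpath->inner_path);

		cscan->scan.plan.targetlist = tlist;
		cscan->scan.plan.qual = extract_actual_clauses(clauses, false);
		return &cscan->scan.plan;
	}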

Where are we on this? AFAIK, we now have a feature with no documentation
and no in-core example to test those custom routine APIs, hence moved to
the next CF.
-- 
Michael


Re: [HACKERS] documentation update for doc/src/sgml/func.sgml

2015-02-12 Thread Michael Paquier
On Mon, Feb 2, 2015 at 10:42 PM, Robert Haas  wrote:

> On Tue, Jan 20, 2015 at 4:01 AM, Fabien COELHO 
> wrote:
> >> I had a look at this patch.  This patch adds some text below a table
> >> of functions.  Immediately above that table, there is this existing
> >> language:
> >>
> >>   The functions working with double precision data are mostly
> >>   implemented on top of the host system's C library; accuracy and
> >>   behavior in boundary cases can therefore vary depending on the
> >>   host system.
> >>
> >> This seems to me to substantially duplicate the information added by the
> >> patch.
> >
> > I would rather say that it makes the potential issues explicit. Taking
> > that into account, maybe the part about floating point could be moved up
> > after the above sentence, or the above sentence moved down as an
> > introduction, with some pruning so that it fits in?
>
> Possibly.  If anyone still cares about this patch, then they should
> try revising it along those lines and submit an updated version.  If
> no one is excited enough about this to do that, we should just flag
> this as rejected and move on.  Since this patch has been kicking
> around since August, my reading is nobody's very excited about it, but
> maybe I'm misinterpreting the situation.


Let's move on then, marked as rejected.
-- 
Michael


Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat

2015-02-12 Thread Michael Paquier
On Sat, Jan 24, 2015 at 5:58 AM, Alvaro Herrera 
wrote:

> Here's v0.5. (Why did you use that weird decimal versioning scheme?  You
> could just say "v4" and save a couple of keystrokes).  This patch makes
> perfect sense to me now.  I was ready to commit, but I checked the
> regression test you added and noticed that you're only reading results
> for the last set of operations because they all use the same table and
> so each new set clobbers the values for the previous one.  So I modified
> them to use one table for each set, and report the counters for all
> tables.  In doing this I noticed that the one for trunc_stats_test3 is
> at odds with what your comment in the .sql file says; would you review
> it please?  Thanks.
>
> (I didn't update the expected file.)
>
> BTW you forgot to update expected/prepared_xact_1.out, for the case when
> prep xacts are disabled.
>
> If some other committer decides to give this a go, please remember to
> bump catversion before pushing.
>

Alex, this patch seems nicely baked. Could you review Alvaro's changes?
This thread has been waiting for your input for 3 weeks.
-- 
Michael


Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-02-12 Thread Michael Paquier
On Wed, Dec 17, 2014 at 5:39 PM, Simon Riggs  wrote:

> On 15 December 2014 at 20:26, Jeff Janes  wrote:
>
> > I still get the compiler error in contrib:
> >
> > pgstattuple.c: In function 'pgstat_heap':
> > pgstattuple.c:279: error: too few arguments to function
> > 'heap_beginscan_strat'
> >
> > Should it pass false for the always_prune?
>
> Yes.
>
> New version attached.
>

Moved patch to CF 2015-02 with same status "Needs review". It visibly needs
more work, as well as numbers showing a performance increase; so far only
cases showing a performance decrease have turned up.
-- 
Michael


Re: [HACKERS] parallel mode and parallel contexts

2015-02-12 Thread Michael Paquier
On Thu, Feb 12, 2015 at 3:59 AM, Robert Haas  wrote:

> We're not seeing eye to eye here yet, since I don't accept your
> example scenario and you don't accept mine.  Let's keep discussing.
>
> Meanwhile, here's an updated patch.
>

A lot of cool activity is showing up here, so I moved the patch to CF
2015-02. Perhaps, Andres, you could add yourself as a reviewer?
-- 
Michael


Re: [HACKERS] WALWriter active during recovery

2015-02-12 Thread Michael Paquier
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao  wrote:

> On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs 
> wrote:
> > Currently, WALReceiver writes and fsyncs data it receives. Clearly,
> > while we are waiting for an fsync we aren't doing any other useful
> > work.
> >
> > Following patch starts WALWriter during recovery and makes it
> > responsible for fsyncing data, allowing WALReceiver to progress other
> > useful actions.
>
> +1
>
> > At present this is a WIP patch, for code comments only. Don't bother
> > with anything other than code questions at this stage.
> >
> > Implementation questions are
> >
> > * How should we wake WALReceiver, since it waits on a poll(). Should
> > we use SIGUSR1, which is already used for latch waits, or another
> > signal?
>
> Probably we need to change libpqwalreceiver so that it uses the latch.
> This is useful even for the startup process to report the replay location
> to the walreceiver in real time.
>
> > * Should we introduce some pacing delays if the WALreceiver gets too
> > far ahead of apply?
>
> I don't think so for now. Instead, we can support synchronous_commit =
> replay,
> and the users can use that new mode if they are worried about the delay of
> WAL replay.
>
> > * Other questions you may have?
>
> Who should wake the startup process so that it reads and replays the WAL
> data?
> Current walreceiver. But if walwriter is responsible for fsyncing WAL data,
> probably walwriter should do that. Because the startup process should not
> replay
> the WAL data which has not been fsync'd yet.
>
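
A rough sketch of the latch-based wait Fujii suggests for libpqwalreceiver,
replacing the poll() loop. This is only an illustration under the assumption
that the existing WaitLatchOrSocket() API is used; the wrapper function and
its name are hypothetical:

	/*
	 * Hypothetical wrapper: sleep until either new data arrives on the
	 * replication socket or our process latch is set (e.g. by the startup
	 * process reporting a new replay location), instead of blocking in poll().
	 */
	static void
	wait_for_wal_or_wakeup(pgsocket replication_socket, long timeout_ms)
	{
		int		rc;

		rc = WaitLatchOrSocket(&MyProc->procLatch,
							   WL_LATCH_SET | WL_SOCKET_READABLE |
							   WL_TIMEOUT | WL_POSTMASTER_DEATH,
							   replication_socket,
							   timeout_ms);

		if (rc & WL_POSTMASTER_DEATH)
			proc_exit(1);
		if (rc & WL_LATCH_SET)
			ResetLatch(&MyProc->procLatch);
		/* on WL_SOCKET_READABLE, the caller consumes the incoming WAL data */
	}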

Moved this patch to CF 2015-02 to not lose track of it and because it did
not get any reviews.
-- 
Michael


[HACKERS] question on Postgres smart shutdown mode

2015-02-12 Thread wei sun
Hi All,



I have a question on PG smart shutdown mode.

When shutting down Postgres with a *Smart Shutdown* mode (SIGTERM) request,
is there a way for a client to be notified of this shutdown event? I tried
PG_NOTIFY, but I cannot get any notification events when this happens. BTW,
I am relatively new to Postgres.



Thanks,

Wei


Re: [HACKERS] [PATCH] Add transforms feature

2015-02-12 Thread Michael Paquier
On Mon, Dec 22, 2014 at 12:19 PM, Michael Paquier  wrote:

> On Mon, Dec 15, 2014 at 3:10 PM, Peter Eisentraut  wrote:
> > fixed
> This patch needs a rebase, it does not apply correctly in a couple of
> places on latest HEAD (699300a):
> ./src/include/catalog/catversion.h.rej
> ./src/include/catalog/pg_proc.h.rej
> ./src/pl/plpython/plpy_procedure.c.rej
>

I moved this patch to 2015-02 so as not to lose track of it and because it
did not receive much review...
-- 
Michael


Re: [HACKERS] Final Patch for GROUPING SETS

2015-02-12 Thread Michael Paquier
On Wed, Jan 21, 2015 at 6:02 AM, Andrew Gierth 
wrote:

> Updated patch (mostly just conflict resolution):
>
>  - fix explain code to track changes to deparse context handling
>
>  - tiny expansion of some comments (clarify in nodeAgg header
>comment that aggcontexts are now EContexts rather than just
>memory contexts)
>
>  - declare support for features in sql_features.txt, which had been
>previously overlooked
>
>
Patch moved to CF 2015-02.
-- 
Michael


Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2015-02-12 Thread Michael Paquier
On Tue, Feb 10, 2015 at 10:32 PM, Michael Paquier  wrote:

> Patch updated is attached.
>

Patch moved to CF 2015-02 with same status.
-- 
Michael


Re: [HACKERS] inherit support for foreign tables

2015-02-12 Thread Michael Paquier
On Thu, Jan 15, 2015 at 4:35 PM, Etsuro Fujita 
wrote:

> As I said before, that seems to me like a good idea.  So I'll update the
> patch based on that if you're okay with it.  Or have you found any problem
> concerning the above idea?
>

Patch moved to CF 2015-02 with same status, "Ready for committer".
-- 
Michael


Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

2015-02-12 Thread Sergey Konoplev
On Thu, Feb 12, 2015 at 4:18 PM, Andres Freund  wrote:
> No need for debugging. It's plainly and simply a (cherry-pick) conflict I
> resolved wrongly during backpatching. 9.3, 9.4 and master do not have
> that problem. That whole fix was quite painful because every single
> release had significantly different code :(. pg_basebackup/ is pretty
> messy.
> I'm not sure why my testsuite didn't trigger that problem. Possibly
> because a retry makes things work :(
>
> Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
> releases don't have pg_receivexlog) and can quite easily be worked
> around by creating the archive_status directory.

The workaround works perfectly for me in this case; I'm going to
upgrade to 9.4 anyway soon.

Thank you, Andres.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray...@gmail.com




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-12 Thread Michael Paquier
On Thu, Feb 12, 2015 at 8:08 PM, Syed, Rahila 
wrote:
> Thank you for comments. Please find attached the updated patch.
>
> >This patch fails to compile:
> >xlogreader.c:1049:46: error: extraneous ')' after condition, expected a
> >statement
> >    blk->with_hole && blk->hole_offset <= 0))
>
> This has been rectified.
>
> >Note as well that at least clang does not like much how the sanity checks
> >with with_hole are done. You should place parentheses around the '&&'
> >expressions. Also, I would rather define with_hole == 0 or with_hole == 1
> >explicitly in those checks.
>
> The expressions are modified accordingly.
>
> >There is a typo:
> >s/true,see/true, see/
>
> >[nitpicky] Be aware as well of the 80-character limit per line that is
> >normally respected by comment blocks. [/]
>
> Have corrected the typos and changed the comments as mentioned. Also,
> realigned certain lines to meet the 80-char limit.


Thanks for the updated patch.

+   /* leave if data cannot be compressed */
+   if (compressed_len == 0)
+   return false;
This should be < 0, pglz_compress returns -1 when compression fails.

+   if (pglz_decompress(block_image, bkpb->bkp_len, record->uncompressBuf,
+                       bkpb->bkp_uncompress_len) == 0)
Similarly, this should be < 0.
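
To make the intended fix concrete, here is a minimal sketch of the corrected
checks, under the assumption that pglz_compress() returns the compressed
length or -1 on failure and pglz_decompress() likewise returns a negative
value on failure; the variable names are simply taken from the quoted
snippets:

	/* leave if data cannot be compressed */
	if (compressed_len < 0)		/* was: == 0; pglz_compress returns -1 */
		return false;

	/* a negative return value from pglz_decompress signals bad data */
	if (pglz_decompress(block_image, bkpb->bkp_len, record->uncompressBuf,
						bkpb->bkp_uncompress_len) < 0)
		return false;			/* decompression failed */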

Regarding the sanity checks that have been added recently: I think they are
useful, but I suspect that only a check on the record CRC is done because
that's reliable enough, and not doing those checks accelerates replay a
bit. So I am thinking that we should simply replace them with assertions.

I have also re-run my small test case, with the following results (scripts
and results attached):
=# select test, user_diff, system_diff,
     pg_size_pretty(pre_update - pre_insert),
     pg_size_pretty(post_update - pre_update) from results;
  test   | user_diff | system_diff | pg_size_pretty | pg_size_pretty
---------+-----------+-------------+----------------+----------------
 FPW on  | 46.134564 |    0.823306 | 429 MB         | 566 MB
 FPW on  | 16.307575 |    0.798591 | 171 MB         | 229 MB
 FPW on  |  8.325136 |    0.848390 | 86 MB          | 116 MB
 FPW off | 29.992383 |    1.100458 | 440 MB         | 746 MB
 FPW off | 12.237578 |    1.027076 | 171 MB         | 293 MB
 FPW off |  6.814926 |    0.931624 | 86 MB          | 148 MB
 HEAD    | 26.590816 |    1.159255 | 440 MB         | 746 MB
 HEAD    | 11.620359 |    0.990851 | 171 MB         | 293 MB
 HEAD    |  6.300401 |    0.904311 | 86 MB          | 148 MB
(9 rows)
The level of compression reached is the same as the previous measurement,
566MB for the case of fillfactor=50 (
cab7npqsc97o-ue5paxfmukwcxe_jioyxo1m4a0pmnmyqane...@mail.gmail.com) with a
similar CPU usage.

Once those small issues are fixed, I think it is worth having a committer
look at this patch, presumably Fujii-san.
Regards,
-- 
Michael


compress_run.bash
Description: Binary data


results.sql
Description: Binary data



Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables

2015-02-12 Thread Etsuro Fujita

On 2015/02/11 4:06, Stephen Frost wrote:

* Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote:

On 2015/02/10 7:23, Dean Rasheed wrote:

Sorry, I didn't have time to look at this properly. My initial thought
is that expand_security_qual() needs to request a lock on rows coming
from the relation it pushes down into a subquery if that relation was
the result relation, because otherwise it won't have any locks, since
preprocess_rowmarks() only adds PlanRowMarks to non-target relations.


That seems close to what I had in mind; expand_security_qual() needs
to request a FOR UPDATE lock on rows coming from the relation it
pushes down into a subquery only when that relation is the result
relation and *foreign table*.


I had been trying to work out an FDW-specific way to address this, but I
think Dean's right that this should be addressed in
expand_security_qual(), which means it'll apply to all cases and not
just these FDW calls.  I don't think that's actually an issue though and
it'll match up to how SELECT FOR UPDATE is handled today.


Sorry, my explanation was not accurate, but I also agree with Dean's 
idea.  In the above, I just wanted to make it clear that such a lock 
request done by expand_security_qual() should be limited to the case 
where the relation that was formerly the result relation is a foreign table.


If it's OK, I'll submit a patch for that, maybe early next week.

Thank you for working on this issue!

Best regards,
Etsuro Fujita




[HACKERS] "multiple backends attempting to wait for pincount 1"

2015-02-12 Thread Tom Lane
Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly
the same failure pattern on HEAD:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57

I'd say we have a problem.  I'd even go so far as to say that somebody has
completely broken locking, because this looks like autovacuum and manual
vacuuming are hitting the same table at the same time.

regards, tom lane




Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Noah Misch
On Thu, Feb 12, 2015 at 07:40:12AM -0500, Robert Haas wrote:
> On Thu, Feb 12, 2015 at 12:16 AM, Noah Misch  wrote:
> > That is a major mark against putting the check in simplify_function(), 
> > agreed.
> 
> I do see one way to rescue that idea, which is this: put two flags,
> parallelModeOK and parallelModeRequired, into PlannerGlobal.  At the
> beginning of planning, set parallelModeOK based on our best knowledge
> at that time; as we preprocess expressions, update it to false if we
> see something that's not parallel-safe.  Emit paths for parallel plans
> only if the flag is currently true.  At the end of planning, when we
> convert paths to plans, set parallelModeRequired if any parallel plan
> elements are generated.  If we started with parallelModeOK = true or
> ended up with parallelModeRequired = false, then life is good.  In the
> unfortunate event that we end up with parallelModeOK = false and
> parallelModeRequired = true, replan, this time with parallelModeOK =
> false from the beginning.

> So this would mostly be pretty cheap, but if you do hit the case where
> a re-plan is required it would be pretty painful.

> >> > Unless we want to rejigger this so that we do a
> >> > complete eval_const_expressions() pass over the entire query tree
> >> > (including all subqueries) FIRST, and then only after that go back and
> >> > plan all of those subqueries, I don't see how to make this work; and
> >> > I'm guessing that there are good reasons not to do that.
> >
> > I expect that would work fine, but I do think it premature to venture
> > that far out of your way to optimize this new tree examination.

Given your wish to optimize, I recommend first investigating the earlier
thought to issue eval_const_expressions() once per planner() instead of once
per subquery_planner().  Compared to the parallelModeRequired/parallelModeOK
idea, it would leave us with a more maintainable src/backend/optimizer.  I
won't object to either design, though.

Your survey of parallel safety among built-in functions was on-target.
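
A minimal sketch of the control flow Robert describes, with entirely
hypothetical names (PlannerGlobal did not carry such flags at the time of
this thread, and do_planner() is invented); it only illustrates the
replan-on-mismatch idea:

	/*
	 * Illustration only: plan assuming parallelism is OK; if expression
	 * preprocessing discovered a hazard (parallelModeOK became false)
	 * after parallel plan nodes were already generated
	 * (parallelModeRequired), throw the plan away and replan with
	 * parallelism disabled from the start.
	 */
	static PlannedStmt *
	plan_with_parallel_fallback(Query *parse, int cursorOptions,
								ParamListInfo boundParams)
	{
		PlannedStmt *result;

		result = do_planner(parse, cursorOptions, boundParams,
							true);	/* start with parallelModeOK = true */

		if (!result->parallelModeOK && result->parallelModeRequired)
			result = do_planner(parse, cursorOptions, boundParams,
								false);	/* replan, parallelism off */

		return result;
	}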




[HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c

2015-02-12 Thread Michael Paquier
Hi all,

When calling vacuum(), there is the following assertion using VACOPT_FREEZE:
Assert((vacstmt->options & VACOPT_VACUUM) ||
!(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
I think that this should be changed to sanity checks based on the
parameter values of freeze_* in VacuumStmt, as we do not set up
VACOPT_FREEZE when VACUUM is used without options in parentheses, for
something like this:
Assert((vacstmt->options & VACOPT_VACUUM) ||
-  !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
+  ((vacstmt->options & VACOPT_FULL) == 0 &&
+   vacstmt->freeze_min_age < 0 &&
+   vacstmt->freeze_table_age < 0 &&
+   vacstmt->multixact_freeze_min_age < 0 &&
+   vacstmt->multixact_freeze_table_age < 0));
This would also have the advantage of limiting the use of VACOPT_FREEZE
in the query parser.
A patch is attached.
Thoughts?
-- 
Michael
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2f3f79d..3ddc077 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -110,7 +110,11 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 	/* sanity checks on options */
 	Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE));
 	Assert((vacstmt->options & VACOPT_VACUUM) ||
-		   !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
+		   ((vacstmt->options & VACOPT_FULL) == 0 &&
+			vacstmt->freeze_min_age < 0 &&
+			vacstmt->freeze_table_age < 0 &&
+			vacstmt->multixact_freeze_min_age < 0 &&
+			vacstmt->multixact_freeze_table_age < 0));
 	Assert((vacstmt->options & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
 
 	stmttype = (vacstmt->options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";



Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-12 Thread Michael Paquier
On Fri, Feb 13, 2015 at 10:16 AM, Naoya Anzai
 wrote:
>>> You mean that ...
>>> log_autovacuum_min_duration assumes the role of VACOPT_VERBOSE.
>>> Even if this parameter is currently never used for manual vacuum,
>>> log_autovacuum_min_duration should be set to zero (always output)
>>> when we execute "VACUUM (or ANALYZE) VERBOSE".
>>> Is my understanding correct? If so, it sounds logical.
>>>
>>
>>Yup, that's my opinion. Now I don't know if people would mind removing
>>VACOPT_VERBOSE and replacing the control it does with log_min_duration in
>>VacuumStmt. At least both things overlap, and log_min_duration
>>offers more options than the plain VACOPT_VERBOSE.
>
> OK. I agree with you.
> Please re-implement as you are thinking.

OK will do that today.

>>> If we can abolish the VERBOSE option,
>>> I think it's ideal to prepare a new parameter like
>>> log_min_duration_vacuum (and log_min_duration_analyze)
>>> integrating the "VERBOSE feature" and "log_autovacuum_min_duration".
>>>
>>
>>What I think you are proposing here is a VERBOSE option that hypothetically
>>gets activated if a manual VACUUM takes more than a certain amount of time
>>specified by those parameters. I doubt this would be useful. In any case
>>this is unrelated to this patch.
>
> I suspect manual vacuum often runs as a "semi-auto vacuum" driven
> by a job scheduler, cron, etc. in actual maintenance cases.
> Whether auto or manual, I think it's important to output
> their logs through the same mechanism.
>
> Sorry, I seem to have wandered from the subject.

No problem. That's a constructive discussion :)
-- 
Michael




Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-12 Thread Naoya Anzai
>> You mean that ...
>> log_autovacuum_min_duration assumes the role of VACOPT_VERBOSE.
>> Even if this parameter is currently never used for manual vacuum,
>> log_autovacuum_min_duration should be set to zero (always output)
>> when we execute "VACUUM (or ANALYZE) VERBOSE".
>> Is my understanding correct? If so, it sounds logical.
>>
>
>Yup, that's my opinion. Now I don't know if people would mind removing
>VACOPT_VERBOSE and replacing the control it does with log_min_duration in
>VacuumStmt. At least both things overlap, and log_min_duration
>offers more options than the plain VACOPT_VERBOSE.

OK. I agree with you.
Please re-implement as you are thinking.

>> If we can abolish the VERBOSE option,
>> I think it's ideal to prepare a new parameter like
>> log_min_duration_vacuum (and log_min_duration_analyze)
>> integrating the "VERBOSE feature" and "log_autovacuum_min_duration".
>>
>
>What I think you are proposing here is a VERBOSE option that hypothetically
>gets activated if a manual VACUUM takes more than a certain amount of time
>specified by those parameters. I doubt this would be useful. In any case
>this is unrelated to this patch.

I suspect manual vacuum often runs as a "semi-auto vacuum" driven
by a job scheduler, cron, etc. in actual maintenance cases.
Whether auto or manual, I think it's important to output
their logs through the same mechanism.

Sorry, I seem to have wandered from the subject.

Naoya




Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2015-02-12 Thread Michael Paquier
On Sun, Feb 8, 2015 at 10:05 AM, Andreas Karlsson  wrote:
> On 01/30/2015 07:48 AM, Michael Paquier wrote:
>>
>> Looking at the latest patch, it seems that in
>> AlterTableGetLockLevel@tablecmds.c we ought to put AT_ReAddConstraint,
>> AT_AddConstraintRecurse and AT_ProcessedConstraint under the same
>> banner as AT_AddConstraint. Thoughts?
>
>
> A new version of the patch is attached which treats them as the same for
> locking. I think it is correct and improves readability to do so.

Well then, let's switch it to "Ready for committer". I am also moving
this entry to the next CF with the same status.
-- 
Michael




Re: [HACKERS] Commit fest 2015-12 enters money time

2015-02-12 Thread Michael Paquier
On Thu, Jan 15, 2015 at 4:05 PM, Michael Paquier
 wrote:
> We are soon entering the money time for this CF. The last month has
> been mainly a vacation period, the progress being rather phantom-like on
> many fronts; still, there are a couple of patches that are marked as ready
> for committer:
> - Foreign table inheritance , whose first patch has been committed
> - speedup tidbitmap patch: cache page
> - pg_basebackup vs. Windows and tablespaces (Extend base backup to
> include symlink file used to restore symlinks)
> - If pg_hba.conf has changed since last reload, emit a HINT to the
> client and server log
> - Add restore_command_retry_interval option to control timeout of
> restore_command nonzero status code
> - documentation update for doc/src/sgml/func.sgml
> - Reducing lock strength of trigger and foreign key DDL
>
> I am going to make a first pass on all the patches to check their
> status, and please note that the patches waiting for some input from
> their authors will be marked as returned with feedback.

In order to move on to the next CF, I am going to go through all the
remaining patches and update their status accordingly. And sorry for
slacking a bit regarding that. If you wish to complain, of course feel
free to post messages on this thread or on the thread related to the
patch itself.
Regards,
-- 
Michael




Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

2015-02-12 Thread Andres Freund
On 2015-02-12 11:44:05 -0800, Sergey Konoplev wrote:
> On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund  
> wrote:
> > This obviously should not be the case. I'll have a look in a couple of 
> > hours. Until then you can likely  just work around the problem by creating 
> > the archive_status directory.
> 
> Thank you. Just let me know if you need some extra info or debugging.

No need for debugging. It's plainly and simply a (cherry-pick) conflict I
resolved wrongly during backpatching. 9.3, 9.4 and master do not have
that problem. That whole fix was quite painful because every single
release had significantly different code :(. pg_basebackup/ is pretty
messy.
I'm not sure why my testsuite didn't trigger that problem. Possibly
because a retry makes things work :(

Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
releases don't have pg_receivexlog) and can quite easily be worked
around by creating the archive_status directory.

If you want to fix it locally, you just need to replace
ReceiveXlogStream(conn, startpos, timeline, NULL, basedir,
  stop_streaming, 
standby_message_timeout, false, true);
by
ReceiveXlogStream(conn, startpos, timeline, NULL, basedir,
  stop_streaming, 
standby_message_timeout, false, false);

Yes, that and pretty much all other functions in that directory have too
many parameters.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 6:16 PM, Tom Lane  wrote:
> I wrote:
>> Christoph Berg  writes:
>>> gcc5 is lurking in Debian experimental, and it's breaking initdb.
>
>> Yeah, I just heard the same about Red Hat as well:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1190978
>> Not clear if it's an outright compiler bug or they've just found some
>> creative new way to make an optimization assumption we're violating.
>
> Apparently, it's the former.  See
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1190978#c3
>
> I will be unamused if the gcc boys try to make an argument that they
> did some valid optimization here.

You're new here, aren't you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-12 Thread Tom Lane
I wrote:
> Christoph Berg  writes:
>> gcc5 is lurking in Debian experimental, and it's breaking initdb.

> Yeah, I just heard the same about Red Hat as well:
> https://bugzilla.redhat.com/show_bug.cgi?id=1190978
> Not clear if it's an outright compiler bug or they've just found some
> creative new way to make an optimization assumption we're violating.

Apparently, it's the former.  See

https://bugzilla.redhat.com/show_bug.cgi?id=1190978#c3

I will be unamused if the gcc boys try to make an argument that they
did some valid optimization here.

regards, tom lane




Re: [HACKERS] How about to have relnamespace and relrole?

2015-02-12 Thread Jim Nasby

On 2/12/15 5:28 AM, Kyotaro HORIGUCHI wrote:

Hello, I changed the subject.

This mail is to address the point at hand, preparing for
registering this commitfest.

15 17:29:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
 wrote in
<20150204.172914.52110711.horiguchi.kyot...@lab.ntt.co.jp>

Tue, 03 Feb 2015 10:12:12 -0500, Tom Lane  wrote in 
<2540.1422976...@sss.pgh.pa.us>

I'm not really excited about that.  That line of thought would imply
that we should have "reg*" types for every system catalog, which is
surely overkill.


Mmm. I suppose "for every OID usage", not "every system catalog",
but I agree on the whole. There's no boundary that everyone would
agree on. Perhaps it depends on how often the average DBA looks
into catalogs which have OIDs pointing at another catalog that they
want to see in human-readable form, without joins if possible.

I very roughly counted how often the OIDs of catalogs are referred
to from other catalogs.


1. Expected frequency of use

...

For that reason, although the current policy of deciding whether
to have reg* types seems to be whether they have schema-qualified
names, I think regrole and regnamespace are valuable to have.


Perhaps schema qualification was the original intent, but I think at 
this point everyone uses them for convenience. Why would I want to type 
out JOIN pg_namespace n ON n.oid = blah.???namespace when I could simply 
do ???namespace::regnamespace?



2. Anticipated un-optimizability

Tom pointed out that these reg* types prevent the planner from
optimizing the query, so we should refrain from having such
machinery. This would have been a long-standing issue, but reg*s
seem to be rather faster than joining the corresponding catalogs
for a moderate number of result rows. This raises another,
more annoying issue.


3. Potential breakage of MVCC

The other issue Tom pointed out is the potential breakage of MVCC by
using these reg* types. The syscache is invalidated on updates, so
this doesn't seem to be a problem in READ COMMITTED mode, but it
breaks SERIALIZABLE mode. IMHO it is not so serious a problem
as long as such inconsistency occurs only on system catalogs and
it is explicitly documented together with the un-optimizability
issue. I'll add a patch for this later.


The way I look at it, all these arguments went out the window when 
regclass was introduced.


The reality is that reg* is *way* more convenient than manual joins. The 
arguments about optimization and MVCC presumably apply to all reg* 
casts, which means that ship has long since sailed.


If we offered views that made it easier to get at this data then maybe 
this wouldn't matter so much. But DBAs don't want to join 3 catalogs 
together to try and answer a simple question.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] Help me! Why did the slave stop suddenly?

2015-02-12 Thread hailong Li
Hi, dear pgsql-hackers


1. Environment

DB Master

$ cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m

$ uname -av
Linux l-x1.xx.cnx 3.14.29-3.centos6.x86_64 #1 SMP Tue Jan 20 17:48:32
CST 2015 x86_64 x86_64 x86_64 GNU/Linux

$ psql -U postgres
psql (9.3.5)
Type "help" for help.

postgres=# select version();

version
--
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)

$ pg_config
BINDIR = /opt/pg93/bin
DOCDIR = /opt/pg93/share/doc/postgresql
HTMLDIR = /opt/pg93/share/doc/postgresql
INCLUDEDIR = /opt/pg93/include
PKGINCLUDEDIR = /opt/pg93/include/postgresql
INCLUDEDIR-SERVER = /opt/pg93/include/postgresql/server
LIBDIR = /opt/pg93/lib
PKGLIBDIR = /opt/pg93/lib/postgresql
LOCALEDIR = /opt/pg93/share/locale
MANDIR = /opt/pg93/share/man
SHAREDIR = /opt/pg93/share/postgresql
SYSCONFDIR = /opt/pg93/etc/postgresql
PGXS = /opt/pg93/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg93' '--with-perl' '--with-libxml'
'--with-libxslt' '--with-ossp-uuid' 'CFLAGS= -march=core2 -O2 '
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -march=core2 -O2  -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed
-Wl,-rpath,'/opt/pg93/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lpgcommon -lxslt -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 9.3.5





DB Slave

$ cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m

$ uname -av
Linux l-2.xx.cnx 3.14.31-3.centos6.x86_64 #1 SMP Mon Feb 2 15:26:04 CST
2015 x86_64 x86_64 x86_64 GNU/Linux

$ psql -U postgres
psql (9.3.5)
Type "help" for help.

postgres=# select version();

version
--
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)

postgres=# show log_line_prefix ;
   log_line_prefix
--
 [%u %d %a %h %m %p %c %l %x]
(1 row)


$ pg_config
BINDIR = /opt/pg93/bin
DOCDIR = /opt/pg93/share/doc/postgresql
HTMLDIR = /opt/pg93/share/doc/postgresql
INCLUDEDIR = /opt/pg93/include
PKGINCLUDEDIR = /opt/pg93/include/postgresql
INCLUDEDIR-SERVER = /opt/pg93/include/postgresql/server
LIBDIR = /opt/pg93/lib
PKGLIBDIR = /opt/pg93/lib/postgresql
LOCALEDIR = /opt/pg93/share/locale
MANDIR = /opt/pg93/share/man
SHAREDIR = /opt/pg93/share/postgresql
SYSCONFDIR = /opt/pg93/etc/postgresql
PGXS = /opt/pg93/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg93' '--with-perl' '--with-libxml'
'--with-libxslt' '--with-ossp-uuid' 'CFLAGS= -march=core2 -O2 '
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -march=core2 -O2  -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed
-Wl,-rpath,'/opt/pg93/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lpgcommon -lxslt -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 9.3.5



2. The DB log in the Slave's log_directory

[2015-02-05 15:38:51.406 CST 2328 54d08abc.918 6 0]WARNING:  will not overwrite a used ItemId
[2015-02-05 15:38:51.406 CST 2328 54d08abc.918 7 0]CONTEXT:  xlog
redo insert: rel 38171461/16384/57220350; tid 1778398/9
[2015-02-05 15:38:51.406 CST 2328 54d08abc.918 8 0]PANIC:
heap_insert_redo: failed to add tuple
[2015-02-05 15:38:51.406 CST 2328 54d08abc.918 9 0]CONTEXT:  xlog
redo insert: rel 38171461/16384/57220350; tid 1778398/9
[2015-02-05 15:38:51.765 CST 2320 54d08abb.910 6 0]LOG:  startup
process (PID 2328) was terminated by signal 6: Aborted
[2015-02-05 15:38:51.765 CST 2320 54d08abb.910 7 0]LOG:
terminating any other active server processes
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST
61450 54d31d48.f00a 3 0]WARNING:  terminating connection because of
crash of another server process
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST
61450 54d31d48.f00a 4 0]DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST
61450 54d31d48.f00a 5 0]HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST
51208 54d315b6.c808 7 0]WARNING:  terminating connection because of
crash of another server process
[DBuse

Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Rodrigo Gonzalez
On 12/2/15 18:29, Tom Lane wrote:
> Ravi Kiran  writes:
>> I am sorry for the late reply, when I  disabled the hash join command
>> "enable_hashjoin=off" in the postgresql.conf file, it was not working. But
>> I when I used the command "set enable_hashjoin=off" command in the back
>> end. It worked.
>>  I am not able to understand why it did not get disabled when I changed it
>> in the postgresql file.
> 
> The two plausible explanations for that are (1) you didn't do a reload
> or (2) you forgot to remove the '#' comment marker in the file's entry.

Or you changed the wrong postgresql.conf file

Regards

Rodrigo Gonzalez





Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Jim Nasby

On 2/12/15 3:34 PM, Ravi Kiran wrote:

Sorry for any inconvenience caused, but as David G. Johnston
said, I was trying to change how PostgreSQL works and was not able
to figure out how it should be done. I will make sure it is clear
next time. Thank you very much.


And we're glad for the input. But we get a tremendous amount of email, 
so it's best if posts go to the right place. Just for future reference.



@Tom lane   Sir, I forgot to remove the # comment marker in the file's
entry, Thank you.


Glad you were able to fix the problem. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Jim Nasby

On 2/12/15 3:20 PM, David G Johnston wrote:

>>Does "show enable_hashjoin" say it's off?  If not, I think you must've
>>fat-fingered the postgresql.conf change somehow.

>
>For future reference, posts like this belong on pgsql-performance.



>but postgres is still using the hash join algorithm even after modifying
>the postgresql code

To be fair given the original post, and some other recent posts by the same
author, the question is not "why is my query performing slowly" but rather
"I'm trying to change how PostgreSQL works and cannot figure out how to
properly enable and disable algorithms".  -hackers seems like the proper
forum though the OP could give more context as to his overarching goals to
make that more clear to readers.


-hackers is for discussion directly related to developing Postgres 
itself. This email was a request for help, and had nothing to do with 
actual development.


I'm not trying to dismiss the importance of the request; it is 
important. But the proper forum for it was -general or -performance.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Ravi Kiran
Sorry for any inconvenience caused, but as David G. Johnston said, I was
trying to change how PostgreSQL works and was not able to figure out how it
should be done. I will make sure it is clear next time. Thank you very much.

@Tom Lane: Sir, I forgot to remove the '#' comment marker in the file's
entry. Thank you.

On Fri, Feb 13, 2015 at 2:50 AM, David G Johnston <
david.g.johns...@gmail.com> wrote:

> Jim Nasby-5 wrote
> > On 2/10/15 9:29 AM, Tom Lane wrote:
> >> Ravi Kiran <
>
> > ravi.kolanpaka@
>
> > > writes:
> >>> yes sir, I did try the pg_ctl reload command, but its still using the
> >>> hash
> >>> join algorithm and not the nested loop algorithm. I even restarted the
> >>> server, even then its still using the hash join algorithm
> >>
> >> Does "show enable_hashjoin" say it's off?  If not, I think you must've
> >> fat-fingered the postgresql.conf change somehow.
> >
> > For future reference, posts like this belong on pgsql-performance.
>
>
> > but postgres is still using the hash join algorithm even after modifying
> > the postgresql code
>
> To be fair given the original post, and some other recent posts by the same
> author, the question is not "why is my query performing slowly" but rather
> "I'm trying to change how PostgreSQL works and cannot figure out how to
> properly enable and disable algorithms".  -hackers seems like the proper
> forum though the OP could give more context as to his overarching goals to
> make that more clear to readers.
>
> David J.
>
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/enabling-nestedloop-and-disabling-hashjon-tp5837275p5837728.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Tom Lane
Ravi Kiran  writes:
> I am sorry for the late reply. When I disabled hash joins with
> "enable_hashjoin=off" in the postgresql.conf file, it was not working. But
> when I used the "set enable_hashjoin=off" command in the backend, it
> worked.
> I am not able to understand why it did not get disabled when I changed it
> in the postgresql.conf file.

The two plausible explanations for that are (1) you didn't do a reload
or (2) you forgot to remove the '#' comment marker in the file's entry.

regards, tom lane




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread David G Johnston
Jim Nasby-5 wrote
> On 2/10/15 9:29 AM, Tom Lane wrote:
>> Ravi Kiran <

> ravi.kolanpaka@

> > writes:
>>> yes sir, I did try the pg_ctl reload command, but its still using the
>>> hash
>>> join algorithm and not the nested loop algorithm. I even restarted the
>>> server, even then its still using the hash join algorithm
>>
>> Does "show enable_hashjoin" say it's off?  If not, I think you must've
>> fat-fingered the postgresql.conf change somehow.
> 
> For future reference, posts like this belong on pgsql-performance.


> but postgres is still using the hash join algorithm even after modifying
> the postgresql code

To be fair given the original post, and some other recent posts by the same
author, the question is not "why is my query performing slowly" but rather
"I'm trying to change how PostgreSQL works and cannot figure out how to
properly enable and disable algorithms".  -hackers seems like the proper
forum though the OP could give more context as to his overarching goals to
make that more clear to readers.

David J.




--
View this message in context: 
http://postgresql.nabble.com/enabling-nestedloop-and-disabling-hashjon-tp5837275p5837728.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Ravi Kiran
I am sorry for the late reply. When I disabled hash joins with
"enable_hashjoin=off" in the postgresql.conf file, it was not working. But
when I used the "set enable_hashjoin=off" command in the backend, it
worked.
I am not able to understand why it did not get disabled when I changed it
in the postgresql.conf file.

On Fri, Feb 13, 2015 at 2:34 AM, Jim Nasby  wrote:

> On 2/10/15 9:29 AM, Tom Lane wrote:
>
>> Ravi Kiran  writes:
>>
>>> yes sir, I did try the pg_ctl reload command, but its still using the
>>> hash
>>> join algorithm and not the nested loop algorithm. I even restarted the
>>> server, even then its still using the hash join algorithm
>>>
>>
>> Does "show enable_hashjoin" say it's off?  If not, I think you must've
>> fat-fingered the postgresql.conf change somehow.
>>
>
> For future reference, posts like this belong on pgsql-performance.
>
> The other possibility is that the query estimates are so high that the
> setting doesn't matter. When you set any of the enable_* settings to off,
> all that really happens is the planner adds a cost of 10M to those nodes
> when it's planning. Normally that's enough to toss those plans out, but in
> extreme cases the cost estimates will still come up with the un-desired
> plan.
>
> Can you post EXPLAIN ANALYZE output with the setting on and off? Or at
> least plain EXPLAIN output.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 3:52 PM, Amit Kapila  wrote:
>> Probably not, because many queries will scan multiple relations, and
>> we want to do all of this work just once per query.
>
> By this, do you mean to say that if there is any parallel-unsafe
> expression (function call) in the query, then we won't parallelize any
> part of the query, and if so, why is that mandatory?

Because of stuff like set_config() and txid_current(), which will fail
outright in parallel mode.  Also because the user may have defined a
function that updates some other table in the database, which will
also fail outright if called in parallel mode.  Instead of failing, we
want those kinds of things to fall back to a non-parallel plan.

> Can't we parallelize a scan on a particular relation if all the
> expressions in which that relation is involved are parallel-safe?

No, because the execution of that node can be interleaved in arbitrary
fashion with all the other nodes in the plan tree.  Once we've got
parallel mode active, all the related prohibitions apply to
*everything* we do thereafter, not just that one node.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Jim Nasby

On 2/10/15 9:29 AM, Tom Lane wrote:

Ravi Kiran  writes:

yes sir, I did try the pg_ctl reload command, but it's still using the hash
join algorithm and not the nested loop algorithm. I even restarted the
server, even then it's still using the hash join algorithm


Does "show enable_hashjoin" say it's off?  If not, I think you must've
fat-fingered the postgresql.conf change somehow.


For future reference, posts like this belong on pgsql-performance.

The other possibility is that the query estimates are so high that the 
setting doesn't matter. When you set any of the enable_* settings to 
off, all that really happens is the planner adds a cost of 10M to those 
nodes when it's planning. Normally that's enough to toss those plans 
out, but in extreme cases the cost estimates will still come up with the 
un-desired plan.
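
For example (an illustrative session; the table names are made up), the
penalty is visible directly in the plan:

    SET enable_hashjoin = off;
    EXPLAIN SELECT * FROM t1 JOIN t2 USING (id);

If the planner still picks a hash join despite the setting, its cost
estimate includes the disable_cost penalty from costsize.c and shows up
as an absurdly large number, e.g.
"Hash Join  (cost=10000000001.50..10000000042.75 ...)".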


Can you post EXPLAIN ANALYZE output with the setting on and off? Or at 
least plain EXPLAIN output.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-12 Thread Tom Lane
Christoph Berg  writes:
> gcc5 is lurking in Debian experimental, and it's breaking initdb.

Yeah, I just heard the same about Red Hat as well:

https://bugzilla.redhat.com/show_bug.cgi?id=1190978

Not clear if it's an outright compiler bug or they've just found some
creative new way to make an optimization assumption we're violating.
Either way it seems clear that the find-a-page-with-free-space code is
getting into an infinite loop whereby it keeps extending the FSM till
it runs out of disk space.

There's a more detailed stack trace in the Red Hat report.

regards, tom lane




Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Amit Kapila
On Thu, Feb 12, 2015 at 10:07 PM, Robert Haas  wrote:
>
> On Thu, Feb 12, 2015 at 6:40 AM, Amit Kapila 
wrote:
> > If we have to go this way, then isn't it better to evaluate the same
> > when we are trying to create parallel path (something like in the
> > parallel_seq scan patch - create_parallelscan_paths())?
>
> Probably not, because many queries will scan multiple relations, and
> we want to do all of this work just once per query.

By this, do you mean to say that if there is any parallel-unsafe
expression (function call) in the query, then we won't parallelize any
part of the query, and if so, why is that mandatory?
Can't we parallelize a scan on a particular relation if all the
expressions in which that relation is involved are parallel-safe?

> Also, when
> somebody adds another parallel node (e.g. parallel hash join) that
> will need this same information.
>

I think we should be able to handle this even if we have per-relation
information (something like: don't parallelize a node/path if any lower
node is not parallel).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-12 Thread Christoph Berg
Hi,

gcc5 is lurking in Debian experimental, and it's breaking initdb.
There's bug reports for 9.1 and 9.4:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778070 (9.1)
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778071 (9.4)
but I could reproduce it with 9.5devel (from last week) as well:

gcc (Debian 5-20150205-1) 5.0.0 20150205 (experimental) [trunk revision 220455]


make[2]: Leaving directory '/srv/projects/postgresql/pg/master/src/common'
../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check 
--top-builddir=../../..--dlpath=.  --schedule=./parallel_schedule   
== removing existing temp installation==
== creating temporary installation==
== initializing database system   ==

pg_regress: initdb failed
Examine /srv/projects/postgresql/pg/master/src/test/regress/log/initdb.log for 
the reason.
Command was: 
"/srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/initdb"
 -D "/srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/data" -L 
"/srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/install//usr/local/pgsql/share"
 --noclean --nosync > 
"/srv/projects/postgresql/pg/master/src/test/regress/log/initdb.log" 2>&1
GNUmakefile:138: recipe for target 'check' failed


src/test/regress $ cat log/initdb.log
Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "cbe".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  de_DE.utf8
  CTYPE:de_DE.utf8
  MESSAGES: C
  MONETARY: de_DE.utf8
  NUMERIC:  de_DE.utf8
  TIME: de_DE.utf8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "german".

Data page checksums are disabled.

creating directory 
/srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... sysv
creating configuration files ... ok
creating template1 database in 
/srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/data/base/1 ... 
FATAL:  could not extend file "base/1/2617_fsm": wrote only 4096 of 8192 bytes 
at block 46197
HINT:  Check free disk space.
PANIC:  cannot abort transaction 1, it was already committed
Aborted (core dumped)


src/test/regress $ ls -al tmp_check/data/base/1/
total 34156376
drwx-- 2 cbe cbe   4096 Feb 12 20:04 ./
drwx-- 3 cbe cbe   4096 Feb 12 19:55 ../
-rw--- 1 cbe cbe  40960 Feb 12 20:04 1247
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.1
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.2
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.3
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.4
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.5
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.6
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1247_fsm.7
-rw--- 1 cbe cbe   59138048 Feb 12 20:04 1247_fsm.8
-rw--- 1 cbe cbe  49152 Feb 12 20:04 1249
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.1
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.2
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.3
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.4
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.5
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.6
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.7
-rw--- 1 cbe cbe   59138048 Feb 12 20:04 1249_fsm.8
-rw--- 1 cbe cbe 540672 Feb 12 20:03 1255
-rw--- 1 cbe cbe 1073741824 Feb 12 19:55 1255_fsm
-rw--- 1 cbe cbe 1073741824 Feb 12 19:55 1255_fsm.1
-rw--- 1 cbe cbe 1073741824 Feb 12 19:55 1255_fsm.2
-rw--- 1 cbe cbe 1073741824 Feb 12 19:55 1255_fsm.3
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1255_fsm.4
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1255_fsm.5
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1255_fsm.6
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1255_fsm.7
-rw--- 1 cbe cbe   59138048 Feb 12 20:03 1255_fsm.8
-rw--- 1 cbe cbe  16384 Feb 12 20:04 1259
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.1
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.2
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.3
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.4
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.5
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.6
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.7
-rw--- 1 cbe cbe   59138048 Feb 12 20:04 1259_fsm.8
-rw--- 1 cbe cbe  0 Feb 12 20:04 2604
-rw--- 

Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

2015-02-12 Thread Sergey Konoplev
On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund  wrote:
> This obviously should not be the case. I'll have a look in a couple of hours. 
> Until then you can likely  just work around the problem by creating the 
> archive_status directory.

Thank you. Just let me know if you need some extra info or debugging.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray...@gmail.com




Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

2015-02-12 Thread Andres Freund
Hi.

This obviously should not be the case. I'll have a look in a couple of hours. 
Until then you can likely  just work around the problem by creating the 
archive_status directory.
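
(Given the archive layout shown elsewhere in this thread, that would be
something like: mkdir -p /mnt/archive/wal/archive_status)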
-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

2015-02-12 Thread Sergey Konoplev
On Thu, Feb 12, 2015 at 11:13 AM, Andres Freund  wrote:
>>I started getting these errors after upgrading from 9.2.8 to 9.2.10.
>>Is it something critical that requires a version downgrade, or can I
>>just ignore those errors?
>
> What errors are you getting in precisely which circumstances? You're using 
> pg_receivexlog?

Errors like this one

pg_receivexlog: could not create archive status file
"/mnt/archive/wal/archive_status/000402AF00B7.done": No
such file or directory
pg_receivexlog: disconnected..

on

Linux xyz 3.2.0-76-generic #111-Ubuntu SMP
PostgreSQL 9.2.10

Yes, I use pg_receivexlog. I also use a wrapper/watchdog script around
pg_receivexlog which tracks failures and restarts the latter.

The WAL files time correlates with the pg_receivexlog failures.

postgres@xyz:~$ ls -ltr /mnt/archive/wal/ | tail
-rw--- 1 postgres postgres 16777216 Feb 12 10:58 000402B60011
-rw--- 1 postgres postgres 16777216 Feb 12 11:02 000402B60012
-rw--- 1 postgres postgres 16777216 Feb 12 11:06 000402B60013
-rw--- 1 postgres postgres 16777216 Feb 12 11:11 000402B60014
-rw--- 1 postgres postgres 16777216 Feb 12 11:15 000402B60015
-rw--- 1 postgres postgres 16777216 Feb 12 11:19 000402B60016
-rw--- 1 postgres postgres 16777216 Feb 12 11:23 000402B60017
-rw--- 1 postgres postgres 16777216 Feb 12 11:27 000402B60018
-rw--- 1 postgres postgres 16777216 Feb 12 11:30 000402B60019
-rw--- 1 postgres postgres 16777216 Feb 12 11:32
000402B6001A.partial

postgres@xyz:~$ cat /var/log/pgcookbook/manage_pitr-wal.log | tail
Thu Feb 12 11:15:18 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/000402B60015.done":
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:19:33 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/000402B60016.done":
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:23:38 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/000402B60017.done":
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:27:32 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/000402B60018.done":
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:30:34 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file "/mnt/archive/wal/archive_status/000402B60019.done":
No such file or directory
pg_receivexlog: disconnected..

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray...@gmail.com




Re: [HACKERS] What exactly is our CRC algorithm?

2015-02-12 Thread Heikki Linnakangas

On 02/11/2015 04:20 PM, Abhijit Menon-Sen wrote:

At 2015-02-11 13:20:29 +0200, hlinnakan...@vmware.com wrote:


I don't follow. I didn't change configure at all, compared to your
patch.


OK, I extrapolated a little too much. Your patch didn't actually include
crc_instructions.h;


Oh, I'm sorry. Here's the complete patch with crc_instructions.h
- Heikki

From bd4a90d339e21cd6ac517d077fe3a76abb5ef37d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 10 Feb 2015 14:26:24 +0200
Subject: [PATCH 1/1] Use Intel SSE4.2 CRC instructions where available.

On x86, perform a runtime check to see if we're running on a CPU that
supports SSE 4.2. If we are, we can use the special crc32b and crc32q
instructions for the CRC-32C calculations. That greatly speeds up CRC
calculation.

Abhijit Menon-Sen, reviewed by Andres Freund and me.
---
 configure   |   2 +-
 configure.in|   2 +-
 src/common/pg_crc.c | 109 +---
 src/include/common/pg_crc.h |  12 +++-
 src/include/pg_config.h.in  |   3 +
 src/include/port/crc_instructions.h | 121 
 6 files changed, 235 insertions(+), 14 deletions(-)
 create mode 100644 src/include/port/crc_instructions.h

diff --git a/configure b/configure
index fa271fe..c352128 100755
--- a/configure
+++ b/configure
@@ -9204,7 +9204,7 @@ fi
 done
 
 
-for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in atomic.h cpuid.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index e6a49d1..588d626 100644
--- a/configure.in
+++ b/configure.in
@@ -1032,7 +1032,7 @@ AC_SUBST(UUID_LIBS)
 ##
 
 dnl sys/socket.h is required by AC_FUNC_ACCEPT_ARGTYPES
-AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
+AC_CHECK_HEADERS([atomic.h cpuid.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
 
 # On BSD, test for net/if.h will fail unless sys/socket.h
 # is included first.
diff --git a/src/common/pg_crc.c b/src/common/pg_crc.c
index eba32d3..b6db749 100644
--- a/src/common/pg_crc.c
+++ b/src/common/pg_crc.c
@@ -21,25 +21,113 @@
 
 #include "common/pg_crc.h"
 
-/* Accumulate one input byte */
-#ifdef WORDS_BIGENDIAN
-#define CRC8(x) pg_crc32c_table[0][((crc >> 24) ^ (x)) & 0xFF] ^ (crc << 8)
+#ifdef PG_HAVE_CRC32C_INSTRUCTIONS
+static pg_crc32 pg_comp_crc32c_hw(pg_crc32 crc, const void *data, size_t len);
+#endif
+
+#if !defined(PG_HAVE_CRC32C_INSTRUCTIONS) || defined(PG_CRC32C_INSTRUCTIONS_NEED_RUNTIME_CHECK)
+static pg_crc32 pg_comp_crc32c_sb8(pg_crc32 crc, const void *data, size_t len);
+static const uint32 pg_crc32c_table[8][256];
+#endif
+
+#ifdef PG_CRC32C_INSTRUCTIONS_NEED_RUNTIME_CHECK
+/*
+ * When built with support for CRC instructions, but we need to perform a
+ * run-time check to determine whether we can actually use them,
+ * pg_comp_crc32c is a function pointer. It is initialized to
+ * pg_comp_crc32c_choose, which performs the runtime check, and changes the
+ * function pointer so that subsequent calls go directly to the hw-accelerated
+ * version, or the fallback slicing-by-8 version.
+ */
+static pg_crc32
+pg_comp_crc32c_choose(pg_crc32 crc, const void *data, size_t len)
+{
+	if (pg_crc32_instructions_runtime_check())
+		pg_comp_crc32c = pg_comp_crc32c_hw;
+	else
+		pg_comp_crc32c = pg_comp_crc32c_sb8;
+
+	return pg_comp_crc32c(crc, data, len);
+}
+
+pg_crc32 (*pg_comp_crc32c)(pg_crc32 crc, const void *data, size_t len) = pg_comp_crc32c_choose;
+
 #else
-#define CRC8(x) pg_crc32c_table[0][(crc ^ (x)) & 0xFF] ^ (crc >> 8)
+/*
+ * No need for a runtime check. Compile directly with the hw-accelerated
+ * or the slicing-by-8 version. (We trust that the compiler
+ * is smart enough to inline it here.)
+ */
+pg_crc32
+pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len)
+{
+#if
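
For reference, on x86 the runtime check described above can be done with
the compiler's cpuid intrinsics. A minimal sketch, assuming GCC's
cpuid.h (this is not the exact code from the patch):

#include <cpuid.h>

static int
pg_crc32_instructions_runtime_check(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID leaf 1 reports feature flags; SSE 4.2 is bit 20 of ECX */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx & bit_SSE4_2) != 0;
}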

Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

2015-02-12 Thread Andres Freund
On February 12, 2015 8:11:05 PM CET, Sergey Konoplev  wrote:
>Hi,
>
>On Mon, Jan 5, 2015 at 4:34 AM, Andres Freund 
>wrote:
>>> >> pg_receivexlog: could not create archive status file
>>> >> "mmm/archive_status/00010003.done": No such file
>or
>>> >> directory
>>> >
>>> > Dang. Stupid typo. And my tests didn't catch it, because I had
>>> > archive_directory in the target directory :(
>
>I started getting these errors after upgrading from 9.2.8 to 9.2.10.
>Is it something critical that requires a version downgrade, or can I
>just ignore those errors?

What errors are you getting in precisely which circumstances? You're using 
pg_receivexlog?

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

2015-02-12 Thread Sergey Konoplev
Hi,

On Mon, Jan 5, 2015 at 4:34 AM, Andres Freund  wrote:
>> >> pg_receivexlog: could not create archive status file
>> >> "mmm/archive_status/00010003.done": No such file or
>> >> directory
>> >
>> > Dang. Stupid typo. And my tests didn't catch it, because I had
>> > archive_directory in the target directory :(

I started getting these errors after upgrading from 9.2.8 to 9.2.10.
Is it something critical that requires a version downgrade, or can I just
ignore those errors?

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray...@gmail.com




Re: binworld and install-binworld targets - was Re: [HACKERS] Release note bloat is getting out of hand

2015-02-12 Thread Peter Eisentraut
On 2/4/15 8:20 PM, Andrew Dunstan wrote:
> 
> On 02/04/2015 06:53 PM, Tom Lane wrote:
>>> Or maybe use a make variable, like NO_DOC.  I think that's preferable to
>>> adding more targets.
>> Unless we can come up with a new target name that obviously means
>> "world minus docs", the make-variable idea may be the best.

> I'm not terribly keen on this. If you don't like "binworld", how about
> "world-no-docs"?

I think using options of some kind instead of top-level targets is
preferable.
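
That is, roughly (reusing the NO_DOC variable name floated upthread,
purely as an example):

    make world NO_DOC=1
    make install-world NO_DOC=1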

If we add world-no-docs, should we also add install-world-no-docs,
installdirs-world-no-docs, uninstall-world-no-docs, check-world-no-docs,
installcheck-world-no-docs, clean-no-docs, distclean-no-docs, etc.?
This would get out of hand.

Also, it's harder to port things like that to other build systems,
including the secondary ones we already have.

We already have configure options to decide that we don't want to deal
with part of the tree.  (There is no make world-no-python.)  We used to
have support in configure to not build part of the docs.  We could
resurrect that if that's what people want.  I'd actually prefer that
even over a make variable.






Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders

2015-02-12 Thread Peter Eisentraut
On 2/3/15 11:00 AM, Robert Haas wrote:
> Crazy ideas: Could we make wal_level something other than
> PGC_POSTMASTER?  PGC_SIGHUP would be nice...  Could we, maybe, even
> make it a derived value rather than one that is explicitly configured?
>  Like, if you set max_wal_senders>0, you automatically get
> wal_level=hot_standby?  If you register a logical replication slot,
> you automatically get wal_level=logical?

We could probably make wal_level changeable at run-time if we somehow
recorded the point at which it was changed, as you describe later (or
even brute-force it by forcing a checkpoint every time it is changed,
which is not worse than what we require now (or even just write out a
warning that the setting is not effective until after a checkpoint)).

But that still leaves max_wal_senders (and arguably
max_replication_slots) requiring a restart before replication can start.
 I don't see a great plan for those on the horizon.

To me, the restart requirement is the killer.

That there are so many interwoven settings isn't great either, but there
will always be more options, and all we can do is manage it.





Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Fabrízio de Royes Mello
On Thu, Feb 12, 2015 at 3:07 PM, Thom Brown  wrote:
>
> On 12 February 2015 at 16:40, Heikki Linnakangas 
wrote:
>>
>> On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:
>>>
>>> Thanks for answer.
>>> Now it seems to be applied correctly.
>>
>>
>> * Documentation is missing.
>
>
> Anastasia provided a documentation patch in the first email.
>

Yeah, but it's good practice to send all the patches together. It will
make the reviewers' lives easier ;-)

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Thom Brown
On 12 February 2015 at 16:40, Heikki Linnakangas 
wrote:

> On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:
>
>> Thanks for answer.
>> Now it seems to be applied correctly.
>>
>
> * Documentation is missing.
>

Anastasia provided a documentation patch in the first email.

Thom


Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Robert Haas
On Wed, Feb 11, 2015 at 3:21 PM, Robert Haas  wrote:
> I think we may want a dedicated parallel-safe property for functions
> rather than piggybacking on provolatile ...

I went through the current contents of pg_proc and tried to assess how
much parallel-unsafe stuff we've got.  I think there are actually
three categories of things: (1) functions that can be called in
parallel mode either in the worker or in the leader ("parallel safe"),
(2) functions that can be called in parallel mode in the worker, but
not in the leader ("parallel restricted"), and (3) functions that
cannot be called in parallel mode at all ("parallel unsafe").  On a
first read-through, the number of things that looked to be anything
other than parallel-safe was fairly small; many of these could be made
parallel-safe with more work, but it's unlikely to be worth the effort.

current_query() - Restricted because debug_query_string is not copied.
lo_open(), lo_close(), loread(), lowrite(), and other large object
functions - Restricted because large object state is not shared.
age(xid) - Restricted because it uses a transaction-lifespan cache
which is not shared.
now() - Restricted because transaction start timestamp is not copied.
statement_timestamp() - Restricted because statement start timestamp
is not copied.
pg_conf_load_time() - Restricted because PgReloadTime is not copied.
nextval(), currval() - Restricted because sequence-related state is not shared.
setval() - Unsafe because no data can be written in parallel mode.
random(), setseed() - Restricted because random seed state is not
shared. (We could alternatively treat setseed() as unsafe and random()
to be restricted only in sessions where setseed() has never been
called, and otherwise safe.)
pg_stat_get_* - Restricted because there's no guarantee the value
would be the same in the parallel worker as in the leader.
pg_backend_pid() - Restricted because the worker has a different PID.
set_config() - Unsafe because GUC state must remain synchronized.
pg_my_temp_schema() - Restricted because temporary namespaces aren't
shared with parallel workers.
pg_export_snapshot() - Restricted because the worker will go away quickly.
pg_prepared_statement(), pg_cursor() - Restricted because the prepared
statements and cursors are not synchronized with the worker.
pg_listening_channels() - Restricted because listening channels are
not synchronized with the worker.
pg*advisory*lock*() - Restricted because advisory lock state is not
shared with workers - and even if it were, the semantics would be hard
to reason about.
txid_current() - Unsafe because it might attempt XID assignment.
pg_logical_slot*() - Unsafe because they do all kinds of crazy stuff.

That's not a lot, and very little of it is anything you'd care about
parallelizing anyway.  I expect that the incidence of user-written
parallel-unsafe functions will be considerably higher.  I'm not sure
if this impacts the decision about how to design the facility for
assessing parallel-safety or not, but I thought it was worth sharing.
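
If we do end up with a dedicated property, the user-visible part could be
as simple as an extra keyword at function creation time; hypothetical
syntax, just to illustrate the three categories above:

    CREATE FUNCTION add_one(int) RETURNS int
        LANGUAGE sql IMMUTABLE PARALLEL SAFE
        AS $$ SELECT $1 + 1 $$;

with PARALLEL RESTRICTED and PARALLEL UNSAFE as the other two markings.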

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Heikki Linnakangas

On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:

Thanks for answer.
Now it seems to be applied correctly.


Thanks, it would be great to get this completed.

This still leaks memory with the same test query as earlier. The leak 
seems to be into a different memory context now; it used to be to the 
"GiST scan context", but now it's to the ExecutorState context. See 
attached patch which makes the leak evident.


Looking at my previous comments from August:

* What's the reason for turning GISTScanOpaqueData.pageData from an
array to a List?

* Documentation is missing.

- Heikki
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 0925e56..3768c9c 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -526,6 +526,12 @@ gistgettuple(PG_FUNCTION_ARGS)
  * It's always head of so->pageData
  */
 so->pageData = list_delete_first(so->pageData);
+{
+	static int lastreport = 0;
+	if ((lastreport++) % 1 == 0)
+		MemoryContextStats(CurrentMemoryContext);
+}
+
 PG_RETURN_BOOL(TRUE);
 			}
 



Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 6:40 AM, Amit Kapila  wrote:
> If we have to go this way, then isn't it better to evaluate the same
> when we are trying to create parallel path (something like in the
> parallel_seq scan patch - create_parallelscan_paths())?

Probably not, because many queries will scan multiple relations, and
we want to do all of this work just once per query.  Also, when
somebody adds another parallel node (e.g. parallel hash join) that
will need this same information.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Thom Brown
On 12 February 2015 at 10:40, Anastasia Lubennikova  wrote:

> Thanks for answer.
> Now it seems to be applied correctly.
>

(please avoid top-posting)

Thanks for the updated patch.  I can confirm that it now cleanly applies
and compiles fine.  I've run the tests in the SQL file you provided, and
it's behaving as expected.

Thom


Re: [HACKERS] Standby receiving part of missing WAL segment

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 9:56 AM, Thom Brown  wrote:
> On 12 February 2015 at 13:56, Robert Haas  wrote:
>>
>> On Wed, Feb 11, 2015 at 12:55 PM, Thom Brown  wrote:
>> > Today I witnessed a situation which appears to have gone down like this:
>> >
>> > - The primary server starting streaming WAL data from segment 00A8 to
>> > the
>> > standby
>> > - The standby server started receiving that data
>> > - Before 00A8 is finished, the wal sender process dies on the primary,
>> > but
>> > the archiver process continues, and 00A8 ends up being archived as usual
>> > - The primary continues to generate WAL and cleans up old WAL files from
>> > pg_xlog until 00A8 is gone.
>> > - The primary is restarted and the wal sender process is back up and
>> > running
>> > - The standby says "waiting for 00A8", which it can no longer get from
>> > the
>> > primary
>> > - 00A8 is in the standby's archive directory, but the standby is waiting
>> > for
>> > the rest of the segment from the primary via streaming replication, so
>> > doesn't check the archive
>> > - The standby is restarted
>> > - The standby goes back into recovery and eventually replays 00A8 and
>> > continues as normal.
>> >
>> > Should the standby be able to get feedback from the primary that the
>> > requested segment is no longer available, and therefore know to check
>> > its
>> > archive?
>>
>> Last time I played around with this, if the standby requested a
>> segment from the master that was no longer present there, the standby
>> would immediately get an ERROR, which it seems like would get you out
>> of trouble.  I wonder why that didn't happen in your case.
>
>
> Yeah, I've tried recreating this like so:
>
> - Primary streams to standby like usual
> - Kill -9 primary then change its port and bring it back up
> - Create traffic on primary until it no longer has the WAL file the standby
> wants, but has archived it
> - Change the port of the primary back to what the standby is trying to talk
> to
>
> But before it gets to that 4th point, the standby has gone to the archive
> for the rest of it:
>
> cp: cannot stat ‘/tmp/walarch/0001000600C8’: No such file or
> directory
> 2015-02-12 14:47:52 GMT [8280]: [1-1] user=,db=,client= FATAL:  could not
> connect to the primary server: could not connect to server: Connection
> refused
> Is the server running on host "127.0.0.1" and accepting
> TCP/IP connections on port 5488?
>
> cp: cannot stat ‘/tmp/walarch/0001000600C8’: No such file or
> directory
> 2015-02-12 14:47:57 GMT [8283]: [1-1] user=,db=,client= FATAL:  could not
> connect to the primary server: could not connect to server: Connection
> refused
> Is the server running on host "127.0.0.1" and accepting
> TCP/IP connections on port 5488?
>
> 2015-02-12 14:48:02 GMT [8202]: [6-1] user=,db=,client= LOG:  restored log
> file "0001000600C8" from archive
>
> I don't suppose this is something that was buggy in 9.3.1?

I don't know.  I don't remember a bug of that type, but I might have
missed it.  If you see the scenario again, it might help to pull a
stack trace from the master and the standby.  That might tell us
whether the master or the standby is to blame, and where the offender
is wedged.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 9:50 AM, Tom Lane  wrote:
>> My first thought is that we should form some kind of TOAST-like
>> backronym, like Serialization Avoidance Loading and Access Device
>> (SALAD) or Break-up, Read, Edit, Assemble, and Deposit (BREAD).  I
>> don't think there is anything per se wrong with the terms
>> serialization and deserialization; indeed, I used the same ones in the
>> parallel-mode stuff.  But they are fairly general terms, so it might
>> be nice to have something more specific that applies just to this
>> particular usage.
>
> Hm.  I'm not against the concept, but those particular suggestions don't
> grab me.

Fair enough.  I guess the core of my point is just that I suggest we
invent a name for this thing.  "Serialize" and "deserialize" describe
what you are doing just fine, but the mechanism itself should be
called something, I think.  When you say "varlena header" or "TOAST
pointer" that is a name for a very particular thing, not just a
general category of things you might do.  If we replaced every
instance of "TOAST pointer" to "reference to where the full value is
stored", it would be much less clear, and naming all of the related
functions would be harder.

>> This is a clever
>> representation, but it's hard to wrap your head around, and I'm not
>> sure "primary" and "secondary" are the best names, although I don't
>> have an idea as to what would be better.  I'm a bit confused, though:
>> once you give out a secondary pointer, how is it safe to write the
>> object through the primary pointer?
>
> It's no different from allowing plpgsql to update the values of variables
> of pass-by-reference types even though it has previously given out Datums
> that are pointers to them: by the time we're ready to execute an
> assignment, any query execution that had such a pointer is over and done
> with.  (This implies that cursor parameters have to be physically copied
> into the cursor's execution state, which is one of a depressingly large
> number of reasons why datumCopy() has to physically copy a deserialized
> value rather than just copying the pointer.  But otherwise it works.)

OK, I see.  So giving out a secondary pointer doesn't necessarily
preclude further changes via the primary pointer, but you'd better be
sure that you don't try until such time as all of those secondary
references are gone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC 2015 - mentors, students and admins.

2015-02-12 Thread Alexander Korotkov
Hi!

On Mon, Feb 9, 2015 at 11:52 PM, Thom Brown  wrote:

> Google Summer of Code 2015 is approaching.  I'm intending on registering
> PostgreSQL again this year.
>
> Before I do that, I'd like to have an idea of how many people are
> interested in being either a student or a mentor.
>

I'm ready to participate as a mentor again!

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Standby receiving part of missing WAL segment

2015-02-12 Thread Thom Brown
On 12 February 2015 at 13:56, Robert Haas  wrote:

> On Wed, Feb 11, 2015 at 12:55 PM, Thom Brown  wrote:
> > Today I witnessed a situation which appears to have gone down like this:
> >
> > - The primary server starting streaming WAL data from segment 00A8 to the
> > standby
> > - The standby server started receiving that data
> > - Before 00A8 is finished, the wal sender process dies on the primary,
> but
> > the archiver process continues, and 00A8 ends up being archived as usual
> > - The primary continues to generate WAL and cleans up old WAL files from
> > pg_xlog until 00A8 is gone.
> > - The primary is restarted and the wal sender process is back up and
> running
> > - The standby says "waiting for 00A8", which it can no longer get from
> the
> > primary
> > - 00A8 is in the standby's archive directory, but the standby is waiting
> for
> > the rest of the segment from the primary via streaming replication, so
> > doesn't check the archive
> > - The standby is restarted
> > - The standby goes back into recovery and eventually replays 00A8 and
> > continues as normal.
> >
> > Should the standby be able to get feedback from the primary that the
> > requested segment is no longer available, and therefore know to check its
> > archive?
>
> Last time I played around with this, if the standby requested a
> segment from the master that was no longer present there, the standby
> would immediately get an ERROR, which it seems like would get you out
> of trouble.  I wonder why that didn't happen in your case.


Yeah, I've tried recreating this like so:

- Primary streams to standby like usual
- Kill -9 primary then change its port and bring it back up
- Create traffic on primary until it no longer has the WAL file the standby
wants, but has archived it
- Change the port of the primary back to what the standby is trying to talk
to

But before it gets to that 4th point, the standby has gone to the archive
for the rest of it:

cp: cannot stat ‘/tmp/walarch/0001000600C8’: No such file or
directory
2015-02-12 14:47:52 GMT [8280]: [1-1] user=,db=,client= FATAL:  could not
connect to the primary server: could not connect to server: Connection
refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5488?

cp: cannot stat ‘/tmp/walarch/0001000600C8’: No such file or
directory
2015-02-12 14:47:57 GMT [8283]: [1-1] user=,db=,client= FATAL:  could not
connect to the primary server: could not connect to server: Connection
refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5488?

2015-02-12 14:48:02 GMT [8202]: [6-1] user=,db=,client= LOG:  restored log
file "0001000600C8" from archive

I don't suppose this is something that was buggy in 9.3.1?

Thom


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-02-12 Thread Tom Lane
Robert Haas  writes:
> On Tue, Feb 10, 2015 at 3:00 PM, Tom Lane  wrote:
>> BTW, I'm not all that thrilled with the "deserialized object" terminology.
>> I found myself repeatedly tripping up on which form was serialized and
>> which de-.  If anyone's got a better naming idea I'm willing to adopt it.

> My first thought is that we should form some kind of TOAST-like
> backronym, like Serialization Avoidance Loading and Access Device
> (SALAD) or Break-up, Read, Edit, Assemble, and Deposit (BREAD).  I
> don't think there is anything per se wrong with the terms
> serialization and deserialization; indeed, I used the same ones in the
> parallel-mode stuff.  But they are fairly general terms, so it might
> be nice to have something more specific that applies just to this
> particular usage.

Hm.  I'm not against the concept, but those particular suggestions don't
grab me.

> I found the notion of "primary" and "secondary" TOAST pointers to be
> quite confusing.  I *think* what you are doing is storing two pointers
> to the object in the object, and a pointer to the object is really a
> pointer to one of those two pointers to the object.  Depending on
> which one it is, you can write the object, or not.

There's more to it than that.  (Writing more docs is one of the to-do
items ;-).)  We could alternatively have done that with two different
va_tag values for "read write" and "read only", which indeed was my
initial intention before I thought of this dodge.  However, then you
have to figure out where to store such pointers, which is problematic
both for plpgsql variable assignment and for ExecMakeSlotContentsReadOnly,
especially the latter which would have to put any freshly-made pointer
in a long-lived context resulting in query-lifespan memory leaks.
So I early decided that the read-write pointer should live right in the
object's own context where it need not be copied when swinging the
context ownership someplace else, and later realized that there should
also be a permanent read-only pointer in there for the use of
ExecMakeSlotContentsReadOnly, and then realized that they didn't need
to have different va_tag values if we implemented the "is read-write
pointer" test as it's done in the patch.  Having only one va_tag value
not two saves cycles, I think, because there are a lot of low-level
tests that don't need to distinguish, eg VARTAG_SIZE().  However it
does make it more expensive when you do need to distinguish, so I might
reconsider that decision later.  (Since these will never go to disk,
we can whack the representation around pretty freely if needed.)
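
To sketch the shape of that idea (all names here are invented for
illustration, not the patch's):

typedef struct DeserializedObject
{
	struct DeserializedObject *rw_ptr;	/* "primary": writes allowed */
	struct DeserializedObject *ro_ptr;	/* "secondary": read-only */
	/* ... methods, owning memory context, object state ... */
} DeserializedObject;

/*
 * A Datum referencing the object really points at one of the two fields
 * above, so "is this a read-write pointer?" reduces to a test of which
 * field the Datum points at:
 */
#define POINTER_IS_READ_WRITE(p, obj) \
	((void *) (p) == (void *) &(obj)->rw_ptr)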

Also, I have hopes of allowing deserialized-object pointers to be copied
into tuples as pointers rather than by reserialization, if we can
establish that the tuple is short-lived enough that the pointer will stay
good, which would be true in a lot of cases during execution of queries by
plpgsql.  With the patch's design, a pointer so copied will automatically
be considered read-only, which I *think* is the behavior we'd need.  If it
turns out that it's okay to propagate read-write-ness through such a copy
step then that would argue in favor of using two va_tag values.

It may be that this solution is overly cute and we should just use two
tag values.  But I wanted to be sure it was possible for copying of a
pointer to automatically lose read-write-ness, in case we need to have
such a guarantee.

> This is a clever
> representation, but it's hard to wrap your head around, and I'm not
> sure "primary" and "secondary" are the best names, although I don't
> have an idea as to what would be better.  I'm a bit confused, though:
> once you give out a secondary pointer, how is it safe to write the
> object through the primary pointer?

It's no different from allowing plpgsql to update the values of variables
of pass-by-reference types even though it has previously given out Datums
that are pointers to them: by the time we're ready to execute an
assignment, any query execution that had such a pointer is over and done
with.  (This implies that cursor parameters have to be physically copied
into the cursor's execution state, which is one of a depressingly large
number of reasons why datumCopy() has to physically copy a deserialized
value rather than just copying the pointer.  But otherwise it works.)

There is more work to do to figure out how we can safely give out a
read/write pointer for cases like
hstore_var := hstore_concat(hstore_var, ...);
Aside from the question of whether hstore_concat guarantees not to trash
the value on failure, we'd have to restrict this (I think) to expressions
in which there is only one reference to the target variable and it's an
argument of the topmost function/operator.  But that's something I've not
tried to implement yet.
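
Under that restriction, of these (purely illustrative) assignments only
the first could be handed a read/write pointer:

    hstore_var := hstore_concat(hstore_var, other_var);
        -- one reference, argument of the topmost call
    hstore_var := f(hstore_concat(hstore_var, other_var));
        -- reference is not at the top level
    hstore_var := hstore_concat(hstore_var, g(hstore_var));
        -- two references to the target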

regards, tom lane



Re: [HACKERS] Logical decoding document

2015-02-12 Thread Tatsuo Ishii
>> Hi, I need help.
>> 
>> In "46.6.4.5 Change Callback"
>> 
>>Note: Only changes in user defined tables that are not unlogged
>>(see UNLOGGED) and not temporary (see TEMPORARY or TEMP) can be
>>extracted using logical decoding.
>> 
>> I cannot parse the sentence above. Maybe logical decoding does not
>> decode a table if it is an unlogged table or a temporary table?
> 
> It basically is saying that you won't see changes made to temporary
> and/or unlogged tables. But the begin/commit callbacks will still be
> called for the relevant transaction.

Oh, thanks.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] pg_check_dir comments and implementation mismatch

2015-02-12 Thread Marco Nenciarini
On 02/02/15 21:48, Robert Haas wrote:
> On Sat, Jan 31, 2015 at 8:28 AM, Marco Nenciarini
>  wrote:
>> On 30/01/15 03:54, Michael Paquier wrote:
>>> On Fri, Jan 30, 2015 at 2:49 AM, Tom Lane  wrote:
 There is at least one other bug in that function now that I look at it:
 in event of a readdir() failure, it neglects to execute closedir().
 Perhaps not too significant since all existing callers will just exit()
 anyway after a failure, but still ...
>>> I would imagine that code scanners like coverity or similar would not
>>> be happy about that. ISTM that it is better to closedir()
>>> appropriately in all the code paths.
>>>
>>
>> I've attached a new version of the patch fixing the missing closedir on
>> readdir error.
> 
> If readdir() fails and closedir() succeeds, the return will be -1 but
> errno will be 0.
> 

The attached patch should fix it.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/src/port/pgcheckdir.c b/src/port/pgcheckdir.c
index 7061893..7102f2c 100644
*** a/src/port/pgcheckdir.c
--- b/src/port/pgcheckdir.c
***
*** 22,28 
   * Returns:
   *0 if nonexistent
   *1 if exists and empty
!  *2 if exists and not empty
   *-1 if trouble accessing directory (errno reflects the error)
   */
  int
--- 22,30 
   * Returns:
   *0 if nonexistent
   *1 if exists and empty
!  *2 if exists and contains _only_ dot files
!  *3 if exists and contains a mount point
!  *4 if exists and not empty
   *-1 if trouble accessing directory (errno reflects the error)
   */
  int
*** pg_check_dir(const char *dir)
*** 32,37 
--- 34,41 
DIR*chkdir;
struct dirent *file;
booldot_found = false;
+   boolmount_found = false;
+   int readdir_errno;
  
chkdir = opendir(dir);
if (chkdir == NULL)
*** pg_check_dir(const char *dir)
*** 51,60 
{
dot_found = true;
}
else if (strcmp("lost+found", file->d_name) == 0)
{
!   result = 3; /* not empty, mount point */
!   break;
}
  #endif
else
--- 55,64 
{
dot_found = true;
}
+   /* lost+found directory */
else if (strcmp("lost+found", file->d_name) == 0)
{
!   mount_found = true;
}
  #endif
else
*** pg_check_dir(const char *dir)
*** 64,72 
}
}
  
!   if (errno || closedir(chkdir))
result = -1;/* some kind of I/O error? */
  
/* We report on dot-files if we _only_ find dot files */
if (result == 1 && dot_found)
result = 2;
--- 68,87 
}
}
  
!   if (errno)
result = -1;/* some kind of I/O error? */
  
+   /* Close chkdir and avoid overwriting the readdir errno on success */
+   readdir_errno = errno;
+   if (closedir(chkdir))
+   result = -1;/* error executing closedir */
+   else
+   errno = readdir_errno;
+ 
+   /* We report on mount point if we find a lost+found directory */
+   if (result == 1 && mount_found)
+   result = 3;
+ 
/* We report on dot-files if we _only_ find dot files */
if (result == 1 && dot_found)
result = 2;




Re: [HACKERS] Simplify sleeping while reading/writing from client

2015-02-12 Thread Heikki Linnakangas

On 02/06/2015 11:50 AM, Andres Freund wrote:

On 2015-02-05 16:45:50 +0200, Heikki Linnakangas wrote:

I propose the attached, which pulls all the wait-retry logic up to
secure_read() and secure_write(). This makes the code a lot more
understandable.


Generally a good idea. Especially if we get more ssl implementations.

But I think it'll make the already broken renegotiation handling even
worse. Because now we're always entering do_handshake in nonblocking
mode (essentially).


Good point. In the other thread, we concluded that the 
SSL_do_handshake() call isn't really needed anyway, so attached are two 
patches: 1. remove the SSL_do_handshake() calls, and 2. the previous 
patch, to simplify the waiting logic.


- Heikki

From 30e5930c9da553d504339e00a05a2a8892182f63 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 12 Feb 2015 15:53:26 +0200
Subject: [PATCH 1/2] Simplify the way OpenSSL renegotiation is initiated in
 server.

At least in all modern versions of OpenSSL, it is enough to call
SSL_renegotiate() once, and then forget about it. Subsequent SSL_write()
and SSL_read() calls will finish the handshake.

The SSL_set_session_id_context() call is unnecessary too. We only have
one SSL context, and the SSL session was created with that to begin with.
---
 src/backend/libpq/be-secure-openssl.c | 23 ---
 1 file changed, 23 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index d5f9712..d13ce33 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -624,33 +624,10 @@ be_tls_write(Port *port, void *ptr, size_t len)
 		 */
 		SSL_clear_num_renegotiations(port->ssl);
 
-		SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
-   sizeof(SSL_context));
 		if (SSL_renegotiate(port->ssl) <= 0)
 			ereport(COMMERROR,
 	(errcode(ERRCODE_PROTOCOL_VIOLATION),
 	 errmsg("SSL failure during renegotiation start")));
-		else
-		{
-			int			retries;
-
-			/*
-			 * A handshake can fail, so be prepared to retry it, but only
-			 * a few times.
-			 */
-			for (retries = 0;; retries++)
-			{
-if (SSL_do_handshake(port->ssl) > 0)
-	break;	/* done */
-ereport(COMMERROR,
-		(errcode(ERRCODE_PROTOCOL_VIOLATION),
-		 errmsg("SSL handshake failure on renegotiation, retrying")));
-if (retries >= 20)
-	ereport(FATAL,
-			(errcode(ERRCODE_PROTOCOL_VIOLATION),
-			 errmsg("could not complete SSL handshake on renegotiation, too many failures")));
-			}
-		}
 	}
 
 wloop:
-- 
2.1.4

From 6341bd76e8184c5c464c90a1ba3d99f02ef61bac Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 5 Feb 2015 16:44:13 +0200
Subject: [PATCH 2/2] Simplify waiting logic in reading from / writing to
 client.

The client socket is always in non-blocking mode, and if we actually want
blocking behaviour, we emulate it by sleeping and retrying. But we had
retry loops at different layers for reads and writes, which was confusing.
To simplify, remove all the sleeping and retrying code from the lower
levels, from be_tls_read and secure_raw_read and secure_raw_write, and put
all the logic in secure_read() and secure_write().
---
 src/backend/libpq/be-secure-openssl.c |  81 +--
 src/backend/libpq/be-secure.c | 143 ++
 src/backend/libpq/pqcomm.c|   3 +-
 src/include/libpq/libpq-be.h  |   4 +-
 4 files changed, 79 insertions(+), 152 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index d13ce33..37af6e4 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -511,14 +511,11 @@ be_tls_close(Port *port)
  *	Read data from a secure connection.
  */
 ssize_t
-be_tls_read(Port *port, void *ptr, size_t len)
+be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
 {
 	ssize_t		n;
 	int			err;
-	int			waitfor;
-	int			latchret;
 
-rloop:
 	errno = 0;
 	n = SSL_read(port->ssl, ptr, len);
 	err = SSL_get_error(port->ssl, n);
@@ -528,39 +525,15 @@ rloop:
 			port->count += n;
 			break;
 		case SSL_ERROR_WANT_READ:
+			*waitfor = WL_SOCKET_READABLE;
+			errno = EWOULDBLOCK;
+			n = -1;
+			break;
 		case SSL_ERROR_WANT_WRITE:
-			/* Don't retry if the socket is in nonblocking mode. */
-			if (port->noblock)
-			{
-errno = EWOULDBLOCK;
-n = -1;
-break;
-			}
-
-			waitfor = WL_LATCH_SET;
-
-			if (err == SSL_ERROR_WANT_READ)
-waitfor |= WL_SOCKET_READABLE;
-			else
-waitfor |= WL_SOCKET_WRITEABLE;
-
-			latchret = WaitLatchOrSocket(MyLatch, waitfor, port->sock, 0);
-
-			/*
-			 * We'll, among other situations, get here if the low level
-			 * routine doing the actual recv() via the socket got interrupted
-			 * by a signal. That's so we can handle interrupts once outside
-			 * openssl, so we don't jump out from underneath its covers. We
-			 * can check this both, when reading and writing,

Re: [HACKERS] Logical decoding document

2015-02-12 Thread Andres Freund
Hi,

On 2015-02-12 22:55:33 +0900, Tatsuo Ishii wrote:
> Hi, I need help.
> 
> In "46.6.4.5 Change Callback"
> 
>Note: Only changes in user defined tables that are not unlogged
>(see UNLOGGED) and not temporary (see TEMPORARY or TEMP) can be
>extracted using logical decoding.
> 
> I cannot parse the sentence above. Maybe logical decoding does not
> decode a table if it is an unloged table or a temporary table?

It basically is saying that you won't see changes made to temporary
and/or unlogged tables. But the begin/commit callbacks will still be
called for the relevant transaction.
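
A quick way to see this with the contrib test_decoding module (assuming a
logical slot named 'regression_slot' has been created with that plugin):

    CREATE UNLOGGED TABLE u (id int);
    INSERT INTO u VALUES (1);
    SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);

The output contains only the BEGIN/COMMIT rows for the transaction; the
INSERT into the unlogged table itself is not decoded.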

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standby receiving part of missing WAL segment

2015-02-12 Thread Robert Haas
On Wed, Feb 11, 2015 at 12:55 PM, Thom Brown  wrote:
> Today I witnessed a situation which appears to have gone down like this:
>
> - The primary server starting streaming WAL data from segment 00A8 to the
> standby
> - The standby server started receiving that data
> - Before 00A8 is finished, the wal sender process dies on the primary, but
> the archiver process continues, and 00A8 ends up being archived as usual
> - The primary continues to generate WAL and cleans up old WAL files from
> pg_xlog until 00A8 is gone.
> - The primary is restarted and the wal sender process is back up and running
> - The standby says "waiting for 00A8", which it can no longer get from the
> primary
> - 00A8 is in the standby's archive directory, but the standby is waiting for
> the rest of the segment from the primary via streaming replication, so
> doesn't check the archive
> - The standby is restarted
> - The standby goes back into recovery and eventually replays 00A8 and
> continues as normal.
>
> Should the standby be able to get feedback from the primary that the
> requested segment is no longer available, and therefore know to check its
> archive?

Last time I played around with this, if the standby requested a
segment from the master that was no longer present there, the standby
would immediately get an ERROR, which it seems like would get you out
of trouble.  I wonder why that didn't happen in your case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Logical decoding document

2015-02-12 Thread Tatsuo Ishii
Hi, I need help.

In "46.6.4.5 Change Callback"

   Note: Only changes in user defined tables that are not unlogged
   (see UNLOGGED) and not temporary (see TEMPORARY or TEMP) can be
   extracted using logical decoding.

I cannot parse the sentence above. Maybe logical decoding does not
decode a table if it is an unlogged table or a temporary table?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-02-12 Thread Robert Haas
On Tue, Feb 10, 2015 at 3:00 PM, Tom Lane  wrote:
> I've now taken this idea as far as building the required infrastructure
> and revamping a couple of array operators to use it.  There's a lot yet
> to do, but I've done enough to get some preliminary ideas about
> performance (see below).

Very impressive.  This is something that's been mentioned before and
which I always thought would be great to have, but I didn't expect it
would be this easy to cobble together a working implementation.  Or
maybe "easy" isn't the right term, but this isn't a very big patch.

> BTW, I'm not all that thrilled with the "deserialized object" terminology.
> I found myself repeatedly tripping up on which form was serialized and
> which de-.  If anyone's got a better naming idea I'm willing to adopt it.

My first thought is that we should form some kind of TOAST-like
backronym, like Serialization Avoidance Loading and Access Device
(SALAD) or Break-up, Read, Edit, Assemble, and Deposit (BREAD).  I
don't think there is anything per se wrong with the terms
serialization and deserialization; indeed, I used the same ones in the
parallel-mode stuff.  But they are fairly general terms, so it might
be nice to have something more specific that applies just to this
particular usage.

I found the notion of "primary" and "secondary" TOAST pointers to be
quite confusing.  I *think* what you are doing is storing two pointers
to the object in the object, and a pointer to the object is really a
pointer to one of those two pointers to the object.  Depending on
which one it is, you can write the object, or not.  This is a clever
representation, but it's hard to wrap your head around, and I'm not
sure "primary" and "secondary" are the best names, although I don't
have an idea as to what would be better.  I'm a bit confused, though:
once you give out a secondary pointer, how is it safe to write the
object through the primary pointer?
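
To make that concrete, here is a minimal sketch of the scheme as described;
every name below is invented for illustration, and none of it is the patch's
actual code:

typedef struct DeserializedObject DeserializedObject;

struct DeserializedObject
{
	DeserializedObject *primary;	/* self-pointer: writable handles */
	DeserializedObject *secondary;	/* self-pointer: read-only handles */
	/* ... in-memory representation of the value follows ... */
};

/*
 * A "pointer to the object" is really the address of one of the two
 * embedded self-pointers; dereferencing either yields the object, and
 * which slot the handle points at encodes whether writing is allowed.
 */
typedef DeserializedObject **ObjectHandle;

static inline bool
handle_is_writable(DeserializedObject *obj, ObjectHandle handle)
{
	return handle == &obj->primary;
}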

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] File based Incremental backup v8

2015-02-12 Thread Marco Nenciarini
Hi,

I've attached an updated version of the patch. This fixes the issue on
checksum calculation for segments after the first one.

To solve it I've added an optional uint32 *segno argument to
parse_filename_for_nontemp_relation, so I can know the segment number
and calculate the block number correctly.
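
As a rough sketch of the arithmetic this enables (not the actual patch code;
RELSEG_SIZE is the standard number of blocks per 1GB segment):

static BlockNumber
absolute_block_number(uint32 segno, BlockNumber blkno_in_segment)
{
	/*
	 * Block numbers continue across segments: block b of segment n is
	 * block n * RELSEG_SIZE + b of the relation, which is the value the
	 * LSN comparison logic needs to work with.
	 */
	return (BlockNumber) segno * RELSEG_SIZE + blkno_in_segment;
}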

On 29/01/15 18:57, Robert Haas wrote:
> On Thu, Jan 29, 2015 at 9:47 AM, Marco Nenciarini
>  wrote:
>> The current implementation of copydir function is incompatible with LSN
>> based incremental backups. The problem is that new files are created,
>> but their blocks are still with the old LSN, so they will not be backed
>> up because they are looking old enough.
> 
> I think this is trying to pollute what's supposed to be a pure
> fs-level operation ("copy a directory") into something that is aware
> of specific details like the PostgreSQL page format.  I really think
> that nothing in storage/file should know about the page format.  If we
> need a function that copies a file while replacing the LSNs, I think
> it should be a new function living somewhere else.
> 

I've named it copydir_set_lsn and placed it as a static function in
dbcommands.c. This leaves the copydir and copy_file functions in copydir.c
untouched. The copydir function in copydir.c is now unused, while the
copy_file function is still used during unlogged table reinit.

> A bigger problem is that you are proposing to stamp those files with
> LSNs that are, for lack of a better word, fake.  I would expect that
> this would completely break if checksums are enabled.  Also, unlogged
> relations typically have an LSN of 0; this would change that in some
> cases, and I don't know whether that's OK.
>

I've investigated a bit and I have not been able to find any problem here.

> The issues here are similar to those in
> http://www.postgresql.org/message-id/20150120152819.gc24...@alap3.anarazel.de
> - basically, I think we need to make CREATE DATABASE and ALTER
> DATABASE .. SET TABLESPACE fully WAL-logged operations, or this is
> never going to work right.  If we're not going to allow that, we need
> to disallow hot backups while those operations are in progress.
>

As already said, the copydir-LSN patch should be treated as a "temporary"
solution until proper WAL logging of CREATE DATABASE and ALTER DATABASE SET
TABLESPACE is implemented. At that time we could probably get rid of the
whole copydir.[ch] file, moving the copy_file function inside reinit.c.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it


diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index afd9255..02b5fee 100644
*** a/src/backend/storage/file/reinit.c
--- b/src/backend/storage/file/reinit.c
*** static void ResetUnloggedRelationsInTabl
*** 28,35 ****
  												 int op);
  static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
  									 int op);
- static bool parse_filename_for_nontemp_relation(const char *name,
- 									 int *oidchars, ForkNumber *fork);
  
  typedef struct
  {
--- 28,33 ----
*** ResetUnloggedRelationsInDbspaceDir(const
*** 388,446 ****
  		fsync_fname((char *) dbspacedirname, true);
  	}
  }
- 
- /*
-  * Basic parsing of putative relation filenames.
-  *
-  * This function returns true if the file appears to be in the correct format
-  * for a non-temporary relation and false otherwise.
-  *
-  * NB: If this function returns true, the caller is entitled to assume that
-  * *oidchars has been set to the a value no more than OIDCHARS, and thus
-  * that a buffer of OIDCHARS+1 characters is sufficient to hold the OID
-  * portion of the filename.  This is critical to protect against a possible
-  * buffer overrun.
-  */
- static bool
- parse_filename_for_nontemp_relation(const char *name, int *oidchars,
- 									ForkNumber *fork)
- {
- 	int			pos;
- 
- 	/* Look for a non-empty string of digits (that isn't too long). */
- 	for (pos = 0; isdigit((unsigned char) name[pos]); ++pos)
- 		;
- 	if (pos == 0 || pos > OIDCHARS)
- 		return false;
- 	*oidchars = pos;
- 
- 	/* Check for a fork name. */
- 	if (name[pos] != '_')
- 		*fork = MAIN_FORKNUM;
- 	else
- 	{
- 		int			forkchar;
- 
- 		forkchar = forkname_chars(&name[pos + 1], fork);
- 		if (forkchar <= 0)
- 			return false;
- 		pos += forkchar + 1;
- 	}
- 
- 	/* Check for a segment number. */
- 	if (name[pos] == '.')
- 	{
- 		int			segchar;
- 
- 

Re: [HACKERS] SSL renegotiation and other related woes

2015-02-12 Thread Heikki Linnakangas

On 02/12/2015 01:33 AM, Andres Freund wrote:

On 2015-02-11 14:54:03 +0200, Heikki Linnakangas wrote:

Thoughts? Can you reproduce any errors with this?


I'm on battery right now, so I can't really test much.

Did you test having an actual standby instead of pg_receivexlog? I saw
some slightly different errors there.

Does this patch alone work for you or did you test it together with the
earlier one to fix the renegotiation handling server side when
nonblocking? Because that frequently seemed to error out for me, at
least over actual network. A latch loop + checking for the states seemed
to fix that for me.


This patch alone worked for me.


+	 * NB: This relies on the calling code to call pqsecure_read(), completing
+	 * the renegotiation handshake, if pqsecure_write() returns 0. Otherwise
+	 * we'll never make progress.
+	 */


I think this should a) refer to the fact that pqSendSome does that in
blocking scenarios b) expand the comment around the pqReadData to
reference that fact.

How are we going to deal with callers using libpq in nonblocking mode? A
client doing a large COPY FROM STDIN to the server using a nonblocking
connection (not a stupid thing to do) could hit this IIUC.


Hmm, that's problematic even without SSL. If you do a large COPY FROM 
STDIN, and the server sends a lot of NOTICEs, the NOTICEs will be 
accumulated in the TCP send and receive buffers until they fill up. 
After that, the server will block on the send, and will stop reading the 
tuple data the client sends, and you get a deadlock. In blocking mode, 
pqSendSome calls pqReadData to avoid that.


I think pqSendSome should call pqReadData() after getting EWOULDBLOCK, 
even in non-blocking mode. It won't completely prevent the problem: it's 
possible that the receive buffer fills up, but the application doesn't 
call pqSendSome() until the socket becomes writeable, which won't happen 
until the client drains the incoming data and unblocks the
server. But it greatly reduces the risk. In practice that will solve the 
problem for SSL renegotiations.
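
A minimal sketch of that idea (hypothetical, not the actual libpq code;
pqReadData() is libpq's existing internal routine for draining server input):

	/* inside a hypothetical pqSendSome(), after a short send */
	if (sent < 0 && SOCK_ERRNO == EWOULDBLOCK)
	{
		/*
		 * Drain whatever the server has queued (NOTICEs, renegotiation
		 * records) so its send buffer can empty and it can resume
		 * reading our COPY data, breaking the potential deadlock.
		 */
		if (pqReadData(conn) < 0)
			return -1;			/* hard failure while reading */
	}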


We should also document that the application should always wait for the 
socket readable condition, in addition to writeable, and call 
pqConsumeInput(). That fixes the issue for good.
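
On the client side the documented pattern could look roughly like this
(a hedged sketch; the libpq calls are real, while buf/buflen stand for the
application's pending COPY data):

	int			sock = PQsocket(conn);
	fd_set		rfds, wfds;

	for (;;)
	{
		FD_ZERO(&rfds);
		FD_ZERO(&wfds);
		FD_SET(sock, &rfds);
		FD_SET(sock, &wfds);
		if (select(sock + 1, &rfds, &wfds, NULL, NULL) < 0)
			break;				/* select() failure */

		/* Always drain input first so the server never blocks sending. */
		if (FD_ISSET(sock, &rfds) && !PQconsumeInput(conn))
			break;				/* connection broken */

		if (FD_ISSET(sock, &wfds))
		{
			int			rc = PQputCopyData(conn, buf, buflen);

			if (rc < 0)
				break;			/* error */
			/* rc == 0 just means "try again later" in nonblocking mode */
		}
	}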



Could we perhaps call SSL_peek() when SSL_write() hits WANTS_READ? In
combination with your fix I think that should fix the possibility of
such a deadlock? Especially on the serverside where the socket most of
the time is is in, although emulated, blocking mode that seems easier -
we can just retry afterwards.


Hmm. Not sure what the semantics of SSL_peek() are. At least we'd need 
to call it with a large enough buffer that it would read all the 
incoming data from the OS buffer. I think it would be safer to just call 
SSL_read(). Both the server and libpq have an input buffer that you can 
easily read into at any time.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 12:16 AM, Noah Misch  wrote:
> That is a major mark against putting the check in simplify_function(), agreed.

I do see one way to rescue that idea, which is this: put two flags,
parallelModeOK and parallelModeRequired, into PlannerGlobal.  At the
beginning of planning, set parallelModeOK based on our best knowledge
at that time; as we preprocess expressions, update it to false if we
see something that's not parallel-safe.  Emit paths for parallel plans
only if the flag is currently true.  At the end of planning, when we
convert paths to plans, set parallelModeRequired if any parallel plan
elements are generated.  If we started with parallelModeOK = true or
ended up with parallelModeRequired = false, then life is good.  In the
unfortunate event that we end up with parallelModeOK = false and
parallelModeRequired = true, replan, this time with parallelModeOK =
false from the beginning.
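
In rough pseudo-C (the two flag names follow the proposal above; everything
else is invented for illustration):

	/* hypothetical driver for the two-flag strategy */
	glob->parallelModeOK = initial_parallel_safety_estimate(parse);
	glob->parallelModeRequired = false;

	plan = plan_query(parse, glob);	/* may clear parallelModeOK en route */

	if (!glob->parallelModeOK && glob->parallelModeRequired)
	{
		/*
		 * Parallel plan nodes were emitted before we discovered a
		 * parallel-unsafe construct: discard the plan and replan with
		 * parallelism disabled from the start.
		 */
		glob->parallelModeRequired = false;
		plan = plan_query(parse, glob);
	}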

If there are no sub-queries involved, this will work out fine -
parallelModeOK will always be definitively set to the right value
before we rely on it for anything.  This includes cases where
subqueries are eliminated by pulling them up.  If there *are*
subqueries, we'll still be OK if we happen to hit the parallel-unsafe
construct before we hit the part - if any - that can benefit from
parallelism.

So this would mostly be pretty cheap, but if you do hit the case where
a re-plan is required it would be pretty painful.

>> > Unless we want to rejigger this so that we do a
>> > complete eval_const_expressions() pass over the entire query tree
>> > (including all subqueries) FIRST, and then only after that go back and
>> > plan all of those subqueries, I don't see how to make this work; and
>> > I'm guessing that there are good reasons not to do that.
>
> I expect that would work fine, but I do think it premature to venture that far
> out of your way to optimize this new tree examination.  The cost may just not
> matter.  Other parts of the planner use code like contain_volatile_functions()
> and contain_nonstrict_functions(), which have the same kind of inefficiency
> you're looking to avoid here.

They do, but those are run on much smaller pieces of the query tree,
never on the whole thing.  I really wish Tom Lane would weigh in here,
as an opinion from him could save me from spending time on a doomed
approach.  I expect criticism of this type:

http://www.postgresql.org/message-id/22266.1269641...@sss.pgh.pa.us

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Jan Urbański

Jan Urbański writes:

> Andres Freund writes:
>
>> On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
>>> That doesn't solve the problem of the Python deadlock, where you're not at
>>> leisure to call a C function at the beginning of your module.
>>
>> We could just never unload the hooks...
>
> That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
> got changed after 
> http://www.postgresql.org/message-id/48620925.6070...@pws.com.au
>
>>
>>> > * If there's already callbacks set: Remember that fact and don't
>>> >   overwrite. In the next major version: warn.
>>>
>>> So yeah, that was my initial approach - check if callbacks are set, don't do
>>> the dance if they are. It felt like a crutch, though, and racy at that. 
>>> There's
>>> no atomic way to test-and-set those callbacks. The window for racyness is
>>> small, though.
>>
>> If you do that check during library initialization instead of every
>> connection it shouldn't be racy - if that part is run in a multithreaded
>> fashion you're doing something crazy.
>
> Yes, that's true. The problem is that there's no real libpq initialisation
> function. The docs say that:
>
> "If your application initializes libssl and/or libcrypto libraries and libpq 
> is
> built with SSL support, you should call PQinitOpenSSL"
>
> So most apps will just not bother. The moment you know you'll need SSL is only
> when you get an 'S' message from the server...

For the sake of discussion, here's a patch to prevent stomping on
previously-set callbacks, racy as it looks.

FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

J
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..54312a5 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -710,6 +710,9 @@ verify_peer_name_matches_certificate(PGconn *conn)
  *	Callback functions for OpenSSL internal locking
  */
 
+#if OPENSSL_VERSION_NUMBER < 0x10000000L
+/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and provides a
+ * default implementation, so there's no need for our own. */
 static unsigned long
 pq_threadidcallback(void)
 {
@@ -720,6 +723,7 @@ pq_threadidcallback(void)
 	 */
 	return (unsigned long) pthread_self();
 }
+#endif   /* OPENSSL_VERSION_NUMBER < 0x10000000L */
 
 static pthread_mutex_t *pq_lockarray;
 
@@ -806,9 +810,14 @@ pgtls_init(PGconn *conn)
 
 		if (ssl_open_connections++ == 0)
 		{
-			/* These are only required for threaded libcrypto applications */
-			CRYPTO_set_id_callback(pq_threadidcallback);
-			CRYPTO_set_locking_callback(pq_lockingcallback);
+			/* These are only required for threaded libcrypto applications, but
+			 * make sure we don't stomp on them if they're already set. */
+#if OPENSSL_VERSION_NUMBER < 0x10000000L
+			if (CRYPTO_get_id_callback() == NULL)
+				CRYPTO_set_id_callback(pq_threadidcallback);
+#endif
+			if (CRYPTO_get_locking_callback() == NULL)
+				CRYPTO_set_locking_callback(pq_lockingcallback);
 		}
 	}
 #endif   /* ENABLE_THREAD_SAFETY */
@@ -885,9 +894,14 @@ destroy_ssl_system(void)
 
 	if (pq_init_crypto_lib && ssl_open_connections == 0)
 	{
-		/* No connections left, unregister libcrypto callbacks */
-		CRYPTO_set_locking_callback(NULL);
-		CRYPTO_set_id_callback(NULL);
+		/* No connections left, unregister libcrypto callbacks, if no one
+		 * registered different ones in the meantime. */
+#if OPENSSL_VERSION_NUMBER < 0x10000000L
+		if (CRYPTO_get_id_callback() == pq_threadidcallback)
+			CRYPTO_set_id_callback(NULL);
+#endif
+		if (CRYPTO_get_locking_callback() == pq_lockingcallback)
+			CRYPTO_set_locking_callback(NULL);
 
 		/*
 		 * We don't free the lock array or the SSL_context. If we get another

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Jan Urbański

Andres Freund writes:

> On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
>> That doesn't solve the problem of the Python deadlock, where you're not at
>> leisure to call a C function at the beginning of your module.
>
> We could just never unload the hooks...

That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
got changed after 
http://www.postgresql.org/message-id/48620925.6070...@pws.com.au

>
>> > * If there's already callbacks set: Remember that fact and don't
>> >   overwrite. In the next major version: warn.
>>
>> So yeah, that was my initial approach - check if callbacks are set, don't do
>> the dance if they are. It felt like a crutch, though, and racy at that. 
>> There's
>> no atomic way to test-and-set those callbacks. The window for racyness is
>> small, though.
>
> If you do that check during library initialization instead of every
> connection it shouldn't be racy - if that part is run in a multithreaded
> fashion you're doing something crazy.

Yes, that's true. The problem is that there's no real libpq initialisation
function. The docs say that:

"If your application initializes libssl and/or libcrypto libraries and libpq is
built with SSL support, you should call PQinitOpenSSL"

So most apps will just not bother. The moment you know you'll need SSL is only
when you get an 'S' message from the server...
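
For an application that does control its startup, the approach being
discussed looks roughly like this (a sketch; my_setup_openssl_locking is a
hypothetical application routine that installs the libcrypto callbacks):

	#include <libpq-fe.h>

	extern void my_setup_openssl_locking(void);

	int
	main(void)
	{
		my_setup_openssl_locking(); /* CRYPTO_set_locking_callback etc. */
		PQinitOpenSSL(1, 0);		/* we did libcrypto; libpq may init libssl */
		/* ... start threads, open PGconn connections ... */
		return 0;
	}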

Cheers,
Jan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Amit Kapila
On Thu, Feb 12, 2015 at 1:51 AM, Robert Haas  wrote:
>
>
> I think we may want a dedicated parallel-safe property for functions
> rather than piggybacking on provolatile, but that will probably also
> be changeable via ALTER FUNCTION, and stored rules won't get
> miraculously updated.  So this definitely can't be something we figure
> out at parse-time ... it's got to be determined later.  But at the
> moment I see no way to do that without an extra pass over the whole
> rewritten query tree.  :-(
>

If we have to go this way, then isn't it better to evaluate this
when we are trying to create the parallel path (something like in the
parallel seqscan patch - create_parallelscan_paths())?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] How about to have relnamespace and relrole?

2015-02-12 Thread Kyotaro HORIGUCHI
Hello, I changed the subject.

This mail is to address the point at hand, preparing for
registering this commitfest.

At Wed, 04 Feb 2015 17:29:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
 wrote in
<20150204.172914.52110711.horiguchi.kyot...@lab.ntt.co.jp>
> Tue, 03 Feb 2015 10:12:12 -0500, Tom Lane  wrote in 
> <2540.1422976...@sss.pgh.pa.us>
> > I'm not really excited about that.  That line of thought would imply
> > that we should have "reg*" types for every system catalog, which is
> > surely overkill.
>
> Mmm. I suppose "for every OID usage", not "every system catalog",
> but I agree on the whole. There's no boundary everyone would agree
> on. Perhaps it depends on how often the average DBA looks at
> catalogs which have OIDs pointing to another catalog, and which they
> want to see in human-readable form, without joins if possible.
>
> I very roughly counted how often the OIDs of catalogs are referred
> to from other catalogs.

1. Expected frequency of use

I counted how often the OIDs of system catalogs are referred to by
other system catalogs/views. The pg_stat_* views are excluded, since
they have text representations for all OID references.

The result is like this. The full result of the counting is in an
Excel file that is not at hand right now; I'll show it here if anyone
wants to see it.

pg_class.oid:     27
pg_authid.oid:    33
pg_namespace.oid: 20
pg_proc.oid:      13
pg_type.oid:      15
pg_database.oid:   5
pg_operator.oid:   5
pg_am.oid:         4


Among these, pg_authid and pg_namespace are apparently referred to
frequently but don't have their own reg* types, despite being referred
to from more places than pg_proc, pg_type, pg_operator and pg_am,
which do.

# By the way, I don't understand where the "reg" comes from.
# REGistered? Or some other origin?

For that reason, although the current policy for deciding whether to
have reg* types seems to be whether the objects have schema-qualified
names, I think regrole and regnamespace are valuable to have.


2. Anticipated un-optimizability

Tom pointed out that these reg* types prevent the planner from
optimizing the query, so we should refrain from adding such machinery.
This should be a long-standing issue, but the reg* types seem to be
rather faster than joining the corresponding catalogs for a moderate
number of result rows. This, however, raises another, more annoying
issue.


3. Potential breakage of MVCC

The other issue Tom pointed out is the potential breakage of MVCC by
using these reg* types. The syscache is invalidated on updates, so
this doesn't seem to be a problem in READ COMMITTED mode, but it
breaks SERIALIZABLE mode. IMHO it is not so serious a problem as long
as such inconsistency occurs only on the system catalogs and it is
explicitly documented together with the un-optimizability issue. I'll
add a patch for this later.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-12 Thread Michael Paquier
On Thu, Feb 12, 2015 at 5:44 PM, Naoya Anzai 
wrote:

> Hi, Michael-san
>
> > An updated patch is attached,
>
> I'm sorry for confusing you.
>
> I think you don't have to implement this code to disable this
> feature using the value "-2". Because this use case is a rare case,
> and there is a practical workaround using a huge value like "2e9".
> (You suggested "2e9" to me, didn't you? :) ) So, please remove this code.
>

I will clean up the code.


> > Well, I see your point but this is not completely true: we could as
> > well rely entirely on this parameter instead of VACOPT_VERBOSE to
> > determine if autovacuum, a vacuum or an analyze are in verbose mode,
> > and remove VACOPT_VERBOSE, but I can imagine people complaining if
> > VACOPT_VERBOSE is removed. So let's set it up unconditionally to -1 in
> > gram.y for now. However if people think that it is fine to remove
> > VACOPT_VERBOSE, we could use exclusively this parameter in VacuumStmt.
> > Or even add sanity checks at the top of vacuum() to ensure that
> > VACOPT_VERBOSE is set only when log_min_duration is positive.
> > Additional opinions on this matter are welcome.
>
> I understand your point at last. :)
>
> You mean that ...
> log_autovacuum_min_duration assumes the role of VACOPT_VERBOSE.
> Even if this parameter is currently never used for manual vacuum,
> log_autovacuum_min_duration should be set to zero (output every time)
> when we execute "VACUUM (or ANALYZE) VERBOSE".
> Is my understanding correct? If so, it sounds logical.
>

Yup, that's my opinion. Now I don't know if people would mind removing
VACOPT_VERBOSE and replacing the control it provides with log_min_duration
in VacuumStmt. At least the two overlap, and log_min_duration
offers more options than the plain VACOPT_VERBOSE.


> If we can abolish the VERBOSE option,
> I think it would be ideal to prepare a new parameter like
> log_min_duration_vacuum (and log_min_duration_analyze) which
> integrates the "VERBOSE feature" and "log_autovacuum_min_duration".
>

What I think you are proposing here is a VERBOSE option that hypothetically
gets activated if a manual VACUUM takes more than a certain amount of time
specified by those parameters. I doubt this would be useful. In any case
this is unrelated to this patch.
-- 
Michael


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-12 Thread Syed, Rahila

Thank you for comments. Please find attached the updated patch.

>This patch fails to compile:
>xlogreader.c:1049:46: error: extraneous ')' after condition, expected a statement
>    blk->with_hole && blk->hole_offset <= 0))
This has been rectified.

>Note as well that at least clang does not like much how the sanity checks with 
>with_hole are done. You should place parentheses around the '&&' expressions. 
>Also, I would rather define with_hole == 0 or with_hole == 1 explicitly in 
>those checks
The expressions are modified accordingly.

>There is a typo:
>s/true,see/true, see/
>[nitpicky]Be as well aware of the 80-character limit per line that is 
>normally respected by comment blocks.[/]

I have corrected the typos and changed the comments as mentioned. Also,
realigned certain lines to meet the 80-char limit.

Thank you,
Rahila Syed



__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.

Support-compression-for-full-page-writes-in-WAL_v18.patch
Description: Support-compression-for-full-page-writes-in-WAL_v18.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-02-12 Thread Amit Kapila
On Thu, Feb 12, 2015 at 2:19 AM, Robert Haas  wrote:
>
> On Tue, Feb 10, 2015 at 3:56 PM, Andres Freund 
wrote:
> > On 2015-02-10 09:23:02 -0500, Robert Haas wrote:
> >> On Tue, Feb 10, 2015 at 9:08 AM, Andres Freund 
wrote:
> >
> > As pointed out above (moved there after reading the patch...) I don't
> > think a chunk size of 1 or any other constant size can make sense. I
> > don't even believe it'll necessarily be constant across an entire query
> > execution (big initially, small at the end).  Now, we could move
> > determining that before the query execution into executor
> > initialization, but then we won't yet know how many workers we're going
> > to get. We could add a function setting that at runtime, but that'd mix
> > up responsibilities quite a bit.
>
> I still think this belongs in heapam.c somehow or other.  If the logic
> is all in the executor, then it becomes impossible for any code that
> doensn't use the executor to do a parallel heap scan, and that's
> probably bad.  It's not hard to imagine something like CLUSTER wanting
> to reuse that code, and that won't be possible if the logic is up in
> some higher layer.  If the logic we want is to start with a large
> chunk size and then switch to a small chunk size when there's not much
> of the relation left to scan, there's still no reason that can't be
> encapsulated in heapam.c.
>

It seems to me that we need to use both approaches (making the heap or
other lower layers aware of parallelism, and handling it at the executor
level using callback_function and callback_state to make it work) for
doing parallelism.  TBH, I think for the matter of this patch we can go
either way and then think more on it as we move ahead to parallelize
other operations.  So what I can do is try using Robert's patch to make
the heap aware of parallelism and then see how it comes out?

> > Btw, using a atomic uint32 you'd end up without the spinlock and just
> > about the same amount of code... Just do a atomic_fetch_add_until32(var,
> > 1, InvalidBlockNumber)... ;)
>
> I thought of that, but I think there's an overflow hazard.
>
> > Where the 'coordinated seqscan' scans a relation so that each tuple
> > eventually gets returned once across all nodes, but the nested loop (and
> > through it the index scan) will just run normally, without any
> > coordination and parallelism. But everything below --- would happen
> > multiple nodes. If you agree, yes, then we're in violent agreement
> > ;). The "single node that gets copied" bit above makes me a bit unsure
> > whether we are though.
>
> Yeah, I think we're talking about the same thing.
>
> > To me, given the existing executor code, it seems easiest to achieve
> > that by having the ParallelismDrivingNode above having a dynamic number
> > of nestloop children in different backends and point the coordinated
> > seqscan to some shared state.  As you point out, the number of these
> > children cannot be certainly known (just targeted for) at plan time;
> > that puts a certain limit on how independent they are.  But since a
> > large number of them can be independent between workers it seems awkward
> > to generally treat them as being the same node across workers. But maybe
> > that's just an issue with my mental model.
>
> I think it makes sense to think of a set of tasks in which workers can
> assist.  So you a query tree which is just one query tree, with no
> copies of the nodes, and then there are certain places in that query
> tree where a worker can jump in and assist that node.  To do that, it
> will have a copy of the node, but that doesn't mean that all of the
> stuff inside the node becomes shared data at the code level, because
> that would be stupid.
>

As per my understanding of the discussion related to this point, I think
there are 3 somewhat related ways to achieve this.

1. Both master and worker run the same node (ParallelSeqScan), where
the work done by the worker (scan chunks of the heap) for this node is
a subset of what is done by the master (coordinate the data returned by
workers + scan chunks of the heap).  It seems to me Robert is
advocating this approach.
2. Master and worker use different nodes to operate. The master runs a
parallelism-driving node (ParallelSeqScan - coordinate the data
returned by workers + scan chunks of the heap) and the worker runs some
form of parallelism-driven node (PartialSeqScan - scan chunks of the
heap).  It seems to me Andres is proposing this approach.
3. Same as 2, but modify the existing SeqScan node to behave as
PartialSeqScan.  This is what I have done in the patch.

Correct me or add here if I have misunderstood anything.

I think going forward (for cases like aggregation) the work done in
the master and worker nodes will have substantial enough differences
that it is better to do the work as part of different nodes in the
master and the worker.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Anastasia Lubennikova
Thanks for the answer.
Now the patch should apply correctly.

2015-02-12 3:12 GMT+04:00 Thom Brown :

> On 11 February 2015 at 22:50, Anastasia Lubennikova <
> lubennikov...@gmail.com> wrote:
>
>> Finally there is a new version of patch (in attachments).
>> It provides multicolumn index-only scan for GiST indexes.
>>
>> - Memory leak is fixed.
>> - little code cleanup
>> - example of performance test in attachmens
>> - function OIDs have debugging values (*) just to avoid merge
>> conflicts while testing patch
>>
>> Wiki page of the project is
>>
>> https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
>>
>> Waiting for feedback.
>>
>
> Hi Anastasia.  Thanks for the updated patch.  I've just tried applying it
> to head and it doesn't appear to apply cleanly.
>
> $ patch -p1 < ~/Downloads/indexonlyscan_gist_2.0.patch
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/backend/access/gist/gist.c
> Hunk #1 succeeded at 1404 (offset 9 lines).
> Hunk #2 succeeded at 1434 (offset 9 lines).
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file src/backend/access/gist/gistget.c
> Hunk #1 succeeded at 227 (offset 1 line).
> Hunk #2 succeeded at 243 (offset 1 line).
> Hunk #3 succeeded at 293 (offset -4 lines).
> Hunk #4 succeeded at 330 (offset -4 lines).
> Hunk #5 succeeded at 365 (offset -5 lines).
> Hunk #6 succeeded at 444 (offset -27 lines).
> Hunk #7 succeeded at 474 (offset -27 lines).
> Hunk #8 FAILED at 518.
> Hunk #9 succeeded at 507 (offset -28 lines).
> Hunk #10 succeeded at 549 with fuzz 1 (offset -28 lines).
> Hunk #11 FAILED at 601.
> 2 out of 11 hunks FAILED -- saving rejects to file
> src/backend/access/gist/gistget.c.rej
> ...
>
> --
> Thom
>



-- 
Best regards,
Lubennikova Anastasia
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index db2a452..53750da 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1404,6 +1404,14 @@ initGISTstate(Relation index)
 		else
 			giststate->distanceFn[i].fn_oid = InvalidOid;
 
+		/* opclasses are not required to provide a Fetch method */
+		if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC)))
+			fmgr_info_copy(&(giststate->fetchFn[i]),
+						   index_getprocinfo(index, i + 1, GIST_FETCH_PROC),
+						   scanCxt);
+		else
+			giststate->fetchFn[i].fn_oid = InvalidOid;
+
 		/*
 		 * If the index column has a specified collation, we should honor that
 		 * while doing comparisons.  However, we may have a collatable storage
@@ -1426,6 +1434,22 @@ initGISTstate(Relation index)
 	return giststate;
 }
 
+/*
+ * Gistcanreturn is supposed to be true if ANY FetchFn method is defined.
+ * If FetchFn exists it would be used in an index-only scan.
+ * Thus the responsibility rests with the opclass developer.
+ */
+
+Datum
+gistcanreturn(PG_FUNCTION_ARGS) {
+	Relation index = (Relation) PG_GETARG_POINTER(0);
+	int i = PG_GETARG_INT32(1);
+	if (OidIsValid(index_getprocid(index, i+1, GIST_FETCH_PROC)))
+		PG_RETURN_BOOL(true);
+	else
+		PG_RETURN_BOOL(false);
+}
+
 void
 freeGISTstate(GISTSTATE *giststate)
 {
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 717cb85..0925e56 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -227,8 +227,10 @@ gistindex_keytest(IndexScanDesc scan,
  * If tbm/ntids aren't NULL, we are doing an amgetbitmap scan, and heap
  * tuples should be reported directly into the bitmap.  If they are NULL,
  * we're doing a plain or ordered indexscan.  For a plain indexscan, heap
- * tuple TIDs are returned into so->pageData[].  For an ordered indexscan,
+ * tuple TIDs are returned into so->pageData. For an ordered indexscan,
  * heap tuple TIDs are pushed into individual search queue items.
+ * If index-only scan is possible, heap tuples themselves are returned
+ * into so->pageData or into search queue.
  *
  * If we detect that the index page has split since we saw its downlink
  * in the parent, we push its new right sibling onto the queue so the
@@ -241,6 +243,10 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	Buffer		buffer;
 	Page		page;
+	GISTSTATE *giststate = so->giststate;
+	Relation r = scan->indexRelation;
+	boolisnull[INDEX_MAX_KEYS];
+	GISTSearchHeapItem *tmpListItem;
 	GISTPageOpaque opaque;
 	OffsetNumber maxoff;
 	OffsetNumber i;
@@ -287,8 +293,6 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		MemoryContextSwitchTo(oldcxt);
 	}
 
-	so->nPageData = so->curPageData = 0;
-
 	/*
 	 * check all tuples on page
 	 */
@@ -326,11 +330,20 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		else if (scan->numberOfOrderBys == 0 && GistPageIsLeaf(page))
 		{
 			/*
-			 * Non-ordered scan, so report heap tuples in so->pageData[]
+			 * Non-ordered scan, so rep

Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Andres Freund
On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
> 
> Andres Freund writes:
> 
> > On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
> >> First of all, the current behaviour is crazy. We're setting and unsetting 
> >> the
> >> locking callback every time a connection is made/closed, which is not how
> >> OpenSSL is supposed to be used. The *application* using libpq should set a
> >> callback before it starts threads, it's no business of the library's.
> >
> > I don't buy this argument at all. Delivering a libpq that's not
> > threadsafe in some circumstances is a really bad idea.
> 
> I knew this would be a hard sell :( What I know is that the current situation
> is not good and leaving it as-is causes real grief.

It certainly causes less grief than just breaking working
applications. While really shitty, the current situation works in a large
number of scenarios. It's not that common to have several users of
openssl in an application *and* unload libpq.

> > I fail to see why it's any better to do it in the VM. That relies on
> > either always loading the VM's openssl module, even if you don't
> > actually need it because the library you use deals with openssl. It
> > prevents other implementations of openssl in the VM.
> 
> The callbacks are a thing that's owned by the application, so libraries have 
> no
> business in setting them up. The way OpenSSL's API is specified (very
> unfortunately IMHO) is that before you use OpenSSL from threads, you need to
> set the callbacks. If you don't control your application's startup (like when
> you're a script executed through a VM), you assume the VM took care of it at
> startup. If you're a library, you assume the user took care of it, as they
> should.

That's a bogus argument. The VM cannot set up every library in the
world in a correct way. It'd be obviously be completely insane to load
the ssl module in every library just because some part of some random
application might need it. In fact, the ssl library for python does
pretty much the same thing as libpq does (although it's slightly more
careful). It's not the VM at all.

> > What I think we should do is the following:
> >
> > * Emphasize the fact that it's important to use PQinitSSL(false) if you
> >   did things yourself.
> 
> PQinitOpenSSL(true, false) if anything. The reason that function got 
> introduced
> is precisely because PQinitSSL(false) isn't exactly right, see
> http://www.postgresql.org/message-id/49820d7d.7030...@esilo.com

Well, that really depends on what the application is actually using...

> That doesn't solve the problem of the Python deadlock, where you're not at
> leisure to call a C function at the beginning of your module.

We could just never unload the hooks...

> > * If there's already callbacks set: Remember that fact and don't
> >   overwrite. In the next major version: warn.
> 
> So yeah, that was my initial approach - check if callbacks are set, don't do
> the dance if they are. It felt like a crutch, though, and racy at that. 
> There's
> no atomic way to test-and-set those callbacks. The window for racyness is
> small, though.

If you do that check during library initialization instead of every
connection it shouldn't be racy - if that part is run in a multithreaded
fashion you're doing something crazy.

> > * Assert or something if the callback when unregistering isn't the same
> >   as it was when registering. That's pretty much guaranteed to cause
> >   issues.
> 
> So let me try another proposal and see if it doesn't set alarm bells off.
> 
>   * When opening a connection, if there's a callback set, don't overwrite it
> (small race window).

>   * When closing a connection, if the callback set is not
> pq_lockingcallback/pq_threadidcallback, don't NULL it (small race window)

If we do the tests once during initialization there shouldn't be a
race.

> Asserts in frontend code are impossible, right? There's no way to warn, 
> either.

You can write to stderr...

> That would be a ~8 line patch. Does it feel back-patcheable?

I think we first need to find out what we all can agree on before
deciding about that.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Jan Urbański

Andres Freund writes:

> On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
>> First of all, the current behaviour is crazy. We're setting and unsetting the
>> locking callback every time a connection is made/closed, which is not how
>> OpenSSL is supposed to be used. The *application* using libpq should set a
>> callback before it starts threads, it's no business of the library's.
>
> I don't buy this argument at all. Delivering a libpq that's not
> threadsafe in some circumstances is a really bad idea.

I knew this would be a hard sell :( What I know is that the current situation
is not good and leaving it as-is causes real grief.

>> The old behaviour was slightly less insane (set callbacks first time we're
>> engaging OpenSSL code, never touch them again). The real sane solution is to
>> leave it up to the application.
>
> The real solution would be for openssl to do it itself.

I think that the OpenSSL API is the real culprit there, requiring threads to
register a callback is what's causing all the issues, but I don't think this
will get ever changed.

>> Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
>> libpq was setting them on its own. If we remove the callback handling from
>> libpq, PHP will need to add them. By the way, the MySQL extension for PHP 
>> also
>> does not set those callbacks.
>
> I think this shows pretty clearly how insane it would be to change this
> in a minor release. Do you really expect people to update libpq and php
> in unison. After a minor release?

Well, I haven't found reports of threaded PHP+MySQL+SSL programs crashing, and
the MySQL extension also doesn't care about the callbacks. I think it's
either because it's a rare combination, or because almost everyone loads the
cURL extension, which *does* set up callbacks. Like I said, it's not a good
situation.

> Note that we *already* provide applications with the ability to set the
> callbacks themselves and prevent us from doing so: PQinitSSL(false).

Ah, I only now saw that with PQinitOpenSSL(true, false) you can work around the
problem, if you set up your own callbacks first. That seems to at least provide
a way to make libpq not do the callbacks dance in released versions. Thank you.

>> Let me reiterate: I now believe the callbacks should be set by the 
>> application,
>> libraries should not touch them, since they don't know what will they be
>> stomping on. If the application is run through a VM like Python or PHP, it's
>> the VM that should make sure the callbacks are set.
>
> I fail to see why it's any better to do it in the VM. That relies on
> either always loading the VM's openssl module, even if you don't
> actually need it because the library you use deals with openssl. It
> prevents other implementations of openssl in the VM.

The callbacks are a thing that's owned by the application, so libraries have no
business in setting them up. The way OpenSSL's API is specified (very
unfortunately IMHO) is that before you use OpenSSL from threads, you need to
set the callbacks. If you don't control your application's startup (like when
you're a script executed through a VM), you assume the VM took care of it at
startup. If you're a library, you assume the user took care of it, as they
should.

Now I know that a lot of apps get this wrong. Python almost does the right
thing, but you're right, it only sets up callbacks if you load the 'ssl'
module. It still feels like setting them up in library code is stomping on
things not owned by the library - like libperl setting up signal handlers,
which caused problems in Postgres. They're resources not owned by the library!

> What I think we should do is the following:
>
> * Emphasize the fact that it's important to use PQinitSSL(false) if you
>   did things yourself.

PQinitOpenSSL(true, false) if anything. The reason that function got introduced
is precisely because PQinitSSL(false) isn't exactly right, see
http://www.postgresql.org/message-id/49820d7d.7030...@esilo.com

That doesn't solve the problem of the Python deadlock, where you're not at
leisure to call a C function at the beginning of your module.

> * If there's already callbacks set: Remember that fact and don't
>   overwrite. In the next major version: warn.

So yeah, that was my initial approach - check if callbacks are set, don't do
the dance if they are. It felt like a crutch, though, and racy at that. There's
no atomic way to test-and-set those callbacks. The window for racyness is
small, though.

> * Assert or something if the callback when unregistering isn't the same
>   as it was when registering. That's pretty much guaranteed to cause
>   issues.

So let me try another proposal and see if it doesn't set alarm bells off.

  * When opening a connection, if there's a callback set, don't overwrite it
(small race window).

  * When closing a connection, if the callback set is not
pq_lockingcallback/pq_threadidcallback, don't NULL it (small race window)

Asserts in frontend code are impossible, right? There's no way to warn, either.

That would be a ~8 line patch. Does it feel back-patcheable?

Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-12 Thread Naoya Anzai
Hi, Michael-san

> An updated patch is attached, 

I'm sorry for confusing you.

I think you don't have to implement this code to disable this
feature using the value "-2". Because this use case is a rare case,
and there is a practical workaround using a huge value like "2e9".
(You suggested "2e9" to me, didn't you? :) ) So, please remove this code.


> Well, I see your point but this is not completely true: we could as
> well rely entirely on this parameter instead of VACOPT_VERBOSE to
> determine if autovacuum, a vacuum or an analyze are in verbose mode,
> and remove VACOPT_VERBOSE, but I can imagine people complaining if
> VACOPT_VERBOSE is removed. So let's set it up unconditionally to -1 in
> gram.y for now. However if people think that it is fine to remove
> VACOPT_VERBOSE, we could use exclusively this parameter in VacuumStmt.
> Or even add sanity checks at the top of vacuum() to ensure that
> VACOPT_VERBOSE is set only when log_min_duration is positive.
> Additional opinions on this matter are welcome.

I understand your point at last. :)

You mean that ...
log_autovacuum_min_duration assumes the role of VACOPT_VERBOSE.
Even if this parameter is currently never used for manual vacuum,
log_autovacuum_min_duration should be set to zero (output every time)
when we execute "VACUUM (or ANALYZE) VERBOSE".
Is my understanding correct? If so, it sounds logical.

If we can abolish the VERBOSE option,
I think it would be ideal to prepare a new parameter like
log_min_duration_vacuum (and log_min_duration_analyze) which
integrates the "VERBOSE feature" and "log_autovacuum_min_duration".

Regards,
---
Naoya Anzai


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers