Re: [HACKERS] Trigger concurrent execution
On 05/16/2014 08:06 AM, Blagoj Petrushev wrote:
> Hi,
>
> I'm thinking of an extension to trigger functionality like this:
>
> CREATE TRIGGER trigger_name
> AFTER event
> ON table
> CONCURRENTLY EXECUTE PROCEDURE trigger_fc
>
> This would call the trigger after the end of the transaction.

If "after the end of the transaction" is what you mean by "concurrently",
then that's the wrong word to choose. "AFTER COMMIT"?

The concept of running a trigger "concurrently" just doesn't make sense in
PostgreSQL, because the backend is single threaded. You wouldn't be able to
run any SQL commands until the trigger finished.

It isn't possible to do anything useful without a transaction, so PostgreSQL
would need to start a transaction for the trigger and commit the transaction
at the end, as if you'd run SELECT my_procedure();.

Because it's outside the scope of the transaction it probably wouldn't be
possible to do FOR EACH ROW with a NEW and OLD var, unless you stashed them
as materialized rows in the queue of pending "AFTER COMMIT" triggers.

Finally, because it's after transaction commit, you couldn't easily guarantee
that the trigger would really run. If the backend crashed / the server was
shut down / etc after the commit but before your trigger finished, you'd have
a committed transaction but the trigger would not run. To fix that you'd need
to somehow make the trigger queue WAL-logged and run it during replay, which
from my rather limited understanding of this area would be ... "interesting"
to do. It'd also mean the trigger couldn't have any session context. This
isn't easy, if it's practical at all.

> I have a big table with big text column article and a nullable
> tsvector column fts_article. On each insert or update that changes the
> article, I trigger-issue 'NOTIFY article_changed row_id', then, with a
> daemon listener, I catch the notification and update fts_article
> accordingly with my_fts_fc(article). The reason I don't do this
> directly in my trigger is because my_fts_fc is slow for big articles,
> fts_article has a gin index, and also, on heavy load, my listener can
> do these updates concurrently. Now, with a concurrent execution of
> triggers, I can just call my_fts_fc inside the trigger instead of the
> notify roundtrip.

I don't think that really fits. It seems like you want to run the trigger
procedure in the background on another back-end. That'd be quite cool, but
also not trivial to do, especially if you wanted to guarantee that it
happened reliably and in a crash-safe manner.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] BUGFIX: Dynamic bgworkers with BGW_NEVER_RESTART worker restarted on FatalError
Hi all

There's a bug in the dynamic bgworkers code that I think needs fixing before
release. TL;DR: BGW_NO_RESTART workers are restarted after a postmaster
crash; the attached patch changes that.

The case that's triggering the issue is where a static bgworker registers a
new dynamic bgworker to do some setup work each time it starts. The static
worker is relaunched on postmaster restart and registers a new BGW_NO_RESTART
dynamic bgworker. This dynamic bgworker immediately hits an Assert and dies.

This *should* cause a postmaster restart loop; that's expected. What it
shouldn't do, but is doing, is restart multiple copies of the bgworker -
fail to purge the old BGW_NO_RESTART one and launch it as if it'd never
crashed. The attached patch fixes the problem.

Detail:

If you set a worker as BGW_NO_RESTART it isn't restarted if it ERRORs out.
That's fine. With Petr's applied patch it no longer restarts on exit 0
(normal exit) either.

There's a third case, though: a bgworker crash causing postmaster restart. In
this case the bgworker is still restarted, which makes no sense at all if it
isn't restarted in the other two cases.

The existing code looks like it tries to protect against this - in
maybe_start_bgworker, the invocation of do_start_bgworker is protected by a
prior test for rw->rw_crashed_at that, if rw->rw_worker.bgw_restart_time ==
BGW_NEVER_RESTART, unregisters the worker and skips to the next one.

However, in my testing a breakpoint inside the if (rw->rw_crashed_at != 0)
test in maybe_start_bgworker is never hit, even for a bgworker that is known
to have crashed. rw->rw_crashed_at is always zero.

The culprit is ResetBackgroundWorkerCrashTimes, which unconditionally resets
the crash time without considering that the worker might be BGW_NO_RESTART.

The attached patch makes ResetBackgroundWorkerCrashTimes only reset the
crashed time for workers with a restart time set.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

>From a48de97f593b17160487e67f953c8d2b25f9e98a Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Fri, 16 May 2014 13:29:16 +0800
Subject: [PATCH] Don't restart BGW_NO_RESTART workers after postmaster crash

ResetBackgroundWorkerCrashTimes was ignoring BGW_NO_RESTART and clearing
the crash time of all bgworkers on postmaster restart so they'd be
restarted as soon as the postmaster was up and ready. It should only do
this for workers that are supposed to be restarted, as a BGW_NO_RESTART
worker with no rw_crashed_at set is assumed to be newly registered.

As a result, if a bgworker registered another BGW_NO_RESTART bgworker
during postmaster restart we'd get an increasing number of duplicates of
the BGW_NO_RESTART worker every time the postmaster restarted.
---
 src/backend/postmaster/bgworker.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 85a3b3a..4bbebb6 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -411,7 +411,8 @@ ResetBackgroundWorkerCrashTimes(void)
 		RegisteredBgWorker *rw;
 
 		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
-		rw->rw_crashed_at = 0;
+		if (rw->rw_worker.bgw_restart_time != BGW_NEVER_RESTART)
+			rw->rw_crashed_at = 0;
 	}
 }
-- 
1.9.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] psql \db+ lack of size column
Hi all,

Is there some reason not to show the tablespace size in the \db+ psql command?

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
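A minimal sketch of the kind of query such a size column could be based on,
using the existing pg_tablespace_size() function; the column aliases here are
illustrative, not the actual \db+ implementation:

SELECT spcname AS "Name",
       pg_catalog.pg_get_userbyid(spcowner) AS "Owner",
       pg_catalog.pg_size_pretty(pg_catalog.pg_tablespace_size(oid)) AS "Size"
FROM pg_catalog.pg_tablespace
ORDER BY 1;

One consideration for making this the default is that pg_tablespace_size()
has to walk the tablespace directory on disk, so it can be slow on large
installations.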
Re: [HACKERS] 9.0 PDF build broken
I wrote:
> Yeah. This is caused by a hyperlink whose displayed text crosses a page
> boundary. The only known fix is to change the text enough so the link
> no longer runs across a page boundary. Unfortunately, pdfTeX is pretty
> unhelpful about identifying exactly where the problem is. I seem to
> recall having posted a recipe about finding such problems.

Ah, found it:
http://www.postgresql.org/message-id/9473.1296172...@sss.pgh.pa.us

			regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.0 PDF build broken
Peter Eisentraut writes:
> Mysteriously, commit 6b2a1445ec8a631060c4cbff3f172bf31d3379b9 has broken
> the PDF build (openjade + pdfjadetex) in the 9.0 branch only. The error
> is
> [256.0.28
> ! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than
> \pd
> fstartlink.

Oh, not that again.

> I have reproduced this on two different platforms, and it affects only
> this branch. I guess this change might have caused the page boundaries
> to shift in an unfortunate way. I seem to recall we have had similar
> problems before. Does anyone remember?

Yeah. This is caused by a hyperlink whose displayed text crosses a page
boundary. The only known fix is to change the text enough so the link
no longer runs across a page boundary. Unfortunately, pdfTeX is pretty
unhelpful about identifying exactly where the problem is. I seem to
recall having posted a recipe about finding such problems.

			regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.0 PDF build broken
Mysteriously, commit 6b2a1445ec8a631060c4cbff3f172bf31d3379b9 has broken
the PDF build (openjade + pdfjadetex) in the 9.0 branch only. The error
is

[256.0.28
! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than \pd
fstartlink.
\AtBegShi@Output ...ipout \box \AtBeginShipoutBox \fi \fi
l.241723 ...char95{}stat\char95{}file('filename');
! ==> Fatal error occurred, no output PDF file produced!
Transcript written on postgres-A4.log.
make: *** [postgres-A4.pdf] Error 1

I have reproduced this on two different platforms, and it affects only
this branch. I guess this change might have caused the page boundaries
to shift in an unfortunate way. I seem to recall we have had similar
problems before. Does anyone remember?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trigger concurrent execution
Blagoj Petrushev wrote
> Hi,
>
> I'm thinking of an extension to trigger functionality like this:
>
> CREATE TRIGGER trigger_name
> AFTER event
> ON table
> CONCURRENTLY EXECUTE PROCEDURE trigger_fc
>
> This would call the trigger after the end of the transaction.
>
> The following is a use-case, please tell me if I'm doing it wrong.
>
> I have a big table with big text column article and a nullable
> tsvector column fts_article. On each insert or update that changes the
> article, I trigger-issue 'NOTIFY article_changed row_id', then, with a
> daemon listener, I catch the notification and update fts_article
> accordingly with my_fts_fc(article). The reason I don't do this
> directly in my trigger is because my_fts_fc is slow for big articles,
> fts_article has a gin index, and also, on heavy load, my listener can
> do these updates concurrently. Now, with a concurrent execution of
> triggers, I can just call my_fts_fc inside the trigger instead of the
> notify roundtrip.

Conceptually, trigger actions run in-transaction and can cause it to
ROLLBACK; so how would "after the end of the transaction" work?

Since the easy way is to have COMMIT; block until all the AFTER event
concurrent triggers fire, I presume you would want something more like a
task queue for background workers where, at commit, the function call is
placed in a FIFO queue and the calling session is allowed to move on to
other activity.

It is not clear what you mean by "my listener can do these updates
concurrently". Concurrently with each other, or concurrently with other DML
action on the table? I assume you have multiple listeners, since the
potential rate of insert of the documents is likely much greater than the
rate of update/indexing.

Also, it would seem you'd typically want the GIN index to be updated once the
corresponding transaction commits and makes the rest of the data available.
Or does your use case allow for some delay between the article being in the
database physically and it being available in the index?

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Trigger-concurrent-execution-tp5804158p5804164.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
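A rough sketch of the queue-table variant mentioned above, assuming
PostgreSQL 9.1 or later for the writable CTE; the table, column, and function
names (articles, fts_queue, my_fts_fc) are hypothetical, the enqueueing
trigger is assumed to simply INSERT the changed row's id into fts_queue, and
coordination between multiple concurrent listeners is omitted:

CREATE TABLE fts_queue (
    article_id integer NOT NULL,
    queued_at  timestamptz NOT NULL DEFAULT now()
);

-- One listener iteration, run in its own short transaction: pop the oldest
-- queued id and rebuild that article's tsvector.
WITH next AS (
    DELETE FROM fts_queue
    WHERE article_id = (SELECT article_id
                        FROM fts_queue
                        ORDER BY queued_at
                        LIMIT 1)
    RETURNING article_id
)
UPDATE articles a
SET fts_article = my_fts_fc(a.article)
FROM next
WHERE a.id = next.article_id;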
Re: [HACKERS] 9.4 release notes
Andres Freund-3 wrote
> On 2014-05-04 08:46:07 -0400, Bruce Momjian wrote:
>> I have completed the initial version of the 9.4 release notes. You can
>> view them here:
>>
>> http://www.postgresql.org/docs/devel/static/release-9-4.html
>>
>> I will be adding additional markup in the next few days.
>>
>> Feedback expected and welcomed. I expect to be modifying this until we
>> release 9.4 final. I have marked items where I need help with question
>> marks.
>
> This time I started reading from the end. I think I've fixed most of the
> questionable things (i.e. ? or FIXMEs) left.
>
> I am not really sure how to rewrite the notes for the logical decoding
> stuff into a more appropriate format for the release notes. Currently it
> seems to describe too many details and not enough overview. It's also
> probably too long.
>
> How about letting it keep it's  but remove the  and
> put a short explanation about the individual parts into a following
> or two? That'd require a name after a  which normally
> isn't done...
>
> Greetings,
>
> Andres Freund

Some errors and suggestions - my apologies for the format as I do not have a
proper patching routine set up.

Patch Review - Top to Bottom (mostly, I think...)

s/constraints's/constraint/ - possessive not needed here, it is a property of
the constraint, not owned by it.

s/plannet/planner

DISCARD ALL now also discards the states of sequences.
change to
DISCARD ALL now also discards sequence state.

IIUC: Logical decoding allows for streaming of statement-scoped database
changes.

Add function pg_filenode_relation() to more efficiently lookup relations via
their filenode.

This allows one to declare array_agg()-like aggregates using SQL.

IIUC: Remove line length restrictions from pgbench.

These are not in the patch but from my quick scan of the online release notes
- top to bottom:

then conditionally additional adjacent whitespace if not in FX mode
-> then, conditionally, remove additional adjacent whitespace if not in FX
mode. (I presume those conditions are noted in the documentation somewhere)

For example, previously format string space-space-space would consume only
the first space in ' 12', while it will not consume all three characters.
-> For example, the format string space-space-space previously consumed only
the first space in 'space-space-12' whereas now it will consume all three
characters.

style: add comma -> Previously[,] empty arrays were returned (occurs
frequently but is minor)

style: NULL VARIADIC function arguments are now disallowed -> Disallow NULL
VARIADIC function arguments (most of the notes are verb-leading in structure)

During immediate shutdown, send uncatchable termination <- kill the comma

In contrast to local_preload_libraries, this parameter can load any shared
library <- shoot the comma

The linking on this one is odd: Have Windows ASCII-encoded databases and
server process (e.g. postmaster) emit messages in the LC_CTYPE-defined
language (Alexander Law, Noah Misch)

Add ROWS FROM() syntax to allow horizontal concatenation of set-returning
functions in the FROM-clause (Andrew Gierth) -> Maybe a note about using this
to avoid least-common-multiple expansion?

Add WITH ORDINALITY syntax which numbers rows returned from FROM-clause
functions -> Add WITH ORDINALITY syntax to number the rows returned by
FROM-clause functions. ???

DISCARD ALL will now also discard such information. -> DISCARD ALL now also
invokes this command.

be converted to NULL in in CSV mode <- strike one of the "in"s

[I got no clue on this pair... but recommend someone rewrite it]
Improve the internal definition of system relations (Andres Freund, Robert
Haas)
Previously, relations once moved into the system catalog schema could no
longer be modified or dropped.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/9-4-release-notes-tp5802343p5804163.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error in running DBT2
Andrew Dunstan wrote
> On 05/15/2014 06:37 PM, Rohit Goyal wrote:
>> Hi All,
>>
>> I am using centOS6 and after all configuration, I run the below command
>>
>> *dbt2-run-workload -a pgsql -d 120 -w 1 -o /tmp/result -c 10
>> *
>> *Error:*
>> Stage 3. Processing of results...
>> Killing client...
>> waiting for server to shut down done
>> server stopped
>> Traceback (most recent call last):
>> File "/tmp/dbt2/bin/dbt2-post-process", line 14, in <module>
>> import rpy2.robjects as robjects
>> ImportError: No module named rpy2.robjects
>> Test completed.
>> Results are in: /tmp/result
>>
>> Please guide me !! :)
>
> Do these questions about running dbt2 even belong on pgsql-hackers? They
> seem to me to be usage questions that belong on pgsql-general.
>
> They are also woefully inadequate in the detail they provide. I don't
> see how anyone could take the above report and give any sort of opinion,
> other than that it is a python error and not apparently a postgres error
> at all.

Actually, osdldbt-gene...@lists.sourceforge.net would be the proper location,
and I do see that the OP has found said mailing list and even got a response
a few days ago. I guess it makes some sense to shout for help from the
rooftops, but anyone willing and able to provide personal training to a DBT2
newbie is likely going to be monitoring the aforementioned list.

One other suggestion is, after seeing such an error as above, actually
telling people that you have made some rudimentary attempt to confirm that,
indeed, you have:

1) Python installed
2) R installed
3) The python module "rpy2" installed

I am thinking centOS6 doesn't have a packaging system that ensures that all
dependencies are present, and so it seems that the user needs to do so and
affirm that such is the case when asking for help.

Another helpful outcome of trying to understand what people need in order to
help you is that you would realize that the message "Results are in:
/tmp/result" indicates that there is an additional file that someone
debugging (you or someone on these mailing lists) would likely find helpful
to review. Attach that file - either in full or in part - or at least
indicate you looked at it and didn't see anything unusual (unlikely that,
though).

Sorry I cannot help you directly, but maybe I can help you be more successful
in asking for help in the future :)

Good Luck!

David J.

IOW: Guiding someone through debugging on a mailing list is very tedious and
difficult for the people doing the guiding - do as much exploring as possible
on your own and tell people where you have already been and what you have
seen.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Error-in-running-DBT2-tp5804147p5804161.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On 2014-05-04 08:46:07 -0400, Bruce Momjian wrote: > I have completed the initial version of the 9.4 release notes. You can > view them here: > > http://www.postgresql.org/docs/devel/static/release-9-4.html > > I will be adding additional markup in the next few days. > > Feedback expected and welcomed. I expect to be modifying this until we > release 9.4 final. I have marked items where I need help with question > marks. This time I started reading from the end. I think I've fixed most of the questionable things (i.e. ? or FIXMEs) left. I am not really sure how to rewrite the notes for the logical decoding stuff into a more appropriate format for the release notes. Currently it seems to describe too many details and not enough overview. It's also probably too long. How about letting it keep it's but remove the and put a short explanation about the individual parts into a following or two? That'd require a name after a which normally isn't done... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From 18511f2e4cd7d72b8d943efd9e6501d3903a64c4 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Fri, 16 May 2014 01:37:07 +0200 Subject: [PATCH] Further 9.4 release notes improvements. --- doc/src/sgml/release-9.4.sgml | 52 +++ 1 file changed, 33 insertions(+), 19 deletions(-) diff --git a/doc/src/sgml/release-9.4.sgml b/doc/src/sgml/release-9.4.sgml index 3070d0b..e143178 100644 --- a/doc/src/sgml/release-9.4.sgml +++ b/doc/src/sgml/release-9.4.sgml @@ -30,7 +30,7 @@ Logical decoding allows database - changes to be streamed out in customizable format + changes to be streamed out in a customizable format @@ -298,6 +298,12 @@ + + + DISCARD ALL now also discards the states of sequences. + + + @@ -1005,7 +1011,6 @@ - This was added so views that select from a table with zero columns can be dumped correctly. @@ -1028,7 +1033,6 @@ - DISCARD ALL will now also discard such information. @@ -1199,6 +1203,11 @@ AGGREGATE to supply the size of the aggregate's transition state data (Hadi Moshayedi) + + +This allows the plannet to better estimate how much memory will be +used when aggregating. + @@ -1218,7 +1227,7 @@ -Allow the changing of foreign key constraint via ALTER TABLE ... ALTER CONSTRAINT (Simon Riggs) @@ -1254,7 +1263,7 @@ -Fully-implement the line data type (Peter Eisentraut) @@ -1472,7 +1481,7 @@ Add function pg_filenode_relation() -to allow for more efficient filenode to relation lookups (Andres +to allow for more efficient lookups from filenode to relation (Andres Freund) @@ -1543,10 +1552,13 @@ - Allow polymorphic aggregates to have non-polymorphic state data -types ? (Tom Lane) +types (Tom Lane) + + +This allows to declare aggregates like the builtin +array_agg() from SQL. 
@@ -1772,8 +1784,8 @@ -Allow field wrapping to psql's "extended" mode -(Sergey Muraviov) +Add ability to wrap long lines in psql's "expanded" +mode by using \pset format wrapped (Sergey Muraviov) @@ -2218,7 +2230,8 @@ Add pg_prewarm -to preload relation data into the shared buffer cache (Robert Haas) +extension to preload relation data into the shared buffer cache +(Robert Haas) @@ -2243,7 +2256,7 @@ -Add logging of trigger execution to auto_explain (Horiguchi Kyotaro) @@ -2279,9 +2292,10 @@ -Improve indexing of pg_trgm values to -discourage indexing whitespace (Alexander Korotkov) +Improve pg_trgm's +generation of trigrams for indexed regular expression searches by +discouraging the use of trigrams containing whitespace (Alexander +Korotkov) @@ -2328,7 +2342,7 @@ -Allow pgbench to process script files of any line length (Sawada +Allow pgbench to process script files of any line length (Sawada Masahiko) @@ -2339,20 +2353,20 @@ -Add pg_bench option (--rate) to control +Add pgbench option (--rate) to control the transaction rate (Fabien Coelho) -Add --progress output option to pgbench (Fabien Coelho) +Add --progress output option t
[HACKERS] Trigger concurrent execution
Hi, I'm thinking of an extension to trigger functionality like this: CREATE TRIGGER trigger_name AFTER event ON table CONCURRENTLY EXECUTE PROCEDURE trigger_fc This would call the trigger after the end of the transaction. The following is a use-case, please tell me if I'm doing it wrong. I have a big table with big text column article and a nullable tsvector column fts_article. On each insert or update that changes the article, I trigger-issue 'NOTIFY article_changed row_id', then, with a daemon listener, I catch the notification and update fts_article accordingly with my_fts_fc(article). The reason I don't do this directly in my trigger is because my_fts_fc is slow for big articles, fts_article has a gin index, and also, on heavy load, my listener can do these updates concurrently. Now, with a concurrent execution of triggers, I can just call my_fts_fc inside the trigger instead of the notify roundtrip. Kind regards, Blagoj Petrushev -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
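For reference while reading the replies, a minimal sketch of the NOTIFY-based
setup described above; the table and column names (articles, id, article,
fts_article) are stand-ins for the real schema, and my_fts_fc is the user's
own function:

CREATE FUNCTION notify_article_changed() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Push only the row id; the slow my_fts_fc() call happens in the daemon.
    PERFORM pg_notify('article_changed', NEW.id::text);
    RETURN NEW;
END;
$$;

CREATE TRIGGER articles_fts_notify
    AFTER INSERT OR UPDATE OF article ON articles
    FOR EACH ROW
    EXECUTE PROCEDURE notify_article_changed();

-- The daemon runs LISTEN article_changed; and, for each notification:
--   UPDATE articles SET fts_article = my_fts_fc(article) WHERE id = <payload>;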
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 15.5.2014 22:07, Andrew Dunstan wrote:
>
> Yes, I've seen that. Frankly, a test that takes something like 500
> hours is a bit crazy.

Maybe. It certainly is not a test people will use during development. But if
it can detect some hard-to-find errors in the code that might possibly lead
to serious problems, then +1 from me to run them at least on one animal.

500 hours is ~3 weeks, which is not that bad IMHO. Also, once you know where
it fails, the developer can run just that single test (which might take
minutes/hours, but not days).

> If we really want to run this in the buildfarm we should probably
> try to create a massively cut down test schedule for use in this
> case.

If we can cut this down in a meaningful way (i.e. without sacrificing most of
the benefits) then sure - let's do that. But I think that's what
CLOBBER_CACHE_ALWAYS is about.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 15.5.2014 22:56, Andrew Dunstan wrote:
>
> On 05/15/2014 04:30 PM, Stefan Kaltenbrunner wrote:
>
>> well I'm not sure about "misconfigured" but both my personal
>> buildfarm members and pginfra run ones (like gaibasaurus) got errors
>> complaining about "snapshot too old" in the past for long running tests
>> so I'm not sure it is really a "we never had machine with runs that
>> long". So maybe we should not reject those submissions at submission
>> time but rather mark them clearly on the dashboard and leave the final
>> interpretation to a human...
>>
>
> That's a LOT harder and more work to arrange. Frankly, there are more
> important things to do.
>
> I would like to know the circumstances of these very long runs. I drive
> some of my VMs pretty hard on pretty modest hardware, and they don't
> come close to running 24 hours.
>
> The current behaviour goes back to this commit from December 2011:
>
>    commit a8b5049e64f9cb08f8e165d0737139dab74e3bce
>    Author: Andrew Dunstan
>    Date:   Wed Dec 14 14:38:44 2011 -0800
>
>        Use git snapshot instead of fixed 10 day timeout.
>
>        The sanity checks made sure that an animal wasn't submitting a
>        snapshot that was too old. But sometimes an old branch doesn't
>        get any changes for more than 10 days. So accept a snapshot that
>        is not more than 1 day older than the last known snapshot. Per
>        complaint from Stefan.
>
>
> I'm prepared to increase the sanity check time if there is a serious
> demand for it, but I'd like to know what to increase it to.

I doubt there's a "one size fits all" limit. If the machines running
"recursive clobber" tests need tens of days to complete the tests, then that
limit is pretty useless to most regular animals.

So what about keeping the current value for most animals, but allowing an
override for some selected ones? I'd expect this to be much simpler to
implement, and it shouldn't require any human intervention.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error in running DBT2
On Thu, May 15, 2014 at 3:47 PM, Andrew Dunstan wrote: > Do these questions about running dbt2 even belong on pgsql-hackers? They > seem to me to be usage questions that belong on pgsql-general. I agree. Anyway, perhaps the OP will have more luck with OLTPBenchmark, which has some kind of TPC-C support: http://oltpbenchmark.com/wiki/index.php?title=Main_Page -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error in running DBT2
On 05/15/2014 06:37 PM, Rohit Goyal wrote:

Hi All,

I am using centOS6 and after all configuration, I run the below command

*dbt2-run-workload -a pgsql -d 120 -w 1 -o /tmp/result -c 10 *

*Error:*
Stage 3. Processing of results...
Killing client...
waiting for server to shut down done
server stopped
Traceback (most recent call last):
File "/tmp/dbt2/bin/dbt2-post-process", line 14, in <module>
import rpy2.robjects as robjects
ImportError: No module named rpy2.robjects
Test completed.
Results are in: /tmp/result

Please guide me !! :)


Do these questions about running dbt2 even belong on pgsql-hackers? They seem
to me to be usage questions that belong on pgsql-general.

They are also woefully inadequate in the detail they provide. I don't see how
anyone could take the above report and give any sort of opinion, other than
that it is a python error and not apparently a postgres error at all.

cheers

andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Error in running DBT2
Hi All,

I am using centOS6 and after all configuration, I run the below command

*dbt2-run-workload -a pgsql -d 120 -w 1 -o /tmp/result -c 10*

*Error:*
Stage 3. Processing of results...
Killing client...
waiting for server to shut down done
server stopped
Traceback (most recent call last):
File "/tmp/dbt2/bin/dbt2-post-process", line 14, in <module>
import rpy2.robjects as robjects
ImportError: No module named rpy2.robjects
Test completed.
Results are in: /tmp/result

Please guide me !! :)

Regards,
Rohit Goyal

-- 
Regards,
Rohit Goyal
Re: [HACKERS] CTE that result in repeated sorting of the data
On Thu, May 15, 2014 at 4:50 PM, David G Johnston wrote: > Jon Nelson-14 wrote >> I was watching a very large recursive CTE get built today and this CTE >> involves on the order of a dozen or so "loops" joining the initial >> table against existing tables. It struck me that - every time through >> the loop the tables were sorted and then joined and that it would be >> much more efficient if the tables remained in a sorted state and could >> avoid being re-sorted each time through the loop. Am I missing >> something here? I am using PG 8.4 if that matters. > > I'm not sure what you mean by "watching" but maybe this is a simple as > changing your CTE to use "UNION ALL" instead of "UNION [DISTINCT]"? In fact, I'm using UNION ALL. > If you really think it could be improved upon maybe you can help and provide > a minimal self-contained example query and data that exhibits the behavior > you describe so others can see it and test changes? It would be nice to > know if other versions than one that is basically no longer supported > exhibits the same behavior. Pretty much any CTE that looks like this: with cte AS ( select stuff from A UNION ALL select more_stuff from B, cte WHERE ) SELECT * FROM cte; *and* where the planner chooses to join B and cte by sorting and doing a merge join. I'll see if I can come up with a self-contained example. -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
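Until a real test case is posted, here is a minimal, hypothetical sketch of
the query shape being discussed; the parts table and the seed condition are
invented, and whether each iteration is joined by sorting both inputs and
merge-joining depends entirely on the planner and statistics:

CREATE TABLE parts (part_id int, child_id int);

WITH RECURSIVE cte AS (
    SELECT part_id, child_id, 1 AS depth
    FROM parts
    WHERE part_id = 1            -- seed rows
  UNION ALL
    SELECT p.part_id, p.child_id, c.depth + 1
    FROM parts p, cte c
    WHERE p.part_id = c.child_id
)
SELECT * FROM cte;

EXPLAIN shows the plan chosen for the recursive term; that subplan is
re-executed on every iteration of the loop, so a Sort / Merge Join there
corresponds to the repeated sorting described above.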
Re: [HACKERS] sepgsql: label regression test failed
Sorry, I had forgotten about this report.

The failures in the label test come from a specification change in the mcs
policy. Previously it was applied to all domains, including unconfined_t;
now it is applied only to domains with the "mcsconstrained" attribute.

This regression test runs sepgsql_setcon() on the system "unconfined_t"
domain and checks the behavior when a process tries to move to wider or
narrower ranges of categories, so it was affected by the system policy
change, even though sepgsql itself behaves as intended. The attached patch
adds the "mcsconstrained" attribute to the domain used by this regression
test, if the attribute exists, so it will work on both F20 and older systems.

Regarding the regression tests on ddl and alter, it looks to me as though
the hook invocations around recomputeNamespacePath() are gone, because a
schema that is already allowed to be searched has already been checked. Has
the behavior around recomputeNamespacePath() been updated recently? At
least, it is not a problem, since the {search} permission towards
"regtest_schema_2" is checked in this test scenario.

Thanks,

2014-05-14 13:33 GMT+09:00 Sergey Muraviov :
> Hi.
>
> Some regression tests for sepgsql still not work on Fedora 20:
>
> == running regression test queries==
> test label... FAILED
> test dml ... ok
> test ddl ... FAILED
> test alter... FAILED
> test misc ... ok
>
> ==
> 3 of 5 tests failed.
> ==
>
> $ sestatus
> SELinux status: enabled
> SELinuxfs mount:/sys/fs/selinux
> SELinux root directory: /etc/selinux
> Loaded policy name: targeted
> Current mode: enforcing
> Mode from config file: enforcing
> Policy MLS status: enabled
> Policy deny_unknown status: allowed
> Max kernel policy version: 29
>
> $ uname -i -o -r
> 3.14.3-200.fc20.x86_64 x86_64 GNU/Linux
>
> $ /usr/local/pgsql/bin/postgres --version
> postgres (PostgreSQL) 9.4beta1
>
> PS
> I've got this compiler warning:
> relation.c: In function ‘sepgsql_relation_drop’:
> relation.c:472:25: warning: ‘tclass’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]
> sepgsql_avc_check_perms(&object,
> ^
>
>
> 2013-12-25 0:34 GMT+04:00 Kohei KaiGai :
>
>> Hello,
>>
>> It seems to me changes in the base security policy on Fedora affected to
>> the regression test. Our test cases for sepgsql_setcon() utilizes the MCS
>> rules, that prevents domain transition from narrow categories to wider
>> ones,
>> to control the success cases and failure cases.
>>
>> However, its coverage was changed. It was applied all the domains in the
>> system, thus "unconfined_t" domain had been enforced by MCS rules.
>> But now, it shall be applied only domains with "mcs_constrained_type"
>> attribute.
>>
>> [kaigai@vmlinux tmp]$ diff -up old/policy/mcs new/policy/mcs
>> :
>>
>> :
>> mlsconstrain process { transition dyntransition }
>> - (( h1 dom h2 ) or ( t1 == mcssetcats ));
>> + (( h1 dom h2 ) or ( t1 != mcs_constrained_type ));
>>
>> Probably, we need to define a domain by ourselves for regression test to
>> ensure
>> the test stability, not using the system "unconfined" domain that has
>> different
>> meaning by release.
>>
>> I'll make a patch. Please wait for a while.
>>
>> Thanks for your test & reports.
>>
>> 2013/12/18 Sergey Muraviov :
>> > # semodule -l | grep sepgslq
>> > sepgsql-regtest 1.07
>> >
>> > Full list of modules is in attachment.
>> >
>> >
>> > 2013/12/18 Kohei KaiGai
>> >>
>> >> Could you show me semodule -l on your environment?
>> >> I believe security policy has not been changed between F19 and F20... 
>> >> >> >> Thanks, >> >> >> >> 2013/12/18 Sergey Muraviov : >> >> > Hi >> >> > >> >> > I've tried to test postgres 9.3.2 and 9.4devel with selinux on Fedora >> >> > 20 >> >> > and >> >> > met with a label regression test failure. >> >> > >> >> > PS >> >> > I've got some warning during build process. >> >> > >> >> > -- >> >> > Best regards, >> >> > Sergey Muraviov >> >> > >> >> > >> >> > -- >> >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> >> > To make changes to your subscription: >> >> > http://www.postgresql.org/mailpref/pgsql-hackers >> >> > >> >> >> >> >> >> >> >> -- >> >> KaiGai Kohei >> > >> > >> > >> > >> > -- >> > Best regards, >> > Sergey Muraviov >> >> >> >> -- >> KaiGai Kohei > > > > > -- > Best regards, > Sergey Muraviov -- KaiGai Kohei sepgsql-fixup-regtest-policy.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
Andres Freund wrote: > On 2014-05-15 17:37:14 -0400, Alvaro Herrera wrote: > > Andres Freund wrote: > > > On 2014-05-15 15:40:06 -0400, Robert Haas wrote: > > > > On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > > > > > > If the larger clog size is a show-stopper (and I'm not sure I have an > > > > intelligent opinion on that just yet), one way to get around the > > > > problem would be to summarize CLOG entries after-the-fact. Once an > > > > XID precedes the xmin of every snapshot, we don't need to know the > > > > commit LSN any more. So we could read the old pg_clog files and write > > > > new summary files. Since we don't need to care about subcommitted > > > > transactions either, we could get by with just 1 bit per transaction, > > > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > > > summary files, we could throw away the original files. That might > > > > leave us with a smaller pg_clog than what we have today. > > > > > > I think the easiest way for now would be to have pg_clog with the same > > > format as today and a rangewise much smaller pg_csn storing the lsns > > > that are needed. That'll leave us with pg_upgrade'ability without > > > needing to rewrite pg_clog during the upgrade. > > > > Err, we're proposing a patch to add timestamps to each commit, > > http://www.postgresql.org/message-id/2013101600.ge4...@eldon.alvh.no-ip.org > > which does so in precisely this way. > > I am not sure where my statements above conflict with committs? I didn't say it did ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_recvlogical tests
On 2014-05-15 18:52:45 -0300, Euler Taveira wrote:
> On 15-05-2014 18:09, Andres Freund wrote:
> > I have some preliminary tests for the pg_recvlogical binary using the
> > infrastructure Peter added. I am wondering if somebody has a good idea
> > about how to make the tests more meaningful. Currently all that's tested
> > are simple commands. Not the main functionality namely the actual
> > streaming of changes.
> >
> Could you post your preliminary patch?

It's not particularly interesting yet, but attached.

> > I wonder if somebody has a great idea for cancelling pg_recvlogical
> > after a while. Right now my best idea is to add a new
> > --stop-after-messages parameter. Does anybody have a better idea?
> >
> pgbench uses two stop conditions: (i) number of transactions and (ii)
> time. These could be applied to pg_recvlogical too.
>
> --transactions=NUM
> --time=NUM

Sounds too complicated for what I need it for. The former isn't easily
implementable because pg_recvlogical doesn't know about transactions, and
the latter isn't that interesting for the tests because it'll have timing
issues...

> > If that's the way, would somebody object to tests and the parameter
> > being added now? The potential harm seems pretty low and the additional
> > tests would cover the walsender interface that's not covered by any
> > tests right now...
> >
> beta is not the right time to add even a *minor* functionality. Let's do
> it for 9.5.

I am not interested in the feature itself at all. I am interested in adding
more tests for new functionality that's currently not tested in an automated
fashion. And I do think beta isn't a bad time for that. We didn't *have* the
infrastructure for tests of binaries until the end of the last CF, so I
couldn't have added it during development.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

030_pg_recvlogical.pl
Description: Perl program


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_recvlogical tests
On 15-05-2014 18:09, Andres Freund wrote: > I have some preliminary tests for the pg_recvlogical binary using the > infrastructure Peter added. I am wondering if somebody has a good idea > about how to make the tests more meaningful. Currently all that's tested > are simple commands. Not the main functionality namely the actual > streaming of changes. > Could you post your preliminary patch? > I wonder if somebody has a great idea for cancelling pg_recvlogical > after a while. Right now my best idea is to add a new > --stop-after-messages parameter. Does anybody have a better idea? > pgbench uses two stop conditions: (i) number of transactions and (ii) time. These could be applied to pg_recvlogical too. --transactions=NUM --time=NUM > If that's the way, would somebody object to tests and the parameter > being added now? The potential harm seems pretty low and the additional > tests would cover the walsender interface that's not covered by any > tests right now... > beta is not the right time to add even a *minor* functionality. Let's do it for 9.5. -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CTE that result in repeated sorting of the data
Jon Nelson-14 wrote > I was watching a very large recursive CTE get built today and this CTE > involves on the order of a dozen or so "loops" joining the initial > table against existing tables. It struck me that - every time through > the loop the tables were sorted and then joined and that it would be > much more efficient if the tables remained in a sorted state and could > avoid being re-sorted each time through the loop. Am I missing > something here? I am using PG 8.4 if that matters. I'm not sure what you mean by "watching" but maybe this is a simple as changing your CTE to use "UNION ALL" instead of "UNION [DISTINCT]"? If you really think it could be improved upon maybe you can help and provide a minimal self-contained example query and data that exhibits the behavior you describe so others can see it and test changes? It would be nice to know if other versions than one that is basically no longer supported exhibits the same behavior. Or maybe someone more familiar with the implementation of recursive CTE will chime in - my knowledge in this area is fairly limited. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/CTE-that-result-in-repeated-sorting-of-the-data-tp5804136p5804140.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..
On Wed, May 14, 2014 at 8:26 AM, Robert Haas wrote: > On Sun, May 11, 2014 at 12:47 PM, Tom Lane wrote: > > Simon Riggs writes: > >> On 11 May 2014 11:18, Andres Freund wrote: > >>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather > >>> useful. > > > >> Perhaps if an index exists to provide an ordering that makes it clear > >> what this means, then yes. > > > > The $64 question is whether we'd accept an implementation that fails > > if the target table has children (ie, is partitioned). > > I'd say "no". Partitioning is important, and we need to make it more > seamless and better-integrated, not add new warts. > I think the importance of partitioning argues the other way on this issue. Where I most wanted a LIMIT clause on DELETE is where I was moving tuples from one partition to a different one in a transactional way using bite-size chunks that wouldn't choke the live system with locks or with IO. So the DELETE was always running against either a child by name, or against ONLY parent, not against the whole inheritance tree. Not being able to do this on single partitions makes partitioning harder, not easier. Sure, I can select the nth smallest ctid and then "WITH T AS (DELETE FROM ONLY foo WHERE ctid < :that RETURNING *) INSERT INTO bar SELECT * from T", but how annoying. > > That seems > > to me to not be up to the project's usual quality expectations, but > > maybe if there's enough demand for a partial solution we should do so. > > I like this feature, but if I were searching for places where it makes > sense to loosen our project's usual quality expectations, this isn't > where I'd start. > In this case I'd much rather have half a loaf rather than none at all. We wouldn't be adding warts to partitioning, but removing warts from the simpler case. Cheers, Jeff
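A hedged sketch of the ctid-based workaround alluded to above, assuming
PostgreSQL 9.1 or later for the writable CTE; foo, bar, and the page-number
bound are illustrative, bar is assumed to have the same column layout as foo,
and no concurrent writers are assumed while the chunk is moved:

-- Move the rows stored in roughly the first 1000 heap pages of foo into bar;
-- repeat with an increasing bound until the partition is empty.
WITH moved AS (
    DELETE FROM ONLY foo
    WHERE ctid < '(1000,1)'::tid
    RETURNING *
)
INSERT INTO bar
SELECT * FROM moved;

A DELETE ... LIMIT, run against ONLY the parent or a single child, would
replace this kind of two-step dance with one straightforward statement.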
Re: [HACKERS] Proposal for CSN based snapshots
On 2014-05-15 17:37:14 -0400, Alvaro Herrera wrote: > Andres Freund wrote: > > On 2014-05-15 15:40:06 -0400, Robert Haas wrote: > > > On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > > > > If the larger clog size is a show-stopper (and I'm not sure I have an > > > intelligent opinion on that just yet), one way to get around the > > > problem would be to summarize CLOG entries after-the-fact. Once an > > > XID precedes the xmin of every snapshot, we don't need to know the > > > commit LSN any more. So we could read the old pg_clog files and write > > > new summary files. Since we don't need to care about subcommitted > > > transactions either, we could get by with just 1 bit per transaction, > > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > > summary files, we could throw away the original files. That might > > > leave us with a smaller pg_clog than what we have today. > > > > I think the easiest way for now would be to have pg_clog with the same > > format as today and a rangewise much smaller pg_csn storing the lsns > > that are needed. That'll leave us with pg_upgrade'ability without > > needing to rewrite pg_clog during the upgrade. > > Err, we're proposing a patch to add timestamps to each commit, > http://www.postgresql.org/message-id/2013101600.ge4...@eldon.alvh.no-ip.org > which does so in precisely this way. I am not sure where my statements above conflict with committs? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
Andres Freund wrote: > On 2014-05-15 15:40:06 -0400, Robert Haas wrote: > > On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > > If the larger clog size is a show-stopper (and I'm not sure I have an > > intelligent opinion on that just yet), one way to get around the > > problem would be to summarize CLOG entries after-the-fact. Once an > > XID precedes the xmin of every snapshot, we don't need to know the > > commit LSN any more. So we could read the old pg_clog files and write > > new summary files. Since we don't need to care about subcommitted > > transactions either, we could get by with just 1 bit per transaction, > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > summary files, we could throw away the original files. That might > > leave us with a smaller pg_clog than what we have today. > > I think the easiest way for now would be to have pg_clog with the same > format as today and a rangewise much smaller pg_csn storing the lsns > that are needed. That'll leave us with pg_upgrade'ability without > needing to rewrite pg_clog during the upgrade. Err, we're proposing a patch to add timestamps to each commit, http://www.postgresql.org/message-id/2013101600.ge4...@eldon.alvh.no-ip.org which does so in precisely this way. The idea that pg_csn or pg_committs can be truncated much earlier than pg_clog has its merit, no doubt. If we can make sure that the atomicity is sane, +1 from me. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CTE that result in repeated sorting of the data
I was watching a very large recursive CTE get built today and this CTE involves on the order of a dozen or so "loops" joining the initial table against existing tables. It struck me that - every time through the loop the tables were sorted and then joined and that it would be much more efficient if the tables remained in a sorted state and could avoid being re-sorted each time through the loop. Am I missing something here? I am using PG 8.4 if that matters. -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_recvlogical tests
Hi, I have some preliminary tests for the pg_recvlogical binary using the infrastructure Peter added. I am wondering if somebody has a good idea about how to make the tests more meaningful. Currently all that's tested are simple commands. Not the main functionality namely the actual streaming of changes. I wonder if somebody has a great idea for cancelling pg_recvlogical after a while. Right now my best idea is to add a new --stop-after-messages parameter. Does anybody have a better idea? If that's the way, would somebody object to tests and the parameter being added now? The potential harm seems pretty low and the additional tests would cover the walsender interface that's not covered by any tests right now... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
On 2014-05-15 22:30:53 +0300, Heikki Linnakangas wrote: > >Attached patch fixes things, but I want to add some regression tests > >before commit. > > Looks good to me. Attached are two patches. One for the unitialized dbId/tsId issue; one for the decoding bug. The former should be backpatched. Should you wonder about the slight reordering of the assignments in RecordTransactionCommitPrepared() - I've made it more similar to RecordTransactionCommit() to make it easier to see they are equivalent. Thanks for the testing! Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From 07d0f4330f7c25bcec9e356527b0dc86372d2886 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 15 May 2014 22:23:12 +0200 Subject: [PATCH 1/2] Initialize all members of xl_xact_commit during prepared transaction commits. Commit dd428c79 added dbId and tsId to the xl_xact_commit struct but missed that prepared transaction commits reuse that struct. Fix that. Because those fields were used WAL logged unitialized a hot standby node could miss relcache init file invalidations leading to errors like ERROR: could not open file "...": No such file or directory on the standby. A restart of the database is sufficient to fix the problem. As problems can only be triggered when a system table/index has been rewritten in a transaction using two phase commit the problem is unlikely to have affected many installations. Found while investigating a logical decoding bugreport from Heikki. Backpatch to 9.0 where the bug was introduced. --- src/backend/access/transam/twophase.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 2cefa08..d5409a6 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -2082,9 +2082,13 @@ RecordTransactionCommitPrepared(TransactionId xid, /* Emit the XLOG commit record */ xlrec.xid = xid; - xlrec.crec.xact_time = GetCurrentTimestamp(); + xlrec.crec.xinfo = initfileinval ? XACT_COMPLETION_UPDATE_RELCACHE_FILE : 0; - xlrec.crec.nmsgs = 0; + + xlrec.crec.dbId = MyDatabaseId; + xlrec.crec.tsId = MyDatabaseTableSpace; + + xlrec.crec.xact_time = GetCurrentTimestamp(); xlrec.crec.nrels = nrels; xlrec.crec.nsubxacts = nchildren; xlrec.crec.nmsgs = ninvalmsgs; -- 2.0.0.rc2.4.g1dc51c6.dirty >From de22d8e5308b1c4c509d8566317d9aea0956c2bc Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 15 May 2014 22:42:27 +0200 Subject: [PATCH 2/2] Fix typo-induced bug in decoding of prepared transactions. The decoding of prepared transaction commits accidentally used the xid of the transaction performing the COMMIT PREPARED. Before bb38fb0d43c8d that lead to those transactions not being decoded, afterwards to a assertion failure. Add tests for the decoding of prepared transactions. Bug found and fixed by Heikki Linnakangas; new regression test by Andres Freund. 
--- contrib/test_decoding/Makefile | 2 +- contrib/test_decoding/expected/ddl.out | 6 +-- contrib/test_decoding/expected/prepared.out | 82 + contrib/test_decoding/sql/ddl.sql | 2 +- contrib/test_decoding/sql/prepared.sql | 50 ++ src/backend/replication/logical/decode.c| 2 +- 6 files changed, 138 insertions(+), 6 deletions(-) create mode 100644 contrib/test_decoding/expected/prepared.out create mode 100644 contrib/test_decoding/sql/prepared.sql diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile index 685986c..58e0f38 100644 --- a/contrib/test_decoding/Makefile +++ b/contrib/test_decoding/Makefile @@ -37,7 +37,7 @@ submake-isolation: submake-test_decoding: $(MAKE) -C $(top_builddir)/contrib/test_decoding -REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact binary +REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact binary prepared regresscheck: all | submake-regress submake-test_decoding $(MKDIR_P) regression_output diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out index 05a4bd3..37ff8b7 100644 --- a/contrib/test_decoding/expected/ddl.out +++ b/contrib/test_decoding/expected/ddl.out @@ -640,8 +640,8 @@ SELECT pg_drop_replication_slot('regression_slot'); (1 row) /* check that we aren't visible anymore now */ -SELECT * FROM pg_stat_replication; - pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state --+--+-+--+-+-+-+---+--+---+---+++-+---+ +SELECT * FROM pg_replic
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 05/15/2014 04:30 PM, Stefan Kaltenbrunner wrote: On 05/15/2014 07:46 PM, Andrew Dunstan wrote: On 05/15/2014 12:43 PM, Tomas Vondra wrote: Hi all, today I got a few of errors like these (this one is from last week, though): Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT Content: snapshot to old: Wed May 7 04:36:57 2014 GMT on the new buildfarm animals. I believe it was my mistake (incorrectly configured local git mirror), but it got me thinking about how this will behave with the animals running CLOBBER_CACHE_RECURSIVELY. If I understand the Perl code correctly, it does this: (1) update the repository (2) run the tests (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) (4) fail if older Now, imagine that the test runs for days/weeks. This pretty much means it's wasted, because the results will be thrown away anyway, no? The 24 hours runs from the time of the latest commit on the branch in question, not the current time, but basically yes. We've never had machines with runs that long. The longest in recent times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes around 4.5 hours. But we have had misconfigured machines reporting unbelievable snapshot times. I'll take a look and see if we can tighten up the sanity check. It's worth noting that one thing friarbird does is skip the install-check stage - it's almost certainly not going to have terribly much interesting to tell us from that, given it has already run a plain "make check". well I'm not sure about about "misconfigured" but both my personal buildfarm members and pginfra run ones (like gaibasaurus) got errors complaining about "snapshot too old" in the past for long running tests so I'm not sure it is really a "we never had machine with runs that long". So maybe we should not reject those submissions at submission time but rather mark them clearly on the dashboard and leave the final interpretation to a human... That's a LOT harder and more work to arrange. Frankly, there are more important things to do. I would like to know the circumstances of these very long runs. I drive some of my VMs pretty hard on pretty modest hardware, and they don't come close to running 24 hours. The current behaviour goes back to this commit from December 2011: commit a8b5049e64f9cb08f8e165d0737139dab74e3bce Author: Andrew Dunstan Date: Wed Dec 14 14:38:44 2011 -0800 Use git snapshot instead of fixed 10 day timeout. The sanity checks made sure that an animal wasn't submitting a snapshot that was too old. But sometimes an old branch doesn't get any changes for more than 10 days. So accept a snapshot that is not more than 1 day older than the last known snapshot. Per complaint from Stefan. I'm prepared to increase the sanity check time if there is a serious demand for it, but I'd like to know what to increase it to. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 05/15/2014 07:46 PM, Andrew Dunstan wrote: > > On 05/15/2014 12:43 PM, Tomas Vondra wrote: >> Hi all, >> >> today I got a few of errors like these (this one is from last week, >> though): >> >> Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT >> Content: >> snapshot to old: Wed May 7 04:36:57 2014 GMT >> >> on the new buildfarm animals. I believe it was my mistake (incorrectly >> configured local git mirror), but it got me thinking about how this will >> behave with the animals running CLOBBER_CACHE_RECURSIVELY. >> >> If I understand the Perl code correctly, it does this: >> >> (1) update the repository >> (2) run the tests >> (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) >> (4) fail if older >> >> Now, imagine that the test runs for days/weeks. This pretty much means >> it's wasted, because the results will be thrown away anyway, no? >> > > > The 24 hours runs from the time of the latest commit on the branch in > question, not the current time, but basically yes. > > We've never had machines with runs that long. The longest in recent > times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes > around 4.5 hours. But we have had misconfigured machines reporting > unbelievable snapshot times. I'll take a look and see if we can tighten > up the sanity check. It's worth noting that one thing friarbird does is > skip the install-check stage - it's almost certainly not going to have > terribly much interesting to tell us from that, given it has already run > a plain "make check". Well, I'm not sure about "misconfigured", but both my personal buildfarm members and the pginfra-run ones (like gaibasaurus) got errors complaining about "snapshot too old" in the past for long-running tests, so I'm not sure it is really a case of "we never had machines with runs that long". So maybe we should not reject those submissions at submission time, but rather mark them clearly on the dashboard and leave the final interpretation to a human... Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 2014-05-15 16:13:49 -0400, Bruce Momjian wrote: > On Thu, May 15, 2014 at 10:06:32PM +0200, Andres Freund wrote: > > > If the larger clog size is a show-stopper (and I'm not sure I have an > > > intelligent opinion on that just yet), one way to get around the > > > problem would be to summarize CLOG entries after-the-fact. Once an > > > XID precedes the xmin of every snapshot, we don't need to know the > > > commit LSN any more. So we could read the old pg_clog files and write > > > new summary files. Since we don't need to care about subcommitted > > > transactions either, we could get by with just 1 bit per transaction, > > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > > summary files, we could throw away the original files. That might > > > leave us with a smaller pg_clog than what we have today. > > > > I think the easiest way for now would be to have pg_clog with the same > > format as today and a rangewise much smaller pg_csn storing the lsns > > that are needed. That'll leave us with pg_upgrade'ability without > > needing to rewrite pg_clog during the upgrade. > > Yes, I like the idea of storing the CSN separately. One reason the > 2-bit clog is so good is that we know we have atomic 1-byte writes on > all platforms. I don't think we rely on that anywhere. And in fact we don't have the ability to do so for arbitrary bytes - lots of platforms can do that only on specifically aligned bytes. We rely on being able to atomically (as in either before/after no torn value) write/read TransactionIds, but that's it I think? > Can we assume atomic 64-bit writes? Not on 32bit platforms. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
Andrew Dunstan writes: > Incidentally, should the CLOBBER_CACHE_ALWAYS machines also be defining > CLOBBER_FREED_MEMORY? The former does need the latter or it's not very thorough. However, CLOBBER_FREED_MEMORY is defined automatically by --enable-cassert, so you shouldn't need to use a -D switch for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On Thu, May 15, 2014 at 10:06:32PM +0200, Andres Freund wrote: > > If the larger clog size is a show-stopper (and I'm not sure I have an > > intelligent opinion on that just yet), one way to get around the > > problem would be to summarize CLOG entries after-the-fact. Once an > > XID precedes the xmin of every snapshot, we don't need to know the > > commit LSN any more. So we could read the old pg_clog files and write > > new summary files. Since we don't need to care about subcommitted > > transactions either, we could get by with just 1 bit per transaction, > > 1 = committed, 0 = aborted. Once we've written and fsync'd the > > summary files, we could throw away the original files. That might > > leave us with a smaller pg_clog than what we have today. > > I think the easiest way for now would be to have pg_clog with the same > format as today and a rangewise much smaller pg_csn storing the lsns > that are needed. That'll leave us with pg_upgrade'ability without > needing to rewrite pg_clog during the upgrade. Yes, I like the idea of storing the CSN separately. One reason the 2-bit clog is so good is that we know we have atomic 1-byte writes on all platforms. Can we assume atomic 64-bit writes? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 05/15/2014 03:57 PM, Tomas Vondra wrote: How long does a CLOBBER_CACHE_RECURSIVELY run take? days or weeks seems kinda nuts. I don't know. According to this comment from cache/inval.c, it's expected to be way slower (~100x) compared to CLOBBER_CACHE_ALWAYS. /* * Test code to force cache flushes anytime a flush could happen. * * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a * fairly thorough test that the system contains no cache-flush hazards. * However, it also makes the system unbelievably slow --- the regression * tests take about 100 times longer than normal. * * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY. This * slows things by at least a factor of 10000, so I wouldn't suggest * trying to run the entire regression tests that way. It's useful to try * a few simple tests, to make sure that cache reload isn't subject to * internal cache-flush hazards, but after you've done a few thousand * recursive reloads it's unlikely you'll learn more. */ Yes, I've seen that. Frankly, a test that takes something like 500 hours is a bit crazy. If we really want to run this in the buildfarm we should probably try to create a massively cut down test schedule for use in this case. Incidentally, should the CLOBBER_CACHE_ALWAYS machines also be defining CLOBBER_FREED_MEMORY? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 2014-05-15 15:40:06 -0400, Robert Haas wrote: > On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > > On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote: > >> >Some of the stuff in here will be influence whether your freezing > >> >replacement patch gets in. Do you plan to further pursue that one? > >> > >> Not sure. I got to the point where it seemed to work, but I got a > >> bit of a cold feet proceeding with it. I used the page header's LSN > >> field to define the "epoch" of the page, but I started to feel > >> uneasy about it. I would be much more comfortable with an extra > >> field in the page header, even though that uses more disk space. And > >> requires dealing with pg_upgrade. > > > > FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a > > pg_upgrade issue anyway. > > > > I am not excited about a 32x increase in clog size, especially since we > > already do freezing at 200M transactions to allow for more aggressive > > clog trimming. Extrapolating that out, it means we would freeze every > > 6.25M transactions. The default setting imo is far too low for a database of any relevant activity. If I had the stomach for the fight around it I'd suggest increasing it significantly by default. People with small databases won't be hurt significantly because they simply don't have that many transactions and autovacuum will get around to cleanup long before normally. > It seems better to allow clog to grow larger than to force > more-frequent freezing. Yes. > If the larger clog size is a show-stopper (and I'm not sure I have an > intelligent opinion on that just yet), one way to get around the > problem would be to summarize CLOG entries after-the-fact. Once an > XID precedes the xmin of every snapshot, we don't need to know the > commit LSN any more. So we could read the old pg_clog files and write > new summary files. Since we don't need to care about subcommitted > transactions either, we could get by with just 1 bit per transaction, > 1 = committed, 0 = aborted. Once we've written and fsync'd the > summary files, we could throw away the original files. That might > leave us with a smaller pg_clog than what we have today. I think the easiest way for now would be to have pg_clog with the same format as today and a rangewise much smaller pg_csn storing the lsns that are needed. That'll leave us with pg_upgrade'ability without needing to rewrite pg_clog during the upgrade. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
On 2014-05-15 22:30:53 +0300, Heikki Linnakangas wrote: > On 05/15/2014 08:46 PM, Andres Freund wrote: > >On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote: > >How very wierd. The reason for this is that > >RecordTransactionCommitPrepared() forgets to fill a couple of fields in > >xl_xact_commit. Any reason dbId/tsId aren't filled? They aren't strictly > >needed because afaics they're only looked at for relcache invalidations > >which prepared xacts don't support, but still? > > Seems like an oversight in commit dd428c79, which added the fields to > xl_xact_commit. They are needed. A prepared xact can indeed cause relcache > invalidations, and removal of the init file. For example: Hm, so that part has to be backpatched to 9.0. Ick, I wonder if that's been hit in production. Seems like it could cause pretty random looking errors. It's easy enough to cause errors like: ERROR: could not open file "base/12753/12613": No such file or directory I guess not many people will do relevant stuff in prepared xacts tho. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 15 May 2014, 19:46, Andrew Dunstan wrote: > > On 05/15/2014 12:43 PM, Tomas Vondra wrote: >> Hi all, >> >> today I got a few of errors like these (this one is from last week, >> though): >> >> Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT >> Content: >> snapshot to old: Wed May 7 04:36:57 2014 GMT >> >> on the new buildfarm animals. I believe it was my mistake (incorrectly >> configured local git mirror), but it got me thinking about how this will >> behave with the animals running CLOBBER_CACHE_RECURSIVELY. >> >> If I understand the Perl code correctly, it does this: >> >> (1) update the repository >> (2) run the tests >> (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) >> (4) fail if older >> >> Now, imagine that the test runs for days/weeks. This pretty much means >> it's wasted, because the results will be thrown away anyway, no? >> > > > The 24 hours runs from the time of the latest commit on the branch in > question, not the current time, but basically yes. > > We've never had machines with runs that long. The longest in recent > times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes > around 4.5 hours. But we have had misconfigured machines reporting > unbelievable snapshot times. I'll take a look and see if we can tighten > up the sanity check. It's worth noting that one thing friarbird does is > skip the install-check stage - it's almost certainly not going to have > terribly much interesting to tell us from that, given it has already run > a plain "make check". > > How long does a CLOBBER_CACHE_RECURSIVELY run take? days or weeks seems > kinda nuts. I don't know. According to this comment from cache/inval.c, it's expected to be way slower (~100x) compared to CLOBBER_CACHE_ALWAYS. /* * Test code to force cache flushes anytime a flush could happen. * * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a * fairly thorough test that the system contains no cache-flush hazards. * However, it also makes the system unbelievably slow --- the regression * tests take about 100 times longer than normal. * * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY. This * slows things by at least a factor of 10000, so I wouldn't suggest * trying to run the entire regression tests that way. It's useful to try * a few simple tests, to make sure that cache reload isn't subject to * internal cache-flush hazards, but after you've done a few thousand * recursive reloads it's unlikely you'll learn more. */ regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC13 proposal - extend RETURNING syntax
Andres Freund writes: > > Hi, > > Some comments about the patch: > * Coding Style: > * multiline comments have both /* and */ on their own lines. > * I think several places indent by two tabs. > * Spaces around operators > * ... > * Many of the new comments would enjoy a bit TLC by a native speaker. > > * The way RTE_ALIAS creeps in many place where it doesn't seem to belong > seems to indicate that the design isn't yet ready. I share Robert's > suspicion that this would be better solved by referring to a special > range table entry. > > Based on the lack of activity around this and the fact that this needs a > *significant* chunk of work before being committable, I am going to mark > this as returned with feedback. I'm actively working towards converting our software at work to use Pg instead of SQL Server, and before we switch we'll need this feature to be merged. I'll do what I can to get the verbiage and style whipped into shape, though I doubt I can do much with the actual code. Hopefully I can get something in soon. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum scheduling starvation and frenzy
Jeff Janes wrote: > If you have a database with a large table in it that has just passed > autovacuum_freeze_max_age, all future workers will be funnelled into that > database until the wrap-around completes. But only one of those workers > can actually vacuum the one table which is holding back the frozenxid. > Maybe the 2nd worker to come along will find other useful work to do, but > eventually all the vacuuming that needs doing is already in progress, and > so each worker starts up, gets directed to this database, finds it can't > help, and exits. So all other databases are entirely starved of > autovacuuming for the entire duration of the wrap-around vacuuming of this > one large table. Bah. Of course :-( Note that if you have two databases in danger of wraparound, the oldest will always be chosen until it's no longer in danger. Ignoring the second one past freeze_max_age seems bad also. This code is in autovacuum.c, do_start_worker(). Not sure what your proposal looks like in terms of code. I think that instead of trying to get a single target database in that foreach loop, we could try to build a prioritized list (in-wraparound-danger first, then in-multixid-wraparound danger, then the one with the oldest autovac time of all the ones that remain); then recheck the wrap-around condition by seeing whether there are other workers in that database that started after the wraparound condition appeared. If there are, move down the list. The first in the list not skipped is chosen for vacuuming. (Do we need to consider the situation that all databases were skipped by the above logic, and if so then perhaps pick up the first DB in the list?) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
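To make the scheduling idea above concrete, here is a rough standalone sketch of that priority order. The struct and function names are invented for illustration; they are not the real autovacuum.c / do_start_worker() data structures.

/*
 * Hypothetical sketch: rank candidate databases as (1) xid-wraparound
 * danger, (2) multixact-wraparound danger, (3) everything else by oldest
 * last-autovacuum time, and skip a wraparound candidate if some worker
 * already started on it after the danger appeared.
 */
#include <stdio.h>
#include <stdbool.h>
#include <time.h>

typedef struct CandidateDB
{
    const char *name;
    bool        xid_wraparound_danger;
    bool        mxid_wraparound_danger;
    time_t      last_autovac;           /* older = more overdue */
    time_t      danger_detected_at;     /* 0 if not in danger */
    time_t      newest_worker_start;    /* 0 if no worker active there */
} CandidateDB;

static bool
already_being_handled(const CandidateDB *db)
{
    /* a worker started after the danger appeared; adding more won't help */
    return db->danger_detected_at != 0 &&
           db->newest_worker_start > db->danger_detected_at;
}

static const CandidateDB *
choose_autovac_target(const CandidateDB *dbs, int ndbs)
{
    const CandidateDB *best = NULL;
    int         best_rank = 4;  /* 1 = xid danger, 2 = mxid danger, 3 = ordinary */

    for (int i = 0; i < ndbs; i++)
    {
        const CandidateDB *db = &dbs[i];
        int         rank = db->xid_wraparound_danger ? 1 :
                           db->mxid_wraparound_danger ? 2 : 3;

        if (rank < 3 && already_being_handled(db))
            continue;           /* "move down the list" */

        if (best == NULL ||
            rank < best_rank ||
            (rank == best_rank && db->last_autovac < best->last_autovac))
        {
            best = db;
            best_rank = rank;
        }
    }
    /* if everything was skipped, fall back to the first database in the list */
    if (best == NULL && ndbs > 0)
        best = &dbs[0];
    return best;
}

int
main(void)
{
    CandidateDB dbs[] = {
        {"big_wraparound_db", true, false, 1000, 5000, 6000},  /* already handled */
        {"quiet_db", false, false, 500, 0, 0},
        {"busy_db", false, false, 2000, 0, 0},
    };

    printf("would vacuum: %s\n", choose_autovac_target(dbs, 3)->name);
    return 0;
}

With data like the example in main(), the wraparound database that already has a post-danger worker is skipped, so an ordinary overdue database gets the worker instead of feeding the restart frenzy Jeff describes.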
Re: [HACKERS] Proposal for CSN based snapshots
On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian wrote: > On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote: >> >Some of the stuff in here will be influence whether your freezing >> >replacement patch gets in. Do you plan to further pursue that one? >> >> Not sure. I got to the point where it seemed to work, but I got a >> bit of a cold feet proceeding with it. I used the page header's LSN >> field to define the "epoch" of the page, but I started to feel >> uneasy about it. I would be much more comfortable with an extra >> field in the page header, even though that uses more disk space. And >> requires dealing with pg_upgrade. > > FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a > pg_upgrade issue anyway. > > I am not excited about a 32x increase in clog size, especially since we > already do freezing at 200M transactions to allow for more aggressive > clog trimming. Extrapolating that out, it means we would freeze every > 6.25M transactions. It seems better to allow clog to grow larger than to force more-frequent freezing. If the larger clog size is a show-stopper (and I'm not sure I have an intelligent opinion on that just yet), one way to get around the problem would be to summarize CLOG entries after-the-fact. Once an XID precedes the xmin of every snapshot, we don't need to know the commit LSN any more. So we could read the old pg_clog files and write new summary files. Since we don't need to care about subcommitted transactions either, we could get by with just 1 bit per transaction, 1 = committed, 0 = aborted. Once we've written and fsync'd the summary files, we could throw away the original files. That might leave us with a smaller pg_clog than what we have today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
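As a rough illustration of the summarization idea (not actual pg_clog code; the status constants and helper names below are made up, though the two-bits-per-xact packing mirrors how clog stores statuses), repacking committed/aborted into one bit per transaction could look like this:

/*
 * Standalone sketch: repack 2-bit transaction statuses (4 per byte) into a
 * 1-bit committed/aborted summary (8 per byte).  Only "committed" sets a
 * bit; in this sketch anything else maps to 0, so real code would have to
 * resolve in-progress and sub-committed entries before summarizing.
 */
#include <stdio.h>
#include <stdint.h>

#define XSTAT_IN_PROGRESS   0x00
#define XSTAT_COMMITTED     0x01
#define XSTAT_ABORTED       0x02
#define XSTAT_SUB_COMMITTED 0x03

/* fetch the 2-bit status of transaction number "xact" from a clog-like buffer */
static int
two_bit_status(const uint8_t *clog, uint64_t xact)
{
    return (clog[xact / 4] >> ((xact % 4) * 2)) & 0x03;
}

/* "summary" must hold (nxacts + 7) / 8 bytes and start zeroed */
static void
summarize_clog(const uint8_t *clog, uint8_t *summary, uint64_t nxacts)
{
    for (uint64_t x = 0; x < nxacts; x++)
    {
        if (two_bit_status(clog, x) == XSTAT_COMMITTED)
            summary[x / 8] |= (uint8_t) (1 << (x % 8));
    }
}

int
main(void)
{
    uint8_t clog[2] = {0, 0};
    uint8_t summary[1] = {0};

    /* mark xacts 1 and 6 committed, xact 3 aborted, the rest untouched */
    clog[0] |= XSTAT_COMMITTED << (1 * 2);
    clog[0] |= XSTAT_ABORTED << (3 * 2);
    clog[1] |= XSTAT_COMMITTED << ((6 % 4) * 2);

    summarize_clog(clog, summary, 8);
    printf("summary byte: 0x%02x\n", summary[0]);   /* expect 0x42 */
    return 0;
}

The parts this sketch leaves out are the policy questions Robert raises: deciding when an XID range is safely behind every snapshot's xmin, and fsyncing the summary files before the original clog files are thrown away.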
Re: [HACKERS] Logical replication woes
On 05/15/2014 08:46 PM, Andres Freund wrote: On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote: Ok, so the immediate cause was quick to find: when decoding a commit-prepared WAL record, we have to use the XID from the record content (patch attached). The XID in the record header is the XID of the transaction doing the COMMIT PREPARED, which is always 0 after patch bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it was always wrong. After fixing, it no longer asserts or gives the above "could not map filenode" error. However, it still doesn't seem right. When I do the above as a regular transaction, ie: begin; insert into foo values (6); alter table foo alter column id type text; commit; pg_recvlogical prints this: BEGIN 708 table public.foo: INSERT: id[text]:'6' COMMIT 708 But if I do it as a prepared transaction: begin; insert into foo values (7); alter table foo alter column id type text; prepare transaction 'foo'; commit prepared 'foo'; How very wierd. The reason for this is that RecordTransactionCommitPrepared() forgets to fill a couple of fields in xl_xact_commit. Any reason dbId/tsId aren't filled? They aren't strictly needed because afaics they're only looked at for relcache invalidations which prepared xacts don't support, but still? Seems like an oversight in commit dd428c79, which added the fields to xl_xact_commit. They are needed. A prepared xact can indeed cause relcache invalidations, and removal of the init file. For example: begin; cluster pg_opclass using pg_opclass_oid_index ; prepare transaction 'foo'; commit prepared 'foo'; Attached patch fixes things, but I want to add some regression tests before commit. Looks good to me. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New timezones used in regression tests
On Thu, May 15, 2014 at 02:47:21PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Mon, May 12, 2014 at 07:16:48PM -0400, Tom Lane wrote: > >> I agree, that seems an entirely gratuitous choice of zone. It does > >> seem like a good idea to test a zone that has a nonintegral offset > >> from GMT, but we can get that from almost anywhere as long as we're > >> testing a pre-1900 date. There's no need to use any zones that aren't > >> long-established and unlikely to change. > > > If we want a nonintegral offset, why are we not using 'Asia/Calcutta', > > which is +5:30 from UTC? > > I believe there's already one of those tests that considers a zone like > that. No, I meant a really odd offset, like Paris' +0:09:21 before they > adopted standardized time. Wow, OK, got it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] autovacuum scheduling starvation and frenzy
In testing 9.4 with some long running tests, I noticed that autovacuum launcher/worker sometimes goes a bit nuts. It vacuums the same database repeatedly without respect to the nap time. As far as I can tell, the behavior is the same in older versions, but I haven't tested that. This is my understanding of what is happening: If you have a database with a large table in it that has just passed autovacuum_freeze_max_age, all future workers will be funnelled into that database until the wrap-around completes. But only one of those workers can actually vacuum the one table which is holding back the frozenxid. Maybe the 2nd worker to come along will find other useful work to do, but eventually all the vacuuming that needs doing is already in progress, and so each worker starts up, gets directed to this database, finds it can't help, and exits. So all other databases are entirely starved of autovacuuming for the entire duration of the wrap-around vacuuming of this one large table. Also, the launcher decides when to launch the next worker by looking at the scheduled time of the least-recently-vacuumed database (with the implicit intention that that is the one that will get chosen to vacuum next). But since the worker gets redirected to the wrap-around database instead of the least-recently-vacuumed database, the least-recently-vacuumed database never gets it schedule updated and always looks like it is chronologically overdue. That means the launcher keeps launching new workers as fast as the previous ones exit, ignoring the nap time. So there is one long running worker actually making progress, plus a frenzy of workers all attacking the same database, finding that there is nothing they can do. I think that a database more than autovacuum_freeze_max_age should get first priority, but only if its next scheduled vacuum time is in the past. If it can beneficially use more than one vacuum worker, they would usually accumulate there naturally within a few naptimes iterations[1]. And if it can't usefully use more than one worker, don't prevent other databases from using them. [1] you could argue that all other max_workers processes could become pinned down in long running vacuums of other nonrisk databases between the time that the database crosses autovacuum_freeze_max_age (and has its first worker started), and the time its nap time expires and so it becomes eligible for a second one. But that seems like a weak argument, as it could just have easily happened that all of them got pinned down in nonrisk databases a few transactions *before* the database crosses autovacuum_freeze_max_age in the first place. Does this analysis and proposal seem sound? Cheers, Jeff
Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)
On Thu, May 15, 2014 at 10:38 AM, Andres Freund wrote: >> . async.c and namespace.c does the same, and it hasn't been a >> problem. > > Well, it doesn't seem unreasonable to have C code using > PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP around a 2pc commit > to me. That'll break with this. > Perhaps we should just finally make cancel_before_shmem_exit search the > stack of callbacks. Yes, please. And while we're at it, perhaps we should make it Trap() or fail an Assert() if it doesn't find the callback it was told to remove. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
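A minimal sketch of that suggestion, using a toy callback stack rather than the real ipc.c arrays (the names and types here are invented): the cancel routine searches downward from the top of the stack and fails an assertion if the entry it was asked to remove was never registered.

/* standalone illustration; assert() stands in for an Assert()/Trap() */
#include <stdio.h>
#include <assert.h>

#define MAX_ON_EXITS 20

typedef void (*exit_cb) (int code, void *arg);

typedef struct
{
    exit_cb     function;
    void       *arg;
} OnExitEntry;

static OnExitEntry before_shmem_exit_list[MAX_ON_EXITS];
static int  before_shmem_exit_index = 0;

static void
push_before_shmem_exit(exit_cb function, void *arg)
{
    assert(before_shmem_exit_index < MAX_ON_EXITS);
    before_shmem_exit_list[before_shmem_exit_index].function = function;
    before_shmem_exit_list[before_shmem_exit_index].arg = arg;
    before_shmem_exit_index++;
}

static void
cancel_before_shmem_exit_searching(exit_cb function, void *arg)
{
    for (int i = before_shmem_exit_index - 1; i >= 0; i--)
    {
        if (before_shmem_exit_list[i].function == function &&
            before_shmem_exit_list[i].arg == arg)
        {
            /* close the gap, keeping the order of the remaining entries */
            for (int j = i; j < before_shmem_exit_index - 1; j++)
                before_shmem_exit_list[j] = before_shmem_exit_list[j + 1];
            before_shmem_exit_index--;
            return;
        }
    }
    assert(!"callback was not registered");
}

static void
dummy_cb(int code, void *arg)
{
    (void) code;
    printf("callback %s fired\n", (const char *) arg);
}

int
main(void)
{
    static char outer_tag[] = "outer";
    static char inner_tag[] = "inner";

    push_before_shmem_exit(dummy_cb, outer_tag);
    push_before_shmem_exit(dummy_cb, inner_tag);
    /* callers could now cancel "outer" even though "inner" sits on top */
    cancel_before_shmem_exit_searching(dummy_cb, outer_tag);
    printf("%d entry left\n", before_shmem_exit_index);
    return 0;
}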
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 8:06 AM, Bruce Momjian wrote: > On Tue, May 6, 2014 at 11:15:17PM +0100, Simon Riggs wrote: >> > Well, for what it's worth, I've encountered systems where setting >> > effective_cache_size too low resulted in bad query plans, but I've >> > never encountered the reverse situation. >> >> I agree with that. >> >> Though that misses my point, which is that you can't know that all of >> that memory is truly available on a server with many concurrent users. >> Choosing settings that undercost memory intensive plans are not the >> best choice for a default strategy in a mixed workload when cache may >> be better used elsewhere, even if such settings make sense for some >> individual users. > > This is the same problem we had with auto-tuning work_mem, in that we > didn't know what other concurrent activity was happening. Seems we need > concurrent activity detection before auto-tuning work_mem and > effective_cache_size. I think it's worse than that: we don't even know what else is happening *in the same query*. For example, look at this: http://www.postgresql.org/message-id/16161.1324414...@sss.pgh.pa.us That's pretty awful, and it's just one example of a broader class of problems that we haven't even tried to solve. We really need a way to limit memory usage on a per-query basis rather than a per-node basis. For example, consider a query plan that needs to do four sorts. If work_mem = 64MB, we'll happily use 256MB total, 64MB for each sort. Now, that might cause the system to swap: since there are four sorts, maybe we ought to have used only 16MB per sort, and switched to a heap sort if that wasn't enough. But it's even subtler than that: if we had known when building the query plan that we only had 16MB per sort rather than 64MB per sort, we would potentially have estimated higher costs for those sorts in the first place, which might have led to a different plan that needed fewer sorts to begin with. When you start to try to balance memory usage across multiple backends, things get even more complicated. If the first query that starts up is allowed to use all the available memory, and we respond to that by lowering the effective value of work_mem to something very small, a second query that shows up a bit later might choose a very inefficient plan as a result. That in turn might cause heavy I/O load on the system for a long time, making the first query run very slowly. We might have been better off just letting the first query finish, and the running the second one (with a much better plan) after it was done. Or, maybe we should have only let the first query take a certain fraction (half? 10%?) of the available memory, so that there was more left for the second guy. But that could be wrong too - it might cause the first plan to be unnecessarily inefficient when nobody was planning to run any other queries anyway. Plus, DBAs hate it when plans change on them unexpectedly, so anything that involves a feedback loop between current utilization and query plans will be unpopular with some people for that reason. These are hard problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Tue, May 13, 2014 at 06:58:11PM -0400, Tom Lane wrote: > A recent question from Tim Kane prompted me to measure the overhead > costs of EXPLAIN ANALYZE, which I'd not checked in awhile. Things > are far worse than I thought. On my current server (by no means > lavish hardware: Xeon E5-2609 @2.40GHz) a simple seqscan can run > at something like 110 nsec per row: I assume you ran pg_test_timing too:
Testing timing overhead for 3 seconds.
Per loop time including overhead: 41.70 nsec
Histogram of timing durations:
< usec   % of total      count
     1     95.83035   68935459
     2      4.16923    2999133
     4      0.00037        268
     8      0.00004         31
    16      0.00000          1
    32      0.00000          1
My overhead of 41.70 nsec matches yours. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New timezones used in regression tests
On Tue, May 13, 2014 at 09:55:26AM -0400, Alvaro Herrera wrote: > Christoph Berg wrote: > > > Of course, Wikipedia has something to say about this: > > http://en.wikipedia.org/wiki/Timekeeping_on_Mars > > Nice. > > > I especially like MTC, Mars Time Coordinated. But whatever scheme gets > > chosen, it won't be a standard 24h day, so PostgreSQL has a whole lot > > of different problems to solve than to "fix" that little > > Mars/Mons_Olympus gem now... :) > > Maybe a new type, mars_timestamptz()? Or perhaps the celestial body > name should be part of the typmod for standard timestamptz ...? The latter seems a good bit more extensible. Conversions among the different timestamptzs might be problematic, as we are currently assuming certain approximations about spacetime that don't actually hold when we have time zones in significantly different reference frames. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New timezones used in regression tests
Bruce Momjian writes: > On Mon, May 12, 2014 at 07:16:48PM -0400, Tom Lane wrote: >> I agree, that seems an entirely gratuitous choice of zone. It does >> seem like a good idea to test a zone that has a nonintegral offset >> from GMT, but we can get that from almost anywhere as long as we're >> testing a pre-1900 date. There's no need to use any zones that aren't >> long-established and unlikely to change. > If we want a nonintegral offset, why are we not using 'Asia/Calcutta', > which is +5:30 from UTC? I believe there's already one of those tests that considers a zone like that. No, I meant a really odd offset, like Paris' +0:09:21 before they adopted standardized time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
On Wed, May 14, 2014 at 8:46 PM, Jeff Janes wrote: >> +1. I can't think of many things we might do that would be more >> important. > > Can anyone guess how likely this approach is to make it into 9.5? I've been > pondering some incremental improvements over what we have now, but if this > revolutionary approach has a high chance of landing then any work on > incremental improvements would be pointless. Well, Heikki was saying on another thread that he had kind of gotten cold feet about this, so I gather he's not planning to pursue it. Not sure if I understood that correctly. If so, I guess it depends on whether someone else can pick it up, but we might first want to establish why he got cold feet and how worrying those problems seem to other people. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New timezones used in regression tests
On Mon, May 12, 2014 at 07:16:48PM -0400, Tom Lane wrote: > Christoph Berg writes: > > 84df54b22e8035addc7108abd9ff6995e8c49264 introduced timestamp > > constructors. In the regression tests, various time zones are tested, > > including America/Metlakatla. Now, if you configure using > > --with-system-tzdata, you'll get an error if that zone isn't there. > > Unfortunately, this is what I'm getting now when trying to build beta1 > > on Ubuntu 10.04 (lucid) with tzdata 2010i-1: > > I agree, that seems an entirely gratuitous choice of zone. It does > seem like a good idea to test a zone that has a nonintegral offset > from GMT, but we can get that from almost anywhere as long as we're > testing a pre-1900 date. There's no need to use any zones that aren't > long-established and unlikely to change. If we want a nonintegral offset, why are we not using 'Asia/Calcutta', which is +5:30 from UTC? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote: > >Some of the stuff in here will be influence whether your freezing > >replacement patch gets in. Do you plan to further pursue that one? > > Not sure. I got to the point where it seemed to work, but I got a > bit of a cold feet proceeding with it. I used the page header's LSN > field to define the "epoch" of the page, but I started to feel > uneasy about it. I would be much more comfortable with an extra > field in the page header, even though that uses more disk space. And > requires dealing with pg_upgrade. FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a pg_upgrade issue anyway. I am not excited about a 32x increase in clog size, especially since we already do freezing at 200M transactions to allow for more aggressive clog trimming. Extrapolating that out, it means we would freeze every 6.25M transactions. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump warnings in MinGW build
Now that popen and pclose don't throw thousands of warnings when compiling mingw builds, some other warnings stand out. parallel.c: In function 'pgpipe': parallel.c:1332:2: warning: overflow in implicit constant conversion [-Woverflow] parallel.c:1386:3: warning: overflow in implicit constant conversion [-Woverflow] I think the solution is to use the pgsocket typedef from src/include/port.h, rather than int. Like attached. But I'm far from being a typedef lawyer, so maybe I am all wet. Cheers, Jeff pgpipe.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
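For illustration, a minimal sketch of the typedef swap being suggested. The fallback definitions mirror what src/include/port.h provides; the variable and helper around them are invented, not the actual parallel.c/pgpipe() code.

#include <stdio.h>

#ifdef WIN32
#include <winsock2.h>
typedef SOCKET pgsocket;                /* as in src/include/port.h */
#define PGINVALID_SOCKET INVALID_SOCKET
#else
typedef int pgsocket;
#define PGINVALID_SOCKET (-1)
#endif

/*
 * Before: the handles were plain ints.  On 64-bit Windows a SOCKET (and
 * INVALID_SOCKET) doesn't fit in an int, which is presumably where the
 * "overflow in implicit constant conversion" warnings come from.
 *
 *     int handles[2] = {-1, -1};
 *
 * After: use the portable typedef and sentinel throughout.
 */
static pgsocket handles[2] = {PGINVALID_SOCKET, PGINVALID_SOCKET};

int
main(void)
{
    printf("pipe open: %d\n",
           handles[0] != PGINVALID_SOCKET && handles[1] != PGINVALID_SOCKET);
    return 0;
}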
Re: [HACKERS] wrapping in extended mode doesn't work well with default pager
Hello 2014-05-15 15:04 GMT+02:00 Sergey Muraviov : > Hi. > Please review the new patch. > This version works perfect Regards Pavel > > PS > Issues which were described by Tom and Pavel were relevant to single-line > headers. > So I've added appropriate regression tests to the patch. > > I've also attached complex regression tests for unicode linestyle and > multibyte symbols. > > > 2014-05-14 10:55 GMT+04:00 Pavel Stehule : > > sorry >> >> there is still small issue >> >> I have a plpgsql function: >> >> CREATE OR REPLACE FUNCTION public.foo_update_trg() >> RETURNS trigger >> LANGUAGE plpgsql >> AS $function$ >> DECLARE t text; >> BEGIN >> EXECUTE format('SELECT $1.%I', TG_ARGV[0]) INTO t USING old; >> RAISE NOTICE 'original value of "%" is "%"', TG_ARGV[0], t; >> RETURN NULL; >> END; >> $function$ >> >> Default expanded view of select * from pg_proc where proname = >> 'foo_update_trg'; is little bit broken (screenshoot 1) >> >> After wrap mode, it add useless new line into source code (screenshoot 2) >> >> but border2 fixes it (screenshots 3) >> >> Regards >> >> Pavel >> >> >> >> 2014-05-14 8:32 GMT+02:00 Pavel Stehule : >> >> Hello >>> >>> With this patch it works perfect >>> >>> Thank you >>> >>> Regards >>> >>> Pavel >>> >>> >>> 2014-05-13 21:33 GMT+02:00 Sergey Muraviov >>> : >>> >>> Please check this patch. 2014-05-12 22:56 GMT+04:00 Sergey Muraviov >>> >: Hi. > > I'll try to fix it tomorrow. > > > 2014-05-12 18:42 GMT+04:00 Tom Lane : > > Greg Stark writes: >> > On Mon, May 12, 2014 at 2:12 PM, Greg Stark wrote: >> >> Hm, there was an off by one error earlier in some cases, maybe we >> >> fixed it by breaking other case. Will investigate. >> >> > Those spaces are coming from the ascii wrapping indicators. i.e. >> the periods in: >> >> Ah. I wonder whether anyone will complain that the format changed? >> >> > Apparently we used to print those with border=1 in normal mode but >> in >> > expanded mode we left out the space for those on the outermost edges >> > since there was no need for them. If we put them in for wrapped mode >> > then we'll be inconsistent if we don't for nonwrapped mode though. >> And >> > if we don't put them in for wrapped mode then there's no way to >> > indicate wrapping versus newlines. >> >> Barring anyone complaining that the format changed, I'd say the issue >> is not that you added them but that the accounting for line length >> fails to include them. >> >> regards, tom lane >> > > > > -- > Best regards, > Sergey Muraviov > -- Best regards, Sergey MuraviovH >>> >>> >> > > > -- > Best regards, > Sergey Muraviov >
Re: [HACKERS] Logical replication woes
On 2014-05-15 19:46:57 +0200, Andres Freund wrote: > Attached patch fixes things, but I want to add some regression tests > before commit. And now actually attached. Will send a patch with regression tests later tonight or tomorrow. Need to eat first... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 2cefa08..5d8e195 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -2082,9 +2082,10 @@ RecordTransactionCommitPrepared(TransactionId xid, /* Emit the XLOG commit record */ xlrec.xid = xid; + xlrec.crec.dbId = MyDatabaseId; + xlrec.crec.tsId = MyDatabaseTableSpace; xlrec.crec.xact_time = GetCurrentTimestamp(); xlrec.crec.xinfo = initfileinval ? XACT_COMPLETION_UPDATE_RELCACHE_FILE : 0; - xlrec.crec.nmsgs = 0; xlrec.crec.nrels = nrels; xlrec.crec.nsubxacts = nchildren; xlrec.crec.nmsgs = ninvalmsgs; diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 06b99e7..603d083 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -225,7 +225,7 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]); invals = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]); -DecodeCommit(ctx, buf, r->xl_xid, xlrec->dbId, +DecodeCommit(ctx, buf, prec->xid, xlrec->dbId, xlrec->xact_time, xlrec->nsubxacts, subxacts, xlrec->nmsgs, invals); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote: > Ok, so the immediate cause was quick to find: when decoding a > commit-prepared WAL record, we have to use the XID from the record content > (patch attached). The XID in the record header is the XID of the transaction > doing the COMMIT PREPARED, which is always 0 after patch > bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it > was always wrong. After fixing, it no longer asserts or gives the above > "could not map filenode" error. > > However, it still doesn't seem right. When I do the above as a regular > transaction, ie: > > begin; insert into foo values (6); alter table foo alter column id type > text; commit; > > pg_recvlogical prints this: > > BEGIN 708 > table public.foo: INSERT: id[text]:'6' > COMMIT 708 > > But if I do it as a prepared transaction: > > begin; insert into foo values (7); alter table foo alter column id type > text; prepare transaction 'foo'; commit prepared 'foo'; How very wierd. The reason for this is that RecordTransactionCommitPrepared() forgets to fill a couple of fields in xl_xact_commit. Any reason dbId/tsId aren't filled? They aren't strictly needed because afaics they're only looked at for relcache invalidations which prepared xacts don't support, but still? That also explains why decoding for prepared xacts (besides the "typo" you found) didn't work anymore - the filtering at commit was added pretty late... Attached patch fixes things, but I want to add some regression tests before commit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] buildfarm animals and 'snapshot too old'
On 05/15/2014 12:43 PM, Tomas Vondra wrote: Hi all, today I got a few of errors like these (this one is from last week, though): Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT Content: snapshot to old: Wed May 7 04:36:57 2014 GMT on the new buildfarm animals. I believe it was my mistake (incorrectly configured local git mirror), but it got me thinking about how this will behave with the animals running CLOBBER_CACHE_RECURSIVELY. If I understand the Perl code correctly, it does this: (1) update the repository (2) run the tests (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) (4) fail if older Now, imagine that the test runs for days/weeks. This pretty much means it's wasted, because the results will be thrown away anyway, no? The 24 hours runs from the time of the latest commit on the branch in question, not the current time, but basically yes. We've never had machines with runs that long. The longest in recent times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes around 4.5 hours. But we have had misconfigured machines reporting unbelievable snapshot times. I'll take a look and see if we can tighten up the sanity check. It's worth noting that one thing friarbird does is skip the install-check stage - it's almost certainly not going to have terribly much interesting to tell us from that, given it has already run a plain "make check". How long does a CLOBBER_CACHE_RECURSIVELY run take? days or weeks seems kinda nuts. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
Hi, On 2014-05-15 20:07:23 +0300, Heikki Linnakangas wrote: > On 05/15/2014 07:57 PM, Heikki Linnakangas wrote: > >Spotted while testing pg_recvlogical: > > > >1. Set up pg_recvlogical to receive: > > > >./pg_recvlogical -S fooslot -d postgres --create > >./pg_recvlogical -S fooslot -d postgres --start -f - > > > >2. In another terminal, with psql: > > > >create table foo (id int4); > >begin; > > insert into foo values (4); > > alter table foo alter column id type text; > >prepare transaction 'foo'; > >commit prepared 'foo'; > >insert into foo values (1); > > > >3. With current HEAD, after commit > >bb38fb0d43c8d7ff54072bfd8bd63154e536b384, this produces an assertion > >failure: > > > >TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: > >"reorderbuffer.c", Line: 508) > > > >I believe that's we no longer assign another XID to the transaction that > >does the COMMIT PREPARED. Previously, an extra XID, in addition to the > >XID of the prepared transaction, was assigned for use in locking the > >global transaction entry in shared memory, but that's no longer required. > > > >However, even with that patch reverted, it doesn't work correctly: > > > >ERROR: could not map filenode "base/12142/16390" to relation OID > >LOG: starting logical decoding for slot fooslot > >DETAIL: streaming transactions committing after 0/16D1670, reading WAL > >from 0/16BC470 > > Ok, so the immediate cause was quick to find: when decoding a > commit-prepared WAL record, we have to use the XID from the record content > (patch attached). The XID in the record header is the XID of the transaction > doing the COMMIT PREPARED, which is always 0 after patch > bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it > was always wrong. After fixing, it no longer asserts or gives the above > "could not map filenode" error. > > However, it still doesn't seem right. When I do the above as a regular > transaction, ie: > > begin; insert into foo values (6); alter table foo alter column id type > text; commit; > > pg_recvlogical prints this: > > BEGIN 708 > table public.foo: INSERT: id[text]:'6' > COMMIT 708 > > But if I do it as a prepared transaction: > > begin; insert into foo values (7); alter table foo alter column id type > text; prepare transaction 'foo'; commit prepared 'foo'; Looking into it. I at some point dropped the prepared xact tests and that was obviously a mistake. Will re-add them and fix. Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_recvlogical, stdout and SIGHUP
On 05/15/2014 07:59 PM, Andres Freund wrote: On 2014-05-13 17:43:47 +0300, Heikki Linnakangas wrote: On 05/13/2014 04:35 PM, Andres Freund wrote: On 2014-05-13 16:31:25 +0300, Heikki Linnakangas wrote: Another thing I noticed is that if when the output goes to a file, the file isn't re-opened immediately on SIGHUP. Only after receiving some data from the server. I believe that's also not intentional. Hm. I can't really get excited about that one. Not doing that seems to complicate matters unneccessarily. What's the problem here? Not sure if it matters in any real-world scenario, but I found it pretty surprising while playing with it. It should be trivial to fix; ISTM the problem is that there is a "continue" in the loop when select() is interrupted by signal, but the re-opening is done after the select() in the loop. I think all you need to do is move the check for output_reopen to the beginning of the loop. Thanks for fixing and sorry for being slow :(. Any reason you didn't also move the opening of the output file up? It seems a bit odd to not have an output file existing every now and then... No particular reason. But that would actually be a great idea, because currently you won't get any error you give pg_recvlogical an invalid path, until it receives the first piece of data from the server and tries to write it to the file. I'll go and do that. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication woes
On 05/15/2014 07:57 PM, Heikki Linnakangas wrote: Spotted while testing pg_recvlogical: 1. Set up pg_recvlogical to receive: ./pg_recvlogical -S fooslot -d postgres --create ./pg_recvlogical -S fooslot -d postgres --start -f - 2. In another terminal, with psql: create table foo (id int4); begin; insert into foo values (4); alter table foo alter column id type text; prepare transaction 'foo'; commit prepared 'foo'; insert into foo values (1); 3. With current HEAD, after commit bb38fb0d43c8d7ff54072bfd8bd63154e536b384, this produces an assertion failure: TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "reorderbuffer.c", Line: 508) I believe that's we no longer assign another XID to the transaction that does the COMMIT PREPARED. Previously, an extra XID, in addition to the XID of the prepared transaction, was assigned for use in locking the global transaction entry in shared memory, but that's no longer required. However, even with that patch reverted, it doesn't work correctly: ERROR: could not map filenode "base/12142/16390" to relation OID LOG: starting logical decoding for slot fooslot DETAIL: streaming transactions committing after 0/16D1670, reading WAL from 0/16BC470 Ok, so the immediate cause was quick to find: when decoding a commit-prepared WAL record, we have to use the XID from the record content (patch attached). The XID in the record header is the XID of the transaction doing the COMMIT PREPARED, which is always 0 after patch bb38fb0d43c8d7ff54072bfd8bd63154e536b384 which causes the assertion. But it was always wrong. After fixing, it no longer asserts or gives the above "could not map filenode" error. However, it still doesn't seem right. When I do the above as a regular transaction, ie: begin; insert into foo values (6); alter table foo alter column id type text; commit; pg_recvlogical prints this: BEGIN 708 table public.foo: INSERT: id[text]:'6' COMMIT 708 But if I do it as a prepared transaction: begin; insert into foo values (7); alter table foo alter column id type text; prepare transaction 'foo'; commit prepared 'foo'; pg_recvlogical prints nothing. - Heikki diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index d6499d5..cc73652 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -225,7 +225,7 @@ DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) subxacts = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]); invals = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]); -DecodeCommit(ctx, buf, prec->xl_xid, xlrec->dbId, +DecodeCommit(ctx, buf, prec->xid, xlrec->dbId, xlrec->xact_time, xlrec->nsubxacts, subxacts, xlrec->nmsgs, invals); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_recvlogical, stdout and SIGHUP
On 2014-05-13 17:43:47 +0300, Heikki Linnakangas wrote: > On 05/13/2014 04:35 PM, Andres Freund wrote: > >On 2014-05-13 16:31:25 +0300, Heikki Linnakangas wrote: > >>Another thing I noticed is that if when the output goes to a file, the file > >>isn't re-opened immediately on SIGHUP. Only after receiving some data from > >>the server. I believe that's also not intentional. > > > >Hm. I can't really get excited about that one. Not doing that seems to > >complicate matters unneccessarily. What's the problem here? > > Not sure if it matters in any real-world scenario, but I found it pretty > surprising while playing with it. It should be trivial to fix; ISTM the > problem is that there is a "continue" in the loop when select() is > interrupted by signal, but the re-opening is done after the select() in the > loop. I think all you need to do is move the check for output_reopen to the > beginning of the loop. Thanks for fixing and sorry for being slow :(. Any reason you didn't also move the opening of the output file up? It seems a bit odd to not have an output file existing every now and then... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Logical replication woes
Spotted while testing pg_recvlogical: 1. Set up pg_recvlogical to receive: ./pg_recvlogical -S fooslot -d postgres --create ./pg_recvlogical -S fooslot -d postgres --start -f - 2. In another terminal, with psql: create table foo (id int4); begin; insert into foo values (4); alter table foo alter column id type text; prepare transaction 'foo'; commit prepared 'foo'; insert into foo values (1); 3. With current HEAD, after commit bb38fb0d43c8d7ff54072bfd8bd63154e536b384, this produces an assertion failure: TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "reorderbuffer.c", Line: 508) I believe that's we no longer assign another XID to the transaction that does the COMMIT PREPARED. Previously, an extra XID, in addition to the XID of the prepared transaction, was assigned for use in locking the global transaction entry in shared memory, but that's no longer required. However, even with that patch reverted, it doesn't work correctly: ERROR: could not map filenode "base/12142/16390" to relation OID LOG: starting logical decoding for slot fooslot DETAIL: streaming transactions committing after 0/16D1670, reading WAL from 0/16BC470 - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] buildfarm animals and 'snapshot too old'
Hi all, today I got a few errors like these (this one is from last week, though): Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT Content: snapshot to old: Wed May 7 04:36:57 2014 GMT on the new buildfarm animals. I believe it was my mistake (incorrectly configured local git mirror), but it got me thinking about how this will behave with the animals running CLOBBER_CACHE_RECURSIVELY. If I understand the Perl code correctly, it does this: (1) update the repository (2) run the tests (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188) (4) fail if older Now, imagine that the test runs for days/weeks. This pretty much means it's wasted, because the results will be thrown away anyway, no? regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportability of setvbuf()
Alvaro Herrera writes: > Tom Lane wrote: >> It might also be reasonable to create a wrapper macro along the line of >> "PG_STD_IO_BUFFERING()" that would encapsulate the whole sequence >> setvbuf(stdout, NULL, _IOLBF, 0); >> setvbuf(stderr, NULL, _IONBF, 0); >> Or maybe we should have separate macros for those two calls. Or maybe >> this is just a useless layer of abstraction and PG_IOLBF is enough >> to make the calls portable. >> >> Thoughts? > I don't really know all that much about this stuff, but see commits > 6eda3e9c27781dec369542a9b20cba7c3d832a5e and its parent about > isolationtester. Yeah, making them both unbuffered is another scenario that has its use-cases, so maybe it's inappropriate to create a macro that presumes to define the One True Way. For the moment I'll just arrange for initdb to share the logic with syslogger. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 11:36:51PM +0900, Amit Langote wrote: > > No, all memory allocation is per-process, except for shared memory. We > > probably need a way to record our large local memory allocations in > > PGPROC that other backends can see; same for effective cache size > > assumptions we make. > > > > I see. I thought there would be some centralised way to traverse, say, > a linked list of contexts that individual backends create or something > like that. But, I suppose it would not be straightforward to make any > of that work for what we are after here. The problem is locking overhead between sessions. Right now we avoid all of that, and I think if we just put the value in PGPROC, it will be good enough with limited locking required. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
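To make the "put the value in PGPROC" idea concrete, here is a hypothetical sketch. The struct, field, and function names below (HypotheticalProcEntry, localMemReserved, report_local_reservation, and so on) are invented for illustration and are not part of the real PGPROC struct or of any proposed patch; the point is simply that a per-backend word can be updated by its owner without locking and read by other backends, accepting slightly stale values.

#include <stddef.h>

/* Hypothetical extra bookkeeping hung off each backend's PGPROC entry. */
typedef struct HypotheticalProcEntry
{
	/* ... the existing PGPROC fields would live here ... */
	size_t		localMemReserved;		/* bytes of large work_mem-style allocations */
	size_t		effectiveCacheAssumed;	/* cache this backend assumes it can use */
} HypotheticalProcEntry;

/* Writer side: only the owning backend updates its own entry. */
static void
report_local_reservation(HypotheticalProcEntry *myproc, size_t bytes)
{
	myproc->localMemReserved = bytes;	/* single word store, no lock taken */
}

/* Reader side: sum what every backend has advertised, tolerating staleness. */
static size_t
total_reported_reservations(const HypotheticalProcEntry *procs, int nprocs)
{
	size_t		total = 0;

	for (int i = 0; i < nprocs; i++)
		total += procs[i].localMemReserved;
	return total;
}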
Re: [HACKERS] Unportability of setvbuf()
Tom Lane wrote: > It might also be reasonable to create a wrapper macro along the line of > "PG_STD_IO_BUFFERING()" that would encapsulate the whole sequence > setvbuf(stdout, NULL, _IOLBF, 0); > setvbuf(stderr, NULL, _IONBF, 0); > Or maybe we should have separate macros for those two calls. Or maybe > this is just a useless layer of abstraction and PG_IOLBF is enough > to make the calls portable. > > Thoughts? I don't really know all that much about this stuff, but see commits 6eda3e9c27781dec369542a9b20cba7c3d832a5e and its parent about isolationtester. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Problem with txid_snapshot_in/out() functionality
On 04/14/2014 11:55 AM, Marko Kreen wrote: On Sun, Apr 13, 2014 at 05:46:20PM -0400, Jan Wieck wrote: On 04/13/14 14:22, Jan Wieck wrote: On 04/13/14 08:27, Marko Kreen wrote: I think you need to do SET_VARSIZE also here. Alternative is to move SET_VARSIZE after sort_snapshot(). And it seems the drop-double-txid logic should be added also to txid_snapshot_recv(). It seems weird to have it behave differently from txid_snapshot_in(). Thanks, yes on both issues. Will create another patch. New patch attached. New github commit is https://github.com/wieck/postgres/commit/b8fd0d2eb78791e5171e34aecd233fd06218890d Looks OK to me. Ok, committed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
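For readers who don't have the txid code in front of them, the SET_VARSIZE pattern under discussion looks roughly like the generic sketch below. This is illustrative extension-style code, not the patch itself; the helper name make_text_result is an invented example.

#include "postgres.h"

/*
 * Generic varlena-building pattern: allocate, fill, then stamp the final
 * size with SET_VARSIZE once the content length is known -- for example
 * after a sort step may have dropped duplicate entries.
 */
static text *
make_text_result(const char *src, int len)
{
	text	   *result = (text *) palloc(VARHDRSZ + len);

	memcpy(VARDATA(result), src, len);
	SET_VARSIZE(result, VARHDRSZ + len);

	return result;
}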
[HACKERS] Unportability of setvbuf()
From the department of punishing good deeds ... in commit 2dc4f011fd I added setvbuf() calls to initdb to ensure that output to stdout and stderr would appear in a consistent order regardless of whether the output was going to a terminal or a file. The buildfarm shows that on several (but not all) Windows machines, initdb is now failing without printing anything. After a bit of research, I believe what is happening is: (1) On Windows, setvbuf interprets _IOLBF as _IOFBF. (2) If _IOFBF is specified and size is zero, it reports EINVAL. (3) If "parameter validation" is enabled, EINVAL turns into abort(). Thanks Microsoft for your careful attention to compliance with POSIX. I see that syslogger.c encountered this problem long ago and solved it by the expedient of using _IONBF not _IOLBF for the log output file. That's probably what we must do for stdout in initdb as well. It seems likely that we may need the same in other client programs someday. What I'm not totally sure about is how to wrap this up nicely. I'm inclined first of all to move syslogger.c's LBF_MODE symbol to port.h and rename it to, say, PG_IOLBF. It might also be reasonable to create a wrapper macro along the line of "PG_STD_IO_BUFFERING()" that would encapsulate the whole sequence setvbuf(stdout, NULL, _IOLBF, 0); setvbuf(stderr, NULL, _IONBF, 0); Or maybe we should have separate macros for those two calls. Or maybe this is just a useless layer of abstraction and PG_IOLBF is enough to make the calls portable. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
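As a rough illustration of the wrapper being floated here (PG_IOLBF and PG_STD_IO_BUFFERING are the names suggested above; the exact definitions and their placement in port.h are assumptions, not a committed patch), something along these lines would sidestep the Windows _IOLBF-to-_IOFBF problem:

#include <stdio.h>

/*
 * On Windows, _IOLBF is treated as _IOFBF, and a zero buffer size then fails
 * "parameter validation"; fall back to unbuffered output there, as
 * syslogger.c already does for its log file.
 */
#ifdef WIN32
#define PG_IOLBF	_IONBF
#else
#define PG_IOLBF	_IOLBF
#endif

/* Line-buffer stdout (where the platform allows it) and unbuffer stderr. */
#define PG_STD_IO_BUFFERING() \
	do { \
		setvbuf(stdout, NULL, PG_IOLBF, 0); \
		setvbuf(stderr, NULL, _IONBF, 0); \
	} while (0)

Usage would then be a single PG_STD_IO_BUFFERING(); call near the top of main() in initdb and any other client program that needs deterministic output ordering.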
Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)
On 2014-05-15 17:21:28 +0300, Heikki Linnakangas wrote: > >Is it guaranteed that all paths have called LWLockReleaseAll() > >before calling the proc exit hooks? Otherwise we might end up waiting > >for ourselves... > > Hmm. AbortTransaction() will release locks before we get here, but the > before_shmem_exit() callpath will not. So an elog(FATAL), while holding > TwoPhaseStateLock would cause us to deadlock with ourself. But there are no > such elogs. > I copied this design from async.c, which is quite similar, so if there's a > problem that ought to be fixed too. And there are other more complicated > before_shmem callbacks that worry me more, like createdb_failure_callback(). > But I think they're all all right. Perhaps we should enforce that LWLockReleaseAll() is called first? E.g. in shmem_exit()? It'll happen in ProcKill() atm, but that's normally pretty much at the bottom of the stack. > >It's not particularly nice to register shmem exit hooks in the middle of > >normal processing because it makes it impossible to use > >cancel_before_shmem_exit() previously registered hooks. I think this > >should be registered at startup, if max_prepared_xacts > 0. > > . async.c and namespace.c does the same, and it hasn't been a > problem. Well, it doesn't seem unreasonable to have C code using PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP around a 2pc commit to me. That'll break with this. Perhaps we should just finally make cancel_before_shmem_exit search the stack of callbacks. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
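To see why registering a hook mid-transaction interacts badly with PG_ENSURE_ERROR_CLEANUP, here is a self-contained toy model of the before_shmem_exit stack; it is not the real ipc.c code, and the names and sizes are invented. The cancel operation only matches the most recently registered entry, so a hook pushed lazily in the middle of a protected section leaves the caller's cleanup entry stranded underneath it.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of a LIFO exit-callback stack; not the real implementation. */
typedef void (*exit_cb) (int code, long arg);

#define MAX_CB 32
static struct
{
	exit_cb		fn;
	long		arg;
}			cb_stack[MAX_CB];
static int	cb_count = 0;

static void
register_cb(exit_cb fn, long arg)
{
	cb_stack[cb_count].fn = fn;
	cb_stack[cb_count].arg = arg;
	cb_count++;
}

/* Mirrors the limitation discussed above: only the topmost entry can be cancelled. */
static bool
cancel_cb(exit_cb fn, long arg)
{
	if (cb_count > 0 &&
		cb_stack[cb_count - 1].fn == fn &&
		cb_stack[cb_count - 1].arg == arg)
	{
		cb_count--;
		return true;
	}
	return false;
}

static void
caller_cleanup(int code, long arg)
{
	printf("caller cleanup\n");
}

static void
twophase_exit(int code, long arg)
{
	printf("twophase exit hook\n");
}

int
main(void)
{
	register_cb(caller_cleanup, 0);	/* what PG_ENSURE_ERROR_CLEANUP does */
	register_cb(twophase_exit, 0);	/* hook registered lazily mid-section */

	/* The cancel at PG_END_ENSURE_ERROR_CLEANUP no longer finds its entry. */
	assert(!cancel_cb(caller_cleanup, 0));
	return 0;
}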
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 11:24 PM, Bruce Momjian wrote: > On Thu, May 15, 2014 at 10:23:19PM +0900, Amit Langote wrote: >> On Thu, May 15, 2014 at 9:06 PM, Bruce Momjian wrote: >> > >> > This is the same problem we had with auto-tuning work_mem, in that we >> > didn't know what other concurrent activity was happening. Seems we need >> > concurrent activity detection before auto-tuning work_mem and >> > effective_cache_size. >> > >> >> Perhaps I am missing something obvious here, but would mmgr have any >> useful numbers on this? Like any book-keeping info maintained by >> mcxt.c/aset.c? Would extending that interface help? > > No, all memory allocation is per-process, except for shared memory. We > probably need a way to record our large local memory allocations in > PGPROC that other backends can see; same for effective cache size > assumptions we make. > I see. I thought there would be some centralised way to traverse, say, a linked list of contexts that individual backends create or something like that. But, I suppose it would not be straightforward to make any of that work for what we are after here. -- Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 10:23:19PM +0900, Amit Langote wrote: > On Thu, May 15, 2014 at 9:06 PM, Bruce Momjian wrote: > > > > This is the same problem we had with auto-tuning work_mem, in that we > > didn't know what other concurrent activity was happening. Seems we need > > concurrent activity detection before auto-tuning work_mem and > > effective_cache_size. > > > > Perhaps I am missing something obvious here, but would mmgr have any > useful numbers on this? Like any book-keeping info maintained by > mcxt.c/aset.c? Would extending that interface help? No, all memory allocation is per-process, except for shared memory. We probably need a way to record our large local memory allocations in PGPROC that other backends can see; same for effective cache size assumptions we make. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)
On 05/06/2014 02:44 PM, Andres Freund wrote:
On 2014-05-05 13:41:00 +0300, Heikki Linnakangas wrote:

+/*
+ * Exit hook to unlock the global transaction entry we're working on.
+ */
+static void
+AtProcExit_Twophase(int code, Datum arg)
+{
+	/* same logic as abort */
+	AtAbort_Twophase();
+}
+
+/*
+ * Abort hook to unlock the global transaction entry we're working on.
+ */
+void
+AtAbort_Twophase(void)
+{
+	if (MyLockedGxact == NULL)
+		return;
+
+	/*
+	 * If we were in process of preparing the transaction, but haven't
+	 * written the WAL record yet, remove the global transaction entry.
+	 * Same if we are in the process of finishing an already-prepared
+	 * transaction, and fail after having already written the WAL 2nd
+	 * phase commit or rollback record.
+	 *
+	 * After that it's too late to abort, so just unlock the GlobalTransaction
+	 * entry.  We might not have transfered all locks and other state to the
+	 * prepared transaction yet, so this is a bit bogus, but it's the best we
+	 * can do.
+	 */
+	if (!MyLockedGxact->valid)
+	{
+		RemoveGXact(MyLockedGxact);
+	}
+	else
+	{
+		LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
+
+		MyLockedGxact->locking_backend = InvalidBackendId;
+
+		LWLockRelease(TwoPhaseStateLock);
+	}
+	MyLockedGxact = NULL;
+}

Is it guaranteed that all paths have called LWLockReleaseAll() before calling the proc exit hooks? Otherwise we might end up waiting for ourselves...

Hmm. AbortTransaction() will release locks before we get here, but the before_shmem_exit() callpath will not. So an elog(FATAL), while holding TwoPhaseStateLock would cause us to deadlock with ourself. But there are no such elogs. I copied this design from async.c, which is quite similar, so if there's a problem that ought to be fixed too. And there are other more complicated before_shmem callbacks that worry me more, like createdb_failure_callback(). But I think they're all all right.

 /*
  * MarkAsPreparing
@@ -261,29 +329,15 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 				 errmsg("prepared transactions are disabled"),
 				 errhint("Set max_prepared_transactions to a nonzero value.")));

-	LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
-
-	/*
-	 * First, find and recycle any gxacts that failed during prepare. We do
-	 * this partly to ensure we don't mistakenly say their GIDs are still
-	 * reserved, and partly so we don't fail on out-of-slots unnecessarily.
-	 */
-	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	/* on first call, register the exit hook */
+	if (!twophaseExitRegistered)
 	{
-		gxact = TwoPhaseState->prepXacts[i];
-		if (!gxact->valid && !TransactionIdIsActive(gxact->locking_xid))
-		{
-			/* It's dead Jim ... remove from the active array */
-			TwoPhaseState->numPrepXacts--;
-			TwoPhaseState->prepXacts[i] = TwoPhaseState->prepXacts[TwoPhaseState->numPrepXacts];
-			/* and put it back in the freelist */
-			gxact->next = TwoPhaseState->freeGXacts;
-			TwoPhaseState->freeGXacts = gxact;
-			/* Back up index count too, so we don't miss scanning one */
-			i--;
-		}
+		before_shmem_exit(AtProcExit_Twophase, 0);
+		twophaseExitRegistered = true;
 	}

It's not particularly nice to register shmem exit hooks in the middle of normal processing because it makes it impossible to use cancel_before_shmem_exit() previously registered hooks. I think this should be registered at startup, if max_prepared_xacts > 0.

. async.c and namespace.c does the same, and it hasn't been a problem.

I committed this now, but please let me know if you see a concrete problem with the locks.
- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build
On 05/15/2014 04:15 PM, Michael Paquier wrote: On Thu, May 15, 2014 at 6:20 PM, Heikki Linnakangas wrote: Ok, I committed #undefs. I don't have a Mingw(-w64) environment to test with, so let's see if the buildfarm likes it. There does not seem to be a buildfarm machine using MinGW-w64... Jacana. It has "gcc 4.8.1" listed as the compiler, but if you look at the config in detail, it's mingw-w64. The popen/pclose warnings are there. It hasn't performed a build after I committed the fix yet. Btw, I tested latest master on a Windows box and MinGW-w64 is happier now. Thanks! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Thu, May 15, 2014 at 9:06 PM, Bruce Momjian wrote: > > This is the same problem we had with auto-tuning work_mem, in that we > didn't know what other concurrent activity was happening. Seems we need > concurrent activity detection before auto-tuning work_mem and > effective_cache_size. > Perhaps I am missing something obvious here, but would mmgr have any useful numbers on this? Like any book-keeping info maintained by mcxt.c/aset.c? Would extending that interface help? -- Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] popen and pclose redefinitions causing many warning in Windows build
On Thu, May 15, 2014 at 6:20 PM, Heikki Linnakangas wrote: > Ok, I committed #undefs. I don't have a Mingw(-w64) environment to test > with, so let's see if the buildfarm likes it. There does not seem to be a buildfarm machine using MinGW-w64... Btw, I tested latest master on a Windows box and MinGW-w64 is happier now. Thanks! -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, May 6, 2014 at 11:15:17PM +0100, Simon Riggs wrote: > > Well, for what it's worth, I've encountered systems where setting > > effective_cache_size too low resulted in bad query plans, but I've > > never encountered the reverse situation. > > I agree with that. > > Though that misses my point, which is that you can't know that all of > that memory is truly available on a server with many concurrent users. > Choosing settings that undercost memory-intensive plans is not the > best choice for a default strategy in a mixed workload when cache may > be better used elsewhere, even if such settings make sense for some > individual users. This is the same problem we had with auto-tuning work_mem, in that we didn't know what other concurrent activity was happening. Seems we need concurrent activity detection before auto-tuning work_mem and effective_cache_size. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Various cosmetic fixes
On 05/14/2014 08:49 PM, Euler Taveira wrote: While updating the pt-br translation I noticed that some sentences could be improved. I also fixed some style glitches. A set of patches is attached. Thanks, applied. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gettimeofday is at the end of its usefulness?
Hi, On 2014-05-13 18:58:11 -0400, Tom Lane wrote: > Anyway it looks like clock_gettime() might be worth using on Linux > just for the more precise output. It doesn't seem to exist on OS X > though, and I have no idea about elsewhere. Agreed that using clock_gettime() would be a good idea. I'd say we should have a wrapper around it that is able to provide nanosecond precision. If only gettimeofday() (and whatever windows is using) is available, we can dynamically fall back to that. > I'm curious if anybody has ideas about other things we might do for > portable high-precision timing. It's far from a solve-it-all, but can we perhaps try to coalesce repeated time measurements? We'll very frequently do a

InstrStopNode();
/* minimal amount of work */
InstrStartNode();

which will measure the time twice. I think there's a fair number of scenarios where once would be enough. I'll freely admit that I haven't looked enough to determine how we could do that API wise. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
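A minimal sketch of the kind of wrapper being suggested follows; the function name (pg_clock_gettime_ns) and the fallback logic are illustrations rather than an existing or proposed API.

#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/*
 * Return a timestamp in nanoseconds.  Uses clock_gettime() where available
 * and silently falls back to gettimeofday() (microsecond resolution)
 * elsewhere.
 */
static uint64_t
pg_clock_gettime_ns(void)
{
#ifdef CLOCK_MONOTONIC
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
		return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
#endif
	{
		struct timeval tv;

		gettimeofday(&tv, NULL);
		return (uint64_t) tv.tv_sec * 1000000000ULL + (uint64_t) tv.tv_usec * 1000;
	}
}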
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On 2014-05-15 12:04:25 +0100, Benedikt Grundmann wrote: > On Thu, May 15, 2014 at 11:31 AM, Greg Stark wrote: > > > On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann > > wrote: > > > I posted this on this mailing list before: at Jane Street we have > > developed > > > very fast code to get timing information based on TSC if available. It's > > > all ocaml but well documented and mostly just calls to c functions so > should > > > be easy to port to C and we release it under a very liberal license so it > > > should be no problem to take the ideas: > > > > What OS do you run it on though? How fast is your implementation > > compared to the kernel implementation of clock_gettime()? > > > > Are you sure your implementation is actually faster? And are you sure > > you're protected against clocks going backwards? I think you should > > put some i/o in the loop in the test and start several threads running > > it to make it more likely the thread is rescheduled to a different > > processor during the test. I suspect you'll find the rdtsc goes > > backwards sometimes or produces crazy results when switching > > processors. > > > > > There are benchmarks in the link I posted (obtained by a micro benchmarking > library we developed / use internally which takes great care to obtain > reliable numbers). We use posix threads extensively. We internally spend > a lot of time setting up ntp and monitoring systems so that clock backwards > never happens (so, in other words, I wouldn't be surprised if the library > does NOT work correctly when it does -- our protection is outside). I do > not believe we have seen the rdtsc going backwards on the thread context switch > you mention (and as said we use lots of threads). OS? Centos 6.5 > primarily. Did you test it on a server with more than one socket (i.e. not just multiple cores, but separate physical CPUs)? That's where you'd expect differences in the TSC to have funny effects. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Thu, May 15, 2014 at 11:31 AM, Greg Stark wrote: > On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann > wrote: > > I posted this on this mailing list before: at Jane Street we have > developed > > very fast code to get timing information based on TSC if available. It's > > all ocaml but well documented and mostly just calls to c functions so > should > > be easy to port to C and we release it under a very liberal license so it > > should be no problem to take the ideas: > > What OS do you run it on though? How fast is your implementation > compared to the kernel implementation of clock_gettime()? > > Are you sure your implementation is actually faster? And are you sure > you're protected against clocks going backwards? I think you should > put some i/o in the loop in the test and start several threads running > it to make it more likely the thread is rescheduled to a different > processor during the test. I suspect you'll find the rdtsc goes > backwards sometimes or produces crazy results when switching > processors. > > There are benchmarks in the link I posted (obtained by a micro benchmarking library we developed / use internally which takes great care to obtain reliable numbers). We use posix threads extensively. We internally spend a lot of time setting up ntp and monitoring systems so that clock backwards never happens (so, in other words, I wouldn't be surprised if the library does NOT work correctly when it does -- our protection is outside). I do not believe we have seen the rdtsc going backwards on the thread context switch you mention (and as said we use lots of threads). OS? Centos 6.5 primarily. -- > greg >
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann wrote: > I posted this on this mailing list before: at Jane Street we have developed > very fast code to get timing information based on TSC if available. It's > all ocaml but well documented and mostly just calls to c functions so should > be easy to port to C and we release it under a very liberal license so it > should be no problem to take the ideas: What OS do you run it on though? How fast is your implementation compared to the kernel implementation of clock_gettime()? Are you sure your implementation is actually faster? And are you sure you're protected against clocks going backwards? I think you should put some i/o in the loop in the test and start several threads running it to make it more likely the thread is rescheduled to a different processor during the test. I suspect you'll find the rdtsc goes backwards sometimes or produces crazy results when switching processors. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build
On 05/14/2014 06:06 PM, Noah Misch wrote: On Wed, May 14, 2014 at 05:51:24PM +0300, Heikki Linnakangas wrote: On 05/14/2014 05:37 PM, Noah Misch wrote: On Wed, May 14, 2014 at 03:15:38PM +0300, Heikki Linnakangas wrote: On 05/09/2014 02:56 AM, Noah Misch wrote: MinGW: http://sourceforge.net/p/mingw/mingw-org-wsl/ci/master/tree/include/stdio.h#l467 MinGW-w64: http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-headers/crt/stdio.h#l496 Building with any recent MinGW-w64, 32-bit or 64-bit, gets the reported warnings; building with MinGW proper does not. Hmm. The MinGW-w64 header does this:

#if !defined(NO_OLDNAMES) && !defined(popen)
#define popen _popen
#define pclose _pclose
#endif

So if we defined popen() before including stdio.h, that would get rid of the warning. But we don't usually do things in that order. True. I have no strong preference between that and use of #undef. I think I would prefer #undef. The risk with that is if some platform has #defined popen() to something else entirely, for some good reason, we would be bypassing that hypothetical wrapper. But I guess we'll cross that bridge if we get there. Works for me. Since "(popen)(x, y)" shall behave the same as "popen(x, y)", such a hypothetical system header would be buggy, anyway. Ok, I committed #undefs. I don't have a Mingw(-w64) environment to test with, so let's see if the buildfarm likes it. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
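For illustration, the #undef approach discussed here looks roughly like the following; this is a sketch of the idea, not the text of the committed patch.

#include <stdio.h>

/*
 * MinGW-w64's <stdio.h> defines popen/pclose as macros for _popen/_pclose.
 * Dropping those macros lets our own declarations and uses of popen() and
 * pclose() compile without redefinition warnings.
 */
#ifdef popen
#undef popen
#endif
#ifdef pclose
#undef pclose
#endif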
Re: [HACKERS] Error in running DBT2
Hi All, I am running dbt2 with the latest postgresql kit, pg9.4. I tried everything again after setting up a whole new machine again with ubuntu. Still facing the same error. I ran the *dbt2-pgsql-build-db -w 1 * but, after some time, I faced this error */home/abhi/dbt2_install/bin/dbt2-pgsql-load-stored-procs: 45: [: c: unexpected operator* */home/abhi/dbt2_install/bin/dbt2-pgsql-load-stored-procs: 53: [: c: unexpected operator* *unknown stored function type: c* and the script ends!! Please guide me to a probable solution to get over it. Regards, Rohit On Wed, May 14, 2014 at 12:43 AM, Rohit Goyal wrote: > Hi Peter, > > I tried the solution suggested by you. The problem still persists. > > I ran the *dbt2-pgsql-build-db -w 1 * > > but, after some time, I faced this error > > */home/abhi/dbt2_install/bin/dbt2-pgsql-load-stored-procs: 45: [: c: > unexpected operator* > */home/abhi/dbt2_install/bin/dbt2-pgsql-load-stored-procs: 53: [: c: > unexpected operator* > *unknown stored function type: c* > > and the script ends!! > > moreover, I see that the DB has been created and also 9 tables are there in > Database dbt2. Please suggest how to proceed. > > Regards, > rohit Goyal > > > > On Tue, May 13, 2014 at 9:44 PM, Peter Geoghegan wrote: > >> >> On Tue, May 13, 2014 at 12:36 PM, Rohit Goyal wrote: >> >>> This pattern the above found many times. Please guide me through!!! >>> >> >> IIRC, people have been working around this by setting >> standard_conforming_strings to "off". It really ought to be fixed in a >> principled way, though -- the real issue here is that dbt2 has severe >> bit-rot. >> >> -- >> Peter Geoghegan >> > > > > -- > Regards, > Rohit Goyal > -- Regards, Rohit Goyal
[HACKERS] Selectivity estimation for inet operators
New version of the selectivity estimation patch attached. I am adding it to CommitFest 2014-06. The previous version of it was reviewed by Andreas Karlson on the previous CommitFest together with the GiST support patch. The new version includes join selectivity estimation. Join selectivity is calculated in 4 steps: * matching the first MCV to the second MCV * searching the first MCV in the second histogram * searching the second MCV in the first histogram * searching the boundaries of the first histogram in the second histogram Comparing the lists with each other slows down the function when the statistics target is set to higher values. To avoid this problem I only use log(n) values from the lists: the first log(n) values for the MCVs, and evenly separated values for the histograms. In my tests, this optimization does not affect the planning time when statistics = 100, but it does affect the accuracy of the estimation. I can send the version without this optimization, if slowdown with larger statistics is not a problem which should be solved in the selectivity estimation function. I also attach the script I was using for testing, and I left log statements in the networkjoinsel() function to make testing easier. These statements should be removed before commit. inet-selfuncs-v4.patch Description: Binary data inet-selfuncs-test.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
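To illustrate the log(n) sampling idea described above, here is a rough, self-contained sketch. It is not code from the patch; the function name and the exact spacing rule are invented, and the point is only that comparing O(log n) evenly separated samples keeps the cost of cross-matching two statistics lists down.

#include <math.h>
#include <stdio.h>

/*
 * Pick roughly log2(n) evenly separated sample positions from a list of n
 * histogram values, so comparing two lists costs O(log n * log m) instead
 * of O(n * m).  Purely illustrative.
 */
static int
sample_positions(int n, int *positions, int max_out)
{
	int			k = (n > 1) ? (int) ceil(log2((double) n)) : 1;

	if (k > max_out)
		k = max_out;
	for (int i = 0; i < k; i++)
		positions[i] = (int) ((double) i * (n - 1) / (k > 1 ? k - 1 : 1));
	return k;
}

int
main(void)
{
	int			pos[32];
	int			k = sample_positions(100, pos, 32);

	for (int i = 0; i < k; i++)
		printf("%d ", pos[i]);	/* e.g. 0 16 33 49 66 82 99 */
	printf("\n");
	return 0;
}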
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Thu, May 15, 2014 at 8:19 AM, Benedikt Grundmann < bgrundm...@janestreet.com> wrote: > I posted this on this mailing list before: at Jane Street we have developed > very fast code to get timing information based on TSC if available. It's > all ocaml but well documented and mostly just calls to c functions so > should be easy to port to C and we release it under a very liberal license > so it should be no problem to take the ideas: > > https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli > > Hope this is useful. > > Bene > > Also I'm sorry for top posting. hackers is the only mailing list I'm on that requires this and some others require top posting so this runs counter to my habits and I only realized after sending... > > On Wed, May 14, 2014 at 12:41 PM, Robert Haas wrote: >> On Tue, May 13, 2014 at 11:34 PM, Greg Stark wrote: >> > I always assumed the kernel used rdtsc to implement some of the high >> > performance timers. It can save the current time in a mapped page when >> > it schedules a process and then in the vdso syscall (ie in user-space) >> > it can use rdtsc to calculate the offset needed to adjust that >> > timestamp to the current time. This seems consistent with your >> > calculations that showed the 40ns overhead with +/- 10ns precision. >> >> Crazy idea: Instead of trying to time precisely the amount of time we >> spend in each node, configure a very-high frequency timer interrupt >> (or background thread?) that does: >> >> SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++; >> >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> > >
Re: [HACKERS] gettimeofday is at the end of its usefulness?
I posted this on this mailing list before: at Jane Street we have developed very fast code to get timing information based on TSC if available. It's all ocaml but well documented and mostly just calls to c functions, so it should be easy to port to C, and we release it under a very liberal license, so it should be no problem to take the ideas: https://github.com/janestreet/core/blob/master/lib/time_stamp_counter.mli Hope this is useful. Bene On Wed, May 14, 2014 at 12:41 PM, Robert Haas wrote: > On Tue, May 13, 2014 at 11:34 PM, Greg Stark wrote: > > I always assumed the kernel used rdtsc to implement some of the high > > performance timers. It can save the current time in a mapped page when > > it schedules a process and then in the vdso syscall (ie in user-space) > > it can use rdtsc to calculate the offset needed to adjust that > > timestamp to the current time. This seems consistent with your > > calculations that showed the 40ns overhead with +/- 10ns precision. > > Crazy idea: Instead of trying to time precisely the amount of time we > spend in each node, configure a very-high frequency timer interrupt > (or background thread?) that does: > > SomeGlobalVariablePointingToTheCurrentNode->profiling_counter++; > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
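For readers unfamiliar with the technique, here is a rough, self-contained sketch of TSC-based timing. It is a simplified illustration of the general idea, not the Jane Street library and not proposed PostgreSQL code; the one-shot calibration against CLOCK_MONOTONIC is an assumption, and a real implementation would recalibrate periodically and guard against non-invariant TSCs and cross-socket skew.

#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <x86intrin.h>			/* __rdtsc(), gcc/clang on x86 */

static double ns_per_tick;

static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

/* Measure how many nanoseconds one TSC tick corresponds to. */
static void
calibrate_tsc(void)
{
	struct timespec pause = {0, 50 * 1000 * 1000};	/* 50 ms */
	uint64_t	t0 = now_ns();
	uint64_t	c0 = __rdtsc();

	nanosleep(&pause, NULL);

	uint64_t	t1 = now_ns();
	uint64_t	c1 = __rdtsc();

	ns_per_tick = (double) (t1 - t0) / (double) (c1 - c0);
}

int
main(void)
{
	calibrate_tsc();

	uint64_t	start = __rdtsc();

	/* ... code being timed would go here ... */

	uint64_t	elapsed_ns = (uint64_t) ((__rdtsc() - start) * ns_per_tick);

	printf("elapsed: %llu ns (%.3f ns/tick)\n",
		   (unsigned long long) elapsed_ns, ns_per_tick);
	return 0;
}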