Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-22 Thread Tom Lane
Greg Stark  writes:
> Hmm. The backtrace is here but I think it's lying about the specific line.

> #0  convert_one_string_to_scalar (value=0x7f20e9a3 "  ",
> rangelo=32, rangehi=122, 2132863395, 32, 122)
> at selfuncs.c:3873
> #1  0x00435880 in convert_string_to_scalar (
> value=0x7f20e990 "Aztec\n", ' ' , "Ct  ",
> scaledvalue=0x7fffdb44,
> lobound=0x7f225bf4 "Audrey", ' ' , "Dr  ",
> scaledlobound=0x7fffdb34,
> hibound=0x7f225c40 "Balmoral", ' ' , "Dr  ",
> scaledhibound=0x7fffdb3c, 2132863376, 2147474244, 2132958196,
> 2147474228, 2132958272, 2147474236) at selfuncs.c:3847

> Stepping through the code it looks like it actually happens on line
> 3882 when denom overflows.

Oh, interesting.  The largest possible value of "base" is 256, and the
code limits the amount of string it'll scan to 20 bytes, which means
"denom" could reach at most 256^21 = 3.7e50.  Perfectly fine with
IEEE-math doubles, but not so much with other arithmetics.

We could hold the worst-case value to within the VAX range if we
considered only about 14 bytes instead of 20.  Probably that wouldn't
lose much in terms of estimation accuracy, but given the lack of
complaints to date, I'm not sure we should change it ...
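(As a quick standalone check of that arithmetic -- the ~1.7e38 figure for
the VAX D_floating maximum is an assumption here, not something stated in
the thread:)

#include <stdio.h>

int
main(void)
{
	/*
	 * Worst case in convert_one_string_to_scalar(): "denom" starts at
	 * base and is multiplied by base once per byte examined, so after
	 * N bytes it reaches base^(N+1).
	 */
	double		base = 256.0;
	double		vax_max = 1.7e38;	/* approx. VAX D_floating maximum */
	double		denom = base;
	int			nbytes;

	for (nbytes = 1; nbytes <= 20; nbytes++)
	{
		denom *= base;
		printf("%2d bytes: denom = %g%s\n", nbytes, denom,
			   denom > vax_max ? "   <-- exceeds the VAX range" : "");
	}
	/* 20 bytes gives 3.74144e+50; 14 bytes stays at ~1.3e36 */
	return 0;
}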

regards, tom lane




Re: [HACKERS] checkpointer continuous flushing

2015-08-22 Thread Amit Kapila
On Wed, Aug 19, 2015 at 12:13 PM, Fabien COELHO  wrote:

>> Sure, I think what can help here is a testcase/'s (in form of script file
>> or some other form, to test this behaviour of patch) which you can write
>> and post here, so that others can use that to get the data and share it.
>
> Sure... note that I already did that on this thread, without any echo...
> but I can do it again...
Thanks.

I tried your scripts and ran into a problem while using the avg.py
script:

grep 'progress:' test_medium4_FW_off.out | cut -d' ' -f4 | ./avg.py
--limit=10 --length=300
: No such file or directory

I didn't get a chance to poke into the avg.py script (the command without
avg.py works fine).  The Python version on the machine I planned to test
on is 2.7.5.

Today, while reading the first patch (checkpoint-continuous-flush-10-a),
I gave some thought to the part of the patch below, which I would like
to share with you.

+static int
+NextBufferToWrite(
+	TableSpaceCheckpointStatus *spcStatus, int nb_spaces,
+	int *pspace, int num_to_write, int num_written)
+{
+	int		space = *pspace, buf_id = -1, index;
+
+	/*
+	 * Select a tablespace depending on the current overall progress.
+	 *
+	 * The progress ratio of each unfinished tablespace is compared to
+	 * the overall progress ratio to find one which is not in advance
+	 * (i.e. overall ratio > tablespace ratio,
+	 *  i.e. tablespace written/to_write > overall written/to_write

Here, I think the above calculation can go for a toss if a backend or the
bgwriter starts writing buffers while the checkpoint is in progress.  The
tablespace "written" parameter won't be able to account for the ones
written by backends or the bgwriter.  It may not be a big thing to worry
about, but I find Heikki's version worth considering: he has not changed
the overall idea of this patch, but the calculations are somewhat simpler
and hence there is less chance of going wrong.
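(To spell out the comparison being discussed -- a simplified sketch of one
reading of the patch, with a hypothetical stand-in struct, not the patch's
actual code:)

typedef struct
{
	int		written;	/* buffers this checkpoint has written here */
	int		to_write;	/* buffers it must write here in total */
} SpcProgress;			/* hypothetical stand-in for the patch's struct */

static int
pick_lagging_tablespace(SpcProgress *spc, int nspaces,
						int num_written, int num_to_write)
{
	double	overall = (double) num_written / num_to_write;
	int		i;

	for (i = 0; i < nspaces; i++)
	{
		/*
		 * spc[i].written counts only the checkpointer's own writes;
		 * buffers written concurrently by backends or the bgwriter are
		 * invisible here, which is the skew described above.
		 */
		double	ratio = (double) spc[i].written / spc[i].to_write;

		if (spc[i].written < spc[i].to_write && ratio <= overall)
			return i;		/* not in advance: pick this tablespace */
	}
	return -1;
}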


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] allowing wal_level change at run time

2015-08-22 Thread Peter Eisentraut
On 8/20/15 3:50 PM, Andres Freund wrote:
>> But, under any scheme to set wal_level automatically, how will the
>> primary know whether it needs to use level archive or hot_standby?
> 
> I'd have said archive_mode triggered archive and everything else
> hot_standby.

That might be a decent heuristic, but it's not correct if there is no
way to override it.  People are using archive-only replication with hot
standby (for delayed replication, for example).





Re: [HACKERS] PATCH: numeric timestamp in log_line_prefix

2015-08-22 Thread Tomas Vondra



On 08/22/2015 09:54 PM, Fabien COELHO wrote:
> Hello Tomas,
>
> Review of v2:
>
>> attached is a v2 of the patch, reworked based on the comments.
>
> The patch applies cleanly to head, it compiles, I tested it and it
> mostly works as expected, see below.
>
>> 1) fix the docs (explicitly say that it's a Unix epoch)
>
> I would add the word "numeric" in front of timestamp both in the doc and
> in the postgresql.conf.sample, as it justifies the chosen %n.

I think we're already using 'unix epoch' in the docs without explicitly
stating that it's a numeric value, so I don't think we should use it
here as it'd be inconsistent.

>> 2) handle 'padding' properly
>
> I tried that without success. ISTM that what is padded is the empty
> string, and the timestamp is just printed on its own without padding
> afterwards.
>
> I think that it should use a string buffer and then use the padding on
> the string, as case 'c' or 't' does, for instance.

Hmmm, I'm not entirely sure how exactly the padding is supposed to work
(IIRC I've never used it), and I thought it behaved correctly. But maybe
not - I think the safest thing is to copy what 't' does, so I've done
that in the attached v3 of the patch.
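(To illustrate the difference, a standalone sketch -- plain printf stands
in for the appendStringInfo calls:)

#include <stdio.h>
#include <sys/time.h>

int
main(void)
{
	struct timeval tv;
	char		strfbuf[128];
	int			padding = 15;

	gettimeofday(&tv, NULL);

	/* v2 behaviour: the width is applied to an empty string and the
	 * timestamp is appended afterwards, so it is never padded */
	printf("[%*s%ld.%03d]\n", padding, "", (long) tv.tv_sec,
		   (int) (tv.tv_usec / 1000));

	/* v3 behaviour (same as case 't'): format into a buffer first,
	 * then pad the formatted string as a whole */
	snprintf(strfbuf, sizeof(strfbuf), "%ld.%03d", (long) tv.tv_sec,
			 (int) (tv.tv_usec / 1000));
	printf("[%*s]\n", padding, strfbuf);

	return 0;
}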


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e900dcc..8328733 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4630,6 +4630,11 @@ local0.*/var/log/postgresql
  no
 
 
+ %n
+ Time stamp with milliseconds (as a Unix epoch)
+ no
+
+
  %i
  Command tag: type of session's current command
  yes
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 088c714..80ffdbd 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2438,6 +2438,20 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 					appendStringInfoString(buf, strfbuf);
 				}
 				break;
+			case 'n':
+				{
+					struct timeval tv;
+					char		strfbuf[128];
+
+					gettimeofday(&tv, NULL);
+					sprintf(strfbuf, "%ld.%.03d", tv.tv_sec, (int) (tv.tv_usec / 1000));
+
+					if (padding != 0)
+						appendStringInfo(buf, "%*s", padding, strfbuf);
+					else
+						appendStringInfoString(buf, strfbuf);
+				}
+				break;
 			case 's':
 				if (formatted_start_time[0] == '\0')
 					setup_formatted_start_time();
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e5d275d..34abd17 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -425,6 +425,7 @@
 	#   %p = process ID
 	#   %t = timestamp without milliseconds
 	#   %m = timestamp with milliseconds
+	#   %n = timestamp with milliseconds (as a Unix epoch)
 	#   %i = command tag
 	#   %e = SQL state
 	#   %c = session ID



Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-08-22 Thread Joe Conway

On 08/20/2015 07:54 AM, Joe Conway wrote:
> On 08/20/2015 06:59 AM, Andrew Dunstan wrote:
>> I was a bit interested in pg_config information, for this reason:
>> I had a report of someone who had configured using --with-libxml
>> but the xml tests actually returned the results that are expected
>> without xml being configured. The regression tests thus passed,
>> but should not have. It occurred to me that if we had a test
>> like
> 
>> select pg_config('configure') ~ '--with-libxml' as has_xml;
> 
>> in the xml tests then this failure mode would be detected.
> 
> I've found use for them both in the past. A fair amount of bit-rot
> has set in no doubt, and these were quick and dirty to begin with,
> but I have these hacks from a while back:
> 
> https://github.com/jconway/pg_config

The attached implements pg_config as a backend SRF and a matching
system view. A few notes:

1) The syntax is a bit different than what Andrew proposed:

8<
select setting ~ '--with-libxml' as has_xml
from pg_config
where name = 'CONFIGURE';
 has_xml
---------
 t
(1 row)
8<

In particular note that the name values are all upper case to be
consistent with pg_config, and at least currently there is no version
of the function which accepts a name as an argument (didn't seem
worthwhile to me).

2) No docs or related regression test yet. I will do that if there is
enough interest in this getting committed. So far no one except Andrew
and I have chimed in.

3) Requires a catalog version bump (not in posted diff)

4) The static function cleanup_path() was borrowed from

src/bin/pg_config/pg_config.c

It is a small and stable function (no change since 2010 AFAICS), so
maybe not worth the effort, but I was wondering if it should be moved
to src/common somewhere and shared.

I will add this to the next commitfest. Comments/feedback encouraged.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ccc030f..0d2e8f1 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_timezone_abbrevs AS
*** 425,430 
--- 425,433 
  CREATE VIEW pg_timezone_names AS
  SELECT * FROM pg_timezone_names();
  
+ CREATE VIEW pg_config AS
+ SELECT * FROM pg_config();
+ 
  -- Statistics views
  
  CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 7889101..49c6f08 100644
*** a/src/backend/utils/misc/Makefile
--- b/src/backend/utils/misc/Makefile
*** include $(top_builddir)/src/Makefile.glo
*** 14,20 
  
  override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
  
! OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \
 sampling.o superuser.o timeout.o tzparser.o
  
  # This location might depend on the installation directories. Therefore
--- 14,33 
  
  override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
  
! # don't include subdirectory-path-dependent -I and -L switches
! STD_CPPFLAGS := $(filter-out -I$(top_srcdir)/src/include -I$(top_builddir)/src/include,$(CPPFLAGS))
! STD_LDFLAGS := $(filter-out -L$(top_builddir)/src/port,$(LDFLAGS))
! override CPPFLAGS += -DVAL_CONFIGURE="\"$(configure_args)\""
! override CPPFLAGS += -DVAL_CC="\"$(CC)\""
! override CPPFLAGS += -DVAL_CPPFLAGS="\"$(STD_CPPFLAGS)\""
! override CPPFLAGS += -DVAL_CFLAGS="\"$(CFLAGS)\""
! override CPPFLAGS += -DVAL_CFLAGS_SL="\"$(CFLAGS_SL)\""
! override CPPFLAGS += -DVAL_LDFLAGS="\"$(STD_LDFLAGS)\""
! override CPPFLAGS += -DVAL_LDFLAGS_EX="\"$(LDFLAGS_EX)\""
! override CPPFLAGS += -DVAL_LDFLAGS_SL="\"$(LDFLAGS_SL)\""
! override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
! 
! OBJS = guc.o help_config.o pg_config.o pg_rusage.o ps_status.o rls.o \
 sampling.o superuser.o timeout.o tzparser.o
  
  # This location might depend on the installation directories. Therefore
diff --git a/src/backend/utils/misc/pg_config.c b/src/backend/utils/misc/pg_con

Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-22 Thread Greg Stark
On 22 Aug 2015 18:02, "Tom Lane"  wrote:
>
> Why not define infnan() and make it do the same thing as
> FloatExceptionHandler?  Or was that what you meant?

That's exactly what I meant, instead of my quick hack to add a signal
handler for SIGILL.

> > The hang is actually in the groupingset tests in
> > bipartite_match.c:hk_breadth_search().
...
> Is it that function itself that's hanging, or is the problem that its
> caller expects it to ultimately return true, and it never does?

I think it never exits that function but I'll try it again.

> I don't think we're really insisting on a true infinity here, only that
> get_float4_infinity() produce a large value that isinf() will recognize.
>
> I'm surprised that any of the hacks in src/port/isinf.c compile on Vax
> at all --- did you invent a new one?
>
> Also, I'd have thought that both get_floatX_infinity and get_floatX_nan
> would be liable to produce SIGFPE on non-IEEE machines.  Did you mess
> with those?

I didn't do anything. There's no isinf.o in that directory so I don't
think anything got compiled. There are other files in src/port but not
that.

> > The other place where non-IEEE floats are causing problems internal to
> > postgres appears to be inside spgist -- even when planning queries
> > using spgist:
>
> That's pretty odd --- it does not look like spgcostestimate does anything
> very exciting.  Can you get a stack trace showing where that FPE happens?

Hmm. The backtrace is here but I think it's lying about the specific line.

#0  convert_one_string_to_scalar (value=0x7f20e9a3 "  ",
rangelo=32, rangehi=122, 2132863395, 32, 122)
at selfuncs.c:3873
#1  0x00435880 in convert_string_to_scalar (
value=0x7f20e990 "Aztec\n", ' ' , "Ct  ",
scaledvalue=0x7fffdb44,
lobound=0x7f225bf4 "Audrey", ' ' , "Dr  ",
scaledlobound=0x7fffdb34,
hibound=0x7f225c40 "Balmoral", ' ' , "Dr  ",
scaledhibound=0x7fffdb3c, 2132863376, 2147474244, 2132958196,
2147474228, 2132958272, 2147474236) at selfuncs.c:3847

Stepping through the code it looks like it actually happens on line
3882 when denom overflows.

(gdb) n
3882			denom *= base;
3: denom = 1.666427615935998e+37
2: num = 0.37361810145459621
1: slen = 0

(gdb) n
Program received signal SIGFPE, Arithmetic exception.
convert_one_string_to_scalar (value=0x7f20e9a4 " ",
rangelo=32, rangehi=122, 2132863396, 32, 122)
at selfuncs.c:3873




Re: [HACKERS] PATCH: numeric timestamp in log_line_prefix

2015-08-22 Thread Fabien COELHO


Hello Tomas,

Review of v2:

> attached is a v2 of the patch, reworked based on the comments.

The patch applies cleanly to head, it compiles, I tested it and it mostly
works as expected, see below.

> 1) fix the docs (explicitly say that it's a Unix epoch)

I would add the word "numeric" in front of timestamp both in the doc and
in the postgresql.conf.sample, as it justifies the chosen %n.

> 2) handle 'padding' properly

I tried that without success. ISTM that what is padded is the empty
string, and the timestamp is just printed on its own without padding
afterwards.

I think that it should use a string buffer and then use the padding on
the string, as case 'c' or 't' does, for instance.

> 3) get rid of timestamp_str - use appendString* methods directly

See above, I'm afraid it is necessary for padding, because there are two
formatted fields.

> 4) support just the "with milliseconds" using '%n' escape sequence

Ok.

> All those changes are quite trivial. The only annoying bit is that both
> '%u' and '%e' are already used, so none of the obvious choices for 'Unix
> Epoch' are available. So I simply took (%m+1) which is %n.

It stands for "numeric" as well, so I think it is quite nice.

--
Fabien.




Re: [HACKERS] Error message with plpgsql CONTINUE

2015-08-22 Thread Tom Lane
I had a few second thoughts about the wording of the error messages
in this area.

First, consider

create or replace function foo() returns void language plpgsql as $$
begin
  <<lab1>>
  loop
exit lab1;  -- ok
  end loop;
  loop
exit lab1;  -- not so ok
  end loop;
end$$;

ERROR:  label "lab1" does not exist
LINE 8: exit lab1;  -- not so ok
 ^

This message seems confusing: label "lab1" does exist, it's just not
attached to the right loop.  In a larger function that might not be too
obvious, and I can easily imagine somebody wasting some time before
figuring out the cause of his problem.  Given the way the namespace data
structure works, I am not sure that we can realistically detect at line 8
that there was an instance of lab1 earlier, but perhaps we could word the
error message to cover either possibility.  Maybe something like "there is
no label "foo" surrounding this statement"?

Second, consider

create or replace function foo() returns void language plpgsql as $$
begin
  <<lab1>>
  begin
exit lab1;  -- ok
exit;   -- not so ok
  end;
end$$;

ERROR:  EXIT cannot be used outside a loop
LINE 6: exit;   -- not so ok
^

This is not too accurate, as shown by the fact that the first EXIT is
accepted.  Perhaps "EXIT without a label cannot be used outside a loop"?

I realize that this is pretty nitpicky, but if we're going to all the
trouble of improving the error messages about these things, seems like
we ought to be careful about what the messages actually say.

I'm not married to these particular wordings though.  Suggestions?

regards, tom lane




Re: [HACKERS] PATCH: numeric timestamp in log_line_prefix

2015-08-22 Thread Tomas Vondra

Hi all,

attached is a v2 of the patch, reworked based on the comments.

1) fix the docs (explicitly say that it's a Unix epoch)

2) handle 'padding' properly

3) get rid of timestamp_str - use appendString* methods directly

4) support just the "with milliseconds" using '%n' escape sequence

All those changes are quite trivial. The only annoying bit is that both 
'%u' and '%e' are already used, so none of the obvious choices for 'Unix 
Epoch' are available. So I simply took (%m+1) which is %n.
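(For example -- values illustrative only -- with

	log_line_prefix = '%n [%p] '

a log line would begin with something like "1440277423.017 [12345] LOG:",
which lines up directly with pgbench's numeric timestamps.)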


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e900dcc..8328733 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4630,6 +4630,11 @@ local0.*/var/log/postgresql
  no
 
 
+ %n
+ Time stamp with milliseconds (as a Unix epoch)
+ no
+
+
  %i
  Command tag: type of session's current command
  yes
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 088c714..4520b9f 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2438,6 +2438,18 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 					appendStringInfoString(buf, strfbuf);
 				}
 				break;
+			case 'n':
+				{
+					struct timeval tv;
+					gettimeofday(&tv, NULL);
+
+					if (padding != 0)
+						appendStringInfo(buf, "%*s", padding, "");
+
+					appendStringInfo(buf, "%ld.%.03d", tv.tv_sec,
+									 (int) (tv.tv_usec / 1000));
+				}
+				break;
 			case 's':
 				if (formatted_start_time[0] == '\0')
 					setup_formatted_start_time();
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e5d275d..34abd17 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -425,6 +425,7 @@
 	#   %p = process ID
 	#   %t = timestamp without milliseconds
 	#   %m = timestamp with milliseconds
+	#   %n = timestamp with milliseconds (as a Unix epoch)
 	#   %i = command tag
 	#   %e = SQL state
 	#   %c = session ID



Re: [HACKERS] Test code is worth the space

2015-08-22 Thread Jeff Janes
On Tue, Aug 18, 2015 at 3:32 PM, David Fetter  wrote:

> On Tue, Aug 18, 2015 at 04:54:07PM +0100, Greg Stark wrote:
> > On Tue, Aug 18, 2015 at 2:16 PM, David Fetter  wrote:
> > > I'm given to understand that this tight coupling is necessary for
> > > performance.  Are you saying that it could be unwound, or that
> > > testing strategies mostly need to take it into account, or...?
> >
> > I'm just saying that we shouldn't expect to find a magic bullet test
> > framework that solves all these problems. Without restructuring
> > code, which I don't think is really feasible, we won't be able to
> > have good unit test coverage for most existing code.
> >
> > It might be more practical to start using such a new tool for new
> > code only. Then the new code could be structured in ways that allow
> > the environment to be mocked more easily and the results observed
> > more easily.
>
> Great!
>
> Do we have examples of such tools and code bases structured to
> accommodate them that we'd like to use for reference, or at least for
> inspiration?
>

+1 on that.  It would be helpful to see successful examples.  Especially
ones written in C.

I can't really figure out what success looks like just from reading the
descriptions.

Cheers,

Jeff


Re: [HACKERS] Potential GIN vacuum bug

2015-08-22 Thread Jeff Janes
On Tue, Aug 18, 2015 at 8:59 AM, Robert Haas  wrote:

> On Mon, Aug 17, 2015 at 5:41 PM, Jeff Janes  wrote:
> > User backends attempt to take the lock conditionally, because otherwise
> > they would cause an autovacuum already holding the lock to cancel
> > itself, which seems quite bad.
> >
> > Note that this is a substantial behavior change, in that with this code
> > the user backends which find the list already being cleaned will just
> > add to the end of the pending list and go about their business.  So if
> > things are added to the list faster than they can be cleaned up, the
> > size of the pending list can increase without bound.
> >
> > Under the existing code each concurrent user backend will try to clean
> > the pending list at the same time.  The work doesn't parallelize, so
> > doing this just burns CPU (and possibly consumes up to
> > maintenance_work_mem for *each* backend), but it does serve to throttle
> > the insertion rate and so keep the list from growing without bound.
> >
> > This is just a proof-of-concept patch, because I don't know if this
> > approach is the right approach.
>
> I'm not sure if this is the right approach, but I'm a little wary of
> involving the heavyweight lock manager in this.  If pending list
> cleanups are frequent, this could involve a lot of additional lock
> manager traffic, which could be bad for performance.


I don't think 10 cleanups a second should be a problem (and most of those
would probably fail to acquire the lock, but I don't know if that would
make a difference).  If there were several hundred per second, I think you
would have bigger problems than traffic through the lock manager.  In that
case, it is time to either turn off fastupdate, or increase the pending
list size.

As a mini-vacuum, it seems natural to me to hold a lock of the same nature
as a regular vacuum does, but just on the one index involved rather than
the whole table.
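(A minimal sketch of what taking the lock "conditionally" means here; the
choice of lock tag is an illustration, not necessarily what the patch
does:)

	LOCKTAG		tag;

	/* hypothetical: key the pending-list cleanup off the index's OID */
	SET_LOCKTAG_RELATION(tag, MyDatabaseId, indexOid);

	if (LockAcquire(&tag, ExclusiveLock, false, true) == LOCKACQUIRE_NOT_AVAIL)
	{
		/* someone else (e.g. autovacuum) is already cleaning; just
		 * append to the pending list and move on */
	}
	else
	{
		/* ... clean the pending list ... */
		LockRelease(&tag, ExclusiveLock, false);
	}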


> Even if they are
> infrequent, it seems like it would be more natural to handle this
> with some regime of locks and pins and buffer cleanup locks on the
> buffers that are storing the pending list, rather than a heavyweight
> lock on the whole relation.  But I am just waving my hands wildly
> here.
>

I also thought of a buffer cleanup lock on the pending-list head buffer to
represent the right to do a cleanup.  But with the proviso that once you
have obtained the cleanup lock, you can then drop the exclusive buffer
content lock and continue to hold the conceptual lock just by maintaining
the pin.  I think that this would be semantically correct, but backends
doing a cleanup would have to take the lock conditionally, and I think you
would have too many chances for false failures, where it bails out when
the other party simply holds a pin.  I guess I could implement it and see
how it fares in my test case.
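(Roughly, that alternative would look like this, using the existing
buffer-manager calls -- a sketch of one reading, assuming the head buffer
is already pinned:)

	if (ConditionalLockBufferForCleanup(metabuffer))
	{
		/*
		 * We hold the exclusive content lock and the sole pin.  Drop the
		 * content lock but keep the pin; the pin is what represents the
		 * conceptual right to do the cleanup.
		 */
		LockBuffer(metabuffer, BUFFER_LOCK_UNLOCK);

		/* ... perform the cleanup, then release the pin ... */
	}
	else
	{
		/* fails even if the other party merely holds a pin -- the
		 * "false failure" case described above */
	}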

Cheers,

Jeff


Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-22 Thread Tom Lane
Tomas Vondra  writes:
> One question regarding the proposed patch though - if I get it right
> (just looking at the diff), it simply limits the output to the first 100
> child contexts at each level independently. So if each of those 100
> child contexts has >100 child contexts of its own, we get 100x100
> lines. And so on, if the hierarchy is deeper. This probably is not
> addressable without introducing some global counter of printed contexts,
> and it may not be an issue at all (all the cases I could remember have a
> single huge context or many sibling contexts).

Right.  The situation Stefan was complaining of, almost certainly,
involved a huge number of children of the same context.  This patch
would successfully abbreviate that case, no matter where it happened
in the context tree exactly.  In principle, if you were getting that
sort of expansion at multiple levels of the context tree concurrently,
you could still get a mighty long context dump ... but I really doubt
that would happen in practice.  (And if it did happen, an overall limit
on the number of contexts printed would hide the fact that it was
happening, which wouldn't be desirable.)

>> One thing we could consider doing to improve the odds that it's fine
>> would be to rearrange things so that child contexts of the same
>> parent are more likely to be "similar" --- for example, break out
>> all relcache entries to be children of a RelCacheContext instead of
>> the generic CacheMemoryContext, likewise for cached plans, etc. But
>> I'm not yet convinced that'd be worth the trouble.

> That'd be nice but I see that as an independent improvement - it might 
> improve the odds for internal contexts, but what about contexts coming 
> from user code (e.g. custom aggregates)?

Yeah, cases like custom aggregates would be hard to classify.

regards, tom lane




Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-22 Thread Tomas Vondra


On 08/22/2015 06:06 PM, Tom Lane wrote:

> Tomas Vondra  writes:
>> Couldn't we make it a bit smarter to handle even cases like this? For
>> example we might first count/sum the child contexts, and then print
>> either all child contexts (if there are only a few of them) or just
>> those that are >5% of the total, 2x the average or something like that.
>
> That seems overly complicated. In the first place, you couldn't
> possibly implement that without two passes over the context set,
> which would be mighty expensive. (In the situations where this is
> getting run, most likely portions of the address space have been
> swapped out, and we'll have to swap them back in again. Bad enough to
> iterate over the whole process address space once, but twice?) In the
> second place, it would seem to require a much wider API between
> MemoryContextStats and the per-context-type stats methods, including
> for example a way to decide which numbers about a context were the
> most important ones. I'm already concerned about having had to expose
> a set of accumulatable numbers at all ... don't want the API to get
> into their semantics in that kind of detail.


Sure, the two passes seem quite annoying, although I'm not convinced 
it'd be a problem in practice - most modern systems I've been dealing 
with recently were configured with the 'no swapping' goal, and using 
only a tiny swap for extreme cases (e.g. 256GB RAM with 4GB of swap).


If only we had some memory accounting in place ;-) Then we wouldn't have 
to do the two passes ...



> As I said, based on my experience in looking at memory context dumps
> (and I've seen a few), a show-the-first-N-children heuristic
> probably will work fine. If we find out differently we can work on
> refining it, but I don't think a complex design is warranted at this
> stage.


OK, let's go with the simple approach.

I'm still not quite convinced that having no GUC for turning this off is a
good idea, though. From time to time we're dealing with OOM issues on
customer systems, with very limited access (e.g. no gdb). In those cases
the memory context dump is the only information we have initially, and
it's usually quite difficult to get additional info.


I agree that the 'first-N-children' heuristic is going to work in most
cases, but if it doesn't, it'd be nice to have a way to force "full"
stats in an ad-hoc manner (i.e. disable it before a query, etc.).


OTOH now that I'm thinking about it, most OOM errors I've seen (at least
on Linux) were either because of exceptionally large palloc() requests
(e.g. because of varlena header corruption, overflow bugs, ...) or
because of the OOM killer. In the first case the memory context stats are
utterly useless because the issue has nothing to do with other memory
contexts, and in the latter case you don't get any stats at all because
the process is simply killed.


One question regarding the proposed patch though - if I get it right
(just looking at the diff), it simply limits the output to the first 100
child contexts at each level independently. So if each of those 100
child contexts has >100 child contexts of its own, we get 100x100
lines. And so on, if the hierarchy is deeper. This probably is not
addressable without introducing some global counter of printed contexts,
and it may not be an issue at all (all the cases I could remember have a
single huge context or many sibling contexts).



> One thing we could consider doing to improve the odds that it's fine
> would be to rearrange things so that child contexts of the same
> parent are more likely to be "similar" --- for example, break out
> all relcache entries to be children of a RelCacheContext instead of
> the generic CacheMemoryContext, likewise for cached plans, etc. But
> I'm not yet convinced that'd be worth the trouble.


That'd be nice but I see that as an independent improvement - it might 
improve the odds for internal contexts, but what about contexts coming 
from user code (e.g. custom aggregates)?


kind regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-22 Thread Tom Lane
Greg Stark  writes:
> On Thu, Aug 20, 2015 at 3:29 PM, Tom Lane  wrote:
>> That seems worth poking into.

> Mea culpa. Not a planner crash but rather an overflow from exp(). It
> turns out exp() and other math library functions on Vax do not signal
> FPE but rather have a curious api that lets us catch the overflow by
> defining a function "infnan()" to call when it overflows. If we don't
> define that function then it executes an illegal instruction which
> generates SIGILL with errno set to EDOM (iirc). For the moment I've
> just attached our FPE handler to SIGILL and that's letting me run the
> tests without crashes. It's probably just silly make-work but it would
> be pretty easy to add a simple function to call our FPE handler
> directly to avoid having to have a SIGILL handler which seems like a
> bad idea in general.

Why not define infnan() and make it do the same thing as 
FloatExceptionHandler?  Or was that what you meant?

> The hang is actually in the groupingset tests in
> bipartite_match.c:hk_breadth_search().

> Looking at that function it's not surprising that it doesn't work
> without IEEE floats given that the first line is
>   distance[0] = get_float4_infinity();

> And the return value of the function is
>   !isinf(distance[0]);

Is it that function itself that's hanging, or is the problem that its
caller expects it to ultimately return true, and it never does?

I don't think we're really insisting on a true infinity here, only that
get_float4_infinity() produce a large value that isinf() will recognize.

I'm surprised that any of the hacks in src/port/isinf.c compile on Vax
at all --- did you invent a new one?

Also, I'd have thought that both get_floatX_infinity and get_floatX_nan
would be liable to produce SIGFPE on non-IEEE machines.  Did you mess
with those?

> The other place where non-IEEE floats are causing problems internal to
> postgres appears to be inside spgist -- even when planning queries
> using spgist:

That's pretty odd --- it does not look like spgcostestimate does anything
very exciting.  Can you get a stack trace showing where that FPE happens?

regards, tom lane




Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-22 Thread Tom Lane
Tomas Vondra  writes:
> On 08/21/2015 08:37 PM, Tom Lane wrote:
>> ...  But suppose we add a parameter to memory context stats
>> collection that is the maximum number of child contexts to print *per
>> parent context*.  If there are more than that, summarize the rest as per
>> your suggestion.
>> 
>> The case where you would lose important data is where the serious
>> bloat is in some specific child context that is after the first N
>> children of its direct parent. I don't believe I've ever seen a case
>> where that was critical information as long as N isn't too tiny.

> Couldn't we make it a bit smarter to handle even cases like this? For 
> example we might first count/sum the child contexts, and then print 
> either all child contexts (if there are only a few of them) or just 
> those that are >5% of the total, 2x the average or something like that.

That seems overly complicated.  In the first place, you couldn't possibly
implement that without two passes over the context set, which would be
mighty expensive.  (In the situations where this is getting run, most
likely portions of the address space have been swapped out, and we'll have
to swap them back in again.  Bad enough to iterate over the whole process
address space once, but twice?)  In the second place, it would seem to
require a much wider API between MemoryContextStats and the
per-context-type stats methods, including for example a way to decide
which numbers about a context were the most important ones.  I'm already
concerned about having had to expose a set of accumulatable numbers at all
... don't want the API to get into their semantics in that kind of detail.

As I said, based on my experience in looking at memory context dumps
(and I've seen a few), a show-the-first-N-children heuristic probably
will work fine.  If we find out differently we can work on refining it,
but I don't think a complex design is warranted at this stage.
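(A rough sketch of that heuristic -- the names and the summary line are
hypothetical, not necessarily what the patch does:)

static void
MemoryContextStatsLimited(MemoryContext context, int level, int max_children)
{
	MemoryContext child;
	int			nchildren = 0;

	/* print this context's own stats here (elided) */

	for (child = context->firstchild; child != NULL; child = child->nextchild)
	{
		if (nchildren < max_children)
			MemoryContextStatsLimited(child, level + 1, max_children);
		/* else: only accumulate totals for the summary line (elided) */
		nchildren++;
	}

	if (nchildren > max_children)
		fprintf(stderr, "%*s%d more child contexts containing ... total bytes\n",
				(level + 1) * 2, "", nchildren - max_children);
}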

One thing we could consider doing to improve the odds that it's fine
would be to rearrange things so that child contexts of the same parent
are more likely to be "similar" --- for example, break out all relcache
entries to be children of a RelCacheContext instead of the generic
CacheMemoryContext, likewise for cached plans, etc.  But I'm not yet
convinced that'd be worth the trouble.

regards, tom lane




Re: [HACKERS] PATCH: numeric timestamp in log_line_prefix

2015-08-22 Thread Tomas Vondra

Hi,

On 08/22/2015 08:39 AM, Fabien COELHO wrote:
> Hello Tomas,
>
>> from time to time I need to correlate PostgreSQL logs to other logs,
>> containing numeric timestamps - a prime example of that is pgbench. With
>> %t and %m that's not quite trivial, because of timezones etc.
>>
>> I propose adding two new log_line_prefix escape sequences - %T and %M,
>> doing the same thing as %t and %m, but formatting the value as a number.
>>
>> Patch attached, I'll add it to CF 2015-06.
>
> Are you planning to update this patch? I was wanting to use it for some
> tests and figured out that it had stayed as a proposal, too bad.
>
> I would suggest to maybe follow Tom's %u idea and fix the implementation
> details wrt the comments received?

Yes, I plan to update it according to those comments.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] New functions

2015-08-22 Thread Heikki Linnakangas

On 07/08/2015 01:15 PM, Дмитрий Воронин wrote:
> 07.07.2015, 18:34, "Michael Paquier" :
>> Speaking of which, I have rewritten the patch as attached. This looks
>> way cleaner than the previous version submitted. Dmitry, does that
>> look fine for you?
>> I am switching this patch as "Waiting on Author".
>
> Michael, hello. I'm looking at your variant of the patch. You create a
> function, ssl_extensions_info(), that gives information about SSL
> extensions in a more informative view. So, it's cool.

We should check the return value of every OpenSSL call for errors. At
least BIO_new() could return NULL, but check the docs of all the others
too.
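(A minimal example of the kind of check meant -- the error wording is an
illustration, not from the patch:)

	BIO		   *membio = BIO_new(BIO_s_mem());

	if (membio == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_OUT_OF_MEMORY),
				 errmsg("could not create OpenSSL BIO")));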


Are all the functions used in the patch available in all the versions of 
OpenSSL we support?


Other than those little things, looks good to me.

- Heikki





Re: [HACKERS] Reducing ClogControlLock contention

2015-08-22 Thread Andres Freund
Hi,

Amit pinged me about my opinion of this patch. I don't really have the
time/energy for an in-depth review right now, but since I'd looked at it
enough to see some troublesome points, I thought I'd write those up.

On 2015-06-30 08:02:25 +0100, Simon Riggs wrote:
> Proposal for improving this is to acquire the ClogControlLock in Shared
> mode, if possible.

I find that rather scary. That requires all read and write paths in
clog.c and slru.c to only ever read each memory location once. Otherwise
those reads may not get the same results. That'd mean we'd need to add
indirections via volatile to all reads/writes. It also requires that we
never read an 8-byte value in two 4-byte quantities.
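(A contrived illustration of the hazard, not code from the patch:)

	XLogRecPtr	lsn;

	/* risky under a shared lock: the compiler may load group_lsn[i] twice,
	 * and a concurrent writer can change it in between */
	if (shared->group_lsn[i] > latest)
		latest = shared->group_lsn[i];

	/* read once through a volatile-qualified pointer, then use the copy;
	 * note that even this single read of an 8-byte XLogRecPtr can tear
	 * on platforms that read it as two 4-byte quantities */
	lsn = ((volatile SlruSharedData *) shared)->group_lsn[i];
	if (lsn > latest)
		latest = lsn;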


> This is safe because people checking visibility of an xid must always run
> TransactionIdIsInProgress() first to avoid race conditions, which will
> always return true for the transaction we are currently committing. As a
> result, we never get concurrent access to the same bits in clog, which
> would require a barrier.

I don't think that's really sufficient. There's a bunch of callers that do
lookups without such checks, e.g. in heapam.c. It might be safe due to
other circumstances, but at the very least this is a significant but
subtle API revision.

> Two concurrent writers might access the same word concurrently, so we
> protect against that with a new CommitLock. We could partition that by
> pageno also, if needed.

To me it seems better to make this more integrated with slru.c. Change
the locking so that the control lock (relevant for page mappings et al)
is different from the locks for reading/writing data.

>  	 * If we're doing an async commit (ie, lsn is valid), then we must wait
>  	 * for any active write on the page slot to complete.  Otherwise our
>  	 * update could reach disk in that write, which will not do since we
> @@ -273,7 +290,9 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
>  	 * write-busy, since we don't care if the update reaches disk sooner than
>  	 * we think.
>  	 */
> -	slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
> +	if (!InRecovery)
> +		LWLockAcquire(CommitLock, LW_EXCLUSIVE);
> +	slotno = SimpleLruReadPage_ReadOnly(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
>  
>  	/*
>  	 * Set the main transaction id, if any.
> @@ -312,6 +331,9 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
>  	ClogCtl->shared->page_dirty[slotno] = true;
>  
>  	LWLockRelease(CLogControlLock);
> +
> +	if (!InRecovery)
> +		LWLockRelease(CommitLock);
>  }

TransactionIdSetPageStatus() calls TransactionIdSetStatusBit(), which
writes an 8 byte variable (the lsn). That's not safe.

Maybe write_ok = XLogRecPtrIsInvalid(lsn) is trying to address that? If
so, I don't see how. A page is returned with only the shared lock held
if it's in VALID state before. Even if that were changed, this'd be a
mightily subtle thing to do without a very fat comment.


> @@ -447,7 +447,8 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
>   * It is unspecified whether the lock will be shared or exclusive.
>   */
>  int
> -SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
> +SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, bool write_ok,
> +						   TransactionId xid)
>  {
>  	SlruShared	shared = ctl->shared;
>  	int			slotno;
> @@ -460,7 +461,9 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
>  	{
>  		if (shared->page_number[slotno] == pageno &&
>  			shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
> -			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
> +			shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS &&
> +			(write_ok ||
> +			 shared->page_status[slotno] != SLRU_PAGE_WRITE_IN_PROGRESS))
>  		{
>  			/* See comments for SlruRecentlyUsed macro */
>  			SlruRecentlyUsed(shared, slotno);
> @@ -472,7 +475,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
>  	LWLockRelease(shared->ControlLock);
>  	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
>  
> -	return SimpleLruReadPage(ctl, pageno, true, xid);
> +	return SimpleLruReadPage(ctl, pageno, write_ok, xid);
>  }

This function's name would definitely need to be changed, and it'll need
to be documented when/how it's safe to use write_ok = true. Same with
slru.c's header.


A patch like this will need far more changes. Every read and write from
a page has to be guaranteed to only be done once, otherwise you can get
'effectively torn' values.

That is, you can't just do

static void
TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
...
	char	   *byteptr;