Re: [HACKERS] array_length(anyarray)

2014-01-10 Thread Dean Rasheed
On 10 January 2014 00:36, Marko Tiikkaja ma...@joh.to wrote:
 On 1/10/14, 1:20 AM, Merlin Moncure wrote:

 I'm piling on: it's not clear at all to me why you've special cased
 this to lower_bound=1.  First of all, there are other reasons to check
 length than iteration.


Yes, I agree. A length function that returned 0 for empty arrays would
be far from useless.


 Can you point me to some examples?


The example I see all the time is code like

if array_length(nodes, 1) < 5 then
... do something ...

then you realise (or not as the case may be) that this doesn't work
for empty arrays, and have to remember to wrap it in a coalesce call.

Simply being able to write

if cardinality(nodes) < 5 then
   ... do something ...

is not just shorter, easier to type and easier to read, it is far less
likely to be the source of subtle bugs.
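For concreteness, the workaround alluded to above looks something like this
(a sketch only; nodes is a hypothetical int[] variable in PL/pgSQL):

    -- array_length(nodes, 1) is NULL for '{}'::int[], so it has to be wrapped:
    if coalesce(array_length(nodes, 1), 0) < 5 then
        ... do something ...
    end if;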

Regards,
Dean




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread David Rowley
On Fri, Jan 10, 2014 at 4:09 AM, Dean Rasheed dean.a.rash...@gmail.com wrote:

 Hi,

 Reading over this, I realised that there is a problem with NaN
 handling --- once the state becomes NaN, it can never recover. So the
 results using the inverse transition function don't match HEAD in
 cases like this:

 create table t(a int, b numeric);
 insert into t values(1,1),(2,2),(3,'NaN'),(4,3),(5,4);
 select a, b,
sum(b) over(order by a rows between 1 preceding and current row)
   from t;

 which in HEAD produces:

  a |  b  | sum
 ---+-+-
  1 |   1 |   1
  2 |   2 |   3
  3 | NaN | NaN
  4 |   3 | NaN
  5 |   4 |   7
 (5 rows)

 but with this patch produces:

  a |  b  | sum
 ---+-+-
  1 |   1 |   1
  2 |   2 |   3
  3 | NaN | NaN
  4 |   3 | NaN
  5 |   4 | NaN
 (5 rows)


Nice catch! Thanks for having a look at the patch.

Ok, so I thought about this and I don't think it's too big a problem to fix
at all. I think it can be handled very similarly to how I'm taking care
of NULL values in the window frame. For those, I simply keep a count of them in
an int64, and when the last one leaves the aggregate context things can
continue as normal.

Luckily for us, all numeric aggregation (and now inverse aggregation)
goes through 2 functions: do_numeric_accum() and its new inverse,
do_numeric_discard(). Both of these functions operate on a NumericAggState,
in which (in the attached) I've changed the isNaN bool field to a NaNCount
int64 field. I'm just doing NaNCount++ when we get a NaN value in
do_numeric_accum() and NaNCount-- in do_numeric_discard(), and in the final
functions I'm just checking whether NaNCount > 0.

Though this implementation does fix the reported problem, unfortunately it
may have an undesired performance impact for numeric aggregate functions
when not used in the context of a window. Let me explain what I mean:

Previously there was some code in do_numeric_accum() which did:

if (state->isNaN || NUMERIC_IS_NAN(newval))
{
  state->isNaN = true;
  return;
}

Which meant that it didn't bother adding new perfectly valid numerics to
the aggregate totals when a NaN had been encountered previously. I had to
change this to continue on regardless, as we still need to keep the totals
just in case all the NaN values are removed and the totals are required
once again. This means that the non-window versions of SUM(numeric) and
AVG(numeric) and the stddev aggregates for numeric pay a price and have
to keep on totaling after encountering NaN values. :(

If there were a way to know whether the function was being called in a window
context or a normal aggregate context, then we could probably almost completely
remove that possible performance regression just by skipping the totaling
when not in a window context. I really don't know how common NaN values are
in the real world, so I can't say how much this matters. I'd hazard a guess that
more people would benefit from inverse transitions on numeric types,
but I have nothing to back that up.

I've attached version 2 of the patch which fixes the NaN problem and adds a
regression test to cover it.

Thanks again for testing this and finding the problem.

Regards

David Rowley


 Regards,
 Dean



inverse_transition_functions_v2.0.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread David Rowley
On Fri, Jan 10, 2014 at 5:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Dean Rasheed dean.a.rash...@gmail.com writes:
  Reading over this, I realised that there is a problem with NaN
  handling --- once the state becomes NaN, it can never recover. So the
  results using the inverse transition function don't match HEAD in
  cases like this:

 Ouch!  That takes out numeric, float4, and float8 in one fell swoop.

 Given the relative infrequency of NaNs in most data, it seems like
 it might still be possible to get a speedup if we could use inverse
 transitions until we hit a NaN, then do it the hard way until the
 NaN is outside the window, then go back to inverse transitions.
 I'm not sure though if this is at all practical from an implementation
 standpoint.  We certainly don't want the core code knowing about
 anything as datatype-specific as a NaN, but maybe the inverse transition
 function could have an API that allows reporting "I can't do it here,
 fall back to the hard way".


I had thought about that API, though not for numeric, as I think I've managed
to find another solution there; it was for MAX and MIN.

I posted an idea about it here:
http://www.postgresql.org/message-id/caaphdvqu+ygw0vbpbb+yxhrpg5vcy_kifyi8xmxfo8kyocz...@mail.gmail.com
but it didn't generate much interest at the time and I didn't have any
ideas on how the inverse aggregate functions would communicate this
inability to remove the value to the caller. Perhaps it would be an idea
still, but I had put it to the back of my mind in favour of tuplestore
indexes that could be created on the fly based on the row position within
the frame and the aggregate's sort operator on the aggregate value.
This would mean that MAX and MIN values could be found quickly all the time
rather than just when the value being removed happened not to affect the
current maximum or minimum. It's not something I have planned for this
patch though and I'd have lots of questions around memory allocation before
I'd want to start any work on it.

Regards

David Rowley


 regards, tom lane



Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread David Rowley


 bool_or()
   FALSE can be removed, removing TRUE requires a rescan. Could be made
   fully invertible by counting the number of TRUE and FALSE values,
   similar to my suggestion for how to handle NaN for sum(numeric).
   Same works for bool_and().

 bit_or()
   Like bool_or(), 0 can be removed; everything else requires a rescan.
   Same works for bit_and()


Interesting, I'd not thought of any way to optimise these ones, but I had
originally thought about allowing the inverse transition functions to
report if they could perform the inverse transition based on the value they
received and if they reported failure, then perform the rescan.
I just don't quite know yet the best way for the inverse transition
function to communicate this to the caller. If you have any ideas on
the best way to do this then I'd really like to hear them.

Regards

David Rowley


Re: [HACKERS] array_length(anyarray)

2014-01-10 Thread Marko Tiikkaja

On 1/10/14, 9:04 AM, Dean Rasheed wrote:

On 10 January 2014 00:36, Marko Tiikkaja ma...@joh.to wrote:


Can you point me to some examples?



The example I see all the time is code like

if array_length(nodes, 1) < 5 then
 ... do something ...

then you realise (or not as the case may be) that this doesn't work
for empty arrays, and have to remember to wrap it in a coalesce call.

Simply being able to write

if cardinality(nodes) < 5 then
... do something ...

is not just shorter, easier to type and easier to read, it is far less
likely to be the source of subtle bugs


But this is what I don't understand: why do you care whether there are 
fewer than 5 elements in the array, but you don't care about how they're 
organized?  '[2:3]={1,2}'::int[] and '{{1},{2}}'::int[] both give the 
same result when unnest()ed, sure, but why do you want to accept such 
crap as input if you just want a list of elements?
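For example (sketch), both arrays decompose to the same two rows even though
their shapes and bounds differ:

    select unnest('[2:3]={1,2}'::int[]);   -- 1, 2
    select unnest('{{1},{2}}'::int[]);     -- 1, 2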


I guess what I truly want is a less generic type that's like an array, 
but always one-dimensional with a lower bound of 1.  There's too much 
garbage that can be passed to a function taking an array as an input 
right now.
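One user-land approximation of such a type would be a domain over int[] with a
check constraint (sketch only; the name int_list is made up here):

    create domain int_list as int[]
        check (value is null
               or (coalesce(array_ndims(value), 1) = 1
                   and coalesce(array_lower(value, 1), 1) = 1));

    -- '{1,2,3}'::int_list and '{}'::int_list are accepted;
    -- '{{1},{2}}'::int_list (2-D) and '[2:3]={1,2}'::int_list (offset) are rejected.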



Regards,
Marko Tiikkaja




Re: [HACKERS] array_length(anyarray)

2014-01-10 Thread Merlin Moncure
On Fri, Jan 10, 2014 at 2:04 AM, Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 10 January 2014 00:36, Marko Tiikkaja ma...@joh.to wrote:
 On 1/10/14, 1:20 AM, Merlin Moncure wrote:

 I'm piling on: it's not clear at all to me why you've special cased
 this to lower_bound=1.  First of all, there are other reasons to check
 length than iteration.


 Yes, I agree. A length function that returned 0 for empty arrays would
 be far from useless.


 Can you point me to some examples?


 The example I see all the time is code like

 if array_length(nodes, 1) < 5 then
 ... do something ...

 then you realise (or not as the case may be) that this doesn't work
 for empty arrays, and have to remember to wrap it in a coalesce call.

 Simply being able to write

 if cardinality(nodes) < 5 then
... do something ...

 is not just shorter, easier to type and easier to read, it is far less
 likely to be the source of subtle bugs.

right -- exactly.  or, 'ORDER BY cardinality(nodes)', etc etc.
Furthermore, we already have pretty good support for iteration with
arrays via unnest().  What's needed for better iteration support (IMO)
is a function that does what unnest does but returns an array of
indexes (one per dimension) -- a generalization of the
_pg_expandarray function.  Let's say 'unnest_dims'.  'unnest_dims' is
non-trivial to code in user land while 'array_length' is an extremely
trivial wrapper around array_upper().

cardinality() (which is a much better name for the function IMSNHO)
gives a*b*c values, say, for a 3d array, and also does something non-trivial
*particularly in the case of offset arrays*.
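To make that concrete (sketch, assuming cardinality() counts every element as
proposed here):

    select array_length('{{1,2,3},{4,5,6}}'::int[], 1);   -- 2 (first dimension only)
    -- cardinality('{{1,2,3},{4,5,6}}'::int[])            -- would give 6, i.e. 2*3
    -- cardinality('{}'::int[])                            -- would give 0, not NULL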

On Fri, Jan 10, 2014 at 3:36 AM, Marko Tiikkaja ma...@joh.to wrote:
 I guess what I truly want is a less generic type that's like an array, but 
 always one-dimensional with a lower bound of 1.

Your function would be the only one in the array API that implemented
special behaviors like that.  That suggests to me that the less
generic function belongs in user land, not in the core array API.

merlin




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Dean Rasheed
On 10 January 2014 08:12, David Rowley dgrowle...@gmail.com wrote:
 On Fri, Jan 10, 2014 at 4:09 AM, Dean Rasheed dean.a.rash...@gmail.com
 wrote:

 Hi,

 Reading over this, I realised that there is a problem with NaN
 handling --- once the state becomes NaN, it can never recover. So the
 results using the inverse transition function don't match HEAD in
 cases like this:

 create table t(a int, b numeric);
 insert into t values(1,1),(2,2),(3,'NaN'),(4,3),(5,4);
 select a, b,
sum(b) over(order by a rows between 1 preceding and current row)
   from t;

 which in HEAD produces:

  a |  b  | sum
 ---+-+-
  1 |   1 |   1
  2 |   2 |   3
  3 | NaN | NaN
  4 |   3 | NaN
  5 |   4 |   7
 (5 rows)

 but with this patch produces:

  a |  b  | sum
 ---+-+-
  1 |   1 |   1
  2 |   2 |   3
  3 | NaN | NaN
  4 |   3 | NaN
  5 |   4 | NaN
 (5 rows)


 Nice catch! Thanks for having a look at the patch.

 Ok, so I thought about this and I don't think it's too big a problem to fix
 at all. I think it can be handled very similarly to how I'm taking care of
 NULL values in the window frame. For those, I simply keep a count of them in an
 int64, and when the last one leaves the aggregate context things can continue
 as normal.

 Luckily for us, all numeric aggregation (and now inverse aggregation) goes
 through 2 functions: do_numeric_accum() and its new inverse,
 do_numeric_discard(). Both of these functions operate on a NumericAggState,
 in which (in the attached) I've changed the isNaN bool field to a NaNCount int64
 field. I'm just doing NaNCount++ when we get a NaN value in do_numeric_accum()
 and NaNCount-- in do_numeric_discard(), and in the final functions I'm just
 checking whether NaNCount > 0.


Cool, that sounds like a neat fix.


 Though this implementation does fix the reported problem, unfortunately it
 may have an undesired performance impact for numeric aggregate functions
 when not used in the context of a window. Let me explain what I mean:

 Previously there was some code in do_numeric_accum() which did:

 if (state->isNaN || NUMERIC_IS_NAN(newval))
 {
   state->isNaN = true;
   return;
 }

 Which meant that it didn't bother adding new perfectly valid numerics to the
 aggregate totals when a NaN had been encountered previously. I had to
 change this to continue on regardless, as we still need to keep the totals
 just in case all the NaN values are removed and the totals are required once
 again. This means that the non-window versions of SUM(numeric) and
 AVG(numeric) and the stddev aggregates for numeric pay a price and have
 to keep on totaling after encountering NaN values. :(


I suspect that NaNs almost never occur in practice, so the fact that
it might now take longer to tell you that the sum is NaN doesn't worry
me. More important is that it always gives the right answer.

Note, if anyone ever did this for floats, +/- Infinity would also need
to be handled, so you'd have to maintain 3 counts and deal with logic
like Infinity - Infinity = NaN.
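For example (sketch of the float behaviour being referred to):

    select 'Infinity'::float8 + 10;                    -- Infinity
    select 'Infinity'::float8 - 'Infinity'::float8;    -- NaN, not 10

so simply subtracting a departing Infinity from the running state cannot
recover the correct sum; separate counts would be needed instead.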

Regards,
Dean




Re: [HACKERS] array_length(anyarray)

2014-01-10 Thread Marko Tiikkaja

On 1/10/14, 10:41 AM, Merlin Moncure wrote:

What's needed for better iteration support (IMO)
is a function that does what unnest does but returns an array of
indexes (one per dimension) -- a generalization of the
_pg_expandarray function.  Let's say 'unnest_dims'.


So  unnest_dims('{{1,2},{3,4}}'::int[])  would return  VALUES (1, 
'{1,2}'::int[]), (2, '{3,4}'::int[])?  If so, then yes, that's a 
functionality I've considered us to have been missing for a long time.



Regards,
Marko Tiikkaja




Re: [HACKERS] array_length(anyarray)

2014-01-10 Thread Merlin Moncure
On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote:
 On 1/10/14, 10:41 AM, Merlin Moncure wrote:

 What's needed for better iteration support (IMO)
 is a function that does what unnest does but returns an array of
 indexes (one per dimension) -- a generalization of the
 _pg_expandarray function.  Let's say 'unnest_dims'.


 So  unnest_dims('{{1,2},{3,4}}'::int[])  would return  VALUES (1,
 '{1,2}'::int[]), (2, '{3,4}'::int[])?  If so, then yes, that's a
 functionality I've considered us to have been missing for a long time.

not quite.  it returns int[], anyelement: so, using your example, you'd get:

[1,1], 1
[1,2], 2
[2,1], 3
[2,2], 4

like unnest() it would fully decompose the array to individual
elements.  what you have above slices the array, which is useful, but
probably shouldn't live under the 'unnest' name -- perhaps 'slice'.
Pavel added it to pl/pgsql under the FOREACH syntax (FYI).
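For a fixed number of dimensions, something close to that output can already
be cobbled together with generate_subscripts() (sketch; a real unnest_dims()
would handle any dimensionality):

    select array[i, j] as dims, arr[i][j] as value
    from (select '{{1,2},{3,4}}'::int[] as arr) a,
         generate_subscripts(arr, 1) as i,
         generate_subscripts(arr, 2) as j;
    -- dims={1,1} value=1, dims={1,2} value=2, dims={2,1} value=3, dims={2,2} value=4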

merlin




Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-10 Thread Gabriele Bartolini
Hi Steeve,

On 09/01/14 22:38, Steeve Lennmark wrote:
 I'm a barman user myself so that was actually my initial thought.

Ah! Very good!
 If there isn't some kind of hidden internal that I've missed, I don't see
 a way to convert an OID (I only have the OID and path at this stage) to a
 tablespace name. This solution, even though not optimal, is a lot
 better than my initial one where I used the OID directly.

Try:

SELECT spcname, oid, pg_tablespace_location(oid) FROM pg_tablespace

Thanks,
Gabriele

-- 
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-10 Thread Magnus Hagander
On Fri, Jan 10, 2014 at 12:25 PM, Gabriele Bartolini 
gabriele.bartol...@2ndquadrant.it wrote:

 Hi Steeve,

 On 09/01/14 22:38, Steeve Lennmark wrote:
  I'm a barman user myself so that was actually my initial thought.

 Ah! Very good!
  If there isn't some kind of hidden internal that I've missed, I don't see
  a way to convert an OID (I only have the OID and path at this stage) to a
  tablespace name. This solution, even though not optimal, is a lot
  better than my initial one where I used the OID directly.

 Try:

 SELECT spcname, oid, pg_tablespace_location(oid) FROM pg_tablespace


That would require a second connection to the database. You cannot run that
query from the walsender session. And that's exactly the issue that Steeve
pointed out in his first email.

I think it's better to let pg_basebackup work at the lower level, and then
leave it to higher-level tools to be able to do the mapping to names.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-10 Thread Andres Freund
On 2014-01-10 12:27:23 +0100, Magnus Hagander wrote:
 On Fri, Jan 10, 2014 at 12:25 PM, Gabriele Bartolini 
 gabriele.bartol...@2ndquadrant.it wrote:
 
  Hi Steeve,
 
  On 09/01/14 22:38, Steeve Lennmark wrote:
   I'm a barman user myself so that was actually my initial thought.
 
  Ah! Very good!
   If there isn't some kind of hidden internal that I've missed, I don't see
   a way to convert an OID (I only have the OID and path at this stage) to a
   tablespace name. This solution, even though not optimal, is a lot
   better than my initial one where I used the OID directly.
 
  Try:
 
  SELECT spcname, oid, pg_tablespace_location(oid) FROM pg_tablespace
 
 
 That would require a second connection to the database. You cannot run that
 query from the walsender session. And that's exactly the issue that Steeve
 pointed out in his first email.

Theoretically nothing is stopping us from providing a command outputting
that information - it's a global catalog, so we can access it without
problems.

 I think it's better to let pg_basebackup work at the lower level, and then
 leave it to  higher level tools to be able to do the mapping to names.

That doesn't negate this argument though. Not really convinced either
way yet.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] array_length(anyarray)

2014-01-10 Thread Florian Pflug
On Jan10, 2014, at 11:00 , Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote:
 On 1/10/14, 10:41 AM, Merlin Moncure wrote:
 
 What's needed for better iteration support (IMO)
  is a function that does what unnest does but returns an array of
  indexes (one per dimension) -- a generalization of the
  _pg_expandarray function.  Let's say 'unnest_dims'.
 
 
 So  unnest_dims('{{1,2},{3,4}}'::int[])  would return  VALUES (1,
 '{1,2}'::int[]), (2, '{3,4}'::int[])?  If so, then yes, that's a
 functionality I've considered us to have been missing for a long time.
 
 not quite.  it returns int[], anyelement: so, using your example, you'd get:
 
 [1,1], 1
 [1,2], 2
 [2,1], 3
 [2,2], 4

Now that we have WITH ORDINALITY, it'd be sufficient to have a
variant of array_dims() that returns int[][] instead of text, say
array_dimsarray(). Your unnest_dims could then be written as

  unnest(array_dimsarray(array)) with ordinality

best regards,
florian pflug








Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Florian Pflug
On Jan10, 2014, at 09:34 , David Rowley dgrowle...@gmail.com wrote:
 I just don't quite know yet the best way for the inverse transition function
 to communicate this to the caller. If you have any ideas on the best way
 to do this then I'd really like to hear them.

Could they maybe just return NULL as the new state? It would mean that
aggregates that do want to provide an inverse transition function couldn't
use NULL as a valid aggregate state, but do we need to support that?

Looking at the code it seems that for quite a few existing aggregates,
the state remains NULL until the first non-NULL input is processed. But
that doesn't hurt much - those aggregates can remain as they are until
someone wants to add an inverse transfer function. Once that happens,
there's a choice between living with needless rescans on trailing NULL
inputs or changing the state type.

This solution isn't particularly pretty, but I don't currently see a good
alternative that allows implementing inverse transfer functions in something
other than C while avoiding needless overhead for those which are written in C.

best regards,
Florian Pflug





Re: [HACKERS] array_length(anyarray)

2014-01-10 Thread Merlin Moncure
On Fri, Jan 10, 2014 at 6:00 AM, Florian Pflug f...@phlo.org wrote:
 On Jan10, 2014, at 11:00 , Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote:
 On 1/10/14, 10:41 AM, Merlin Moncure wrote:

 What's needed for better iteration support (IMO)
 is a function that does what unnest does but returns an array of
 indexes (one per dimension) -- a generalization of the
 _pg_expandarray function.  Let's say 'unnest_dims'.


 So  unnest_dims('{{1,2},{3,4}}'::int[])  would return  VALUES (1,
 '{1,2}'::int[]), (2, '{3,4}'::int[])?  If so, then yes, that's a
 functionality I've considered us to have been missing for a long time.

 not quite.  it returns int[], anyelement: so, using your example, you'd get:

 [1,1], 1
 [1,2], 2
 [2,1], 3
 [2,2], 4

 Now that we have WITH ORDINALITY, it'd be sufficient to have a
 variant of array_dims() that returns int[][] instead of text, say
 array_dimsarray(). Your unnest_dims could then be written as

   unnest(array_dimsarray(array)) with ordinality

hm, not quite following that.  maybe an example?

my issue with 'WITH ORDINALITY' (while it's pretty neat) is that it
doesn't give you the dimension coordinate of each datum so you can't
really use it to slice.  with unnest_dims(), you can slice, say via:

select array_agg(value) from unnest_dims('{{1,2},{3,4}}'::int[])
group by dims[1];
or
select array_agg(value) from unnest_dims('{{1,2},{3,4}}'::int[])
where dims[1] = 2;

not super elegant, but good enough for most uses I think.   anyways,
getting back on topic, the question on the table is cardinality() vs
array_length, right?

merlin




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Jan10, 2014, at 09:34 , David Rowley dgrowle...@gmail.com wrote:
  I just don't quite know yet the best way for the inverse transition function
  to communicate this to the caller. If you have any ideas on the best way
  to do this then I'd really like to hear them.

 Could they maybe just return NULL as the new state? It would mean that
 aggregates that do want to provide an inverse transition function couldn't
 use NULL as a valid aggregate state, but do we need to support that?

Yeah, I was going to suggest the same.  Seems like it wouldn't be that
difficult to come up with some alternative representation for "everything
seen so far was NULL", if you needed to.

 Looking at the code it seems that for quite a few existing aggregates,
 the state remains NULL until the first non-NULL input is processed. But
 that doesn't hurt much - those aggregates can remain as they are until
 someone wants to add an inverse transfer function. Once that happens,
 there's a choice between living with needless rescans on trailing NULL
 inputs or changing the state type.

Also, it might be reasonable for both the regular and the inverse
transition functions to be strict.  If a null entering the window
does not matter, then a null exiting the window doesn't either, no?
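For reference, the existing strict-transition behaviour is simply that NULL
inputs are skipped (sketch):

    select sum(x) from (values (1), (null), (2)) v(x);   -- 3; the NULL row is ignored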

regards, tom lane




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Heikki Linnakangas

On 01/10/2014 05:36 AM, Peter Geoghegan wrote:

While I realize that everyone is busy, I'm concerned about the lack of
discussion here. It's been 6 full days since I posted my benchmark,
which I expected to quickly clear some things up, or at least garner
interest, and yet no one has commented here since.


Nah, that's nothing. I have a patch in the January commitfest that was 
already posted for the previous commitfest. It received zero review back 
then, and still has no reviewer signed up, let alone anyone actually 
reviewing it. And arguably it's a bug fix!


http://www.postgresql.org/message-id/5285071b.1040...@vmware.com

Wink wink, if you're looking for patches to review... ;-)


  The alternative exclusion* patch still deadlocks in an unprincipled
fashion, when simple, idiomatic usage encounters contention. Heikki
intends to produce a revision that fixes the problem, though having
considered it carefully myself, I don't know what mechanism he has in
mind, and frankly I'm skeptical.


Here's an updated patch. Hope it works now... This is based on an older 
version, and doesn't include any fixes from your latest 
btreelock_insert_on_dup.v7.2014_01_07.patch. Please check the common 
parts, and copy over any relevant changes.


The fix for the deadlocking issue consists of a few parts. First, 
there's a new heavy-weight lock type, a speculative insertion lock, 
which works somewhat similarly to XactLockTableWait(), but is only held 
for the duration of a single speculative insertion. When a backend is 
about to begin a speculative insertion, it first acquires the 
speculative insertion lock. When it's done with the insertion, meaning 
it has either cancelled it by killing the already-inserted tuple or 
decided that it's going to go ahead with it, the lock is released.


The speculative insertion lock is keyed by Xid, and token. The lock can 
be taken many times in the same transaction, and token's purpose is to 
distinguish which insertion is currently in progress. The token is 
simply a backend-local counter, incremented each time the lock is taken.


In addition to the heavy-weight lock, there are new fields in PGPROC to 
indicate which tuple the backend is currently inserting. When the tuple 
is inserted, the backend fills in the relation's relfilenode and item 
pointer in MyProc-specInsert* fields, while still holding the buffer 
lock. The current speculative insertion token is also stored there.


With that mechanism, when another backend sees a tuple whose xmin is 
still in progress, it can check if the insertion is a speculative 
insertion. To do that, scan the proc array, and find the backend with 
the given xid. Then, check that the relfilenode and itempointer in that 
backend's PGPROC slot match the tuple, and make note of the token the 
backend had advertised.


HeapTupleSatisfiesDirty() does the proc array check, and returns the 
token in the snapshot, alongside snapshot-xmin. The caller can then use 
that information in place of XactLockTableWait().



There would be other ways to skin the cat, but this seemed like the 
quickest to implement. One more straightforward approach would be to use 
the tuple's TID directly in the speculative insertion lock's key, 
instead of Xid+token, but then the inserter would have to grab the 
heavy-weight lock while holding the buffer lock, which seems dangerous. 
Another alternative would be to store token in the heap tuple header, 
instead of PGPROC; a tuple that's still being speculatively inserted has 
no xmax, so it could be placed in that field. Or ctid.



More importantly, I have to question
whether we should continue to pursue that alternative approach, given
what we now know about its performance characteristics.


Yes.


It could be
improved, but not by terribly much, particularly for the case where
there is plenty of update contention, which was shown in [1] to be
approximately 2-3 times slower than extended page locking (*and*  it's
already looking for would-be duplicates*first*). I'm trying to be as
fair as possible, and yet the difference is huge.


*shrug*. I'm not too concerned about performance during contention. But 
let's see how this fixed version performs. Could you repeat the tests 
you did with this?


Any guesses what the bottleneck is? At a quick glance at a profile of a 
pgbench run with this patch, I didn't see anything out of ordinary, so 
I'm guessing it's lock contention somewhere.


- Heikki


speculative-insertions-2014_01_10.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Heikki Linnakangas

On 01/08/2014 06:46 AM, Peter Geoghegan wrote:

A new revision of my patch is attached.


I'm getting deadlocks with this patch, using the test script you posted 
earlier in 
http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com. 
Am doing something wrong, or is that a regression?


- Heikki




[HACKERS] getenv used in libpq caused missing values under Windows

2014-01-10 Thread Pavel Golub
Hello, Pgsql-hackers.

As you probably know, dealing with the Windows MSVCRT is some kind of hell.
That's why we have src/port/win32env.c in particular, because there may
be several CRTs loaded at the same time.

libpq unfortunately uses the standard getenv function call to
fill connection parameters inside conninfo_add_defaults(); see
http://www.postgresql.org/docs/9.3/interactive/libpq-envars.html

This happens because MSVCRT has its own local copy of the environment
variable table, and this table is not updated during execution. So if
one uses the SetEnvironmentVariable() Windows API to set any variable,
e.g. PGHOST, then this change will not be visible inside libpq's
conninfo_add_defaults() function where all default values are
obtained.

This situation is especially unpleasant for non-C developers, since there
is no opportunity to use the standard putenv function or the ported
pgwin32_putenv.

My proposal is to implement a pgwin32_getenv function which will call
GetEnvironmentVariable first and, if that returns NULL, then call MSVCRT's
getenv, in the same way as pgwin32_putenv does.

So now the bad scenario is:

1. SetEnvironmentVariable("PGHOST", "192.188.9.9")
2. PQconnectdbParams() without specifying a host parameter will fail

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com





Re: [HACKERS] Add CREATE support to event triggers

2014-01-10 Thread Simon Riggs
On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 CREATE SCHEMA IF NOT EXISTS <some schema> AUTHORIZATION <some guy>;

Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
it should be equally acceptable to have just CREATE, but without every
option on CREATE.  CREATE SCHEMA is easily the most complex thing here
and would be the command/event to deprioritise if we had any issues
getting this done/agreeing something for 9.4.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-01-10 Thread Robert Haas
 2. Provide a new reloption to specify Wal compression
 for update operation on table
 Create table tbl(c1 char(100)) With (compress_wal = true);

 Alternative options:
 a. compress_wal can take input as operation, e.g. 'insert', 'update',
 b. use alternate syntax:
 Create table tbl(c1 char(100))  Compress Wal For Update;
 c. anything better?

I think WITH (compress_wal = true) is pretty good.  I don't understand
your point about taking the operation as input, because this only
applies to updates.  But we could try to work update into the name
of the setting somehow, so as to be less likely to conflict with
future additions, like maybe wal_compress_update.  I think the
alternate syntax you propose is clearly worse, because it would
involve adding new keywords, something we try to avoid.

The only possible enhancement I can think of here is to make the
setting an integer rather than a Boolean, defined as the minimum
acceptable compression ratio.  A setting of 0 means always compress; a
setting of 100 means never compress; intermediate values define the
least acceptable ratio.  But to be honest, I think that's overkill;
I'd be inclined to hard-code the default value of 25 in the patch and
make it a #define.  The only real advantage of requiring a minimum 25%
compression percentage is that we can bail out on compression
three-quarters of the way through the tuple if we're getting nowhere.
That's fine for what it is, but the idea that users are going to see
much benefit from twaddling that number seems very dubious to me.

 Points to consider
 -

 1. As the current algorithm store the entry for same chunks at head of list,
it will always find last but one chunk (we don't store last 4 bytes) for
long matching string during match phase in encoding (pgrb_delta_encode).

We can improve it either by storing same chunks at the end of the list instead
    of at the head, or by trying to find a good_match technique as used in the lz
    algorithm. Finding a good_match technique can have overhead in some of the
    cases when there is actually no match.

I don't see what the good_match thing has to do with anything in the
Rabin algorithm.  But I do think there might be a bug here, which is
that, unless I'm misinterpreting something, hp is NOT the end of the
chunk.  After calling pgrb_hash_init(), we've looked at the first FOUR
bytes of the input.  If we find that we have a zero hash value at that
point, shouldn't the chunk size be 4, not 1?  And similarly if we find
it after sucking in one more byte, shouldn't the chunk size be 5, not
2?  Right now, we're deciding where the chunks should end based on the
data in the chunk plus the following 3 bytes, and that seems wonky.  I
would expect us to include all of those bytes in the chunk.

 2. Another optimization that we can do in pgrb_find_match() is that currently
 if it doesn't find the first chunk (the chunk got by hash index) matching, it
 continues to find the match in other chunks. I am not sure if there is any
 benefit to search for other chunks if first one is not matching.

Well, if you took that out, I suspect it would hurt the compression
ratio.  Unless the CPU savings are substantial, I'd leave it alone.

 3. We can move code from pg_lzcompress.c to some new file pg_rbcompress.c,
 if we want to move, then we need to either duplicate some common macros
 like pglz_out_tag or keep it common, but might be change the name.

+1 for a new file.

 4. Decide on min and max chunksize. (currently kept as 2 and 4 respectively).
 The point to consider is that if we keep bigger chunk sizes, then it can
 save us on CPU cycles, but less reduction in Wal, on the other side if we
 keep it small it can have better reduction in Wal but consume more CPU
 cycles.

Whoa.  That seems way too small.  Since PGRB_PATTERN_AFTER_BITS is 4,
the average length of a chunk is about 16 bytes.  It makes little
sense to have the maximum chunk size be 25% of the expected chunk
length.  I'd recommend making the maximum chunk length something like
4 * PGRB_CONST_NUM, and the minimum chunk length maybe something like
4.

 5. kept an guc variable 'wal_update_compression_ratio', for test purpose, we
can remove it before commit.

Let's remove it now.

 7. docs needs to be updated, tab completion needs some work.

Tab completion can be skipped for now, but documentation is important.

 8. We can extend Alter Table to set compress option for table.

I don't understand what you have in mind here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add CREATE support to event triggers

2014-01-10 Thread Robert Haas
On Thu, Jan 9, 2014 at 5:17 PM, Jim Nasby j...@nasby.net wrote:
 On 1/9/14, 11:58 AM, Alvaro Herrera wrote:
 Robert Haas escribió:

 On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:


 Hmm.  This seems like a reasonable thing to do, except that I would like
 the output to always be the constant, and have some other way to
 enable the clause or disable it.  With your present boolean:
 so

 if_not_exists: {output: IF NOT EXISTS,
present: true/false}


 Why not:

 if_not_exists: true/false


 Yeah, that's another option.  If we do this, though, the expansion
 function would have to know that an if_not_exist element expands to IF
 NOT EXISTS.  Maybe that's okay.  Right now, the expansion function is
 pretty stupid, which is nice.

 Yeah, the source side of this will always have to understand the nuances of
 every command; it'd be really nice to not burden the other side with that as
 well. The only downside I see is a larger JSON output, but meh.

 Another advantage is if you really wanted to you could modify the output
 formatting in the JSON doc to do something radically different if so
 inclined...

Yeah.  I wasn't necessarily objecting to the way Alvaro did it, just
asking why he did it that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Bruce Momjian
On Fri, Jan 10, 2014 at 10:21:42AM +0530, Amit Kapila wrote:
 On Thu, Jan 9, 2014 at 10:45 PM, Bruce Momjian br...@momjian.us wrote:
 
  I think RAID-1 is a very good comparison because it is successful
  technology and has similar issues.
 
  RAID-1 is like Postgres synchronous_standby_names mode in the sense that
  the RAID-1 controller will not return success until writes have happened
  on both mirrors, but it is unlike synchronous_standby_names in that it
  will degrade and continue writes even when it can't write to both
  mirrors.  What is being discussed is to allow the RAID-1 behavior in
  Postgres.
 
  One issue that came up in discussions is the insufficiency of writing a
  degrade notice in a server log file because the log file isn't durable
  from server failures, meaning you don't know if a fail-over to the slave
  lost commits.  The degrade message has to be stored durably against a
  server failure, e.g. on a pager, probably using a command like we do for
  archive_command, and has to return success before the server continues
  in degrade mode.  I assume degraded RAID-1 controllers inform
  administrators in the same way.
 
 Here I think if the user is aware from the beginning that this is the behaviour,
 then maybe the importance of the message is not very high.
 What I want to say is that we could provide a UI in such a way that the user
 decides during setup of the server the behaviour that is required.
 
 For example, we could provide a new parameter,
 available_synchronous_standby_names, along with the current parameter,
 and ask the user to use this new parameter if he wishes to synchronously
 commit transactions on another server when it is available; otherwise it will
 operate as a standalone sync master.

I know there was a desire to remove this TODO item, but I think we have
brought up enough new issues that we can keep it to see if we can come
up with a solution.  I have added a link to this discussion on the TODO
item.

I think we will need at least four new GUC variables:

*  timeout control for degraded mode
*  command to run during switch to degraded mode
*  command to run during switch from degraded mode
*  read-only variable to report degraded mode

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Add CREATE support to event triggers

2014-01-10 Thread Robert Haas
On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

  CREATE SCHEMA IF NOT EXISTS <some schema> AUTHORIZATION <some guy>;

 Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
 it should be equally acceptable to have just CREATE, but without every
 option on CREATE.  CREATE SCHEMA is easily the most complex thing here
 and would be the command/event to deprioritise if we had any issues
 getting this done/agreeing something for 9.4.

I don't know that I agree with that, but I guess we can cross that
bridge when we come to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add CREATE support to event triggers

2014-01-10 Thread Simon Riggs
On 10 January 2014 15:48, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

  CREATE SCHEMA IF NOT EXISTS <some schema> AUTHORIZATION <some guy>;

 Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
 it should be equally acceptable to have just CREATE, but without every
 option on CREATE.  CREATE SCHEMA is easily the most complex thing here
 and would be the command/event to deprioritise if we had any issues
 getting this done/agreeing something for 9.4.

 I don't know that I agree with that, but I guess we can cross that
 bridge when we come to it.

We've come to it...

You would prefer either everything or nothing?? On what grounds?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Simon Riggs
On 10 January 2014 15:47, Bruce Momjian br...@momjian.us wrote:

 I know there was a desire to remove this TODO item, but I think we have
 brought up enough new issues that we can keep it to see if we can come
 up with a solution.

Can you summarise what you think the new issues are? All I see is some
further rehashing of old discussions.

There is already a solution to the problem because the docs are
already very clear that you need multiple standbys to achieve commit
guarantees AND high availability. RTFM is usually used as some form of
put down, but that is what needs to happen here.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [ADMIN] Column missing from pg_statistics

2014-01-10 Thread Tom Lane
Kadri Raudsepp kadri.rauds...@nordicgaming.com writes:
 I have set up a cron-job that sends me daily reports on bloat amount in
 tables and indices, which I calculate using pg_stats, not pgstattuple, for
 performance and I/O reasons. If the bloat amount or percentage are big
 enough, I use pg_repack to get rid of it. At some point I noticed, that
 some tables keep showing up in the reports with the same amount of bloat,
 which pg_repack was seemingly unable to remove. Investigation showed that
 pgstattuple gave very different results than my bloat-finding query.
 Reason - for some tables there are some columns that never show up in
 pg_statistics.

Hmm.  Eyeballing the ANALYZE code, I note that it will decide that it
hasn't got any valid statistics for a column if (1) it finds no NULL
values and (2) every single sampled value in the column is too wide
(more than WIDTH_THRESHOLD = 1024 bytes wide).  Does this describe your
problematic column?

It seems like the code is being too conservative here --- it could at
least generate valid values for stanullfrac and stawidth.  I'm inclined
to think maybe it should also set stadistinct = -1 (unique) in this
case, since the basic assumption that validates ignoring very wide
values is that they aren't duplicates.
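As an aside, one way to spot columns in this state is to compare pg_attribute
against pg_stats after an ANALYZE (sketch; some_table is just a placeholder):

    select a.attname
    from pg_attribute a
    where a.attrelid = 'some_table'::regclass
      and a.attnum > 0 and not a.attisdropped
      and not exists (select 1 from pg_stats s
                      where s.tablename = 'some_table'
                        and s.attname = a.attname);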

regards, tom lane




Re: [HACKERS] Fixing pg_basebackup with tablespaces found in $PGDATA

2014-01-10 Thread Bruce Momjian
On Thu, Jan  2, 2014 at 03:34:04PM +0100, Bernd Helmle wrote:
 
 
 --On 1. Januar 2014 23:53:46 +0100 Dimitri Fontaine
 dimi...@2ndquadrant.fr wrote:
 
 Hi,
 
 As much as I've seen people frown upon $subject, it still happens in the
 wild, and Magnus seems to agree that the current failure mode of our
 pg_basebackup tool when confronted to the situation is a bug.
 
 So here's a fix, attached.
 
 I've seen having tablespaces under PGDATA as a policy within several
 data centres in the past. The main reasoning behind this seems that
 they strictly separate platform and database administration and for
 database inventory reasons. They are regularly surprised if you tell
 them to not use tablespaces in such a way, since they absorbed this
 practice over the years from other database systems. So +1 for
 fixing this.

FYI, this setup also causes problems for pg_upgrade.  There is a recent
thread about that that I will reply to.  The problem is that pre-9.2
servers get a mismatch between the symlink and the pg_tablespace path
when they rename the old cluster.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Florian Pflug
On Jan10, 2014, at 15:49 , Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 Looking at the code it seems that for quite a few existing aggregates,
 the state remains NULL until the first non-NULL input is processed. But
 that doesn't hurt much - those aggregates can remain as they are until
 someone wants to add an inverse transfer function. Once that happens,
 there's a choice between living with needless rescans on trailing NULL
 inputs or changing the state type.
 
 Also, it might be reasonable for both the regular and the inverse
 transition functions to be strict.  If a null entering the window
 does not matter, then a null exiting the window doesn't either, no?

That's not true, I think, unless we're special-casing strict transition
functions somewhere. AFAICS, an aggregate with a strict transition function
will produce the state NULL whenever any of the inputs was NULL, i.e. we won't
ever transition out of the NULL state once we got there. The inverse
transition function, however, would *have* to be able to transition away
from the NULL state, which requires it to be non-strict. Otherwise, how would
the state ever become non-NULL after the last NULL input leaves the window?

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Jan10, 2014, at 15:49 , Tom Lane t...@sss.pgh.pa.us wrote:
 Also, it might be reasonable for both the regular and the inverse
 transition functions to be strict.  If a null entering the window
 does not matter, then a null exiting the window doesn't either, no?

 That's not true, I think, unless we're special-casing strict transition
 functions somewhere. AFAICS, an aggregate with a strict transition function
 will produce the state NULL whenever any of the inputs was NULL, i.e. we won't
 ever transition out of the NULL state once we got there.

Nope, not the case; read xaggr.sgml and/or the CREATE AGGREGATE reference
page.  An aggregate with a strict transition function essentially just
ignores null input rows.  I suspect the inverse transition function could
just be made strict with a similar special-case rule (viz, keep the old
transition value when deleting a null input from the window); but maybe
I'm missing something and it has to work harder than that anyway.
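As a concrete illustration of that rule (sketch), the NULL input below changes
the running sum neither when it enters the frame nor when it leaves it:

    select a, b,
           sum(b) over (order by a rows between 1 preceding and current row)
    from (values (1, 10), (2, null), (3, 20)) t(a, b);
    -- sums: 10, 10, 20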

regards, tom lane




Re: [HACKERS] Add CREATE support to event triggers

2014-01-10 Thread Robert Haas
On Fri, Jan 10, 2014 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 January 2014 15:48, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

  CREATE SCHEMA IF NOT EXISTS <some schema> AUTHORIZATION <some guy>;

 Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
 it should be equally acceptable to have just CREATE, but without every
 option on CREATE.  CREATE SCHEMA is easily the most complex thing here
 and would be the command/event to deprioritise if we had any issues
 getting this done/agreeing something for 9.4.

 I don't know that I agree with that, but I guess we can cross that
 bridge when we come to it.

 We've come to it...

 You would prefer either everything or nothing?? On what grounds?

I hardly think I need to justify that position.  That's project policy
and always has been.  When somebody implements 50% of a feature, or
worse yet 95% of a feature, it violates the POLA for users and doesn't
always subsequently get completed, leaving us with long-term warts
that are hard to eliminate.  It's perfectly fine to implement a
feature incrementally if the pieces are individually self-consistent
and ideally even useful, but deciding to support every command except
one because the last one is hard to implement doesn't seem like a
principled approach to anything.  It's not even obvious to me that
CREATE SCHEMA is all that much harder than anything else and Alvaro
has not said that that's the only thing he can't implement (or why) so
I think it's entirely premature to make the decision now about which
way to proceed - but, OK, sure, if you want to force the issue now,
then yeah, I think it's better to have everything or nothing than to
have support for only some things justified by nothing more than
implementation complexity.

Aside from the general issue, in this particular case, I have
previously and repeatedly expressed concerns about regression test
coverage and suggested a path that would guarantee thorough regression
testing but which would require that support be complete for
everything present in our regression tests.  Although there may be
some other plan for guaranteeing thorough regression testing not only
now but going forward, I have not seen it proposed here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
  For float4 and float8, wasn't the consensus that the potential
 lossy-ness of addition makes this impossible anyway, even
 without the NaN issue? But...

 Well, that was my opinion, I'm not sure if it was consensus ;-).

 I'd say your example showing how it could produce completely
 bogus results was pretty convincing...

Aggregates on approximate (floating-point) numbers are not nearly
as consistent as many people probably assume.  Picture for a minute
a table where a column contains positive floating point numbers
happen to be located in the heap in increasing order, perhaps as
the result of a CLUSTER on an index on the column.  SELECT
sum(colname) FROM tablename; would tend to give the most accurate
answer possible when a sequence scan was run -- unless there
happened to be a seqscan already half-way through the heap.  Then
the result would be different.  So the exact same query against the
exact same data, with no intervening modifications or maintenance
activity could give one answer most of the time, and give various
other answers depending on concurrent SELECT queries.

Given that this is already the case with aggregates on floating
point approximate numbers, why should we rule out an optimization
which only makes rounding errors more likely to be visible?  The
real issue here is that if you are using an approximate data type
and expecting exact answers, you will have problems.

That's not to say that approximations are useless.  If you
represent the circumference of the earth with a double precision
number you're dealing with an expected rounding error of about a
foot.  That's close enough for many purposes.  The mistake is
assuming that it will be exact or that rounding errors cannot
accumulate.  In situations where SQL does not promise particular
ordering of operations, it should not be assumed; so any
expectations of a specific or repeatable result from a sum or
average of approximate numbers is misplaced.
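A tiny example of that order dependence (sketch):

    select (1e20::float8 + (-1e20::float8)) + 1.0::float8 as one_order,    -- 1
           1e20::float8 + (-1e20::float8 + 1.0::float8)   as other_order;  -- 0

The same three values, summed in a different order, give different answers.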

 But NaN is an orthogonal problem I think.

Agreed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Hannu Krosing
On 01/10/2014 05:09 PM, Simon Riggs wrote:
 On 10 January 2014 15:47, Bruce Momjian br...@momjian.us wrote:

 I know there was a desire to remove this TODO item, but I think we have
 brought up enough new issues that we can keep it to see if we can come
 up with a solution.
 Can you summarise what you think the new issues are? All I see is some
 further rehashing of old discussions.

 There is already a solution to the problem because the docs are
 already very clear that you need multiple standbys to achieve commit
 guarantees AND high availability. RTFM is usually used as some form of
 put down, but that is what needs to happen here.

If we want to get the guarantees that often come up in sync rep
discussions - namely that you can assume that your change is applied
on the standby when commit returns - then we could implement this by
returning the LSN from commit at the protocol level and having an option
in queries on the standby to wait for this LSN (again passed on the wire
below the level of the query) to be applied.

This can be mostly hidden in drivers and would need very little effort
from the end user to use. Basically you tell the driver that one connection
is bound as the slave of another, and the driver can manage using the
right LSNs. That is, the last LSN received from the master is always
attached to queries on the slaves.
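
As a rough sketch with today's functions (the driver would do the
equivalent below the protocol level, without polling):

  -- On the master, after COMMIT, remember the current WAL position:
  select pg_current_xlog_location();

  -- On the standby, the driver waits until replay has reached that
  -- position before sending the query:
  select pg_last_xlog_replay_location();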

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Aggregates on approximate (floating-point) numbers are not nearly
 as consistent as many people probably assume.  Picture for a minute
 a table where a column contains positive floating point numbers
 happen to be located in the heap in increasing order, perhaps as
 the result of a CLUSTER on an index on the column.  SELECT
 sum(colname) FROM tablename; would tend to give the most accurate
 answer possible when a sequence scan was run -- unless there
 happened to be a seqscan already half-way through the heap.  Then
 the result would be different.

I don't think that argument holds any water.  In the first place, somebody
could turn off synchronize_seqscans if they needed to have the calculation
done the same way every time (and I recall questions from users who ended
up doing exactly that, shortly after we introduced synchronize_seqscans).
In the second place, for most use-cases it'd be pretty foolish to rely on
physical heap order, so somebody who was really trying to sum float8s
accurately would likely do

   select sum(x order by x) from ...

This is a well-defined, numerically stable calculation, and I don't want
to see us put in non-defeatable optimizations that break it.

 The real issue here is that if you are using an approximate data type
 and expecting exact answers, you will have problems.

That's a canard.  People who know what they're doing (admittedly a
minority) do not expect exact answers, but they do expect to be able to
specify how to do the calculation in a way that minimizes roundoff errors.
The inverse-transition-function approach breaks that, and it does so at a
level where the user can't work around it, short of building his own
aggregates.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Alvaro Herrera
Is it just me, or is the json_array_element(json, int) function not
documented?

(Not a bug in this patch, I think ...)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-01-10 Thread Simon Riggs
On 10 January 2014 17:07, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 10, 2014 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 January 2014 15:48, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 CREATE SCHEMA IF NOT EXISTS "some schema" AUTHORIZATION "some guy";

 Hmm, given in 9.3 it was OK to have only DROP event triggers, I think
 it should be equally acceptable to have just CREATE, but without every
 option on CREATE.  CREATE SCHEMA is easily the most complex thing here
 and would be the command/event to deprioritise if we had any issues
 getting this done/agreeing something for 9.4.

 I don't know that I agree with that, but I guess we can cross that
 bridge when we come to it.

 We've come to it...

 You would prefer either everything or nothing?? On what grounds?

 I hardly think I need to justify that position.

Yeh, you do. Everybody does.

 That's project policy
 and always has been.  When somebody implements 50% of a feature, or
 worse yet 95% of a feature, it violates the POLA for users and doesn't
 always subsequently get completed, leaving us with long-term warts
 that are hard to eliminate.

So why was project policy violated when we released 9.3 with only DROP
event support? Surely that was a worse violation of POLA than my
suggestion?

It's not reasonable to do something yourself and then object when
others suggest doing the same thing.

After 3 years we need something useful. I think the "perfect being the
enemy of the good" argument applies here after this length of time.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Andrew Dunstan


On 01/10/2014 12:42 PM, Alvaro Herrera wrote:

Is it just me, or is the json_array_element(json, int) function not
documented?

(Not a bug in this patch, I think ...)




As discussed at the time, we didn't document the functions underlying 
the json operators, just the operators themselves.
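
For example, the operator is just the documented spelling of the same thing:

  select json_array_element('[1,2,3]'::json, 1);  -- same as:
  select '[1,2,3]'::json -> 1;                    -- both return 2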


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Florian Pflug
On Jan10, 2014, at 18:14 , Kevin Grittner kgri...@ymail.com wrote:
 Given that this is already the case with aggregates on floating
 point approximate numbers, why should we rule out an optimization
 which only makes rounding errors more likely to be visible?  The
 real issue here is that if you are using an approximate data type
 and expecting exact answers, you will have problems.

Because without the optimization, only the values which you 
*actually* process for a given result determine whether you lose
precision or not. With the optimization, OTOH, values which have
*nothing* to do with the result in question can nevertheless
make it completely bogus.

SUM() is a good example. As long as all your values are positive,
the amount of precision you lose is bound by the number of input
values. If I sum over 10 values, the worst that can happen is that
the first value is large enough to prevent the other 9 values from
influencing the result. That limits the relative error to something
like 9*epsilon, where epsilon is the relative precision of the
floating point type, i.e. 1e-15 or so for double. In other words,
as long as your frames are less than 10e13 rows long, the relative
error will stay below 1%.

But with the optimization, that is no longer true. If you sum
from, say, CURRENT ROW to UNBOUNDED FOLLOWING, the relative error
of the result in one row now depends on the magnitude of values
*preceding* that row, even though that value isn't in the frame.
And since we now internally subtract, not only add, the relative
error is no longer bound by the number of rows in the frame.

Here's the corresponding SELECT (which is basically the same
as Tom's example upthread):
 select
   n,
   x::float,
   sum(x::float) over (
order by n rows between current row and unbounded following
   )
  from (values
(1, 1e20),
(2, 1),
(3, 2)
  ) as t(n, x)
  order by n;

Currently that returns
   n |   x   |  sum  
  ---+---+---
   1 | 1e+20 | 1e+20
   2 |     1 |     3
   3 |     2 |     2

but with an inverse transfer function, it may very well return
   n |   x   |  sum  
  ---+---+---
   1 | 1e+20 | 1e+20
   2 |     1 |     0
   3 |     2 |    -1

 That's not to say that approximations are useless.  If you
 represent the circumference of the earth with a double precision
 number you're dealing with an expected rounding error of about a
 foot.  That's close enough for many purposes.  The mistake is
 assuming that it will be exact or that rounding errors cannot
 accumulate.  In situations where SQL does not promise particular
 ordering of operations, it should not be assumed; so any
 expectations of a specific or repeatable result from a sum or
 average of approximate numbers is misplaced.

But this isn't about ordering, it's replacing one computation
with a completely different one that just happens to be equivalent
*algebraically*.

To me, the proposed optimization for float is akin to a C compiler
which decided to evaluate
  a + b + c + … + z
as
  -a + (2a - b) + (2b - c) + … + (2y - z) + 2z
Algebraically, these are the same, but it'd still be insane to
do that.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Tom Lane
I wrote:
 Kevin Grittner kgri...@ymail.com writes:
 The real issue here is that if you are using an approximate data type
 and expecting exact answers, you will have problems.

 That's a canard.  People who know what they're doing (admittedly a
 minority) do not expect exact answers, but they do expect to be able to
 specify how to do the calculation in a way that minimizes roundoff errors.
 The inverse-transition-function approach breaks that, and it does so at a
 level where the user can't work around it, short of building his own
 aggregates.

Although, having said that ... maybe "build your own aggregate" would
be a reasonable suggestion for people who need this?  I grant that
it's going to be a minority requirement, maybe even a small minority
requirement.  People who have the chops to get this sort of thing right
can probably manage a custom aggregate definition.

The constraint this would pose on the float4 and float8 implementations
is that it be possible to use their transition and final functions in
a custom aggregate declaration while leaving off the inverse function;
or, if that combination doesn't work for some reason, we have to continue
to provide the previous transition/final functions for use in user
aggregates.
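
As a minimal sketch (illustrative only, reusing the existing float8pl
transition function; the aggregate name is made up), such a user-defined
aggregate declares no inverse transition function and so always forces the
window frame to be recomputed from scratch:

  CREATE AGGREGATE careful_sum (float8) (
      SFUNC = float8pl,
      STYPE = float8
  );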

Suitable documentation would be needed too, of course.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-01-10 Thread Robert Haas
On Fri, Jan 10, 2014 at 12:59 PM, Simon Riggs si...@2ndquadrant.com wrote:
 That's project policy
 and always has been.  When somebody implements 50% of a feature, or
 worse yet 95% of a feature, it violates the POLA for users and doesn't
 always subsequently get completed, leaving us with long-term warts
 that are hard to eliminate.

 So why was project policy violated when we released 9.3 with only DROP
 event support? Surely that was a worse violation of POLA than my
 suggestion?

Well, obviously I didn't think so at the time, or I would have
objected.  I felt, and still feel, that implementing one kind of event
trigger (drop) does not necessarily require implementing another kind
(create).  I think that's clearly different from implementing either
one for only some object types.

"This event trigger will be called whenever an object is dropped" is a
reasonable contract with the user.  "This other event trigger will be
called whenever an object is created, unless it happens to be a
schema" is much less reasonable.

At least in my opinion.
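
For reference, the drop-side contract that already shipped is as simple as
this (a sketch only; the names are illustrative):

  CREATE FUNCTION log_drops() RETURNS event_trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      RAISE NOTICE 'drop fired for command: %', tg_tag;
  END;
  $$;

  -- Fires for every dropped object, with no per-object-type carve-outs.
  CREATE EVENT TRIGGER log_all_drops ON sql_drop
      EXECUTE PROCEDURE log_drops();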

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Robert Haas
On Thu, Jan 9, 2014 at 12:46 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote:
 It would be nice to have better operating system support for this.
 For example, IIUC, 64-bit Linux has 128TB of address space available
 for user processes.  When you clone(), it can either share the entire
 address space (i.e. it's a thread) or none of it (i.e. it's a
 process).  There's no option to, say, share 64TB and not the other
 64TB, which would be ideal for us.  We could then map dynamic shared
 memory segments into the shared portion of the address space and do
 backend-private allocations in the unshared part.  Of course, even if
 we had that, it wouldn't be portable, so who knows how much good it
 would do.  But it would be awfully nice to have the option.

 You can map a segment at fork time, and unmap it after forking. That
 doesn't really use RAM, since it's supposed to be lazily allocated (it
 can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
 but I don't think that's portable).

 That guarantees it's free.

It guarantees that it is free as of the moment you unmap it, but it
doesn't guarantee that future memory allocations or shared library
loads couldn't stomp on the space.

Also, that not-portable thing is a bit of a problem.  I've got no
problem with the idea that third-party code may be platform-specific,
but I think the stuff we ship in core has got to work on more or less
all reasonably modern systems.

 Next, you can map shared memory at explicit addresses (linux's mmap
 has support for that, and I seem to recall Windows did too).

 All you have to do, is some book-keeping in shared memory (so all
 processes can coordinate new mappings).

I did something like this back in 1998 or 1999 at the operating system
level, and it turned out not to work very well.  I was working on an
experimental research operating system kernel, and we wanted to add
support for mmap(), so we set aside a portion of the virtual address
space for file mappings.  That region was shared across all processes
in the system.  One problem is that there's no guarantee the space is
big enough for whatever you want to map; and the other problem is that
it can easily get fragmented.  Now, 64-bit address spaces go some way
to ameliorating these concerns so maybe it can be made to work, but I
would be a teeny bit cautious about using the word "just" to describe
the complexity involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Robert Haas
On Thu, Jan 9, 2014 at 2:09 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jan 9, 2014 at 12:21 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote:

 Well, right now we just reopen the same object from all of the
 processes, which seems to work fine and doesn't require any of this
 complexity.  The only problem I don't know how to solve is how to make
 a segment stick around for the whole postmaster lifetime.  If
 duplicating the handle into the postmaster without its knowledge gets
 us there, it may be worth considering, but that doesn't seem like a
 good reason to rework the rest of the existing mechanism.

 I think one has to try this to see if it works as per the need. If it's not
 urgent, I can try this early next week?

 Anything we want to get into 9.4 has to be submitted by next Tuesday,
 but I don't know that we're going to get this into 9.4.

 Using DuplicateHandle(), we can make the segment stick around for the
 postmaster's lifetime. I have used the below test (using the dsm_demo
 module) to verify:
 Session - 1
 select dsm_demo_create('this message is from session-1');
  dsm_demo_create
 -
82712

 Session - 2
 -
 select dsm_demo_read(82712);
dsm_demo_read
 
  this message is from session-1
 (1 row)

 Session-1
 \q

 -- till here it will work without DuplicateHandle as well

 Session -2
 select dsm_demo_read(82712);
dsm_demo_read
 
  this message is from session-1
 (1 row)

 Session -2
 \q

 Session -3
 select dsm_demo_read(82712);
dsm_demo_read
 
  this message is from session-1
 (1 row)

 -- above shows that handle stays around.

 Note -
 Currently I have to bypass the below code in dsm_attach(), as it assumes
 the segment will not stay around if it's removed from the control file.

 /*
 * If we didn't find the handle we're looking for in the control
 * segment, it probably means that everyone else who had it mapped,
 * including the original creator, died before we got to this point.
 * It's up to the caller to decide what to do about that.
 */
 if (seg->control_slot == INVALID_CONTROL_SLOT)
 {
 dsm_detach(seg);
 return NULL;
 }


 Could you let me know what exactly you are expecting in patch,
 just a call to DuplicateHandle() after CreateFileMapping() or something
 else as well?

Well, I guess what I was thinking is that we could have a call
dsm_keep_segment() which would be invoked on an already-created
dsm_segment *.  On Linux, that would just bump the reference count in
the control segment up by one so that it doesn't get destroyed until
postmaster shutdown.  On Windows it may as well still do that for
consistency, but will also need to do this DuplicateHandle() trick.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/10/2014 12:42 PM, Alvaro Herrera wrote:
 Is it just me, or is the json_array_element(json, int) function not
 documented?

 As discussed at the time, we didn't document the functions underlying 
 the json operators, just the operators themselves.

I see though that json_array_element has a DESCR comment.  I believe
project policy is that if a function is not meant to be invoked by name
but only through an operator, its pg_description entry should just be
"implementation of xyz operator", with the real comment attached only
to the operator.  Otherwise \df users are likely to be misled into using
the function when they're not really supposed to; and at the very least
they will bitch about its lack of documentation.

See commits 94133a935414407920a47d06a6e22734c974c3b8 and
908ab80286401bb20a519fa7dc7a837631f20369.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] nested hstore patch

2014-01-10 Thread Robert Haas
On Thu, Jan 9, 2014 at 5:08 PM, Andrew Dunstan and...@dunslane.net wrote:
  * I have replicated all the json processing functions for jsonb
(although not the json generating functions, such as to_json). Most
of these currently work by turning the jsonb back into json and then
processing as before. I am sorting out some technical issues and
hope to have all of these rewritten to use the native jsonb API in a
few days time.
  * We still need to document jsonb. That too I hope will be done quite
shortly.
  * The jsonb regression test currently contains U+ABCD - I guess we'd
better use some hex encoding or whatever for that - unlike json, the
jsonb de-serializer dissolves unicode escapes.

How does that work if the server encoding isn't UTF-8?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Claudio Freire
On Fri, Jan 10, 2014 at 3:23 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 9, 2014 at 12:46 PM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote:
 It would be nice to have better operating system support for this.
 For example, IIUC, 64-bit Linux has 128TB of address space available
 for user processes.  When you clone(), it can either share the entire
 address space (i.e. it's a thread) or none of it (i.e. it's a
 process).  There's no option to, say, share 64TB and not the other
 64TB, which would be ideal for us.  We could then map dynamic shared
 memory segments into the shared portion of the address space and do
 backend-private allocations in the unshared part.  Of course, even if
 we had that, it wouldn't be portable, so who knows how much good it
 would do.  But it would be awfully nice to have the option.

 You can map a segment at fork time, and unmap it after forking. That
 doesn't really use RAM, since it's supposed to be lazily allocated (it
 can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
 but I don't think that's portable).

 That guarantees it's free.

 It guarantees that it is free as of the moment you unmap it, but it
 doesn't guarantee that future memory allocations or shared library
 loads couldn't stomp on the space.

You would only unmap prior to remapping, and only the to-be-mapped
portion, so I don't see a problem.

 Also, that not-portable thing is a bit of a problem.  I've got no
 problem with the idea that third-party code may be platform-specific,
 but I think the stuff we ship in core has got to work on more or less
 all reasonably modern systems.

 Next, you can map shared memory at explicit addresses (linux's mmap
 has support for that, and I seem to recall Windows did too).

 All you have to do, is some book-keeping in shared memory (so all
 processes can coordinate new mappings).

 I did something like this back in 1998 or 1999 at the operating system
 level, and it turned out not to work very well.  I was working on an
 experimental research operating system kernel, and we wanted to add
 support for mmap(), so we set aside a portion of the virtual address
 space for file mappings.  That region was shared across all processes
 in the system.  One problem is that there's no guarantee the space is
 big enough for whatever you want to map; and the other problem is that
 it can easily get fragmented.  Now, 64-bit address spaces go some way
 to ameliorating these concerns so maybe it can be made to work, but I
 would be a teeny bit cautious about using the word "just" to describe
 the complexity involved.

Ok, yes, fragmentation could be an issue if the address range is not
humongus enough.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] nested hstore patch

2014-01-10 Thread Andrew Dunstan


On 01/10/2014 01:29 PM, Robert Haas wrote:

On Thu, Jan 9, 2014 at 5:08 PM, Andrew Dunstan and...@dunslane.net wrote:

  * The jsonb regression test currently contains U+ABCD - I guess we'd
better use some hex encoding or whatever for that - unlike json, the
jsonb de-serializer dissolves unicode escapes.

How does that work if the server encoding isn't UTF-8?



There is a jsonb_1.out file for the non-utf8 case, just as there is a 
json_1.out for the same case. Unicode escapes for non-ascii characters 
are forbidden in jsonb as they are in json, if the encoding isn't utf8.


FYI, we are actually using the json lexing and parsing mechanism, so 
that these types will accept exactly the same inputs. However, since 
we're not storing json text in jsonb, but instead the decomposed 
elements, the unicode escapes are resolved in the stored values.
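
For example (assuming a UTF-8 server encoding and the patch applied):

  select '"\u00e9"'::json;   -- "\u00e9"  (escape kept verbatim)
  select '"\u00e9"'::jsonb;  -- "é"       (escape resolved when stored)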


I already have a fix for the point above (see 
https://github.com/feodor/postgres/commit/7d5b8f12747b4a75e8b32914340d07617f1af302) 
and it will be included in the next version of the patch.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 7:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I'm getting deadlocks with this patch, using the test script you posted
 earlier in
 http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com.
 Am doing something wrong, or is that a regression?

Yes. The point of that test case was that it made your V1 livelock
(which you fixed), not deadlock in a way detected by the deadlock
detector, which is the correct behavior.

This testcase was the one that showed up *unprincipled* deadlocking:

http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com

I'd focus on that test case.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Andrew Dunstan


On 01/10/2014 01:27 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 01/10/2014 12:42 PM, Alvaro Herrera wrote:

Is it just me, or is the json_array_element(json, int) function not
documented?

As discussed at the time, we didn't document the functions underlying
the json operators, just the operators themselves.

I see though that json_array_element has a DESCR comment.  I believe
project policy is that if a function is not meant to be invoked by name
but only through an operator, its pg_description entry should just be
"implementation of xyz operator", with the real comment attached only
to the operator.  Otherwise \df users are likely to be misled into using
the function when they're not really supposed to; and at the very least
they will bitch about its lack of documentation.

See commits 94133a935414407920a47d06a6e22734c974c3b8 and
908ab80286401bb20a519fa7dc7a837631f20369.





OK, I can fix that I guess.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Robert Haas
On Fri, Jan 10, 2014 at 1:35 PM, Claudio Freire klaussfre...@gmail.com wrote:
 You can map a segment at fork time, and unmap it after forking. That
 doesn't really use RAM, since it's supposed to be lazily allocated (it
 can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
 but I don't think that's portable).

 That guarantees it's free.

 It guarantees that it is free as of the moment you unmap it, but it
 doesn't guarantee that future memory allocations or shared library
 loads couldn't stomp on the space.

 You would only unmap prior to remapping, only the to-be-mapped
 portion, so I don't see a problem.

OK, yeah, that way works.  That's more or less what Noah proposed
before.  But I was skeptical it would work well everywhere.  I suppose
we won't know until somebody tries it.  (I didn't.)

 Ok, yes, fragmentation could be an issue if the address range is not
 humongus enough.

I've often thought that 64-bit machines are so capable that there's no
reason to go any higher.  But lately I've started to wonder.  There
are already machines out there with 2^40 bytes of physical memory,
and the number just keeps creeping up.  When you reserve a couple of
bits to indicate user or kernel space, and then consider that virtual
address space can be many times larger than physical memory, it starts
not to seem like that much.

But I'm not that excited about the amount of additional memory we'll
eat when somebody decides to make a pointer 16 bytes.  Ugh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Florian Pflug
On Jan10, 2014, at 19:08 , Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Kevin Grittner kgri...@ymail.com writes:
 The real issue here is that if you are using an approximate data type
 and expecting exact answers, you will have problems.
 
 That's a canard.  People who know what they're doing (admittedly a
 minority) do not expect exact answers, but they do expect to be able to
 specify how to do the calculation in a way that minimizes roundoff errors.
 The inverse-transition-function approach breaks that, and it does so at a
 level where the user can't work around it, short of building his own
 aggregates.
 
 Although, having said that ... maybe "build your own aggregate" would
 be a reasonable suggestion for people who need this?  I grant that
 it's going to be a minority requirement, maybe even a small minority
 requirement.  People who have the chops to get this sort of thing right
 can probably manage a custom aggregate definition.

So we'd put a footgun into the hands of people who don't know what they're
doing, to be fired for performance's sake, and leave it to the people
who know what they are doing to put the safety on?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/10/2014 01:27 PM, Tom Lane wrote:
 See commits 94133a935414407920a47d06a6e22734c974c3b8 and
 908ab80286401bb20a519fa7dc7a837631f20369.

 OK, I can fix that I guess.

Sure, just remove the DESCR comments for the functions that aren't meant
to be used directly.

I don't think this is back-patchable, but it's a minor point, so at least
for me a fix in HEAD is sufficient.

I wonder whether we should add an opr_sanity test verifying that operator
implementation functions don't have their own comments?  The trouble is
that there are a few that are supposed to, but maybe that list is stable
enough that it'd be okay to memorialize in the expected output.
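
A rough sketch of such a check (not the actual regression-test query) could
be something like:

  -- Operator implementation functions that carry their own comment:
  SELECT p.oid::regprocedure, d.description
    FROM pg_proc p
         JOIN pg_operator o ON o.oprcode = p.oid
         JOIN pg_description d ON d.objoid = p.oid
                              AND d.classoid = 'pg_proc'::regclass
                              AND d.objsubid = 0
   WHERE d.description !~ '^implementation of .* operator$';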

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 07:47 AM, Bruce Momjian wrote:


I know there was a desire to remove this TODO item, but I think we have
brought up enough new issues that we can keep it to see if we can come
up with a solution.  I have added a link to this discussion on the TODO
item.

I think we will need at least four new GUC variables:

*  timeout control for degraded mode
*  command to run during switch to degraded mode
*  command to run during switch from degraded mode
*  read-only variable to report degraded mode



I know I am the one that instigated all of this, so I want to be very 
clear on what I expect, and what I am confident my customers would expect.


If a synchronous slave goes down, the master continues to operate. That 
is all. I don't care if it is configurable (I would be fine with that). 
I don't care if it is not automatic (e.g., the slave goes down and we have to 
tell the master to continue).


I have read through this thread more than once, and I have also gone 
back to the docs. I understand why we do it the way we do it. I also 
understand that from a business requirement for 99% of CMD's customers, 
it's wrong. At least in the sense of providing continuity of service.


Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary 
act.", George Orwell



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Alvaro Herrera
Tom Lane wrote:

 I wonder whether we should add an opr_sanity test verifying that operator
 implementation functions don't have their own comments?  The trouble is
 that there are a few that are supposed to, but maybe that list is stable
 enough that it'd be okay to memorialize in the expected output.

+1.  It's an easy rule to overlook.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Alvaro Herrera
Andrew Dunstan wrote:
 
 On 01/10/2014 12:42 PM, Alvaro Herrera wrote:
 Is it just me, or is the json_array_element(json, int) function not
 documented?
 
 As discussed at the time, we didn't document the functions
 underlying the json operators, just the operators themselves.

Oh, I see.  That's fine with me.  From the source code it's hard to see
when a SQL-callable function is only there to implement an operator,
though (and it seems a bit far-fetched to suppose that the developer
will think, upon seeing an undocumented function, oh this must
implement some operator, I will look it up at pg_proc.h).

I think the operator(s) should be mentioned in the comment on top of the
function.
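
In the meantime, the lookup itself is at least easy to do from SQL, e.g.:

  -- Which operator(s) does json_array_element implement?
  SELECT oprname, oprleft::regtype, oprright::regtype
    FROM pg_operator
   WHERE oprcode = 'json_array_element'::regproc;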

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I've often thought that 64-bit machines are so capable that there's no
 reason to go any higher.  But lately I've started to wonder.  There
 are already machines out there with 2^40 bytes of physical memory,
 and the number just keeps creeping up.  When you reserve a couple of
 bits to indicate user or kernel space, and then consider that virtual
 address space can be many times larger than physical memory, it starts
 not to seem like that much.

 But I'm not that excited about the amount of additional memory we'll
 eat when somebody decides to make a pointer 16 bytes.  Ugh.

Once you really need that, you're not going to care about doubling
the size of pointers.  At worst, you're giving up 1 bit of address
space to gain 64 more.

(Still, I rather doubt it'll happen in my lifetime.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Jan10, 2014, at 19:08 , Tom Lane t...@sss.pgh.pa.us wrote:
 Although, having said that ... maybe "build your own aggregate" would
 be a reasonable suggestion for people who need this?  I grant that
 it's going to be a minority requirement, maybe even a small minority
 requirement.  People who have the chops to get this sort of thing right
 can probably manage a custom aggregate definition.

 So we'd put a footgun into the hands of people who don't know what they're
 doing, to be fired for performance's sake, and leave it to the people
 who know what they are doing to put the safety on?

If I may put words in Kevin's mouth, I think his point is that having
float8 sum() at all is a foot-gun, and that's hard to deny.  You need
to know how to use it safely.

A possible compromise might be to provide these alternative safer
aggregates built-in.  Or, depending on what color you like your bikeshed,
leave the standard aggregates alone and define fast_sum etc for the less
safe versions.  In any case it'd be incumbent on us to document the
tradeoffs.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Andrew Dunstan


On 01/10/2014 01:58 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 01/10/2014 01:27 PM, Tom Lane wrote:

See commits 94133a935414407920a47d06a6e22734c974c3b8 and
908ab80286401bb20a519fa7dc7a837631f20369.

OK, I can fix that I guess.

Sure, just remove the DESCR comments for the functions that aren't meant
to be used directly.

I don't think this is back-patchable, but it's a minor point, so at least
for me a fix in HEAD is sufficient.

I wonder whether we should add an opr_sanity test verifying that operator
implementation functions don't have their own comments?  The trouble is
that there are a few that are supposed to, but maybe that list is stable
enough that it'd be okay to memorialize in the expected output.





Well, that would be ok as long as there was a comment in the file so 
that developers don't just think it's OK to extend the list (it's a bit 
like part of the reason we don't allow shift/reduce conflicts - if we 
allowed them people would just keep adding more, and they wouldn't stick 
out like a sore thumb.)


The comment in the current test says:

   -- Check that operators' underlying functions have suitable comments,
   -- namely 'implementation of XXX operator'.  In some cases involving legacy
   -- names for operators, there are multiple operators referencing the same
   -- pg_proc entry, so ignore operators whose comments say they are deprecated.
   -- We also have a few functions that are both operator support and meant to
   -- be called directly; those should have comments matching their operator.

The history here is that originally I was intending to have these 
functions documented, and so the descriptions were made to match the 
operator descriptions, so that we didn't get a failure on this test. 
Later we decided not to document them as part of last release's 
bike-shedding, but the function descriptions didn't get changed / removed.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Heikki Linnakangas

On 01/10/2014 08:37 PM, Peter Geoghegan wrote:

On Fri, Jan 10, 2014 at 7:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I'm getting deadlocks with this patch, using the test script you posted
earlier in
http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com.
Am doing something wrong, or is that a regression?


Yes. The point of that test case was that it made your V1 livelock
(which you fixed), not deadlock in a way detected by the deadlock
detector, which is the correct behavior.


Oh, ok. Interesting. With the patch version I posted today, I'm not 
getting deadlocks. I'm not getting duplicates in the table either, so it 
looks like the promise tuple approach somehow avoids the deadlocks, 
while the btreelock patch does not.


Why does it deadlock with the btreelock patch? I don't see why it 
should. If you have two backends inserting a single tuple, and they 
conflict, one of them should succeed to insert, and the other one should 
update.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to do our Triage for 9.4

2014-01-10 Thread Josh Berkus
All,

To make this easier for everyone to participate in, I've created a wiki
page:

https://wiki.postgresql.org/wiki/9.4CF4Triage

Please add the patches you know well to the appropriate list, thanks!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Oh, I see.  That's fine with me.  From the source code it's hard to see
 when a SQL-callable function is only there to implement an operator,
 though (and it seems a bit far-fetched to suppose that the developer
 will think, upon seeing an undocumented function, oh this must
 implement some operator, I will look it up at pg_proc.h).

 I think the operator(s) should be mentioned in the comment on top of the
 function.

Oh, you're complaining about the lack of any header comment for the
function in the source code.  That's a different matter from the
user-visible docs, but I agree that it's poor practice to not have
anything.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 The history here is that originally I was intending to have these 
 functions documented, and so the descriptions were made to match the 
 operator descriptions, so that we didn't get a failure on this test. 
 Later we decided not to document them as part of last release's 
 bike-shedding, but the function descriptions didn't get changed / removed.

Ah.  I suppose there's no way to cross-check the state of the function's
pg_description comment against whether it has SGML documentation :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new json funcs

2014-01-10 Thread David Fetter
On Fri, Jan 10, 2014 at 02:39:12PM -0500, Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
  The history here is that originally I was intending to have these 
  functions documented, and so the descriptions were made to match the 
  operator descriptions, so that we didn't get a failure on this test. 
  Later we decided not to document them as part of last release's 
  bike-shedding, but the function descriptions didn't get changed / removed.
 
 Ah.  I suppose there's no way to cross-check the state of the function's
 pg_description comment against whether it has SGML documentation :-(

FDWs to the rescue!

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 11:28 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Why does it deadlock with the btreelock patch? I don't see why it should. If
 you have two backends inserting a single tuple, and they conflict, one of
 them should succeed to insert, and the other one should update.

Are you sure that it doesn't make your patch deadlock too, with enough
pressure? I've made that mistake myself.

That test-case made my patch deadlock (in a detected fashion) when it
used buffer locks as a value locking prototype - I say as much right
there in the November mail you linked to. I think that's acceptable,
because it's non-sensible use of the feature (my point was only that
it shouldn't livelock). The test case is naively locking a row without
knowing ahead of time (or pro-actively checking) if the conflict is on
the first or second unique index. So before too long, you're updating
the wrong row (no existing lock is really held), based on the 'a'
column's projected value, when in actuality the conflict was on the
'b' column's projected value. Conditions are right for deadlock,
because two rows are locked, not one.
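
A stripped-down sketch of the shape of it (not the actual test script):

  create table foo (a int unique, b int unique);
  insert into foo values (1, 2), (2, 1);
  -- A proposed row (1, 1) conflicts with (1,2) via column a and with (2,1)
  -- via column b, i.e. with two different existing rows.  If two sessions
  -- each lock one of those rows and then wait on the other, you get the
  -- classic two-resource deadlock.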

Although I have not yet properly considered your most recent revision,
I can't imagine why the same would not apply there, since the row
locking component is (probably) still identical. Granted, that
distinction between row locking and value locking is a bit fuzzy in
your approach, but if you happened to not insert any rows in any
previous iterations (i.e. there were no unfilled promise tuples), and
you happened to perform conflict handling first, it could still
happen, albeit with lower probability, no?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread Mark Dilger
FYI, I'm using the verb rewind to talk about using the negative transition 
aggregation function to get a prior value.  I don't know if this is the right 
verb.

Conceptually, when aggregating over floating point numbers, there is some 
infinitely precise theoretical value, and the computation is approximating it.  
Call the infinitely precise value 'r'.  Call the computed value 'c', which is 
the result of the aggregation function.  (For float4_agg and float8_agg, 
typeof(c) == float8).

The problem you have with rewinding an aggregate is that you don't know if you 
are getting the same value of c that you would have gotten from a rescan.  But 
if you have a type that tracks a margin [min,max] where typeof(min) == 
typeof(max) is higher precision than typeof(c), then you can track:

min <= r <= max

By setting the rounding mode down, then up, when computing the next value of 
min and max, respectively.  (Extra flag bits or booleans could track whether 
you've encountered +inf, -inf, NaN, and any other oddball cases, with 
corresponding special logic that has been discussed already upthread.)  In many 
but not all cases:

min != max

but

(typeof(c))min == (typeof(c))max

Because the margin of error is small enough not to render different values when 
cast to the lower precision typeof(c).

You could rewind the aggregation whenever this second case holds, and only 
force a rescan when it does not.  This would render the same results for 
queries whether they were performed with rewinds or with rescans.  The results 
might differ from older versions of postgres, but only in that they might be 
more accurate, with less accumulated rounding errors, owing to the higher 
precision state transition variable.

For many modern platforms, typeof(min) could be __float128 using libquadmath, 
or something similar to that.  If not available at compile time, it could be 
float64 instead.  Even then, you'd still 
know that rewinding was possible when min == max and not otherwise, which is 
useful for cases of aggregation over exact values.

I admit I've done a bit of handwaving on the computation of the margin and the 
handling of floating-point rounding issues, but I believe the implementation 
details are tractable.


mark




On Friday, January 10, 2014 10:10 AM, Florian Pflug f...@phlo.org wrote:
 
On Jan10, 2014, at 18:14 , Kevin Grittner kgri...@ymail.com wrote:
 Given that this is already the case with aggregates on floating
 point approximate numbers, why should we rule out an optimization
 which only makes rounding errors more likely to be visible?  The
 real issue here is that if you are using an approximate data type
 and expecting exact answers, you will have problems.

Because without the optimization, only the values which you 
*actually* process for a given result determine whether you lose
precision or not. With the optimization, OTOH, values which have
*nothing* to do with the result in question can nevertheless
make it completely bogus.

SUM() is a good example. As long as all your values are positive,
the amount of precision you lose is bound by the number of input
values. If I sum over 10 values, the worst that can happen is that
the first value is large enough to prevent the other 9 values from
influencing the result. That limits the relative error to something
like 9*epsilon, where epsilon is the relative precision of the
floating point type, i.e. 1e-15 or so for double. In other words,
as long as your frames are less than 10e13 rows long, the relative
error will stay below 1%.

But with the optimization, that is no longer true. If you sum
from, say, CURRENT ROW to UNBOUNDED FOLLOWING, the relative error
of the result in one row now depends on the magnitude of values
*preceding* that row, even though that value isn't in the frame.
And since we now internally subtract, not only add, the relative
error is no longer bound by the number of rows in the frame.

Here's the corresponding SELECT (which is basically the same
as Tom's example upthread):
select
   n,
   x::float,
   sum(x::float) over (
    order by n rows between current row and unbounded following
   )
  from (values
    (1, 1e20),
    (2, 1),
    (3, 2)
  ) as t(n, x)
  order by n;

Currently that returns
   n |   x   |  sum  
  ---+---+---
   1 | 1e+20 | 1e+20
   2 |     1 |     3
   3 |     2 |     2

but with an inverse transfer function, it may very well return
   n |   x   |  sum  
  ---+---+---
   1 | 1e+20 | 1e+20
   2 |     1 |     0
   3 |     2 |    -1

 That's not to say that approximations are
 useless.  If you
 represent the circumference of the earth with a double precision
 number you're dealing with an expected rounding error of about a
 foot.  That's close enough for many purposes.  The mistake is
 assuming that it will be exact or that rounding errors cannot
 accumulate.  In situations where SQL does not promise particular
 ordering of operations, it should not be assumed; so any
 expectations of a specific 

Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69

2014-01-10 Thread Bruce Momjian
On Tue, Jan  7, 2014 at 08:19:49AM -0500, Peter Eisentraut wrote:
 That was probably me. I'll look into it.
 
 
 
  On Jan 6, 2014, at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  
  Bruce Momjian br...@momjian.us writes:
  On Sun, Dec 29, 2013 at 02:48:21AM -0500, Tom Lane wrote:
  3. pg_upgrade ignores the fact that pg_resetxlog failed, and keeps going.
  
  Does pg_resetxlog return a non-zero exit status?  If so, pg_upgrade
  should have caught that and exited.
  
  It certainly does:
  
 if (errno)
 {
  fprintf(stderr, _("%s: could not read from directory \"%s\": %s\n"),
 progname, XLOGDIR, strerror(errno));
 exit(1);
 }
  
  The bug is that pg_upgrade appears to assume (in many places not just this
  one) that exec_prog() will abort if the called program fails, but *it
  doesn't*, contrary to the claim in its own header comment.  This is
  because pg_log(FATAL, ...) doesn't call exit().  pg_fatal() does, but
  that's not what's being called in the throw_error case.
  
  I imagine that this used to work correctly and got broken by some
  ill-advised refactoring, but whatever the origin, it's 100% broken today.

I know Peter is looking at this, but I looked at it and I can't see the
problem.  Every call of exec_prog() that uses pg_resetxlog has
throw_error = true, and the test there is:

result = system(cmd);

if (result != 0)
...
pg_log(FATAL, ...)

and in pg_log_v() I see:

switch (type)
...
case PG_FATAL:
printf("\n%s", _(message));
printf("Failure, exiting\n");
--> exit(1);
break;

so I am not clear how you are seeing the return status of pg_resetxlog
ignored.  I tried the attached patch which causes pg_resetxlog -f to
return -1, and got the proper error from pg_upgrade in git head:

Performing Upgrade
--
Analyzing all rows in the new cluster   ok
Freezing all rows on the new clusterok
Deleting files from new pg_clog ok
Copying old pg_clog to new server   ok
Setting next transaction ID for new cluster
*failure*

Consult the last few lines of pg_upgrade_utility.log for
the probable cause of the failure.
Failure, exiting

and the last line in pg_upgrade_utility.log is:

command: /u/pgsql/bin/pg_resetxlog -f -x 683 /u/pgsql/data >> 
pg_upgrade_utility.log 2>&1

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
new file mode 100644
index 03f2fad..3e67630
*** a/src/bin/pg_resetxlog/pg_resetxlog.c
--- b/src/bin/pg_resetxlog/pg_resetxlog.c
*** main(int argc, char *argv[])
*** 121,126 
--- 121,127 
  		{
  			case 'f':
  force = true;
+ exit(1);
  break;
  
  			case 'n':

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69

2014-01-10 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Tue, Jan  7, 2014 at 08:19:49AM -0500, Peter Eisentraut wrote:
 That was probably me. I'll look into it.

 and in pg_log_v() I see:

 switch (type)
 ...
 case PG_FATAL:
 printf("\n%s", _(message));
 printf("Failure, exiting\n");
 --> exit(1);
 break;

Peter just fixed that; see commit ca607b155e86ce529fc9ac322a232f264cda9ab6

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69

2014-01-10 Thread Alvaro Herrera
Bruce Momjian wrote:

 I know Peter is looking at this, but I looked at and I can't see the
 problem.  Every call of exec_prog() that uses pg_resetxlog has
 throw_error = true, and the test there is:
 
 result = system(cmd);
 
 if (result != 0)
   ...
   pg_log(FATAL, ...)
 
 and in pg_log_v() I see:
 
 switch (type)
 ...
 case PG_FATAL:
 printf("\n%s", _(message));
 printf("Failure, exiting\n");
 --> exit(1);
 break;

This was fixed by Peter two days ago in commit
http://git.postgresql.org/pg/commitdiff/ca607b155e86ce529fc9ac322a232f264cda9ab6

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69

2014-01-10 Thread Bruce Momjian
On Fri, Jan 10, 2014 at 04:06:11PM -0500, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Tue, Jan  7, 2014 at 08:19:49AM -0500, Peter Eisentraut wrote:
  That was probably me. I'll look into it.
 
  and in pg_log_v() I see:
 
  switch (type)
  ...
  case PG_FATAL:
  printf(\n%s, _(message));
  printf(Failure, exiting\n);
  -- exit(1);
  break;
 
 Peter just fixed that; see commit ca607b155e86ce529fc9ac322a232f264cda9ab6

Oh, I guess I checked git log on the wrong file then.  Thanks.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Heikki Linnakangas

On 01/10/2014 10:00 PM, Peter Geoghegan wrote:

On Fri, Jan 10, 2014 at 11:28 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Why does it deadlock with the btreelock patch? I don't see why it should. If
you have two backends inserting a single tuple, and they conflict, one of
them should succeed to insert, and the other one should update.


Are you sure that it doesn't make your patch deadlock too, with enough
pressure? I've made that mistake myself.

That test-case made my patch deadlock (in a detected fashion) when it
used buffer locks as a value locking prototype - I say as much right
there in the November mail you linked to. I think that's acceptable,
because it's non-sensible use of the feature (my point was only that
it shouldn't livelock). The test case is naively locking a row without
knowing ahead of time (or pro-actively checking) if the conflict is on
the first or second unique index. So before too long, you're updating
the wrong row (no existing lock is really held), based on the 'a'
column's projected value, when in actuality the conflict was on the
'b' column's projected value. Conditions are right for deadlock,
because two rows are locked, not one.


I see. Yeah, I also get deadlocks when I change the update statement to use 
foo.b = rej.b instead of foo.a = rej.a. I think it's down to the order the 
indexes are processed in, ie. which conflict you see first.


This is pretty much the same issue we discussed wrt. exclusion 
constraints. If the tuple being inserted conflicts with several existing 
tuples, what to do? I think the best answer would be to return and lock 
them all. It could still deadlock, but it's nevertheless less surprising 
behavior than returning one of the tuples at random. Actually, we could 
even avoid the deadlock by always locking the tuples in a certain order, 
although I'm not sure if it's worth the trouble.


- Heikki




Re: [HACKERS] Disallow arrays with non-standard lower bounds

2014-01-10 Thread Jim Nasby

On 1/9/14, 10:58 PM, Tom Lane wrote:

Jim Nasby j...@nasby.net writes:

ISTM that allowing users to pick arbitrary lower array bounds was a huge 
mistake. I've never seen anyone make use of it, can't think of any legitimate 
use cases for it, and hate the stupendous amount of extra code needed to deal 
with it.


You lack imagination, sir.


Considering what you'd normally want to do in SQL, the only example I can think 
of is to not have the argument over 0 vs 1 based.

Actually, I was thinking there might be some computational problems where 
changing lower bound would be nice, but then again, what other languages 
actually support this?
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-10 Thread David Rowley
On Sat, Jan 11, 2014 at 7:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Although, having said that ... maybe "build your own aggregate" would
 be a reasonable suggestion for people who need this?  I grant that
 it's going to be a minority requirement, maybe even a small minority
 requirement.  People who have the chops to get this sort of thing right
 can probably manage a custom aggregate definition.


I more or less wrote off the idea of inverse transition functions after
your example upthread. I had thought that perhaps if we could get inverse
transitions in there for SUM(numeric) then people who need more speed could
just cast their value to numeric then back to float or double precision
after aggregation takes place. I had to delay writing any documentation
around that as I'm still not sure if we can have sum(numeric) use an
inverse transition function due to the fact that it can introduce extra
zeros after the decimal point.

As the patch stands at the moment, I have a regression test which
currently fails due to these extra zeros after the decimal point:

-- This test currently fails due to extra trailing 0 digits.
SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND
UNBOUNDED FOLLOWING)
  FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);

Patched produces:
 6.01
 5.00
 3.00
Unpatched produces:
 6.01
 5
 3


With inverse transitions this query still produces correct results, it just
does not produce the numeric in the same format as it does without
performing inverse transitions. Personally I'd rather focus on trying to
get SUM(numeric) in there for 9.4 and maybe focus on floating point stuff
at a later date as casting to numeric can be the work around for users who
complain about the speed. Or if they really want they can create their own
aggregate, using an existing built in function as the inverse transition,
like float8_mi.
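
As a rough sketch of what such a do-it-yourself aggregate could look like
(the option names for declaring the inverse/moving-state functions are
illustrative guesses at the eventual syntax, not necessarily what this patch
ends up with; float8pl and float8mi are existing catalog functions):

CREATE AGGREGATE unsafe_sum (float8) (
    stype     = float8,
    sfunc     = float8pl,
    mstype    = float8,
    msfunc    = float8pl,
    minvfunc  = float8mi,   -- inverse transition: subtract the departing value
    initcond  = '0',
    minitcond = '0'
);

Anyone defining something like this is explicitly accepting the floating
point rounding artifacts that make us unwilling to do it for the built-in
float aggregates.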

There are certain things that currently seem a bit magical to me when it
comes to numeric; for example, I've no idea why the following query produces
20 0's after the decimal point for 1 and only 16 for 2.

select n::numeric / 1 from generate_series(1,2) g(n);

To me it does not look very consistent at all and I'm really wondering if
there is some special reason why we bother including the useless zeros at
the end at all. I've written a patch which gets rid of them in numeric_out,
but I had not planned on posting it here in case it gets laughed off stage
due to some special reason we have for keeping those zeros that I don't
know about.

Can anyone explain to me why we have these unneeded zeros in numeric when
the precision is not supplied?

Regards

David Rowley


 The constraint this would pose on the float4 and float8 implementations
 is that it be possible to use their transition and final functions in
 a custom aggregate declaration while leaving off the inverse function;
 or, if that combination doesn't work for some reason, we have to continue
 to provide the previous transition/final functions for use in user
 aggregates.

 Suitable documentation would be needed too, of course.

 regards, tom lane



Re: [HACKERS] Time to do our Triage for 9.4

2014-01-10 Thread David Rowley
On Sat, Jan 11, 2014 at 8:28 AM, Josh Berkus j...@agliodbs.com wrote:

 All,

 To make this easier for everyone to participate in, I've created a wiki
 page:

 https://wiki.postgresql.org/wiki/9.4CF4Triage

 Please add the patches you know well to the appropriate list, thanks!


I know my own patch pretty well and from my own point of view it's very
close to being ready to go, but a good review may change that. Should
we be waiting for 2nd opinions or can patch authors decide for themselves?
Or were you talking only to committers?

Regards

David Rowley


--
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Jim Nasby

On 1/10/14, 12:59 PM, Joshua D. Drake wrote:

I know I am the one that instigated all of this so I want to be very clear on 
what I expect and what I am confident that my customers would expect.

If a synchronous slave goes down, the master continues to operate. That is all. 
I don't care if it is configurable (I would be fine with that). I don't care if 
it is not automatic (e.g; slave goes down and we have to tell the master to 
continue).

I have read through this thread more than once, and I have also gone back to 
the docs. I understand why we do it the way we do it. I also understand that 
from a business requirement for 99% of CMD's customers, it's wrong. At least in 
the sense of providing continuity of service.


+1

I understand that this is a degradation of full-on sync rep. But there is 
definite value added with sync-rep that can automatically (or at least easily) 
degrade, compared to plain async; it protects you from single failures. I fully understand 
that it will not protect you from a double failure. That's OK in many cases.
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Add CREATE support to event triggers

2014-01-10 Thread Simon Riggs
On 10 January 2014 18:17, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 10, 2014 at 12:59 PM, Simon Riggs si...@2ndquadrant.com wrote:
 That's project policy
 and always has been.  When somebody implements 50% of a feature, or
 worse yet 95% of a feature, it violates the POLA for users and doesn't
 always subsequently get completed, leaving us with long-term warts
 that are hard to eliminate.

 So why was project policy violated when we released 9.3 with only DROP
 event support? Surely that was a worse violation of POLA than my
 suggestion?

 Well, obviously I didn't think so at the time, or I would have
 objected.  I felt, and still feel, that implementing one kind of event
 trigger (drop) does not necessarily require implementing another kind
 (create).  I think that's clearly different from implementing either
 one for only some object types.

 "This event trigger will be called whenever an object is dropped" is a
 reasonable contract with the user.  "This other event trigger will be
 called whenever an object is created, unless it happens to be a
 schema" is much less reasonable.

 At least in my opinion.

In the fullness of time, I agree that is not a restriction we should maintain.

Given that CREATE SCHEMA with multiple objects is less well used, it's
a reasonable restriction to accept for one release, if the alternative
is to implement nothing at all of value. Especially since we are now
in the third year of development of this set of features, it is time
to reduce the scope to ensure delivery.

There may be other ways to ensure something of value is added, this
was just one suggestion.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Disallow arrays with non-standard lower bounds

2014-01-10 Thread Andrew Dunstan


On 01/10/2014 04:26 PM, Jim Nasby wrote:

On 1/9/14, 10:58 PM, Tom Lane wrote:

Jim Nasby j...@nasby.net writes:
ISTM that allowing users to pick arbitrary lower array bounds was a 
huge mistake. I've never seen anyone make use of it, can't think of 
any legitimate use cases for it, and hate the stupendous amount of 
extra code needed to deal with it.


You lack imagination, sir.


Considering what you'd normally want to do in SQL, the only example I 
can think of is to not have the argument over 0 vs 1 based.


Actually, I was thinking there might be some computational problems 
where changing lower bound would be nice, but then again, what other 
languages actually support this?


Ada, for one. In fact, in Ada the index doesn't need to be an integer, 
just an enumerable type (e.g. an enum). You can iterate over 
one-dimensional arrays by saying:


FOR i IN [REVERSE] my_array'range LOOP ...

cheers

andrew (who sadly hasn't used Ada in anger for about 20 years).






Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
 
 On 01/10/2014 07:47 AM, Bruce Momjian wrote:
 
 I know there was a desire to remove this TODO item, but I think we have
 brought up enough new issues that we can keep it to see if we can come
 up with a solution.  I have added a link to this discussion on the TODO
 item.
 
 I think we will need at least four new GUC variables:
 
 *  timeout control for degraded mode
 *  command to run during switch to degraded mode
 *  command to run during switch from degraded mode
 *  read-only variable to report degraded mode
 
 
 I know I am the one that instigated all of this so I want to be very clear
  on what I expect and what I am confident that my customers would expect.
 
 If a synchronous slave goes down, the master continues to operate. That is
 all. I don't care if it is configurable (I would be fine with that). I don't
 care if it is not automatic (e.g; slave goes down and we have to tell the
 master to continue).

Would you please explain, as precise as possible, what the advantages of
using a synchronous standby would be in such a scenario?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
  If a synchronous slave goes down, the master continues to operate. That is
  all. I don't care if it is configurable (I would be fine with that). I don't
  care if it is not automatic (e.g; slave goes down and we have to tell the
  master to continue).
 
 Would you please explain, as precise as possible, what the advantages of
 using a synchronous standby would be in such a scenario?

In a degraded/failure state, things continue to *work*.  In a
non-degraded/failure state, you're able to handle a system failure and
know that you didn't lose any transactions.

Tom's point is correct, that you will fail on the "have two copies of
everything" guarantee in this mode, but that could certainly be acceptable in the
case where there is a system failure.  As pointed out by someone
previously, that's how RAID-1 works (which I imagine quite a few of us
use).

I've been thinking about this a fair bit and I've come to like the RAID1
analogy.  Stinks that we can't keep things going (automatically) if
either side fails, but perhaps we will one day...

Thanks,

Stephen




Re: [HACKERS] Disallow arrays with non-standard lower bounds

2014-01-10 Thread David Fetter
On Fri, Jan 10, 2014 at 03:26:04PM -0600, Jim Nasby wrote:
 On 1/9/14, 10:58 PM, Tom Lane wrote:
 Jim Nasby j...@nasby.net writes:
 ISTM that allowing users to pick arbitrary lower array bounds was
 a huge mistake. I've never seen anyone make use of it, can't think
 of any legitimate use cases for it, and hate the stupendous amount
 of extra code needed to deal with it.
 
 You lack imagination, sir.
 
 Considering what you'd normally want to do in SQL, the only example
 I can think of is to not have the argument over 0 vs 1 based.
 
 Actually, I was thinking there might be some computational problems
 where changing lower bound would be nice, but then again, what other
 languages actually support this?

Well, there's Perl, but that's not an argument *for* doing this, and I
say that as a long-time Perl user (who has never seen this feature
used in code worth not scrapping, by the way).

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Disallow arrays with non-standard lower bounds

2014-01-10 Thread Kevin Grittner
Gavin Flower gavinflo...@archidevsys.co.nz wrote:

 Starting arrays at zero makes the most sense, as then you can
 calculate the displacement simply as (index) * (size of entry),
  and not have to subtract one from the index first.  This would be my
 preference.

The SQL standard explicitly specifies that array positions range
from 1 to the cardinality of the array, with individual elements
referenced by position.  When implementing a language for which
there is an international standard, my preference is to conform to
the standard.

I don't have a problem with extensions to the language, and a
variable low bound is workable as an extension as long as the
standard ways to create an array default to a low bound of 1.
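
For anyone who hasn't bumped into the extension being discussed, the
standard forms already default to 1; only the explicit-bounds literal
moves the lower bound away from it:

SELECT array_lower(ARRAY[10,20,30], 1);            -- 1, the standard default
SELECT array_lower('[0:2]={10,20,30}'::int[], 1);  -- 0, via explicit bounds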

A bigger problem with our array implementation is that it is really
a multidimensional matrix, rather than an array which can contain
nested arrays.  That is both non-standard and limiting.

That said, I think it would be nice to have better support for
arrays defined with a single dimension and a low bound of 1, as the
standard requires.  Functions which throw an error when passed a
non-conforming parameter and can provide a simplified API as a
result would probably get used by me.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Disallow arrays with non-standard lower bounds

2014-01-10 Thread Jeff Janes
On Fri, Jan 10, 2014 at 1:26 PM, Jim Nasby j...@nasby.net wrote:

 On 1/9/14, 10:58 PM, Tom Lane wrote:

 Jim Nasby j...@nasby.net writes:

 ISTM that allowing users to pick arbitrary lower array bounds was a huge
 mistake. I've never seen anyone make use of it, can't think of any
 legitimate use cases for it, and hate the stupendous amount of extra code
 needed to deal with it.


 You lack imagination, sir.


 Considering what you'd normally want to do in SQL, the only example I can
 think of is to not have the argument over 0 vs 1 based.

 Actually, I was thinking there might be some computational problems where
 changing lower bound would be nice, but then again, what other languages
 actually support this?


Perl does, though they regret it bitterly.

Cheers,

Jeff


Re: [HACKERS] Disallow arrays with non-standard lower bounds

2014-01-10 Thread Merlin Moncure
On Fri, Jan 10, 2014 at 4:10 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Jan 10, 2014 at 1:26 PM, Jim Nasby j...@nasby.net wrote:

 On 1/9/14, 10:58 PM, Tom Lane wrote:

 Jim Nasby j...@nasby.net writes:

 ISTM that allowing users to pick arbitrary lower array bounds was a huge
 mistake. I've never seen anyone make use of it, can't think of any
 legitimate use cases for it, and hate the stupendous amount of extra code
 needed to deal with it.


 You lack imagination, sir.


 Considering what you'd normally want to do in SQL, the only example I can
 think of is to not have the argument over 0 vs 1 based.

 Actually, I was thinking there might be some computational problems where
 changing lower bound would be nice, but then again, what other languages
 actually support this?

 Perl does, though they regret it bitterly.

What does it matter?  Our arrays have had the capability for years and
years, and "because it's cleaner" is simply not justification to break
people's applications.  Why are we even considering this?

merlin




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
On 2014-01-10 17:02:08 -0500, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
   If a synchronous slave goes down, the master continues to operate. That is
   all. I don't care if it is configurable (I would be fine with that). I 
   don't
   care if it is not automatic (e.g; slave goes down and we have to tell the
   master to continue).
  
  Would you please explain, as precise as possible, what the advantages of
  using a synchronous standby would be in such a scenario?
 
 In a degraded/failure state, things continue to *work*.  In a
 non-degraded/failure state, you're able to handle a system failure and
 know that you didn't lose any transactions.

Why do you know that you didn't lose any transactions? Trivial network
hiccups, a restart of a standby, IO overload on the standby all can
cause very short interruptions in the walsender connection - leading
to degradation.

 As pointed out by someone
 previously, that's how RAID-1 works (which I imagine quite a few of us
 use).

I don't think that argument makes much sense. Raid-1 isn't safe
as-is. It's only safe if you use some sort of journaling or similar
on top. If you issued a write during a crash you normally will just get
either the version from before or the version after the last write back,
depending on the state on the individual disks and which disk is treated
as authoritative by the raid software.

And even if you disregard that, there's not much outside influence that
can lead to losing the connection to a disk drive inside a raid outside an
actually broken drive. Any network connection is normally kept *outside*
the level at which you build raids.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Stephen Frost
Andres,

On Friday, January 10, 2014, Andres Freund wrote:

 On 2014-01-10 17:02:08 -0500, Stephen Frost wrote:
  * Andres Freund (and...@2ndquadrant.com) wrote:
   On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
If a synchronous slave goes down, the master continues to operate.
 That is
all. I don't care if it is configurable (I would be fine with that).
 I don't
care if it is not automatic (e.g; slave goes down and we have to
 tell the
master to continue).
  
   Would you please explain, as precise as possible, what the advantages
 of
   using a synchronous standby would be in such a scenario?
 
  In a degraded/failure state, things continue to *work*.  In a
  non-degraded/failure state, you're able to handle a system failure and
  know that you didn't lose any transactions.

 Why do you know that you didn't loose any transactions? Trivial network
 hiccups, a restart of a standby, IO overload on the standby all can
 cause a very short interruptions in the walsender connection - leading
 to degradation.


You know that you haven't *lost* any by virtue of the master still being
up. The case you describe is a double-failure scenario- the link between
the master and slave has to go away AND the master must accept a
transaction and then fail independently.


  As pointed out by someone
  previously, that's how RAID-1 works (which I imagine quite a few of us
  use).

 I don't think that argument makes much sense. Raid-1 isn't safe
 as-is. It's only safe if you use some sort of journaling or similar
 ontop. If you issued a write during a crash you normally will just get
 either the version from before or the version after the last write back,
 depending on the state on the individual disks and which disk is treated
 as authoritative by the raid software.


Uh, you need a decent raid controller then and we're talking about after a
transaction commit/sync.

And even if you disregard that, there's not much outside influence that
 can lead to loosing connection to a disk drive inside a raid outside an
 actually broken drive. Any network connection is normally kept *outside*
 the leven at which you build raids.


This is a fair point and perhaps we should have the timeout or jitter GUC
which was proposed elsewhere, but the notion that this configuration is
completely unreasonable is not accurate and therefore having it would be a
benefit overall.

Thanks,

Stephen


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 01:49 PM, Andres Freund wrote:


I know I am the one that instigated all of this so I want to be very clear
on what I expect and what I am confident that my customers would expect.

If a synchronous slave goes down, the master continues to operate. That is
all. I don't care if it is configurable (I would be fine with that). I don't
care if it is not automatic (e.g; slave goes down and we have to tell the
master to continue).


Would you please explain, as precise as possible, what the advantages of
using a synchronous standby would be in such a scenario?


Current behavior:

db01->sync->db02

Transactions are happening. Everything is happy. Website is up. Orders 
are being made.


db02 goes down. It doesn't matter why. It is down. Because it is down, 
db01 for all intents and purposes is also down because we are using sync 
replication. We have just lost continuity of service, we can no longer 
accept orders, we can no longer allow people to log into the website, we 
can no longer service accounts.


In short, we are out of business.

Proposed behavior:

db01->sync->db02

Transactions are happening. Everything is happy. Website is up. Orders 
are being made.


db02 goes down. It doesn't matter why. It is down. db01 continues to 
accept orders, allow people to log into the website and we can still 
service accounts. The continuity of service continues.


Yes, there are all kinds of things that need to be considered when that 
happens, but that isn't the point. The point is, PostgreSQL continues its 
uptime guarantee and allows the business to continue to function as (if) 
nothing has happened.


For many and I dare say the majority of businesses, this is enough. They 
know that if the slave goes down they can continue to operate. They know 
if the master goes down they can fail over. They know that while both 
are up they are using sync rep (with various caveats). They are happy. 
They like that it is simple and just works. They continue to use PostgreSQL.



Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
In a time of universal deceit - telling the truth is a revolutionary 
act., George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
 db02 goes down. It doesn't matter why. It is down. db01 continues to accept
 orders, allow people to log into the website and we can still service
 accounts. The continuity of service continues.

Why that configuration is advantageous over an async configuration is the
question. Why, with those requirements, are you using a synchronous
standby at all?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2014 at 1:25 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 This is pretty much the same issue we discussed wrt. exclusion contraints.
 If the tuple being inserted conflicts with several existing tuples, what to
 do? I think the best answer would be to return and lock them all. It could
 still deadlock, but it's nevertheless less surprising behavior than
 returning one of the tuples in random. Actually, we could even avoid the
 deadlock by always locking the tuples in a certain order, although I'm not
 sure if it's worth the trouble.

I understand and accept that as long as we're intent on locking more
than one row per transaction, that action could deadlock with another
session doing something similar. Actually, I've even encountered
people giving advice in relation to proprietary systems along the
lines of: if your big SQL MERGE statement is deadlocking excessively,
you might try hinting to make sure a nested loop join is used. I
think that this kind of ugly compromise is unavoidable in those
scenarios (in reality the most popular strategy is probably "cross
your fingers"). But as everyone agrees, the common case where an xact
only upserts one row should never deadlock with another, similar xact.
So *that* isn't a problem I have with making row locking work for
exclusion constraints.

My problem is that in general I'm not sold on the actual utility of
making this kind of row locking work with exclusion constraints. I'm
sincerely having a hard time thinking of a practical use-case
(although, as I've said, I want to make it work with IGNORE). Even if
you work all this row locking stuff out, and the spill-to-disk aspect
out, the interface is still wrong, because you need to figure out a
way to project more than one reject per slot. Maybe I lack imagination
around how to make that work, but there are a lot of ifs and buts
either way.

-- 
Peter Geoghegan




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Adrian Klaver

On 01/10/2014 02:33 PM, Andres Freund wrote:

On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:

db02 goes down. It doesn't matter why. It is down. db01 continues to accept
orders, allow people to log into the website and we can still service
accounts. The continuity of service continues.


Why is that configuration advantageous over a async configuration is the
question. Why, with those requirements, are you using a synchronous
standby at all?


+1



Greetings,

Andres Freund




--
Adrian Klaver
adrian.kla...@gmail.com




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 02:33 PM, Andres Freund wrote:


On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:

db02 goes down. It doesn't matter why. It is down. db01 continues to accept
orders, allow people to log into the website and we can still service
accounts. The continuity of service continues.


Why is that configuration advantageous over a async configuration is the
question. Why, with those requirements, are you using a synchronous
standby at all?


If the master goes down, I can fail over knowing that as many of my 
transactions as possible have been replicated.


JD




--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
In a time of universal deceit - telling the truth is a revolutionary 
act., George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
Hi,

On 2014-01-10 17:28:55 -0500, Stephen Frost wrote:
  Why do you know that you didn't loose any transactions? Trivial network
  hiccups, a restart of a standby, IO overload on the standby all can
  cause a very short interruptions in the walsender connection - leading
  to degradation.

 You know that you haven't *lost* any by virtue of the master still being
 up. The case you describe is a double-failure scenario- the link between
 the master and slave has to go away AND the master must accept a
 transaction and then fail independently.

Unfortunately network outages do correlate with other system
faults. What you're wishing for really is the "I like the world to be
friendly to me" mode.
Even if you have only disk problems, quite often if your disks die, you
can continue to write (especially with a BBU), but uncached reads
fail. So the walsender connection errors out because a read failed, and
you're degrading into async mode. *Because* your primary is about to die.

   As pointed out by someone
   previously, that's how RAID-1 works (which I imagine quite a few of us
   use).
 
  I don't think that argument makes much sense. Raid-1 isn't safe
  as-is. It's only safe if you use some sort of journaling or similar
  ontop. If you issued a write during a crash you normally will just get
  either the version from before or the version after the last write back,
  depending on the state on the individual disks and which disk is treated
  as authoritative by the raid software.

 Uh, you need a decent raid controller then and we're talking about after a
 transaction commit/sync.

Yes, if you have a BBU that memory is authoritative in most cases. But
in that case the argument of having two disks is pretty much pointless;
the SPOF suddenly becomes the battery + RAM.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Jeff Janes
On Fri, Jan 10, 2014 at 2:33 PM, Andres Freund and...@2ndquadrant.comwrote:

 On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
  db02 goes down. It doesn't matter why. It is down. db01 continues to
 accept
  orders, allow people to log into the website and we can still service
  accounts. The continuity of service continues.

 Why is that configuration advantageous over a async configuration is the
 question.


Because it is orders of magnitude less likely to lose transactions that
were reported to have been committed.  A permanent failure of the master is
almost guaranteed to lose transactions with async.  With auto-degrade, a
permanent failure of the master only loses reported-committed transactions
if it co-occurs with a temporary failure of the replica or the network,
lasting longer than the timeout period.


Why, with those requirements, are you using a synchronous
 standby at all?


They aren't using synchronous standby, they are using asynchronous standby
because we fail to provide the choice they prefer, which is a compromise
between the two.

Cheers,

Jeff


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Andres Freund
On 2014-01-10 14:44:28 -0800, Joshua D. Drake wrote:
 
 On 01/10/2014 02:33 PM, Andres Freund wrote:
 
 On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
 db02 goes down. It doesn't matter why. It is down. db01 continues to accept
 orders, allow people to log into the website and we can still service
 accounts. The continuity of service continues.
 
 Why is that configuration advantageous over a async configuration is the
 question. Why, with those requirements, are you using a synchronous
 standby at all?
 
 If the master goes down, I can fail over knowing that as many of my
 transactions as possible have been replicated.

It's not like async replication mode delays sending data to the standby
in any way.

Really, the commits themselves are sent to the server at exactly the
same speed independent of sync/async. The only thing that's delayed is
the *notification* of the client that sent the commit. Not the commit
itself.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Stephen Frost
Greetings,

On Friday, January 10, 2014, Andres Freund wrote:

 Hi,

 On 2014-01-10 17:28:55 -0500, Stephen Frost wrote:
   Why do you know that you didn't loose any transactions? Trivial network
   hiccups, a restart of a standby, IO overload on the standby all can
   cause a very short interruptions in the walsender connection - leading
   to degradation.

  You know that you haven't *lost* any by virtue of the master still being
  up. The case you describe is a double-failure scenario- the link between
  the master and slave has to go away AND the master must accept a
  transaction and then fail independently.

 Unfortunately network outages do correlate with other system
 faults. What you're wishing for really is the I like the world to be
 friendly to me mode.
 Even if you have only disk problems, quite often if your disks die, you
 can continue to write (especially with a BBU), but uncached reads
 fail. So the walsender connection errors out because a read failed, and
 youre degrading into async mode. *Because* your primary is about to die.


That can happen, sure, but I don't agree that the "people using a single drive
with a BBU" or "having two drives in a raid1 die at the same time" cases are
reasonable arguments against this option. Not to mention that, today, if
the master has an issue then we're SOL anyway. Also, if the network fails
then likely there aren't any new transactions happening.


As pointed out by someone
previously, that's how RAID-1 works (which I imagine quite a few of
 us
use).
  
   I don't think that argument makes much sense. Raid-1 isn't safe
   as-is. It's only safe if you use some sort of journaling or similar
   ontop. If you issued a write during a crash you normally will just get
   either the version from before or the version after the last write
 back,
   depending on the state on the individual disks and which disk is
 treated
   as authoritative by the raid software.

  Uh, you need a decent raid controller then and we're talking about after
 a
  transaction commit/sync.

 Yes, if you have a BBU that memory is authoritative in most cases. But
 in that case the argument of having two disks is pretty much pointless,
 the SPOF suddenly became the battery + ram.


If that is a concern then use multiple controllers. Certainly not unheard
of- look at SANs...

Thanks,

Stephen


Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 02:47 PM, Andres Freund wrote:


Really, the commits themselves are sent to the server at exactly the
same speed independent of sync/async. The only thing that's delayed is
the *notificiation* of the client that sent the commit. Not the commit
itself.


Which is irrelevant to the point that if the standby goes down, we are 
now out of business.


Any continuous replication should not be a SPOF. The current behavior 
guarantees that a two node sync cluster is a SPOF. The proposed behavior 
removes that.


Sincerely,

Joshua D. Drake



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
In a time of universal deceit - telling the truth is a revolutionary 
act., George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 02:57 PM, Stephen Frost wrote:


Yes, if you have a BBU that memory is authoritative in most cases. But
in that case the argument of having two disks is pretty much pointless,
the SPOF suddenly became the battery + ram.


If that is a concern then use multiple controllers. Certainly not
unheard of- look at SANs...



And in PostgreSQL we obviously have the option of having a third or 
fourth standby but that isn't the problem we are trying to solve.


JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
In a time of universal deceit - telling the truth is a revolutionary 
act., George Orwell





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Hannu Krosing
On 01/10/2014 11:59 PM, Joshua D. Drake wrote:

 On 01/10/2014 02:57 PM, Stephen Frost wrote:

 Yes, if you have a BBU that memory is authoritative in most
 cases. But
 in that case the argument of having two disks is pretty much
 pointless,
 the SPOF suddenly became the battery + ram.


 If that is a concern then use multiple controllers. Certainly not
 unheard of- look at SANs...


 And in PostgreSQL we obviously have the option of having a third or
 fourth standby but that isn't the problem we are trying to solve.
The problem you are trying to solve is a controller with enough
Battery Backed Cache RAM to cache the entire database but with
write-through mode.

And you want it to degrade to write-back in case of disk failure so that
you can continue while the disk is broken.

People here are telling you that it would not be safe; use at least RAID-1
if you want availability.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Josh Berkus
On 01/10/2014 02:59 PM, Joshua D. Drake wrote:
 
 On 01/10/2014 02:47 PM, Andres Freund wrote:
 
 Really, the commits themselves are sent to the server at exactly the
 same speed independent of sync/async. The only thing that's delayed is
 the *notificiation* of the client that sent the commit. Not the commit
 itself.
 
 Which is irrelevant to the point that if the standby goes down, we are
 now out of business.
 
 Any continuous replication should not be a SPOF. The current behavior
 guarantees that a two node sync cluster is a SPOF. The proposed behavior
 removes that.

Again, if that's your goal, then use async replication.

I really don't understand the use-case here.

The purpose of sync rep is to know determinatively whether or not you
have lost data when disaster strikes.  If knowing for certain isn't
important to you, then use async.

BTW, people are using RAID1 as an analogy to 2-node sync replication.
That's a very bad analogy, because in RAID1 you have a *single*
controller which is capable of determining if the disks are in a failed
state or not, and this is all happening on a single node where things
like network outages aren't a consideration.  It's really not the same
situation at all.

Also, frankly, I absolutely can't count the number of times I've had to
rescue a customer or family member who had RAID1 but wasn't monitoring
syslog, and so one of their disks had been down for months without them
knowing it.  Heck, I've done this myself.

So ... the Filesystem geeks have already been through this.  Filesystem
clustering started out with systems like DRBD, which includes an
auto-degrade option.  However, DRBD with auto-degrade is widely
considered untrustworthy and is a significant portion of why DRBD isn't
trusted today.

From here, clustered filesystems went in two directions: RHCS added
layers of monitoring and management to make auto-degrade a safer option
than it is with DRBD (and still not the default option).  Scalable
clustered filesystems added N(M) quorum commit in order to support more
than 2 nodes.  Either of these courses are reasonable for us to pursue.

What's a bad idea is adding an auto-degrade option without any tools to
manage and monitor it, which is what this patch does by my reading.  If
I'm wrong, then someone can point it out to me.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Add CREATE support to event triggers

2014-01-10 Thread Alvaro Herrera

Can we please stop arguing over a problem I don't have?  I started with
CREATE SCHEMA because it is one of the easy cases, not because it was
the most difficult case: we only need to deparse the bits of it that
don't involve the objects within, because those are reported by the
event trigger as separate commands.  Each object gets its own creation
command, which works pretty nicely.  Of course, the deparsed version of
the command will not look very much like what was submitted by the user,
but that doesn't matter -- what does matter is that the objects created
by running the commands reported in the event trigger will be (or should
be) the same as those created by the original command.

I was showing CREATE SCHEMA as a way to discuss the fine details: how to
report identifiers that might need quoting, what to do with optional
clauses (AUTHORIZATION), etc.  I am past that now.

On the subject of testing the triggers, Robert Haas wrote:

 Here's one idea: create a contrib module that (somehow, via APIs to be
 invented) runs every DDL command that gets executed through the
 deparsing code, and then parses the result and executes *that* instead
 of the original command.  Then, add a build target that runs the
 regression test suite in that mode, and get the buildfarm configured
 to run that build target regularly on at least some machines.  That
 way, adding syntax to the regular regression test suite also serves to
 test that the deparsing logic for that syntax is working.  If we do
 this, there's still some maintenance burden associated with having DDL
 deparsing code, but at least our chances of noticing when we've failed
 to maintain it should be pretty good.

I gave this some more thought and hit a snag.  The problem here is that
by the time the event trigger runs, the original object has already been
created.  At that point, we can't simply replace the created objects
with objects that would hypothetically be created by a command trigger.

A couple of very hand-wavy ideas:

1. in the event trigger, DROP the original object and CREATE it as
reported by the creation_commands SRF.

2. Have ddl_command_start open a savepoint, and then roll it back in
ddl_command_end, then create the object again.  Not sure this is doable
because of the whole SPI nesting issue .. maybe with C-language event
trigger functions?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Josh Berkus
On 01/10/2014 01:49 PM, Andres Freund wrote:
 On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:

 On 01/10/2014 07:47 AM, Bruce Momjian wrote:

 I know there was a desire to remove this TODO item, but I think we have
 brought up enough new issues that we can keep it to see if we can come
 up with a solution.  I have added a link to this discussion on the TODO
 item.

 I think we will need at least four new GUC variables:

 *  timeout control for degraded mode
 *  command to run during switch to degraded mode
 *  command to run during switch from degraded mode
 *  read-only variable to report degraded mode

I would argue that we don't need the first.  We just want a command to
switch synchronous/degraded, and a variable (or function) to report on
degraded mode.  If we have those things, then it becomes completely
possible to have an external monitoring framework, which is capable of
answering questions like "is the replica down or just slow?" and controlling
the degrade.

Oh, wait!  We DO have such a command.  It's called ALTER SYSTEM SET!
Recently committed.  So this is really a solvable issue if one is
willing to use an external utility.
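
For example, the external utility could flip the mode itself once it has
decided the standby really is gone (the standby name here is purely
illustrative):

-- degrade to async once the monitor decides the standby is down
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();

-- re-engage sync rep once the standby is back and has caught up
ALTER SYSTEM SET synchronous_standby_names = 'standby1';
SELECT pg_reload_conf();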

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Time to do our Triage for 9.4

2014-01-10 Thread Josh Berkus
On 01/10/2014 01:34 PM, David Rowley wrote:
 On Sat, Jan 11, 2014 at 8:28 AM, Josh Berkus j...@agliodbs.com wrote:
 
 All,

 To make this easier for everyone to participate in, I've created a wiki
 page:

 https://wiki.postgresql.org/wiki/9.4CF4Triage

 Please add the patches you know well to the appropriate list, thanks!


 I know my own patch pretty well and from my own point of view it's very
 close to being about ready to go, but a good review may change that. Should
 we be waiting for 2nd opinions or can patch authors decide for themselves?
 Or were you talking only to commiters?

Well, I'd prefer that someone other than the patch author assess the
patch state; the author is going to be characteristically optimistic.

However, it's a wiki.  If you put it under "good to go" someone else who
disagrees can move it.  IMHO, if the patch hasn't had at least one
review yet (in a prior CF), though, I'd put it under "Nearly Ready".


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Standalone synchronous master

2014-01-10 Thread Joshua D. Drake


On 01/10/2014 03:17 PM, Josh Berkus wrote:


Any continuous replication should not be a SPOF. The current behavior
guarantees that a two node sync cluster is a SPOF. The proposed behavior
removes that.


Again, if that's your goal, then use async replication.


I think I have gone about this the wrong way. Async does not meet the 
technical or business requirements that I have. Sync does, except that it 
increases the possibility of an outage. That is the requirement I am 
trying to address.




The purpose of sync rep is to know determinatively whether or not you
have lost data when disaster strikes.  If knowing for certain isn't
important to you, then use async.


PostgreSQL Sync replication increases the possibility of an outage. That 
is incorrect behavior.


I want sync because on the chance that the master goes down, I have as 
much data as possible to fail over to. However, I can't use sync because 
it increases the possibility that my business will not be able to 
function on the chance that the standby goes down.




What's a bad idea is adding an auto-degrade option without any tools to
manage and monitor it, which is what this patch does by my reading.  If


This we absolutely agree on.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
In a time of universal deceit - telling the truth is a revolutionary 
act., George Orwell




