Re: [HACKERS] array_length(anyarray)
On 10 January 2014 00:36, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 1:20 AM, Merlin Moncure wrote: I'm piling on: it's not clear at all to me why you've special cased this to lower_bound=1. First of all, there are other reasons to check length than iteration. Yes, I agree. A length function that returned 0 for empty arrays would be far from useless. Can you point me to some examples? The example I see all the time is code like if array_length(nodes, 1) > 5 then ... do something ... then you realise (or not, as the case may be) that this doesn't work for empty arrays, and have to remember to wrap it in a coalesce call. Simply being able to write if cardinality(nodes) > 5 then ... do something ... is not just shorter, easier to type and easier to read, it is far less likely to be the source of subtle bugs. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
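Dean's failure mode is easy to demonstrate (a sketch; note that cardinality() is the function being proposed in this thread, so the last query assumes the proposal is adopted):

```sql
-- array_length() returns NULL, not 0, for an empty array, so a
-- comparison against it yields NULL and the branch is silently skipped:
SELECT array_length('{}'::int[], 1);               -- NULL
SELECT coalesce(array_length('{}'::int[], 1), 0);  -- 0, the usual workaround
-- the proposed cardinality() would return 0 directly:
SELECT cardinality('{}'::int[]);                   -- 0 (proposed behaviour)
```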
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Fri, Jan 10, 2014 at 4:09 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: Hi, Reading over this, I realised that there is a problem with NaN handling --- once the state becomes NaN, it can never recover. So the results using the inverse transition function don't match HEAD in cases like this:

create table t(a int, b numeric);
insert into t values(1,1),(2,2),(3,'NaN'),(4,3),(5,4);
select a, b, sum(b) over(order by a rows between 1 preceding and current row) from t;

which in HEAD produces:

 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 |   7
(5 rows)

but with this patch produces:

 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 | NaN
(5 rows)

Nice catch! Thanks for having a look at the patch. Ok, so I thought about this and I don't think it's too big a problem to fix at all. I think it can be handled very similarly to how I'm taking care of NULL values in the window frame. For those, I simply keep a count of them in an int64, and when the last one leaves the aggregate context things can continue as normal. Lucky for us that all numeric aggregation (and now inverse aggregation) goes through 2 functions: do_numeric_accum() and its new inverse version do_numeric_discard(). Both of these functions operate on a NumericAggState, in which (in the attached) I've changed the isNaN bool field to a NaNCount int64 field. I'm just doing NaNCount++ when we get a NaN value in do_numeric_accum() and NaNCount-- in do_numeric_discard(); in the final functions I'm just checking if NaNCount > 0. Though this implementation does fix the reported problem, unfortunately it may have an undesired performance impact for numeric aggregate functions when not used in the context of a window.
Let me explain what I mean: Previously there was some code in do_numeric_accum() which did:

if (state->isNaN || NUMERIC_IS_NAN(newval))
{
    state->isNaN = true;
    return;
}

which meant that it didn't bother adding new, perfectly valid numerics to the aggregate totals when a NaN had been encountered previously. I had to change this to continue on regardless, as we still need to keep the totals just in case all the NaN values are removed and the totals are required once again. This means that the non-window versions of SUM(numeric) and AVG(numeric) and the stddev aggregates for numeric pay a price and have to keep on totaling after encountering NaN values. :( If there was a way to know whether the function was being called in a window context or a normal aggregate context, then we could probably almost completely restore that possible performance regression just by skipping the totaling when not in a window context. I really don't know how common NaN values are in the real world to know if this matters too much. I'd hazard a guess that more people would benefit from inverse transitions on numeric types, but I have nothing to back that up. I've attached version 2 of the patch which fixes the NaN problem and adds a regression test to cover it. Thanks again for testing this and finding the problem. Regards David Rowley Regards, Dean inverse_transition_functions_v2.0.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Fri, Jan 10, 2014 at 5:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: Dean Rasheed dean.a.rash...@gmail.com writes: Reading over this, I realised that there is a problem with NaN handling --- once the state becomes NaN, it can never recover. So the results using the inverse transition function don't match HEAD in cases like this: Ouch! That takes out numeric, float4, and float8 in one fell swoop. Given the relative infrequency of NaNs in most data, it seems like it might still be possible to get a speedup if we could use inverse transitions until we hit a NaN, then do it the hard way until the NaN is outside the window, then go back to inverse transitions. I'm not sure though if this is at all practical from an implementation standpoint. We certainly don't want the core code knowing about anything as datatype-specific as a NaN, but maybe the inverse transition function could have an API that allows reporting "I can't do it here, fall back to the hard way". I had thought about that API, not for numeric (I think I've managed to find another solution there), but for MAX and MIN. I posted an idea about it here: http://www.postgresql.org/message-id/caaphdvqu+ygw0vbpbb+yxhrpg5vcy_kifyi8xmxfo8kyocz...@mail.gmail.com but it didn't generate much interest at the time, and I didn't have any ideas on how the inverse aggregate functions would communicate this inability to remove the value to the caller. Perhaps it would be an idea still, but I had put it to the back of my mind in favour of tuplestore indexes that could be created on the fly based on the row position within the frame and the aggregate's sort operator on the aggregate value. This would mean that MAX and MIN values could be found quickly all the time, rather than just when the value being removed happened not to affect the current maximum or minimum. It's not something I have planned for this patch though, and I'd have lots of questions around memory allocation before I'd want to start any work on it.
Regards David Rowley regards, tom lane
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
bool_or(): FALSE can be removed; removing TRUE requires a rescan. Could be made fully invertible by counting the number of TRUE and FALSE values, similar to my suggestion for how to handle NaN for sum(numeric). The same works for bool_and(). bit_or(): Like bool_or(), 0 can be removed; everything else requires a rescan. The same works for bit_and(). Interesting, I'd not thought of any way to optimise these ones, but I had originally thought about allowing the inverse transition functions to report whether they could perform the inverse transition based on the value they received, and if they reported failure, then perform the rescan. I just don't quite know the best way for the inverse transition function to communicate this to the caller yet. If you have any ideas on the best way to do this then I'd really like to hear them. Regards David Rowley
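The counting scheme suggested above can be illustrated in plain SQL (a sketch, not the patch's implementation): bool_or() is equivalent to keeping a running count of TRUE inputs and testing it against zero, and a count can be decremented as a row leaves the window frame.

```sql
-- bool_or() restated as an invertible count of TRUE inputs
SELECT bool_or(b) AS direct,
       sum(CASE WHEN b THEN 1 ELSE 0 END) > 0 AS via_count
FROM (VALUES (true), (false), (false)) AS t(b);
-- both columns yield true
```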
Re: [HACKERS] array_length(anyarray)
On 1/10/14, 9:04 AM, Dean Rasheed wrote: On 10 January 2014 00:36, Marko Tiikkaja ma...@joh.to wrote: Can you point me to some examples? The example I see all the time is code like if array_length(nodes, 1) > 5 then ... do something ... then you realise (or not, as the case may be) that this doesn't work for empty arrays, and have to remember to wrap it in a coalesce call. Simply being able to write if cardinality(nodes) > 5 then ... do something ... is not just shorter, easier to type and easier to read, it is far less likely to be the source of subtle bugs. But this is what I don't understand: why do you care whether there are fewer than 5 elements in the array, but you don't care about how they're organized? '[2:3]={1,2}'::int[] and '{{1},{2}}'::int[] both give the same result when unnest()ed, sure, but why do you want to accept such crap as input if you just want a list of elements? I guess what I truly want is a less generic type that's like an array, but always one-dimensional with a lower bound of 1. There's too much garbage that can be passed to a function taking an array as an input right now. Regards, Marko Tiikkaja
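Marko's point is that two structurally different arrays unnest() identically (illustrative queries):

```sql
-- a 1-D array with lower bound 2, and a 2-D array: same elements...
SELECT unnest('[2:3]={1,2}'::int[]);      -- 1, 2
SELECT unnest('{{1},{2}}'::int[]);        -- 1, 2
-- ...but very different shapes:
SELECT array_dims('[2:3]={1,2}'::int[]);  -- [2:3]
SELECT array_dims('{{1},{2}}'::int[]);    -- [1:2][1:1]
```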
Re: [HACKERS] array_length(anyarray)
On Fri, Jan 10, 2014 at 2:04 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: On 10 January 2014 00:36, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 1:20 AM, Merlin Moncure wrote: I'm piling on: it's not clear at all to me why you've special cased this to lower_bound=1. First of all, there are other reasons to check length than iteration. Yes, I agree. A length function that returned 0 for empty arrays would be far from useless. Can you point me to some examples? The example I see all the time is code like if array_length(nodes, 1) > 5 then ... do something ... then you realise (or not, as the case may be) that this doesn't work for empty arrays, and have to remember to wrap it in a coalesce call. Simply being able to write if cardinality(nodes) > 5 then ... do something ... is not just shorter, easier to type and easier to read, it is far less likely to be the source of subtle bugs. right -- exactly. or, 'ORDER BY cardinality(nodes)', etc etc. Furthermore, we already have pretty good support for iteration with arrays via unnest(). What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. 'unnest_dims' is non-trivial to code in user land, while 'array_length' is an extremely trivial wrapper over array_upper(). cardinality() (which is a much better name for the function, IMSNHO) gives a*b*c values (say, for a 3-D array) and also does something non-trivial *particularly in the case of offset arrays*. On Fri, Jan 10, 2014 at 3:36 AM, Marko Tiikkaja ma...@joh.to wrote: I guess what I truly want is a less generic type that's like an array, but always one-dimensional with a lower bound of 1. Your function would be the only one in the array API that implemented special behaviors like that. That suggests to me that the less generic function belongs in user land, not in the core array API.
merlin
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 10 January 2014 08:12, David Rowley dgrowle...@gmail.com wrote: On Fri, Jan 10, 2014 at 4:09 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: Hi, Reading over this, I realised that there is a problem with NaN handling --- once the state becomes NaN, it can never recover. So the results using the inverse transition function don't match HEAD in cases like this:

create table t(a int, b numeric);
insert into t values(1,1),(2,2),(3,'NaN'),(4,3),(5,4);
select a, b, sum(b) over(order by a rows between 1 preceding and current row) from t;

which in HEAD produces:

 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 |   7
(5 rows)

but with this patch produces:

 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 | NaN
(5 rows)

Nice catch! Thanks for having a look at the patch. Ok, so I thought about this and I don't think it's too big a problem to fix at all. I think it can be handled very similarly to how I'm taking care of NULL values in the window frame. For those, I simply keep a count of them in an int64, and when the last one leaves the aggregate context things can continue as normal. Lucky for us that all numeric aggregation (and now inverse aggregation) goes through 2 functions: do_numeric_accum() and its new inverse version do_numeric_discard(). Both of these functions operate on a NumericAggState, in which (in the attached) I've changed the isNaN bool field to a NaNCount int64 field. I'm just doing NaNCount++ when we get a NaN value in do_numeric_accum() and NaNCount-- in do_numeric_discard(); in the final functions I'm just checking if NaNCount > 0. Cool, that sounds like a neat fix. Though this implementation does fix the reported problem, unfortunately it may have an undesired performance impact for numeric aggregate functions when not used in the context of a window.
Let me explain what I mean: Previously there was some code in do_numeric_accum() which did:

if (state->isNaN || NUMERIC_IS_NAN(newval))
{
    state->isNaN = true;
    return;
}

which meant that it didn't bother adding new, perfectly valid numerics to the aggregate totals when a NaN had been encountered previously. I had to change this to continue on regardless, as we still need to keep the totals just in case all the NaN values are removed and the totals are required once again. This means that the non-window versions of SUM(numeric) and AVG(numeric) and the stddev aggregates for numeric pay a price and have to keep on totaling after encountering NaN values. :( I suspect that NaNs almost never occur in practice, so the fact that it might now take longer to tell you that the sum is NaN doesn't worry me. More important is that it always gives the right answer. Note, if anyone ever did this for floats, +/- Infinity would also need to be handled, so you'd have to maintain 3 counts and deal with logic like Infinity - Infinity = NaN. Regards, Dean
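Dean's point about floats can be checked directly: PostgreSQL's float8 follows IEEE 754, where infinities are absorbing for addition but their difference is NaN, so a single NaN counter would not be enough (illustrative queries):

```sql
SELECT 'Infinity'::float8 + 'Infinity'::float8;  -- Infinity
SELECT 'Infinity'::float8 - 'Infinity'::float8;  -- NaN
```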
Re: [HACKERS] array_length(anyarray)
On 1/10/14, 10:41 AM, Merlin Moncure wrote: What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a functionality I've considered us to have been missing for a long time. Regards, Marko Tiikkaja
Re: [HACKERS] array_length(anyarray)
On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 10:41 AM, Merlin Moncure wrote: What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a functionality I've considered us to have been missing for a long time. not quite. it returns int[], anyelement: so, using your example, you'd get:

[1,1], 1
[1,2], 2
[2,1], 3
[2,2], 4

like unnest() it would fully decompose the array to individual elements. what you have above slices the array, which is useful, but probably shouldn't live under the 'unnest' name -- perhaps 'slice'. Pavel added it to pl/pgsql under the FOREACH syntax (FYI). merlin
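For reference, something close to the proposed unnest_dims() can already be written with generate_subscripts() (a sketch; 'unnest_dims' itself is hypothetical):

```sql
-- enumerate each element of a 2-D array together with its coordinates;
-- the function calls can refer to s.a because LATERAL is implicit for
-- set-returning functions in FROM (PostgreSQL 9.3+)
SELECT ARRAY[i, j] AS dims, a[i][j] AS value
FROM (SELECT '{{1,2},{3,4}}'::int[] AS a) AS s,
     generate_subscripts(a, 1) AS i,
     generate_subscripts(a, 2) AS j
ORDER BY i, j;
--  dims  | value
-- {1,1}  |     1
-- {1,2}  |     2
-- {2,1}  |     3
-- {2,2}  |     4
```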
Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup
Hi Steeve, On 09/01/14 22:38, Steeve Lennmark wrote: I'm a barman user myself so that was actually my initial thought. Ah! Very good! If there isn't some kind of hidden internal that I've missed, I don't see a way to convert an OID (I only have the OID and path at this stage) to a tablespace name. This solution, even though not optimal, is a lot better than my initial one where I used the OID directly. Try: SELECT spcname, oid, pg_tablespace_location(oid) FROM pg_tablespace Thanks, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia PostgreSQL Training, Services and Support gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup
On Fri, Jan 10, 2014 at 12:25 PM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Hi Steeve, On 09/01/14 22:38, Steeve Lennmark wrote: I'm a barman user myself so that was actually my initial thought. Ah! Very good! If there isn't some kind of hidden internal that I've missed, I don't see a way to convert an OID (only have OID and path at this stage) to a tablespace name. This solution, even though not optimal, is a lot better than my initial one where I used the OID directly. Try: SELECT spcname, oid, pg_tablespace_location(oid) FROM pg_tablespace That would require a second connection to the database. You cannot run that query from the walsender session. And that's exactly the issue that Steeve pointed out in his first email. I think it's better to let pg_basebackup work at the lower level, and then leave it to higher level tools to be able to do the mapping to names. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup
On 2014-01-10 12:27:23 +0100, Magnus Hagander wrote: On Fri, Jan 10, 2014 at 12:25 PM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Hi Steeve, On 09/01/14 22:38, Steeve Lennmark wrote: I'm a barman user myself so that was actually my initial thought. Ah! Very good! If there isn't some kind of hidden internal that I've missed, I don't see a way to convert an OID (only have OID and path at this stage) to a tablespace name. This solution, even though not optimal, is a lot better than my initial one where I used the OID directly. Try: SELECT spcname, oid, pg_tablespace_location(oid) FROM pg_tablespace That would require a second connection to the database. You cannot run that query from the walsender session. And that's exactly the issue that Steeve pointed out in his first email. Theoretically nothing is stopping us from providing a command outputting that information - it's a global catalog, so we can access it without problems. I think it's better to let pg_basebackup work at the lower level, and then leave it to higher level tools to be able to do the mapping to names. That doesn't negate this argument though. Not really convinced either way yet. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] array_length(anyarray)
On Jan10, 2014, at 11:00 , Merlin Moncure mmonc...@gmail.com wrote: On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 10:41 AM, Merlin Moncure wrote: What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a functionality I've considered us to have been missing for a long time. not quite. it returns int[], anyelement: so, using your example, you'd get:

[1,1], 1
[1,2], 2
[2,1], 3
[2,2], 4

Now that we have WITH ORDINALITY, it'd be sufficient to have a variant of array_dims() that returns int[][] instead of text, say array_dimsarray(). Your unnest_dims could then be written as unnest(array_dimsarray(array)) with ordinality. best regards, florian pflug
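For context, WITH ORDINALITY (new in the 9.4 development cycle) numbers the rows a set-returning function emits; Florian's array_dimsarray() is hypothetical, but the ordinality half of his suggestion looks like this:

```sql
SELECT * FROM unnest('{{1,2},{3,4}}'::int[]) WITH ORDINALITY AS u(elem, ord);
-- elem | ord
--    1 |   1
--    2 |   2
--    3 |   3
--    4 |   4
```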
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan10, 2014, at 09:34 , David Rowley dgrowle...@gmail.com wrote: I just don't quite know the best way for the inverse transition function to communicate this to the caller yet. If you have any ideas on the best way to do this then I'd really like to hear them. Could they maybe just return NULL as the new state? It would mean that aggregates that do want to provide an inverse transition function couldn't use NULL as a valid aggregate state, but do we need to support that? Looking at the code it seems that for quite a few existing aggregates, the state remains NULL until the first non-NULL input is processed. But that doesn't hurt much - those aggregates can remain as they are until someone wants to add an inverse transition function. Once that happens, there's a choice between living with needless rescans on trailing NULL inputs or changing the state type. This solution isn't particularly pretty, but I don't currently see a good alternative that allows implementing inverse transition functions in something other than C while avoiding needless overhead for those which are written in C. best regards, Florian Pflug
Re: [HACKERS] array_length(anyarray)
On Fri, Jan 10, 2014 at 6:00 AM, Florian Pflug f...@phlo.org wrote: On Jan10, 2014, at 11:00 , Merlin Moncure mmonc...@gmail.com wrote: On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 10:41 AM, Merlin Moncure wrote: What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a functionality I've considered us to have been missing for a long time. not quite. it returns int[], anyelement: so, using your example, you'd get:

[1,1], 1
[1,2], 2
[2,1], 3
[2,2], 4

Now that we have WITH ORDINALITY, it'd be sufficient to have a variant of array_dims() that returns int[][] instead of text, say array_dimsarray(). Your unnest_dims could then be written as unnest(array_dimsarray(array)) with ordinality. hm, not quite following that. maybe an example? my issue with 'WITH ORDINALITY' (while it's pretty neat) is that it doesn't give you the dimension coordinate of each datum, so you can't really use it to slice. with unnest_dims(), you can slice, say via:

select array_agg(value) from unnest_dims('{{1,2},{3,4}}'::int[]) group by dims[1];

or

select array_agg(value) from unnest_dims('{{1,2},{3,4}}'::int[]) where dims[1] = 2;

not super elegant, but good enough for most uses I think. anyways, getting back on topic, the question on the table is cardinality() vs array_length, right? merlin
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug f...@phlo.org writes: On Jan10, 2014, at 09:34 , David Rowley dgrowle...@gmail.com wrote: I just don't quite know the best way for the inverse transition function to communicate this to the caller yet. If you have any ideas on the best way to do this then I'd really like to hear them. Could they maybe just return NULL as the new state? It would mean that aggregates that do want to provide an inverse transition function couldn't use NULL as a valid aggregate state, but do we need to support that? Yeah, I was going to suggest the same. Seems like it wouldn't be that difficult to come up with some alternative representation for "everything seen so far was NULL", if you needed to. Looking at the code it seems that for quite a few existing aggregates, the state remains NULL until the first non-NULL input is processed. But that doesn't hurt much - those aggregates can remain as they are until someone wants to add an inverse transition function. Once that happens, there's a choice between living with needless rescans on trailing NULL inputs or changing the state type. Also, it might be reasonable for both the regular and the inverse transition functions to be strict. If a null entering the window does not matter, then a null exiting the window doesn't either, no? regards, tom lane
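Tom's observation relies on how strict transition functions already behave: the executor skips NULL inputs entirely, so such an aggregate never sees them going in, and symmetrically would never need to remove them. A minimal illustration (my_sum is a made-up name):

```sql
-- int4pl is strict, so NULL inputs are skipped by the aggregate machinery;
-- with no initcond, the first non-NULL input becomes the initial state
CREATE AGGREGATE my_sum(int) (
    sfunc = int4pl,
    stype = int
);
SELECT my_sum(x) FROM (VALUES (1), (NULL), (2)) AS t(x);  -- 3
```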
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 01/10/2014 05:36 AM, Peter Geoghegan wrote: While I realize that everyone is busy, I'm concerned about the lack of discussion here. It's been 6 full days since I posted my benchmark, which I expected to quickly clear some things up, or at least garner interest, and yet no one has commented here since. Nah, that's nothing. I have a patch in the January commitfest that was already posted for the previous commitfest. It received zero review back then, and still has no reviewer signed up, let alone anyone actually reviewing it. And arguably it's a bug fix! http://www.postgresql.org/message-id/5285071b.1040...@vmware.com Wink wink, if you're looking for patches to review... ;-) The alternative exclusion* patch still deadlocks in an unprincipled fashion, when simple, idiomatic usage encounters contention. Heikki intends to produce a revision that fixes the problem, though having considered it carefully myself, I don't know what mechanism he has in mind, and frankly I'm skeptical. Here's an updated patch. Hope it works now... This is based on an older version, and doesn't include any fixes from your latest btreelock_insert_on_dup.v7.2014_01_07.patch. Please check the common parts, and copy over any relevant changes. The fix for the deadlocking issue consists of a few parts. First, there's a new heavy-weight lock type, a speculative insertion lock, which works somewhat similarly to XactLockTableWait(), but is only held for the duration of a single speculative insertion. When a backend is about to begin a speculative insertion, it first acquires the speculative insertion lock. When it's done with the insertion, meaning it has either cancelled it by killing the already-inserted tuple or decided that it's going to go ahead with it, the lock is released. The speculative insertion lock is keyed by Xid and token. The lock can be taken many times in the same transaction, and the token's purpose is to distinguish which insertion is currently in progress.
The token is simply a backend-local counter, incremented each time the lock is taken. In addition to the heavy-weight lock, there are new fields in PGPROC to indicate which tuple the backend is currently inserting. When the tuple is inserted, the backend fills in the relation's relfilenode and item pointer in the MyProc->specInsert* fields, while still holding the buffer lock. The current speculative insertion token is also stored there. With that mechanism, when another backend sees a tuple whose xmin is still in progress, it can check if the insertion is a speculative insertion. To do that, scan the proc array, and find the backend with the given xid. Then, check that the relfilenode and itempointer in that backend's PGPROC slot match the tuple, and make note of the token the backend had advertised. HeapTupleSatisfiesDirty() does the proc array check, and returns the token in the snapshot, alongside snapshot->xmin. The caller can then use that information in place of XactLockTableWait(). There would be other ways to skin the cat, but this seemed like the quickest to implement. One more straightforward approach would be to use the tuple's TID directly in the speculative insertion lock's key, instead of Xid+token, but then the inserter would have to grab the heavy-weight lock while holding the buffer lock, which seems dangerous. Another alternative would be to store the token in the heap tuple header, instead of PGPROC; a tuple that's still being speculatively inserted has no xmax, so it could be placed in that field. Or ctid. More importantly, I have to question whether we should continue to pursue that alternative approach, given what we now know about its performance characteristics. Yes. It could be improved, but not by terribly much, particularly for the case where there is plenty of update contention, which was shown in [1] to be approximately 2-3 times slower than extended page locking (*and* it's already looking for would-be duplicates *first*).
I'm trying to be as fair as possible, and yet the difference is huge. *shrug*. I'm not too concerned about performance during contention. But let's see how this fixed version performs. Could you repeat the tests you did with this? Any guesses what the bottleneck is? At a quick glance at a profile of a pgbench run with this patch, I didn't see anything out of the ordinary, so I'm guessing it's lock contention somewhere. - Heikki speculative-insertions-2014_01_10.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 01/08/2014 06:46 AM, Peter Geoghegan wrote: A new revision of my patch is attached. I'm getting deadlocks with this patch, using the test script you posted earlier in http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com. Am I doing something wrong, or is that a regression? - Heikki
[HACKERS] getenv used in libpq caused missing values under Windows
Hello, pgsql-hackers. As you probably know, dealing with Windows MSVCRT is some kind of hell. That's why we have src/port/win32env.c, particularly because there may be several CRTs loaded at the same time. libpq unfortunately uses the standard getenv function call to fill connection parameters inside conninfo_add_defaults(), see http://www.postgresql.org/docs/9.3/interactive/libpq-envars.html This happens because MSVCRT has its own local copy of the environment variables table, and this table is not updated during execution. So if one used the SetEnvironmentVariable() Windows API to set any variable, e.g. PGHOST, then this change will not be visible inside libpq's conninfo_add_defaults() function where all default values are obtained. This situation is especially unpleasant for non-C developers, since there is no opportunity to use the standard putenv function or the ported pgwin32_putenv. My proposal is to implement a pgwin32_getenv function which will call GetEnvironmentVariable first and, if that returns NULL, call MSVCRT's getenv, in the same way as pgwin32_putenv does. So now the bad scenario is: 1. SetEnvironmentVariable('PGHOST=192.188.9.9') 2. PQconnectdbParams without specifying the host parameter will fail -- With best wishes, Pavel mailto:pa...@gf.microolap.com
Re: [HACKERS] Add CREATE support to event triggers
On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote: CREATE SCHEMA IF NOT EXISTS some schema AUTHORIZATION some guy; Hmm, given that in 9.3 it was OK to have only DROP event triggers, I think it should be equally acceptable to have just CREATE, but without every option on CREATE. CREATE SCHEMA is easily the most complex thing here and would be the command/event to deprioritise if we had any issues getting this done/agreeing on something for 9.4. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
2. Provide a new reloption to specify WAL compression for update operations on a table: Create table tbl(c1 char(100)) With (compress_wal = true); Alternative options: a. compress_wal can take an operation as input, e.g. 'insert', 'update'; b. use alternate syntax: Create table tbl(c1 char(100)) Compress Wal For Update; c. anything better? I think WITH (compress_wal = true) is pretty good. I don't understand your point about taking the operation as input, because this only applies to updates. But we could try to work "update" into the name of the setting somehow, so as to be less likely to conflict with future additions, like maybe wal_compress_update. I think the alternate syntax you propose is clearly worse, because it would involve adding new keywords, something we try to avoid. The only possible enhancement I can think of here is to make the setting an integer rather than a Boolean, defined as the minimum acceptable compression ratio. A setting of 0 means always compress; a setting of 100 means never compress; intermediate values define the least acceptable ratio. But to be honest, I think that's overkill; I'd be inclined to hard-code the default value of 25 in the patch and make it a #define. The only real advantage of requiring a minimum 25% compression percentage is that we can bail out of compression three-quarters of the way through the tuple if we're getting nowhere. That's fine for what it is, but the idea that users are going to see much benefit from twiddling that number seems very dubious to me. Points to consider - 1. As the current algorithm stores the entry for identical chunks at the head of the list, it will always find the last-but-one chunk (we don't store the last 4 bytes) for a long matching string during the match phase of encoding (pgrb_delta_encode). We can improve this either by storing identical chunks at the end of the list instead of at the head, or by trying the good_match technique used in the LZ algorithm.
Finding a good_match technique can have overhead in some of the cases where there is actually no match. I don't see what the good_match thing has to do with anything in the Rabin algorithm. But I do think there might be a bug here, which is that, unless I'm misinterpreting something, hp is NOT the end of the chunk. After calling pgrb_hash_init(), we've looked at the first FOUR bytes of the input. If we find that we have a zero hash value at that point, shouldn't the chunk size be 4, not 1? And similarly, if we find it after sucking in one more byte, shouldn't the chunk size be 5, not 2? Right now, we're deciding where the chunks should end based on the data in the chunk plus the following 3 bytes, and that seems wonky. I would expect us to include all of those bytes in the chunk. 2. Another optimization that we can do in pgrb_find_match() is that currently, if it doesn't find the first chunk (the chunk found via the hash index) matching, it continues to look for a match in the other chunks. I am not sure there is any benefit in searching the other chunks if the first one doesn't match. Well, if you took that out, I suspect it would hurt the compression ratio. Unless the CPU savings are substantial, I'd leave it alone. 3. We can move the code from pg_lzcompress.c to some new file pg_rbcompress.c; if we want to move it, then we need to either duplicate some common macros like pglz_out_tag or keep them common, possibly with changed names. +1 for a new file. 4. Decide on the min and max chunk size (currently kept as 2 and 4 respectively). The point to consider is that if we keep bigger chunk sizes, it can save us CPU cycles but give less reduction in WAL; on the other side, if we keep them small, we get better WAL reduction but consume more CPU cycles. Whoa. That seems way too small. Since PGRB_PATTERN_AFTER_BITS is 4, the average length of a chunk is about 16 bytes. It makes little sense to have the maximum chunk size be 25% of the expected chunk length.
I'd recommend making the maximum chunk length something like 4 * PGRB_CONST_NUM, and the minimum chunk length maybe something like 4. 5. Kept a GUC variable 'wal_update_compression_ratio' for testing purposes; we can remove it before commit. Let's remove it now. 7. Docs need to be updated; tab completion needs some work. Tab completion can be skipped for now, but documentation is important. 8. We can extend ALTER TABLE to set the compress option for a table. I don't understand what you have in mind here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On Thu, Jan 9, 2014 at 5:17 PM, Jim Nasby j...@nasby.net wrote: On 1/9/14, 11:58 AM, Alvaro Herrera wrote: Robert Haas escribió: On Wed, Jan 8, 2014 at 10:27 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Hmm. This seems like a reasonable thing to do, except that I would like the output to always be the constant, and have some other way to enable the clause or disable it. With your present boolean: so if_not_exists: {output: IF NOT EXISTS, present: true/false} Why not: if_not_exists: true/false Yeah, that's another option. If we do this, though, the expansion function would have to know that an if_not_exist element expands to IF NOT EXISTS. Maybe that's okay. Right now, the expansion function is pretty stupid, which is nice. Yeah, the source side of this will always have to understand the nuances of every command; it'd be really nice to not burden the other side with that as well. The only downside I see is a larger JSON output, but meh. Another advantage is if you really wanted to you could modify the output formatting in the JSON doc to do something radically different if so inclined... Yeah. I wasn't necessarily objecting to the way Alvaro did it, just asking why he did it that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On Fri, Jan 10, 2014 at 10:21:42AM +0530, Amit Kapila wrote: On Thu, Jan 9, 2014 at 10:45 PM, Bruce Momjian br...@momjian.us wrote: I think RAID-1 is a very good comparison because it is a successful technology and has similar issues. RAID-1 is like Postgres synchronous_standby_names mode in the sense that the RAID-1 controller will not return success until writes have happened on both mirrors, but it is unlike synchronous_standby_names in that it will degrade and continue writes even when it can't write to both mirrors. What is being discussed is to allow the RAID-1 behavior in Postgres. One issue that came up in discussions is the insufficiency of writing a degrade notice in a server log file, because the log file isn't durable across server failures, meaning you don't know if a fail-over to the slave lost commits. The degrade message has to be stored durably against a server failure, e.g. on a pager, probably using a command like we do for archive_command, and has to return success before the server continues in degrade mode. I assume degraded RAID-1 controllers inform administrators in the same way. Here I think that if the user is aware from the beginning that this is the behaviour, then maybe the importance of the message is not very high. What I want to say is that we could provide a UI in such a way that the user decides, during server setup, the behaviour he requires. For example, we could provide a new parameter available_synchronous_standby_names along with the current parameter, and ask the user to use this new parameter if he wishes to commit transactions synchronously on another server when it is available; otherwise it will operate as a standalone sync master. I know there was a desire to remove this TODO item, but I think we have brought up enough new issues that we can keep it to see if we can come up with a solution. I have added a link to this discussion on the TODO item.
I think we will need at least four new GUC variables:

* timeout control for degraded mode
* command to run during switch to degraded mode
* command to run during switch from degraded mode
* read-only variable to report degraded mode

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs si...@2ndquadrant.com wrote: On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote: CREATE SCHEMA IF NOT EXISTS some schema AUTHORIZATION some guy; Hmm, given in 9.3 it was OK to have only DROP event triggers, I think it should be equally acceptable to have just CREATE, but without every option on CREATE. CREATE SCHEMA is easily the most complex thing here and would be the command/event to deprioritise if we had any issues getting this done/agreeing something for 9.4. I don't know that I agree with that, but I guess we can cross that bridge when we come to it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On 10 January 2014 15:48, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs si...@2ndquadrant.com wrote: On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote: CREATE SCHEMA IF NOT EXISTS some schema AUTHORIZATION some guy; Hmm, given in 9.3 it was OK to have only DROP event triggers, I think it should be equally acceptable to have just CREATE, but without every option on CREATE. CREATE SCHEMA is easily the most complex thing here and would be the command/event to deprioritise if we had any issues getting this done/agreeing something for 9.4. I don't know that I agree with that, but I guess we can cross that bridge when we come to it. We've come to it... You would prefer either everything or nothing?? On what grounds? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On 10 January 2014 15:47, Bruce Momjian br...@momjian.us wrote: I know there was a desire to remove this TODO item, but I think we have brought up enough new issues that we can keep it to see if we can come up with a solution. Can you summarise what you think the new issues are? All I see is some further rehashing of old discussions. There is already a solution to the problem because the docs are already very clear that you need multiple standbys to achieve commit guarantees AND high availability. RTFM is usually used as some form of put down, but that is what needs to happen here. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] Column missing from pg_statistics
Kadri Raudsepp kadri.rauds...@nordicgaming.com writes: I have set up a cron-job that sends me daily reports on bloat amount in tables and indices, which I calculate using pg_stats, not pgstattuple, for performance and I/O reasons. If the bloat amount or percentage are big enough, I use pg_repack to get rid of it. At some point I noticed, that some tables keep showing up in the reports with the same amount of bloat, which pg_repack was seemingly unable to remove. Investigation showed that pgstattuple gave very different results than my bloat-finding query. Reason - for some tables there are some columns that never show up in pg_statistics. Hmm. Eyeballing the ANALYZE code, I note that it will decide that it hasn't got any valid statistics for a column if (1) it finds no NULL values and (2) every single sampled value in the column is too wide (more than WIDTH_THRESHOLD = 1024 bytes wide). Does this describe your problematic column? It seems like the code is being too conservative here --- it could at least generate valid values for stanullfrac and stawidth. I'm inclined to think maybe it should also set stadistinct = -1 (unique) in this case, since the basic assumption that validates ignoring very wide values is that they aren't duplicates. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing pg_basebackup with tablespaces found in $PGDATA
On Thu, Jan 2, 2014 at 03:34:04PM +0100, Bernd Helmle wrote: --On 1. Januar 2014 23:53:46 +0100 Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hi, As much as I've seen people frown upon $subject, it still happens in the wild, and Magnus seems to agree that the current failure mode of our pg_basebackup tool when confronted to the situation is a bug. So here's a fix, attached. I've seen having tablespaces under PGDATA as a policy within several data centres in the past. The main reasoning behind this seems that they strictly separate platform and database administration and for database inventory reasons. They are regularly surprised if you tell them to not use tablespaces in such a way, since they absorbed this practice over the years from other database systems. So +1 for fixing this. FYI, this setup also causes problems for pg_upgrade. There is a recent thread about that that I will reply to. The problem is that pre-9.2 servers get a mismatch between the symlink and the pg_tablespace path when they rename the old cluster. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan10, 2014, at 15:49 , Tom Lane t...@sss.pgh.pa.us wrote: Florian Pflug f...@phlo.org writes: Looking at the code it seems that for quite a few existing aggregates, the state remains NULL until the first non-NULL input is processed. But that doesn't hurt much - those aggregates can remain as they are until someone wants to add an inverse transition function. Once that happens, there's a choice between living with needless rescans on trailing NULL inputs or changing the state type. Also, it might be reasonable for both the regular and the inverse transition functions to be strict. If a null entering the window does not matter, then a null exiting the window doesn't either, no? That's not true, I think, unless we're special-casing strict transition functions somewhere. AFAICS, an aggregate with a strict transition function will produce the state NULL whenever any of the inputs was NULL, i.e. we won't ever transition out of the NULL state once we got there. The inverse transition function, however, would *have* to be able to transition away from the NULL state, which requires it to be non-strict. Otherwise, how would the state ever become non-NULL after the last NULL input leaves the window? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug f...@phlo.org writes: On Jan10, 2014, at 15:49 , Tom Lane t...@sss.pgh.pa.us wrote: Also, it might be reasonable for both the regular and the inverse transition functions to be strict. If a null entering the window does not matter, then a null exiting the window doesn't either, no? That's not true, I think, unless we're special-casing strict transition functions somewhere. AFAICS, an aggregate with a strict transition function will produce the state NULL whenever any of the inputs was NULL, i.e. we won't ever transition out of the NULL state once we got there. Nope, not the case; read xaggr.sgml and/or the CREATE AGGREGATE reference page. An aggregate with a strict transition function essentially just ignores null input rows. I suspect the inverse transition function could just be made strict with a similar special-case rule (viz, keep the old transition value when deleting a null input from the window); but maybe I'm missing something and it has to work harder than that anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On Fri, Jan 10, 2014 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote: On 10 January 2014 15:48, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs si...@2ndquadrant.com wrote: On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote: CREATE SCHEMA IF NOT EXISTS some schema AUTHORIZATION some guy; Hmm, given in 9.3 it was OK to have only DROP event triggers, I think it should be equally acceptable to have just CREATE, but without every option on CREATE. CREATE SCHEMA is easily the most complex thing here and would be the command/event to deprioritise if we had any issues getting this done/agreeing something for 9.4. I don't know that I agree with that, but I guess we can cross that bridge when we come to it. We've come to it... You would prefer either everything or nothing?? On what grounds? I hardly think I need to justify that position. That's project policy and always has been. When somebody implements 50% of a feature, or worse yet 95% of a feature, it violates the POLA for users and doesn't always subsequently get completed, leaving us with long-term warts that are hard to eliminate. It's perfectly fine to implement a feature incrementally if the pieces are individually self-consistent and ideally even useful, but deciding to support every command except one because the last one is hard to implement doesn't seem like a principled approach to anything. It's not even obvious to me that CREATE SCHEMA is all that much harder than anything else and Alvaro has not said that that's the only thing he can't implement (or why) so I think it's entirely premature to make the decision now about which way to proceed - but, OK, sure, if you want to force the issue now, then yeah, I think it's better to have everything or nothing than to have support for only some things justified by nothing more than implementation complexity. 
Aside from the general issue, in this particular case, I have previously and repeatedly expressed concerns about regression test coverage and suggested a path that would guarantee thorough regression testing but which would require that support be complete for everything present in our regression tests. Although there may be some other plan for guaranteeing thorough regression testing not only now but going forward, I have not seen it proposed here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug f...@phlo.org wrote: Tom Lane t...@sss.pgh.pa.us wrote: Florian Pflug f...@phlo.org writes: For float4 and float8, wasn't the consensus that the potential lossy-ness of addition makes this impossible anyway, even without the NaN issue? But... Well, that was my opinion, I'm not sure if it was consensus ;-). I'd say your example showing how it could produce completely bogus results was pretty convincing... Aggregates on approximate (floating-point) numbers are not nearly as consistent as many people probably assume. Picture for a minute a table where a column contains positive floating point numbers that happen to be located in the heap in increasing order, perhaps as the result of a CLUSTER on an index on the column. SELECT sum(colname) FROM tablename; would tend to give the most accurate answer possible when a sequence scan was run -- unless there happened to be a seqscan already half-way through the heap. Then the result would be different. So the exact same query against the exact same data, with no intervening modifications or maintenance activity, could give one answer most of the time, and give various other answers depending on concurrent SELECT queries. Given that this is already the case with aggregates on floating point approximate numbers, why should we rule out an optimization which only makes rounding errors more likely to be visible? The real issue here is that if you are using an approximate data type and expecting exact answers, you will have problems. That's not to say that approximations are useless. If you represent the circumference of the earth with a double precision number, you're dealing with an expected rounding error of about a foot. That's close enough for many purposes. The mistake is assuming that it will be exact or that rounding errors cannot accumulate.
In situations where SQL does not promise particular ordering of operations, it should not be assumed; so any expectation of a specific or repeatable result from a sum or average of approximate numbers is misplaced. But NaN is an orthogonal problem I think. Agreed. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 05:09 PM, Simon Riggs wrote: On 10 January 2014 15:47, Bruce Momjian br...@momjian.us wrote: I know there was a desire to remove this TODO item, but I think we have brought up enough new issues that we can keep it to see if we can come up with a solution. Can you summarise what you think the new issues are? All I see is some further rehashing of old discussions. There is already a solution to the problem because the docs are already very clear that you need multiple standbys to achieve commit guarantees AND high availability. RTFM is usually used as some form of put down, but that is what needs to happen here. If we want to get the guarantees that often come up in sync rep discussions - namely that you can assume that your change is applied on the standby when commit returns - then we could implement this by returning the LSN from commit at the protocol level, and by having an option in queries on the standby to wait for this LSN (again, passed on the wire below the level of the query) to be applied. This can be mostly hidden in drivers and would need very little effort from the end user: basically, you tell the driver that one connection is bound as the slave of another, and the driver can manage using the right LSNs. That is, the last LSN received from the master is always attached to queries on the slaves. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Kevin Grittner kgri...@ymail.com writes: Aggregates on approximate (floating-point) numbers are not nearly as consistent as many people probably assume. Picture for a minute a table where a column contains positive floating point numbers happen to be located in the heap in increasing order, perhaps as the result of a CLUSTER on an index on the column. SELECT sum(colname) FROM tablename; would tend to give the most accurate answer possible when a sequence scan was run -- unless there happened to be a seqscan already half-way through the heap. Then the result would be different. I don't think that argument holds any water. In the first place, somebody could turn off synchronize_seqscans if they needed to have the calculation done the same way every time (and I recall questions from users who ended up doing exactly that, shortly after we introduced synchronize_seqscans). In the second place, for most use-cases it'd be pretty foolish to rely on physical heap order, so somebody who was really trying to sum float8s accurately would likely do select sum(x order by x) from ... This is a well-defined, numerically stable calculation, and I don't want to see us put in non-defeatable optimizations that break it. The real issue here is that if you are using an approximate data type and expecting exact answers, you will have problems. That's a canard. People who know what they're doing (admittedly a minority) do not expect exact answers, but they do expect to be able to specify how to do the calculation in a way that minimizes roundoff errors. The inverse-transition-function approach breaks that, and it does so at a level where the user can't work around it, short of building his own aggregates. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
Is it just me, or is the json_array_element(json, int) function not documented? (Not a bug in this patch, I think ...) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On 10 January 2014 17:07, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 10, 2014 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote: On 10 January 2014 15:48, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 10, 2014 at 10:36 AM, Simon Riggs si...@2ndquadrant.com wrote: On 8 January 2014 20:42, Alvaro Herrera alvhe...@2ndquadrant.com wrote: CREATE SCHEMA IF NOT EXISTS some schema AUTHORIZATION some guy; Hmm, given in 9.3 it was OK to have only DROP event triggers, I think it should be equally acceptable to have just CREATE, but without every option on CREATE. CREATE SCHEMA is easily the most complex thing here and would be the command/event to deprioritise if we had any issues getting this done/agreeing something for 9.4. I don't know that I agree with that, but I guess we can cross that bridge when we come to it. We've come to it... You would prefer either everything or nothing?? On what grounds? I hardly think I need to justify that position. Yeh, you do. Everybody does. That's project policy and always has been. When somebody implements 50% of a feature, or worse yet 95% of a feature, it violates the POLA for users and doesn't always subsequently get completed, leaving us with long-term warts that are hard to eliminate. So why was project policy violated when we released 9.3 with only DROP event support? Surely that was a worse violation of POLA than my suggestion? It's not reasonable to do something yourself and then object when others suggest doing the same thing. After 3 years we need something useful. I think the perfect being the enemy of the good argument applies here after this length of time. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On 01/10/2014 12:42 PM, Alvaro Herrera wrote: Is it just me, or is the json_array_element(json, int) function not documented? (Not a bug in this patch, I think ...) As discussed at the time, we didn't document the functions underlying the json operators, just the operators themselves. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan10, 2014, at 18:14 , Kevin Grittner kgri...@ymail.com wrote: Given that this is already the case with aggregates on floating point approximate numbers, why should we rule out an optimization which only makes rounding errors more likely to be visible? The real issue here is that if you are using an approximate data type and expecting exact answers, you will have problems. Because without the optimization, only the values which you *actually* process for a given result determine whether you lose precision or not. With the optimization, OTOH, values which have *nothing* to do with the result in question can nevertheless make it completely bogus. SUM() is a good example. As long as all your values are positive, the amount of precision you lose is bounded by the number of input values. If I sum over 10 values, the worst that can happen is that the first value is large enough to prevent the other 9 values from influencing the result. That limits the relative error to something like 9*epsilon, where epsilon is the relative precision of the floating point type, i.e. 1e-15 or so for double. In other words, as long as your frames are less than 1e13 rows long, the relative error will stay below 1%. But with the optimization, that is no longer true. If you sum from, say, CURRENT ROW to UNBOUNDED FOLLOWING, the relative error of the result in one row now depends on the magnitude of values *preceding* that row, even though those values aren't in the frame. And since we now internally subtract, not only add, the relative error is no longer bounded by the number of rows in the frame.
Here's the corresponding SELECT (which is basically the same as Tom's example upthread):

select n, x::float,
       sum(x::float) over (order by n
                           rows between current row and unbounded following)
from (values (1, 1e20), (2, 1), (3, 2)) as t(n, x)
order by n;

Currently that returns

 n |   x   |  sum
---+-------+-------
 1 | 1e+20 | 1e+20
 2 |     1 |     3
 3 |     2 |     2

but with an inverse transition function, it may very well return

 n |   x   |  sum
---+-------+-------
 1 | 1e+20 | 1e+20
 2 |     1 |     0
 3 |     2 |    -1

That's not to say that approximations are useless. If you represent the circumference of the earth with a double precision number you're dealing with an expected rounding error of about a foot. That's close enough for many purposes. The mistake is assuming that it will be exact or that rounding errors cannot accumulate. In situations where SQL does not promise particular ordering of operations, it should not be assumed; so any expectation of a specific or repeatable result from a sum or average of approximate numbers is misplaced. But this isn't about ordering, it's replacing one computation with a completely different one that just happens to be equivalent *algebraically*. To me, the proposed optimization for float is akin to a C compiler which decided to evaluate a + b + c + … + z as -a + (2a - b) + (2b - c) + … + (2y - z) + 2z. Algebraically, these are the same, but it'd still be insane to do that. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
I wrote: Kevin Grittner kgri...@ymail.com writes: The real issue here is that if you are using an approximate data type and expecting exact answers, you will have problems. That's a canard. People who know what they're doing (admittedly a minority) do not expect exact answers, but they do expect to be able to specify how to do the calculation in a way that minimizes roundoff errors. The inverse-transition-function approach breaks that, and it does so at a level where the user can't work around it, short of building his own aggregates. Although, having said that ... maybe build your own aggregate would be a reasonable suggestion for people who need this? I grant that it's going to be a minority requirement, maybe even a small minority requirement. People who have the chops to get this sort of thing right can probably manage a custom aggregate definition. The constraint this would pose on the float4 and float8 implementations is that it be possible to use their transition and final functions in a custom aggregate declaration while leaving off the inverse function; or, if that combination doesn't work for some reason, we have to continue to provide the previous transition/final functions for use in user aggregates. Suitable documentation would be needed too, of course. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On Fri, Jan 10, 2014 at 12:59 PM, Simon Riggs si...@2ndquadrant.com wrote: That's project policy and always has been. When somebody implements 50% of a feature, or worse yet 95% of a feature, it violates the POLA for users and doesn't always subsequently get completed, leaving us with long-term warts that are hard to eliminate. So why was project policy violated when we released 9.3 with only DROP event support? Surely that was a worse violation of POLA than my suggestion? Well, obviously I didn't think so at the time, or I would have objected. I felt, and still feel, that implementing one kind of event trigger (drop) does not necessarily require implementing another kind (create). I think that's clearly different from implementing either one for only some object types. "This event trigger will be called whenever an object is dropped" is a reasonable contract with the user. "This other event trigger will be called whenever an object is created, unless it happens to be a schema" is much less reasonable. At least in my opinion.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On Thu, Jan 9, 2014 at 12:46 PM, Claudio Freire klaussfre...@gmail.com wrote: On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote: It would be nice to have better operating system support for this. For example, IIUC, 64-bit Linux has 128TB of address space available for user processes. When you clone(), it can either share the entire address space (i.e. it's a thread) or none of it (i.e. it's a process). There's no option to, say, share 64TB and not the other 64TB, which would be ideal for us. We could then map dynamic shared memory segments into the shared portion of the address space and do backend-private allocations in the unshared part. Of course, even if we had that, it wouldn't be portable, so who knows how much good it would do. But it would be awfully nice to have the option. You can map a segment at fork time, and unmap it after forking. That doesn't really use RAM, since it's supposed to be lazily allocated (it can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE, but I don't think that's portable). That guarantees it's free. It guarantees that it is free as of the moment you unmap it, but it doesn't guarantee that future memory allocations or shared library loads couldn't stomp on the space. Also, that not-portable thing is a bit of a problem. I've got no problem with the idea that third-party code may be platform-specific, but I think the stuff we ship in core has got to work on more or less all reasonably modern systems. Next, you can map shared memory at explicit addresses (linux's mmap has support for that, and I seem to recall Windows did too). All you have to do, is some book-keeping in shared memory (so all processes can coordinate new mappings). I did something like this back in 1998 or 1999 at the operating system level, and it turned out not to work very well. 
I was working on an experimental research operating system kernel, and we wanted to add support for mmap(), so we set aside a portion of the virtual address space for file mappings. That region was shared across all processes in the system. One problem is that there's no guarantee the space is big enough for whatever you want to map; and the other problem is that it can easily get fragmented. Now, 64-bit address spaces go some way to ameliorating these concerns, so maybe it can be made to work, but I would be a teeny bit cautious about using the word "just" to describe the complexity involved.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On Thu, Jan 9, 2014 at 2:09 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 9, 2014 at 12:21 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote: Well, right now we just reopen the same object from all of the processes, which seems to work fine and doesn't require any of this complexity. The only problem I don't know how to solve is how to make a segment stick around for the whole postmaster lifetime. If duplicating the handle into the postmaster without its knowledge gets us there, it may be worth considering, but that doesn't seem like a good reason to rework the rest of the existing mechanism. I think one has to try this to see if it works as per the need. If it's not urgent, I can try this early next week? Anything we want to get into 9.4 has to be submitted by next Tuesday, but I don't know that we're going to get this into 9.4. Using DuplicateHandle(), we can make a segment stick around for the postmaster's lifetime. I have used the test below (with the dsm_demo module) to verify:

Session 1:
  select dsm_demo_create('this message is from session-1');
   dsm_demo_create
  -----------------
   82712

Session 2:
  select dsm_demo_read(82712);
   dsm_demo_read
  --------------------------------
   this message is from session-1
  (1 row)

Session 1:
  \q    -- up to this point it works without DuplicateHandle as well

Session 2:
  select dsm_demo_read(82712);
   dsm_demo_read
  --------------------------------
   this message is from session-1
  (1 row)
  \q

Session 3:
  select dsm_demo_read(82712);
   dsm_demo_read
  --------------------------------
   this message is from session-1
  (1 row)

The above shows that the handle stays around. Note - currently I have to bypass the code below in dsm_attach(), as it assumes the segment will not stay around once it's removed from the control file.

/*
 * If we didn't find the handle we're looking for in the control
 * segment, it probably means that everyone else who had it mapped,
 * including the original creator, died before we got to this point.
 * It's up to the caller to decide what to do about that.
 */
if (seg->control_slot == INVALID_CONTROL_SLOT)
{
    dsm_detach(seg);
    return NULL;
}

Could you let me know what exactly you are expecting in the patch: just a call to DuplicateHandle() after CreateFileMapping(), or something else as well? Well, I guess what I was thinking is that we could have a call dsm_keep_segment() which would be invoked on an already-created dsm_segment *. On Linux, that would just bump the reference count in the control segment up by one so that it doesn't get destroyed until postmaster shutdown. On Windows it may as well still do that for consistency, but it will also need to do this DuplicateHandle() trick.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] new json funcs
Andrew Dunstan and...@dunslane.net writes: On 01/10/2014 12:42 PM, Alvaro Herrera wrote: Is it just me, or is the json_array_element(json, int) function not documented? As discussed at the time, we didn't document the functions underlying the json operators, just the operators themselves. I see though that json_array_element has a DESCR comment. I believe project policy is that if a function is not meant to be invoked by name but only through an operator, its pg_description entry should just be "implementation of xyz operator", with the real comment attached only to the operator. Otherwise \df users are likely to be misled into using the function when they're not really supposed to; and at the very least they will bitch about its lack of documentation. See commits 94133a935414407920a47d06a6e22734c974c3b8 and 908ab80286401bb20a519fa7dc7a837631f20369.

regards, tom lane
Re: [HACKERS] nested hstore patch
On Thu, Jan 9, 2014 at 5:08 PM, Andrew Dunstan and...@dunslane.net wrote: * I have replicated all the json processing functions for jsonb (although not the json generating functions, such as to_json). Most of these currently work by turning the jsonb back into json and then processing as before. I am sorting out some technical issues and hope to have all of these rewritten to use the native jsonb API in a few days time. * We still need to document jsonb. That too I hope will be done quite shortly. * The jsonb regression test currently contains U+ABCD - I guess we'd better use some hex encoding or whatever for that - unlike json, the jsonb de-serializer dissolves unicode escapes. How does that work if the server encoding isn't UTF-8? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On Fri, Jan 10, 2014 at 3:23 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 9, 2014 at 12:46 PM, Claudio Freire klaussfre...@gmail.com wrote: On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote: It would be nice to have better operating system support for this. For example, IIUC, 64-bit Linux has 128TB of address space available for user processes. When you clone(), it can either share the entire address space (i.e. it's a thread) or none of it (i.e. it's a process). There's no option to, say, share 64TB and not the other 64TB, which would be ideal for us. We could then map dynamic shared memory segments into the shared portion of the address space and do backend-private allocations in the unshared part. Of course, even if we had that, it wouldn't be portable, so who knows how much good it would do. But it would be awfully nice to have the option. You can map a segment at fork time, and unmap it after forking. That doesn't really use RAM, since it's supposed to be lazily allocated (it can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE, but I don't think that's portable). That guarantees it's free. It guarantees that it is free as of the moment you unmap it, but it doesn't guarantee that future memory allocations or shared library loads couldn't stomp on the space. You would only unmap prior to remapping, only the to-be-mapped portion, so I don't see a problem. Also, that not-portable thing is a bit of a problem. I've got no problem with the idea that third-party code may be platform-specific, but I think the stuff we ship in core has got to work on more or less all reasonably modern systems. Next, you can map shared memory at explicit addresses (linux's mmap has support for that, and I seem to recall Windows did too). All you have to do, is some book-keeping in shared memory (so all processes can coordinate new mappings). 
I did something like this back in 1998 or 1999 at the operating system level, and it turned out not to work very well. I was working on an experimental research operating system kernel, and we wanted to add support for mmap(), so we set aside a portion of the virtual address space for file mappings. That region was shared across all processes in the system. One problem is that there's no guarantee the space is big enough for whatever you want to map; and the other problem is that it can easily get fragmented. Now, 64-bit address spaces go some way to ameliorating these concerns, so maybe it can be made to work, but I would be a teeny bit cautious about using the word "just" to describe the complexity involved. Ok, yes, fragmentation could be an issue if the address range is not humongous enough.
Re: [HACKERS] nested hstore patch
On 01/10/2014 01:29 PM, Robert Haas wrote: On Thu, Jan 9, 2014 at 5:08 PM, Andrew Dunstan and...@dunslane.net wrote: * The jsonb regression test currently contains U+ABCD - I guess we'd better use some hex encoding or whatever for that - unlike json, the jsonb de-serializer dissolves unicode escapes. How does that work if the server encoding isn't UTF-8? There is a jsonb_1.out file for the non-utf8 case, just as there is a json_1.out for the same case. Unicode escapes for non-ascii characters are forbidden in jsonb as they are in json, if the encoding isn't utf8. FYI, we are actually using the json lexing and parsing mechanism, so that these types will accept exactly the same inputs. However, since we're not storing json text in jsonb, but instead the decomposed elements, the unicode escapes are resolved in the stored values. I already have a fix for the point above (see https://github.com/feodor/postgres/commit/7d5b8f12747b4a75e8b32914340d07617f1af302) and it will be included in the next version of the patch. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Jan 10, 2014 at 7:12 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I'm getting deadlocks with this patch, using the test script you posted earlier in http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com. Am I doing something wrong, or is that a regression? Yes. The point of that test case was that it made your V1 livelock (which you fixed), not deadlock in a way detected by the deadlock detector, which is the correct behavior. This test case was the one that showed up *unprincipled* deadlocking: http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com I'd focus on that test case.

-- Peter Geoghegan
Re: [HACKERS] new json funcs
On 01/10/2014 01:27 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/10/2014 12:42 PM, Alvaro Herrera wrote: Is it just me, or is the json_array_element(json, int) function not documented? As discussed at the time, we didn't document the functions underlying the json operators, just the operators themselves. I see though that json_array_element has a DESCR comment. I believe project policy is that if a function is not meant to be invoked by name but only through an operator, its pg_description entry should just be implementation of xyz operator, with the real comment attached only to the operator. Otherwise \df users are likely to be misled into using the function when they're not really supposed to; and at the very least they will bitch about its lack of documentation. See commits 94133a935414407920a47d06a6e22734c974c3b8 and 908ab80286401bb20a519fa7dc7a837631f20369. OK, I can fix that I guess. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On Fri, Jan 10, 2014 at 1:35 PM, Claudio Freire klaussfre...@gmail.com wrote: You can map a segment at fork time, and unmap it after forking. That doesn't really use RAM, since it's supposed to be lazily allocated (it can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE, but I don't think that's portable). That guarantees it's free. It guarantees that it is free as of the moment you unmap it, but it doesn't guarantee that future memory allocations or shared library loads couldn't stomp on the space. You would only unmap prior to remapping, only the to-be-mapped portion, so I don't see a problem. OK, yeah, that way works. That's more or less what Noah proposed before. But I was skeptical it would work well everywhere. I suppose we won't know until somebody tries it. (I didn't.) Ok, yes, fragmentation could be an issue if the address range is not humongus enough. I've often thought that 64-bit machines are so capable that there's no reason to go any higher. But lately I've started to wonder. There are already machines out there with 2^40 bytes of physical memory, and the number just keeps creeping up. When you reserve a couple of bits to indicate user or kernel space, and then consider that virtual address space can be many times larger than physical memory, it starts not to seem like that much. But I'm not that excited about the amount of additional memory we'll eat when somebody decides to make a pointer 16 bytes. Ugh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan10, 2014, at 19:08 , Tom Lane t...@sss.pgh.pa.us wrote: I wrote: Kevin Grittner kgri...@ymail.com writes: The real issue here is that if you are using an approximate data type and expecting exact answers, you will have problems. That's a canard. People who know what they're doing (admittedly a minority) do not expect exact answers, but they do expect to be able to specify how to do the calculation in a way that minimizes roundoff errors. The inverse-transition-function approach breaks that, and it does so at a level where the user can't work around it, short of building his own aggregates. Although, having said that ... maybe build your own aggregate would be a reasonable suggestion for people who need this? I grant that it's going to be a minority requirement, maybe even a small minority requirement. People who have the chops to get this sort of thing right can probably manage a custom aggregate definition. So we'd put a footgun into the hands of people who don't know what they're doing, to be fired for performance's sake, and leave it to the people who know what they are doing to put the safety on? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
Andrew Dunstan and...@dunslane.net writes: On 01/10/2014 01:27 PM, Tom Lane wrote: See commits 94133a935414407920a47d06a6e22734c974c3b8 and 908ab80286401bb20a519fa7dc7a837631f20369. OK, I can fix that I guess. Sure, just remove the DESCR comments for the functions that aren't meant to be used directly. I don't think this is back-patchable, but it's a minor point, so at least for me a fix in HEAD is sufficient. I wonder whether we should add an opr_sanity test verifying that operator implementation functions don't have their own comments? The trouble is that there are a few that are supposed to, but maybe that list is stable enough that it'd be okay to memorialize in the expected output. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 07:47 AM, Bruce Momjian wrote: I know there was a desire to remove this TODO item, but I think we have brought up enough new issues that we can keep it to see if we can come up with a solution. I have added a link to this discussion on the TODO item. I think we will need at least four new GUC variables:

* timeout control for degraded mode
* command to run during switch to degraded mode
* command to run during switch from degraded mode
* read-only variable to report degraded mode

I know I am the one that instigated all of this, so I want to be very clear on what I, and I am confident my customers, would expect. If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g., the slave goes down and we have to tell the master to continue). I have read through this thread more than once, and I have also gone back to the docs. I understand why we do it the way we do it. I also understand that, for the business requirements of 99% of CMD's customers, it's wrong. At least in the sense of providing continuity of service.

Sincerely,

JD

-- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc "In a time of universal deceit - telling the truth is a revolutionary act." - George Orwell
Re: [HACKERS] new json funcs
Tom Lane wrote: I wonder whether we should add an opr_sanity test verifying that operator implementation functions don't have their own comments? The trouble is that there are a few that are supposed to, but maybe that list is stable enough that it'd be okay to memorialize in the expected output. +1. It's an easy rule to overlook.

-- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] new json funcs
Andrew Dunstan wrote: On 01/10/2014 12:42 PM, Alvaro Herrera wrote: Is it just me, or is the json_array_element(json, int) function not documented? As discussed at the time, we didn't document the functions underlying the json operators, just the operators themselves. Oh, I see. That's fine with me. From the source code it's hard to see when a SQL-callable function is only there to implement an operator, though (and it seems a bit far-fetched to suppose that the developer will think, upon seeing an undocumented function, "oh, this must implement some operator, I will look it up in pg_proc.h"). I think the operator(s) should be mentioned in the comment on top of the function.

-- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Robert Haas robertmh...@gmail.com writes: I've often thought that 64-bit machines are so capable that there's no reason to go any higher. But lately I've started to wonder. There are already machines out there with 2^40 bytes of physical memory, and the number just keeps creeping up. When you reserve a couple of bits to indicate user or kernel space, and then consider that virtual address space can be many times larger than physical memory, it starts not to seem like that much. But I'm not that excited about the amount of additional memory we'll eat when somebody decides to make a pointer 16 bytes. Ugh. Once you really need that, you're not going to care about doubling the size of pointers. At worst, you're giving up 1 bit of address space to gain 64 more. (Still, I rather doubt it'll happen in my lifetime.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug f...@phlo.org writes: On Jan 10, 2014, at 19:08, Tom Lane t...@sss.pgh.pa.us wrote: Although, having said that ... maybe "build your own aggregate" would be a reasonable suggestion for people who need this? I grant that it's going to be a minority requirement, maybe even a small minority requirement. People who have the chops to get this sort of thing right can probably manage a custom aggregate definition. So we'd put a footgun into the hands of people who don't know what they're doing, to be fired for performance's sake, and leave it to the people who know what they are doing to put the safety on? If I may put words in Kevin's mouth, I think his point is that having float8 sum() at all is a foot-gun, and that's hard to deny. You need to know how to use it safely. A compromise might be to provide these alternative safer aggregates as built-ins. Or, depending on what color you like your bikeshed, leave the standard aggregates alone and define fast_sum etc. for the less safe versions. In any case it'd be incumbent on us to document the tradeoffs.

regards, tom lane
Re: [HACKERS] new json funcs
On 01/10/2014 01:58 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/10/2014 01:27 PM, Tom Lane wrote: See commits 94133a935414407920a47d06a6e22734c974c3b8 and 908ab80286401bb20a519fa7dc7a837631f20369. OK, I can fix that I guess. Sure, just remove the DESCR comments for the functions that aren't meant to be used directly. I don't think this is back-patchable, but it's a minor point, so at least for me a fix in HEAD is sufficient. I wonder whether we should add an opr_sanity test verifying that operator implementation functions don't have their own comments? The trouble is that there are a few that are supposed to, but maybe that list is stable enough that it'd be okay to memorialize in the expected output. Well, that would be OK as long as there was a comment in the file so that developers don't just think it's OK to extend the list (it's a bit like the reason we don't allow shift/reduce conflicts - if we allowed them, people would just keep adding more, and they wouldn't stick out like a sore thumb). The comment in the current test says:

-- Check that operators' underlying functions have suitable comments,
-- namely 'implementation of XXX operator'. In some cases involving legacy
-- names for operators, there are multiple operators referencing the same
-- pg_proc entry, so ignore operators whose comments say they are deprecated.
-- We also have a few functions that are both operator support and meant to
-- be called directly; those should have comments matching their operator.

The history here is that originally I was intending to have these functions documented, and so the descriptions were made to match the operator descriptions, so that we didn't get a failure on this test. Later we decided not to document them as part of last release's bike-shedding, but the function descriptions didn't get changed or removed.
cheers andrew
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 01/10/2014 08:37 PM, Peter Geoghegan wrote: On Fri, Jan 10, 2014 at 7:12 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I'm getting deadlocks with this patch, using the test script you posted earlier in http://www.postgresql.org/message-id/CAM3SWZQh=8xnvgbbzyhjexujbhwznjutjez9t-dbo9t_mx_...@mail.gmail.com. Am I doing something wrong, or is that a regression? Yes. The point of that test case was that it made your V1 livelock (which you fixed), not deadlock in a way detected by the deadlock detector, which is the correct behavior. Oh, ok. Interesting. With the patch version I posted today, I'm not getting deadlocks. I'm not getting duplicates in the table either, so it looks like the promise tuple approach somehow avoids the deadlocks, while the btreelock patch does not. Why does it deadlock with the btreelock patch? I don't see why it should. If you have two backends inserting a single tuple, and they conflict, one of them should succeed in inserting, and the other one should update.

- Heikki
Re: [HACKERS] Time to do our Triage for 9.4
All, To make this easier for everyone to participate in, I've created a wiki page: https://wiki.postgresql.org/wiki/9.4CF4Triage Please add the patches you know well to the appropriate list, thanks!

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] new json funcs
Alvaro Herrera alvhe...@2ndquadrant.com writes: Oh, I see. That's fine with me. From the source code it's hard to see when a SQL-callable function is only there to implement an operator, though (and it seems a bit far-fetched to suppose that the developer will think, upon seeing an undocumented function, oh this must implement some operator, I will look it up at pg_proc.h). I think the operator(s) should be mentioned in the comment on top of the function. Oh, you're complaining about the lack of any header comment for the function in the source code. That's a different matter from the user-visible docs, but I agree that it's poor practice to not have anything. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
Andrew Dunstan and...@dunslane.net writes: The history here is that originally I was intending to have these functions documented, and so the descriptions were made to match the operator descriptions, so that we didn't get a failure on this test. Later we decided not to document them as part of last release's bike-shedding, but the function descriptions didn't get changed / removed. Ah. I suppose there's no way to cross-check the state of the function's pg_description comment against whether it has SGML documentation :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On Fri, Jan 10, 2014 at 02:39:12PM -0500, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: The history here is that originally I was intending to have these functions documented, and so the descriptions were made to match the operator descriptions, so that we didn't get a failure on this test. Later we decided not to document them as part of last release's bike-shedding, but the function descriptions didn't get changed / removed. Ah. I suppose there's no way to cross-check the state of the function's pg_description comment against whether it has SGML documentation :-( FDWs to the rescue! Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Jan 10, 2014 at 11:28 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Why does it deadlock with the btreelock patch? I don't see why it should. If you have two backends inserting a single tuple, and they conflict, one of them should succeed to insert, and the other one should update. Are you sure that it doesn't make your patch deadlock too, with enough pressure? I've made that mistake myself. That test-case made my patch deadlock (in a detected fashion) when it used buffer locks as a value locking prototype - I say as much right there in the November mail you linked to. I think that's acceptable, because it's non-sensible use of the feature (my point was only that it shouldn't livelock). The test case is naively locking a row without knowing ahead of time (or pro-actively checking) if the conflict is on the first or second unique index. So before too long, you're updating the wrong row (no existing lock is really held), based on the 'a' column's projected value, when in actuality the conflict was on the 'b' column's projected value. Conditions are right for deadlock, because two rows are locked, not one. Although I have not yet properly considered your most recent revision, I can't imagine why the same would not apply there, since the row locking component is (probably) still identical. Granted, that distinction between row locking and value locking is a bit fuzzy in your approach, but if you happened to not insert any rows in any previous iterations (i.e. there were no unfilled promise tuples), and you happened to perform conflict handling first, it could still happen, albeit with lower probability, no? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
FYI, I'm using the verb rewind to talk about using the negative transition aggregation function to get a prior value. I don't know if this is the right verb.

Conceptually, when aggregating over floating point numbers, there is some infinitely precise theoretical value, and the computation is approximating it. Call the infinitely precise value 'r'. Call the computed value 'c', which is the result of the aggregation function. (For float4_agg and float8_agg, typeof(c) == float8.)

The problem you have with rewinding an aggregate is that you don't know if you are getting the same value of c that you would have gotten from a rescan. But if you have a type that tracks a margin [min,max] where typeof(min) == typeof(max) is higher precision than typeof(c), then you can track:

    min <= r <= max

by setting the rounding mode down, then up, when computing the next value of min and max, respectively. (Extra flag bits or booleans could track whether you've encountered +inf, -inf, NaN, and any other oddball cases, with corresponding special logic that has been discussed already upthread.)

In many but not all cases:

    min != max, but (typeof(c))min == (typeof(c))max

because the margin of error is small enough not to render different values when cast to the lower precision typeof(c). You could rewind the aggregation whenever this second case holds, and only force a rescan when it does not. This would render the same results for queries whether they were performed with rewinds or with rescans. The results might differ from older versions of postgres, but only in that they might be more accurate, with less accumulated rounding errors, owing to the higher precision state transition variable.

For many modern platforms, typeof(min) could be __float128 using libquadmath, or something similar to that. If not available at compile time, it could be float64 instead.
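The margin-tracking scheme above can be sketched in Python (this is my illustration, not Mark's code): decimal contexts with directed rounding stand in for the FPU rounding modes, and a 30-digit precision plays the role of the higher-precision (__float128) state, while float plays the role of typeof(c).

```python
from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING

# Directed-rounding contexts standing in for "set the rounding mode down,
# then up"; 30 digits is the higher-precision transition state.
DOWN = Context(prec=30, rounding=ROUND_FLOOR)
UP = Context(prec=30, rounding=ROUND_CEILING)

class MarginState:
    """Hypothetical transition state: [lo, hi] always brackets the exact sum r."""
    def __init__(self):
        self.lo = Decimal(0)   # min: guaranteed <= r
        self.hi = Decimal(0)   # max: guaranteed >= r

    def add(self, x):          # forward transition
        d = Decimal(x)
        self.lo = DOWN.add(self.lo, d)
        self.hi = UP.add(self.hi, d)

    def discard(self, x):      # inverse transition, i.e. the "rewind"
        d = Decimal(x)
        self.lo = DOWN.subtract(self.lo, d)
        self.hi = UP.subtract(self.hi, d)

    def result_or_none(self):
        # The rewound value is trustworthy only when both bounds render the
        # same value at the result's precision; otherwise force a rescan.
        lo_c, hi_c = float(self.lo), float(self.hi)
        return lo_c if lo_c == hi_c else None
```

A caller would use the aggregate result only when `result_or_none()` is not None, falling back to recomputing the frame otherwise.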
Even then, you'd still know that rewinding was possible when min == max and not otherwise, which is useful for cases of aggregation over exact values. I admit I've done a bit of handwaving on the computation of the margin and the handling of floating-point rounding issues, but I believe the implementation details are tractable.

mark

On Friday, January 10, 2014 10:10 AM, Florian Pflug f...@phlo.org wrote: On Jan 10, 2014, at 18:14, Kevin Grittner kgri...@ymail.com wrote: Given that this is already the case with aggregates on floating point approximate numbers, why should we rule out an optimization which only makes rounding errors more likely to be visible? The real issue here is that if you are using an approximate data type and expecting exact answers, you will have problems.

Because without the optimization, only the values which you *actually* process for a given result determine whether you lose precision or not. With the optimization, OTOH, values which have *nothing* to do with the result in question can nevertheless make it completely bogus.

SUM() is a good example. As long as all your values are positive, the amount of precision you lose is bound by the number of input values. If I sum over 10 values, the worst that can happen is that the first value is large enough to prevent the other 9 values from influencing the result. That limits the relative error to something like 9*epsilon, where epsilon is the relative precision of the floating point type, i.e. 1e-15 or so for double. In other words, as long as your frames are less than 10^13 rows long, the relative error will stay below 1%.

But with the optimization, that is no longer true. If you sum from, say, CURRENT ROW to UNBOUNDED FOLLOWING, the relative error of the result in one row now depends on the magnitude of values *preceding* that row, even though that value isn't in the frame.
And since we now internally subtract, not only add, the relative error is no longer bound by the number of rows in the frame. Here's the corresponding SELECT (which is basically the same as Tom's example upthread):

    select n, x::float, sum(x::float) over (
      order by n rows between current row and unbounded following
    )
    from (values (1, 1e20), (2, 1), (3, 2)) as t(n, x)
    order by n;

Currently that returns

     n |   x   |  sum
    ---+-------+-------
     1 | 1e+20 | 1e+20
     2 |     1 |     3
     3 |     2 |     2

but with an inverse transfer function, it may very well return

     n |   x   |  sum
    ---+-------+-------
     1 | 1e+20 | 1e+20
     2 |     1 |     0
     3 |     2 |    -1

That's not to say that approximations are useless. If you represent the circumference of the earth with a double precision number you're dealing with an expected rounding error of about a foot. That's close enough for many purposes. The mistake is assuming that it will be exact or that rounding errors cannot accumulate. In situations where SQL does not promise particular ordering of operations, it should not be assumed; so any expectations of a specific
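The effect is easy to reproduce outside SQL. This Python sketch (my illustration of the example above, not part of the patch) computes the same running window both ways: once by starting from the total and "discarding" rows with an inverse transition, and once by rescanning each frame from scratch:

```python
# Window: sum(x) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
rows = [1e20, 1.0, 2.0]

# Inverse-transition style: start with the grand total, then subtract each
# row as it leaves the frame.  1.0 and 2.0 are absorbed into 1e20 up front.
state = sum(rows)                # 1e20 + 1 + 2 == 1e20 in double precision
inverse = []
for x in rows:
    inverse.append(state)
    state -= x                   # inverse transition: forget row x

# Rescan style: recompute each frame independently.
rescan = [sum(rows[i:]) for i in range(len(rows))]

print(inverse)   # [1e+20, 0.0, -1.0] -- bogus, poisoned by the discarded 1e20
print(rescan)    # [1e+20, 3.0, 2.0]
```

The second and third frames never contain 1e20, yet the inverse-transition results for them are dominated by its rounding error, which is exactly Florian's point.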
Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69
On Tue, Jan 7, 2014 at 08:19:49AM -0500, Peter Eisentraut wrote: That was probably me. I'll look into it. On Jan 6, 2014, at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: On Sun, Dec 29, 2013 at 02:48:21AM -0500, Tom Lane wrote: 3. pg_upgrade ignores the fact that pg_resetxlog failed, and keeps going. Does pg_resetxlog return a non-zero exit status? If so, pg_upgrade should have caught that and exited. It certainly does:

    if (errno)
    {
        fprintf(stderr, _("%s: could not read from directory \"%s\": %s\n"),
                progname, XLOGDIR, strerror(errno));
        exit(1);
    }

The bug is that pg_upgrade appears to assume (in many places, not just this one) that exec_prog() will abort if the called program fails, but *it doesn't*, contrary to the claim in its own header comment. This is because pg_log(FATAL, ...) doesn't call exit(). pg_fatal() does, but that's not what's being called in the throw_error case. I imagine that this used to work correctly and got broken by some ill-advised refactoring, but whatever the origin, it's 100% broken today.

I know Peter is looking at this, but I looked at it and I can't see the problem. Every call of exec_prog() that uses pg_resetxlog has throw_error = true, and the test there is:

    result = system(cmd);
    if (result != 0)
        ...
        pg_log(FATAL, ...)

and in pg_log_v() I see:

    switch (type)
    ...
    case PG_FATAL:
        printf("\n%s", _(message));
        printf("Failure, exiting\n");
        exit(1);
        break;

so I am not clear how you are seeing the return status of pg_resetxlog ignored. I tried the attached patch, which causes pg_resetxlog -f to return -1, and got the proper error from pg_upgrade in git head:

    Performing Upgrade
    ------------------
    Analyzing all rows in the new cluster               ok
    Freezing all rows on the new cluster                ok
    Deleting files from new pg_clog                     ok
    Copying old pg_clog to new server                   ok
    Setting next transaction ID for new cluster         *failure*

    Consult the last few lines of pg_upgrade_utility.log for the probable cause of the failure.
    Failure, exiting

and the last line in pg_upgrade_utility.log is:

    command: /u/pgsql/bin/pg_resetxlog -f -x 683 /u/pgsql/data >> pg_upgrade_utility.log 2>&1

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +

    diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
    new file mode 100644
    index 03f2fad..3e67630
    *** a/src/bin/pg_resetxlog/pg_resetxlog.c
    --- b/src/bin/pg_resetxlog/pg_resetxlog.c
    *** main(int argc, char *argv[])
    *** 121,126 ****
    --- 121,127 ----
      {
                case 'f':
                    force = true;
    +               exit(1);
                    break;

                case 'n':
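For readers following along, the contract exec_prog() is supposed to honor — run a command and, when throw_error is set, abort the whole program on a non-zero exit status — can be sketched in Python (a hypothetical illustration of the behavior under discussion, not the pg_upgrade C source):

```python
import subprocess
import sys

def exec_prog(cmd, throw_error=True):
    """Run cmd via the shell; on failure either abort (the pg_log(FATAL, ...)
    path, which must end in exit(1)) or report failure to the caller."""
    result = subprocess.run(cmd, shell=True).returncode
    if result != 0:
        if throw_error:
            sys.stderr.write("Failure, exiting\n")
            sys.exit(1)      # the exit(1) Bruce traced inside pg_log_v()
        return False
    return True
```

The bug Tom describes corresponds to the `throw_error` branch failing to reach `sys.exit(1)`: the failure gets logged but execution continues as if the command had succeeded.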
Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69
Bruce Momjian br...@momjian.us writes: On Tue, Jan 7, 2014 at 08:19:49AM -0500, Peter Eisentraut wrote: That was probably me. I'll look into it. and in pg_log_v() I see:

    switch (type)
    ...
    case PG_FATAL:
        printf("\n%s", _(message));
        printf("Failure, exiting\n");
        exit(1);
        break;

Peter just fixed that; see commit ca607b155e86ce529fc9ac322a232f264cda9ab6

regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69
Bruce Momjian wrote: I know Peter is looking at this, but I looked at it and I can't see the problem. Every call of exec_prog() that uses pg_resetxlog has throw_error = true, and the test there is:

    result = system(cmd);
    if (result != 0)
        ...
        pg_log(FATAL, ...)

and in pg_log_v() I see:

    switch (type)
    ...
    case PG_FATAL:
        printf("\n%s", _(message));
        printf("Failure, exiting\n");
        exit(1);
        break;

This was fixed by Peter two days ago in commit http://git.postgresql.org/pg/commitdiff/ca607b155e86ce529fc9ac322a232f264cda9ab6

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69
On Fri, Jan 10, 2014 at 04:06:11PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Tue, Jan 7, 2014 at 08:19:49AM -0500, Peter Eisentraut wrote: That was probably me. I'll look into it. and in pg_log_v() I see: switch (type) ... case PG_FATAL: printf("\n%s", _(message)); printf("Failure, exiting\n"); exit(1); break; Peter just fixed that; see commit ca607b155e86ce529fc9ac322a232f264cda9ab6

Oh, I guess I checked git log on the wrong file then. Thanks.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 01/10/2014 10:00 PM, Peter Geoghegan wrote: On Fri, Jan 10, 2014 at 11:28 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Why does it deadlock with the btreelock patch? I don't see why it should. If you have two backends inserting a single tuple, and they conflict, one of them should succeed to insert, and the other one should update. Are you sure that it doesn't make your patch deadlock too, with enough pressure? I've made that mistake myself. That test-case made my patch deadlock (in a detected fashion) when it used buffer locks as a value locking prototype - I say as much right there in the November mail you linked to. I think that's acceptable, because it's non-sensible use of the feature (my point was only that it shouldn't livelock). The test case is naively locking a row without knowing ahead of time (or pro-actively checking) if the conflict is on the first or second unique index. So before too long, you're updating the wrong row (no existing lock is really held), based on the 'a' column's projected value, when in actuality the conflict was on the 'b' column's projected value. Conditions are right for deadlock, because two rows are locked, not one.

I see. Yeah, I also get deadlocks when I change the update statement to use foo.b = rej.b instead of foo.a = rej.a. I think it comes down to the order the indexes are processed in, ie. which conflict you see first.

This is pretty much the same issue we discussed wrt. exclusion constraints. If the tuple being inserted conflicts with several existing tuples, what to do? I think the best answer would be to return and lock them all. It could still deadlock, but it's nevertheless less surprising behavior than returning one of the tuples at random. Actually, we could even avoid the deadlock by always locking the tuples in a certain order, although I'm not sure if it's worth the trouble.
- Heikki
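Heikki's last point — always locking the conflicting tuples in a canonical order — is the classic lock-ordering cure for deadlock. A small Python sketch (hypothetical, not the patch) shows two "backends" that discovered the same pair of conflicting tuples in opposite index orders, yet cannot deadlock because both sort before acquiring:

```python
import threading

# Stand-ins for row locks keyed by tuple identifier.
locks = {tid: threading.Lock() for tid in (101, 205)}

def lock_conflicting_tuples(tids):
    """Lock every conflicting tuple, but in a canonical (sorted) order."""
    acquired = []
    for tid in sorted(tids):     # the canonical order is the whole trick
        locks[tid].acquire()
        acquired.append(tid)
    return acquired

def unlock(tids):
    for tid in tids:
        locks[tid].release()

results = []
def backend(discovery_order):
    # One backend saw the conflict on index 'a' first, the other on 'b';
    # sorting makes both take 101 before 205 regardless.
    got = lock_conflicting_tuples(discovery_order)
    results.append(got)
    unlock(got)

t1 = threading.Thread(target=backend, args=([101, 205],))
t2 = threading.Thread(target=backend, args=([205, 101],))
t1.start(); t2.start()
t1.join(); t2.join()
```

Without the `sorted()`, the two threads could each hold one lock while waiting on the other — the deadlock observed in the test case.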
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On 1/9/14, 10:58 PM, Tom Lane wrote: Jim Nasby j...@nasby.net writes: ISTM that allowing users to pick arbitrary lower array bounds was a huge mistake. I've never seen anyone make use of it, can't think of any legitimate use cases for it, and hate the stupendous amount of extra code needed to deal with it. You lack imagination, sir.

Considering what you'd normally want to do in SQL, the only example I can think of is to not have the argument over 0 vs 1 based. Actually, I was thinking there might be some computational problems where changing the lower bound would be nice, but then again, what other languages actually support this?

-- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Sat, Jan 11, 2014 at 7:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: Although, having said that ... maybe build your own aggregate would be a reasonable suggestion for people who need this? I grant that it's going to be a minority requirement, maybe even a small minority requirement. People who have the chops to get this sort of thing right can probably manage a custom aggregate definition.

I more or less wrote off the idea of inverse transition functions for floats after your example upthread. I had thought that perhaps if we could get inverse transitions in there for SUM(numeric), then people who need more speed could just cast their value to numeric, then back to float or double precision after aggregation takes place.

I had to delay writing any documentation around that, as I'm still not sure if we can have sum(numeric) use an inverse transition function, due to the fact that it can introduce extra zeros after the decimal point. As the patch stands at the moment, I currently have a regression test which fails due to these extra zeros after the decimal point:

    -- This test currently fails due to extra trailing 0 digits.
    SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);

Patched produces:

    6.01
    5.00
    3.00

Unpatched produces:

    6.01
    5
    3

With inverse transitions this query still produces correct results, it just does not produce the numeric in the same format as it does without performing inverse transitions.

Personally I'd rather focus on trying to get SUM(numeric) in there for 9.4 and maybe look at the floating point stuff at a later date, as casting to numeric can be the workaround for users who complain about the speed. Or if they really want, they can create their own aggregate, using an existing built-in function as the inverse transition, like float8_mi.
There's certain things that currently seem a bit magical to me when it comes to numeric; for example, I've no idea why the following query produces 20 0's after the decimal point for 1 and only 16 for 2.

    select n::numeric / 1 from generate_series(1,2) g(n);

To me it does not look very consistent at all, and I'm really wondering if there is some special reason why we bother including the useless zeros at the end at all. I've written a patch which gets rid of them in numeric_out, but I had not planned on posting it here in case it gets laughed off stage due to some special reason we have for keeping those zeros that I don't know about. Can anyone explain to me why we have these unneeded zeros in numeric when the precision is not supplied?

Regards

David Rowley

The constraint this would pose on the float4 and float8 implementations is that it be possible to use their transition and final functions in a custom aggregate declaration while leaving off the inverse function; or, if that combination doesn't work for some reason, we have to continue to provide the previous transition/final functions for use in user aggregates. Suitable documentation would be needed too, of course.

regards, tom lane
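The trailing-zero behavior in the failing regression test falls out of decimal arithmetic keeping the larger scale of its operands, which Python's decimal module mirrors closely (my illustration, not PostgreSQL's numeric code). Running the test's values forward and then discarding 1.01 again leaves the extra scale behind:

```python
from decimal import Decimal

# The values from the regression test: (1.01, 2, 3).
values = [Decimal("1.01"), Decimal("2"), Decimal("3")]

total = sum(values)                       # scale 2, from the 1.01 input
after_discard = total - Decimal("1.01")   # subtraction keeps the larger scale

print(total)                              # 6.01
print(after_discard)                      # 5.00  <- not 5: the inverse
                                          #    transition "remembers" scale 2
print(Decimal("2") + Decimal("3"))        # 5     <- a rescan of the frame
                                          #    never sees 1.01 at all
```

That is exactly why the patched output shows 5.00 and 3.00 where the unpatched rescan shows 5 and 3: the discarded row's display scale survives in the running state.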
Re: [HACKERS] Time to do our Triage for 9.4
On Sat, Jan 11, 2014 at 8:28 AM, Josh Berkus j...@agliodbs.com wrote: All, To make this easier for everyone to participate in, I've created a wiki page: https://wiki.postgresql.org/wiki/9.4CF4Triage Please add the patches you know well to the appropriate list, thanks!

I know my own patch pretty well, and from my own point of view it's very close to being ready to go, but a good review may change that. Should we be waiting for second opinions, or can patch authors decide for themselves? Or were you talking only to committers?

Regards

David Rowley

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
On 1/10/14, 12:59 PM, Joshua D. Drake wrote: I know I am the one that instigated all of this, so I want to be very clear on what I expect and what I am confident that my customers would expect. If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g. slave goes down and we have to tell the master to continue). I have read through this thread more than once, and I have also gone back to the docs. I understand why we do it the way we do it. I also understand that, as a business requirement for 99% of CMD's customers, it's wrong. At least in the sense of providing continuity of service.

+1

I understand that this is a degradation of full-on sync rep. But there is definite value added with sync rep that can automatically (or at least easily) degrade to async; it protects you from single failures. I fully understand that it will not protect you from a double failure. That's OK in many cases.

Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Add CREATE support to event triggers
On 10 January 2014 18:17, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 10, 2014 at 12:59 PM, Simon Riggs si...@2ndquadrant.com wrote: That's project policy and always has been. When somebody implements 50% of a feature, or worse yet 95% of a feature, it violates the POLA for users and doesn't always subsequently get completed, leaving us with long-term warts that are hard to eliminate. So why was project policy violated when we released 9.3 with only DROP event support? Surely that was a worse violation of POLA than my suggestion? Well, obviously I didn't think so at the time, or I would have objected. I felt, and still feel, that implementing one kind of event trigger (drop) does not necessarily require implementing another kind (create). I think that's clearly different from implementing either one for only some object types. "This event trigger will be called whenever an object is dropped" is a reasonable contract with the user. "This other event trigger will be called whenever an object is created, unless it happens to be a schema" is much less reasonable. At least in my opinion.

In the fullness of time, I agree that is not a restriction we should maintain. Given that CREATE SCHEMA with multiple objects is less well used, it's a reasonable restriction to accept for one release, if the alternative is to implement nothing at all of value. Especially since we are now in the third year of development of this set of features, it is time to reduce the scope to ensure delivery. There may be other ways to ensure something of value is added; this was just one suggestion.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On 01/10/2014 04:26 PM, Jim Nasby wrote: On 1/9/14, 10:58 PM, Tom Lane wrote: Jim Nasby j...@nasby.net writes: ISTM that allowing users to pick arbitrary lower array bounds was a huge mistake. I've never seen anyone make use of it, can't think of any legitimate use cases for it, and hate the stupendous amount of extra code needed to deal with it. You lack imagination, sir. Considering what you'd normally want to do in SQL, the only example I can think of is to not have the argument over 0 vs 1 based. Actually, I was thinking there might be some computational problems where changing lower bound would be nice, but then again, what other languages actually support this?

Ada, for one. In fact, in Ada the index doesn't need to be an integer, just an enumerable type (e.g. an enum). You can iterate over one-dimensional arrays by saying:

    FOR i IN [REVERSE] my_array'range LOOP ...

cheers

andrew (who sadly hasn't used Ada in anger for about 20 years)
Re: [HACKERS] Standalone synchronous master
On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: On 01/10/2014 07:47 AM, Bruce Momjian wrote: I know there was a desire to remove this TODO item, but I think we have brought up enough new issues that we can keep it to see if we can come up with a solution. I have added a link to this discussion on the TODO item. I think we will need at least four new GUC variables:

    * timeout control for degraded mode
    * command to run during switch to degraded mode
    * command to run during switch from degraded mode
    * read-only variable to report degraded mode

I know I am the one that instigated all of this, so I want to be very clear on what I expect and what I am confident that my customers would expect. If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g. slave goes down and we have to tell the master to continue).

Would you please explain, as precisely as possible, what the advantages of using a synchronous standby would be in such a scenario?

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Standalone synchronous master
* Andres Freund (and...@2ndquadrant.com) wrote: On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g. slave goes down and we have to tell the master to continue). Would you please explain, as precisely as possible, what the advantages of using a synchronous standby would be in such a scenario?

In a degraded/failure state, things continue to *work*. In a non-degraded/failure state, you're able to handle a system failure and know that you didn't lose any transactions. Tom's point is correct, that you will fail on the "have two copies of everything" guarantee in this mode, but that could certainly be acceptable in the case where there is a system failure. As pointed out by someone previously, that's how RAID-1 works (which I imagine quite a few of us use). I've been thinking about this a fair bit and I've come to like the RAID-1 analogy. Stinks that we can't keep things going (automatically) if either side fails, but perhaps we will one day...

Thanks, Stephen
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Fri, Jan 10, 2014 at 03:26:04PM -0600, Jim Nasby wrote: On 1/9/14, 10:58 PM, Tom Lane wrote: Jim Nasby j...@nasby.net writes: ISTM that allowing users to pick arbitrary lower array bounds was a huge mistake. I've never seen anyone make use of it, can't think of any legitimate use cases for it, and hate the stupendous amount of extra code needed to deal with it. You lack imagination, sir. Considering what you'd normally want to do in SQL, the only example I can think of is to not have the argument over 0 vs 1 based. Actually, I was thinking there might be some computational problems where changing lower bound would be nice, but then again, what other languages actually support this?

Well, there's Perl, but that's not an argument *for* doing this, and I say that as a long-time Perl user (who has never seen this feature used in code worth not scrapping, by the way).

Cheers, David.
Re: [HACKERS] Disallow arrays with non-standard lower bounds
Gavin Flower gavinflo...@archidevsys.co.nz wrote: Starting arrays at zero makes the most sense, as then you can calculate the displacement simply as (index) * (size of entry), and not have to subtract one from the index first. This would be my preference.

The SQL standard explicitly specifies that array positions range from 1 to the cardinality of the array, with individual elements referenced by position. When implementing a language for which there is an international standard, my preference is to conform to the standard. I don't have a problem with extensions to the language, and a variable low bound is workable as an extension as long as the standard ways to create an array default to a low bound of 1.

A bigger problem with our array implementation is that it is really a multidimensional matrix, rather than an array which can contain nested arrays. That is both non-standard and limiting. That said, I think it would be nice to have better support for arrays defined with a single dimension and a low bound of 1, as the standard requires. Functions which throw an error when passed a non-conforming parameter and can provide a simplified API as a result would probably get used by me.

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Fri, Jan 10, 2014 at 1:26 PM, Jim Nasby j...@nasby.net wrote: On 1/9/14, 10:58 PM, Tom Lane wrote: Jim Nasby j...@nasby.net writes: ISTM that allowing users to pick arbitrary lower array bounds was a huge mistake. I've never seen anyone make use of it, can't think of any legitimate use cases for it, and hate the stupendous amount of extra code needed to deal with it. You lack imagination, sir. Considering what you'd normally want to do in SQL, the only example I can think of is to not have the argument over 0 vs 1 based. Actually, I was thinking there might be some computational problems where changing lower bound would be nice, but then again, what other languages actually support this? Perl does, though they regret it bitterly. Cheers, Jeff
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Fri, Jan 10, 2014 at 4:10 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Jan 10, 2014 at 1:26 PM, Jim Nasby j...@nasby.net wrote: On 1/9/14, 10:58 PM, Tom Lane wrote: Jim Nasby j...@nasby.net writes: ISTM that allowing users to pick arbitrary lower array bounds was a huge mistake. I've never seen anyone make use of it, can't think of any legitimate use cases for it, and hate the stupendous amount of extra code needed to deal with it. You lack imagination, sir. Considering what you'd normally want to do in SQL, the only example I can think of is to not have the argument over 0 vs 1 based. Actually, I was thinking there might be some computational problems where changing lower bound would be nice, but then again, what other languages actually support this? Perl does, though they regret it bitterly.

What does it matter? Our arrays have had the capability for years and years, and "because it's cleaner" is simply not justification to break people's applications. Why are we even considering this?

merlin
Re: [HACKERS] Standalone synchronous master
On 2014-01-10 17:02:08 -0500, Stephen Frost wrote: * Andres Freund (and...@2ndquadrant.com) wrote: On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g. slave goes down and we have to tell the master to continue). Would you please explain, as precisely as possible, what the advantages of using a synchronous standby would be in such a scenario? In a degraded/failure state, things continue to *work*. In a non-degraded/failure state, you're able to handle a system failure and know that you didn't lose any transactions.

Why do you know that you didn't lose any transactions? Trivial network hiccups, a restart of a standby, IO overload on the standby all can cause very short interruptions in the walsender connection - leading to degradation.

As pointed out by someone previously, that's how RAID-1 works (which I imagine quite a few of us use).

I don't think that argument makes much sense. RAID-1 isn't safe as-is. It's only safe if you use some sort of journaling or similar on top. If you issued a write during a crash, you normally will just get either the version from before or the version after the last write back, depending on the state of the individual disks and which disk is treated as authoritative by the raid software. And even if you disregard that, there's not much outside influence that can lead to losing the connection to a disk drive inside a raid, outside an actually broken drive. Any network connection is normally kept *outside* the level at which you build raids.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Standalone synchronous master
Andres, On Friday, January 10, 2014, Andres Freund wrote: On 2014-01-10 17:02:08 -0500, Stephen Frost wrote: * Andres Freund (and...@2ndquadrant.com) wrote: On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g. slave goes down and we have to tell the master to continue). Would you please explain, as precisely as possible, what the advantages of using a synchronous standby would be in such a scenario? In a degraded/failure state, things continue to *work*. In a non-degraded/failure state, you're able to handle a system failure and know that you didn't lose any transactions. Why do you know that you didn't lose any transactions? Trivial network hiccups, a restart of a standby, IO overload on the standby all can cause very short interruptions in the walsender connection - leading to degradation.

You know that you haven't *lost* any by virtue of the master still being up. The case you describe is a double-failure scenario: the link between the master and slave has to go away AND the master must accept a transaction and then fail independently.

As pointed out by someone previously, that's how RAID-1 works (which I imagine quite a few of us use). I don't think that argument makes much sense. RAID-1 isn't safe as-is. It's only safe if you use some sort of journaling or similar on top. If you issued a write during a crash you normally will just get either the version from before or the version after the last write back, depending on the state on the individual disks and which disk is treated as authoritative by the raid software.

Uh, you need a decent raid controller then, and we're talking about after a transaction commit/sync.
And even if you disregard that, there's not much outside influence that can lead to losing the connection to a disk drive inside a RAID other than an actually broken drive. Any network connection is normally kept *outside* the level at which you build RAIDs.

This is a fair point, and perhaps we should have the timeout or jitter GUC which was proposed elsewhere, but the notion that this configuration is completely unreasonable is not accurate, and therefore having it would be a benefit overall. Thanks, Stephen
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 01:49 PM, Andres Freund wrote: I know I am the one that instigated all of this, so I want to be very clear on what I expect and what I am confident that my customers would expect. If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g. slave goes down and we have to tell the master to continue). Would you please explain, as precisely as possible, what the advantages of using a synchronous standby would be in such a scenario?

Current behavior: db01 -> (sync) -> db02. Transactions are happening. Everything is happy. Website is up. Orders are being made. db02 goes down. It doesn't matter why. It is down. Because it is down, db01 for all intents and purposes is also down, because we are using sync replication. We have just lost continuity of service: we can no longer accept orders, we can no longer allow people to log into the website, we can no longer service accounts. In short, we are out of business.

Proposed behavior: db01 -> (sync) -> db02. Transactions are happening. Everything is happy. Website is up. Orders are being made. db02 goes down. It doesn't matter why. It is down. db01 continues to accept orders, allow people to log into the website, and we can still service accounts. Continuity of service continues. Yes, there are all kinds of things that need to be considered when that happens; that isn't the point. The point is, PostgreSQL continues its uptime guarantee and allows the business to continue to function as if nothing has happened.

For many, and I dare say the majority of, businesses, this is enough. They know that if the slave goes down they can continue to operate. They know if the master goes down they can fail over. They know that while both are up they are using sync rep (with various caveats). They are happy. They like that it is simple and just works. They continue to use PostgreSQL. Sincerely, JD -- Command Prompt, Inc.
- http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc In a time of universal deceit - telling the truth is a revolutionary act., George Orwell
Re: [HACKERS] Standalone synchronous master
On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote: db02 goes down. It doesn't matter why. It is down. db01 continues to accept orders, allow people to log into the website and we can still service accounts. The continuity of service continues.

The question is why that configuration is advantageous over an async configuration. Why, with those requirements, are you using a synchronous standby at all? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Jan 10, 2014 at 1:25 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: This is pretty much the same issue we discussed wrt. exclusion constraints. If the tuple being inserted conflicts with several existing tuples, what to do? I think the best answer would be to return and lock them all. It could still deadlock, but it's nevertheless less surprising behavior than returning one of the tuples at random. Actually, we could even avoid the deadlock by always locking the tuples in a certain order, although I'm not sure if it's worth the trouble.

I understand and accept that as long as we're intent on locking more than one row per transaction, that action could deadlock with another session doing something similar. Actually, I've even encountered people giving advice in relation to proprietary systems along the lines of: if your big SQL MERGE statement is deadlocking excessively, you might try hinting to make sure a nested loop join is used. I think that this kind of ugly compromise is unavoidable in those scenarios (in reality the most popular strategy is probably "cross your fingers"). But as everyone agrees, the common case where an xact only upserts one row should never deadlock with another, similar xact. So *that* isn't a problem I have with making row locking work for exclusion constraints.

My problem is that in general I'm not sold on the actual utility of making this kind of row locking work with exclusion constraints. I'm sincerely having a hard time thinking of a practical use-case (although, as I've said, I want to make it work with IGNORE). Even if you work all this row locking stuff out, and the spill-to-disk aspect out, the interface is still wrong, because you need to figure out a way to project more than one reject per slot. Maybe I lack imagination around how to make that work, but there are a lot of ifs and buts either way.
-- Peter Geoghegan
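The multi-conflict case Heikki describes is easy to reproduce with a range-overlap exclusion constraint. A hedged illustration (the table and values are hypothetical, chosen only to show the ambiguity):

```sql
-- A single inserted row can conflict with SEVERAL existing rows at
-- once under an exclusion constraint, so "return and lock the
-- conflicting tuple" is ambiguous for upsert-style semantics.
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- needed for "room WITH ="

CREATE TABLE reservations (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);

INSERT INTO reservations VALUES
    (1, '[2014-01-01 10:00, 2014-01-01 11:00)'),
    (1, '[2014-01-01 11:00, 2014-01-01 12:00)');

-- This row overlaps BOTH existing reservations; there is no single
-- "conflicting tuple" to return and lock:
INSERT INTO reservations VALUES
    (1, '[2014-01-01 10:30, 2014-01-01 11:30)');
```

With a plain unique index, by contrast, an inserted row can conflict with at most one existing row, which is why the unique-index upsert case avoids this question entirely.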
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 02:33 PM, Andres Freund wrote: On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote: db02 goes down. It doesn't matter why. It is down. db01 continues to accept orders, allow people to log into the website and we can still service accounts. The continuity of service continues. Why is that configuration advantageous over an async configuration is the question. Why, with those requirements, are you using a synchronous standby at all?

+1

-- Adrian Klaver adrian.kla...@gmail.com
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 02:33 PM, Andres Freund wrote: On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote: db02 goes down. It doesn't matter why. It is down. db01 continues to accept orders, allow people to log into the website and we can still service accounts. The continuity of service continues. Why is that configuration advantageous over an async configuration is the question. Why, with those requirements, are you using a synchronous standby at all?

If the master goes down, I can fail over knowing that as many of my transactions as possible have been replicated. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc In a time of universal deceit - telling the truth is a revolutionary act., George Orwell
Re: [HACKERS] Standalone synchronous master
Hi, On 2014-01-10 17:28:55 -0500, Stephen Frost wrote: How do you know that you didn't lose any transactions? Trivial network hiccups, a restart of a standby, IO overload on the standby all can cause very short interruptions in the walsender connection - leading to degradation. You know that you haven't *lost* any by virtue of the master still being up. The case you describe is a double-failure scenario - the link between the master and slave has to go away AND the master must accept a transaction and then fail independently.

Unfortunately, network outages do correlate with other system faults. What you're wishing for really is the "I like the world to be friendly to me" mode. Even if you have only disk problems, quite often when your disks die you can continue to write (especially with a BBU), but uncached reads fail. So the walsender connection errors out because a read failed, and you're degrading into async mode - *because* your primary is about to die.

As pointed out by someone previously, that's how RAID-1 works (which I imagine quite a few of us use). I don't think that argument makes much sense. RAID-1 isn't safe as-is. It's only safe if you use some sort of journaling or similar on top. If you issued a write during a crash you will normally get back either the version from before or the version after the last write, depending on the state of the individual disks and which disk is treated as authoritative by the RAID software. Uh, you need a decent RAID controller then and we're talking about after a transaction commit/sync.

Yes, if you have a BBU, that memory is authoritative in most cases. But in that case the argument of having two disks is pretty much pointless; the SPOF suddenly became the battery + RAM.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Standalone synchronous master
On Fri, Jan 10, 2014 at 2:33 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote: db02 goes down. It doesn't matter why. It is down. db01 continues to accept orders, allow people to log into the website and we can still service accounts. The continuity of service continues. Why is that configuration advantageous over an async configuration is the question.

Because it is orders of magnitude less likely to lose transactions that were reported to have been committed. A permanent failure of the master is almost guaranteed to lose transactions with async. With auto-degrade, a permanent failure of the master only loses reported-committed transactions if it co-occurs with a temporary failure of the replica or the network lasting longer than the timeout period.

Why, with those requirements, are you using a synchronous standby at all?

They aren't using a synchronous standby; they are using an asynchronous standby because we fail to provide the choice they prefer, which is a compromise between the two. Cheers, Jeff
Re: [HACKERS] Standalone synchronous master
On 2014-01-10 14:44:28 -0800, Joshua D. Drake wrote: On 01/10/2014 02:33 PM, Andres Freund wrote: On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote: db02 goes down. It doesn't matter why. It is down. db01 continues to accept orders, allow people to log into the website and we can still service accounts. The continuity of service continues. Why is that configuration advantageous over an async configuration is the question. Why, with those requirements, are you using a synchronous standby at all? If the master goes down, I can fail over knowing that as many of my transactions as possible have been replicated.

It's not like async replication mode delays sending data to the standby in any way. Really, the commits themselves are sent to the server at exactly the same speed independent of sync/async. The only thing that's delayed is the *notification* of the client that sent the commit. Not the commit itself. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
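Andres's point is visible in the primary's configuration: sync and async ship the same WAL stream, and the only difference is whether COMMIT waits for the standby's acknowledgment. A minimal sketch (the standby name `standby1` is illustrative):

```
# postgresql.conf on the primary -- illustrative values
synchronous_standby_names = 'standby1'  # sync: COMMIT waits for this standby's ack
# synchronous_standby_names = ''        # async: identical WAL stream, no wait
synchronous_commit = on
```

So failing over from an async standby loses at most the commits whose WAL had not yet been streamed, not commits that were somehow never sent.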
Re: [HACKERS] Standalone synchronous master
Greetings, On Friday, January 10, 2014, Andres Freund wrote: Hi, On 2014-01-10 17:28:55 -0500, Stephen Frost wrote: How do you know that you didn't lose any transactions? Trivial network hiccups, a restart of a standby, IO overload on the standby all can cause very short interruptions in the walsender connection - leading to degradation. You know that you haven't *lost* any by virtue of the master still being up. The case you describe is a double-failure scenario - the link between the master and slave has to go away AND the master must accept a transaction and then fail independently. Unfortunately, network outages do correlate with other system faults. What you're wishing for really is the "I like the world to be friendly to me" mode. Even if you have only disk problems, quite often when your disks die you can continue to write (especially with a BBU), but uncached reads fail. So the walsender connection errors out because a read failed, and you're degrading into async mode - *because* your primary is about to die.

That can happen, sure, but I don't agree that people using a single drive with a BBU, or having two drives in a RAID-1 die at the same time, are reasonable arguments against this option. Not to mention that, today, if the master has an issue then we're SOL anyway. Also, if the network fails then likely there aren't any new transactions happening.

As pointed out by someone previously, that's how RAID-1 works (which I imagine quite a few of us use). I don't think that argument makes much sense. RAID-1 isn't safe as-is. It's only safe if you use some sort of journaling or similar on top. If you issued a write during a crash you will normally get back either the version from before or the version after the last write, depending on the state of the individual disks and which disk is treated as authoritative by the RAID software. Uh, you need a decent RAID controller then and we're talking about after a transaction commit/sync.
Yes, if you have a BBU, that memory is authoritative in most cases. But in that case the argument of having two disks is pretty much pointless; the SPOF suddenly became the battery + RAM. If that is a concern then use multiple controllers. Certainly not unheard of - look at SANs... Thanks, Stephen
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 02:47 PM, Andres Freund wrote: Really, the commits themselves are sent to the server at exactly the same speed independent of sync/async. The only thing that's delayed is the *notification* of the client that sent the commit. Not the commit itself.

Which is irrelevant to the point that if the standby goes down, we are now out of business. Any continuous replication should not be a SPOF. The current behavior guarantees that a two-node sync cluster is a SPOF. The proposed behavior removes that. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc In a time of universal deceit - telling the truth is a revolutionary act., George Orwell
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 02:57 PM, Stephen Frost wrote: Yes, if you have a BBU that memory is authoritative in most cases. But in that case the argument of having two disks is pretty much pointless, the SPOF suddenly became the battery + ram. If that is a concern then use multiple controllers. Certainly not unheard of- look at SANs... And in PostgreSQL we obviously have the option of having a third or fourth standby but that isn't the problem we are trying to solve. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc In a time of universal deceit - telling the truth is a revolutionary act., George Orwell
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 11:59 PM, Joshua D. Drake wrote: On 01/10/2014 02:57 PM, Stephen Frost wrote: Yes, if you have a BBU that memory is authoritative in most cases. But in that case the argument of having two disks is pretty much pointless, the SPOF suddenly became the battery + ram. If that is a concern then use multiple controllers. Certainly not unheard of- look at SANs... And in PostgreSQL we obviously have the option of having a third or fourth standby but that isn't the problem we are trying to solve.

The problem you are trying to solve is a controller with enough battery-backed cache RAM to cache the entire database, but with write-through mode. And you want it to degrade to write-back in case of disk failure so that you can continue while the disk is broken. People here are telling you that it would not be safe; use at least RAID-1 if you want availability. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 02:59 PM, Joshua D. Drake wrote: On 01/10/2014 02:47 PM, Andres Freund wrote: Really, the commits themselves are sent to the server at exactly the same speed independent of sync/async. The only thing that's delayed is the *notification* of the client that sent the commit. Not the commit itself. Which is irrelevant to the point that if the standby goes down, we are now out of business. Any continuous replication should not be a SPOF. The current behavior guarantees that a two-node sync cluster is a SPOF. The proposed behavior removes that.

Again, if that's your goal, then use async replication. I really don't understand the use-case here. The purpose of sync rep is to know definitively whether or not you have lost data when disaster strikes. If knowing for certain isn't important to you, then use async.

BTW, people are using RAID-1 as an analogy to 2-node sync replication. That's a very bad analogy, because in RAID-1 you have a *single* controller which is capable of determining if the disks are in a failed state or not, and this is all happening on a single node where things like network outages aren't a consideration. It's really not the same situation at all. Also, frankly, I absolutely can't count the number of times I've had to rescue a customer or family member who had RAID-1 but wasn't monitoring syslog, and so one of their disks had been down for months without them knowing it. Heck, I've done this myself.

So ... the filesystem geeks have already been through this. Filesystem clustering started out with systems like DRBD, which includes an auto-degrade option. However, DRBD with auto-degrade is widely considered untrustworthy, and that is a significant portion of why DRBD isn't trusted today. From here, clustered filesystems went in two directions: RHCS added layers of monitoring and management to make auto-degrade a safer option than it is with DRBD (and still not the default option).
Scalable clustered filesystems added N(M) quorum commit in order to support more than 2 nodes. Either of these courses is reasonable for us to pursue. What's a bad idea is adding an auto-degrade option without any tools to manage and monitor it, which is what this patch does by my reading. If I'm wrong, then someone can point it out to me. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
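The kind of monitoring Josh is asking for is at least partially possible from the primary today. An external framework could, for example, poll `pg_stat_replication` to help distinguish "replica down" from "replica merely slow" before deciding to degrade. A sketch, not a complete monitoring solution (column names are from the 9.x-era view):

```sql
-- Run on the primary: one row per connected walsender.
-- A missing row means the standby is disconnected; a growing gap
-- between sent_location and replay_location means it is lagging.
SELECT application_name, client_addr, state, sync_state,
       sent_location, flush_location, replay_location
FROM pg_stat_replication;
```

Deciding how long a standby may be absent or lagging before degrading is exactly the policy question such a framework would have to answer.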
Re: [HACKERS] Add CREATE support to event triggers
Can we please stop arguing over a problem I don't have? I started with CREATE SCHEMA because it is one of the easy cases, not because it was the most difficult case: we only need to deparse the bits of it that don't involve the objects within, because those are reported by the event trigger as separate commands. Each object gets its own creation command, which works pretty nicely. Of course, the deparsed version of the command will not look very much like what was submitted by the user, but that doesn't matter -- what does matter is that the objects created by running the commands reported in the event trigger will be (or should be) the same as those created by the original command. I was showing CREATE SCHEMA as a way to discuss the fine details: how to report identifiers that might need quoting, what to do with optional clauses (AUTHORIZATION), etc. I am past that now. On the subject of testing the triggers, Robert Haas wrote: Here's one idea: create a contrib module that (somehow, via APIs to be invented) runs every DDL command that gets executed through the deparsing code, and then parses the result and executes *that* instead of the original command. Then, add a build target that runs the regression test suite in that mode, and get the buildfarm configured to run that build target regularly on at least some machines. That way, adding syntax to the regular regression test suite also serves to test that the deparsing logic for that syntax is working. If we do this, there's still some maintenance burden associated with having DDL deparsing code, but at least our chances of noticing when we've failed to maintain it should be pretty good. I gave this some more thought and hit a snag. The problem here is that by the time the event trigger runs, the original object has already been created. At that point, we can't simply replace the created objects with objects that would hypothetically be created by a command trigger. A couple of very hand-wavy ideas: 1. 
in the event trigger, DROP the original object and CREATE it as reported by the creation_commands SRF. 2. Have ddl_command_start open a savepoint, and then roll it back in ddl_command_end, then create the object again. Not sure this is doable because of the whole SPI nesting issue .. maybe with C-language event trigger functions? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
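For readers following along, the testing idea might look roughly like the sketch below. This is written against the interface proposed in this thread: the SRF name `pg_event_trigger_get_creation_commands()` and a per-row deparsed `command` field are assumptions taken from the patch under discussion, not a committed API, and may well differ in the final version.

```sql
-- Hedged sketch: log each deparsed creation command from an event
-- trigger, so a test harness could re-execute and compare results.
CREATE FUNCTION log_creation_commands() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    -- Assumed patch API: one row per command reported by the trigger.
    FOR r IN SELECT * FROM pg_event_trigger_get_creation_commands()
    LOOP
        RAISE NOTICE 'deparsed: %', r.command;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER show_creates
    ON ddl_command_end
    EXECUTE PROCEDURE log_creation_commands();
```

The snag Álvaro describes is visible here: by the time `ddl_command_end` fires, the original object already exists, so the harness cannot simply execute the deparsed command in its place without first dropping or rolling back the original.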
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 01:49 PM, Andres Freund wrote: On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: On 01/10/2014 07:47 AM, Bruce Momjian wrote: I know there was a desire to remove this TODO item, but I think we have brought up enough new issues that we can keep it to see if we can come up with a solution. I have added a link to this discussion on the TODO item. I think we will need at least four new GUC variables: * timeout control for degraded mode * command to run during switch to degraded mode * command to run during switch from degraded mode * read-only variable to report degraded mode

I would argue that we don't need the first. We just want a command to switch between synchronous and degraded, and a variable (or function) to report on degraded mode. If we have those things, then it becomes completely possible to have an external monitoring framework which is capable of answering questions like "is the replica down or just slow?" and of controlling degradation.

Oh, wait! We DO have such a command. It's called ALTER SYSTEM SET, recently committed. So this is really a solvable issue if one is willing to use an external utility. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
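Concretely, the external-utility approach Josh describes could work along these lines (the standby name `standby1` is hypothetical; the point is only the mechanism):

```sql
-- An external monitor, having decided the sync standby is really down
-- rather than merely slow, degrades the master to async:
ALTER SYSTEM SET synchronous_standby_names = '';
SELECT pg_reload_conf();

-- Later, once the standby has reconnected and caught up, it restores
-- synchronous replication:
ALTER SYSTEM SET synchronous_standby_names = 'standby1';
SELECT pg_reload_conf();
```

Since `synchronous_standby_names` only requires a reload, not a restart, the switch in either direction does not interrupt running sessions.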
Re: [HACKERS] Time to do our Triage for 9.4
On 01/10/2014 01:34 PM, David Rowley wrote: On Sat, Jan 11, 2014 at 8:28 AM, Josh Berkus j...@agliodbs.com wrote: All, To make this easier for everyone to participate in, I've created a wiki page: https://wiki.postgresql.org/wiki/9.4CF4Triage Please add the patches you know well to the appropriate list, thanks! I know my own patch pretty well, and from my own point of view it's very close to being ready to go, but a good review may change that. Should we be waiting for 2nd opinions, or can patch authors decide for themselves? Or were you talking only to committers?

Well, I'd prefer that someone other than the patch author assess the patch state; the author is going to be characteristically optimistic. However, it's a wiki. If you put it under "good to go", someone else who disagrees can move it. IMHO, if the patch hasn't had at least one review yet (in a prior CF), though, I'd put it under "Nearly Ready". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Standalone synchronous master
On 01/10/2014 03:17 PM, Josh Berkus wrote: Any continuous replication should not be a SPOF. The current behavior guarantees that a two node sync cluster is a SPOF. The proposed behavior removes that. Again, if that's your goal, then use async replication.

I think I have gone about this the wrong way. Async does not meet the technical or business requirements that I have. Sync does, except that it increases the possibility of an outage. That is the requirement I am trying to address.

The purpose of sync rep is to know definitively whether or not you have lost data when disaster strikes. If knowing for certain isn't important to you, then use async.

PostgreSQL sync replication increases the possibility of an outage. That is incorrect behavior. I want sync because, on the chance that the master goes down, I have as much data as possible to fail over to. However, I can't use sync because it increases the possibility that my business will not be able to function on the chance that the standby goes down.

What's a bad idea is adding an auto-degrade option without any tools to manage and monitor it, which is what this patch does by my reading.

This we absolutely agree on. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc In a time of universal deceit - telling the truth is a revolutionary act., George Orwell