valid functions can be written for reduce and merge when the zero is null,
so not being able to provide null as the initial value is somewhat
troublesome.
i guess the proper way to do this is to use Option, and have None be the
zero, which is what i assumed you did?
unfortunately last time i
Thanks for pointing that out, Koert!
I understand now why it's zero() and not init(a: IN), though I still don't
see a good reason to skip the aggregation if zero returns null.
If the user did that, it's on him to take care of null cases in
reduce/merge, but it opens up the possibility of using the input to
initialize the aggregation.
it's the difference between a semigroup and a monoid, and yes, max does not
easily fit into a monoid.
see also discussion here:
https://issues.apache.org/jira/browse/SPARK-15598
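something like this is what i had in mind - a rough sketch against the 2.0
Aggregator API (the name, and the kryo encoders, are mine, just for
illustration):

import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

// max on Long is only a semigroup, but Option lifts it into a monoid:
// None is a true identity, so merge(b, zero) == b holds for every buffer.
object OptionMax extends Aggregator[Long, Option[Long], Option[Long]] {
  def zero: Option[Long] = None
  def reduce(b: Option[Long], a: Long): Option[Long] =
    Some(b.fold(a)(math.max(_, a)))
  def merge(b1: Option[Long], b2: Option[Long]): Option[Long] =
    (b1, b2) match {
      case (Some(x), Some(y)) => Some(math.max(x, y))
      case (x, y)             => x.orElse(y)
    }
  def finish(r: Option[Long]): Option[Long] = r
  // kryo keeps the sketch self-contained; a purpose-built encoder for
  // Option[Long] would be more efficient.
  def bufferEncoder: Encoder[Option[Long]] = Encoders.kryo[Option[Long]]
  def outputEncoder: Encoder[Option[Long]] = Encoders.kryo[Option[Long]]
}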
On Mon, Jun 27, 2016 at 3:19 AM, Amit Sela wrote:
OK. I see that, but the current (provided) implementations are very naive -
Sum, Count, Average. Let's take Max for example: I guess zero() would be
set to some value like Long.MIN_VALUE, but what if you trigger (I assume in
the future Spark Streaming will support time-based triggers) for a result
before any input has arrived? You'd emit Long.MIN_VALUE as if it were a
real max.
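Roughly what I have in mind (a sketch against the 2.0 Aggregator API, the
name is mine):

import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

// A sentinel zero is a valid identity for max, but it leaks into results:
// an aggregation over zero elements reports Long.MinValue as the "max".
object NaiveMax extends Aggregator[Long, Long, Long] {
  def zero: Long = Long.MinValue
  def reduce(b: Long, a: Long): Long = math.max(b, a)
  def merge(b1: Long, b2: Long): Long = math.max(b1, b2)
  def finish(r: Long): Long = r
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}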
No, the TypedAggregateExpression that uses Aggregator#zero differs between
v2.0 and v1.6.
v2.0:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala#L91
v1.6:
This "if (value == null)" condition you point to exists in 1.6 branch as
well, so that's probably not the reason.
On Sun, Jun 26, 2016 at 1:53 PM Takeshi Yamamuro wrote:
Whatever it is, this is expected; if an initial value is null, Spark
codegen removes all the aggregates.
See:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L199
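For example, a sketch of the kind of aggregator that hits this
(illustrative, not from your code):

import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

// zero is null, so the initial buffer becomes a null Literal in the plan,
// and codegen folds the aggregate away instead of running reduce/merge.
object NullZeroMax extends Aggregator[Long, java.lang.Long, java.lang.Long] {
  def zero: java.lang.Long = null
  def reduce(b: java.lang.Long, a: Long): java.lang.Long =
    if (b == null) a else math.max(b, a)
  def merge(b1: java.lang.Long, b2: java.lang.Long): java.lang.Long =
    if (b1 == null) b2 else if (b2 == null) b1 else math.max(b1, b2)
  def finish(r: java.lang.Long): java.lang.Long = r
  def bufferEncoder: Encoder[java.lang.Long] = Encoders.LONG
  def outputEncoder: Encoder[java.lang.Long] = Encoders.LONG
}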
// maropu
On Sun, Jun 26, 2016 at 7:46 PM, Amit Sela wrote:
Not sure what the rule is in the case of `b + null = null`, but the same
code works perfectly in 1.6.1, just tried it.
On Sun, Jun 26, 2016 at 1:24 PM Takeshi Yamamuro wrote:
Hi,
This behaviour seems to be expected because you must ensure `b + zero() = b`.
In your case, `b + null = null` breaks this rule.
This is the same in v1.6.1.
See:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala#L57
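As a quick check of the contract (illustrative; `agg` is any Aggregator):

import org.apache.spark.sql.expressions.Aggregator

// The identity law an Aggregator must satisfy for any buffer b.
def obeysIdentityLaw[IN, BUF, OUT](agg: Aggregator[IN, BUF, OUT], b: BUF): Boolean =
  agg.merge(b, agg.zero) == b && agg.merge(agg.zero, b) == b

// With zero == null and SQL-style merge (b + null = null),
// merge(b, zero) is null, not b, so the law breaks.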
// maropu