Whatever it is, this is expected; if an initial value is null, Spark codegen removes all the aggregates. See:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L199

// maropu
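Concretely, a minimal sketch of an aggregator whose zero() is a true identity for reduce, so that `b + zero() = b` holds and codegen does not remove the aggregate (written against the Spark 2.0 Java API; the class name is only illustrative):

    import org.apache.spark.sql.Encoder;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.expressions.Aggregator;
    import scala.Tuple2;

    // Sums the Integer values per key; zero() returns 0 (the identity
    // of +), never null, so the aggregate is not folded away.
    public static class SumAgg extends Aggregator<Tuple2<String, Integer>, Integer, Integer> {

        @Override
        public Integer zero() {
            return 0;  // non-null identity: b + zero() = b
        }

        @Override
        public Integer reduce(Integer b, Tuple2<String, Integer> a) {
            return b + a._2();  // no null checks needed once zero() is 0
        }

        @Override
        public Integer merge(Integer b1, Integer b2) {
            return b1 + b2;
        }

        @Override
        public Integer finish(Integer reduction) {
            return reduction;
        }

        // Encoders are required by the Aggregator API on master (2.0).
        @Override
        public Encoder<Integer> bufferEncoder() {
            return Encoders.INT();
        }

        @Override
        public Encoder<Integer> outputEncoder() {
            return Encoders.INT();
        }
    }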
On Sun, Jun 26, 2016 at 7:46 PM, Amit Sela <amitsel...@gmail.com> wrote:

> Not sure what the rule is in the case of `b + null = null`, but the same
> code works perfectly in 1.6.1; just tried it.
>
> On Sun, Jun 26, 2016 at 1:24 PM Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> This behaviour seems to be expected, because you must ensure
>> `b + zero() = b`. Your case, `b + null = null`, breaks this rule.
>> This is the same in v1.6.1.
>> See:
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala#L57
>>
>> // maropu
>>
>>
>> On Sun, Jun 26, 2016 at 6:06 PM, Amit Sela <amitsel...@gmail.com> wrote:
>>
>>> Sometimes the BUF for the aggregator may depend on the actual input,
>>> and while this passes the responsibility for handling null in
>>> merge/reduce to the developer, that sounds fine to me if he is the one
>>> who put null in zero() anyway.
>>> Now it seems that the aggregation is skipped entirely when
>>> zero() = null. I'm not sure if that was the behaviour in 1.6.
>>>
>>> Is this behaviour wanted?
>>>
>>> Thanks,
>>> Amit
>>>
>>> Aggregator example:
>>>
>>> public static class Agg extends Aggregator<Tuple2<String, Integer>,
>>>         Integer, Integer> {
>>>
>>>     @Override
>>>     public Integer zero() {
>>>         return null;  // null initial value: the subject of this thread
>>>     }
>>>
>>>     @Override
>>>     public Integer reduce(Integer b, Tuple2<String, Integer> a) {
>>>         if (b == null) {
>>>             b = 0;
>>>         }
>>>         return b + a._2();
>>>     }
>>>
>>>     @Override
>>>     public Integer merge(Integer b1, Integer b2) {
>>>         if (b1 == null) {
>>>             return b2;
>>>         } else if (b2 == null) {
>>>             return b1;
>>>         } else {
>>>             return b1 + b2;
>>>         }
>>>     }
>>>
>>>     @Override
>>>     public Integer finish(Integer reduction) {
>>>         return reduction;
>>>     }
>>> }
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>

--
---
Takeshi Yamamuro
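For reference, a usage sketch for the non-null-zero SumAgg above (names and sample data are illustrative; assumes a Spark 2.0 SparkSession):

    import java.util.Arrays;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;
    import scala.Tuple2;

    SparkSession spark = SparkSession.builder().appName("agg-sketch").getOrCreate();
    Dataset<Tuple2<String, Integer>> ds = spark.createDataset(
        Arrays.asList(new Tuple2<>("a", 1), new Tuple2<>("a", 2), new Tuple2<>("b", 3)),
        Encoders.tuple(Encoders.STRING(), Encoders.INT()));

    // Group by the String key and apply the typed aggregator per key.
    Dataset<Tuple2<String, Integer>> sums = ds
        .groupByKey((MapFunction<Tuple2<String, Integer>, String>) t -> t._1(),
                    Encoders.STRING())
        .agg(new SumAgg().toColumn());
    sums.show();  // with zero() = null, the aggregate would be removed instead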