Whatever it is, this is expected; if an initial value is null, Spark codegen removes all the aggregates. See:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L199

// maropu
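Concretely, a minimal sketch of an aggregator whose zero() is a true identity for reduce, so that `b + zero() = b` holds and codegen does not remove the aggregate (written against the Spark 2.0 Java API; the class name is only illustrative):

    import org.apache.spark.sql.Encoder;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.expressions.Aggregator;
    import scala.Tuple2;

    // Sums the Integer values per key; zero() returns 0 (the identity
    // of +), never null, so the aggregate is not folded away.
    public static class SumAgg extends Aggregator<Tuple2<String, Integer>, Integer, Integer> {

        @Override
        public Integer zero() {
            return 0;  // non-null identity: b + zero() = b
        }

        @Override
        public Integer reduce(Integer b, Tuple2<String, Integer> a) {
            return b + a._2();  // no null checks needed once zero() is 0
        }

        @Override
        public Integer merge(Integer b1, Integer b2) {
            return b1 + b2;
        }

        @Override
        public Integer finish(Integer reduction) {
            return reduction;
        }

        // Encoders are required by the Aggregator API on master (2.0).
        @Override
        public Encoder<Integer> bufferEncoder() {
            return Encoders.INT();
        }

        @Override
        public Encoder<Integer> outputEncoder() {
            return Encoders.INT();
        }
    }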
On Sun, Jun 26, 2016 at 7:46 PM, Amit Sela <amitsel...@gmail.com> wrote:

> Not sure what the rule is in the case of `b + null = null`, but the same
> code works perfectly in 1.6.1; just tried it.
>
> On Sun, Jun 26, 2016 at 1:24 PM Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> This behaviour seems to be expected, because you must ensure
>> `b + zero() = b`. Your case, `b + null = null`, breaks this rule.
>> This is the same in v1.6.1.
>> See:
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala#L57
>>
>> // maropu
>>
>>
>> On Sun, Jun 26, 2016 at 6:06 PM, Amit Sela <amitsel...@gmail.com> wrote:
>>
>>> Sometimes the BUF for the aggregator may depend on the actual input,
>>> and while this passes the responsibility for handling null in
>>> merge/reduce to the developer, that sounds fine to me if he is the one
>>> who put null in zero() anyway.
>>> Now it seems that the aggregation is skipped entirely when
>>> zero() = null. I'm not sure if that was the behaviour in 1.6.
>>>
>>> Is this behaviour wanted?
>>>
>>> Thanks,
>>> Amit
>>>
>>> Aggregator example:
>>>
>>> public static class Agg extends Aggregator<Tuple2<String, Integer>,
>>>         Integer, Integer> {
>>>
>>>     @Override
>>>     public Integer zero() {
>>>         return null;  // null initial value: the subject of this thread
>>>     }
>>>
>>>     @Override
>>>     public Integer reduce(Integer b, Tuple2<String, Integer> a) {
>>>         if (b == null) {
>>>             b = 0;
>>>         }
>>>         return b + a._2();
>>>     }
>>>
>>>     @Override
>>>     public Integer merge(Integer b1, Integer b2) {
>>>         if (b1 == null) {
>>>             return b2;
>>>         } else if (b2 == null) {
>>>             return b1;
>>>         } else {
>>>             return b1 + b2;
>>>         }
>>>     }
>>>
>>>     @Override
>>>     public Integer finish(Integer reduction) {
>>>         return reduction;
>>>     }
>>> }
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>

--
---
Takeshi Yamamuro
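For reference, a usage sketch for the non-null-zero SumAgg above (names and sample data are illustrative; assumes a Spark 2.0 SparkSession):

    import java.util.Arrays;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;
    import scala.Tuple2;

    SparkSession spark = SparkSession.builder().appName("agg-sketch").getOrCreate();
    Dataset<Tuple2<String, Integer>> ds = spark.createDataset(
        Arrays.asList(new Tuple2<>("a", 1), new Tuple2<>("a", 2), new Tuple2<>("b", 3)),
        Encoders.tuple(Encoders.STRING(), Encoders.INT()));

    // Group by the String key and apply the typed aggregator per key.
    Dataset<Tuple2<String, Integer>> sums = ds
        .groupByKey((MapFunction<Tuple2<String, Integer>, String>) t -> t._1(),
                    Encoders.STRING())
        .agg(new SumAgg().toColumn());
    sums.show();  // with zero() = null, the aggregate would be removed instead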