Hi,

This behaviour seems to be expected: an Aggregator must ensure that `b + zero() = b` holds for every buffer `b`. In your case, `b + null = null` breaks this rule. The behaviour is the same in v1.6.1. See:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala#L57
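If you want the aggregation to run, the usual fix is to give zero() a real identity element instead of null. Below is a minimal sketch of the same sum aggregator rewritten that way (assuming the Spark 2.x Aggregator API with finish/bufferEncoder/outputEncoder; the class name SumAgg is just for illustration):

import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.expressions.Aggregator;
import scala.Tuple2;

public class SumAgg extends Aggregator<Tuple2<String, Integer>, Integer, Integer> {

  @Override
  public Integer zero() {
    // 0 is a true identity: b + 0 == b for every buffer b,
    // so the aggregation is never skipped and no null checks are needed.
    return 0;
  }

  @Override
  public Integer reduce(Integer b, Tuple2<String, Integer> a) {
    return b + a._2();
  }

  @Override
  public Integer merge(Integer b1, Integer b2) {
    return b1 + b2;
  }

  @Override
  public Integer finish(Integer reduction) {
    return reduction;
  }

  @Override
  public Encoder<Integer> bufferEncoder() {
    return Encoders.INT();
  }

  @Override
  public Encoder<Integer> outputEncoder() {
    return Encoders.INT();
  }
}

You can then apply it to a typed Dataset via new SumAgg().toColumn().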
// maropu

On Sun, Jun 26, 2016 at 6:06 PM, Amit Sela <amitsel...@gmail.com> wrote:
> Sometimes, the BUF for the aggregator may depend on the actual input, and
> while this passes the responsibility for handling null in merge/reduce to the
> developer, that sounds fine to me if he is the one who put null in zero()
> anyway.
> Now, it seems that the aggregation is skipped entirely when zero() =
> null. Not sure if that was the behaviour in 1.6.
>
> Is this behaviour wanted?
>
> Thanks,
> Amit
>
> Aggregator example:
>
> public static class Agg extends Aggregator<Tuple2<String, Integer>, Integer, Integer> {
>
>   @Override
>   public Integer zero() {
>     return null;
>   }
>
>   @Override
>   public Integer reduce(Integer b, Tuple2<String, Integer> a) {
>     if (b == null) {
>       b = 0;
>     }
>     return b + a._2();
>   }
>
>   @Override
>   public Integer merge(Integer b1, Integer b2) {
>     if (b1 == null) {
>       return b2;
>     } else if (b2 == null) {
>       return b1;
>     } else {
>       return b1 + b2;
>     }
>   }
> }

--
---
Takeshi Yamamuro