Hi,

I've recently noticed a bug in Spark (branch 1.6) that can be reproduced as follows.
Start with some DataFrame called df.

1) Aggregate multiple columns of df and store the result as result_agg_1.
2) Run another aggregation of multiple columns, but with one fewer grouping column, and store the result as result_agg_2.
3) Align the result of the second aggregation by adding the missing grouping column with an empty value, lit("").
4) Union result_agg_1 and result_agg_2.
5) Project every aggregated column from "sum(count_column)" to "count_column".

The result is a structurally inconsistent DataFrame in which all the data coming from result_agg_1 is shifted.

A stripped-down code example and an example result can be seen here:
https://gist.github.com/xjrk58/e0c7171287ee9bdc8df8
https://gist.github.com/xjrk58/7a297a42ebb94f300d96

Best,
Jiri Syrovy
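
For readers who prefer not to open the gists, steps 1) to 5) above can be sketched roughly as follows. This is a minimal PySpark sketch and not the code from the gists. The column names group_1, group_2 and other_column, the sample rows, and the helpers sum_name and reproduce are hypothetical. It assumes the Spark 1.6-era API (a SQLContext and DataFrame.unionAll), and it may or may not trigger the shift described above.

```python
def sum_name(column):
    """Return the name Spark gives to sum(column), e.g. 'sum(count_column)'."""
    return "sum({})".format(column)


def reproduce(sql_context):
    """Run steps 1) to 5) on a small hypothetical DataFrame and return the result."""
    # Imported here so that sum_name() stays usable without a Spark installation.
    from pyspark.sql import functions as F

    # Hypothetical input: two grouping columns and two columns to aggregate.
    df = sql_context.createDataFrame(
        [("a", "x", 1, 10), ("a", "y", 2, 20), ("b", "x", 3, 30)],
        ["group_1", "group_2", "count_column", "other_column"],
    )
    metrics = ["count_column", "other_column"]
    sums = [F.sum(m) for m in metrics]

    # 1) Aggregate over both grouping columns.
    result_agg_1 = df.groupBy("group_1", "group_2").agg(*sums)

    # 2) Aggregate again with one fewer grouping column.
    result_agg_2 = df.groupBy("group_1").agg(*sums)

    # 3) Add the missing grouping column as an empty literal and restore
    #    the column order of result_agg_1 (unionAll matches by position).
    ordered = ["group_1", "group_2"] + [sum_name(m) for m in metrics]
    result_agg_2 = (
        result_agg_2
        .withColumn("group_2", F.lit(""))
        .select(*[F.col(name) for name in ordered])
    )

    # 4) Union both results.
    unioned = result_agg_1.unionAll(result_agg_2)

    # 5) Project "sum(<column>)" back to "<column>" for every aggregated column.
    projection = [F.col("group_1"), F.col("group_2")]
    projection += [F.col(sum_name(m)).alias(m) for m in metrics]
    return unioned.select(*projection)
```

Calling reproduce(sqlContext).show() in a pyspark shell should print the unioned result, which can then be compared against the example output linked above.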