Hi Jiří,

Thanks for your mail.

Could you create a JIRA ticket for this in the Spark JIRA?

https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

Kind regards,

Herman van Hövell


2016-02-26 15:11 GMT+01:00 Jiří Syrový <syrovy.j...@gmail.com>:

> Hi,
>
> I've recently noticed a bug in Spark (branch 1.6) that appears if you do
> the following:
>
> Suppose we have a DataFrame called df.
>
> 1) Aggregate multiple columns of the DataFrame df and store the result as
> result_agg_1.
> 2) Do another aggregation of multiple columns, but with one grouping
> column fewer, and store the result as result_agg_2.
> 3) Align the result of the second aggregation by adding the missing
> grouping column with an empty value, lit("").
> 4) Union result_agg_1 and result_agg_2.
> 5) Project "sum(count_column)" to "count_column" for all aggregated
> columns.
>
> The result is a structurally inconsistent DataFrame in which all the data
> coming from result_agg_1 is shifted.
>
> A stripped-down code example and its output can be seen here:
>
> https://gist.github.com/xjrk58/e0c7171287ee9bdc8df8
> https://gist.github.com/xjrk58/7a297a42ebb94f300d96
>
> Best,
> Jiri Syrovy
>
>
