GitHub user liancheng commented on the issue:

https://github.com/apache/spark/pull/20174

@mgaido91 We can't, because we do not know whether there are any input rows. For example:

```scala
val df1 = spark.range(10).select()
val df2 = spark.range(10).filter($"id" < 0).select()
val df3 = df1.dropDuplicates()
val df4 = df2.dropDuplicates()
```

`df1` has zero columns and ten rows, while `df2` has zero columns and zero rows. Therefore, `df3` should return one row containing zero columns, while `df4` should return zero rows.
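The same reasoning can be sketched without Spark, using plain Scala collections. This is only an illustrative model, not Spark's implementation: the `ZeroColumnDedup` object, the `Row` alias, and the local `dropDuplicates` helper are hypothetical names, and each row is assumed to be a `Vector` of its column values, so a zero-column row is the empty `Vector`.

```scala
object ZeroColumnDedup {
  // Hypothetical model: a row is the vector of its column values.
  type Row = Vector[Any]

  // Deduplication keeps one copy of each distinct row.
  def dropDuplicates(rows: Seq[Row]): Seq[Row] = rows.distinct

  // Ten rows with zero columns each (models df1).
  val df1Rows: Seq[Row] = Seq.fill(10)(Vector.empty[Any])
  // Zero rows (models df2, where the filter removed every row).
  val df2Rows: Seq[Row] = Seq.empty[Row]

  val df3Rows: Seq[Row] = dropDuplicates(df1Rows)
  val df4Rows: Seq[Row] = dropDuplicates(df2Rows)

  def main(args: Array[String]): Unit = {
    println(s"df3 rows: ${df3Rows.size}")
    println(s"df4 rows: ${df4Rows.size}")
  }
}
```

All ten zero-column rows compare equal, so they collapse to a single row, whereas an empty input has nothing to collapse. The result therefore depends on whether any input rows exist, which cannot be known from the schema alone.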