[ https://issues.apache.org/jira/browse/SPARK-19037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788833#comment-15788833 ]
J.P Feng commented on SPARK-19037:
----------------------------------

Yes, my Spark version is 2.1, downloaded from the official site. Thanks for your reply. I will try the patch.

> Run count(distinct x) from sub query found some errors
> ------------------------------------------------------
>
>                 Key: SPARK-19037
>                 URL: https://issues.apache.org/jira/browse/SPARK-19037
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 2.1.0
>         Environment: spark 2.1.0, scala 2.11
>            Reporter: J.P Feng
>              Labels: distinct, sparkSQL, sub-query
>
> When I use spark-shell or spark-sql to execute count(distinct name) on a
> subquery, some errors occur:
> select count(distinct name) from (select * from mytest limit 10) as a
> If I run this in hive-server2, I get the correct result.
> If I just execute select count(name) from (select * from mytest limit 10) as
> a, I also get the right result.
> Besides, I found the same errors when I use distinct() or groupby() with a
> subquery.
> I think there may be some bugs when doing key-reduce jobs with a subquery.
> I will add the errors in a new comment.
> Besides, I tested dropDuplicates in spark-shell:
> 1. spark.sql("select * from mytest limit 10").dropDuplicates("name").show
> It throws some exceptions.
> 2. spark.table("mytest").dropDuplicates("name").show
> It returns the right result.