[ https://issues.apache.org/jira/browse/SPARK-19037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788833#comment-15788833 ]

J.P Feng commented on SPARK-19037:
----------------------------------

Yes, my Spark version is 2.1, downloaded from the official site. Thanks for
your reply; I will try the patch.

> Run count(distinct x) from sub query found some errors
> ------------------------------------------------------
>
>                 Key: SPARK-19037
>                 URL: https://issues.apache.org/jira/browse/SPARK-19037
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 2.1.0
>         Environment: spark 2.1.0, scala 2.11 
>            Reporter: J.P Feng
>              Labels: distinct, sparkSQL, sub-query
>
> When I use spark-shell or spark-sql to execute count(distinct name) over a
> subquery, errors occur:
> select count(distinct name) from (select * from mytest limit 10) as a
> If I run the same query through HiveServer2, I get the correct result.
> If I just execute select count(name) from (select * from mytest limit 10) as
> a, I also get the correct result.
> Besides, I found the same errors when I use distinct() or groupBy() with a
> subquery.
> I think there may be a bug when running key-reduce jobs over a subquery.
> I will add the errors in a new comment.
> Besides, I tested dropDuplicates in spark-shell:
> 1. spark.sql("select * from mytest limit 10").dropDuplicates("name").show
> throws exceptions
> 2. spark.table("mytest").dropDuplicates("name").show
> returns the correct result
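To check whatever Spark returns for the failing queries against a known answer, the plain-Scala sketch below models what `count(distinct name)` and `dropDuplicates("name")` over `select * from mytest limit 10` should produce. It is not Spark code and does not reproduce the bug. The row type `MyTestRow`, its `id` column, the objects `ExpectedResults` and `SampleData`, and the sample rows are all hypothetical stand-ins for the reporter's `mytest` table. Two assumptions matter: without ORDER BY, Spark's `limit 10` may return any 10 rows (the sketch takes the first 10), and Spark does not promise which row survives `dropDuplicates` (the sketch keeps the first one seen).

```scala
// Hypothetical row type standing in for the reporter's `mytest` table;
// only the `name` column matters for these queries.
final case class MyTestRow(id: Int, name: String)

object ExpectedResults {
  // Models `select * from mytest limit 10`. Assumption: the "first" 10 rows
  // in sequence order; Spark gives no such guarantee without ORDER BY.
  def limit10(rows: Seq[MyTestRow]): Seq[MyTestRow] = rows.take(10)

  // Models `select count(distinct name) from (... limit 10) as a`.
  // SQL count(distinct ...) ignores NULLs, so null names are dropped first.
  def countDistinctName(rows: Seq[MyTestRow]): Long =
    limit10(rows).map(_.name).filter(_ != null).distinct.size.toLong

  // Models `.dropDuplicates("name")` on the limited rows: one row per name,
  // with all null names treated as a single group (Option(null) == None).
  // Assumption: the first row seen for each name is the one kept.
  def dropDuplicatesByName(rows: Seq[MyTestRow]): Seq[MyTestRow] =
    limit10(rows)
      .foldLeft((Vector.empty[MyTestRow], Set.empty[Option[String]])) {
        case ((kept, seen), r) =>
          val key = Option(r.name)
          if (seen(key)) (kept, seen) else (kept :+ r, seen + key)
      }
      ._1
}

// Hypothetical sample data: 12 rows, so the limit actually excludes some.
object SampleData {
  val rows: Seq[MyTestRow] = Seq(
    MyTestRow(1, "alice"), MyTestRow(2, "bob"), MyTestRow(3, "alice"),
    MyTestRow(4, "carol"), MyTestRow(5, "bob"), MyTestRow(6, null),
    MyTestRow(7, "dave"), MyTestRow(8, "alice"), MyTestRow(9, null),
    MyTestRow(10, "erin"), MyTestRow(11, "frank"), MyTestRow(12, "gina")
  )
}

object Main {
  def main(args: Array[String]): Unit = {
    // Prints 5: alice, bob, carol, dave, erin ("frank"/"gina" are past the limit).
    println(ExpectedResults.countDistinctName(SampleData.rows))
    // Prints Vector(1, 2, 4, 6, 7, 10): one id per distinct name, null included.
    println(ExpectedResults.dropDuplicatesByName(SampleData.rows).map(_.id))
  }
}
```

The difference between the two results is deliberate: the aggregate skips NULL names, while `dropDuplicates` keeps one row for the NULL group, so the counts differ by one on this sample.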



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

