Here is the link to the JIRA issue: https://issues.apache.org/jira/browse/SPARK-4243
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SQL-COUNT-DISTINCT-tp17818p18166.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
questions or anything like that.
Best regards
Bojan
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SQL-COUNT-DISTINCT-tp17818p17939.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
On Mon, Nov 3, 2014 at 12:45 AM, Bojan Kostic blood9ra...@gmail.com wrote:
But will this improvement also apply when you want to count distinct values on
two or more fields?
SELECT COUNT(f1), COUNT(DISTINCT f2), COUNT(DISTINCT f3), COUNT(DISTINCT f4)
FROM parquetFile
Unfortunately I think this
node. But I wonder whether
I can add some parallelism to the collect process.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SQL-COUNT-DISTINCT-tp17818.html
Sent from the Apache Spark User List mailing list archive at Nabble.com