Re: HyperLogLogUDT

2015-07-01 Thread Nick Pentreath
Any thoughts?

On Tue, Jun 23, 2015 at 11:19 AM, Nick Pentreath nick.pentre...@gmail.com wrote:
Hey Spark devs, I've been looking at DataFrame UDFs and UDAFs. The approximate distinct count is using HyperLogLog, but there is only an option to return the count as a Long. It can be useful …

Re: HyperLogLogUDT

2015-07-01 Thread Daniel Darabos
It's already possible to copy the code from countApproxDistinct (https://github.com/apache/spark/blob/v1.4.0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1153) and access the HLL directly, or do anything you like.

On Wed, Jul 1, 2015 at 5:26 PM, Nick Pentreath nick.pentre...@gmail.com wrote: …
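(Archive note: the aggregation behind that link is only a few lines. A minimal sketch mirroring the linked countApproxDistinct code, assuming stream-lib's HyperLogLogPlus, which Spark 1.4 uses internally, is on the classpath; the hllSketch helper name is illustrative only:)

    import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus
    import org.apache.spark.rdd.RDD

    // Aggregate an RDD into an HLL+ object directly, so the caller gets the
    // mergeable sketch itself rather than only the Long count that
    // countApproxDistinct returns.
    def hllSketch[T](rdd: RDD[T], p: Int = 12): HyperLogLogPlus =
      rdd.aggregate(new HyperLogLogPlus(p, 0))(
        (hll, item) => { hll.offer(item); hll },  // fold each item into the sketch
        (h1, h2) => { h1.addAll(h2); h1 }         // merge per-partition sketches
      )

Calling cardinality() on the result gives the approximate count, and the sketch itself can be merged with other sketches or serialized for reuse.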

Re: [pyspark] What is the best way to run minimal unit tests related to the module we are developing?

2015-07-01 Thread Reynold Xin
Run ./python/run-tests --help and you will see. :)

On Wed, Jul 1, 2015 at 9:10 PM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote:
Hi all, when I develop PySpark modules, such as adding a spark.ml API in Python, I'd like to run a minimal set of unit tests related to the module I'm developing, again and again …
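(Archive note: the help output Reynold points at includes a --modules flag for exactly this; module names such as pyspark-ml are defined in dev/sparktestsupport/modules.py. A hedged example, since flag spellings may differ across versions:)

    ./python/run-tests --help                 # list flags and available module names
    ./python/run-tests --modules=pyspark-ml   # run only the pyspark.ml tests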

Re: [pyspark] What is the best way to run minimal unit tests related to the module we are developing?

2015-07-01 Thread Yu ISHIKAWA
Thanks! --Yu

2015-07-02 13:13 GMT+09:00 Reynold Xin r...@databricks.com:
Run ./python/run-tests --help and you will see. :)

On Wed, Jul 1, 2015 at 9:10 PM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote:
Hi all, when I develop PySpark modules, such as adding a spark.ml API in …

[pyspark] What is the best way to run minimal unit tests related to the module we are developing?

2015-07-01 Thread Yu Ishikawa
Hi all, when I develop PySpark modules, such as adding a spark.ml API in Python, I'd like to run a minimal set of unit tests related to the module I'm developing, again and again. In the previous version, that was easy: I could comment out unrelated modules in the ./python/run-tests script. So what is the …

Re: enum-like types in Spark

2015-07-01 Thread Stephen Boesch
I am reviving an old thread here. The link to the example code for the Java-enum-based solution is now dead; would someone please post an updated link showing the proper interop? Specifically, it is my understanding that Java enums cannot be declared in Scala. So is the proposed solution …
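(Archive note: pending an updated link, the consumption side of the interop is straightforward: Java enums indeed cannot be declared in Scala, but Scala code uses them freely once they are defined in a .java file or taken from the JDK. A minimal, illustrative Scala sketch against a JDK enum; this is not the code behind the dead link:)

    import java.util.concurrent.TimeUnit

    object EnumInterop {
      // Java enum constants are stable identifiers, so qualified references
      // work directly in Scala pattern matches.
      def describe(unit: TimeUnit): String = unit match {
        case TimeUnit.SECONDS => "seconds"
        case TimeUnit.MINUTES => "minutes"
        case other            => other.name().toLowerCase
      }

      def main(args: Array[String]): Unit =
        TimeUnit.values().foreach(u => println(s"${u.name()} -> ${describe(u)}"))
    }

The declaration side simply lives in a small .java file in the same source tree; both the sbt and Maven builds compile Java sources alongside the Scala ones.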