Pretrained Word2Vec models

2016-12-05 Thread Lee Becker
Hi all, Is there a way for Spark to load Word2Vec models trained using gensim or the original C implementation of Word2Vec? Specifically I'd like to play with the Google News model

Re: Returning DataFrame as Scala method return type

2016-09-08 Thread Lee Becker
On Thu, Sep 8, 2016 at 11:35 AM, Ashish Tadose wrote:
> I wish to organize these dataframe operations by grouping them Scala
> Object methods.
> Something like below
>
>> Object Driver {
>>   def main(args: Array[String]) {
>>     val df =

collect_set without nulls (1.6 vs 2.0)

2016-09-07 Thread Lee Becker
Hello everyone,
Consider this toy example:
case class Foo(x: String, y: String)
val df = sparkSession.createDataFrame(Array(Foo(null), Foo("a"), Foo("b"))
df.select(collect_set($"x")).show
In Spark 2.0.0 I get the following results:
+--------------+
|collect_set(x)|
+--------------+
| [null,
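
One way to get a null-free set regardless of how a given Spark release treats nulls inside collect_set is to filter them out before aggregating. The following is a minimal sketch, not the poster's code: the one-field Foo, the helper name nonNullSet, and the local-mode session settings are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set}

// Hypothetical one-field stand-in for the Foo case class in the post.
case class Foo(x: String)

object CollectSetWithoutNulls {
  // Drops null rows first, then collects the distinct remaining values.
  // The element order of the returned sequence is not guaranteed.
  def nonNullSet(df: DataFrame, column: String): Seq[String] =
    df.filter(col(column).isNotNull)
      .select(collect_set(col(column)))
      .first()
      .getSeq[String](0)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-set-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Foo(null), Foo("a"), Foo("b")).toDF()
    println(nonNullSet(df, "x").sorted)

    spark.stop()
  }
}
```

Filtering up front keeps the aggregation itself unchanged, so the same helper should behave identically on releases where collect_set already skips nulls.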

Re: countDistinct, partial aggregates and Spark 2.0

2016-08-12 Thread Lee Becker
On Fri, Aug 12, 2016 at 11:55 AM, Lee Becker <lee.bec...@hapara.com> wrote:
> val df = sc.parallelize(Array(("a", "a"), ("b", "c"), ("c", "a"))).toDF("x", "y")
> val grouped = df.groupBy($"

countDistinct, partial aggregates and Spark 2.0

2016-08-12 Thread Lee Becker
Hi everyone,
I've started experimenting with my codebase to see how much work I will need to port it from 1.6.1 to 2.0.0. In regressing some of my dataframe transforms, I've discovered I can no longer pair a countDistinct with a collect_set in the same aggregation. Consider:
val df =
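
A possible way to sidestep the pairing described above is to compute the set once and derive the distinct count from its size, so that only collect_set appears in the aggregation. This is a hedged sketch rather than the approach taken in the thread: the data comes from the quoted reply, while the choice of x as the grouping key, the column aliases, and the helper name summarize are assumptions. Note that size counts every element of the set, so the result can differ from countDistinct on releases where collect_set keeps nulls.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set, size}

object DistinctCountViaCollectSet {
  // Collects the distinct y values per x, then counts them with size()
  // instead of adding a separate countDistinct to the same agg call.
  def summarize(df: DataFrame): DataFrame =
    df.groupBy(col("x"))
      .agg(collect_set(col("y")).as("ys"))
      .withColumn("n_distinct_y", size(col("ys")))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-distinct-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same toy rows as in the quoted reply.
    val df = Seq(("a", "a"), ("b", "c"), ("c", "a")).toDF("x", "y")
    summarize(df).show()

    spark.stop()
  }
}
```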

Re: Dataset aggregateByKey equivalent

2016-04-25 Thread Lee Becker
On Sat, Apr 23, 2016 at 8:56 AM, Michael Armbrust wrote:
> Have you looked at aggregators?
>
> https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html
Thanks for the pointer to aggregators. I wasn't yet aware of them. However,