FP growth - Items in a transaction must be unique

2017-02-01 Thread Devi P.V
Hi all, I am trying to run the FP-growth algorithm using Spark and Scala. A sample input dataframe is the following: +---+ |productName
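That error usually means a single transaction contains duplicate items. A minimal sketch of deduplicating before calling FPGrowth (the column name "productName" and the RDD-based mllib API are assumptions based on the preview above; df stands for the poster's dataframe):

import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// Assumed shape: df has an array<string> column "productName", one row per transaction.
// distinct removes duplicate items inside each transaction, avoiding
// "Items in a transaction must be unique".
val transactions: RDD[Array[String]] = df
  .select("productName")
  .rdd
  .map(_.getSeq[String](0).distinct.toArray)

val model = new FPGrowth()
  .setMinSupport(0.2)
  .setNumPartitions(4)
  .run(transactions)

model.freqItemsets.collect().foreach { fi =>
  println(fi.items.mkString("[", ",", "]") + ", " + fi.freq)
}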

Re: increasing cross join speed

2017-02-01 Thread Takeshi Yamamuro
Hi, I'm not sure how to improve this kind of query on vanilla Spark alone, but you can write custom physical plans for top-k queries. You can check the link below as a reference; benchmark: https://github.com/apache/incubator-hivemall/pull/33 manual:

Re: pivot over non numerical data

2017-02-01 Thread Kevin Mellott
This should work for non-numerical data as well - can you please elaborate on the error you are getting and provide a code sample? As a preliminary hint, you can "aggregate" text values using *max*. df.groupBy("someCol") .pivot("anotherCol") .agg(max($"textCol")) Thanks, Kevin On Wed, Feb
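A self-contained sketch of that hint, with made-up data and the column names from the reply:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

val spark = SparkSession.builder.appName("pivot-text").getOrCreate()
import spark.implicits._

// Made-up data: one (someCol, anotherCol, textCol) row per attribute.
val df = Seq(
  ("p1", "color", "red"),
  ("p1", "size",  "large"),
  ("p2", "color", "blue")
).toDF("someCol", "anotherCol", "textCol")

// pivot itself is type-agnostic; the aggregate just has to accept strings,
// and max on a string column picks the lexicographically largest value.
df.groupBy("someCol")
  .pivot("anotherCol")
  .agg(max($"textCol"))
  .show()
// someCol | color | size
// p1      | red   | large
// p2      | blue  | null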

Re: Parameterized types and Datasets - Spark 2.1.0

2017-02-01 Thread Don Drake
I imported that as my first command in my previous email. I'm using a spark-shell. scala> import org.apache.spark.sql.Encoder import org.apache.spark.sql.Encoder scala> Any comments regarding importing implicits in an application? Thanks. -Don On Wed, Feb 1, 2017 at 6:10 PM, Michael

Re: JavaRDD text metadata (file name) findings

2017-02-01 Thread neil90
You can use https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#wholeTextFiles(java.lang.String), but it will return an RDD of (filename, content) pairs.
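A Scala sketch of the same idea (the Java variant goes through JavaSparkContext as linked above; the input path is hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("file-name-metadata").getOrCreate()
val sc = spark.sparkContext

// wholeTextFiles yields (fileName, fileContent) pairs, so the file name
// can be carried along with every record derived from its content.
val files = sc.wholeTextFiles("hdfs:///data/input/*.txt")   // hypothetical path

val linesWithSource = files.flatMap { case (fileName, content) =>
  content.split("\n").map(line => (fileName, line))
}
linesWithSource.take(5).foreach(println)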

Re: Parameterized types and Datasets - Spark 2.1.0

2017-02-01 Thread Michael Armbrust
This is the error; you are missing an import: ":13: error: not found: type Encoder" on abstract class RawTable[A : Encoder](inDir: String) { ... It works for me in a REPL.

Re: Question about Multinomial LogisticRegression in spark mllib in spark 2.1.0

2017-02-01 Thread Hollin Wilkins
Hey Aseem, If you are looking for a full-featured library to execute Spark ML pipelines outside of Spark, take a look at MLeap: https://github.com/combust/mleap Not only does it support transforming single instances of a feature vector, but you can execute your entire ML pipeline including

Re: Dataset Question: No Encoder found for Set[(scala.Long, scala.Long)]

2017-02-01 Thread Jerry Lam
Hi Koert, Thank you for your help! GOT IT! Best Regards, Jerry On Wed, Feb 1, 2017 at 6:24 PM, Koert Kuipers wrote: > you can still use it as Dataset[Set[X]]. all transformations should work > correctly. > > however dataset.schema will show binary type, and dataset.show

Re: Parameterized types and Datasets - Spark 2.1.0

2017-02-01 Thread Don Drake
Thanks for the reply. I did give that syntax ([A : Encoder]) a try yesterday, but I kept getting this exception in both the spark-shell and the Zeppelin browser. scala> import org.apache.spark.sql.Encoder import org.apache.spark.sql.Encoder scala> scala> case class RawTemp(f1: String, f2: String, temp:

Re: Dataset Question: No Encoder found for Set[(scala.Long, scala.Long)]

2017-02-01 Thread Koert Kuipers
You can still use it as Dataset[Set[X]]; all transformations should work correctly. However, dataset.schema will show a binary type, and dataset.show will show bytes (unfortunately). For example: scala> implicit def setEncoder[X]: Encoder[Set[X]] = Encoders.kryo[Set[X]] setEncoder: [X]=>
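A runnable sketch of that suggestion (the sample data is made up):

import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

val spark = SparkSession.builder.appName("set-kryo-encoder").getOrCreate()
import spark.implicits._

// No built-in encoder exists for Set, so fall back to Kryo.
implicit def setEncoder[X]: Encoder[Set[X]] = Encoders.kryo[Set[X]]

val ds: Dataset[Set[(Long, Long)]] = Seq(Set((1L, 2L), (3L, 4L))).toDS()

// Transformations behave normally, but the schema is a single binary column
// and show() prints serialized bytes rather than readable values.
ds.map(_.filter(_._1 > 1L)).show()
ds.printSchema()   // root |-- value: binary (nullable = true)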

RE: Jars directory in Spark 2.0

2017-02-01 Thread Sidney Feiner
Ok, good to know ☺ Shading every Spark app it is then… Thanks! Sidney Feiner
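For reference, a minimal shading sketch with sbt-assembly (the Guava relocation is only an example of a conflicting dependency; Maven users would use the shade plugin's relocation settings equivalently):

// build.sbt -- relocate the app's own copy of a conflicting library so it
// cannot clash with the copy shipped in Spark's jars/ directory.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)

// Spark itself comes from the cluster, so keep it out of the fat jar.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"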

Re: Parameterized types and Datasets - Spark 2.1.0

2017-02-01 Thread Michael Armbrust
You need to enforce that an Encoder is available for the type A using a context bound. import org.apache.spark.sql.Encoder abstract class RawTable[A : Encoder](inDir: String) { ... } On Tue, Jan 31, 2017 at 8:12 PM, Don Drake
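A minimal spark-shell sketch of that context bound (the case class, path, and parquet source are stand-ins, not the poster's actual code):

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// [A : Encoder] makes the compiler resolve an implicit Encoder[A] at the
// point where a concrete RawTable is constructed.
abstract class RawTable[A : Encoder](inDir: String) {
  def read(): Dataset[A] =
    SparkSession.builder.getOrCreate().read.parquet(inDir).as[A]
}

case class RawTemp(f1: String, f2: String, temp: Double)

// In the spark-shell, spark.implicits._ supplies the Encoder for the case class.
import spark.implicits._
val temps = new RawTable[RawTemp]("/tmp/temps") {}   // hypothetical path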

Re: using withWatermark on Dataset

2017-02-01 Thread Michael Armbrust
Can you give the full stack trace? Also which version of Spark are you running? On Wed, Feb 1, 2017 at 10:38 AM, Jerry Lam wrote: > Hi everyone, > > Anyone knows how to use withWatermark on Dataset? > > I have tried the following but hit this exception: > > dataset

pivot over non numerical data

2017-02-01 Thread Darshan Pandya
Hello, I am trying to transpose some data using groupBy/pivot/agg as mentioned in this blog: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html But this works only for numerical data. Any hints for doing the same thing for non-numerical data? -- Sincerely,

using withWatermark on Dataset

2017-02-01 Thread Jerry Lam
Hi everyone, Does anyone know how to use withWatermark on a Dataset? I have tried the following but hit this exception: dataset org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to "MyType" The code looks like the following: dataset .withWatermark("timestamp", "5
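For comparison, a typical usage sketch of withWatermark before a windowed aggregation (it assumes Spark 2.2+'s built-in rate source just to stay self-contained, and does not reproduce the cast exception above):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder.appName("watermark-sketch").getOrCreate()
import spark.implicits._

// The rate source emits (timestamp, value) rows, so "timestamp" can serve
// as the event-time column for the watermark.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "10")
  .load()

// Rows arriving more than 5 minutes behind the watermark are dropped; the
// watermark is set on the untyped DataFrame before the windowed aggregation.
val counts = events
  .withWatermark("timestamp", "5 minutes")
  .groupBy(window($"timestamp", "10 minutes"))
  .count()

counts.writeStream
  .outputMode("append")
  .format("console")
  .start()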

Re: Jars directory in Spark 2.0

2017-02-01 Thread Marcelo Vanzin
Spark has never shaded dependencies (in the sense of renaming the classes), with a couple of exceptions (Guava and Jetty). So that behavior is nothing new. Spark's dependencies themselves have a lot of other dependencies, so doing that would have limited benefits anyway. On Tue, Jan 31, 2017 at

Re: Dataset Question: No Encoder found for Set[(scala.Long, scala.Long)]

2017-02-01 Thread Jerry Lam
Hi Koert, Thanks for the tips. I tried to do that but the column's type is now Binary. Do I get the Set[X] back in the Dataset? Best Regards, Jerry On Tue, Jan 31, 2017 at 8:04 PM, Koert Kuipers wrote: > set is currently not supported. you can use kryo encoder. there is

Re: Question about Multinomial LogisticRegression in spark mllib in spark 2.1.0

2017-02-01 Thread Seth Hendrickson
In Spark ML, the coefficients are not "pivoted", meaning that one of the coefficient sets is not set equal to zero. You can read more about it here: https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_set_of_independent_binary_regressions You can translate your set of
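A small sketch of that translation (the numbers are made up, and class 2 is arbitrarily chosen as the reference class):

// Hypothetical output of ml.classification.LogisticRegressionModel for
// 3 classes and 3 features: coefficientMatrix is 3 x 3, interceptVector has length 3.
val coef = Array(
  Array( 0.5, -1.2,  0.3),  // class 0
  Array( 0.1,  0.4, -0.7),  // class 1
  Array(-0.6,  0.8,  0.4))  // class 2 (reference class for pivoting)
val intercepts = Array(0.2, -0.1, -0.1)

// Softmax probabilities do not change when the same vector is subtracted from
// every class, so pivoting just subtracts the reference class everywhere;
// class 2 ends up with all-zero coefficients and a zero intercept, matching the
// independent-binary-regressions form described in the Wikipedia link.
val pivotedCoef = coef.map(row => row.zip(coef(2)).map { case (c, ref) => c - ref })
val pivotedIntercepts = intercepts.map(_ - intercepts(2))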

union of compatible types

2017-02-01 Thread Koert Kuipers
Spark's union/merging of compatible types seems kind of weak. It works on basic types in the top-level record, but it fails for nested records, maps, arrays, etc. Are there any known workarounds or plans to improve this? For example, I get errors like this: org.apache.spark.sql.AnalysisException:
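One workaround that seems to help is rebuilding both sides to an identical schema before the union, since the analyzer will not widen types inside structs. A sketch with made-up frames where a nested field differs only in numeric width:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

val spark = SparkSession.builder.appName("union-align").getOrCreate()
import spark.implicits._

// Made-up frames: the nested field _1 is int on one side and long on the other.
val a = Seq((1L, (1, "x"))).toDF("id", "payload")
val b = Seq((2L, (2L, "y"))).toDF("id", "payload")

// Rebuild a's struct with the wider type so both schemas match exactly.
val aAligned = a.select(
  $"id",
  struct($"payload._1".cast("long").as("_1"), $"payload._2".as("_2")).as("payload"))

aAligned.union(b).printSchema()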

Re: tylerchap...@yahoo-inc.com is no longer with Yahoo! (was: Question about Multinomial LogisticRegression in spark mllib in spark 2.1.0)

2017-02-01 Thread Aseem Bansal
Can an admin of the mailing list please remove this address? I get this email every time I send an email to the mailing list. On Wed, Feb 1, 2017 at 5:12 PM, Yahoo! No Reply wrote: > > This is an automatically generated message. > > tylerchap...@yahoo-inc.com is no longer

Re: Hive Java UDF running on spark-sql issue

2017-02-01 Thread Alex
Yes... It's taking values from a record, which is JSON, and converting it into multiple columns after typecasting... On Wed, Feb 1, 2017 at 4:07 PM, Marco Mistroni wrote: > Hi > What is the UDF supposed to do? Are you trying to write a generic > function to convert values

Question about Multinomial LogisticRegression in spark mllib in spark 2.1.0

2017-02-01 Thread Aseem Bansal
*What I want to do* I have trained a ml.classification.LogisticRegressionModel using the spark ml package. It has 3 features and 3 classes, so the generated model has coefficients in a (3, 3) matrix and intercepts in a Vector of length 3, as expected. Now, I want to take these coefficients and

Re: Hive Java UDF running on spark-sql issue

2017-02-01 Thread Marco Mistroni
Hi, what is the UDF supposed to do? Are you trying to write a generic function to convert values to another type depending on the type of the original value? Kr On 1 Feb 2017 5:56 am, "Alex" wrote: Hi, we have Java Hive UDFs which are working perfectly fine in

A question about inconsistency during dataframe creation with RDD/dict in PySpark

2017-02-01 Thread Han-Cheol Cho
Dear spark user ml members, I have quite messy input data, so it is difficult to load it as a dataframe object directly. What I did is to load it as an RDD of strings first, convert it to an RDD of pyspark.sql.Row objects, then use the toDF method as below. mydf = myrdd.map(parse).toDF() I