Re: MLlib, Java, and DataFrame

2016-07-22 Thread Marco Mistroni
Hi Inam, I sorted it. I'm replying to all in case anyone else follows the blog and runs into the same issue. First off, the environment: I tested the sample using pure spark-1.6.1, with no Hive and no Hadoop. I launched pyspark as follows: pyspark --packages com.databricks:spark-csv_2.10:1.4.0 -

Re: MLlib, Java, and DataFrame

2016-07-22 Thread Marco Mistroni
How did you build your Spark distribution? Could you detail the steps? As far as I know, Hive depends on Hadoop. If you don't configure your Spark correctly, it will assume Hadoop is your filesystem... I'm not using Hadoop or Hive. You might want to get a Cloudera distribution, which includes Spark, Hadoop, and Hive.

Re: MLlib, Java, and DataFrame

2016-07-22 Thread Inam Ur Rehman
Hello guys, I know it's unrelated to this topic, but I've been looking desperately for a solution. I am facing an exception: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-resolve-you-must-build-spark-with-hive-exception-td27390.html Please help me, I couldn't find any solution.

Re: MLlib, Java, and DataFrame

2016-07-22 Thread Jean Georges Perrin
Thanks Marco, I like the idea of sticking with DataFrames ;)

> On Jul 22, 2016, at 7:07 AM, Marco Mistroni wrote:
>
> Hello Jean,
> you can take your current DataFrame and send it to MLlib (I was doing that
> because I didn't know the ml package), but the process is a little bit

Re: MLlib, Java, and DataFrame

2016-07-22 Thread Marco Mistroni
Hello Jean, you can take your current DataFrame and send it to MLlib (I was doing that because I didn't know the ml package), but the process is a little bit cumbersome:

1. go from DataFrame to an RDD of LabeledPoint
2. run your ML model

I'd suggest you stick to DataFrame + the ml package :) hth
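Marco's two steps might look roughly like this in Java. This is a minimal sketch, not code from the thread: it assumes Spark 2.x (SparkSession, with a DataFrame being a Dataset<Row>), a DataFrame with two double columns hypothetically named "label" and "x", and it uses LinearRegressionWithSGD, the RDD-based spark.mllib trainer, which is deprecated as of Spark 2.0. The class name, column names, sample values, and step size are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionModel;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class MllibFromDataFrame {

  // Hypothetical sample DataFrame with two double columns, "label" and "x", where label = 2 * x.
  public static Dataset<Row> sampleData(SparkSession spark) {
    StructType schema = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("label", DataTypes.DoubleType, false),
        DataTypes.createStructField("x", DataTypes.DoubleType, false)});
    List<Row> rows = Arrays.asList(
        RowFactory.create(2.0, 1.0),
        RowFactory.create(4.0, 2.0),
        RowFactory.create(6.0, 3.0),
        RowFactory.create(8.0, 4.0));
    return spark.createDataFrame(rows, schema);
  }

  // Step 1: DataFrame -> RDD of LabeledPoint, unpacking each Row by hand.
  public static JavaRDD<LabeledPoint> toLabeledPoints(Dataset<Row> df) {
    return df.javaRDD().map(row -> new LabeledPoint(
        row.getDouble(row.fieldIndex("label")),
        Vectors.dense(new double[] {row.getDouble(row.fieldIndex("x"))})));
  }

  // Step 2: train the RDD-based model and return {weight, intercept}.
  // This train(...) overload fits no intercept, so the intercept stays 0.0.
  public static double[] fit(Dataset<Row> df) {
    JavaRDD<LabeledPoint> points = toLabeledPoints(df).cache();
    // 100 iterations, step size 0.1: SGD on unscaled features is sensitive to the step size.
    LinearRegressionModel model = LinearRegressionWithSGD.train(points.rdd(), 100, 0.1);
    return new double[] {model.weights().apply(0), model.intercept()};
  }

  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("MllibFromDataFrame")
        .master("local[*]")
        .getOrCreate();
    double[] result = fit(sampleData(spark));
    System.out.println("weight=" + result[0] + " intercept=" + result[1]);
    spark.stop();
  }
}
```

On this toy data the weight should land close to 2.0, though SGD stops on a convergence tolerance, so it will not be exact. The toLabeledPoints step is the cumbersome part Marco mentions: every Row has to be unpacked into a LabeledPoint before MLlib can use it.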

Re: MLlib, Java, and DataFrame

2016-07-22 Thread Jean Georges Perrin
Thanks Bryan, I keep forgetting about the examples... This is almost it :) I can work with that :)

> On Jul 22, 2016, at 1:39 AM, Bryan Cutler wrote:
>
> Hi JG,
>
> If you didn't know this, Spark MLlib has 2 APIs, one of which uses
> DataFrames. Take a look at this

Re: MLlib, Java, and DataFrame

2016-07-22 Thread Jean Georges Perrin
Hi Jules, thanks, but not really: I know what DataFrames are and I actually use them, especially as RDDs will slowly fade away. A lot of the examples I see focus on cleaning/preparing the data, which is an important part, but not really on what comes "after"... Sorry if I am not completely clear.

Re: MLlib, Java, and DataFrame

2016-07-22 Thread VG
Interesting, thanks for this information.

On Fri, Jul 22, 2016 at 11:26 AM, Bryan Cutler wrote:

> ML has a DataFrame-based API, while MLlib is RDD-based and will be deprecated
> as of Spark 2.0.
>
> On Thu, Jul 21, 2016 at 10:41 PM, VG wrote:
>
>> Why do we

Re: MLlib, Java, and DataFrame

2016-07-21 Thread Bryan Cutler
ML has a DataFrame-based API, while MLlib is RDD-based and will be deprecated as of Spark 2.0.

On Thu, Jul 21, 2016 at 10:41 PM, VG wrote:

> Why do we have these 2 packages ... ml and mllib?
> What is the difference between them?
>
> On Fri, Jul 22, 2016 at 11:09 AM, Bryan Cutler

Re: MLlib, Java, and DataFrame

2016-07-21 Thread VG
Why do we have these 2 packages ... ml and mllib? What is the difference between them?

On Fri, Jul 22, 2016 at 11:09 AM, Bryan Cutler wrote:

> Hi JG,
>
> If you didn't know this, Spark MLlib has 2 APIs, one of which uses
> DataFrames. Take a look at this example

Re: MLlib, Java, and DataFrame

2016-07-21 Thread Bryan Cutler
Hi JG, if you didn't know this, Spark MLlib has 2 APIs, one of which uses DataFrames. Take a look at this example:
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaLinearRegressionWithElasticNetExample.java
This example uses a Dataset, which is
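The DataFrame-based spark.ml route skips the conversion to an RDD of LabeledPoint: the estimator is fit directly on the DataFrame. The sketch below is not the linked example. It is a minimal illustration assuming Spark 2.x and a DataFrame with a double "label" column and a vector "features" column (the default column names LinearRegression reads); the class name and sample values are made up.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.linalg.VectorUDT;
import org.apache.spark.ml.linalg.Vectors;
import org.apache.spark.ml.regression.LinearRegression;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class MlLinearRegressionSketch {

  // Hypothetical training data in the spark.ml layout: a double "label" and a vector "features".
  public static Dataset<Row> sampleData(SparkSession spark) {
    StructType schema = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("label", DataTypes.DoubleType, false),
        DataTypes.createStructField("features", new VectorUDT(), false)});
    List<Row> rows = Arrays.asList(
        RowFactory.create(3.1, Vectors.dense(new double[] {1.0})),
        RowFactory.create(4.9, Vectors.dense(new double[] {2.0})),
        RowFactory.create(7.2, Vectors.dense(new double[] {3.0})),
        RowFactory.create(8.8, Vectors.dense(new double[] {4.0})));
    return spark.createDataFrame(rows, schema);
  }

  // Fits directly on the DataFrame and returns {slope, intercept}.
  public static double[] fit(Dataset<Row> training) {
    LinearRegression lr = new LinearRegression()
        .setMaxIter(100)
        .setRegParam(0.0); // no regularization, i.e. plain least squares
    LinearRegressionModel model = lr.fit(training);
    return new double[] {model.coefficients().apply(0), model.intercept()};
  }

  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("MlLinearRegressionSketch")
        .master("local[*]")
        .getOrCreate();
    double[] result = fit(sampleData(spark));
    System.out.println("slope=" + result[0] + " intercept=" + result[1]);
    spark.stop();
  }
}
```

With regParam set to 0 this is ordinary least squares, so on these four points the fit should come out at about slope 1.94 and intercept 1.15. With real data, the "features" column is usually assembled from ordinary numeric columns with a VectorAssembler, and setLabelCol/setFeaturesCol can point the estimator at differently named columns.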

MLlib, Java, and DataFrame

2016-07-21 Thread Jean Georges Perrin
Hi, I am looking for some really basic examples of MLlib (like a linear regression over a list of values) in Java. I have found a few, but they all use JavaRDD... and not DataFrame. I was kind of hoping to take my current DataFrames and send them to MLlib. Am I too optimistic?
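To make concrete what "a linear regression over a list of values" computes, independent of any Spark API, here is a plain-Java ordinary least-squares fit for a single feature. It is a Spark-free sanity check, not MLlib code; the class and method names are hypothetical and the sample points are arbitrary.

```java
import java.util.Arrays;
import java.util.List;

public class SimpleLeastSquares {

  // Returns {slope, intercept} of the least-squares line y = slope * x + intercept.
  // Each point is a two-element array {x, y}.
  public static double[] fit(List<double[]> points) {
    int n = points.size();
    if (n < 2) {
      throw new IllegalArgumentException("need at least two points");
    }
    double meanX = 0.0;
    double meanY = 0.0;
    for (double[] p : points) {
      meanX += p[0];
      meanY += p[1];
    }
    meanX /= n;
    meanY /= n;
    // Centered sums: slope = Sxy / Sxx.
    double sxy = 0.0;
    double sxx = 0.0;
    for (double[] p : points) {
      double dx = p[0] - meanX;
      sxy += dx * (p[1] - meanY);
      sxx += dx * dx;
    }
    if (sxx == 0.0) {
      throw new IllegalArgumentException("all x values are identical");
    }
    double slope = sxy / sxx;
    return new double[] {slope, meanY - slope * meanX};
  }

  public static void main(String[] args) {
    List<double[]> points = Arrays.asList(
        new double[] {1.0, 3.1}, new double[] {2.0, 4.9},
        new double[] {3.0, 7.2}, new double[] {4.0, 8.8});
    double[] fit = fit(points);
    System.out.printf("slope=%.4f intercept=%.4f%n", fit[0], fit[1]);
  }
}
```

Running main prints slope=1.9400 intercept=1.1500. An unregularized Spark regression with an intercept, fit on the same four points, should produce approximately the same line, which makes this a handy cross-check while wiring up the DataFrame version.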