So you will need to convert your input DataFrame into something with vectors and labels to train on - the Spark ML documentation has examples: http://spark.apache.org/docs/latest/ml-guide.html (although the website seems to be having some issues mid-update to Spark 2.0, so if you want to read it right now: http://spark.apache.org/docs/1.6.2/ml-guide.html#example-pipeline )
As for why some algorithms are available in the RDD API and not the DataFrame API yet - simply development time. The DataFrame/Pipeline API will be the actively developed API going forward.

Cheers,

Holden :)

On Tuesday, July 26, 2016, Shi Yu <shiyu....@gmail.com> wrote:
> Hello,
>
> *Question 1: *I am new to Spark. I am trying to train a classification
> model on a Spark DataFrame. I am using PySpark, and I created a Spark
> DataFrame object in df:
>
> from pyspark.sql.types import *
>
> query = """select * from table"""
>
> df = sqlContext.sql(query)
>
> My question is how to extend the code to train models (e.g., a
> classification model) on the object df? I have checked many online
> resources and haven't seen any similar approach like the following:
>
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> # Fit the model
> lrModel = lr.fit(df)
>
> Is this a feasible way to train the model? If yes, where could I find the
> reference code?
>
> *Question 2: *Why does the MLlib DataFrame-based API have no SVM model
> support, while the RDD-based API does?
>
> Thanks a lot!
>
> Best,
>
> Shi

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau