Re: feeding DataFrames into predictive algorithms

2015-02-17 Thread Xiangrui Meng
Hey Sandy, The work should be done by a VectorAssembler, which combines multiple columns (double/int/vector) into a vector column, which becomes the features column for regression. We are going to create JIRAs for each of these standard feature transformers. It would be great if you could help
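A VectorAssembler, as described above, flattens several double, int, or vector columns into one features vector, in column order. The following is a minimal plain-Scala sketch of that per-row logic, with no Spark dependency. The `Cell` types and `assembleRow` are hypothetical names for illustration only, not Spark's API (Spark later shipped the transformer itself as `org.apache.spark.ml.feature.VectorAssembler`).

```scala
// Hypothetical model of one cell in a row: a double, an int, or a vector column value.
sealed trait Cell
final case class DoubleCell(v: Double) extends Cell
final case class IntCell(v: Int) extends Cell
final case class VectorCell(v: Vector[Double]) extends Cell

object AssemblerSketch {
  // Concatenate the selected cells, in column order, into a single feature vector.
  // Ints are widened to doubles; vector cells contribute all of their elements.
  def assembleRow(cells: Seq[Cell]): Vector[Double] =
    cells.toVector.flatMap {
      case DoubleCell(v) => Vector(v)
      case IntCell(v)    => Vector(v.toDouble)
      case VectorCell(v) => v
    }
}

// Usage: a row holding a double column, an int column, and a two-element vector column.
val features = AssemblerSketch.assembleRow(
  Seq(DoubleCell(1.5), IntCell(2), VectorCell(Vector(0.25, 0.75))))
println(features) // Vector(1.5, 2.0, 0.25, 0.75)
```

Keeping the assembly as a separate step means the regression algorithm only ever sees one features column, regardless of how many input columns fed into it.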

feeding DataFrames into predictive algorithms

2015-02-11 Thread Sandy Ryza
Hey All, I've been playing around with the new DataFrame and ML pipelines APIs and am having trouble accomplishing what seems like it should be a fairly basic task. I have a DataFrame where each column is a Double. I'd like to turn this into a DataFrame with a features column and a label column

Re: feeding DataFrames into predictive algorithms

2015-02-11 Thread Michael Armbrust
It sounds like you probably want to do a standard Spark map that produces a tuple with the structure you are looking for. You can then just assign names to turn it back into a DataFrame. Assuming the first column is your label and the rest are features, you can do something like this: val df
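The map-then-name approach described above can be modeled in plain Scala, without Spark, under the same assumption that column 0 is the label and every other column is a feature. In this sketch each `Seq[Double]` stands in for an all-Double row (what `row.toSeq` would yield), and `LabeledRow` and `toLabeled` are hypothetical names whose fields play the role of the "label" and "features" column names.

```scala
// Hypothetical named record; in Spark the names would be assigned when
// converting the mapped tuples back into a DataFrame.
final case class LabeledRow(label: Double, features: Vector[Double])

object LabelSplitSketch {
  // Assumes the first column is the label and all remaining columns are features.
  def toLabeled(row: Seq[Double]): LabeledRow = {
    require(row.nonEmpty, "a row needs at least a label column")
    LabeledRow(row.head, row.tail.toVector)
  }
}

// Two all-Double rows: label first, then two feature values each.
val rows = Seq(Seq(1.0, 0.5, 3.2), Seq(0.0, 1.1, 2.7))
val labeled = rows.map(LabelSplitSketch.toLabeled)
labeled.foreach(println)
// LabeledRow(1.0,Vector(0.5, 3.2))
// LabeledRow(0.0,Vector(1.1, 2.7))
```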

Re: feeding DataFrames into predictive algorithms

2015-02-11 Thread Patrick Wendell
I think there is a minor error here in that the first example needs a .tail after the toSeq: df.map { row => (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double])) }.toDataFrame("label", "features")
On Wed, Feb 11, 2015 at 7:46 PM, Michael Armbrust mich...@databricks.com wrote: It sounds like
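What the missing .tail changes can be checked in plain Scala, without Spark. In this sketch a `Seq[Any]` stands in for `row.toSeq` on an all-Double row, and `TailCheck`, `withTail`, and `withoutTail` are hypothetical names. Without .tail, the label column is copied into the features as well, so the model would see the label among its own inputs.

```scala
object TailCheck {
  // Mirrors the corrected expression: label = column 0, features = remaining columns.
  def withTail(row: Seq[Any]): (Double, Seq[Double]) =
    (row.head.asInstanceOf[Double], row.tail.map(_.asInstanceOf[Double]))

  // Without .tail, the label column is also included in the features.
  def withoutTail(row: Seq[Any]): (Double, Seq[Double]) =
    (row.head.asInstanceOf[Double], row.map(_.asInstanceOf[Double]))
}

val row: Seq[Any] = Seq(1.0, 0.5, 3.2) // models row.toSeq for an all-Double row
println(TailCheck.withTail(row))    // (1.0,List(0.5, 3.2))
println(TailCheck.withoutTail(row)) // (1.0,List(1.0, 0.5, 3.2))
```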