Hey Sandy,
The work should be done by a VectorAssembler, which combines multiple
columns (double/int/vector) into a single vector column that becomes the
features column for regression. We are going to create JIRAs for each
of these standard feature transformers. It would be great if you could
help.
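VectorAssembler later shipped in `org.apache.spark.ml.feature`. The following is a minimal sketch of the idea, assuming a Spark 2.x+ `SparkSession` with `spark-mllib` on the classpath (runnable in spark-shell or as a Scala script). The column names `label`, `x1`, `x2` and the sample rows are hypothetical stand-ins for an all-Double DataFrame whose first column is the label.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("assembler-sketch").getOrCreate()
import spark.implicits._

// Hypothetical all-Double input: column 0 is the label, the rest are features.
val raw = Seq((1.0, 0.5, 2.0), (0.0, 1.5, 3.0)).toDF("label", "x1", "x2")

// Pack every column except the first into one vector column named "features".
val assembler = new VectorAssembler()
  .setInputCols(raw.columns.tail)
  .setOutputCol("features")

// Keep only the two columns a regression estimator needs.
val assembled = assembler.transform(raw).select("label", "features")
assembled.show(false)
```

Selecting the input columns with `raw.columns.tail` keeps the sketch independent of how many feature columns there are.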
Hey All,
I've been playing around with the new DataFrame and ML pipelines APIs and
am having trouble accomplishing what seems like it should be a fairly
basic task.
I have a DataFrame where each column is a Double. I'd like to turn this
into a DataFrame with a features column and a label column.
It sounds like you probably want to do a standard Spark map that results
in a tuple with the structure you are looking for. You can then just
assign names to turn it back into a DataFrame.
Assuming the first column is your label and the rest are features you can
do something like this:
val df
I think there is a minor error here: the first example needs a
tail after the toSeq:
df.map { row =>
  // column 0 is the label; tail drops it so only the feature columns remain
  (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double]))
}.toDataFrame("label", "features")
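In released Spark versions the conversion method is `toDF`, and the ML regression estimators expect the features column to hold a `Vector` rather than a `Seq[Double]`. A sketch of the same map-then-rename approach under those constraints, assuming Spark 2.x+ with `spark-mllib` on the classpath; the column names `y`, `x1`, `x2` and the sample rows are hypothetical.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("map-sketch").getOrCreate()
import spark.implicits._

// Hypothetical all-Double input; column 0 is the label.
val df = Seq((1.0, 0.5, 2.0), (0.0, 1.5, 3.0)).toDF("y", "x1", "x2")

// Map each row to a (label, feature vector) tuple; tail skips the label column.
val pairs = df.rdd.map { row =>
  (row.getDouble(0), Vectors.dense(row.toSeq.tail.map(_.asInstanceOf[Double]).toArray))
}

// Assign names to the tuple fields to get a DataFrame back.
val labeled = spark.createDataFrame(pairs).toDF("label", "features")
labeled.show(false)
```

Going through `df.rdd` mirrors the original snippet, where `map` on a DataFrame produced an RDD of tuples that was then renamed.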
On Wed, Feb 11, 2015 at 7:46 PM, Michael Armbrust
<mich...@databricks.com> wrote: