Hey Sandy,
The work should be done by a VectorAssembler, which combines multiple
columns (double/int/vector) into a single vector column that becomes the
features column for regression. We are going to create JIRAs for each
of these standard feature transformers. It would be great if you could
help.
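A minimal sketch of the VectorAssembler idea. It assumes Spark 2.x or later, where `VectorAssembler` lives in `org.apache.spark.ml.feature` and `SparkSession` is the entry point. The object name `AssembleExample`, the helper `assemble`, and the toy columns `x1`/`x2` are hypothetical and only serve as illustration:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.{DataFrame, SparkSession}

object AssembleExample {
  // Combine the given numeric/vector input columns into one "features" vector column.
  // `assemble` is a hypothetical helper name, not part of Spark.
  def assemble(df: DataFrame, inputCols: Array[String]): DataFrame =
    new VectorAssembler()
      .setInputCols(inputCols)
      .setOutputCol("features")
      .transform(df)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("assemble-sketch").getOrCreate()
    import spark.implicits._
    // Hypothetical toy data: a double label, a double feature and an int feature.
    val df = Seq((1.0, 2.0, 3), (0.0, 4.5, 7)).toDF("label", "x1", "x2")
    assemble(df, Array("x1", "x2")).select("label", "features").show(false)
    spark.stop()
  }
}
```

The int column is cast to double inside the assembled vector, so mixed numeric columns can be fed in directly without a manual map.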
It sounds like you probably want a standard Spark map that produces
a tuple with the structure you are looking for. You can then assign
column names to turn it back into a DataFrame.
Assuming the first column is your label and the rest are features, you can
do something like this:
val df
I think there is a minor error here: the first example needs a .tail
after toSeq, otherwise the label is also included among the features:
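// Assumes a SQLContext named sqlContext (as in spark-shell); needed for .toDF.
import sqlContext.implicits._
df.map { row =>
  (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double]))
}.toDF("label", "features")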
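To see why the .tail matters, here is a plain-Scala sketch of the same label/features split with no Spark involved. The `RawRow` alias stands in for `Row.toSeq`, and the names `LabelFeaturesSketch`, `split` and `toDouble` are hypothetical. It also accepts Int cells, which a bare `asInstanceOf[Double]` cast would reject:

```scala
object LabelFeaturesSketch {
  // Hypothetical stand-in for Row.toSeq: each "row" is the list of its column values.
  type RawRow = Seq[Any]

  // Coerce a numeric cell to Double; Int cells are widened instead of failing the cast.
  private def toDouble(cell: Any): Double = cell match {
    case d: Double => d
    case i: Int    => i.toDouble
    case other     => throw new IllegalArgumentException(s"non-numeric cell: $other")
  }

  // First column is the label; `.tail` drops it so it is not repeated among the features.
  def split(row: RawRow): (Double, Seq[Double]) =
    (toDouble(row.head), row.tail.map(toDouble))

  def main(args: Array[String]): Unit = {
    val rows: Seq[RawRow] = Seq(Seq(1.0, 2.0, 3), Seq(0.0, 4.5, 7))
    rows.map(split).foreach(println)
  }
}
```

Without the `.tail`, the first element of each features sequence would simply be the label again.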
On Wed, Feb 11, 2015 at 7:46 PM, Michael Armbrust <mich...@databricks.com> wrote: