Hi,

I'm trying to use StringIndexer and OneHotEncoder to vectorize some of my
features. Unfortunately, OneHotEncoder only returns sparse vectors, and I
can't find a way, let alone an efficient one, to convert the columns it
generates into dense vectors. I need dense output because I will eventually
be doing deep learning on this data with tools outside of Spark.
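
For reference, "densifying" a sparse vector just means expanding its (size, indices, values) layout into a full array with zeros in the unset slots. The sketch below shows that logic in plain Python so it runs without a cluster; the function name sparse_to_dense is hypothetical. In PySpark, a SparseVector already has a toArray() method that returns a NumPy array, so per-row conversion could be wrapped in a UDF, and later Spark releases (3.0+) add pyspark.ml.functions.vector_to_array for converting a whole vector column.

```python
from typing import List, Sequence


def sparse_to_dense(size: int, indices: Sequence[int], values: Sequence[float]) -> List[float]:
    """Expand a sparse (size, indices, values) triple into a dense list.

    Hypothetical helper: it mirrors the layout of Spark's SparseVector,
    filling every slot not listed in `indices` with 0.0.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    dense = [0.0] * size
    for i, v in zip(indices, values):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for size {size}")
        dense[i] = float(v)
    return dense


# A one-hot vector for category index 2 out of 4 slots.
print(sparse_to_dense(4, [2], [1.0]))  # [0.0, 0.0, 1.0, 0.0]
```

Note that OneHotEncoder's dropLast parameter defaults to true, so its vectors have one slot fewer than the number of categories; the last category is encoded as all zeros.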

If I were to update OneHotEncoder to have a setDense option, is there much
of a chance it would be accepted as a PR?

Since that seems unlikely, is there a way to expand a DataFrame's sparse
vector column (or the corresponding index column) into separate columns,
like the pandas get_dummies method:
http://queirozf.com/entries/one-hot-encoding-a-feature-on-a-pandas-dataframe-an-example

or is there a better way to replicate the get_dummies functionality?
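
The get_dummies behaviour being asked about can be sketched as two steps: assign each distinct label an index, then emit one 0/1 column per label. The plain-Python sketch below assumes labels are ordered by descending frequency with ties broken alphabetically; that only approximates StringIndexer's default ordering and is not guaranteed to match it. The function names are hypothetical, and unlike OneHotEncoder's default, every category keeps its own column.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def index_labels(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct label to an integer index.

    Assumption: the most frequent label gets 0, ties broken alphabetically.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def dummy_columns(values: Sequence[str], prefix: str) -> Tuple[List[str], List[List[int]]]:
    """Return get_dummies-style column names and one dense 0/1 row per value."""
    index = index_labels(values)
    labels = sorted(index, key=lambda label: index[label])
    columns = [f"{prefix}_{label}" for label in labels]
    rows = []
    for v in values:
        row = [0] * len(labels)
        row[index[v]] = 1  # set the slot for this value's category
        rows.append(row)
    return columns, rows


cols, rows = dummy_columns(["red", "blue", "red", "green"], prefix="color")
print(cols)  # ['color_red', 'color_blue', 'color_green']
print(rows)  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

On a Spark DataFrame, the same columns could be produced by collecting the distinct labels to the driver and adding one F.when(F.col(c) == label, 1).otherwise(0) expression per label (with pyspark.sql.functions imported as F); this avoids vectors entirely but is only practical for low-cardinality columns.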

Thanks,

Ian

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Dense-Vectors-outputs-in-feature-engineering-tp27331.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.