Hi, I'm trying to use StringIndexer and OneHotEncoder to vectorize some of my features. Unfortunately, OneHotEncoder only returns sparse vectors, and I can't find any way, let alone an efficient one, to convert the columns it generates into dense vectors. I need dense output because I will eventually be doing deep learning on this data outside of Spark.
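For illustration, the conversion in question might look like the sketch below. It is plain Python rather than Spark code: the (size, indices, values) triple mirrors the fields of a Spark SparseVector, while the helper names `to_dense` and `to_dummy_columns` and the `color` prefix are hypothetical and not part of any Spark API.

```python
from typing import Dict, List, Sequence


def to_dense(size: int, indices: Sequence[int], values: Sequence[float]) -> List[float]:
    """Expand a sparse (size, indices, values) triple into a dense list."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


def to_dummy_columns(prefix: str, dense: Sequence[float]) -> Dict[str, float]:
    """Spread one dense vector over named columns, in the style of get_dummies."""
    return {f"{prefix}_{i}": x for i, x in enumerate(dense)}


# Hypothetical one-hot value: category index 2 in a vector with 4 slots.
row = to_dense(4, [2], [1.0])
columns = to_dummy_columns("color", row)
```

In PySpark itself, the per-row equivalent would probably be a UDF that calls `toArray()` on each vector, though that is an assumption about the eventual solution rather than something tested here.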
If I were to update OneHotEncoder to have a setDense option, is there much of a chance it would be accepted as a PR?
If that seems unlikely, is there a way to turn a DataFrame with a sparse vector column and an index column into separate columns, like the pandas get_dummies method (http://queirozf.com/entries/one-hot-encoding-a-feature-on-a-pandas-dataframe-an-example)? Or is there a better way to replicate the get_dummies functionality?
Thanks,
Ian