Re: one hot encode a column of vector

2017-04-24 Thread Yan Facai
How about using CountVectorizer?
http://spark.apache.org/docs/latest/ml-features.html#countvectorizer

On Tue, Apr 25, 2017 at 9:31 AM, Zeming Yu wrote:
> how do I do one hot encode on a column of array? e.g. ['TG', 'CA']
>
>
> FYI here's my code for one hot encoding
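For illustration, here is a minimal pure-Python sketch of the "multi-hot" encoding that CountVectorizer produces for an array column when its `binary` parameter is set to True. Each distinct string in the column becomes one vector position, and a row gets 1 at every position whose string appears in its array. This is not Spark's implementation. The helper names `fit_vocabulary` and `multi_hot` and the sample rows are hypothetical. Ordering the vocabulary by total term frequency follows the Spark documentation. Breaking ties alphabetically is an assumption, and Spark's tie order may differ.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fit_vocabulary(rows: Sequence[Sequence[str]], min_df: int = 1) -> Dict[str, int]:
    """Map each distinct token to a vector index (hypothetical helper).

    Tokens appearing in fewer than `min_df` rows are dropped. The rest are
    ordered by total term frequency, highest first. Ties are broken
    alphabetically here, which is an assumption, not Spark's rule.
    """
    tf = Counter(token for row in rows for token in row)        # total occurrences
    df = Counter(token for row in rows for token in set(row))   # rows containing token
    kept = [token for token in tf if df[token] >= min_df]
    kept.sort(key=lambda token: (-tf[token], token))
    return {token: i for i, token in enumerate(kept)}


def multi_hot(row: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Return a dense 0/1 vector for one array; unknown tokens are ignored."""
    vec = [0] * len(vocab)
    for token in row:
        idx = vocab.get(token)
        if idx is not None:
            vec[idx] = 1  # binary: presence only, not counts
    return vec


rows = [["TG", "CA"], ["CA"], ["TG", "US", "CA"]]
vocab = fit_vocabulary(rows)                  # {"CA": 0, "TG": 1, "US": 2}
encoded = [multi_hot(r, vocab) for r in rows]
print(encoded)                                # [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
```

In PySpark the rough equivalent would be `CountVectorizer(inputCol="codes", outputCol="codes_vec", binary=True).fit(df).transform(df)`, where `codes` is a hypothetical array<string> column. Spark returns sparse vectors rather than the dense lists shown here.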

one hot encode a column of vector

2017-04-24 Thread Zeming Yu
How do I one-hot encode a column of arrays, e.g. ['TG', 'CA']?

FYI, here's my code for one-hot encoding normal categorical columns. How do I make it work for a column of arrays?

from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer

indexers =