How about using CountVectorizer? It takes a column of string arrays as input, which is what you have.
http://spark.apache.org/docs/latest/ml-features.html#countvectorizer
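
To make the idea concrete, here is a plain-Python sketch (not Spark code) of the kind of encoding I would expect from CountVectorizer with binary=True: build a vocabulary from all the arrays, then emit one 0/1 slot per vocabulary term for each row. The helper names fit_vocabulary and multi_hot and the sample rows are made up for illustration. Spark returns a sparse vector rather than a list, and its tie-breaking between equally frequent terms may differ from the alphabetical rule assumed here.

```python
from collections import Counter


def fit_vocabulary(rows):
    """Collect distinct tokens, most frequent first.

    Assumption for this sketch: ties are broken alphabetically.
    """
    # Count in how many rows each token appears.
    counts = Counter(token for row in rows for token in set(row))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ordered]


def multi_hot(row, vocabulary):
    """Return a 0/1 indicator per vocabulary term for one array value."""
    present = set(row)
    return [1 if token in present else 0 for token in vocabulary]


# Hypothetical sample data: each row is one array-valued cell.
rows = [["TG", "CA"], ["CA"], ["TG", "NY"]]

vocabulary = fit_vocabulary(rows)
encoded = [multi_hot(row, vocabulary) for row in rows]

for row, vector in zip(rows, encoded):
    print(row, "->", vector)
```

In PySpark the equivalent would be a CountVectorizer stage with inputCol set to the array column and binary=True, fitted and applied like any other estimator in a Pipeline.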
On Tue, Apr 25, 2017 at 9:31 AM, Zeming Yu wrote:
> how do I do one hot encode on a column of array? e.g. ['TG', 'CA']
>
>
> FYI here's my code for one hot encoding
FYI, here's my code for one-hot encoding normal categorical columns.
How do I make it work for a column of arrays?
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer
indexers =