Hi, I have a dataset that is entirely categorical; it does not contain a single numerical column. I want to apply classification using Naive Bayes to predict whether a given alert is actionable or not (YES/NO). Here is an example of my dataset:

DayOfWeek(int),AlertType(String),Application(String),Router(String),Symptom(String),Action(String)
0,Network1,App1,Router1,Not reachable,YES
0,Network1,App2,Router5,Not reachable,NO

I am using Spark 1.6, and I see there is a StringIndexer class, which is used in the OneHotEncoder example given here: https://spark.apache.org/docs/latest/ml-features.html#onehotencoder

However, I have almost 10000 unique words/features to map to numeric values. How do I create such a huge map? My dataset is in a CSV file. Please guide me on how to convert all the categorical features in the CSV file and use them in a Naive Bayes model.
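
To make the question concrete, here is a minimal sketch of the mapping I am after, written in plain Python rather than against the Spark API. The map is learned from the data itself, so nothing has to be written by hand even with thousands of distinct values; as far as I understand, fitting a StringIndexer does the same per column, although it orders indices by frequency rather than by first appearance. The names build_indexes and encode are hypothetical helpers, and the two rows are the sample rows from above.

```python
import csv
import io

# Two sample rows from the post; a real run would read the full CSV file.
SAMPLE = """0,Network1,App1,Router1,Not reachable,YES
0,Network1,App2,Router5,Not reachable,NO
"""


def build_indexes(rows, n_features):
    """Learn one value -> index map per feature column (hypothetical helper).

    Indices are assigned in order of first appearance in the data.
    """
    indexes = [{} for _ in range(n_features)]
    for row in rows:
        for col, value in enumerate(row[:n_features]):
            indexes[col].setdefault(value, len(indexes[col]))
    return indexes


def encode(row, indexes):
    """Return (label, active_positions) for one row (hypothetical helper).

    Each column gets its own block of positions in the one-hot vector;
    active_positions lists the single position set to 1.0 in each block.
    A value that was not seen by build_indexes raises KeyError.
    """
    offset = 0
    active = []
    for col, index in enumerate(indexes):
        active.append(offset + index[row[col]])
        offset += len(index)
    label = 1.0 if row[-1] == "YES" else 0.0
    return label, active


rows = list(csv.reader(io.StringIO(SAMPLE)))
indexes = build_indexes(rows, n_features=5)      # last column is the label
vector_size = sum(len(ix) for ix in indexes)     # total one-hot width
points = [encode(r, indexes) for r in rows]
```

Each (label, active_positions) pair, together with vector_size, carries the information a LabeledPoint with a sparse vector would need: the label, the vector length, and the positions whose value is 1.0.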