Hi all,

I have been toying around with this well-known RandomForestExample code:

val forest = RandomForest.trainClassifier(
  trainData, 7, Map(10 -> 4, 11 -> 40), 20, "auto", "entropy", 30, 300)

This comes from this link (https://www.safaribooksonline.com/library/view/advanced-analytics-with/9781491912751/ch04.html) and also from Sean Owen's presentation (https://www.youtube.com/watch?v=ObiCMJ24ezs), and now I want to migrate it to use the ML library.

The problem I have is that the MLlib example has categorical features, and I cannot find a way to use categorical features with ML. Apparently I should use VectorIndexer, but VectorIndexer assumes only one input column for features. At the moment I am using VectorAssembler instead, but I cannot find a way to achieve the same result.

I have checked the Spark samples, but all I can see is RandomForestClassifier using VectorIndexer for one feature.

Could anyone assist? This is my current code. What do I need to add to take categorical features into account?

val labelIndexer = new StringIndexer()
  .setInputCol("Col0")
  .setOutputCol("indexedLabel")
  .fit(data)

val features = new VectorAssembler()
  .setInputCols(Array(
    "Col1", "Col2", "Col3", "Col4", "Col5",
    "Col6", "Col7", "Col8", "Col9", "Col10"))
  .setOutputCol("features")

val labelConverter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(labelIndexer.labels)

val rf = new RandomForestClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("features")
  .setNumTrees(20)
  .setMaxDepth(30)
  .setMaxBins(300)
  .setImpurity("entropy")

println("Kicking off pipeline..")

val pipeline = new Pipeline()
  .setStages(Array(labelIndexer, features, rf, labelConverter))

Thanks in advance and regards,
Marco