Hi All,
I need to print the AUC and PRC for a GBTClassifier model. This works for
RandomForestClassifier but not for GBTClassifier, even though the
rawPrediction column is not in the original data in either case.
The code is:
..........................................
// Set up Pipeline
val stages = new mutable.ArrayBuffer[PipelineStage]()
val labelColName = if (algo == "GBTClassification") "indexedLabel" else "label"
if (algo == "GBTClassification") {
  val labelIndexer = new StringIndexer()
    .setInputCol("label")
    .setOutputCol(labelColName)
  stages += labelIndexer
}
val rawFeatureSize = data.select("rawFeatures").first().toString().split(",").length
val indices: Array[Int] = new Array[Int](rawFeatureSize)
for (i <- 0 until rawFeatureSize) {
  indices(i) = i
}
val featuresSlicer = new VectorSlicer()
  .setInputCol("rawFeatures")
  .setOutputCol("features")
  .setIndices(indices)
stages += featuresSlicer
val dt = algo match {
  // THE PROBLEM IS HERE:
  // GBTClassifier does not work. The error says that the field rawPrediction
  // does not exist, and it is raised at the last line of code, pipeline.fit(data).
  // However, similar code works for RandomForestClassifier.
  // In fact, the rawPrediction column is not in the original data; it seems to
  // be generated automatically for BinaryClassificationEvaluator in the pipelineModel.
  case "GBTClassification" =>
    new GBTClassifier()
      .setFeaturesCol("features")
      .setLabelCol(labelColName)
  case _ =>
    // The s prefix is needed for ${...} to be interpolated.
    throw new IllegalArgumentException(s"Algo ${params.algo} not supported.")
}
val grid = new ParamGridBuilder()
  .addGrid(dt.maxDepth, Array(1))
  .addGrid(dt.subsamplingRate, Array(0.5))
  .build()
val cv = new CrossValidator()
  .setEstimator(dt)
  .setEstimatorParamMaps(grid)
  .setEvaluator(new BinaryClassificationEvaluator)
  .setNumFolds(6)
stages += cv
val pipeline = new Pipeline().setStages(stages.toArray)
// Fit the Pipeline
val pipelineModel = pipeline.fit(data)
........................
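
To be concrete about the two numbers I am after, here is a minimal, Spark-free
sketch of AUC (area under the ROC curve) and PRC (area under the
precision-recall curve) computed from (score, label) pairs. The object Metrics
and its methods auc and prc are hypothetical helpers for illustration only, not
Spark API, and the starting point used for the PR curve is just one common
convention. In the pipeline the scores would presumably come from the
rawPrediction column.

```scala
// Hypothetical helper, not part of Spark. Input is (score, label) pairs,
// where label is 1.0 (positive) or 0.0 (negative).
// Assumes at least one positive and one negative example are present.
object Metrics {
  // Trapezoidal area under a curve given as (x, y) points ordered by x.
  private def trapezoid(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect {
      case Seq((x1, y1), (x2, y2)) => (x2 - x1) * (y1 + y2) / 2.0
    }.sum

  // Cumulative (truePositives, falsePositives) at each distinct score
  // threshold, going from the highest score to the lowest.
  private def cumulativeCounts(data: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val byScore = data.groupBy(_._1).toSeq.sortBy(-_._1)
    byScore.scanLeft((0.0, 0.0)) { case ((tp, fp), (_, group)) =>
      val pos = group.count(_._2 == 1.0)
      (tp + pos, fp + group.size - pos)
    }.tail
  }

  // Area under the ROC curve: x = false positive rate, y = true positive rate.
  def auc(data: Seq[(Double, Double)]): Double = {
    val counts = cumulativeCounts(data)
    val (p, n) = counts.last
    val roc = (0.0, 0.0) +: counts.map { case (tp, fp) => (fp / n, tp / p) }
    trapezoid(roc)
  }

  // Area under the precision-recall curve: x = recall, y = precision.
  // The curve is started at recall 0 with the first observed precision.
  def prc(data: Seq[(Double, Double)]): Double = {
    val counts = cumulativeCounts(data)
    val (p, _) = counts.last
    val pr = counts.map { case (tp, fp) => (tp / p, tp / (tp + fp)) }
    trapezoid((0.0, pr.head._2) +: pr)
  }
}
```

For example, Metrics.auc(Seq((0.9, 1.0), (0.8, 0.0), (0.3, 1.0), (0.1, 0.0)))
gives 0.75, since three of the four positive/negative pairs are ranked
correctly.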
Thanks in advance ~~
Zhiliang