Hi All,
I need to print the AUC and PRC (the areas under the ROC and precision-recall curves) for a GBTClassifier model. This works for RandomForestClassifier but not for GBTClassifier, even though the original data has no rawPrediction column in either case.
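Both metrics are computed from a continuous score per row, not from the hard 0/1 prediction. In Spark ML, BinaryClassificationEvaluator reads this score from the column named by its rawPredictionCol parameter, whose default is "rawPrediction". The plain-Scala sketch below illustrates what is being computed. It takes (score, label) pairs with labels 0.0/1.0, sweeps thresholds from the highest score down, and integrates the ROC and precision-recall curves with the trapezoid rule. It is not Spark's implementation, and the object and method names are made up for this example. The PR convention used here (start at recall 0 with the first threshold's precision) is one common choice and may not match every Spark version exactly.

```scala
object CurveAreas {
  // Trapezoid-rule area under a piecewise-linear curve through (x, y) points.
  private def trapezoid(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect { case Seq((x1, y1), (x2, y2)) => (x2 - x1) * (y1 + y2) / 2.0 }.sum

  // Cumulative (truePositives, falsePositives) after each distinct score threshold,
  // sweeping from the highest score to the lowest; tied scores share one threshold.
  private def cumulativeCounts(scoreAndLabel: Seq[(Double, Double)]): Seq[(Long, Long)] = {
    require(scoreAndLabel.nonEmpty, "need at least one scored row")
    val perThreshold = scoreAndLabel.groupBy(_._1).toSeq.sortBy(-_._1).map { case (_, rows) =>
      val positives = rows.count(_._2 == 1.0).toLong
      (positives, rows.size.toLong - positives)
    }
    perThreshold.scanLeft((0L, 0L)) { case ((tp, fp), (p, n)) => (tp + p, fp + n) }.tail
  }

  // Area under the ROC curve: x = false positive rate, y = true positive rate.
  def areaUnderROC(scoreAndLabel: Seq[(Double, Double)]): Double = {
    val counts = cumulativeCounts(scoreAndLabel)
    val (totalPos, totalNeg) = counts.last
    require(totalPos > 0 && totalNeg > 0, "need at least one positive and one negative label")
    val points = counts.map { case (tp, fp) => (fp.toDouble / totalNeg, tp.toDouble / totalPos) }
    trapezoid((0.0, 0.0) +: points)
  }

  // Area under the precision-recall curve: x = recall, y = precision.
  // Starts at recall 0 with the precision of the first threshold (an assumed convention).
  def areaUnderPR(scoreAndLabel: Seq[(Double, Double)]): Double = {
    val counts = cumulativeCounts(scoreAndLabel)
    val totalPos = counts.last._1
    require(totalPos > 0, "need at least one positive label")
    val points = counts.map { case (tp, fp) => (tp.toDouble / totalPos, tp.toDouble / (tp + fp)) }
    trapezoid((0.0, points.head._2) +: points)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical (score, label) pairs; label 1.0 = positive, 0.0 = negative.
    val scored = Seq((0.8, 1.0), (0.6, 0.0), (0.4, 1.0), (0.2, 0.0))
    println(s"areaUnderROC = ${areaUnderROC(scored)}")
    println(s"areaUnderPR  = ${areaUnderPR(scored)}")
  }
}
```

In the sample in main, three of the four positive/negative pairs are ranked correctly, so the ROC area is 0.75. If only hard 0/1 predictions are used as scores, the ROC curve has a single operating point between (0, 0) and (1, 1). That is why these metrics are usually computed from a score column.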
The code is:
...
    // Set up Pipeline
    val stages = new mutable.ArrayBuffer[PipelineStage]()

    val labelColName = if (algo == "GBTClassification") "indexedLabel" else "label"
    if (algo == "GBTClassification") {
      val labelIndexer = new StringIndexer()
        .setInputCol("label")
        .setOutputCol(labelColName)
      stages += labelIndexer
    }

    // Counts the comma-separated pieces of the Row's string form;
    // this equals the vector length only for dense vectors.
    val rawFeatureSize = data.select("rawFeatures").first().toString().split(",").length
    val indices: Array[Int] = Array.range(0, rawFeatureSize)
    val featuresSlicer = new VectorSlicer()
      .setInputCol("rawFeatures")
      .setOutputCol("features")
      .setIndices(indices)
    stages += featuresSlicer

    val dt = algo match {
      // THE PROBLEM IS HERE:
      // GBTClassifier does not work. The error says the field rawPrediction does not exist,
      // and it is raised at the last line of code, pipeline.fit(data).
      // However, similar code works for RandomForestClassifier.
      // In fact, the rawPrediction column is not in the original data; it seems to be
      // generated automatically inside the pipeline model for BinaryClassificationEvaluator.
      case "GBTClassification" =>
        new GBTClassifier()
          .setFeaturesCol("features")
          .setLabelCol(labelColName)
      case _ => throw new IllegalArgumentException(s"Algo ${params.algo} not supported.")
    }

    val grid = new ParamGridBuilder()
      .addGrid(dt.maxDepth, Array(1))
      .addGrid(dt.subsamplingRate, Array(0.5))
      .build()
    val cv = new CrossValidator()
      .setEstimator(dt)
      .setEstimatorParamMaps(grid)
      .setEvaluator(new BinaryClassificationEvaluator)
      .setNumFolds(6)
    stages += cv

    val pipeline = new Pipeline().setStages(stages.toArray)

    // Fit the Pipeline
    val pipelineModel = pipeline.fit(data)
...
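The rawFeatureSize line counts comma-separated pieces of the Row's string form. That count equals the vector length only when the vector is dense. The plain-Scala sketch below assumes Spark's usual string forms: a dense vector prints as [v0,v1,...], a sparse one as (size,[indices],[values]), and a Row wraps its fields in [...]. Under those assumptions, the count is wrong for a sparse vector. The helper name is hypothetical. Reading the size directly avoids the problem, for example data.select("rawFeatures").first().getAs[Vector](0).size, where the Vector type must match your Spark version.

```scala
object FeatureSizeCheck {
  // Hypothetical helper mirroring the rawFeatureSize line:
  // count the comma-separated pieces of a Row's string form.
  def sizeBySplitting(rowString: String): Int = rowString.split(",").length

  def main(args: Array[String]): Unit = {
    // Assumed string form of a Row holding a dense vector of size 3.
    val denseRow = "[[0.5,1.5,2.5]]"
    // Assumed string form of a Row holding a sparse vector of size 10 with two non-zeros.
    val sparseRow = "[(10,[0,2],[1.0,3.0])]"
    println(sizeBySplitting(denseRow))  // 3, matches the vector size
    println(sizeBySplitting(sparseRow)) // 5, although the vector size is 10
  }
}
```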
Thanks in advance ~~
Zhiliang 
