GitHub user tomerk commented on the pull request:

    https://github.com/apache/spark/pull/3637#issuecomment-71546339

Well, from my perspective, an ideal interface for Scala-only support for the developer API example would look something like the following:

```scala
/**
 * Example of defining a type of [[Classifier]].
 *
 * NOTE: This is private since it is an example. In practice, you may not want it to be private.
 */
private class MyLogisticRegression
  extends Classifier[Vector] with MaxIterParam(100) { // Initialize default value of MaxIter

  // This method is used by fit()
  override protected def train(
      dataset: SchemaRDD,
      params: ParamMap): MyLogisticRegressionModel = {
    // Extract columns from data using helper method.
    val oldDataset = extractLabeledPoints(dataset, params)

    // Do learning to estimate the weight vector.
    val numFeatures = oldDataset.take(1)(0).features.size
    val weights = Vectors.zeros(numFeatures) // Learning would happen here.

    // Create a model, and return it.
    new MyLogisticRegressionModel(weights)
  }
}

/**
 * Example of defining a type of [[ClassificationModel]].
 *
 * NOTE: This is private since it is an example. In practice, you may not want it to be private.
 */
private class MyLogisticRegressionModel(val weights: Vector)
  extends ClassificationModel[Vector] {

  // This uses the default implementation of transform(), which reads column "features" and outputs
  // columns "prediction" and "rawPrediction."

  // This uses the default implementation of predict(), which chooses the label corresponding to
  // the maximum value returned by [[predictRaw()]].

  /**
   * Raw prediction for each possible label.
   * The meaning of a "raw" prediction may vary between algorithms, but it intuitively gives
   * a measure of confidence in each possible label (where larger = more confident).
   * This internal method is used to implement [[transform()]] and output [[rawPredictionCol]].
   *
   * @return vector where element i is the raw prediction for label i.
   *         This raw prediction may be any real number, where a larger value indicates greater
   *         confidence for that label.
   */
  override protected def predictRaw(features: Vector): Vector = {
    val margin = BLAS.dot(features, weights)
    // There are 2 classes (binary classification), so we return a length-2 vector,
    // where index i corresponds to class i (i = 0, 1).
    Vectors.dense(-margin, margin)
  }

  /** Number of classes the label can take. 2 indicates binary classification. */
  override val numClasses: Int = 2
}
```

I guess some things of note here are:

- Less parameter trickiness (already discussed).
- Fewer generics needed everywhere (thanks to Scala type inference and `this.type`).
- No need for developers to specify their own copy method (which would require them to remember that the parameter map requires a deep copy); it would just happen in the background somehow.
- No need to specify the "fittingParamMap" in the transformer's definition; the background machinery should automatically pass everything along to the transformer.

Like I said, I have doubts about how much of this can be done because of the need to support Java.
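
To make the `this.type` point a bit more concrete, here is a minimal, self-contained sketch of how chained setters can keep the concrete type without a self-type generic parameter. Every name in it (`ThisTypeSketch`, `Params`, `HasMaxIter`, `MyEstimator`, the string-keyed map) is hypothetical and only for illustration; it is not the API in this PR, and it says nothing about whether the same trick can be exposed cleanly to Java.

```scala
// Hypothetical sketch: none of these names are Spark's actual classes.
object ThisTypeSketch {

  /** Minimal stand-in for a parameter holder (assumed design, string-keyed for brevity). */
  trait Params {
    // Parameter name -> value; a real implementation would use typed keys.
    protected val values = scala.collection.mutable.Map.empty[String, Any]

    // Returning this.type lets subclasses chain setters without losing their own type.
    protected def set(name: String, value: Any): this.type = {
      values(name) = value
      this
    }
  }

  /** Mixin providing a maxIter parameter with a default of 100. */
  trait HasMaxIter extends Params {
    def setMaxIter(n: Int): this.type = set("maxIter", n)
    def getMaxIter: Int = values.getOrElse("maxIter", 100).asInstanceOf[Int]
  }

  /** Concrete class: no generic self-type parameter is needed. */
  class MyEstimator extends HasMaxIter {
    def setRegParam(r: Double): this.type = set("regParam", r)
    def getRegParam: Double = values.getOrElse("regParam", 0.0).asInstanceOf[Double]
  }
}
```

With this shape, `new ThisTypeSketch.MyEstimator().setMaxIter(10).setRegParam(0.1)` type-checks, because `setMaxIter` returns the estimator's own type rather than `HasMaxIter`.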