GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/7099#issuecomment-117384810

> Are the *TrainingResults and Results classes too specialized for LinearRegressionModel? Where would be an appropriate level of abstraction?

It's OK if we add abstractions now or later on, and those abstractions can be private at first if we are uncertain about the API. The public methods and classes won't be able to change in the future, so we do need to think about those for sure.

> Any thoughts on RDDs versus DataFrames? If using DataFrames, suggested schemas for each intermediate step? Also, how to create a "local DataFrame" without a sqlContext?

I think we should try to use DataFrames instead of RDDs wherever possible within the spark.ml APIs. Hopefully the schema will be clear based on the source of the data (e.g., following the input and output schema of the model.transform method). You can create a SQLContext from the given DataFrame's SparkContext.
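The last suggestion can be sketched roughly as follows. This is a minimal illustration of the idea, assuming the Spark 1.x API (where `SQLContext.getOrCreate` and `DataFrame` exist); the method name `makeSummaryDataFrame` and the example schema are hypothetical, not part of the PR under review:

```scala
// Hypothetical sketch, assuming Spark 1.x (org.apache.spark.sql.SQLContext,
// org.apache.spark.sql.DataFrame). Shows how a method that only receives a
// DataFrame can still build a new "local" DataFrame: recover the SparkContext
// from the input DataFrame, then get (or create) a SQLContext from it.
import org.apache.spark.sql.{DataFrame, SQLContext}

object SummaryExample {
  def makeSummaryDataFrame(input: DataFrame): DataFrame = {
    // input.rdd.sparkContext is the SparkContext the DataFrame was built on.
    val sqlContext = SQLContext.getOrCreate(input.rdd.sparkContext)
    // Wrap locally computed values (here, dummy placeholders) in a small
    // DataFrame without the caller having to pass a SQLContext in.
    sqlContext
      .createDataFrame(Seq((0L, 0.0), (1L, 0.0)))
      .toDF("index", "residual")
  }
}
```

The point is only that the `SQLContext` need not be threaded through the public API: it can be recovered from whatever `DataFrame` the method already has in hand.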