[ https://issues.apache.org/jira/browse/SPARK-28980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-28980:
------------------------------
    Component/s: MLlib
        Summary: Remove most remaining deprecated items since <= 2.2.0 for 3.0  (was: Remove remaining deprecated items since <= 2.2.0 for 3.0)

> Remove most remaining deprecated items since <= 2.2.0 for 3.0
> -------------------------------------------------------------
>
>                 Key: SPARK-28980
>                 URL: https://issues.apache.org/jira/browse/SPARK-28980
>             Project: Spark
>          Issue Type: Task
>          Components: MLlib, PySpark, Spark Core, SQL, Structured Streaming, YARN
>    Affects Versions: 3.0.0
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Major
>
> Following on https://issues.apache.org/jira/browse/SPARK-25908 I'd like to
> propose removing the rest of the items that have been deprecated since <=
> Spark 2.2.0, before Spark 3.0.
> This appears to be:
> - Remove SQLContext.createExternalTable and Catalog.createExternalTable,
>   deprecated in favor of createTable since 2.2.0, plus tests of deprecated
>   methods
> - Remove deprecated toDegrees, toRadians SQL functions (see below)
> - Remove deprecated KinesisUtils.createStream methods, deprecated since
>   2.2.0, plus tests of deprecated methods
> - Remove deprecated MLlib (not Spark ML) linear method support, mostly
>   utility constructors and 'train' methods, and associated docs. This
>   includes methods in LinearRegression, LogisticRegression, Lasso,
>   RidgeRegression. These have been deprecated since 2.0.0
> - Remove deprecated PySpark MLlib linear method support, including
>   LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD
> - Remove 'runs' argument in KMeans.train() method, which has been a no-op
>   since 2.0.0
> - Remove deprecated ChiSqSelector isSorted protected method
> - Remove deprecated 'yarn-cluster' and 'yarn-client' master argument in
>   favor of 'yarn' and deploy mode 'cluster', etc.
> But while preparing the change, I found:
> - I was not able to remove deprecated DataFrameReader.json(RDD) in favor of
>   DataFrameReader.json(Dataset); the former was deprecated in 2.2.0, but it
>   is still needed to support PySpark's .json() method, which can't use a
>   Dataset.
> - It looks like SQLContext.createExternalTable was not actually deprecated
>   in PySpark, but almost certainly was meant to be. Catalog.createExternalTable
>   was.
> - I later noticed that the toDegrees, toRadians functions were almost
>   fully removed in SPARK-25908, but Felix suggested keeping just the R
>   version as they hadn't been technically deprecated. I'd like to revisit
>   that. Do we really want the inconsistency? I'm not against reverting it
>   again, but then that implies leaving SQLContext.createExternalTable just
>   in PySpark too, which seems weird.
> - I *kept* LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD,
>   RidgeRegressionWithSGD in PySpark, though deprecated, as it is hard to
>   remove them (still used by StreamingLogisticRegressionWithSGD?) and they
>   are not fully removed in Scala. Maybe they should not have been deprecated.
> I will open a PR accordingly for more detailed review.
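
For anyone still passing the legacy master strings, the 'yarn-cluster' / 'yarn-client' item above amounts to splitting one string into a master plus a deploy mode. A minimal sketch of that translation follows. The helper names (normalize_master, to_submit_args) are hypothetical and not part of Spark; only the --master and --deploy-mode flags of spark-submit are real.

```python
# Hypothetical migration helper (not part of Spark). It maps the legacy
# master strings named in the issue onto master 'yarn' plus a deploy mode.
LEGACY_YARN_MASTERS = {
    "yarn-cluster": ("yarn", "cluster"),
    "yarn-client": ("yarn", "client"),
}


def normalize_master(master, deploy_mode=None):
    """Return (master, deploy_mode) with legacy YARN master strings rewritten."""
    if master in LEGACY_YARN_MASTERS:
        new_master, implied_mode = LEGACY_YARN_MASTERS[master]
        # Refuse contradictory input such as 'yarn-cluster' with mode 'client'.
        if deploy_mode is not None and deploy_mode != implied_mode:
            raise ValueError(
                f"master {master!r} implies deploy mode {implied_mode!r}, "
                f"but {deploy_mode!r} was given"
            )
        return new_master, implied_mode
    # Any other master string is passed through unchanged.
    return master, deploy_mode


def to_submit_args(master, deploy_mode=None):
    """Build the spark-submit arguments for the normalized master."""
    master, deploy_mode = normalize_master(master, deploy_mode)
    args = ["--master", master]
    if deploy_mode is not None:
        args += ["--deploy-mode", deploy_mode]
    return args


if __name__ == "__main__":
    print(to_submit_args("yarn-cluster"))
    print(to_submit_args("local[2]"))
```

The sketch only rewrites the two strings the issue names and leaves every other master value alone, so it should be safe to run over existing launch scripts before moving to 3.0.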