[ 
https://issues.apache.org/jira/browse/SPARK-28980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-28980:
------------------------------
    Component/s: MLlib
        Summary: Remove most remaining deprecated items since <= 2.2.0 for 3.0  
(was: Remove remaining deprecated items since <= 2.2.0 for 3.0)

> Remove most remaining deprecated items since <= 2.2.0 for 3.0
> -------------------------------------------------------------
>
>                 Key: SPARK-28980
>                 URL: https://issues.apache.org/jira/browse/SPARK-28980
>             Project: Spark
>          Issue Type: Task
>          Components: MLlib, PySpark, Spark Core, SQL, Structured Streaming, 
> YARN
>    Affects Versions: 3.0.0
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Major
>
> Following on https://issues.apache.org/jira/browse/SPARK-25908 I'd like to 
> propose removing the rest of the items that have been deprecated since <= 
> Spark 2.2.0, before Spark 3.0.
> This appears to be:
> - Remove SQLContext.createExternalTable and Catalog.createExternalTable, 
> deprecated in favor of createTable since 2.2.0, plus tests of deprecated 
> methods
> - Remove deprecated toDegrees, toRadians SQL functions (see below)
> - Remove deprecated KinesisUtils.createStream methods (deprecated in 2.2.0), 
> plus tests of deprecated methods
> - Remove deprecated MLlib (not Spark ML) linear method support, mostly 
> utility constructors and 'train' methods, and associated docs. This includes 
> methods in LinearRegression, LogisticRegression, Lasso, RidgeRegression. 
> These have been deprecated since 2.0.0
> - Remove deprecated PySpark MLlib linear method support, including 
> LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD
> - Remove 'runs' argument in KMeans.train() method, which has been a no-op 
> since 2.0.0
> - Remove deprecated ChiSqSelector isSorted protected method
> - Remove deprecated 'yarn-cluster' and 'yarn-client' master arguments in 
> favor of master 'yarn' with deploy mode 'cluster' or 'client'
> But while preparing the change, I found:
> - I was not able to remove deprecated DataFrameReader.json(RDD) in favor of 
> DataFrameReader.json(Dataset). The former was deprecated in 2.2.0, but it is 
> still needed to support PySpark's .json() method, which can't use a Dataset.
> - It looks like SQLContext.createExternalTable was not actually deprecated in 
> PySpark, although it almost certainly was meant to be. 
> Catalog.createExternalTable was.
> - I noticed afterwards that the toDegrees, toRadians functions were almost 
> fully removed in SPARK-25908, but Felix suggested keeping just the R version 
> as they hadn't technically been deprecated there. I'd like to revisit that. 
> Do we really want the inconsistency? I'm not against reverting it again, but 
> that would imply leaving SQLContext.createExternalTable in PySpark only, 
> which seems weird.
> - I *kept* LogisticRegressionWithSGD, LinearRegressionWithSGD, LassoWithSGD, 
> RidgeRegressionWithSGD in PySpark, though deprecated, as it is hard to remove 
> them (still used by StreamingLogisticRegressionWithSGD?) and they are not 
> fully removed in Scala. Maybe they should not have been deprecated.
> I will open a PR accordingly for more detailed review.

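For anyone still launching jobs with the old master strings, the 'yarn-cluster' / 'yarn-client' item in the description amounts to splitting one value into a master and a deploy mode. A minimal sketch of that mapping, assuming a hypothetical helper normalize_master (it is not part of Spark, and only the two values named in the issue are covered):

```python
from typing import Optional, Tuple

# Deprecated master values named in the issue, mapped to (master, deploy mode).
_DEPRECATED_YARN_MASTERS = {
    "yarn-cluster": ("yarn", "cluster"),
    "yarn-client": ("yarn", "client"),
}


def normalize_master(
    master: str, deploy_mode: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: rewrite a deprecated YARN master string.

    Returns the (master, deploy_mode) pair to pass as --master and
    --deploy-mode. Any other master value is returned unchanged.
    """
    if master not in _DEPRECATED_YARN_MASTERS:
        return master, deploy_mode

    new_master, implied_mode = _DEPRECATED_YARN_MASTERS[master]
    # An explicit deploy mode that contradicts the old master is ambiguous,
    # so fail loudly instead of guessing which one was intended.
    if deploy_mode is not None and deploy_mode != implied_mode:
        raise ValueError(
            f"master {master!r} implies deploy mode {implied_mode!r}, "
            f"but {deploy_mode!r} was given"
        )
    return new_master, implied_mode
```

With this, normalize_master("yarn-cluster") yields the pair to use as --master yarn --deploy-mode cluster, while values such as "local[2]" pass through untouched.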

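To gauge how much user code the removals in the description would touch, a plain text search for the old names is usually enough as a first pass. A rough sketch, assuming a hypothetical find_deprecated_calls helper; the createTable replacement comes from the issue, while degrees and radians as replacements for toDegrees and toRadians are my assumption and are not stated in it:

```python
import re
from typing import List, Tuple

# Old name -> suggested replacement.
REPLACEMENTS = {
    "createExternalTable": "createTable",  # stated in the issue
    "toDegrees": "degrees",  # assumption, not stated in the issue
    "toRadians": "radians",  # assumption, not stated in the issue
}

# Match any old name as a whole word followed by an opening parenthesis.
_CALL_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in REPLACEMENTS) + r")\s*\("
)


def find_deprecated_calls(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, old name, replacement) for each apparent call.

    This is a textual match only: it does not parse the code, so it can
    also flag unrelated methods that happen to share a name.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _CALL_PATTERN.finditer(line):
            old_name = match.group(1)
            hits.append((lineno, old_name, REPLACEMENTS[old_name]))
    return hits
```

Because the match is purely textual, every hit still needs a manual look before anything is rewritten.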

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
