Yang Jie created SPARK-48805:
--------------------------------

             Summary: Replace calls to bridged APIs based on SparkSession#sqlContext with SparkSession API
                 Key: SPARK-48805
                 URL: https://issues.apache.org/jira/browse/SPARK-48805
             Project: Spark
          Issue Type: Improvement
          Components: Examples, ML, SQL, Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Yang Jie


In Spark's internal code, there are places where the bridged APIs based on SparkSession#sqlContext are still used even though a SparkSession instance is already available. We can therefore make some simplifications:


1. `SparkSession#sqlContext#read` -> `SparkSession#read`


```scala
  /**
   * Returns a [[DataFrameReader]] that can be used to read non-streaming data in as a
   * `DataFrame`.
   * {{{
   *   sqlContext.read.parquet("/path/to/file.parquet")
   *   sqlContext.read.schema(schema).json("/path/to/file.json")
   * }}}
   *
   * @group genericdata
   * @since 1.4.0
   */
  def read: DataFrameReader = sparkSession.read
```
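
For example (a minimal before/after sketch, assuming `spark` is an in-scope `SparkSession` and the paths are hypothetical):

```scala
// Before: goes through the bridged SQLContext API
val df1 = spark.sqlContext.read.parquet("/path/to/file.parquet")
// After: uses the SparkSession API directly
val df2 = spark.read.parquet("/path/to/file.parquet")
```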

2. `SparkSession#sqlContext#setConf` -> `SparkSession#conf#set`


```scala
  /**
   * Set the given Spark SQL configuration property.
   *
   * @group config
   * @since 1.0.0
   */
  def setConf(key: String, value: String): Unit = {
    sparkSession.conf.set(key, value)
  }
```
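
A before/after sketch of a caller, assuming `spark` is a `SparkSession` (the config key is just an example):

```scala
// Before: bridged SQLContext API
spark.sqlContext.setConf("spark.sql.shuffle.partitions", "10")
// After: SparkSession conf API
spark.conf.set("spark.sql.shuffle.partitions", "10")
```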


3. `SparkSession#sqlContext#getConf` -> `SparkSession#conf#get`

```scala
  /**
   * Return the value of Spark SQL configuration property for the given key.
   *
   * @group config
   * @since 1.0.0
   */
  def getConf(key: String): String = {
    sparkSession.conf.get(key)
  }
```
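
The corresponding caller-side rewrite (same assumptions as above):

```scala
// Before: bridged SQLContext API
val before = spark.sqlContext.getConf("spark.sql.shuffle.partitions")
// After: SparkSession conf API
val after = spark.conf.get("spark.sql.shuffle.partitions")
```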

4. `SparkSession#sqlContext#createDataFrame` -> `SparkSession#createDataFrame`

```scala
  /**
   * Creates a DataFrame from an RDD of Product (e.g. case classes, tuples).
   *
   * @group dataframes
   * @since 1.3.0
   */
  def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame = {
    sparkSession.createDataFrame(rdd)
  }
```
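
For callers, the rewrite looks like this (a sketch; `Person` and the sample data are hypothetical):

```scala
case class Person(name: String, age: Int)

val rdd = spark.sparkContext.parallelize(Seq(Person("Alice", 30)))
// Before: bridged SQLContext API
val df1 = spark.sqlContext.createDataFrame(rdd)
// After: SparkSession API
val df2 = spark.createDataFrame(rdd)
```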

5. `SparkSession#sqlContext#sessionState` -> `SparkSession#sessionState`

```scala
private[sql] def sessionState: SessionState = sparkSession.sessionState
```

6. `SparkSession#sqlContext#sharedState` -> `SparkSession#sharedState`

```scala
private[sql] def sharedState: SharedState = sparkSession.sharedState
```
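
Since both members are `private[sql]`, this rewrite only applies inside Spark's own sql modules; a sketch:

```scala
// Before: reaches the state objects through the bridged SQLContext
val ss1 = spark.sqlContext.sessionState
val sh1 = spark.sqlContext.sharedState
// After: reads them off the SparkSession directly
val ss2 = spark.sessionState
val sh2 = spark.sharedState
```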

7. `SparkSession#sqlContext#streams` -> `SparkSession#streams`


```scala
  /**
   * Returns a `StreamingQueryManager` that allows managing all the
   * [[org.apache.spark.sql.streaming.StreamingQuery StreamingQueries]] active on `this` context.
   *
   * @since 2.0.0
   */
  def streams: StreamingQueryManager = sparkSession.streams
```
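
A before/after sketch for callers (assuming `spark` is a `SparkSession` with active streaming queries):

```scala
// Before: bridged SQLContext API
spark.sqlContext.streams.active.foreach(q => println(q.name))
// After: SparkSession API
spark.streams.active.foreach(q => println(q.name))
```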

8. `SparkSession#sqlContext#uncacheTable` -> `SparkSession#catalog#uncacheTable`

```scala
  /**
   * Removes the specified table from the in-memory cache.
   * @group cachemgmt
   * @since 1.3.0
   */
  def uncacheTable(tableName: String): Unit = {
    sparkSession.catalog.uncacheTable(tableName)
  }
```
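
A before/after sketch for callers (the table name is hypothetical):

```scala
// Before: bridged SQLContext API
spark.sqlContext.uncacheTable("my_table")
// After: SparkSession catalog API
spark.catalog.uncacheTable("my_table")
```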


