Re: [PR] [SPARK-56505][SQL][TESTS] SharedSparkSession provides sql.SparkSession, add SharedClassicSparkSession, ClassicQueryTest [spark]

via GitHub Mon, 25 May 2026 06:17:39 -0700


cloud-fan commented on code in PR #55356:
URL: https://github.com/apache/spark/pull/55356#discussion_r3298358795



##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -286,11 +284,10 @@ trait QueryTestBase
    * but the implicits import is needed in the constructor.
    */
   protected object testImplicits
-    extends classic.SQLImplicits
+    extends SQLImplicits
       with classic.ClassicConversions
       with classic.ColumnConversions {
-    override protected def session: classic.SparkSession =
-      self.spark.asInstanceOf[classic.SparkSession]
+    override protected def session: SparkSession = self.spark
     override protected def converter: classic.ColumnNodeToExpressionConverter =
       self.spark.asInstanceOf[classic.SparkSession].converter

Review Comment:
   `testImplicits` was lifted to `session: SparkSession` on line 290, but the 
converter on this line still casts to `classic.SparkSession`. The trait is then 
declared in the abstract `QueryTestBase`. Any future Connect-providing suite 
that uses `import testImplicits._` and triggers any conversion that touches the 
converter will throw `ClassCastException` — and the ~434 callers across the 
repo still import this. Since `testImplicits` extends 
`classic.ClassicConversions`/`classic.ColumnConversions`, it's effectively 
classic-only and belongs in `classic.QueryTest` (where `classicTestImplicits` 
already lives, doing the same thing without the cast). Either migrate the ~434 
callers in this PR (mechanical, since the new `classicTestImplicits` has 
identical semantics for classic suites), or document that `testImplicits` is a 
classic-only escape hatch retained for compatibility — silently failing on 
Connect with no signpost is the worst of both options.



##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -404,12 +401,18 @@ trait QueryTestBase
 
   // Blocking uncache table for tests
   protected def uncacheTable(tableName: String): Unit = {
-    val tableIdent = 
spark.sessionState.sqlParser.parseTableIdentifier(tableName)
-    val cascade = !spark.sessionState.catalog.isTempView(tableIdent)
-    spark.sharedState.cacheManager.uncacheQuery(
-      spark.table(tableName).asInstanceOf[classic.Dataset[_]],
-      cascade = cascade,
-      blocking = true)
+    spark match {
+      case spark: classic.SparkSession =>
+        val tableIdent = 
spark.sessionState.sqlParser.parseTableIdentifier(tableName)
+        val cascade = !spark.sessionState.catalog.isTempView(tableIdent)
+        spark.sharedState.cacheManager.uncacheQuery(
+          spark.table(tableName),
+          cascade = cascade,
+          blocking = true)
+      case spark =>
+        // TODO i guess this doesn't block?!

Review Comment:
   `// TODO i guess this doesn't block?!` — author's own admission that the 
Connect branch is unverified. Connect's `spark.sql("UNCACHE TABLE …")` returns 
a lazy client DataFrame; without `.collect()` the command may never execute on 
the server, and even with `.collect()` the *blocking* contract of the classic 
branch is gone (the original code awaits via `uncacheQuery(..., blocking = 
true)`). Three options: (a) drop the Connect branch and move the method to 
`classic.QueryTest` until a real Connect impl exists; (b) replace the TODO with 
`throw new UnsupportedOperationException("uncacheTable: not supported on 
Connect")` so the failure is loud; (c) actually implement+verify the Connect 
path. Shipping the TODO as-is leaves a known-broken code path in the lifted 
base.



##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -535,7 +517,8 @@ trait QueryTestBase
    */
   def makeQualifiedPath(path: String): URI = {
     val hadoopPath = new Path(path)
-    val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf())
+    val fs = hadoopPath.getFileSystem(
+      spark.asInstanceOf[classic.SparkSession].sessionState.newHadoopConf())

Review Comment:
   `spark.asInstanceOf[classic.SparkSession].sessionState.newHadoopConf()` in 
the abstract base — same pattern as `stripSparkFilter` and 
`logicalPlanToSparkQuery`, which were correctly moved to `classic.QueryTest`. 
This one was missed. Relocate it.



##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -668,9 +651,12 @@ trait QueryTestBase
    * Waits for all tasks on all executors to be finished.
    */
   protected def waitForTasksToFinish(): Unit = {
-    eventually(timeout(10.seconds)) {
-      assert(spark.sparkContext.statusTracker
-        .getExecutorInfos.map(_.numRunningTasks()).sum == 0)
+    spark match {

Review Comment:
   `match` with only `case spark: classic.SparkSession` and no fallback case 
produces a `MatchError` if called from a Connect-providing suite — not a useful 
error message. Either add `case _ => throw new 
UnsupportedOperationException("waitForTasksToFinish: not supported on 
Connect")`, or — better — move the whole method to `classic.QueryTest` since it 
has no Connect counterpart anyway. Same anti-pattern as `uncacheTable` on line 
404, but worse: that one at least *tries* to handle Connect (incorrectly); this 
one fails with a cryptic JVM error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-56505][SQL][TESTS] SharedSparkSession provides sql.SparkSession, add SharedClassicSparkSession, ClassicQueryTest [spark]

Reply via email to