cloud-fan commented on code in PR #55356:
URL: https://github.com/apache/spark/pull/55356#discussion_r3298358795
##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -286,11 +284,10 @@ trait QueryTestBase
* but the implicits import is needed in the constructor.
*/
protected object testImplicits
- extends classic.SQLImplicits
+ extends SQLImplicits
with classic.ClassicConversions
with classic.ColumnConversions {
- override protected def session: classic.SparkSession =
- self.spark.asInstanceOf[classic.SparkSession]
+ override protected def session: SparkSession = self.spark
override protected def converter: classic.ColumnNodeToExpressionConverter =
self.spark.asInstanceOf[classic.SparkSession].converter
Review Comment:
`testImplicits` was lifted to `session: SparkSession` on line 290, but the
converter on this line still casts to `classic.SparkSession`. The trait is then
declared in the abstract `QueryTestBase`. Any future Connect-providing suite
that uses `import testImplicits._` and triggers any conversion that touches the
converter will throw `ClassCastException` — and the ~434 callers across the
repo still import this. Since `testImplicits` extends
`classic.ClassicConversions`/`classic.ColumnConversions`, it's effectively
classic-only and belongs in `classic.QueryTest` (where `classicTestImplicits`
already lives, doing the same thing without the cast). Either migrate the ~434
callers in this PR (mechanical, since the new `classicTestImplicits` has
identical semantics for classic suites), or document that `testImplicits` is a
classic-only escape hatch retained for compatibility — silently failing on
Connect with no signpost is the worst of both options.
##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -404,12 +401,18 @@ trait QueryTestBase
// Blocking uncache table for tests
protected def uncacheTable(tableName: String): Unit = {
- val tableIdent =
spark.sessionState.sqlParser.parseTableIdentifier(tableName)
- val cascade = !spark.sessionState.catalog.isTempView(tableIdent)
- spark.sharedState.cacheManager.uncacheQuery(
- spark.table(tableName).asInstanceOf[classic.Dataset[_]],
- cascade = cascade,
- blocking = true)
+ spark match {
+ case spark: classic.SparkSession =>
+ val tableIdent =
spark.sessionState.sqlParser.parseTableIdentifier(tableName)
+ val cascade = !spark.sessionState.catalog.isTempView(tableIdent)
+ spark.sharedState.cacheManager.uncacheQuery(
+ spark.table(tableName),
+ cascade = cascade,
+ blocking = true)
+ case spark =>
+ // TODO i guess this doesn't block?!
Review Comment:
`// TODO i guess this doesn't block?!` — author's own admission that the
Connect branch is unverified. Connect's `spark.sql("UNCACHE TABLE …")` returns
a lazy client DataFrame; without `.collect()` the command may never execute on
the server, and even with `.collect()` the *blocking* contract of the classic
branch is gone (the original code awaits via `uncacheQuery(..., blocking =
true)`). Three options: (a) drop the Connect branch and move the method to
`classic.QueryTest` until a real Connect impl exists; (b) replace the TODO with
`throw new UnsupportedOperationException("uncacheTable: not supported on
Connect")` so the failure is loud; (c) actually implement+verify the Connect
path. Shipping the TODO as-is leaves a known-broken code path in the lifted
base.
##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -535,7 +517,8 @@ trait QueryTestBase
*/
def makeQualifiedPath(path: String): URI = {
val hadoopPath = new Path(path)
- val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf())
+ val fs = hadoopPath.getFileSystem(
+ spark.asInstanceOf[classic.SparkSession].sessionState.newHadoopConf())
Review Comment:
`spark.asInstanceOf[classic.SparkSession].sessionState.newHadoopConf()` in
the abstract base — same pattern as `stripSparkFilter` and
`logicalPlanToSparkQuery`, which were correctly moved to `classic.QueryTest`.
This one was missed. Relocate it.
##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -668,9 +651,12 @@ trait QueryTestBase
* Waits for all tasks on all executors to be finished.
*/
protected def waitForTasksToFinish(): Unit = {
- eventually(timeout(10.seconds)) {
- assert(spark.sparkContext.statusTracker
- .getExecutorInfos.map(_.numRunningTasks()).sum == 0)
+ spark match {
Review Comment:
`match` with only `case spark: classic.SparkSession` and no fallback case
produces a `MatchError` if called from a Connect-providing suite — not a useful
error message. Either add `case _ => throw new
UnsupportedOperationException("waitForTasksToFinish: not supported on
Connect")`, or — better — move the whole method to `classic.QueryTest` since it
has no Connect counterpart anyway. Same anti-pattern as `uncacheTable` on line
404, but worse: that one at least *tries* to handle Connect (incorrectly); this
one fails with a cryptic JVM error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]