Feng Zhang created SEDONA-568:
---------------------------------
Summary: Refactor TestBaseScala to use method instead of a class-level variable for sparkSession
Key: SEDONA-568
URL: https://issues.apache.org/jira/browse/SEDONA-568
Project: Apache Sedona
Issue Type: Improvement
Reporter: Feng Zhang
Refactoring the base class (org.apache.sedona.sql.TestBaseScala) to use a
method instead of a class-level variable for sparkSession can be a good idea
for several reasons:
- *Lazy initialization*: Using a method allows for lazy initialization, which
can be beneficial if the creation of the SparkSession is resource-intensive or
if it should only be created when needed.
- {*}Flexibility{*}: It provides more flexibility for derived classes to
customize or extend the initialization logic without having to override a
class-level variable.
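- {*}Testability{*}: It can improve testability by letting each test suite
create the SparkSession with exactly the configuration it needs (for example,
with or without broadcast join or metrics settings), instead of sharing one
fixed, eagerly created session.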
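A minimal sketch of the three points above, written so it runs without Spark. `Session`, `SessionCounter`, `SessionProvider`, `buildSession`, `DefaultSuite`, and `CustomSuite` are hypothetical stand-ins, not Sedona or Spark APIs: `buildSession()` plays the role of an overridable factory method, and a `lazy val` caches its result so the session is only built on first access.

```scala
// Hypothetical stand-in for SparkSession so the sketch runs without Spark.
final case class Session(conf: Map[String, String])

// Counts how many sessions have been built, to make laziness observable.
object SessionCounter {
  var created = 0
}

trait SessionProvider {
  // Factory method: subclasses customize initialization by overriding it
  // instead of shadowing a class-level val.
  protected def buildSession(): Session = {
    SessionCounter.created += 1
    Session(Map("spark.master" -> "local[*]"))
  }

  // Runs buildSession() on first access only, then reuses the result.
  lazy val session: Session = buildSession()
}

// Uses the default initialization unchanged.
class DefaultSuite extends SessionProvider

// Extends the default initialization with one extra setting.
class CustomSuite extends SessionProvider {
  override protected def buildSession(): Session = {
    val base = super.buildSession()
    base.copy(conf = base.conf + ("sedona.join.autoBroadcastJoinThreshold" -> "-1"))
  }
}
```

Constructing `new DefaultSuite` builds nothing; the first read of `session` builds it once, and later reads reuse it. With the real SparkSession, a plain `def` that calls `getOrCreate()` would also avoid duplicate sessions, because `getOrCreate()` returns the existing session when one is already active.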
An example is as follows:
{code:scala}
import org.apache.sedona.spark.SedonaContext
import org.apache.spark.sql.SparkSession

trait SparkSessionBuilder {
  // Supplied by the concrete test class.
  protected val warehouseLocation: String
  protected val resourceFolder: String

  // Builds (or reuses) a SparkSession configured for the calling test suite.
  def createSparkSession(
      enableBroadcastJoin: Boolean,
      setInference: Boolean,
      enableMetrics: Boolean): SparkSession = {
    val builder = SedonaContext.builder()
      .master("local[*]")
      .appName("sedonasqlScalaTest")
      .config("spark.sql.warehouse.dir", warehouseLocation)

    // SparkSession.Builder is mutable: each config() call below updates `builder` in place.
    if (enableBroadcastJoin) {
      // "-1" disables Sedona's automatic broadcast join, so broadcasts must be explicit.
      builder.config("sedona.join.autoBroadcastJoinThreshold", "-1")
    }
    if (setInference) {
      builder.config("spark.kryoserializer.buffer.max", "64m")
        .config("spark.wherobots.inference.entrance", resourceFolder + "python/udfEntrance.py")
        .config("spark.wherobots.inference.files", resourceFolder + "python/udfDefinition.py")
        .config("spark.wherobots.inference.args", "3")
    }
    if (enableMetrics) {
      builder.config("spark.metrics.conf.*.sink.console.class",
        "org.apache.spark.metrics.sink.ConsoleSink")
    }
    // Returns the existing session if one is active, otherwise creates a new one.
    builder.getOrCreate()
  }
}
{code}
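To push testability one step further, the flag-to-config mapping inside createSparkSession could be pulled out as plain data and checked without starting Spark. This is a sketch under that assumption only: `SessionConfig.entries` is a hypothetical helper, and the keys and values are copied from the example above.

```scala
object SessionConfig {
  // Hypothetical helper: returns the config entries createSparkSession sets
  // on top of master/appName, as a Map that unit tests can inspect directly.
  def entries(
      warehouseLocation: String,
      resourceFolder: String,
      enableBroadcastJoin: Boolean,
      setInference: Boolean,
      enableMetrics: Boolean): Map[String, String] = {
    val base = Map("spark.sql.warehouse.dir" -> warehouseLocation)
    // "-1" disables Sedona's automatic broadcast join.
    val broadcast =
      if (enableBroadcastJoin) Map("sedona.join.autoBroadcastJoinThreshold" -> "-1")
      else Map.empty[String, String]
    val inference =
      if (setInference)
        Map(
          "spark.kryoserializer.buffer.max" -> "64m",
          "spark.wherobots.inference.entrance" -> (resourceFolder + "python/udfEntrance.py"),
          "spark.wherobots.inference.files" -> (resourceFolder + "python/udfDefinition.py"),
          "spark.wherobots.inference.args" -> "3")
      else Map.empty[String, String]
    val metrics =
      if (enableMetrics)
        Map("spark.metrics.conf.*.sink.console.class" -> "org.apache.spark.metrics.sink.ConsoleSink")
      else Map.empty[String, String]
    base ++ broadcast ++ inference ++ metrics
  }
}
```

createSparkSession could then apply the result with `entries.foreach { case (k, v) => builder.config(k, v) }`, keeping the Spark-dependent part to a single loop while the branching logic stays unit-testable.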
--
This message was sent by Atlassian Jira
(v8.20.10#820010)