Feng Zhang created SEDONA-568:
---------------------------------

Summary: Refactor TestBaseScala to use method instead of a class-level variable for sparkSession
Key: SEDONA-568
URL: https://issues.apache.org/jira/browse/SEDONA-568
Project: Apache Sedona
Issue Type: Improvement
Reporter: Feng Zhang

Refactoring the base class (org.apache.sedona.sql.TestBaseScala) to expose sparkSession through a method instead of a class-level variable is a good idea for several reasons:

- {*}Lazy initialization{*}: A method allows the SparkSession to be created only when it is first needed, which helps when creating it is resource-intensive.
- {*}Flexibility{*}: Derived classes can customize or extend the initialization logic without having to override a class-level variable.
- {*}Testability{*}: The SparkSession can be created in a controlled manner, which is useful for unit tests.

An example is as follows:

{code:java}
import org.apache.sedona.spark.SedonaContext
import org.apache.spark.sql.SparkSession

trait SparkSessionBuilder {
  // Supplied by the concrete test base class.
  protected val warehouseLocation: String
  protected val resourceFolder: String

  def createSparkSession(enableBroadcastJoin: Boolean, setInference: Boolean, enableMetrics: Boolean): SparkSession = {
    // config(...) updates the builder in place, so the conditional calls below need no reassignment.
    val builder = SedonaContext.builder()
      .master("local[*]")
      .appName("sedonasqlScalaTest")
      .config("spark.sql.warehouse.dir", warehouseLocation)

    if (enableBroadcastJoin) {
      builder.config("sedona.join.autoBroadcastJoinThreshold", "-1")
    }

    if (setInference) {
      builder.config("spark.kryoserializer.buffer.max", "64m")
        .config("spark.wherobots.inference.entrance", resourceFolder + "python/udfEntrance.py")
        .config("spark.wherobots.inference.files", resourceFolder + "python/udfDefinition.py")
        .config("spark.wherobots.inference.args", "3")
    }

    if (enableMetrics) {
      builder.config("spark.metrics.conf.*.sink.console.class", "org.apache.spark.metrics.sink.ConsoleSink")
    }

    builder.getOrCreate()
  }
}
{code}
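
The lazy-initialization and flexibility points above can be illustrated with a small, self-contained sketch. It is not the actual TestBaseScala code: FakeSession, SessionFactory, TestBase, sessionSettings and BroadcastJoinSuite are hypothetical names, and FakeSession stands in for SparkSession so the example runs without Spark on the classpath. Caching the session in a lazy val is also an assumption made here for illustration, not something the issue prescribes.

```scala
// Hypothetical stand-in for SparkSession; it only records the settings it was built with.
final case class FakeSession(settings: Map[String, String])

// Hypothetical factory that counts how many sessions were built,
// which makes eager versus on-demand creation observable.
object SessionFactory {
  var created: Int = 0

  def build(settings: Map[String, String]): FakeSession = {
    created += 1
    FakeSession(settings)
  }
}

trait TestBase {
  // Hook that derived suites can override to add or change settings.
  protected def sessionSettings: Map[String, String] =
    Map("master" -> "local[*]")

  // Built on first access and reused afterwards (assumed caching strategy).
  lazy val sparkSession: FakeSession = SessionFactory.build(sessionSettings)
}

// A derived suite customizes the settings without overriding the session itself.
class BroadcastJoinSuite extends TestBase {
  override protected def sessionSettings: Map[String, String] =
    super.sessionSettings + ("sedona.join.autoBroadcastJoinThreshold" -> "-1")
}
```

In this sketch, constructing a BroadcastJoinSuite does not build a session; the first read of sparkSession does, and later reads return the same instance. A real base class would presumably call something like the createSparkSession method from the example in the issue instead of SessionFactory.build.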