Anh Tuan Pham created SPARK-47926:
-------------------------------------

             Summary: Scala Spark Test Utils/Framework
                 Key: SPARK-47926
                 URL: https://issues.apache.org/jira/browse/SPARK-47926
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Anh Tuan Pham


As a user of Apache Spark in Scala, I've observed a gap in the availability of 
a dedicated testing framework similar to the one present for PySpark. Various 
open-source testing libraries do exist for Scala:
 * [spark-testing-base|https://github.com/holdenk/spark-testing-base]: base 
classes to use when writing tests with Spark
 * [spark-fast-tests|https://github.com/MrPowers/spark-fast-tests]: Apache 
Spark testing helpers (dependency free & works with Scalatest, uTest, and 
MUnit)

 

Even so, the same level of support is lacking for Spark Scala. For example, 
spark-fast-tests was last updated 2 years ago and still does not support newer 
Spark versions. This causes some pain when I try to run integration and 
performance tests on our cluster, where we deploy a fat test JAR against 
provided Spark JARs of a newer version.

I propose the development of official testing utilities tailored specifically 
for Spark Scala, with similar if not the same features as the PySpark test 
utilities available since 3.5.0. These utilities would give Scala developers 
the tools necessary to write robust and reliable tests for their Spark 
applications.

 

Key Features to Include:
 * Utility Functions: Provide a suite of utility functions designed to simplify 
common testing tasks, such as assert_dataframe_equality and 
assert_schema_equality, thereby reducing boilerplate code and accelerating test 
development. In the future, this would also help us implement and consolidate 
best practices for comparing two DataFrames correctly and efficiently, backed 
by automated tests and benchmarks.
 * Comprehensive Documentation: Offer thorough documentation and examples to 
guide users in using the testing framework/utils effectively, easing adoption 
and shortening the learning curve.
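
To make the utility functions above more concrete, here is a rough sketch of what they could look like in Scala. Everything in it is an assumption for discussion: the object name SparkTestUtilsSketch and the camel-cased helpers assertSchemaEquality and assertDataFrameEquality are hypothetical and do not exist in Spark today. The sketch compares rows as multisets after collecting them to the driver, so it only suits small test data, and a real implementation would likely want options such as float tolerance and nullability handling.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Hypothetical helpers for discussion only; none of these exist in Spark.
object SparkTestUtilsSketch {

  /** Throws an AssertionError when the two schemas differ
    * (field names, types, nullability and metadata are all compared). */
  def assertSchemaEquality(actual: StructType, expected: StructType): Unit =
    assert(
      actual == expected,
      s"Schema mismatch.\nActual:\n${actual.treeString}\nExpected:\n${expected.treeString}"
    )

  /** Compares schemas first, then rows as multisets, ignoring row order.
    * Both DataFrames are collected to the driver: small test data only. */
  def assertDataFrameEquality(actual: DataFrame, expected: DataFrame): Unit = {
    assertSchemaEquality(actual.schema, expected.schema)

    val actualRows = actual.collect().toSeq
    val expectedRows = expected.collect().toSeq

    // Seq.diff is a multiset difference, so duplicate rows are counted.
    val onlyInActual = actualRows.diff(expectedRows)
    val onlyInExpected = expectedRows.diff(actualRows)

    assert(
      onlyInActual.isEmpty && onlyInExpected.isEmpty,
      s"Row mismatch.\nOnly in actual: ${onlyInActual.mkString(", ")}\n" +
        s"Only in expected: ${onlyInExpected.mkString(", ")}"
    )
  }
}
```

In a test, one would build two small DataFrames from a local SparkSession and call SparkTestUtilsSketch.assertDataFrameEquality(actual, expected); the call returns normally when they match and throws otherwise.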

I am happy to contribute to this feature and would welcome any help. Also, let 
me know if this would require a SPIP.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
