We run CometFuzz manually. However, many of our more recent Scala tests use
the same underlying classes to generate random data for a specified schema,
and we cover all the usual edge cases there (NaN, Infinity, -0.0, etc.).
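
To make the idea concrete, here is a minimal sketch of schema-driven random data with edge cases mixed in, plus the kind of baseline-versus-candidate comparison that the fuzz testing quoted below relies on. It is plain Python and is not CometFuzz or Comet code: the names `random_rows`, `same_value`, and `first_mismatch`, the `edge_ratio` parameter, and the two-type schema are all hypothetical and chosen only for illustration.

```python
import math
import random

# Edge-case values that uniform sampling alone would almost never produce.
FLOAT_EDGE_CASES = [float("nan"), float("inf"), float("-inf"), -0.0, 0.0]


def random_column(dtype, n, rng, edge_ratio=0.2):
    """Generate n values of the given type, mixing in edge cases for doubles."""
    values = []
    for _ in range(n):
        if dtype == "double":
            if rng.random() < edge_ratio:
                values.append(rng.choice(FLOAT_EDGE_CASES))
            else:
                values.append(rng.uniform(-1e6, 1e6))
        elif dtype == "int":
            values.append(rng.randint(-2**31, 2**31 - 1))
        else:
            raise ValueError(f"unsupported type: {dtype}")
    return values


def random_rows(schema, n, seed):
    """schema is a list of (column_name, dtype) pairs; returns a list of dict rows."""
    rng = random.Random(seed)  # seeded so that a failing run can be reproduced
    columns = {name: random_column(dtype, n, rng) for name, dtype in schema}
    return [{name: columns[name][i] for name, _ in schema} for i in range(n)]


def same_value(a, b):
    """Equality that treats NaN as equal to NaN and tells 0.0 apart from -0.0."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) or math.isnan(b):
            return math.isnan(a) and math.isnan(b)
        return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
    return a == b


def first_mismatch(rows, baseline, candidate):
    """Evaluate both implementations on each row; return the first differing row or None."""
    for row in rows:
        if not same_value(baseline(row), candidate(row)):
            return row
    return None


# Usage: two "absolute value" implementations that disagree only on -0.0.
rows = random_rows([("x", "double"), ("k", "int")], 1000, seed=42)
bad = first_mismatch(
    rows,
    lambda r: abs(r["x"]),
    lambda r: r["x"] if r["x"] >= 0 else -r["x"],
)
```

In this sketch the comparison deliberately treats NaN as equal to NaN (otherwise every NaN row would be reported) and distinguishes 0.0 from -0.0. Whether signed zeros should count as a mismatch is a design choice that depends on the semantics of the engines being compared.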

On Thu, Dec 18, 2025 at 9:04 AM James Xu <[email protected]> wrote:

> Thanks Andy, very informative!
>
> For the fuzz testing, does Comet run it in CI or just manually?
>
> On 2025/12/18 14:15:04 Andy Grove wrote:
> > I'd like to share some quick notes about our experiences with correctness
> > testing in Apache DataFusion Comet.
> >
> > We run the full Spark SQL test suite and it has caught many bugs. One issue
> > is that many of the tests explicitly check that plans contain specific
> > operators such as ProjectExec, so we have to modify those tests to also
> > accept CometProjectExec. We have to do this for multiple Spark versions
> > too. The approach we took is to maintain diff files in the Comet repo that
> > we apply to Spark to modify the tests. You can read about our approach in
> > the contributor guide documentation [1].
> >
> > We also developed CometFuzz [2], a fuzz testing tool that generates random
> > Parquet files and random queries and then runs those queries with Comet
> > disabled, then enabled, and compares the results. This tool actually has no
> > dependencies on Comet and you could use it with Auron as well.
> >
> > I hope this is helpful.
> >
> > Thanks,
> >
> > Andy.
> >
> > [1] https://datafusion.apache.org/comet/contributor-guide/spark-sql-tests.html
> > [2] https://github.com/apache/datafusion-comet/tree/main/fuzz-testing
> >
> > On Thu, Dec 18, 2025 at 1:15 AM James Xu <[email protected]> wrote:
> >
> > > Hi Mang Zhang,
> > >
> > > For question 1: Yes, there will be a large number of Spark test files, but
> > > most of the code is simple boilerplate that inherits the vanilla Spark
> > > tests with the native engine enabled. We are NOT copying the Spark tests
> > > into Auron code. We will depend on Spark's test JAR as you mentioned, but
> > > we need to do some inheritance, enable the native engine, and sometimes
> > > disable some tests (due to Auron bugs).
> > >
> > >
> > > For question 2: First, the test code of a specific released version of
> > > Spark will not change. And if there is a change in Spark, e.g. some bugs
> > > are fixed, then we also need to change; that is the purpose of
> > > correctness testing.
> > >
> > > On 2025/12/18 04:18:02 Mang Zhang wrote:
> > > > Hi James,
> > > > Thanks for driving this. +1!
> > > > I see the proposal mentions migrating a large amount of Spark test code.
> > > > There are two issues here:
> > > > 1. There will be a significant amount of migration work.
> > > > 2. Code maintenance work: If Spark code changes, Auron may also require
> > > > corresponding modifications.
> > > >
> > > >
> > > > Can we achieve our testing objectives by introducing Spark's test JAR?
> > > > This approach would only require updating Spark's dependency version,
> > > > saving us a significant amount of work.
> > > > Similarly, Flink could adopt the same pattern in the future.
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Mang Zhang
> > > >
> > > >
> > > >
> > > > At 2025-12-18 11:38:17, "Shreyesh Arangath" <[email protected]> wrote:
> > > > >Great effort! Thanks for driving this. +1!
> > > > >
> > > > >PS: Could you also provide comment access for the document so that we can
> > > > >ask questions? Thanks
> > > > >
> > > > >Best,
> > > > >Shreyesh
> > > > >
> > > > >On Tue, Dec 16, 2025 at 8:05 AM James <[email protected]> wrote:
> > > > >
> > > > >> Hi everyone, I'd like to start a discussion about AIP-2: Enhance Auron’s
> > > > >> Correctness Testing [1]. Looking forward to your feedback.
> > > > >>
> > > > >> [1]
> > > > >> https://docs.google.com/document/d/1v8wMyLZXuA7tmDSJysAo8CRqWX36SWR5ZgjyOMSfm94/edit?tab=t.0
> > > > >>
> > > >
> > >
> >
>