This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7a5d592e1 Publish built docs triggered by
84df1ce61df409243c89d65d1aeb347234b5bc21
7a5d592e1 is described below
commit 7a5d592e18166fd06c8d50bed32875a8a8e0bf39
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed Feb 25 14:49:02 2026 +0000
Publish built docs triggered by 84df1ce61df409243c89d65d1aeb347234b5bc21
---
.../adding_a_new_expression.md.txt | 68 +++++++++++++++++-----
_sources/user-guide/latest/configs.md.txt | 1 +
contributor-guide/adding_a_new_expression.html | 65 ++++++++++++++++-----
searchindex.js | 2 +-
user-guide/latest/configs.html | 14 +++--
5 files changed, 115 insertions(+), 35 deletions(-)
diff --git a/_sources/contributor-guide/adding_a_new_expression.md.txt
b/_sources/contributor-guide/adding_a_new_expression.md.txt
index 7853c126b..e989b7636 100644
--- a/_sources/contributor-guide/adding_a_new_expression.md.txt
+++ b/_sources/contributor-guide/adding_a_new_expression.md.txt
@@ -210,9 +210,59 @@ Any notes provided will be logged to help with debugging
and understanding why a
#### Adding Spark-side Tests for the New Expression
-It is important to verify that the new expression is correctly recognized by
the native execution engine and matches the expected spark behavior. To do
this, you can add a set of test cases in the `CometExpressionSuite`, and use
the `checkSparkAnswerAndOperator` method to compare the results of the new
expression with the expected Spark results and that Comet's native execution
engine is able to execute the expression.
+It is important to verify that the new expression is correctly recognized by
the native execution engine and matches the expected Spark behavior. The
preferred way to add test coverage is to write a SQL test file using the SQL
file test framework. This approach is simpler than writing Scala test code and
makes it easy to cover many input combinations and edge cases.
+
+##### Writing a SQL test file
+
+Create a `.sql` file under the appropriate subdirectory in
`spark/src/test/resources/sql-tests/expressions/` (e.g., `string/`, `math/`,
`array/`). The file should create a table with test data, then run queries that
exercise the expression. Here is an example for the `unhex` expression:
+
+```sql
+-- ConfigMatrix: parquet.enable.dictionary=false,true
+
+statement
+CREATE TABLE test_unhex(col string) USING parquet
+
+statement
+INSERT INTO test_unhex VALUES
+ ('537061726B2053514C'),
+ ('737472696E67'),
+ ('\0'),
+ (''),
+ ('###'),
+ ('G123'),
+ ('hello'),
+ ('A1B'),
+ ('0A1B'),
+ (NULL)
+
+-- column argument
+query
+SELECT unhex(col) FROM test_unhex
+
+-- literal arguments
+query
+SELECT unhex('48656C6C6F'), unhex(''), unhex(NULL)
+```
+
+Each `query` block automatically runs the SQL through both Spark and Comet,
compares the results, and verifies that Comet executes the expression natively
rather than falling back to Spark.
+
+Run the test with:
+
+```shell
+./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite unhex" -Dtest=none
+```
+
+For full documentation on the test file format — including directives like
`ConfigMatrix`, query modes like `spark_answer_only` and `tolerance`, handling
known bugs with `ignore(...)`, and tips for writing thorough tests — see the
[SQL File Tests](sql-file-tests.md) guide.
+
+##### Tips
-For example, this is the test case for the `unhex` expression:
+- **Cover both column references and literals.** Comet often uses different
code paths for each. The SQL file test suite automatically disables constant
folding, so all-literal queries are evaluated natively.
+- **Include edge cases** such as `NULL`, empty strings, boundary values,
`NaN`, and multibyte UTF-8 characters.
+- **Keep one file per expression** to make failures easy to locate.
+
+##### Scala tests (alternative)
+
+For cases that require programmatic setup or custom assertions beyond what SQL
files support, you can also add Scala test cases in `CometExpressionSuite`
using the `checkSparkAnswerAndOperator` method:
```scala
test("unhex") {
@@ -236,11 +286,7 @@ test("unhex") {
}
```
-#### Testing with Literal Values
-
-When writing tests that use literal values (e.g., `SELECT
my_func('literal')`), Spark's constant folding optimizer may evaluate the
expression at planning time rather than execution time. This means your Comet
implementation might not actually be exercised during the test.
-
-To ensure literal expressions are executed by Comet, disable the constant
folding optimizer:
+When writing Scala tests with literal values (e.g., `SELECT
my_func('literal')`), Spark's constant folding optimizer may evaluate the
expression at planning time, bypassing Comet. To prevent this, disable constant
folding:
```scala
test("my_func with literals") {
@@ -251,14 +297,6 @@ test("my_func with literals") {
}
```
-This is particularly important for:
-
-- Edge case tests using specific literal values (e.g., null handling, overflow
conditions)
-- Tests verifying behavior with special input values
-- Any test where the expression inputs are entirely literal
-
-When possible, prefer testing with column references from tables (as shown in
the `unhex` example above), which naturally avoids the constant folding issue.
-
### Adding the Expression To the Protobuf Definition
Once you have the expression implemented in Scala, you might need to update
the protobuf definition to include the new expression. You may not need to do
this if the expression is already covered by the existing protobuf definition
(e.g. you're adding a new scalar function that uses the `ScalarFunc` message).
diff --git a/_sources/user-guide/latest/configs.md.txt
b/_sources/user-guide/latest/configs.md.txt
index 48668992f..9a3accc0c 100644
--- a/_sources/user-guide/latest/configs.md.txt
+++ b/_sources/user-guide/latest/configs.md.txt
@@ -28,6 +28,7 @@ Comet provides the following configuration settings.
| Config | Description | Default Value |
|--------|-------------|---------------|
| `spark.comet.scan.enabled` | Whether to enable native scans. When this is
turned on, Spark will use Comet to read supported data sources (currently only
Parquet is supported natively). Note that to enable native vectorized
execution, both this config and `spark.comet.exec.enabled` need to be enabled.
| true |
+| `spark.comet.scan.icebergNative.dataFileConcurrencyLimit` | The number of
Iceberg data files to read concurrently within a single task. Higher values
improve throughput for tables with many small files by overlapping I/O latency,
but increase memory usage. Values between 2 and 8 are suggested. | 1 |
| `spark.comet.scan.icebergNative.enabled` | Whether to enable native Iceberg
table scan using iceberg-rust. When enabled, Iceberg tables are read directly
through native execution, bypassing Spark's DataSource V2 API for better
performance. | false |
| `spark.comet.scan.preFetch.enabled` | Whether to enable pre-fetching feature
of CometScan. | false |
| `spark.comet.scan.preFetch.threadNum` | The number of threads running
pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is
enabled. Note that more pre-fetching threads means more memory requirement to
store pre-fetched row groups. | 2 |
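[Editorial note, not part of the commit: the two new Iceberg scan settings documented above can be combined with the existing Comet enable flags. The following is a hypothetical `spark-submit` invocation, not taken from the docs; the application jar name is a placeholder and the concurrency value of 4 is merely one point in the suggested 2-8 range.]

```shell
# Sketch only: enable Comet native execution plus the native Iceberg scan,
# and raise the per-task data-file read concurrency from the default of 1.
# "my-iceberg-app.jar" is a placeholder for your application artifact.
spark-submit \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.scan.enabled=true \
  --conf spark.comet.scan.icebergNative.enabled=true \
  --conf spark.comet.scan.icebergNative.dataFileConcurrencyLimit=4 \
  my-iceberg-app.jar
```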
diff --git a/contributor-guide/adding_a_new_expression.html
b/contributor-guide/adding_a_new_expression.html
index 7729b11b9..6366bce73 100644
--- a/contributor-guide/adding_a_new_expression.html
+++ b/contributor-guide/adding_a_new_expression.html
@@ -635,8 +635,55 @@ under the License.
</section>
<section id="adding-spark-side-tests-for-the-new-expression">
<h4>Adding Spark-side Tests for the New Expression<a class="headerlink"
href="#adding-spark-side-tests-for-the-new-expression" title="Link to this
heading">#</a></h4>
-<p>It is important to verify that the new expression is correctly recognized
by the native execution engine and matches the expected spark behavior. To do
this, you can add a set of test cases in the <code class="docutils literal
notranslate"><span class="pre">CometExpressionSuite</span></code>, and use the
<code class="docutils literal notranslate"><span
class="pre">checkSparkAnswerAndOperator</span></code> method to compare the
results of the new expression with the expected Spark resu [...]
-<p>For example, this is the test case for the <code class="docutils literal
notranslate"><span class="pre">unhex</span></code> expression:</p>
+<p>It is important to verify that the new expression is correctly recognized
by the native execution engine and matches the expected Spark behavior. The
preferred way to add test coverage is to write a SQL test file using the SQL
file test framework. This approach is simpler than writing Scala test code and
makes it easy to cover many input combinations and edge cases.</p>
+<section id="writing-a-sql-test-file">
+<h5>Writing a SQL test file<a class="headerlink"
href="#writing-a-sql-test-file" title="Link to this heading">#</a></h5>
+<p>Create a <code class="docutils literal notranslate"><span
class="pre">.sql</span></code> file under the appropriate subdirectory in <code
class="docutils literal notranslate"><span
class="pre">spark/src/test/resources/sql-tests/expressions/</span></code>
(e.g., <code class="docutils literal notranslate"><span
class="pre">string/</span></code>, <code class="docutils literal
notranslate"><span class="pre">math/</span></code>, <code class="docutils
literal notranslate"><span class="pre"> [...]
+<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span><span class="c1">-- ConfigMatrix:
parquet.enable.dictionary=false,true</span>
+
+<span class="k">statement</span>
+<span class="k">CREATE</span><span class="w"> </span><span
class="k">TABLE</span><span class="w"> </span><span
class="n">test_unhex</span><span class="p">(</span><span
class="n">col</span><span class="w"> </span><span class="n">string</span><span
class="p">)</span><span class="w"> </span><span class="k">USING</span><span
class="w"> </span><span class="n">parquet</span>
+
+<span class="k">statement</span>
+<span class="k">INSERT</span><span class="w"> </span><span
class="k">INTO</span><span class="w"> </span><span
class="n">test_unhex</span><span class="w"> </span><span class="k">VALUES</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'537061726B2053514C'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'737472696E67'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'\0'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">''</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'###'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'G123'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'hello'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'A1B'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'0A1B'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="k">NULL</span><span class="p">)</span>
+
+<span class="c1">-- column argument</span>
+<span class="n">query</span>
+<span class="k">SELECT</span><span class="w"> </span><span
class="n">unhex</span><span class="p">(</span><span class="n">col</span><span
class="p">)</span><span class="w"> </span><span class="k">FROM</span><span
class="w"> </span><span class="n">test_unhex</span>
+
+<span class="c1">-- literal arguments</span>
+<span class="n">query</span>
+<span class="k">SELECT</span><span class="w"> </span><span
class="n">unhex</span><span class="p">(</span><span
class="s1">'48656C6C6F'</span><span class="p">),</span><span class="w">
</span><span class="n">unhex</span><span class="p">(</span><span
class="s1">''</span><span class="p">),</span><span class="w">
</span><span class="n">unhex</span><span class="p">(</span><span
class="k">NULL</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>Each <code class="docutils literal notranslate"><span
class="pre">query</span></code> block automatically runs the SQL through both
Spark and Comet, compares the results, and verifies that Comet executes the
expression natively rather than falling back to Spark.</p>
+<p>Run the test with:</p>
+<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span>./mvnw<span class="w"> </span><span
class="nb">test</span><span class="w"> </span>-Dsuites<span
class="o">=</span><span class="s2">"org.apache.comet.CometSqlFileTestSuite
unhex"</span><span class="w"> </span>-Dtest<span class="o">=</span>none
+</pre></div>
+</div>
+<p>For full documentation on the test file format — including directives like
<code class="docutils literal notranslate"><span
class="pre">ConfigMatrix</span></code>, query modes like <code class="docutils
literal notranslate"><span class="pre">spark_answer_only</span></code> and
<code class="docutils literal notranslate"><span
class="pre">tolerance</span></code>, handling known bugs with <code
class="docutils literal notranslate"><span
class="pre">ignore(...)</span></code>, and tips for [...]
+</section>
+<section id="tips">
+<h5>Tips<a class="headerlink" href="#tips" title="Link to this
heading">#</a></h5>
+<ul class="simple">
+<li><p><strong>Cover both column references and literals.</strong> Comet often
uses different code paths for each. The SQL file test suite automatically
disables constant folding, so all-literal queries are evaluated
natively.</p></li>
+<li><p><strong>Include edge cases</strong> such as <code class="docutils
literal notranslate"><span class="pre">NULL</span></code>, empty strings,
boundary values, <code class="docutils literal notranslate"><span
class="pre">NaN</span></code>, and multibyte UTF-8 characters.</p></li>
+<li><p><strong>Keep one file per expression</strong> to make failures easy to
locate.</p></li>
+</ul>
+</section>
+<section id="scala-tests-alternative">
+<h5>Scala tests (alternative)<a class="headerlink"
href="#scala-tests-alternative" title="Link to this heading">#</a></h5>
+<p>For cases that require programmatic setup or custom assertions beyond what
SQL files support, you can also add Scala test cases in <code class="docutils
literal notranslate"><span class="pre">CometExpressionSuite</span></code> using
the <code class="docutils literal notranslate"><span
class="pre">checkSparkAnswerAndOperator</span></code> method:</p>
<div class="highlight-scala notranslate"><div
class="highlight"><pre><span></span><span class="n">test</span><span
class="p">(</span><span class="s">"unhex"</span><span
class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">val</span><span class="w">
</span><span class="n">table</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="s">"unhex_table"</span>
<span class="w"> </span><span class="n">withTable</span><span
class="p">(</span><span class="n">table</span><span class="p">)</span><span
class="w"> </span><span class="p">{</span>
@@ -658,11 +705,7 @@ under the License.
<span class="p">}</span>
</pre></div>
</div>
-</section>
-<section id="testing-with-literal-values">
-<h4>Testing with Literal Values<a class="headerlink"
href="#testing-with-literal-values" title="Link to this heading">#</a></h4>
-<p>When writing tests that use literal values (e.g., <code class="docutils
literal notranslate"><span class="pre">SELECT</span> <span
class="pre">my_func('literal')</span></code>), Spark’s constant folding
optimizer may evaluate the expression at planning time rather than execution
time. This means your Comet implementation might not actually be exercised
during the test.</p>
-<p>To ensure literal expressions are executed by Comet, disable the constant
folding optimizer:</p>
+<p>When writing Scala tests with literal values (e.g., <code class="docutils
literal notranslate"><span class="pre">SELECT</span> <span
class="pre">my_func('literal')</span></code>), Spark’s constant folding
optimizer may evaluate the expression at planning time, bypassing Comet. To
prevent this, disable constant folding:</p>
<div class="highlight-scala notranslate"><div
class="highlight"><pre><span></span><span class="n">test</span><span
class="p">(</span><span class="s">"my_func with literals"</span><span
class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">withSQLConf</span><span
class="p">(</span><span class="nc">SQLConf</span><span class="p">.</span><span
class="nc">OPTIMIZER_EXCLUDED_RULES</span><span class="p">.</span><span
class="n">key</span><span class="w"> </span><span class="o">-></span>
<span class="w"> </span><span
class="s">"org.apache.spark.sql.catalyst.optimizer.ConstantFolding"</span><span
class="p">)</span><span class="w"> </span><span class="p">{</span>
@@ -671,13 +714,7 @@ under the License.
<span class="p">}</span>
</pre></div>
</div>
-<p>This is particularly important for:</p>
-<ul class="simple">
-<li><p>Edge case tests using specific literal values (e.g., null handling,
overflow conditions)</p></li>
-<li><p>Tests verifying behavior with special input values</p></li>
-<li><p>Any test where the expression inputs are entirely literal</p></li>
-</ul>
-<p>When possible, prefer testing with column references from tables (as shown
in the <code class="docutils literal notranslate"><span
class="pre">unhex</span></code> example above), which naturally avoids the
constant folding issue.</p>
+</section>
</section>
</section>
<section id="adding-the-expression-to-the-protobuf-definition">
diff --git a/searchindex.js b/searchindex.js
index 7a4ca8d25..dd309a87a 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Format Your Code": [[12,
"format-your-code"]], "1. Install Comet": [[22, "install-comet"]], "1. Native
Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2.
Build and Verify": [[12, "build-and-verify"]], "2. Clone Spark and Apply Diff":
[[22, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4,
"sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4,
"comet-jvm-operators"]], "3. Run Clippy (Recommended)": [[12 [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Format Your Code": [[12,
"format-your-code"]], "1. Install Comet": [[22, "install-comet"]], "1. Native
Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2.
Build and Verify": [[12, "build-and-verify"]], "2. Clone Spark and Apply Diff":
[[22, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4,
"sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4,
"comet-jvm-operators"]], "3. Run Clippy (Recommended)": [[12 [...]
\ No newline at end of file
diff --git a/user-guide/latest/configs.html b/user-guide/latest/configs.html
index 6ca35b790..eaaca6158 100644
--- a/user-guide/latest/configs.html
+++ b/user-guide/latest/configs.html
@@ -477,23 +477,27 @@ under the License.
<td><p>Whether to enable native scans. When this is turned on, Spark will use
Comet to read supported data sources (currently only Parquet is supported
natively). Note that to enable native vectorized execution, both this config
and <code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.enabled</span></code> need to be enabled.</p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.icebergNative.enabled</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.icebergNative.dataFileConcurrencyLimit</span></code></p></td>
+<td><p>The number of Iceberg data files to read concurrently within a single
task. Higher values improve throughput for tables with many small files by
overlapping I/O latency, but increase memory usage. Values between 2 and 8 are
suggested.</p></td>
+<td><p>1</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.icebergNative.enabled</span></code></p></td>
<td><p>Whether to enable native Iceberg table scan using iceberg-rust. When
enabled, Iceberg tables are read directly through native execution, bypassing
Spark’s DataSource V2 API for better performance.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.preFetch.enabled</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.preFetch.enabled</span></code></p></td>
<td><p>Whether to enable pre-fetching feature of CometScan.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.preFetch.threadNum</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.preFetch.threadNum</span></code></p></td>
<td><p>The number of threads running pre-fetching for CometScan. Effective if
spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching
threads means more memory requirement to store pre-fetched row groups.</p></td>
<td><p>2</p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.unsignedSmallIntSafetyCheck</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.unsignedSmallIntSafetyCheck</span></code></p></td>
<td><p>Parquet files may contain unsigned 8-bit integers (UINT_8) which Spark
maps to ShortType. When this config is true (default), Comet falls back to
Spark for ShortType columns because we cannot distinguish signed INT16 (safe)
from unsigned UINT_8 (may produce different results). Set to false to allow
native execution of ShortType columns if you know your data does not contain
unsigned UINT_8 columns from improperly encoded Parquet files. For more
information, refer to the <a class=" [...]
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.hadoop.fs.comet.libhdfs.schemes</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.hadoop.fs.comet.libhdfs.schemes</span></code></p></td>
<td><p>Defines filesystem schemes (e.g., hdfs, webhdfs) that the native side
accesses via libhdfs, separated by commas. Valid only when built with hdfs
feature enabled.</p></td>
<td><p></p></td>
</tr>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]