This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7a5d592e1 Publish built docs triggered by
84df1ce61df409243c89d65d1aeb347234b5bc21
7a5d592e1 is described below
commit 7a5d592e18166fd06c8d50bed32875a8a8e0bf39
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed Feb 25 14:49:02 2026 +0000
Publish built docs triggered by 84df1ce61df409243c89d65d1aeb347234b5bc21
---
.../adding_a_new_expression.md.txt | 68 +++++++++++++++++-----
_sources/user-guide/latest/configs.md.txt | 1 +
contributor-guide/adding_a_new_expression.html | 65 ++++++++++++++++-----
searchindex.js | 2 +-
user-guide/latest/configs.html | 14 +++--
5 files changed, 115 insertions(+), 35 deletions(-)
diff --git a/_sources/contributor-guide/adding_a_new_expression.md.txt
b/_sources/contributor-guide/adding_a_new_expression.md.txt
index 7853c126b..e989b7636 100644
--- a/_sources/contributor-guide/adding_a_new_expression.md.txt
+++ b/_sources/contributor-guide/adding_a_new_expression.md.txt
@@ -210,9 +210,59 @@ Any notes provided will be logged to help with debugging
and understanding why a
#### Adding Spark-side Tests for the New Expression
-It is important to verify that the new expression is correctly recognized by
the native execution engine and matches the expected spark behavior. To do
this, you can add a set of test cases in the `CometExpressionSuite`, and use
the `checkSparkAnswerAndOperator` method to compare the results of the new
expression with the expected Spark results and that Comet's native execution
engine is able to execute the expression.
+It is important to verify that the new expression is correctly recognized by
the native execution engine and matches the expected Spark behavior. The
preferred way to add test coverage is to write a SQL test file using the SQL
file test framework. This approach is simpler than writing Scala test code and
makes it easy to cover many input combinations and edge cases.
+
+##### Writing a SQL test file
+
+Create a `.sql` file under the appropriate subdirectory in
`spark/src/test/resources/sql-tests/expressions/` (e.g., `string/`, `math/`,
`array/`). The file should create a table with test data, then run queries that
exercise the expression. Here is an example for the `unhex` expression:
+
+```sql
+-- ConfigMatrix: parquet.enable.dictionary=false,true
+
+statement
+CREATE TABLE test_unhex(col string) USING parquet
+
+statement
+INSERT INTO test_unhex VALUES
+ ('537061726B2053514C'),
+ ('737472696E67'),
+ ('\0'),
+ (''),
+ ('###'),
+ ('G123'),
+ ('hello'),
+ ('A1B'),
+ ('0A1B'),
+ (NULL)
+
+-- column argument
+query
+SELECT unhex(col) FROM test_unhex
+
+-- literal arguments
+query
+SELECT unhex('48656C6C6F'), unhex(''), unhex(NULL)
+```
+
+Each `query` block automatically runs the SQL through both Spark and Comet,
compares the results, and verifies that Comet executes the expression natively
rather than falling back to Spark.
+
+Run the test with:
+
+```shell
+./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite unhex" -Dtest=none
+```
+
+For full documentation on the test file format — including directives like
`ConfigMatrix`, query modes like `spark_answer_only` and `tolerance`, handling
known bugs with `ignore(...)`, and tips for writing thorough tests — see the
[SQL File Tests](sql-file-tests.md) guide.
+
+##### Tips
-For example, this is the test case for the `unhex` expression:
+- **Cover both column references and literals.** Comet often uses different
code paths for each. The SQL file test suite automatically disables constant
folding, so all-literal queries are evaluated natively.
+- **Include edge cases** such as `NULL`, empty strings, boundary values,
`NaN`, and multibyte UTF-8 characters.
+- **Keep one file per expression** to make failures easy to locate.
+
+##### Scala tests (alternative)
+
+For cases that require programmatic setup or custom assertions beyond what SQL
files support, you can also add Scala test cases in `CometExpressionSuite`
using the `checkSparkAnswerAndOperator` method:
```scala
test("unhex") {
@@ -236,11 +286,7 @@ test("unhex") {
}
```
-#### Testing with Literal Values
-
-When writing tests that use literal values (e.g., `SELECT
my_func('literal')`), Spark's constant folding optimizer may evaluate the
expression at planning time rather than execution time. This means your Comet
implementation might not actually be exercised during the test.
-
-To ensure literal expressions are executed by Comet, disable the constant
folding optimizer:
+When writing Scala tests with literal values (e.g., `SELECT
my_func('literal')`), Spark's constant folding optimizer may evaluate the
expression at planning time, bypassing Comet. To prevent this, disable constant
folding:
```scala
test("my_func with literals") {
@@ -251,14 +297,6 @@ test("my_func with literals") {
}
```
-This is particularly important for:
-
-- Edge case tests using specific literal values (e.g., null handling, overflow
conditions)
-- Tests verifying behavior with special input values
-- Any test where the expression inputs are entirely literal
-
-When possible, prefer testing with column references from tables (as shown in
the `unhex` example above), which naturally avoids the constant folding issue.
-
### Adding the Expression To the Protobuf Definition
Once you have the expression implemented in Scala, you might need to update
the protobuf definition to include the new expression. You may not need to do
this if the expression is already covered by the existing protobuf definition
(e.g. you're adding a new scalar function that uses the `ScalarFunc` message).
diff --git a/_sources/user-guide/latest/configs.md.txt
b/_sources/user-guide/latest/configs.md.txt
index 48668992f..9a3accc0c 100644
--- a/_sources/user-guide/latest/configs.md.txt
+++ b/_sources/user-guide/latest/configs.md.txt
@@ -28,6 +28,7 @@ Comet provides the following configuration settings.
| Config | Description | Default Value |
|--------|-------------|---------------|
| `spark.comet.scan.enabled` | Whether to enable native scans. When this is
turned on, Spark will use Comet to read supported data sources (currently only
Parquet is supported natively). Note that to enable native vectorized
execution, both this config and `spark.comet.exec.enabled` need to be enabled.
| true |
+| `spark.comet.scan.icebergNative.dataFileConcurrencyLimit` | The number of
Iceberg data files to read concurrently within a single task. Higher values
improve throughput for tables with many small files by overlapping I/O latency,
but increase memory usage. Values between 2 and 8 are suggested. | 1 |
| `spark.comet.scan.icebergNative.enabled` | Whether to enable native Iceberg
table scan using iceberg-rust. When enabled, Iceberg tables are read directly
through native execution, bypassing Spark's DataSource V2 API for better
performance. | false |
| `spark.comet.scan.preFetch.enabled` | Whether to enable pre-fetching feature
of CometScan. | false |
| `spark.comet.scan.preFetch.threadNum` | The number of threads running
pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is
enabled. Note that more pre-fetching threads means more memory requirement to
store pre-fetched row groups. | 2 |
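[Editorial note, not part of the commit: the two new Iceberg scan settings documented above can be combined with the existing Comet enable flags. The following is a hypothetical `spark-submit` invocation, not taken from the docs; the application jar name is a placeholder and the concurrency value of 4 is merely one point in the suggested 2-8 range.]

```shell
# Sketch only: enable Comet native execution plus the native Iceberg scan,
# and raise the per-task data-file read concurrency from the default of 1.
# "my-iceberg-app.jar" is a placeholder for your application artifact.
spark-submit \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.scan.enabled=true \
  --conf spark.comet.scan.icebergNative.enabled=true \
  --conf spark.comet.scan.icebergNative.dataFileConcurrencyLimit=4 \
  my-iceberg-app.jar
```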
diff --git a/contributor-guide/adding_a_new_expression.html
b/contributor-guide/adding_a_new_expression.html
index 7729b11b9..6366bce73 100644
--- a/contributor-guide/adding_a_new_expression.html
+++ b/contributor-guide/adding_a_new_expression.html
@@ -635,8 +635,55 @@ under the License.
</section>
<section id="adding-spark-side-tests-for-the-new-expression">
<h4>Adding Spark-side Tests for the New Expression<a class="headerlink"
href="#adding-spark-side-tests-for-the-new-expression" title="Link to this
heading">#</a></h4>
-<p>It is important to verify that the new expression is correctly recognized
by the native execution engine and matches the expected spark behavior. To do
this, you can add a set of test cases in the <code class="docutils literal
notranslate"><span class="pre">CometExpressionSuite</span></code>, and use the
<code class="docutils literal notranslate"><span
class="pre">checkSparkAnswerAndOperator</span></code> method to compare the
results of the new expression with the expected Spark resu [...]
-<p>For example, this is the test case for the <code class="docutils literal
notranslate"><span class="pre">unhex</span></code> expression:</p>
+<p>It is important to verify that the new expression is correctly recognized
by the native execution engine and matches the expected Spark behavior. The
preferred way to add test coverage is to write a SQL test file using the SQL
file test framework. This approach is simpler than writing Scala test code and
makes it easy to cover many input combinations and edge cases.</p>
+<section id="writing-a-sql-test-file">
+<h5>Writing a SQL test file<a class="headerlink"
href="#writing-a-sql-test-file" title="Link to this heading">#</a></h5>
+<p>Create a <code class="docutils literal notranslate"><span
class="pre">.sql</span></code> file under the appropriate subdirectory in <code
class="docutils literal notranslate"><span
class="pre">spark/src/test/resources/sql-tests/expressions/</span></code>
(e.g., <code class="docutils literal notranslate"><span
class="pre">string/</span></code>, <code class="docutils literal
notranslate"><span class="pre">math/</span></code>, <code class="docutils
literal notranslate"><span class="pre"> [...]
+<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span><span class="c1">-- ConfigMatrix:
parquet.enable.dictionary=false,true</span>
+
+<span class="k">statement</span>
+<span class="k">CREATE</span><span class="w"> </span><span
class="k">TABLE</span><span class="w"> </span><span
class="n">test_unhex</span><span class="p">(</span><span
class="n">col</span><span class="w"> </span><span class="n">string</span><span
class="p">)</span><span class="w"> </span><span class="k">USING</span><span
class="w"> </span><span class="n">parquet</span>
+
+<span class="k">statement</span>
+<span class="k">INSERT</span><span class="w"> </span><span
class="k">INTO</span><span class="w"> </span><span
class="n">test_unhex</span><span class="w"> </span><span class="k">VALUES</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'537061726B2053514C'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'737472696E67'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'\0'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">''</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'###'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'G123'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'hello'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'A1B'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="s1">'0A1B'</span><span class="p">),</span>
+<span class="w"> </span><span class="p">(</span><span
class="k">NULL</span><span class="p">)</span>
+
+<span class="c1">-- column argument</span>
+<span class="n">query</span>
+<span class="k">SELECT</span><span class="w"> </span><span
class="n">unhex</span><span class="p">(</span><span class="n">col</span><span
class="p">)</span><span class="w"> </span><span class="k">FROM</span><span
class="w"> </span><span class="n">test_unhex</span>
+
+<span class="c1">-- literal arguments</span>
+<span class="n">query</span>
+<span class="k">SELECT</span><span class="w"> </span><span
class="n">unhex</span><span class="p">(</span><span
class="s1">'48656C6C6F'</span><span class="p">),</span><span class="w">
</span><span class="n">unhex</span><span class="p">(</span><span
class="s1">''</span><span class="p">),</span><span class="w">
</span><span class="n">unhex</span><span class="p">(</span><span
class="k">NULL</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>Each <code class="docutils literal notranslate"><span
class="pre">query</span></code> block automatically runs the SQL through both
Spark and Comet, compares the results, and verifies that Comet executes the
expression natively rather than falling back to Spark.</p>
+<p>Run the test with:</p>
+<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span>./mvnw<span class="w"> </span><span
class="nb">test</span><span class="w"> </span>-Dsuites<span
class="o">=</span><span class="s2">"org.apache.comet.CometSqlFileTestSuite
unhex"</span><span class="w"> </span>-Dtest<span class="o">=</span>none
+</pre></div>
+</div>
+<p>For full documentation on the test file format — including directives like
<code class="docutils literal notranslate"><span
class="pre">ConfigMatrix</span></code>, query modes like <code class="docutils
literal notranslate"><span class="pre">spark_answer_only</span></code> and
<code class="docutils literal notranslate"><span
class="pre">tolerance</span></code>, handling known bugs with <code
class="docutils literal notranslate"><span
class="pre">ignore(...)</span></code>, and tips for [...]
+</section>
+<section id="tips">
+<h5>Tips<a class="headerlink" href="#tips" title="Link to this
heading">#</a></h5>
+<ul class="simple">
+<li><p><strong>Cover both column references and literals.</strong> Comet often
uses different code paths for each. The SQL file test suite automatically
disables constant folding, so all-literal queries are evaluated
natively.</p></li>
+<li><p><strong>Include edge cases</strong> such as <code class="docutils
literal notranslate"><span class="pre">NULL</span></code>, empty strings,
boundary values, <code class="docutils literal notranslate"><span
class="pre">NaN</span></code>, and multibyte UTF-8 characters.</p></li>
+<li><p><strong>Keep one file per expression</strong> to make failures easy to
locate.</p></li>
+</ul>
+</section>
+<section id="scala-tests-alternative">
+<h5>Scala tests (alternative)<a class="headerlink"
href="#scala-tests-alternative" title="Link to this heading">#</a></h5>
+<p>For cases that require programmatic setup or custom assertions beyond what
SQL files support, you can also add Scala test cases in <code class="docutils
literal notranslate"><span class="pre">CometExpressionSuite</span></code> using
the <code class="docutils literal notranslate"><span
class="pre">checkSparkAnswerAndOperator</span></code> method:</p>
<div class="highlight-scala notranslate"><div
class="highlight"><pre><span></span><span class="n">test</span><span
class="p">(</span><span class="s">"unhex"</span><span
class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">val</span><span class="w">
</span><span class="n">table</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="s">"unhex_table"</span>
<span class="w"> </span><span class="n">withTable</span><span
class="p">(</span><span class="n">table</span><span class="p">)</span><span
class="w"> </span><span class="p">{</span>
@@ -658,11 +705,7 @@ under the License.
<span class="p">}</span>
</pre></div>
</div>
-</section>
-<section id="testing-with-literal-values">
-<h4>Testing with Literal Values<a class="headerlink"
href="#testing-with-literal-values" title="Link to this heading">#</a></h4>
-<p>When writing tests that use literal values (e.g., <code class="docutils
literal notranslate"><span class="pre">SELECT</span> <span
class="pre">my_func('literal')</span></code>), Spark’s constant folding
optimizer may evaluate the expression at planning time rather than execution
time. This means your Comet implementation might not actually be exercised
during the test.</p>
-<p>To ensure literal expressions are executed by Comet, disable the constant
folding optimizer:</p>
+<p>When writing Scala tests with literal values (e.g., <code class="docutils
literal notranslate"><span class="pre">SELECT</span> <span
class="pre">my_func('literal')</span></code>), Spark’s constant folding
optimizer may evaluate the expression at planning time, bypassing Comet. To
prevent this, disable constant folding:</p>
<div class="highlight-scala notranslate"><div
class="highlight"><pre><span></span><span class="n">test</span><span
class="p">(</span><span class="s">"my_func with literals"</span><span
class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">withSQLConf</span><span
class="p">(</span><span class="nc">SQLConf</span><span class="p">.</span><span
class="nc">OPTIMIZER_EXCLUDED_RULES</span><span class="p">.</span><span
class="n">key</span><span class="w"> </span><span class="o">-></span>
<span class="w"> </span><span
class="s">"org.apache.spark.sql.catalyst.optimizer.ConstantFolding"</span><span
class="p">)</span><span class="w"> </span><span class="p">{</span>
@@ -671,13 +714,7 @@ under the License.
<span class="p">}</span>
</pre></div>
</div>
-<p>This is particularly important for:</p>
-<ul class="simple">
-<li><p>Edge case tests using specific literal values (e.g., null handling,
overflow conditions)</p></li>
-<li><p>Tests verifying behavior with special input values</p></li>
-<li><p>Any test where the expression inputs are entirely literal</p></li>
-</ul>
-<p>When possible, prefer testing with column references from tables (as shown
in the <code class="docutils literal notranslate"><span
class="pre">unhex</span></code> example above), which naturally avoids the
constant folding issue.</p>
+</section>
</section>
</section>
<section id="adding-the-expression-to-the-protobuf-definition">
diff --git a/searchindex.js b/searchindex.js
index 7a4ca8d25..dd309a87a 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Format Your Code": [[12,
"format-your-code"]], "1. Install Comet": [[22, "install-comet"]], "1. Native
Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2.
Build and Verify": [[12, "build-and-verify"]], "2. Clone Spark and Apply Diff":
[[22, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4,
"sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4,
"comet-jvm-operators"]], "3. Run Clippy (Recommended)": [[12 [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Format Your Code": [[12,
"format-your-code"]], "1. Install Comet": [[22, "install-comet"]], "1. Native
Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2.
Build and Verify": [[12, "build-and-verify"]], "2. Clone Spark and Apply Diff":
[[22, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4,
"sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4,
"comet-jvm-operators"]], "3. Run Clippy (Recommended)": [[12 [...]
\ No newline at end of file
diff --git a/user-guide/latest/configs.html b/user-guide/latest/configs.html
index 6ca35b790..eaaca6158 100644
--- a/user-guide/latest/configs.html
+++ b/user-guide/latest/configs.html
@@ -477,23 +477,27 @@ under the License.
<td><p>Whether to enable native scans. When this is turned on, Spark will use
Comet to read supported data sources (currently only Parquet is supported
natively). Note that to enable native vectorized execution, both this config
and <code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.enabled</span></code> need to be enabled.</p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.icebergNative.enabled</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.icebergNative.dataFileConcurrencyLimit</span></code></p></td>
+<td><p>The number of Iceberg data files to read concurrently within a single
task. Higher values improve throughput for tables with many small files by
overlapping I/O latency, but increase memory usage. Values between 2 and 8 are
suggested.</p></td>
+<td><p>1</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.icebergNative.enabled</span></code></p></td>
<td><p>Whether to enable native Iceberg table scan using iceberg-rust. When
enabled, Iceberg tables are read directly through native execution, bypassing
Spark’s DataSource V2 API for better performance.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.preFetch.enabled</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.preFetch.enabled</span></code></p></td>
<td><p>Whether to enable pre-fetching feature of CometScan.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.preFetch.threadNum</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.preFetch.threadNum</span></code></p></td>
<td><p>The number of threads running pre-fetching for CometScan. Effective if
spark.comet.scan.preFetch.enabled is enabled. Note that more pre-fetching
threads means more memory requirement to store pre-fetched row groups.</p></td>
<td><p>2</p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.unsignedSmallIntSafetyCheck</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.scan.unsignedSmallIntSafetyCheck</span></code></p></td>
<td><p>Parquet files may contain unsigned 8-bit integers (UINT_8) which Spark
maps to ShortType. When this config is true (default), Comet falls back to
Spark for ShortType columns because we cannot distinguish signed INT16 (safe)
from unsigned UINT_8 (may produce different results). Set to false to allow
native execution of ShortType columns if you know your data does not contain
unsigned UINT_8 columns from improperly encoded Parquet files. For more
information, refer to the <a class=" [...]
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.hadoop.fs.comet.libhdfs.schemes</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.hadoop.fs.comet.libhdfs.schemes</span></code></p></td>
<td><p>Defines filesystem schemes (e.g., hdfs, webhdfs) that the native side
accesses via libhdfs, separated by commas. Valid only when built with hdfs
feature enabled.</p></td>
<td><p></p></td>
</tr>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]