This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 1f54e88581 Publish built docs triggered by dfba22862da6cbd59537edee963f5bce55bd7aa2
1f54e88581 is described below
commit 1f54e8858181e2d2151da6c4fb72361e32d181c4
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed Oct 29 18:52:01 2025 +0000
Publish built docs triggered by dfba22862da6cbd59537edee963f5bce55bd7aa2
---
_sources/library-user-guide/upgrading.md.txt | 78 ++++++++++++++++++++++++++++
library-user-guide/upgrading.html | 71 +++++++++++++++++++++++++
searchindex.js | 2 +-
3 files changed, 150 insertions(+), 1 deletion(-)
diff --git a/_sources/library-user-guide/upgrading.md.txt b/_sources/library-user-guide/upgrading.md.txt
index c568b8b28e..f34b8b2a5c 100644
--- a/_sources/library-user-guide/upgrading.md.txt
+++ b/_sources/library-user-guide/upgrading.md.txt
@@ -182,6 +182,84 @@ let indices = projection_exprs.column_indices();
_execution plan_ of the query. With this release, `DESCRIBE query` now outputs
the computed _schema_ of the query, consistent with the behavior of `DESCRIBE
table_name`.
+### Introduction of `TableSchema` and changes to `FileSource::with_schema()` method
+
+A new `TableSchema` struct has been introduced in the `datafusion-datasource` crate to better manage table schemas with partition columns. This struct helps distinguish between:
+
+- **File schema**: The schema of the actual data files on disk
+- **Partition columns**: Columns derived from the directory structure (e.g., Hive-style partitioning)
+- **Table schema**: The complete schema combining both file and partition columns
+
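+The three schemas relate by simple concatenation: the table schema is the file schema's fields followed by the partition columns. The std-only sketch below illustrates this with plain field names standing in for full Arrow `Field`s (an illustrative simplification, not the real `TableSchema` API):

```rust
fn main() {
    // Schema of the data files on disk
    let file_schema = vec!["user_id", "amount"];
    // Columns encoded in the directory layout (Hive-style partitioning)
    let partition_cols = vec!["date", "region"];

    // The table schema is the file schema followed by the partition columns
    let table_schema: Vec<&str> = file_schema
        .iter()
        .chain(partition_cols.iter())
        .copied()
        .collect();

    assert_eq!(table_schema, ["user_id", "amount", "date", "region"]);
}
```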
+As part of this change, the `FileSource::with_schema()` method signature has changed from accepting a `SchemaRef` to accepting a `TableSchema`.
+
+**Who is affected:**
+
+- Users with custom `FileSource` implementations will need to update their code
+- Users who only use built-in file sources (Parquet, CSV, JSON, AVRO, Arrow) are not affected
+
+**Migration guide for custom `FileSource` implementations:**
+
+```diff
+ use datafusion_datasource::file::FileSource;
+-use arrow::datatypes::SchemaRef;
++use datafusion_datasource::TableSchema;
+
+ impl FileSource for MyCustomSource {
+- fn with_schema(&self, schema: SchemaRef) -> Arc<dyn FileSource> {
++ fn with_schema(&self, schema: TableSchema) -> Arc<dyn FileSource> {
+ Arc::new(Self {
+- schema: Some(schema),
++        // Use schema.file_schema() to get the file schema without partition columns
++ schema: Some(Arc::clone(schema.file_schema())),
+ ..self.clone()
+ })
+ }
+ }
+```
+
+For implementations that need access to partition columns:
+
+```rust,ignore
+fn with_schema(&self, schema: TableSchema) -> Arc<dyn FileSource> {
+ Arc::new(Self {
+ file_schema: Arc::clone(schema.file_schema()),
+ partition_cols: schema.table_partition_cols().clone(),
+ table_schema: Arc::clone(schema.table_schema()),
+ ..self.clone()
+ })
+}
+```
+
+**Note**: Most `FileSource` implementations only need to store the file schema (without partition columns), as shown in the first example. The second pattern of storing all three schema components is typically only needed for advanced use cases where you need access to different schema representations for different operations (e.g., `ParquetSource` uses the file schema for building pruning predicates but needs the table schema for filter pushdown logic).
+
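+That advanced pattern can be sketched with a hypothetical, pared-down stand-in for `TableSchema` (plain `String` field names instead of Arrow fields, so the sketch stays dependency-free; the real accessors are `file_schema()`, `table_schema()`, and `table_partition_cols()`):

```rust
use std::sync::Arc;

// Hypothetical stand-in for datafusion's TableSchema, for illustration only.
struct TableSchema {
    file_fields: Arc<Vec<String>>,
    partition_cols: Arc<Vec<String>>,
}

impl TableSchema {
    // Schema of the files on disk (no partition columns)
    fn file_schema(&self) -> &Arc<Vec<String>> {
        &self.file_fields
    }
    // Complete schema: file fields followed by partition columns
    fn table_schema(&self) -> Vec<String> {
        self.file_fields
            .iter()
            .chain(self.partition_cols.iter())
            .cloned()
            .collect()
    }
}

fn main() {
    let schema = TableSchema {
        file_fields: Arc::new(vec!["user_id".into(), "amount".into()]),
        partition_cols: Arc::new(vec!["date".into()]),
    };

    // A pruning predicate can only reference columns that exist in the
    // files themselves, so it is built against the file schema...
    assert!(!schema.file_schema().contains(&"date".to_string()));

    // ...while filter pushdown may also reference partition columns, so
    // those filters are resolved against the full table schema.
    assert!(schema.table_schema().contains(&"date".to_string()));
}
```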
+**Using `TableSchema` directly:**
+
+If you're constructing a `FileScanConfig` or working with table schemas and partition columns, you can now use `TableSchema`:
+
+```rust
+use datafusion_datasource::TableSchema;
+use arrow::datatypes::{Schema, Field, DataType};
+use std::sync::Arc;
+
+// Create a TableSchema with partition columns
+let file_schema = Arc::new(Schema::new(vec![
+ Field::new("user_id", DataType::Int64, false),
+ Field::new("amount", DataType::Float64, false),
+]));
+
+let partition_cols = vec![
+ Arc::new(Field::new("date", DataType::Utf8, false)),
+ Arc::new(Field::new("region", DataType::Utf8, false)),
+];
+
+let table_schema = TableSchema::new(file_schema, partition_cols);
+
+// Access different schema representations
+let file_schema_ref = table_schema.file_schema();           // Schema without partition columns
+let full_schema = table_schema.table_schema();              // Complete schema with partition columns
+let partition_cols_ref = table_schema.table_partition_cols(); // Just the partition columns
+```
+
## DataFusion `50.0.0`
### ListingTable automatically detects Hive Partitioned tables
diff --git a/library-user-guide/upgrading.html
b/library-user-guide/upgrading.html
index fe81df1f4e..e600c52721 100644
--- a/library-user-guide/upgrading.html
+++ b/library-user-guide/upgrading.html
@@ -850,6 +850,76 @@ Users may need to update their paths to account for these
changes.</p>
<em>execution plan</em> of the query. With this release, <code class="docutils
literal notranslate"><span class="pre">DESCRIBE</span> <span
class="pre">query</span></code> now outputs
the computed <em>schema</em> of the query, consistent with the behavior of
<code class="docutils literal notranslate"><span class="pre">DESCRIBE</span>
<span class="pre">table_name</span></code>.</p>
</section>
+<section
id="introduction-of-tableschema-and-changes-to-filesource-with-schema-method">
+<h3>Introduction of <code class="docutils literal notranslate"><span
class="pre">TableSchema</span></code> and changes to <code class="docutils
literal notranslate"><span class="pre">FileSource::with_schema()</span></code>
method<a class="headerlink"
href="#introduction-of-tableschema-and-changes-to-filesource-with-schema-method"
title="Link to this heading">#</a></h3>
+<p>A new <code class="docutils literal notranslate"><span
class="pre">TableSchema</span></code> struct has been introduced in the <code
class="docutils literal notranslate"><span
class="pre">datafusion-datasource</span></code> crate to better manage table
schemas with partition columns. This struct helps distinguish between:</p>
+<ul class="simple">
+<li><p><strong>File schema</strong>: The schema of actual data files on
disk</p></li>
+<li><p><strong>Partition columns</strong>: Columns derived from directory
structure (e.g., Hive-style partitioning)</p></li>
+<li><p><strong>Table schema</strong>: The complete schema combining both file
and partition columns</p></li>
+</ul>
+<p>As part of this change, the <code class="docutils literal
notranslate"><span class="pre">FileSource::with_schema()</span></code> method
signature has changed from accepting a <code class="docutils literal
notranslate"><span class="pre">SchemaRef</span></code> to accepting a <code
class="docutils literal notranslate"><span
class="pre">TableSchema</span></code>.</p>
+<p><strong>Who is affected:</strong></p>
+<ul class="simple">
+<li><p>Users with custom <code class="docutils literal notranslate"><span class="pre">FileSource</span></code> implementations will need to update their code</p></li>
+<li><p>Users who only use built-in file sources (Parquet, CSV, JSON, AVRO,
Arrow) are not affected</p></li>
+</ul>
+<p><strong>Migration guide for custom <code class="docutils literal
notranslate"><span class="pre">FileSource</span></code>
implementations:</strong></p>
+<div class="highlight-diff notranslate"><div
class="highlight"><pre><span></span><span class="w"> </span>use
datafusion_datasource::file::FileSource;
+<span class="gd">-use arrow::datatypes::SchemaRef;</span>
+<span class="gi">+use datafusion_datasource::TableSchema;</span>
+
+<span class="w"> </span>impl FileSource for MyCustomSource {
+<span class="gd">- fn with_schema(&self, schema: SchemaRef) ->
Arc<dyn FileSource> {</span>
+<span class="gi">+ fn with_schema(&self, schema: TableSchema) ->
Arc<dyn FileSource> {</span>
+<span class="w"> </span> Arc::new(Self {
+<span class="gd">- schema: Some(schema),</span>
+<span class="gi">+ // Use schema.file_schema() to get the file
schema without partition columns</span>
+<span class="gi">+ schema:
Some(Arc::clone(schema.file_schema())),</span>
+<span class="w"> </span> ..self.clone()
+<span class="w"> </span> })
+<span class="w"> </span> }
+<span class="w"> </span>}
+</pre></div>
+</div>
+<p>For implementations that need access to partition columns:</p>
+<div class="highlight-rust notranslate"><div
class="highlight"><pre><span></span><span class="k">fn</span><span class="w">
</span><span class="nf">with_schema</span><span class="p">(</span><span
class="o">&</span><span class="bp">self</span><span class="p">,</span><span
class="w"> </span><span class="n">schema</span><span class="p">:</span><span
class="w"> </span><span class="nc">TableSchema</span><span
class="p">)</span><span class="w"> </span><span class="p">-></span><span
class [...]
+<span class="w"> </span><span class="n">Arc</span><span
class="p">::</span><span class="n">new</span><span class="p">(</span><span
class="bp">Self</span><span class="w"> </span><span class="p">{</span>
+<span class="w"> </span><span class="n">file_schema</span><span
class="p">:</span><span class="w"> </span><span class="nc">Arc</span><span
class="p">::</span><span class="n">clone</span><span class="p">(</span><span
class="n">schema</span><span class="p">.</span><span
class="n">file_schema</span><span class="p">()),</span>
+<span class="w"> </span><span class="n">partition_cols</span><span
class="p">:</span><span class="w"> </span><span class="nc">schema</span><span
class="p">.</span><span class="n">table_partition_cols</span><span
class="p">().</span><span class="n">clone</span><span class="p">(),</span>
+<span class="w"> </span><span class="n">table_schema</span><span
class="p">:</span><span class="w"> </span><span class="nc">Arc</span><span
class="p">::</span><span class="n">clone</span><span class="p">(</span><span
class="n">schema</span><span class="p">.</span><span
class="n">table_schema</span><span class="p">()),</span>
+<span class="w"> </span><span class="o">..</span><span
class="bp">self</span><span class="p">.</span><span class="n">clone</span><span
class="p">()</span>
+<span class="w"> </span><span class="p">})</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p><strong>Note</strong>: Most <code class="docutils literal
notranslate"><span class="pre">FileSource</span></code> implementations only
need to store the file schema (without partition columns), as shown in the
first example. The second pattern of storing all three schema components is
typically only needed for advanced use cases where you need access to different
schema representations for different operations (e.g., ParquetSource uses the
file schema for building pruning predicates b [...]
+<p><strong>Using <code class="docutils literal notranslate"><span
class="pre">TableSchema</span></code> directly:</strong></p>
+<p>If you’re constructing a <code class="docutils literal notranslate"><span
class="pre">FileScanConfig</span></code> or working with table schemas and
partition columns, you can now use <code class="docutils literal
notranslate"><span class="pre">TableSchema</span></code>:</p>
+<div class="highlight-rust notranslate"><div
class="highlight"><pre><span></span><span class="k">use</span><span class="w">
</span><span class="n">datafusion_datasource</span><span
class="p">::</span><span class="n">TableSchema</span><span class="p">;</span>
+<span class="k">use</span><span class="w"> </span><span
class="n">arrow</span><span class="p">::</span><span
class="n">datatypes</span><span class="p">::{</span><span
class="n">Schema</span><span class="p">,</span><span class="w"> </span><span
class="n">Field</span><span class="p">,</span><span class="w"> </span><span
class="n">DataType</span><span class="p">};</span>
+<span class="k">use</span><span class="w"> </span><span
class="n">std</span><span class="p">::</span><span class="n">sync</span><span
class="p">::</span><span class="n">Arc</span><span class="p">;</span>
+
+<span class="c1">// Create a TableSchema with partition columns</span>
+<span class="kd">let</span><span class="w"> </span><span
class="n">file_schema</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span class="n">Arc</span><span
class="p">::</span><span class="n">new</span><span class="p">(</span><span
class="n">Schema</span><span class="p">::</span><span class="n">new</span><span
class="p">(</span><span class="fm">vec!</span><span class="p">[</span>
+<span class="w"> </span><span class="n">Field</span><span
class="p">::</span><span class="n">new</span><span class="p">(</span><span
class="s">"user_id"</span><span class="p">,</span><span class="w">
</span><span class="n">DataType</span><span class="p">::</span><span
class="n">Int64</span><span class="p">,</span><span class="w"> </span><span
class="kc">false</span><span class="p">),</span>
+<span class="w"> </span><span class="n">Field</span><span
class="p">::</span><span class="n">new</span><span class="p">(</span><span
class="s">"amount"</span><span class="p">,</span><span class="w">
</span><span class="n">DataType</span><span class="p">::</span><span
class="n">Float64</span><span class="p">,</span><span class="w"> </span><span
class="kc">false</span><span class="p">),</span>
+<span class="p">]));</span>
+
+<span class="kd">let</span><span class="w"> </span><span
class="n">partition_cols</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span
class="p">[</span>
+<span class="w"> </span><span class="n">Arc</span><span
class="p">::</span><span class="n">new</span><span class="p">(</span><span
class="n">Field</span><span class="p">::</span><span class="n">new</span><span
class="p">(</span><span class="s">"date"</span><span
class="p">,</span><span class="w"> </span><span class="n">DataType</span><span
class="p">::</span><span class="n">Utf8</span><span class="p">,</span><span
class="w"> </span><span class="kc">false</span><span class="p [...]
+<span class="w"> </span><span class="n">Arc</span><span
class="p">::</span><span class="n">new</span><span class="p">(</span><span
class="n">Field</span><span class="p">::</span><span class="n">new</span><span
class="p">(</span><span class="s">"region"</span><span
class="p">,</span><span class="w"> </span><span class="n">DataType</span><span
class="p">::</span><span class="n">Utf8</span><span class="p">,</span><span
class="w"> </span><span class="kc">false</span><span class= [...]
+<span class="p">];</span>
+
+<span class="kd">let</span><span class="w"> </span><span
class="n">table_schema</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">TableSchema</span><span class="p">::</span><span
class="n">new</span><span class="p">(</span><span
class="n">file_schema</span><span class="p">,</span><span class="w">
</span><span class="n">partition_cols</span><span class="p">);</span>
+
+<span class="c1">// Access different schema representations</span>
+<span class="kd">let</span><span class="w"> </span><span
class="n">file_schema_ref</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">table_schema</span><span class="p">.</span><span
class="n">file_schema</span><span class="p">();</span><span class="w">
</span><span class="c1">// Schema without partition columns</span>
+<span class="kd">let</span><span class="w"> </span><span
class="n">full_schema</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">table_schema</span><span class="p">.</span><span
class="n">table_schema</span><span class="p">();</span><span class="w">
</span><span class="c1">// Complete schema with partition columns</span>
+<span class="kd">let</span><span class="w"> </span><span
class="n">partition_cols_ref</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">table_schema</span><span class="p">.</span><span
class="n">table_partition_cols</span><span class="p">();</span><span class="w">
</span><span class="c1">// Just the partition columns</span>
+</pre></div>
+</div>
+</section>
</section>
<section id="datafusion-50-0-0">
<h2>DataFusion <code class="docutils literal notranslate"><span
class="pre">50.0.0</span></code><a class="headerlink" href="#datafusion-50-0-0"
title="Link to this heading">#</a></h2>
@@ -1806,6 +1876,7 @@ take care of constructing the <code class="docutils
literal notranslate"><span c
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link"
href="#reorganization-of-arrowsource-into-datafusion-datasource-arrow-crate">Reorganization
of <code class="docutils literal notranslate"><span
class="pre">ArrowSource</span></code> into <code class="docutils literal
notranslate"><span class="pre">datafusion-datasource-arrow</span></code>
crate</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link"
href="#filescanconfig-projection-renamed-to-filescanconfig-projection-exprs"><code
class="docutils literal notranslate"><span
class="pre">FileScanConfig::projection</span></code> renamed to <code
class="docutils literal notranslate"><span
class="pre">FileScanConfig::projection_exprs</span></code></a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link"
href="#describe-query-support"><code class="docutils literal notranslate"><span
class="pre">DESCRIBE</span> <span class="pre">query</span></code>
support</a></li>
+<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link"
href="#introduction-of-tableschema-and-changes-to-filesource-with-schema-method">Introduction
of <code class="docutils literal notranslate"><span
class="pre">TableSchema</span></code> and changes to <code class="docutils
literal notranslate"><span class="pre">FileSource::with_schema()</span></code>
method</a></li>
</ul>
</li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link"
href="#datafusion-50-0-0">DataFusion <code class="docutils literal
notranslate"><span class="pre">50.0.0</span></code></a><ul class="nav
section-nav flex-column">
diff --git a/searchindex.js b/searchindex.js
index 0bfdd18584..cb848a1f1c 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles":{"!=":[[60,"op-neq"]],"!~":[[60,"op-re-not-match"]],"!~*":[[60,"op-re-not-match-i"]],"!~~":[[60,"id19"]],"!~~*":[[60,"id20"]],"#":[[60,"op-bit-xor"]],"%":[[60,"op-modulo"]],"&":[[60,"op-bit-and"]],"(relation,
name) tuples in logical fields and logical columns are
unique":[[13,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[60,"op-multiply"]],"+":[[60,"op-plus"]],"-":[[60,"op-minus"]],"/":[[60,"op-divide"]],"<":[[60,"op-lt"]],"<
[...]
\ No newline at end of file
+Search.setIndex({"alltitles":{"!=":[[60,"op-neq"]],"!~":[[60,"op-re-not-match"]],"!~*":[[60,"op-re-not-match-i"]],"!~~":[[60,"id19"]],"!~~*":[[60,"id20"]],"#":[[60,"op-bit-xor"]],"%":[[60,"op-modulo"]],"&":[[60,"op-bit-and"]],"(relation,
name) tuples in logical fields and logical columns are
unique":[[13,"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]],"*":[[60,"op-multiply"]],"+":[[60,"op-plus"]],"-":[[60,"op-minus"]],"/":[[60,"op-divide"]],"<":[[60,"op-lt"]],"<
[...]
\ No newline at end of file
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]