(datafusion) branch asf-site updated: Publish built docs triggered by 330ece8e435ecb47d3fa14729df3c3f8378526c7

github-bot Fri, 28 Jun 2024 13:14:44 -0700

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 8e0426dcfa Publish built docs triggered by 
330ece8e435ecb47d3fa14729df3c3f8378526c7
8e0426dcfa is described below

commit 8e0426dcfa660c4bf0a947f16a5f6c3fe7037772
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Fri Jun 28 20:14:29 2024 +0000

    Publish built docs triggered by 330ece8e435ecb47d3fa14729df3c3f8378526c7
---
 _sources/user-guide/configs.md.txt           |  1 +
 _sources/user-guide/sql/dml.md.txt           |  5 ++-
 _sources/user-guide/sql/write_options.md.txt | 10 +++++
 searchindex.js                               |  2 +-
 user-guide/configs.html                      | 60 +++++++++++++++-------------
 user-guide/sql/dml.html                      |  5 ++-
 user-guide/sql/write_options.html            | 24 +++++++++++
 7 files changed, 76 insertions(+), 31 deletions(-)

diff --git a/_sources/user-guide/configs.md.txt 
b/_sources/user-guide/configs.md.txt
index c5f22725e0..0f0aa84604 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -86,6 +86,7 @@ Environment variables are read during `SessionConfig` 
initialisation so they mus
 | datafusion.execution.listing_table_ignore_subdirectory                  | 
true                      | Should sub directories be ignored when scanning 
directories for data files. Defaults to true (ignores subdirectories), 
consistent with Hive. Note that this setting does not affect reading 
partitioned tables (e.g. `/table/year=2021/month=01/data.parquet`).             
                                                                                
                                         [...]
 | datafusion.execution.enable_recursive_ctes                              | 
true                      | Should DataFusion support recursive CTEs            
                                                                                
                                                                                
                                                                                
                                                                                
                 [...]
 | datafusion.execution.split_file_groups_by_statistics                    | 
false                     | Attempt to eliminate sorts by packing & sorting 
files with non-overlapping statistics into the same file groups. Currently 
experimental                                                                    
                                                                                
                                                                                
                          [...]
+| datafusion.execution.keep_partition_by_columns                          | 
false                     | Should Datafusion keep the columns used for 
partition_by in the output RecordBatches                                        
                                                                                
                                                                                
                                                                                
                         [...]
 | datafusion.optimizer.enable_distinct_aggregation_soft_limit             | 
true                      | When set to true, the optimizer will push a limit 
operation into grouped aggregations which have no aggregate expressions, as a 
soft limit, emitting groups once the limit is reached, before all rows in the 
group are read.                                                                 
                                                                                
                       [...]
 | datafusion.optimizer.enable_round_robin_repartition                     | 
true                      | When set to true, the physical plan optimizer will 
try to add round robin repartitioning to increase parallelism to leverage more 
CPU cores                                                                       
                                                                                
                                                                                
                   [...]
 | datafusion.optimizer.enable_topk_aggregation                            | 
true                      | When set to true, the optimizer will attempt to 
perform limit operations during aggregations, if possible                       
                                                                                
                                                                                
                                                                                
                     [...]
diff --git a/_sources/user-guide/sql/dml.md.txt 
b/_sources/user-guide/sql/dml.md.txt
index 42e0c8054c..dd016cabbf 100644
--- a/_sources/user-guide/sql/dml.md.txt
+++ b/_sources/user-guide/sql/dml.md.txt
@@ -39,7 +39,10 @@ TO '<i><b>file_name</i></b>'
 clause is not specified, it will be inferred from the file extension if 
possible.
 
 `PARTITIONED BY` specifies the columns to use for partitioning the output 
files into
-separate hive-style directories.
+separate hive-style directories. By default, columns used in `PARTITIONED BY` 
will be removed
+from the output format. If you want to keep the columns, you should provide 
the option
+`execution.keep_partition_by_columns true`. 
`execution.keep_partition_by_columns` flag can also
+be enabled through `ExecutionOptions` within `SessionConfig`.
 
 The output format is determined by the first match of the following rules:
 
diff --git a/_sources/user-guide/sql/write_options.md.txt 
b/_sources/user-guide/sql/write_options.md.txt
index 3c4790dd02..6fb4ef215f 100644
--- a/_sources/user-guide/sql/write_options.md.txt
+++ b/_sources/user-guide/sql/write_options.md.txt
@@ -70,6 +70,16 @@ In this example, we write the entirety of `source_table` out 
to a folder of parq
 
 ## Available Options
 
+### Execution Specific Options
+
+The following options are available when executing a `COPY` query.
+
+| Option                              | Description                            
                                            | Default Value |
+| ----------------------------------- | 
----------------------------------------------------------------------------------
 | ------------- |
+| execution.keep_partition_by_columns | Flag to retain the columns in the 
output data when using `PARTITIONED BY` queries. | false         |
+
+Note: `execution.keep_partition_by_columns` flag can also be enabled through 
`ExecutionOptions` within `SessionConfig`.
+
 ### JSON Format Specific Options
 
 The following options are available when writing JSON files. Note: If any 
unsupported option is specified, an error will be raised and the query will 
fail.
diff --git a/searchindex.js b/searchindex.js
index 4a8ae1aeee..03905132e0 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"!=": [[43, "op-neq"]], "!~": [[43, 
"op-re-not-match"]], "!~*": [[43, "op-re-not-match-i"]], "!~~": [[43, "id18"]], 
"!~~*": [[43, "id19"]], "#": [[43, "op-bit-xor"]], "%": [[43, "op-modulo"]], 
"&": [[43, "op-bit-and"]], "(relation, name) tuples in logical fields and 
logical columns are unique": [[10, 
"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]], "*": 
[[43, "op-multiply"]], "+": [[43, "op-plus"]], "-": [[43, "op-minus"]], "/": [[ 
[...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"!=": [[43, "op-neq"]], "!~": [[43, 
"op-re-not-match"]], "!~*": [[43, "op-re-not-match-i"]], "!~~": [[43, "id18"]], 
"!~~*": [[43, "id19"]], "#": [[43, "op-bit-xor"]], "%": [[43, "op-modulo"]], 
"&": [[43, "op-bit-and"]], "(relation, name) tuples in logical fields and 
logical columns are unique": [[10, 
"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]], "*": 
[[43, "op-multiply"]], "+": [[43, "op-plus"]], "-": [[43, "op-minus"]], "/": [[ 
[...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 4a268bbc5b..0d1a7eab8a 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -728,115 +728,119 @@ Environment variables are read during <code 
class="docutils literal notranslate"
 <td><p>false</p></td>
 <td><p>Attempt to eliminate sorts by packing &amp; sorting files with 
non-overlapping statistics into the same file groups. Currently 
experimental</p></td>
 </tr>
-<tr 
class="row-odd"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td>
+<tr 
class="row-odd"><td><p>datafusion.execution.keep_partition_by_columns</p></td>
+<td><p>false</p></td>
+<td><p>Should Datafusion keep the columns used for partition_by in the output 
RecordBatches</p></td>
+</tr>
+<tr 
class="row-even"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td>
 <td><p>true</p></td>
 <td><p>When set to true, the optimizer will push a limit operation into 
grouped aggregations which have no aggregate expressions, as a soft limit, 
emitting groups once the limit is reached, before all rows in the group are 
read.</p></td>
 </tr>
-<tr 
class="row-even"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td>
+<tr 
class="row-odd"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td>
 <td><p>true</p></td>
 <td><p>When set to true, the physical plan optimizer will try to add round 
robin repartitioning to increase parallelism to leverage more CPU cores</p></td>
 </tr>
-<tr 
class="row-odd"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td>
+<tr 
class="row-even"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td>
 <td><p>true</p></td>
 <td><p>When set to true, the optimizer will attempt to perform limit 
operations during aggregations, if possible</p></td>
 </tr>
-<tr class="row-even"><td><p>datafusion.optimizer.filter_null_join_keys</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.filter_null_join_keys</p></td>
 <td><p>false</p></td>
 <td><p>When set to true, the optimizer will insert filters before a join 
between a nullable and non-nullable column to filter out nulls on the nullable 
side. This filter can add additional overhead when the file format does not 
fully support predicate push down.</p></td>
 </tr>
-<tr 
class="row-odd"><td><p>datafusion.optimizer.repartition_aggregations</p></td>
+<tr 
class="row-even"><td><p>datafusion.optimizer.repartition_aggregations</p></td>
 <td><p>true</p></td>
 <td><p>Should DataFusion repartition data using the aggregate keys to execute 
aggregates in parallel using the provided <code class="docutils literal 
notranslate"><span class="pre">target_partitions</span></code> level</p></td>
 </tr>
-<tr 
class="row-even"><td><p>datafusion.optimizer.repartition_file_min_size</p></td>
+<tr 
class="row-odd"><td><p>datafusion.optimizer.repartition_file_min_size</p></td>
 <td><p>10485760</p></td>
 <td><p>Minimum total files size in bytes to perform file scan 
repartitioning.</p></td>
 </tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.repartition_joins</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.repartition_joins</p></td>
 <td><p>true</p></td>
 <td><p>Should DataFusion repartition data using the join keys to execute joins 
in parallel using the provided <code class="docutils literal notranslate"><span 
class="pre">target_partitions</span></code> level</p></td>
 </tr>
-<tr 
class="row-even"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td>
+<tr 
class="row-odd"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td>
 <td><p>true</p></td>
 <td><p>Should DataFusion allow symmetric hash joins for unbounded data sources 
even when its inputs do not have any ordering or filtering If the flag is not 
enabled, the SymmetricHashJoin operator will be unable to prune its internal 
buffers, resulting in certain join types - such as Full, Left, LeftAnti, 
LeftSemi, Right, RightAnti, and RightSemi - being produced only at the end of 
the execution. This is not typical in stream processing. Additionally, without 
proper design for long runne [...]
 </tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.repartition_file_scans</p></td>
+<tr 
class="row-even"><td><p>datafusion.optimizer.repartition_file_scans</p></td>
 <td><p>true</p></td>
 <td><p>When set to <code class="docutils literal notranslate"><span 
class="pre">true</span></code>, file groups will be repartitioned to achieve 
maximum parallelism. Currently Parquet and CSV formats are supported. If set to 
<code class="docutils literal notranslate"><span 
class="pre">true</span></code>, all files will be repartitioned evenly (i.e., a 
single large file might be partitioned into smaller chunks) for parallel 
scanning. If set to <code class="docutils literal notranslate"><s [...]
 </tr>
-<tr class="row-even"><td><p>datafusion.optimizer.repartition_windows</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.repartition_windows</p></td>
 <td><p>true</p></td>
 <td><p>Should DataFusion repartition data using the partitions keys to execute 
window functions in parallel using the provided <code class="docutils literal 
notranslate"><span class="pre">target_partitions</span></code> level</p></td>
 </tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.repartition_sorts</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.repartition_sorts</p></td>
 <td><p>true</p></td>
 <td><p>Should DataFusion execute sorts in a per-partition fashion and merge 
afterwards instead of coalescing first and sorting globally. With this flag is 
enabled, plans in the form below <code class="docutils literal 
notranslate"><span class="pre">text</span> <span 
class="pre">&quot;SortExec:</span> <span class="pre">[a&#64;0</span> <span 
class="pre">ASC]&quot;,</span> <span class="pre">&quot;</span> <span 
class="pre">CoalescePartitionsExec&quot;,</span> <span 
class="pre">&quot;</span>  [...]
 </tr>
-<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_sort</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_sort</p></td>
 <td><p>false</p></td>
 <td><p>When true, DataFusion will opportunistically remove sorts when the data 
is already sorted, (i.e. setting <code class="docutils literal 
notranslate"><span class="pre">preserve_order</span></code> to true on <code 
class="docutils literal notranslate"><span 
class="pre">RepartitionExec</span></code> and using <code class="docutils 
literal notranslate"><span class="pre">SortPreservingMergeExec</span></code>) 
When false, DataFusion will maximize plan parallelism using <code class="docut 
[...]
 </tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.skip_failed_rules</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.skip_failed_rules</p></td>
 <td><p>false</p></td>
 <td><p>When set to true, the logical plan optimizer will produce warning 
messages if any optimization rules produce errors and then proceed to the next 
rule. When set to false, any rules that produce errors will cause the query to 
fail</p></td>
 </tr>
-<tr class="row-even"><td><p>datafusion.optimizer.max_passes</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.max_passes</p></td>
 <td><p>3</p></td>
 <td><p>Number of times that the optimizer will attempt to optimize the 
plan</p></td>
 </tr>
-<tr 
class="row-odd"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td>
+<tr 
class="row-even"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td>
 <td><p>true</p></td>
 <td><p>When set to true, the physical plan optimizer will run a top down 
process to reorder the join keys</p></td>
 </tr>
-<tr class="row-even"><td><p>datafusion.optimizer.prefer_hash_join</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.prefer_hash_join</p></td>
 <td><p>true</p></td>
 <td><p>When set to true, the physical plan optimizer will prefer HashJoin over 
SortMergeJoin. HashJoin can work more efficiently than SortMergeJoin but 
consumes more memory</p></td>
 </tr>
-<tr 
class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td>
+<tr 
class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td>
 <td><p>1048576</p></td>
 <td><p>The maximum estimated size in bytes for one input side of a HashJoin 
will be collected into a single partition</p></td>
 </tr>
-<tr 
class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td>
+<tr 
class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td>
 <td><p>131072</p></td>
 <td><p>The maximum estimated size in rows for one input side of a HashJoin 
will be collected into a single partition</p></td>
 </tr>
-<tr 
class="row-odd"><td><p>datafusion.optimizer.default_filter_selectivity</p></td>
+<tr 
class="row-even"><td><p>datafusion.optimizer.default_filter_selectivity</p></td>
 <td><p>20</p></td>
 <td><p>The default filter selectivity used by Filter Statistics when an exact 
selectivity cannot be determined. Valid values are between 0 (no selectivity) 
and 100 (all rows are selected).</p></td>
 </tr>
-<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_union</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_union</p></td>
 <td><p>false</p></td>
 <td><p>When set to true, the optimizer will not attempt to convert Union to 
Interleave</p></td>
 </tr>
-<tr class="row-odd"><td><p>datafusion.explain.logical_plan_only</p></td>
+<tr class="row-even"><td><p>datafusion.explain.logical_plan_only</p></td>
 <td><p>false</p></td>
 <td><p>When set to true, the explain statement will only print logical 
plans</p></td>
 </tr>
-<tr class="row-even"><td><p>datafusion.explain.physical_plan_only</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.physical_plan_only</p></td>
 <td><p>false</p></td>
 <td><p>When set to true, the explain statement will only print physical 
plans</p></td>
 </tr>
-<tr class="row-odd"><td><p>datafusion.explain.show_statistics</p></td>
+<tr class="row-even"><td><p>datafusion.explain.show_statistics</p></td>
 <td><p>false</p></td>
 <td><p>When set to true, the explain statement will print operator statistics 
for physical plans</p></td>
 </tr>
-<tr class="row-even"><td><p>datafusion.explain.show_sizes</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.show_sizes</p></td>
 <td><p>true</p></td>
 <td><p>When set to true, the explain statement will print the partition 
sizes</p></td>
 </tr>
-<tr 
class="row-odd"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td>
+<tr 
class="row-even"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td>
 <td><p>false</p></td>
 <td><p>When set to true, SQL parser will parse float as decimal type</p></td>
 </tr>
-<tr 
class="row-even"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td>
+<tr 
class="row-odd"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td>
 <td><p>true</p></td>
 <td><p>When set to true, SQL parser will normalize ident (convert ident to 
lowercase when not quoted)</p></td>
 </tr>
-<tr class="row-odd"><td><p>datafusion.sql_parser.dialect</p></td>
+<tr class="row-even"><td><p>datafusion.sql_parser.dialect</p></td>
 <td><p>generic</p></td>
 <td><p>Configure the SQL dialect used by DataFusion’s parser; supported values 
include: Generic, MySQL, PostgreSQL, Hive, SQLite, Snowflake, Redshift, MsSQL, 
ClickHouse, BigQuery, and Ansi.</p></td>
 </tr>
-<tr 
class="row-even"><td><p>datafusion.sql_parser.support_varchar_with_length</p></td>
+<tr 
class="row-odd"><td><p>datafusion.sql_parser.support_varchar_with_length</p></td>
 <td><p>true</p></td>
 <td><p>If true, permit lengths for <code class="docutils literal 
notranslate"><span class="pre">VARCHAR</span></code> such as <code 
class="docutils literal notranslate"><span 
class="pre">VARCHAR(20)</span></code>, but ignore the length. If false, error 
if a <code class="docutils literal notranslate"><span 
class="pre">VARCHAR</span></code> with a length is specified. The Arrow type 
system does not have a notion of maximum string length and thus DataFusion can 
not enforce such limits.</p></td>
 </tr>
diff --git a/user-guide/sql/dml.html b/user-guide/sql/dml.html
index 609e2fb486..7f074220b5 100644
--- a/user-guide/sql/dml.html
+++ b/user-guide/sql/dml.html
@@ -556,7 +556,10 @@ TO '<i><b>file_name</i></b>'
 <p><code class="docutils literal notranslate"><span class="pre">STORED</span> 
<span class="pre">AS</span></code> specifies the file format the <code 
class="docutils literal notranslate"><span class="pre">COPY</span></code> 
command will write. If this
 clause is not specified, it will be inferred from the file extension if 
possible.</p>
 <p><code class="docutils literal notranslate"><span 
class="pre">PARTITIONED</span> <span class="pre">BY</span></code> specifies the 
columns to use for partitioning the output files into
-separate hive-style directories.</p>
+separate hive-style directories. By default, columns used in <code 
class="docutils literal notranslate"><span class="pre">PARTITIONED</span> <span 
class="pre">BY</span></code> will be removed
+from the output format. If you want to keep the columns, you should provide 
the option
+<code class="docutils literal notranslate"><span 
class="pre">execution.keep_partition_by_columns</span> <span 
class="pre">true</span></code>. <code class="docutils literal 
notranslate"><span 
class="pre">execution.keep_partition_by_columns</span></code> flag can also
+be enabled through <code class="docutils literal notranslate"><span 
class="pre">ExecutionOptions</span></code> within <code class="docutils literal 
notranslate"><span class="pre">SessionConfig</span></code>.</p>
 <p>The output format is determined by the first match of the following 
rules:</p>
 <ol class="arabic simple">
 <li><p>Value of <code class="docutils literal notranslate"><span 
class="pre">STORED</span> <span class="pre">AS</span></code></p></li>
diff --git a/user-guide/sql/write_options.html 
b/user-guide/sql/write_options.html
index ee679608b2..f5e4d4afdd 100644
--- a/user-guide/sql/write_options.html
+++ b/user-guide/sql/write_options.html
@@ -478,6 +478,11 @@
    Available Options
   </a>
   <ul class="nav section-nav flex-column">
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#execution-specific-options">
+     Execution Specific Options
+    </a>
+   </li>
    <li class="toc-h3 nav-item toc-entry">
     <a class="reference internal nav-link" 
href="#json-format-specific-options">
      JSON Format Specific Options
@@ -586,6 +591,25 @@
 </section>
 <section id="available-options">
 <h2>Available Options<a class="headerlink" href="#available-options" 
title="Link to this heading">¶</a></h2>
+<section id="execution-specific-options">
+<h3>Execution Specific Options<a class="headerlink" 
href="#execution-specific-options" title="Link to this heading">¶</a></h3>
+<p>The following options are available when executing a <code class="docutils 
literal notranslate"><span class="pre">COPY</span></code> query.</p>
+<table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Option</p></th>
+<th class="head"><p>Description</p></th>
+<th class="head"><p>Default Value</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p>execution.keep_partition_by_columns</p></td>
+<td><p>Flag to retain the columns in the output data when using <code 
class="docutils literal notranslate"><span class="pre">PARTITIONED</span> <span 
class="pre">BY</span></code> queries.</p></td>
+<td><p>false</p></td>
+</tr>
+</tbody>
+</table>
+<p>Note: <code class="docutils literal notranslate"><span 
class="pre">execution.keep_partition_by_columns</span></code> flag can also be 
enabled through <code class="docutils literal notranslate"><span 
class="pre">ExecutionOptions</span></code> within <code class="docutils literal 
notranslate"><span class="pre">SessionConfig</span></code>.</p>
+</section>
 <section id="json-format-specific-options">
 <h3>JSON Format Specific Options<a class="headerlink" 
href="#json-format-specific-options" title="Link to this heading">¶</a></h3>
 <p>The following options are available when writing JSON files. Note: If any 
unsupported option is specified, an error will be raised and the query will 
fail.</p>


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(datafusion) branch asf-site updated: Publish built docs triggered by 330ece8e435ecb47d3fa14729df3c3f8378526c7

Reply via email to