This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 91f5ba6230 Publish built docs triggered by 6d77748b2ac2add162fe048526267614b802a259
91f5ba6230 is described below
commit 91f5ba623034fa76814ae28c89d8372ac1075666
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu May 2 11:31:05 2024 +0000
Publish built docs triggered by 6d77748b2ac2add162fe048526267614b802a259
---
_sources/user-guide/configs.md.txt | 3 +-
searchindex.js | 2 +-
user-guide/configs.html | 94 ++++++++++++++++++++------------------
3 files changed, 52 insertions(+), 47 deletions(-)
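
Note on the size of the configs.html change: the generated table alternates
row-odd and row-even classes, so replacing one row with two shifts the class
of every row after it. The sketch below illustrates that effect. It is a
minimal model and not Sphinx's actual implementation; the helper names
(stripe, changed_rows) and the shortened row names are hypothetical, and the
starting parity is arbitrary.

```python
def stripe(rows):
    """Assign alternating classes to rows, starting with row-odd (assumed)."""
    return [
        (name, "row-odd" if index % 2 == 0 else "row-even")
        for index, name in enumerate(rows)
    ]


def changed_rows(before, after):
    """Return rows in `after` that are new or whose class differs from `before`."""
    old_classes = dict(stripe(before))
    new_classes = dict(stripe(after))
    return [name for name in after if old_classes.get(name) != new_classes[name]]


# Shortened, illustrative row names taken from the option names in this diff.
before = ["encoding", "bloom_filter_enabled", "bloom_filter_fpp", "bloom_filter_ndv"]
after = [
    "encoding",
    "bloom_filter_on_read",
    "bloom_filter_on_write",
    "bloom_filter_fpp",
    "bloom_filter_ndv",
]

# Every row after the insertion point is reported, even though only the
# bloom filter on/off rows changed in substance.
print(changed_rows(before, after))
```

Rows before the insertion point keep their class; every row after it appears
in the diff as a one-line class change.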
diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index af7a92f403..db192d7807 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -68,7 +68,8 @@ Environment variables are read during `SessionConfig`
initialisation so they mus
| datafusion.execution.parquet.column_index_truncate_length |
NULL | Sets column index truncate length
[...]
| datafusion.execution.parquet.data_page_row_count_limit |
18446744073709551615 | Sets best effort maximum number of rows in data
page
[...]
| datafusion.execution.parquet.encoding |
NULL | Sets default encoding for any column Valid values
are: plain, plain_dictionary, rle, bit_packed, delta_binary_packed,
delta_length_byte_array, delta_byte_array, rle_dictionary, and
byte_stream_split. These values are not case sensitive. If NULL, uses default
parquet writer setting
[...]
-| datafusion.execution.parquet.bloom_filter_enabled |
false | Sets if bloom filter is enabled for any column
[...]
+| datafusion.execution.parquet.bloom_filter_on_read |
true | Use any available bloom filters when reading
parquet files
[...]
+| datafusion.execution.parquet.bloom_filter_on_write |
false | Write bloom filters for all columns when creating
parquet files
[...]
| datafusion.execution.parquet.bloom_filter_fpp |
NULL | Sets bloom filter false positive probability. If
NULL, uses default parquet writer setting
[...]
| datafusion.execution.parquet.bloom_filter_ndv |
NULL | Sets bloom filter number of distinct values. If
NULL, uses default parquet writer setting
[...]
| datafusion.execution.parquet.allow_single_file_parallelism |
true | Controls whether DataFusion will attempt to speed
up writing parquet files by serializing them in parallel. Each column in each
row group in each output file are serialized in parallel leveraging a maximum
possible core count of n_files*n_row_groups*n_columns.
[...]
diff --git a/searchindex.js b/searchindex.js
index b9ab2482b0..b72e3e6370 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"!=": [[39, "op-neq"]], "!~": [[39,
"op-re-not-match"]], "!~*": [[39, "op-re-not-match-i"]], "!~~": [[39, "id18"]],
"!~~*": [[39, "id19"]], "#": [[39, "op-bit-xor"]], "%": [[39, "op-modulo"]],
"&": [[39, "op-bit-and"]], "(relation, name) tuples in logical fields and
logical columns are unique": [[7,
"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]], "*":
[[39, "op-multiply"]], "+": [[39, "op-plus"]], "-": [[39, "op-minus"]], "/":
[[3 [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"!=": [[39, "op-neq"]], "!~": [[39,
"op-re-not-match"]], "!~*": [[39, "op-re-not-match-i"]], "!~~": [[39, "id18"]],
"!~~*": [[39, "id19"]], "#": [[39, "op-bit-xor"]], "%": [[39, "op-modulo"]],
"&": [[39, "op-bit-and"]], "(relation, name) tuples in logical fields and
logical columns are unique": [[7,
"relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]], "*":
[[39, "op-multiply"]], "+": [[39, "op-plus"]], "-": [[39, "op-minus"]], "/":
[[3 [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 45d8348d2c..be0c70c00d 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -614,179 +614,183 @@ Environment variables are read during <code
class="docutils literal notranslate"
<td><p>NULL</p></td>
<td><p>Sets default encoding for any column Valid values are: plain,
plain_dictionary, rle, bit_packed, delta_binary_packed,
delta_length_byte_array, delta_byte_array, rle_dictionary, and
byte_stream_split. These values are not case sensitive. If NULL, uses default
parquet writer setting</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.parquet.bloom_filter_enabled</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.bloom_filter_on_read</p></td>
+<td><p>true</p></td>
+<td><p>Use any available bloom filters when reading parquet files</p></td>
+</tr>
+<tr
class="row-even"><td><p>datafusion.execution.parquet.bloom_filter_on_write</p></td>
<td><p>false</p></td>
-<td><p>Sets if bloom filter is enabled for any column</p></td>
+<td><p>Write bloom filters for all columns when creating parquet files</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.bloom_filter_fpp</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.bloom_filter_fpp</p></td>
<td><p>NULL</p></td>
<td><p>Sets bloom filter false positive probability. If NULL, uses default
parquet writer setting</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.parquet.bloom_filter_ndv</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.parquet.bloom_filter_ndv</p></td>
<td><p>NULL</p></td>
<td><p>Sets bloom filter number of distinct values. If NULL, uses default
parquet writer setting</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.allow_single_file_parallelism</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.allow_single_file_parallelism</p></td>
<td><p>true</p></td>
<td><p>Controls whether DataFusion will attempt to speed up writing parquet
files by serializing them in parallel. Each column in each row group in each
output file are serialized in parallel leveraging a maximum possible core count
of n_files<em>n_row_groups</em>n_columns.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.parquet.maximum_parallel_row_group_writers</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.parquet.maximum_parallel_row_group_writers</p></td>
<td><p>1</p></td>
<td><p>By default parallel parquet writer is tuned for minimum memory usage in
a streaming execution plan. You may see a performance benefit when writing
large parquet files by increasing maximum_parallel_row_group_writers and
maximum_buffered_record_batches_per_stream if your system has idle cores and
can tolerate additional memory usage. Boosting these values is likely
worthwhile when writing out already in-memory data, such as from a cached data
frame.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.parquet.maximum_buffered_record_batches_per_stream</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.parquet.maximum_buffered_record_batches_per_stream</p></td>
<td><p>2</p></td>
<td><p>By default parallel parquet writer is tuned for minimum memory usage in
a streaming execution plan. You may see a performance benefit when writing
large parquet files by increasing maximum_parallel_row_group_writers and
maximum_buffered_record_batches_per_stream if your system has idle cores and
can tolerate additional memory usage. Boosting these values is likely
worthwhile when writing out already in-memory data, such as from a cached data
frame.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.aggregate.scalar_update_factor</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.aggregate.scalar_update_factor</p></td>
<td><p>10</p></td>
<td><p>Specifies the threshold for using <code class="docutils literal
notranslate"><span class="pre">ScalarValue</span></code>s to update
accumulators during high-cardinality aggregations for each input batch. The
aggregation is considered high-cardinality if the number of affected groups is
greater than or equal to <code class="docutils literal notranslate"><span
class="pre">batch_size</span> <span class="pre">/</span> <span
class="pre">scalar_update_factor</span></code>. In such cases [...]
</tr>
-<tr class="row-even"><td><p>datafusion.execution.planning_concurrency</p></td>
+<tr class="row-odd"><td><p>datafusion.execution.planning_concurrency</p></td>
<td><p>0</p></td>
<td><p>Fan-out during initial physical planning. This is mostly used to plan
<code class="docutils literal notranslate"><span
class="pre">UNION</span></code> children in parallel. Defaults to the number of
CPU cores on the system</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.sort_spill_reservation_bytes</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.sort_spill_reservation_bytes</p></td>
<td><p>10485760</p></td>
<td><p>Specifies the reserved memory for each spillable sort operation to
facilitate an in-memory merge. When a sort operation spills to disk, the
in-memory data must be sorted and merged before being written to a file. This
setting reserves a specific amount of memory for that in-memory sort/merge
process. Note: This setting is irrelevant if the sort operation cannot spill
(i.e., if there’s no <code class="docutils literal notranslate"><span
class="pre">DiskManager</span></code> configu [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.sort_in_place_threshold_bytes</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.sort_in_place_threshold_bytes</p></td>
<td><p>1048576</p></td>
<td><p>When sorting, below what size should data be concatenated and sorted in
a single RecordBatch rather than sorted in batches and merged.</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.execution.meta_fetch_concurrency</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.meta_fetch_concurrency</p></td>
<td><p>32</p></td>
<td><p>Number of files to read in parallel when inferring schema and
statistics</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.minimum_parallel_output_files</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.minimum_parallel_output_files</p></td>
<td><p>4</p></td>
<td><p>Guarantees a minimum level of output files running in parallel.
RecordBatches will be distributed in round robin fashion to each parallel
writer. Each writer is closed and a new file opened once
soft_max_rows_per_output_file is reached.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.soft_max_rows_per_output_file</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.soft_max_rows_per_output_file</p></td>
<td><p>50000000</p></td>
<td><p>Target number of rows in output files when writing multiple. This is a
soft max, so it can be exceeded slightly. There also will be one file smaller
than the limit if the total number of rows written is not roughly divisible by
the soft max</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.execution.max_buffered_batches_per_output_file</p></td>
+<tr
class="row-odd"><td><p>datafusion.execution.max_buffered_batches_per_output_file</p></td>
<td><p>2</p></td>
<td><p>This is the maximum number of RecordBatches buffered for each output
file being worked. Higher values can potentially give faster write performance
at the cost of higher peak memory consumption</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.listing_table_ignore_subdirectory</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.listing_table_ignore_subdirectory</p></td>
<td><p>true</p></td>
<td><p>Should sub directories be ignored when scanning directories for data
files. Defaults to true (ignores subdirectories), consistent with Hive. Note
that this setting does not affect reading partitioned tables (e.g. <code
class="docutils literal notranslate"><span
class="pre">/table/year=2021/month=01/data.parquet</span></code>).</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.execution.enable_recursive_ctes</p></td>
+<tr class="row-odd"><td><p>datafusion.execution.enable_recursive_ctes</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion support recursive CTEs</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.execution.split_file_groups_by_statistics</p></td>
+<tr
class="row-even"><td><p>datafusion.execution.split_file_groups_by_statistics</p></td>
<td><p>false</p></td>
<td><p>Attempt to eliminate sorts by packing & sorting files with
non-overlapping statistics into the same file groups. Currently
experimental</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.enable_distinct_aggregation_soft_limit</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will push a limit operation into
grouped aggregations which have no aggregate expressions, as a soft limit,
emitting groups once the limit is reached, before all rows in the group are
read.</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.enable_round_robin_repartition</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer will try to add round
robin repartitioning to increase parallelism to leverage more CPU cores</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.enable_topk_aggregation</p></td>
<td><p>true</p></td>
<td><p>When set to true, the optimizer will attempt to perform limit
operations during aggregations, if possible</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.filter_null_join_keys</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.filter_null_join_keys</p></td>
<td><p>false</p></td>
<td><p>When set to true, the optimizer will insert filters before a join
between a nullable and non-nullable column to filter out nulls on the nullable
side. This filter can add additional overhead when the file format does not
fully support predicate push down.</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.repartition_aggregations</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.repartition_aggregations</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion repartition data using the aggregate keys to execute
aggregates in parallel using the provided <code class="docutils literal
notranslate"><span class="pre">target_partitions</span></code> level</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.repartition_file_min_size</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.repartition_file_min_size</p></td>
<td><p>10485760</p></td>
<td><p>Minimum total files size in bytes to perform file scan
repartitioning.</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.repartition_joins</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.repartition_joins</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion repartition data using the join keys to execute joins
in parallel using the provided <code class="docutils literal notranslate"><span
class="pre">target_partitions</span></code> level</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.allow_symmetric_joins_without_pruning</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion allow symmetric hash joins for unbounded data sources
even when its inputs do not have any ordering or filtering If the flag is not
enabled, the SymmetricHashJoin operator will be unable to prune its internal
buffers, resulting in certain join types - such as Full, Left, LeftAnti,
LeftSemi, Right, RightAnti, and RightSemi - being produced only at the end of
the execution. This is not typical in stream processing. Additionally, without
proper design for long runne [...]
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.repartition_file_scans</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.repartition_file_scans</p></td>
<td><p>true</p></td>
<td><p>When set to <code class="docutils literal notranslate"><span
class="pre">true</span></code>, file groups will be repartitioned to achieve
maximum parallelism. Currently Parquet and CSV formats are supported. If set to
<code class="docutils literal notranslate"><span
class="pre">true</span></code>, all files will be repartitioned evenly (i.e., a
single large file might be partitioned into smaller chunks) for parallel
scanning. If set to <code class="docutils literal notranslate"><s [...]
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.repartition_windows</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.repartition_windows</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion repartition data using the partitions keys to execute
window functions in parallel using the provided <code class="docutils literal
notranslate"><span class="pre">target_partitions</span></code> level</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.repartition_sorts</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.repartition_sorts</p></td>
<td><p>true</p></td>
<td><p>Should DataFusion execute sorts in a per-partition fashion and merge
afterwards instead of coalescing first and sorting globally. When this flag is
enabled, plans in the form below <code class="docutils literal
notranslate"><span class="pre">text</span> <span
class="pre">"SortExec:</span> <span class="pre">[a@0</span> <span
class="pre">ASC]",</span> <span class="pre">"</span> <span
class="pre">CoalescePartitionsExec",</span> <span
class="pre">"</span> [...]
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_sort</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_sort</p></td>
<td><p>false</p></td>
<td><p>When true, DataFusion will opportunistically remove sorts when the data
is already sorted, (i.e. setting <code class="docutils literal
notranslate"><span class="pre">preserve_order</span></code> to true on <code
class="docutils literal notranslate"><span
class="pre">RepartitionExec</span></code> and using <code class="docutils
literal notranslate"><span class="pre">SortPreservingMergeExec</span></code>)
When false, DataFusion will maximize plan parallelism using <code class="docut
[...]
</tr>
-<tr class="row-even"><td><p>datafusion.optimizer.skip_failed_rules</p></td>
+<tr class="row-odd"><td><p>datafusion.optimizer.skip_failed_rules</p></td>
<td><p>false</p></td>
<td><p>When set to true, the logical plan optimizer will produce warning
messages if any optimization rules produce errors and then proceed to the next
rule. When set to false, any rules that produce errors will cause the query to
fail</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.max_passes</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.max_passes</p></td>
<td><p>3</p></td>
<td><p>Number of times that the optimizer will attempt to optimize the
plan</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.top_down_join_key_reordering</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer will run a top down
process to reorder the join keys</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.prefer_hash_join</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.prefer_hash_join</p></td>
<td><p>true</p></td>
<td><p>When set to true, the physical plan optimizer will prefer HashJoin over
SortMergeJoin. HashJoin can work more efficiently than SortMergeJoin but
consumes more memory</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold</p></td>
<td><p>1048576</p></td>
<td><p>The maximum estimated size in bytes for one input side of a HashJoin
will be collected into a single partition</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td>
+<tr
class="row-even"><td><p>datafusion.optimizer.hash_join_single_partition_threshold_rows</p></td>
<td><p>131072</p></td>
<td><p>The maximum estimated size in rows for one input side of a HashJoin
will be collected into a single partition</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.optimizer.default_filter_selectivity</p></td>
+<tr
class="row-odd"><td><p>datafusion.optimizer.default_filter_selectivity</p></td>
<td><p>20</p></td>
<td><p>The default filter selectivity used by Filter Statistics when an exact
selectivity cannot be determined. Valid values are between 0 (no selectivity)
and 100 (all rows are selected).</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.optimizer.prefer_existing_union</p></td>
+<tr class="row-even"><td><p>datafusion.optimizer.prefer_existing_union</p></td>
<td><p>false</p></td>
<td><p>When set to true, the optimizer will not attempt to convert Union to
Interleave</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.explain.logical_plan_only</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.logical_plan_only</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will only print logical
plans</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.explain.physical_plan_only</p></td>
+<tr class="row-even"><td><p>datafusion.explain.physical_plan_only</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will only print physical
plans</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.explain.show_statistics</p></td>
+<tr class="row-odd"><td><p>datafusion.explain.show_statistics</p></td>
<td><p>false</p></td>
<td><p>When set to true, the explain statement will print operator statistics
for physical plans</p></td>
</tr>
-<tr class="row-odd"><td><p>datafusion.explain.show_sizes</p></td>
+<tr class="row-even"><td><p>datafusion.explain.show_sizes</p></td>
<td><p>true</p></td>
<td><p>When set to true, the explain statement will print the partition
sizes</p></td>
</tr>
-<tr
class="row-even"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td>
+<tr
class="row-odd"><td><p>datafusion.sql_parser.parse_float_as_decimal</p></td>
<td><p>false</p></td>
<td><p>When set to true, SQL parser will parse float as decimal type</p></td>
</tr>
-<tr
class="row-odd"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td>
+<tr
class="row-even"><td><p>datafusion.sql_parser.enable_ident_normalization</p></td>
<td><p>true</p></td>
<td><p>When set to true, SQL parser will normalize ident (convert ident to
lowercase when not quoted)</p></td>
</tr>
-<tr class="row-even"><td><p>datafusion.sql_parser.dialect</p></td>
+<tr class="row-odd"><td><p>datafusion.sql_parser.dialect</p></td>
<td><p>generic</p></td>
<td><p>Configure the SQL dialect used by DataFusion’s parser; supported values
include: Generic, MySQL, PostgreSQL, Hive, SQLite, Snowflake, Redshift, MsSQL,
ClickHouse, BigQuery, and Ansi.</p></td>
</tr>
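
Note on the renamed option: this diff replaces
datafusion.execution.parquet.bloom_filter_enabled (default false) with
bloom_filter_on_read (default true) and bloom_filter_on_write (default false).
The sketch below shows one way a settings map that still uses the old key
might be rewritten. It is a hypothetical helper, not part of DataFusion. It
assumes the old flag controlled writing, based only on the old description
("Sets if bloom filter is enabled for any column") sitting among writer
settings; the diff does not say whether the old key is still accepted.

```python
OLD_KEY = "datafusion.execution.parquet.bloom_filter_enabled"
READ_KEY = "datafusion.execution.parquet.bloom_filter_on_read"
WRITE_KEY = "datafusion.execution.parquet.bloom_filter_on_write"


def migrate_bloom_filter_keys(settings):
    """Return a copy of `settings` with the old bloom filter key renamed.

    ASSUMPTION: the old flag maps to the write-side option. The read-side
    option is left unset so its documented default (true) applies.
    """
    migrated = dict(settings)
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        # An explicitly configured new key takes precedence over the old one.
        migrated.setdefault(WRITE_KEY, old_value)
    return migrated


legacy = {OLD_KEY: "true", "datafusion.execution.batch_size": "8192"}
print(migrate_bloom_filter_keys(legacy))
```

The input map is not modified, and unrelated keys pass through unchanged.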