This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/drill-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 71ced7a6ca Automatic Site Publish by Buildbot
71ced7a6ca is described below
commit 71ced7a6ca567fa4b279aada53bb45d28138187b
Author: buildbot <[email protected]>
AuthorDate: Mon Aug 19 15:30:09 2024 +0000
Automatic Site Publish by Buildbot
---
output/docs/parquet-filter-pushdown/index.html | 36 +++++++++++++++++++++++
output/feed.xml | 4 +--
output/zh/docs/parquet-filter-pushdown/index.html | 36 +++++++++++++++++++++++
output/zh/feed.xml | 4 +--
4 files changed, 76 insertions(+), 4 deletions(-)
diff --git a/output/docs/parquet-filter-pushdown/index.html
b/output/docs/parquet-filter-pushdown/index.html
index 810e2a118f..addb0cdfe7 100644
--- a/output/docs/parquet-filter-pushdown/index.html
+++ b/output/docs/parquet-filter-pushdown/index.html
@@ -1521,6 +1521,42 @@
<p>The query planner looks at the minimum and maximum values in each row group
for an intersection. If no intersection exists, the planner can prune the row
group in the table. If the minimum and maximum value range is too large, Drill
does not apply Parquet filter pushdown. The query planner can typically prune
more data when the tables in the Parquet file are sorted by row groups.</p>
+<h3 id="filter-pushdown-threshold">Filter Pushdown Threshold</h3>
+
+<p>There is a limit on the number of row groups the planner will examine for
pruning. This limit is controlled by the option <code class="language-plaintext
highlighter-rouge">planner.store.parquet.rowgroup.filter.pushdown.threshold</code>,
which has a default value of 10,000.</p>
+
+<p>A long-running query on many or large Parquet files may benefit from
increasing this threshold. Planning then takes longer, but the total query
time may still be shorter.</p>
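+
+<p>For example, here is a minimal sketch that changes the threshold for the current
+session only. It assumes you want to inspect the option first, raise it with
+<code class="language-plaintext highlighter-rouge">ALTER SESSION SET</code>, and later restore the default with
+<code class="language-plaintext highlighter-rouge">ALTER SESSION RESET</code>. The value 20000 is an arbitrary
+illustration, not a recommended setting.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Inspect the current setting of the option
+SELECT * FROM sys.options WHERE name = 'planner.store.parquet.rowgroup.filter.pushdown.threshold';
+
+-- Raise the limit for this session only (20000 is an illustrative value)
+ALTER SESSION SET `planner.store.parquet.rowgroup.filter.pushdown.threshold` = 20000;
+
+-- Restore the default value when done
+ALTER SESSION RESET `planner.store.parquet.rowgroup.filter.pushdown.threshold`;
+</code></pre></div></div>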
+
+<p>Use the <a href="/docs/explain/">EXPLAIN PLAN command</a> to check
whether filter pushdown is used to prune row groups in a specific query.</p>
+
+<p>Example:</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>EXPLAIN PLAN FOR SELECT col1 FROM dfs.`dir/subdir`
WHERE col2 >= 100 AND col2 < 200
+</code></pre></div></div>
+
+<p>If filter pushdown is applied to the query, the command produces a plan
similar to the following:</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>00-00 Screen
+00-01 Project(col1=[$0])
+00-02 UnionExchange
+01-01 Scan(table=[[dfs, dir/subdir]], groupscan=[ParquetGroupScan
[entries=...
+</code></pre></div></div>
+
+<p>Here, <code class="language-plaintext highlighter-rouge">entries</code>
contains only the paths to the Parquet files in <code class="language-plaintext
highlighter-rouge">dir/subdir</code> whose metadata indicates that
<code class="language-plaintext highlighter-rouge">col2</code> has values in
the specified range.</p>
+
+<p>If filter pushdown is <em>not</em> applied to the query, the plan looks
like the following:</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>00-00 Screen
+00-01 Project(col1=[$0])
+00-02 UnionExchange
+01-01 Project(col1=[$1])
+01-02 SelectionVectorRemover
+01-03 Filter(condition=[SEARCH($0, Sarg[[100..200)])])
+01-04 Scan(table=[[dfs, dir/subdir]],
groupscan=[ParquetGroupScan [entries=...
+</code></pre></div></div>
+
+<p>Here, <code class="language-plaintext highlighter-rouge">entries</code>
contains the paths to all Parquet files in <code class="language-plaintext
highlighter-rouge">dir/subdir</code>.</p>
+
<h2 id="parquet-filter-pushdown-for-varchar-and-decimal-data-types">Parquet
Filter Pushdown for VARCHAR and DECIMAL Data Types</h2>
<p>Starting in Drill 1.15, Drill supports Parquet filter pushdown for the
VARCHAR and DECIMAL data types. Drill uses binary statistics in the Parquet
file or Drill metadata file to push filters on VARCHAR and DECIMAL data types
down to the data source.</p>
diff --git a/output/feed.xml b/output/feed.xml
index fc2124e349..46ad84cc14 100644
--- a/output/feed.xml
+++ b/output/feed.xml
@@ -6,8 +6,8 @@
</description>
<link>/</link>
<atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
- <pubDate>Sat, 03 Aug 2024 07:15:13 +0000</pubDate>
- <lastBuildDate>Sat, 03 Aug 2024 07:15:13 +0000</lastBuildDate>
+ <pubDate>Mon, 19 Aug 2024 15:28:00 +0000</pubDate>
+ <lastBuildDate>Mon, 19 Aug 2024 15:28:00 +0000</lastBuildDate>
<generator>Jekyll v3.9.1</generator>
<item>
diff --git a/output/zh/docs/parquet-filter-pushdown/index.html
b/output/zh/docs/parquet-filter-pushdown/index.html
index f10a8a6433..fbda0cfa4a 100644
--- a/output/zh/docs/parquet-filter-pushdown/index.html
+++ b/output/zh/docs/parquet-filter-pushdown/index.html
@@ -1521,6 +1521,42 @@
<p>The query planner looks at the minimum and maximum values in each row group
for an intersection. If no intersection exists, the planner can prune the row
group in the table. If the minimum and maximum value range is too large, Drill
does not apply Parquet filter pushdown. The query planner can typically prune
more data when the tables in the Parquet file are sorted by row groups.</p>
+<h3 id="filter-pushdown-threshold">Filter Pushdown Threshold</h3>
+
+<p>There is a limit on the number of row groups the planner will examine for
pruning. This limit is controlled by the option <code class="language-plaintext
highlighter-rouge">planner.store.parquet.rowgroup.filter.pushdown.threshold</code>,
which has a default value of 10,000.</p>
+
+<p>A long-running query on many or large Parquet files may benefit from
increasing this threshold. Planning then takes longer, but the total query
time may still be shorter.</p>
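+
+<p>For example, here is a minimal sketch that changes the threshold for the current
+session only. It assumes you want to inspect the option first, raise it with
+<code class="language-plaintext highlighter-rouge">ALTER SESSION SET</code>, and later restore the default with
+<code class="language-plaintext highlighter-rouge">ALTER SESSION RESET</code>. The value 20000 is an arbitrary
+illustration, not a recommended setting.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Inspect the current setting of the option
+SELECT * FROM sys.options WHERE name = 'planner.store.parquet.rowgroup.filter.pushdown.threshold';
+
+-- Raise the limit for this session only (20000 is an illustrative value)
+ALTER SESSION SET `planner.store.parquet.rowgroup.filter.pushdown.threshold` = 20000;
+
+-- Restore the default value when done
+ALTER SESSION RESET `planner.store.parquet.rowgroup.filter.pushdown.threshold`;
+</code></pre></div></div>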
+
+<p>Use the <a href="/zh/docs/explain/">EXPLAIN PLAN command</a> to
check whether filter pushdown is used to prune row groups in a specific
query.</p>
+
+<p>Example:</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>EXPLAIN PLAN FOR SELECT col1 FROM dfs.`dir/subdir`
WHERE col2 >= 100 AND col2 < 200
+</code></pre></div></div>
+
+<p>If filter pushdown is applied to the query, the command produces a plan
similar to the following:</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>00-00 Screen
+00-01 Project(col1=[$0])
+00-02 UnionExchange
+01-01 Scan(table=[[dfs, dir/subdir]], groupscan=[ParquetGroupScan
[entries=...
+</code></pre></div></div>
+
+<p>Here, <code class="language-plaintext highlighter-rouge">entries</code>
contains only the paths to the Parquet files in <code class="language-plaintext
highlighter-rouge">dir/subdir</code> whose metadata indicates that
<code class="language-plaintext highlighter-rouge">col2</code> has values in
the specified range.</p>
+
+<p>If filter pushdown is <em>not</em> applied to the query, the plan looks
like the following:</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>00-00 Screen
+00-01 Project(col1=[$0])
+00-02 UnionExchange
+01-01 Project(col1=[$1])
+01-02 SelectionVectorRemover
+01-03 Filter(condition=[SEARCH($0, Sarg[[100..200)])])
+01-04 Scan(table=[[dfs, dir/subdir]],
groupscan=[ParquetGroupScan [entries=...
+</code></pre></div></div>
+
+<p>Here, <code class="language-plaintext highlighter-rouge">entries</code>
contains the paths to all Parquet files in <code class="language-plaintext
highlighter-rouge">dir/subdir</code>.</p>
+
<h2 id="parquet-filter-pushdown-for-varchar-and-decimal-data-types">Parquet
Filter Pushdown for VARCHAR and DECIMAL Data Types</h2>
<p>Starting in Drill 1.15, Drill supports Parquet filter pushdown for the
VARCHAR and DECIMAL data types. Drill uses binary statistics in the Parquet
file or Drill metadata file to push filters on VARCHAR and DECIMAL data types
down to the data source.</p>
diff --git a/output/zh/feed.xml b/output/zh/feed.xml
index 490cee918e..995546d55d 100644
--- a/output/zh/feed.xml
+++ b/output/zh/feed.xml
@@ -6,8 +6,8 @@
</description>
<link>/</link>
<atom:link href="/zh/feed.xml" rel="self" type="application/rss+xml"/>
- <pubDate>Sat, 03 Aug 2024 07:15:13 +0000</pubDate>
- <lastBuildDate>Sat, 03 Aug 2024 07:15:13 +0000</lastBuildDate>
+ <pubDate>Mon, 19 Aug 2024 15:28:00 +0000</pubDate>
+ <lastBuildDate>Mon, 19 Aug 2024 15:28:00 +0000</lastBuildDate>
<generator>Jekyll v3.9.1</generator>
<item>