This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new e8fc65b29f Publish built docs triggered by 8ae56fc2b8c8b283daa16d540fbbf84dd49e1469
e8fc65b29f is described below
commit e8fc65b29f987408815353a124e9205857215074
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Jul 8 21:00:49 2024 +0000
Publish built docs triggered by 8ae56fc2b8c8b283daa16d540fbbf84dd49e1469
---
.../aggregate_function.rs | 7 -
_sources/user-guide/dataframe.md.txt | 123 +++++-------
searchindex.js | 2 +-
user-guide/dataframe.html | 207 ++++-----------------
4 files changed, 83 insertions(+), 256 deletions(-)
diff --git a/_downloads/f9718f9b04809de030b1693c73858f19/aggregate_function.rs b/_downloads/f9718f9b04809de030b1693c73858f19/aggregate_function.rs
index 760952d948..23e98714df 100644
--- a/_downloads/f9718f9b04809de030b1693c73858f19/aggregate_function.rs
+++ b/_downloads/f9718f9b04809de030b1693c73858f19/aggregate_function.rs
@@ -39,8 +39,6 @@ pub enum AggregateFunction {
Max,
/// Aggregation into an array
ArrayAgg,
- /// N'th value in a group according to some ordering
- NthValue,
}
impl AggregateFunction {
@@ -50,7 +48,6 @@ impl AggregateFunction {
Min => "MIN",
Max => "MAX",
ArrayAgg => "ARRAY_AGG",
- NthValue => "NTH_VALUE",
}
}
}
@@ -69,7 +66,6 @@ impl FromStr for AggregateFunction {
"max" => AggregateFunction::Max,
"min" => AggregateFunction::Min,
"array_agg" => AggregateFunction::ArrayAgg,
- "nth_value" => AggregateFunction::NthValue,
_ => {
return plan_err!("There is no built-in function named {name}");
}
@@ -114,7 +110,6 @@ impl AggregateFunction {
coerced_data_types[0].clone(),
input_expr_nullable[0],
)))),
- AggregateFunction::NthValue => Ok(coerced_data_types[0].clone()),
}
}
@@ -124,7 +119,6 @@ impl AggregateFunction {
match self {
AggregateFunction::Max | AggregateFunction::Min => Ok(true),
AggregateFunction::ArrayAgg => Ok(false),
- AggregateFunction::NthValue => Ok(true),
}
}
}
@@ -147,7 +141,6 @@ impl AggregateFunction {
.collect::<Vec<_>>();
Signature::uniform(1, valid, Volatility::Immutable)
}
- AggregateFunction::NthValue => Signature::any(2, Volatility::Immutable),
}
}
}
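The hunks above remove the `NthValue` variant and every match arm that referenced it. A minimal, self-contained sketch (not the real DataFusion type, which has many more methods and variants) of the simplified enum after this change, mirroring the `name()` mapping and `FromStr` parsing shown in the diff:

```rust
use std::str::FromStr;

/// Simplified stand-in for DataFusion's AggregateFunction after the
/// NthValue variant was removed in this commit.
#[derive(Debug)]
pub enum AggregateFunction {
    Min,
    Max,
    /// Aggregation into an array
    ArrayAgg,
}

impl AggregateFunction {
    pub fn name(&self) -> &'static str {
        match self {
            AggregateFunction::Min => "MIN",
            AggregateFunction::Max => "MAX",
            AggregateFunction::ArrayAgg => "ARRAY_AGG",
        }
    }
}

impl FromStr for AggregateFunction {
    type Err = String;
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        Ok(match name {
            "min" => AggregateFunction::Min,
            "max" => AggregateFunction::Max,
            "array_agg" => AggregateFunction::ArrayAgg,
            // After the removal, "nth_value" falls through to the error arm.
            _ => return Err(format!("There is no built-in function named {name}")),
        })
    }
}

fn main() {
    assert_eq!("min".parse::<AggregateFunction>().unwrap().name(), "MIN");
    assert!("nth_value".parse::<AggregateFunction>().is_err());
}
```

Parsing `"nth_value"` now takes the error arm, which is exactly the behavioral change the removed arms imply.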
diff --git a/_sources/user-guide/dataframe.md.txt b/_sources/user-guide/dataframe.md.txt
index f011e68fad..c3d0b6c2d6 100644
--- a/_sources/user-guide/dataframe.md.txt
+++ b/_sources/user-guide/dataframe.md.txt
@@ -19,17 +19,30 @@
# DataFrame API
-A DataFrame represents a logical set of rows with the same named columns, similar to a [Pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) or
-[Spark DataFrame](https://spark.apache.org/docs/latest/sql-programming-guide.html).
+A DataFrame represents a logical set of rows with the same named columns,
+similar to a [Pandas DataFrame] or [Spark DataFrame].
-DataFrames are typically created by calling a method on
-`SessionContext`, such as `read_csv`, and can then be modified
-by calling the transformation methods, such as `filter`, `select`, `aggregate`, and `limit`
-to build up a query definition.
+DataFrames are typically created by calling a method on [`SessionContext`], such
+as [`read_csv`], and can then be modified by calling the transformation methods,
+such as [`filter`], [`select`], [`aggregate`], and [`limit`] to build up a query
+definition.
-The query can be executed by calling the `collect` method.
+The query can be executed by calling the [`collect`] method.
-The DataFrame struct is part of DataFusion's prelude and can be imported with the following statement.
+DataFusion DataFrames use lazy evaluation, meaning that each transformation
+creates a new plan but does not actually perform any immediate actions. This
+approach allows for the overall plan to be optimized before execution. The plan
+is evaluated (executed) when an action method is invoked, such as [`collect`].
+See the [Library Users Guide] for more details.
+
+The DataFrame API is well documented in the [API reference on docs.rs].
+Please refer to the [Expressions Reference] for more information on
+building logical expressions (`Expr`) to use with the DataFrame API.
+
+## Example
+
+The DataFrame struct is part of DataFusion's `prelude` and can be imported with
+the following statement.
```rust
use datafusion::prelude::*;
@@ -38,73 +51,31 @@ use datafusion::prelude::*;
Here is a minimal example showing the execution of a query using the DataFrame API.
```rust
-let ctx = SessionContext::new();
-let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
-let df = df.filter(col("a").lt_eq(col("b")))?
- .aggregate(vec![col("a")], vec![min(col("b"))])?
- .limit(0, Some(100))?;
-// Print results
-df.show().await?;
+use datafusion::prelude::*;
+use datafusion::error::Result;
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    let ctx = SessionContext::new();
+    let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
+    let df = df.filter(col("a").lt_eq(col("b")))?
+        .aggregate(vec![col("a")], vec![min(col("b"))])?
+        .limit(0, Some(100))?;
+    // Print results
+    df.show().await?;
+    Ok(())
+}
```
-The DataFrame API is well documented in the [API reference on docs.rs](https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html).
-
-Refer to the [Expressions Reference](expressions) for available functions for building logical expressions for use with the
-DataFrame API.
-
-## DataFrame Transformations
-
-These methods create a new DataFrame after applying a transformation to the logical plan that the DataFrame represents.
-
-DataFusion DataFrames use lazy evaluation, meaning that each transformation is just creating a new query plan and
-not actually performing any transformations. This approach allows for the overall plan to be optimized before
-execution. The plan is evaluated (executed) when an action method is invoked, such as `collect`.
-
-| Function            | Notes |
-| ------------------- | ----- |
-| aggregate           | Perform an aggregate query with optional grouping expressions. |
-| distinct            | Filter out duplicate rows. |
-| distinct_on         | Filter out duplicate rows based on provided expressions. |
-| drop_columns        | Create a projection with all but the provided column names. |
-| except              | Calculate the exception of two DataFrames. The two DataFrames must have exactly the same schema |
-| filter              | Filter a DataFrame to only include rows that match the specified filter expression. |
-| intersect           | Calculate the intersection of two DataFrames. The two DataFrames must have exactly the same schema |
-| join                | Join this DataFrame with another DataFrame using the specified columns as join keys. |
-| join_on             | Join this DataFrame with another DataFrame using arbitrary expressions. |
-| limit               | Limit the number of rows returned from this DataFrame. |
-| repartition         | Repartition a DataFrame based on a logical partitioning scheme. |
-| sort                | Sort the DataFrame by the specified sorting expressions. Any expression can be turned into a sort expression by calling its `sort` method. |
-| select              | Create a projection based on arbitrary expressions. Example: `df.select(vec![col("c1"), abs(col("c2"))])?` |
-| select_columns      | Create a projection based on column names. Example: `df.select_columns(&["id", "name"])?`. |
-| union               | Calculate the union of two DataFrames, preserving duplicate rows. The two DataFrames must have exactly the same schema. |
-| union_distinct      | Calculate the distinct union of two DataFrames. The two DataFrames must have exactly the same schema. |
-| with_column         | Add an additional column to the DataFrame. |
-| with_column_renamed | Rename one column by applying a new projection. |
-
-## DataFrame Actions
-
-These methods execute the logical plan represented by the DataFrame and either collects the results into memory, prints them to stdout, or writes them to disk.
-
-| Function                   | Notes |
-| -------------------------- | ----- |
-| collect                    | Executes this DataFrame and collects all results into a vector of RecordBatch. |
-| collect_partitioned        | Executes this DataFrame and collects all results into a vector of vector of RecordBatch maintaining the input partitioning. |
-| count                      | Executes this DataFrame to get the total number of rows. |
-| execute_stream             | Executes this DataFrame and returns a stream over a single partition. |
-| execute_stream_partitioned | Executes this DataFrame and returns one stream per partition. |
-| show                       | Execute this DataFrame and print the results to stdout. |
-| show_limit                 | Execute this DataFrame and print a subset of results to stdout. |
-| write_csv                  | Execute this DataFrame and write the results to disk in CSV format. |
-| write_json                 | Execute this DataFrame and write the results to disk in JSON format. |
-| write_parquet              | Execute this DataFrame and write the results to disk in Parquet format. |
-| write_table                | Execute this DataFrame and write the results via the insert_into method of the registered TableProvider |
-
-## Other DataFrame Methods
-
-| Function            | Notes |
-| ------------------- | ----- |
-| explain             | Return a DataFrame with the explanation of its plan so far. |
-| registry            | Return a `FunctionRegistry` used to plan udf's calls. |
-| schema              | Returns the schema describing the output of this DataFrame in terms of columns returned, where each column has a name, data type, and nullability attribute. |
-| to_logical_plan     | Return the optimized logical plan represented by this DataFrame. |
-| to_unoptimized_plan | Return the unoptimized logical plan represented by this DataFrame. |
+[pandas dataframe]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
+[spark dataframe]: https://spark.apache.org/docs/latest/sql-programming-guide.html
+[`sessioncontext`]: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html
+[`read_csv`]: https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#method.read_csv
+[`filter`]: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.filter
+[`select`]: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.select
+[`aggregate`]: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.aggregate
+[`limit`]: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.limit
+[`collect`]: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.collect
+[library users guide]: ../library-user-guide/using-the-dataframe-api.md
+[api reference on docs.rs]: https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html
+[expressions reference]: expressions
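The rewritten doc text above stresses that DataFrame transformations are lazy: each call only extends a plan, and nothing runs until an action such as `collect` is invoked. A toy, dependency-free sketch of that model (deliberately not DataFusion code; `ToyFrame` and its string "plan" are invented purely to illustrate the lazy-builder pattern):

```rust
/// Toy stand-in for a lazily evaluated DataFrame: transformations only
/// record plan steps; work would happen only in the action (collect).
#[derive(Debug, Clone)]
struct ToyFrame {
    plan: Vec<String>,
}

impl ToyFrame {
    fn scan(path: &str) -> Self {
        ToyFrame { plan: vec![format!("scan {path}")] }
    }

    // Each transformation returns a new frame with an extended plan;
    // no data is touched here.
    fn filter(mut self, predicate: &str) -> Self {
        self.plan.push(format!("filter {predicate}"));
        self
    }

    fn limit(mut self, n: usize) -> Self {
        self.plan.push(format!("limit {n}"));
        self
    }

    // The "action": a real engine would optimize and execute the plan
    // here; this toy just renders it.
    fn collect(&self) -> String {
        self.plan.join(" -> ")
    }
}

fn main() {
    let df = ToyFrame::scan("example.csv").filter("a <= b").limit(100);
    // Nothing has executed yet; the plan is just data until collect().
    println!("{}", df.collect());
}
```

The chained calls mirror the shape of the documented example (`read_csv` → `filter` → `limit` → `show`), making the point that the plan is inert data until an action consumes it.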
diff --git a/searchindex.js b/searchindex.js
index d2744b2c80..d77d00826c 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"!=": [[43, "op-neq"]], "!~": [[43, "op-re-not-match"]], "!~*": [[43, "op-re-not-match-i"]], "!~~": [[43, "id18"]], "!~~*": [[43, "id19"]], "#": [[43, "op-bit-xor"]], "%": [[43, "op-modulo"]], "&": [[43, "op-bit-and"]], "(relation, name) tuples in logical fields and logical columns are unique": [[10, "relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]], "*": [[43, "op-multiply"]], "+": [[43, "op-plus"]], "-": [[43, "op-minus"]], "/": [[ [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"!=": [[43, "op-neq"]], "!~": [[43, "op-re-not-match"]], "!~*": [[43, "op-re-not-match-i"]], "!~~": [[43, "id18"]], "!~~*": [[43, "id19"]], "#": [[43, "op-bit-xor"]], "%": [[43, "op-modulo"]], "&": [[43, "op-bit-and"]], "(relation, name) tuples in logical fields and logical columns are unique": [[10, "relation-name-tuples-in-logical-fields-and-logical-columns-are-unique"]], "*": [[43, "op-multiply"]], "+": [[43, "op-plus"]], "-": [[43, "op-minus"]], "/": [[ [...]
\ No newline at end of file
diff --git a/user-guide/dataframe.html b/user-guide/dataframe.html
index f9ce094437..ea23d5b2d3 100644
--- a/user-guide/dataframe.html
+++ b/user-guide/dataframe.html
@@ -469,18 +469,8 @@
<nav id="bd-toc-nav">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#dataframe-transformations">
- DataFrame Transformations
- </a>
- </li>
- <li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#dataframe-actions">
- DataFrame Actions
- </a>
- </li>
- <li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#other-dataframe-methods">
- Other DataFrame Methods
+ <a class="reference internal nav-link" href="#example">
+ Example
</a>
</li>
</ul>
@@ -531,172 +521,45 @@
-->
<section id="dataframe-api">
<h1>DataFrame API<a class="headerlink" href="#dataframe-api" title="Link to this heading">¶</a></h1>
-<p>A DataFrame represents a logical set of rows with the same named columns, similar to a <a class="reference external" href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html">Pandas DataFrame</a> or
-<a class="reference external" href="https://spark.apache.org/docs/latest/sql-programming-guide.html">Spark DataFrame</a>.</p>
-<p>DataFrames are typically created by calling a method on
-<code class="docutils literal notranslate"><span class="pre">SessionContext</span></code>, such as <code class="docutils literal notranslate"><span class="pre">read_csv</span></code>, and can then be modified
-by calling the transformation methods, such as <code class="docutils literal notranslate"><span class="pre">filter</span></code>, <code class="docutils literal notranslate"><span class="pre">select</span></code>, <code class="docutils literal notranslate"><span class="pre">aggregate</span></code>, and <code class="docutils literal notranslate"><span class="pre">limit</span></code>
-to build up a query definition.</p>
-<p>The query can be executed by calling the <code class="docutils literal notranslate"><span class="pre">collect</span></code> method.</p>
-<p>The DataFrame struct is part of DataFusion’s prelude and can be imported with the following statement.</p>
+<p>A DataFrame represents a logical set of rows with the same named columns,
+similar to a <a class="reference external" href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html">Pandas DataFrame</a> or <a class="reference external" href="https://spark.apache.org/docs/latest/sql-programming-guide.html">Spark DataFrame</a>.</p>
+<p>DataFrames are typically created by calling a method on <a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html"><code class="docutils literal notranslate"><span class="pre">SessionContext</span></code></a>, such
+as <a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#method.read_csv"><code class="docutils literal notranslate"><span class="pre">read_csv</span></code></a>, and can then be modified by calling the transformation methods,
+such as <a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.filter"><code class="docutils literal notranslate"><span class="pre">filter</span></code></a>, <a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.select"><code class="docutils literal notranslate"><span class="pre">select</span></code></a>, <a class="reference external" href="https://docs.rs/da [...]
+definition.</p>
+<p>The query can be executed by calling the <a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.collect"><code class="docutils literal notranslate"><span class="pre">collect</span></code></a> method.</p>
+<p>DataFusion DataFrames use lazy evaluation, meaning that each transformation
+creates a new plan but does not actually perform any immediate actions. This
+approach allows for the overall plan to be optimized before execution. The plan
+is evaluated (executed) when an action method is invoked, such as <a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html#method.collect"><code class="docutils literal notranslate"><span class="pre">collect</span></code></a>.
+See the <a class="reference internal" href="../library-user-guide/using-the-dataframe-api.html"><span class="std std-doc">Library Users Guide</span></a> for more details.</p>
+<p>The DataFrame API is well documented in the <a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html">API reference on docs.rs</a>.
+Please refer to the <a class="reference internal" href="expressions.html"><span class="doc std std-doc">Expressions Reference</span></a> for more information on
+building logical expressions (<code class="docutils literal notranslate"><span class="pre">Expr</span></code>) to use with the DataFrame API.</p>
+<section id="example">
+<h2>Example<a class="headerlink" href="#example" title="Link to this heading">¶</a></h2>
+<p>The DataFrame struct is part of DataFusion’s <code class="docutils literal notranslate"><span class="pre">prelude</span></code> and can be imported with
+the following statement.</p>
<div class="highlight-rust notranslate"><div class="highlight"><pre><span></span><span class="k">use</span><span class="w"> </span><span class="n">datafusion</span><span class="p">::</span><span class="n">prelude</span><span class="p">::</span><span class="o">*</span><span class="p">;</span>
</pre></div>
</div>
<p>Here is a minimal example showing the execution of a query using the DataFrame API.</p>
-<div class="highlight-rust notranslate"><div class="highlight"><pre><span></span><span class="kd">let</span><span class="w"> </span><span class="n">ctx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">SessionContext</span><span class="p">::</span><span class="n">new</span><span class="p">();</span>
-<span class="kd">let</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ctx</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"tests/data/example.csv"</span><span class="p">,</span><span class="w"> </span><span class="n">CsvReadOptions</span><span class="p">::</span><span class="n">new</span><span class="p">()).</span><span class="k">awa [...]
-<span class="kd">let</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">df</span><span class="p">.</span><span class="n">filter</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">"a"</span><span class="p">).</span><span class="n">lt_eq</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">"b" [...]
-<span class="w"> </span><span class="p">.</span><span class="n">aggregate</span><span class="p">(</span><span class="fm">vec!</span><span class="p">[</span><span class="n">col</span><span class="p">(</span><span class="s">"a"</span><span class="p">)],</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[</span><span class="n">min</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">"b"</span><s [...]
-<span class="w"> </span><span class="p">.</span><span class="n">limit</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="mi">100</span><span class="p">))</span><span class="o">?</span><span class="p">;</span>
-<span class="c1">// Print results</span>
-<span class="n">df</span><span class="p">.</span><span class="n">show</span><span class="p">().</span><span class="k">await</span><span class="o">?</span><span class="p">;</span>
+<div class="highlight-rust notranslate"><div class="highlight"><pre><span></span><span class="k">use</span><span class="w"> </span><span class="n">datafusion</span><span class="p">::</span><span class="n">prelude</span><span class="p">::</span><span class="o">*</span><span class="p">;</span>
+<span class="k">use</span><span class="w"> </span><span class="n">datafusion</span><span class="p">::</span><span class="n">error</span><span class="p">::</span><span class="nb">Result</span><span class="p">;</span>
+
+<span class="cp">#[tokio::main]</span>
+<span class="k">async</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">-></span><span class="w"> </span><span class="nb">Result</span><span class="o"><</span><span class="p">()</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
+<span class="w">    </span><span class="kd">let</span><span class="w"> </span><span class="n">ctx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">SessionContext</span><span class="p">::</span><span class="n">new</span><span class="p">();</span>
+<span class="w">    </span><span class="kd">let</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ctx</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"tests/data/example.csv"</span><span class="p">,</span><span class="w"> </span><span class="n">CsvReadOptions</span><span class="p">::</span><span class="n">new</span><span class="p">()) [...]
+<span class="w">    </span><span class="kd">let</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">df</span><span class="p">.</span><span class="n">filter</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">"a"</span><span class="p">).</span><span class="n">lt_eq</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><s [...]
+<span class="w">        </span><span class="p">.</span><span class="n">aggregate</span><span class="p">(</span><span class="fm">vec!</span><span class="p">[</span><span class="n">col</span><span class="p">(</span><span class="s">"a"</span><span class="p">)],</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[</span><span class="n">min</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">"b"</span><span [...]
+<span class="w">        </span><span class="p">.</span><span class="n">limit</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="mi">100</span><span class="p">))</span><span class="o">?</span><span class="p">;</span>
+<span class="w">    </span><span class="c1">// Print results</span>
+<span class="w">    </span><span class="n">df</span><span class="p">.</span><span class="n">show</span><span class="p">().</span><span class="k">await</span><span class="o">?</span><span class="p">;</span>
+<span class="w">    </span><span class="nb">Ok</span><span class="p">(())</span>
+<span class="p">}</span>
</pre></div>
</div>
-<p>The DataFrame API is well documented in the <a class="reference external" href="https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html">API reference on docs.rs</a>.</p>
-<p>Refer to the <a class="reference internal" href="expressions.html"><span class="doc std std-doc">Expressions Reference</span></a> for available functions for building logical expressions for use with the
-DataFrame API.</p>
-<section id="dataframe-transformations">
-<h2>DataFrame Transformations<a class="headerlink" href="#dataframe-transformations" title="Link to this heading">¶</a></h2>
-<p>These methods create a new DataFrame after applying a transformation to the logical plan that the DataFrame represents.</p>
-<p>DataFusion DataFrames use lazy evaluation, meaning that each transformation is just creating a new query plan and
-not actually performing any transformations. This approach allows for the overall plan to be optimized before
-execution. The plan is evaluated (executed) when an action method is invoked, such as <code class="docutils literal notranslate"><span class="pre">collect</span></code>.</p>
-<table class="table">
-<thead>
-<tr class="row-odd"><th class="head"><p>Function</p></th>
-<th class="head"><p>Notes</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="row-even"><td><p>aggregate</p></td>
-<td><p>Perform an aggregate query with optional grouping expressions.</p></td>
-</tr>
-<tr class="row-odd"><td><p>distinct</p></td>
-<td><p>Filter out duplicate rows.</p></td>
-</tr>
-<tr class="row-even"><td><p>distinct_on</p></td>
-<td><p>Filter out duplicate rows based on provided expressions.</p></td>
-</tr>
-<tr class="row-odd"><td><p>drop_columns</p></td>
-<td><p>Create a projection with all but the provided column names.</p></td>
-</tr>
-<tr class="row-even"><td><p>except</p></td>
-<td><p>Calculate the exception of two DataFrames. The two DataFrames must have exactly the same schema</p></td>
-</tr>
-<tr class="row-odd"><td><p>filter</p></td>
-<td><p>Filter a DataFrame to only include rows that match the specified filter expression.</p></td>
-</tr>
-<tr class="row-even"><td><p>intersect</p></td>
-<td><p>Calculate the intersection of two DataFrames. The two DataFrames must have exactly the same schema</p></td>
-</tr>
-<tr class="row-odd"><td><p>join</p></td>
-<td><p>Join this DataFrame with another DataFrame using the specified columns as join keys.</p></td>
-</tr>
-<tr class="row-even"><td><p>join_on</p></td>
-<td><p>Join this DataFrame with another DataFrame using arbitrary expressions.</p></td>
-</tr>
-<tr class="row-odd"><td><p>limit</p></td>
-<td><p>Limit the number of rows returned from this DataFrame.</p></td>
-</tr>
-<tr class="row-even"><td><p>repartition</p></td>
-<td><p>Repartition a DataFrame based on a logical partitioning scheme.</p></td>
-</tr>
-<tr class="row-odd"><td><p>sort</p></td>
-<td><p>Sort the DataFrame by the specified sorting expressions. Any expression can be turned into a sort expression by calling its <code class="docutils literal notranslate"><span class="pre">sort</span></code> method.</p></td>
-</tr>
-<tr class="row-even"><td><p>select</p></td>
-<td><p>Create a projection based on arbitrary expressions. Example: <code class="docutils literal notranslate"><span class="pre">df.select(vec![col("c1"),</span> <span class="pre">abs(col("c2"))])?</span></code></p></td>
-</tr>
-<tr class="row-odd"><td><p>select_columns</p></td>
-<td><p>Create a projection based on column names. Example: <code class="docutils literal notranslate"><span class="pre">df.select_columns(&["id",</span> <span class="pre">"name"])?</span></code>.</p></td>
-</tr>
-<tr class="row-even"><td><p>union</p></td>
-<td><p>Calculate the union of two DataFrames, preserving duplicate rows. The two DataFrames must have exactly the same schema.</p></td>
-</tr>
-<tr class="row-odd"><td><p>union_distinct</p></td>
-<td><p>Calculate the distinct union of two DataFrames. The two DataFrames must have exactly the same schema.</p></td>
-</tr>
-<tr class="row-even"><td><p>with_column</p></td>
-<td><p>Add an additional column to the DataFrame.</p></td>
-</tr>
-<tr class="row-odd"><td><p>with_column_renamed</p></td>
-<td><p>Rename one column by applying a new projection.</p></td>
-</tr>
-</tbody>
-</table>
-</section>
-<section id="dataframe-actions">
-<h2>DataFrame Actions<a class="headerlink" href="#dataframe-actions" title="Link to this heading">¶</a></h2>
-<p>These methods execute the logical plan represented by the DataFrame and either collects the results into memory, prints them to stdout, or writes them to disk.</p>
-<table class="table">
-<thead>
-<tr class="row-odd"><th class="head"><p>Function</p></th>
-<th class="head"><p>Notes</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="row-even"><td><p>collect</p></td>
-<td><p>Executes this DataFrame and collects all results into a vector of RecordBatch.</p></td>
-</tr>
-<tr class="row-odd"><td><p>collect_partitioned</p></td>
-<td><p>Executes this DataFrame and collects all results into a vector of vector of RecordBatch maintaining the input partitioning.</p></td>
-</tr>
-<tr class="row-even"><td><p>count</p></td>
-<td><p>Executes this DataFrame to get the total number of rows.</p></td>
-</tr>
-<tr class="row-odd"><td><p>execute_stream</p></td>
-<td><p>Executes this DataFrame and returns a stream over a single partition.</p></td>
-</tr>
-<tr class="row-even"><td><p>execute_stream_partitioned</p></td>
-<td><p>Executes this DataFrame and returns one stream per partition.</p></td>
-</tr>
-<tr class="row-odd"><td><p>show</p></td>
-<td><p>Execute this DataFrame and print the results to stdout.</p></td>
-</tr>
-<tr class="row-even"><td><p>show_limit</p></td>
-<td><p>Execute this DataFrame and print a subset of results to stdout.</p></td>
-</tr>
-<tr class="row-odd"><td><p>write_csv</p></td>
-<td><p>Execute this DataFrame and write the results to disk in CSV format.</p></td>
-</tr>
-<tr class="row-even"><td><p>write_json</p></td>
-<td><p>Execute this DataFrame and write the results to disk in JSON format.</p></td>
-</tr>
-<tr class="row-odd"><td><p>write_parquet</p></td>
-<td><p>Execute this DataFrame and write the results to disk in Parquet format.</p></td>
-</tr>
-<tr class="row-even"><td><p>write_table</p></td>
-<td><p>Execute this DataFrame and write the results via the insert_into method of the registered TableProvider</p></td>
-</tr>
-</tbody>
-</table>
-</section>
-<section id="other-dataframe-methods">
-<h2>Other DataFrame Methods<a class="headerlink" href="#other-dataframe-methods" title="Link to this heading">¶</a></h2>
-<table class="table">
-<thead>
-<tr class="row-odd"><th class="head"><p>Function</p></th>
-<th class="head"><p>Notes</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="row-even"><td><p>explain</p></td>
-<td><p>Return a DataFrame with the explanation of its plan so far.</p></td>
-</tr>
-<tr class="row-odd"><td><p>registry</p></td>
-<td><p>Return a <code class="docutils literal notranslate"><span class="pre">FunctionRegistry</span></code> used to plan udf’s calls.</p></td>
-</tr>
-<tr class="row-even"><td><p>schema</p></td>
-<td><p>Returns the schema describing the output of this DataFrame in terms of columns returned, where each column has a name, data type, and nullability attribute.</p></td>
-</tr>
-<tr class="row-odd"><td><p>to_logical_plan</p></td>
-<td><p>Return the optimized logical plan represented by this DataFrame.</p></td>
-</tr>
-<tr class="row-even"><td><p>to_unoptimized_plan</p></td>
-<td><p>Return the unoptimized logical plan represented by this DataFrame.</p></td>
-</tr>
-</tbody>
-</table>
</section>
</section>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]