This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-python.git
The following commit(s) were added to refs/heads/main by this push:
new 7fd0c96 Add document about basics of working with expressions (#668)
7fd0c96 is described below
commit 7fd0c96b6f59c750e7dd59f92beed7d57d371f6a
Author: Tim Saucer <[email protected]>
AuthorDate: Thu May 9 12:01:06 2024 -0400
Add document about basics of working with expressions (#668)
---
.../user-guide/common-operations/expressions.rst | 94 ++++++++++++++++++++++
docs/source/user-guide/common-operations/index.rst | 1 +
2 files changed, 95 insertions(+)
diff --git a/docs/source/user-guide/common-operations/expressions.rst b/docs/source/user-guide/common-operations/expressions.rst
new file mode 100644
index 0000000..ebb514f
--- /dev/null
+++ b/docs/source/user-guide/common-operations/expressions.rst
@@ -0,0 +1,94 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Expressions
+===========
+
+In DataFusion an expression is an abstraction that represents a computation.
+Expressions are used as the primary inputs and outputs for most functions within
+DataFusion. As such, expressions can be combined to create expression trees, a
+concept shared across most compilers and databases.
+
+Column
+------
+
+The first expression most new users will interact with is the Column, which is created by calling :func:`col`.
+This expression represents a column within a DataFrame. The function :func:`col` takes a string as its input
+and returns an expression as its output.
+
+Literal
+-------
+
+Literal expressions represent a single value. These are helpful in a wide range of operations where
+a specific, known value is of interest. You can create a literal expression using the function :func:`lit`.
+The type of the object passed to the :func:`lit` function will be used to convert it to a known data type.
+
+In the following example we create expressions for the column named `color` and the literal scalar string `red`.
+The resultant variable `red_units` is itself also an expression.
+
+.. ipython:: python
+
+    red_units = col("color") == lit("red")
+
+Boolean
+-------
+
+Expressions that evaluate to a boolean value can be combined using boolean operators.
+Note that to combine these expressions you *must* use the bitwise operators ``&``, ``|``, and ``~``,
+not the Python keywords ``and``, ``or``, and ``not``. See the following examples for the and, or, and not operations.
+
+
+.. ipython:: python
+
+    red_or_green_units = (col("color") == lit("red")) | (col("color") == lit("green"))
+    heavy_red_units = (col("color") == lit("red")) & (col("weight") > lit(42))
+    not_red_units = ~(col("color") == lit("red"))
+
+Functions
+---------
+
+As mentioned before, most functions in DataFusion return an expression as their output. This allows us to create
+a wide variety of expressions built up from other expressions. For example, :func:`.alias` is a function that takes
+a single expression as its input and returns an expression in which the name of the expression has changed.
+
+The following example shows a series of expressions that are built up from functions operating on expressions.
+
+.. ipython:: python
+
+    from datafusion import SessionContext
+    from datafusion import col, lit
+    from datafusion import functions as f
+    import random
+
+    ctx = SessionContext()
+    df = ctx.from_pydict(
+        {
+            "name": ["Albert", "Becca", "Carlos", "Dante"],
+            "age": [42, 67, 27, 71],
+            "years_in_position": [13, 21, 10, 54],
+        },
+        name="employees"
+    )
+
+    age_col = col("age")
+    renamed_age = age_col.alias("age_in_years")
+    start_age = age_col - col("years_in_position")
+    started_young = start_age < lit(18)
+    can_retire = age_col > lit(65)
+    long_timer = started_young & can_retire
+
+    df.filter(long_timer).select(col("name"), renamed_age, col("years_in_position"))
diff --git a/docs/source/user-guide/common-operations/index.rst b/docs/source/user-guide/common-operations/index.rst
index 950afb9..b15b04c 100644
--- a/docs/source/user-guide/common-operations/index.rst
+++ b/docs/source/user-guide/common-operations/index.rst
@@ -23,6 +23,7 @@ Common Operations
basic-info
select-and-filter
+ expressions
joins
functions
aggregations