Jefffrey commented on code in PR #17264:
URL: https://github.com/apache/datafusion/pull/17264#discussion_r2289930820
##########
datafusion/expr/src/udf.rs:
##########
@@ -469,18 +469,28 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send +
Sync {
))
}
- /// Returns the function's [`Signature`] for information about what input
- /// types are accepted and the function's Volatility.
+ /// Returns a [`Signature`] describing the argument types for which this
+ /// function has an implementation, and the function's [`Volatility`].
+ ///
+ /// See [`Signature`] for more details on argument type handling
+ /// and [`Self::return_type`] for computing the return type.
+ ///
+ /// [`Volatility`]: datafusion_expr_common::signature::Volatility
fn signature(&self) -> &Signature;
- /// What [`DataType`] will be returned by this function, given the types of
- /// the arguments.
+ /// [`DataType`] returned by this function, given the types of the
+ /// arguments.
+ ///
+ /// # Arguments
+ ///
+ /// `arg_types` Data types of the arguments. These types are guaranteed to
+ /// match one of [`Self::signature`]s.
///
/// # Notes
///
/// If you provide an implementation for [`Self::return_field_from_args`],
/// DataFusion will not call `return_type` (this function). In such cases
- /// is recommended to return [`DataFusionError::Internal`].
+ /// is recommended to return [`DataFusionError::Internal`] from this
function.
///
/// [`DataFusionError::Internal`]:
datafusion_common::DataFusionError::Internal
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
Review Comment:
Maybe not related to this PR, but one part I want to understand is what
checks should be done in this function? e.g. should it check that the input
argument types are valid, like so:
https://github.com/apache/datafusion/blob/d64b7c3b4b3f446c5626f73e59d61933da6656f0/datafusion/functions-nested/src/resize.rs#L126-L137
Or should it take on faith that some other part of the code already checked
the input argument types are valid?
##########
datafusion/expr-common/src/signature.rs:
##########
@@ -44,42 +43,79 @@ pub const TIMEZONE_WILDCARD: &str = "+TZ";
/// valid length. It exists to avoid the need to enumerate all possible fixed
size list lengths.
pub const FIXED_SIZE_LIST_WILDCARD: i32 = i32::MIN;
-/// A function's volatility, which defines the functions eligibility for
certain optimizations
+/// When a function's output changes when the input does not
+///
+/// The volatility of a function determine eligibility for certain
+/// optimizations. You should always defined your function to have the
strictest
+/// volatility that applies to it to maximize performance and avoid unexpected
+/// results.
Review Comment:
```suggestion
/// How a function's output changes with respect to a fixed input
///
/// The volatility of a function determine eligibility for certain
/// optimizations. You should always define your function to have the
strictest
/// possible volatility to maximize performance and avoid unexpected
/// results.
```
##########
datafusion/expr-common/src/signature.rs:
##########
@@ -44,42 +43,79 @@ pub const TIMEZONE_WILDCARD: &str = "+TZ";
/// valid length. It exists to avoid the need to enumerate all possible fixed
size list lengths.
pub const FIXED_SIZE_LIST_WILDCARD: i32 = i32::MIN;
-/// A function's volatility, which defines the functions eligibility for
certain optimizations
+/// When a function's output changes when the input does not
+///
+/// The volatility of a function determine eligibility for certain
+/// optimizations. You should always defined your function to have the
strictest
+/// volatility that applies to it to maximize performance and avoid unexpected
+/// results.
+///
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy, Hash)]
pub enum Volatility {
- /// An immutable function will always return the same output when given
the same
- /// input. DataFusion will attempt to inline immutable functions during
planning.
+ /// Always returns the same output when given the same input.
+ ///
+ /// DataFusion will inline immutable functions during planning.
Immutable,
- /// A stable function may return different values given the same input
across different
- /// queries but must return the same value for a given input within a
query. An example of
- /// this is the `Now` function. DataFusion will attempt to inline `Stable`
functions
- /// during planning, when possible.
- /// For query `select col1, now() from t1`, it might take a while to
execute but
- /// `now()` column will be the same for each output row, which is evaluated
- /// during planning.
+ /// May return different values given the same input across different
+ /// queries but must return the same value for a given input within a
query.
+ ///
+ /// An example of a stable function is the `now()` function. For example,
+ /// the query `select col1, now() from t1`, will return different results
+ /// each time it is run, but within the same query, the output of the
+ /// `now()` function has the same value for each output row.
+ ///
+ /// DataFusion will inline `Stable` functions during planning, when
+ /// possible.
Stable,
- /// A volatile function may change the return value from evaluation to
evaluation.
- /// Multiple invocations of a volatile function may return different
results when used in the
- /// same query. An example of this is the random() function. DataFusion
- /// can not evaluate such functions during planning.
- /// In the query `select col1, random() from t1`, `random()` function will
be evaluated
- /// for each output row, resulting in a unique random value for each row.
+ /// May change the return value from evaluation to evaluation.
+ ///
+ /// Multiple invocations of a volatile function may return different
results
+ /// when used in the same query on different rows. An example of this is
the
+ /// `random()` function.
+ ///
+ /// DataFusion can not evaluate such functions during planning or push
these
+ /// predicates into scans. In the query `select col1, random() from t1`,
+ /// `random()` function will be evaluated for each output row, resulting in
+ /// a unique random value for each row.
Volatile,
}
-/// A function's type signature defines the types of arguments the function
supports.
+/// The types of arguments for which a function has implementations.
+///
+/// Functions typically provide implementations for a small number of different
+/// argument [`DataType`]s, rather than all possible combinations. If a user
+/// calls a function with arguments that do not match any of the declared
types,
+/// DataFusion will attempt to automatically coerces (add casts to) function
Review Comment:
```suggestion
/// DataFusion will attempt to automatically coerce (add casts to) function
```
##########
datafusion/expr/src/udf.rs:
##########
@@ -469,18 +469,28 @@ pub trait ScalarUDFImpl: Debug + DynEq + DynHash + Send +
Sync {
))
}
- /// Returns the function's [`Signature`] for information about what input
- /// types are accepted and the function's Volatility.
+ /// Returns a [`Signature`] describing the argument types for which this
+ /// function has an implementation, and the function's [`Volatility`].
+ ///
+ /// See [`Signature`] for more details on argument type handling
+ /// and [`Self::return_type`] for computing the return type.
+ ///
+ /// [`Volatility`]: datafusion_expr_common::signature::Volatility
fn signature(&self) -> &Signature;
- /// What [`DataType`] will be returned by this function, given the types of
- /// the arguments.
+ /// [`DataType`] returned by this function, given the types of the
+ /// arguments.
+ ///
+ /// # Arguments
+ ///
+ /// `arg_types` Data types of the arguments. These types are guaranteed to
+ /// match one of [`Self::signature`]s.
///
/// # Notes
///
/// If you provide an implementation for [`Self::return_field_from_args`],
/// DataFusion will not call `return_type` (this function). In such cases
- /// is recommended to return [`DataFusionError::Internal`].
+ /// is recommended to return [`DataFusionError::Internal`] from this
function.
Review Comment:
In this case, is it valid to suggest putting `unimplemented!()` or
`unreachable!()` given this function isn't called anyway? Or safer to have
error return even though it would never be returned anyway?
##########
datafusion/expr-common/src/signature.rs:
##########
@@ -44,42 +43,79 @@ pub const TIMEZONE_WILDCARD: &str = "+TZ";
/// valid length. It exists to avoid the need to enumerate all possible fixed
size list lengths.
pub const FIXED_SIZE_LIST_WILDCARD: i32 = i32::MIN;
-/// A function's volatility, which defines the functions eligibility for
certain optimizations
+/// When a function's output changes when the input does not
+///
+/// The volatility of a function determine eligibility for certain
+/// optimizations. You should always defined your function to have the
strictest
+/// volatility that applies to it to maximize performance and avoid unexpected
+/// results.
+///
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy, Hash)]
pub enum Volatility {
- /// An immutable function will always return the same output when given
the same
- /// input. DataFusion will attempt to inline immutable functions during
planning.
+ /// Always returns the same output when given the same input.
+ ///
+ /// DataFusion will inline immutable functions during planning.
Review Comment:
Silly question, but what does inlining mean in this context?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]