This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new 60904e4f1a Deduplicate range/gen_series nested functions code (#18198)
60904e4f1a is described below

commit 60904e4f1a7b5a95950b2b5574d54216a88a8e5d
Author: Jeffrey Vo <[email protected]>
AuthorDate: Tue Oct 28 13:17:54 2025 +1100

    Deduplicate range/gen_series nested functions code (#18198)
    
    ## Which issue does this PR close?
    
    <!--
    We generally require a GitHub issue to be filed for all bug fixes and
    enhancements and this helps us generate change logs for our releases.
    You can link an issue to this PR using the GitHub syntax. For example
    `Closes #123` indicates that this PR will close issue #123.
    -->
    
    - Doing some prework for #15881
    
    ## Rationale for this change
    
    <!--
    Why are you proposing this change? If this is already explained clearly
    in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand
    your changes and offer better suggestions for fixes.
    -->
    
    `Range` and `GenSeries` are essentially the same except for whether they
    include upper bounds or not; unify their function code to reduce
    duplication, making future changes easier.
    
    ## What changes are included in this PR?
    
    <!--
    There is no need to duplicate the description in the issue here but it
    is sometimes worth providing a summary of the individual changes in this
    PR.
    -->
    
    Remove `GenSeries` struct, folding it into `Range`. Do some more minor
    refactoring to their code.
    
    ## Are these changes tested?
    
    <!--
    We typically require tests for all PRs in order to:
    1. Prevent the code from being accidentally broken by subsequent changes
    2. Serve as another way to document the expected behavior of the code
    
    If tests are not included in your PR, please explain why (for example,
    are they covered by existing tests)?
    -->
    
    Existing tests (updated some error messages).
    
    ## Are there any user-facing changes?
    
    <!--
    If there are user-facing changes then we may require documentation to be
    updated before approving the PR.
    -->
    
    Not really (updated some error messages).
    
    <!--
    If there are any breaking changes to public APIs, please add the `api
    change` label.
    -->
---
 datafusion/functions-nested/src/macros.rs      |  25 +-
 datafusion/functions-nested/src/range.rs       | 705 +++++++++++--------------
 datafusion/sqllogictest/test_files/array.slt   |  10 +-
 docs/source/user-guide/sql/scalar_functions.md |  14 +-
 4 files changed, 334 insertions(+), 420 deletions(-)

diff --git a/datafusion/functions-nested/src/macros.rs 
b/datafusion/functions-nested/src/macros.rs
index cec7f2fd56..5380f6b127 100644
--- a/datafusion/functions-nested/src/macros.rs
+++ b/datafusion/functions-nested/src/macros.rs
@@ -41,10 +41,15 @@
 /// * `arg`: 0 or more named arguments for the function
 /// * `DOC`: documentation string for the function
 /// * `SCALAR_UDF_FUNC`: name of the function to create (just) the `ScalarUDF`
+/// * (optional) `$CTOR`: Pass a custom constructor. When omitted it
+///   automatically resolves to `$UDF::new()`.
 ///
 /// [`ScalarUDFImpl`]: datafusion_expr::ScalarUDFImpl
 macro_rules! make_udf_expr_and_func {
-    ($UDF:ty, $EXPR_FN:ident, $($arg:ident)*, $DOC:expr , 
$SCALAR_UDF_FN:ident) => {
+    ($UDF:ident, $EXPR_FN:ident, $($arg:ident)*, $DOC:expr, 
$SCALAR_UDF_FN:ident) => {
+        make_udf_expr_and_func!($UDF, $EXPR_FN, $($arg)*, $DOC, 
$SCALAR_UDF_FN, $UDF::new);
+    };
+    ($UDF:ident, $EXPR_FN:ident, $($arg:ident)*, $DOC:expr, 
$SCALAR_UDF_FN:ident, $CTOR:path) => {
         paste::paste! {
             // "fluent expr_fn" style function
             #[doc = $DOC]
@@ -54,10 +59,13 @@ macro_rules! make_udf_expr_and_func {
                     vec![$($arg),*],
                 ))
             }
-            create_func!($UDF, $SCALAR_UDF_FN);
+            create_func!($UDF, $SCALAR_UDF_FN, $CTOR);
         }
     };
-    ($UDF:ty, $EXPR_FN:ident, $DOC:expr , $SCALAR_UDF_FN:ident) => {
+    ($UDF:ident, $EXPR_FN:ident, $DOC:expr, $SCALAR_UDF_FN:ident) => {
+        make_udf_expr_and_func!($UDF, $EXPR_FN, $DOC, $SCALAR_UDF_FN, 
$UDF::new);
+    };
+    ($UDF:ident, $EXPR_FN:ident, $DOC:expr, $SCALAR_UDF_FN:ident, $CTOR:path) 
=> {
         paste::paste! {
             // "fluent expr_fn" style function
             #[doc = $DOC]
@@ -67,7 +75,7 @@ macro_rules! make_udf_expr_and_func {
                     arg,
                 ))
             }
-            create_func!($UDF, $SCALAR_UDF_FN);
+            create_func!($UDF, $SCALAR_UDF_FN, $CTOR);
         }
     };
 }
@@ -80,10 +88,15 @@ macro_rules! make_udf_expr_and_func {
 /// # Arguments
 /// * `UDF`: name of the [`ScalarUDFImpl`]
 /// * `SCALAR_UDF_FUNC`: name of the function to create (just) the `ScalarUDF`
+/// * (optional) `$CTOR`: Pass a custom constructor. When omitted it
+///   automatically resolves to `$UDF::new()`.
 ///
 /// [`ScalarUDFImpl`]: datafusion_expr::ScalarUDFImpl
 macro_rules! create_func {
-    ($UDF:ty, $SCALAR_UDF_FN:ident) => {
+    ($UDF:ident, $SCALAR_UDF_FN:ident) => {
+        create_func!($UDF, $SCALAR_UDF_FN, $UDF::new);
+    };
+    ($UDF:ident, $SCALAR_UDF_FN:ident, $CTOR:path) => {
         paste::paste! {
             #[doc = concat!("ScalarFunction that returns a 
[`ScalarUDF`](datafusion_expr::ScalarUDF) for ")]
             #[doc = stringify!($UDF)]
@@ -92,7 +105,7 @@ macro_rules! create_func {
                 static INSTANCE: 
std::sync::LazyLock<std::sync::Arc<datafusion_expr::ScalarUDF>> =
                     std::sync::LazyLock::new(|| {
                         
std::sync::Arc::new(datafusion_expr::ScalarUDF::new_from_impl(
-                            <$UDF>::new(),
+                            $CTOR(),
                         ))
                     });
                 std::sync::Arc::clone(&INSTANCE)
diff --git a/datafusion/functions-nested/src/range.rs 
b/datafusion/functions-nested/src/range.rs
index 619b0e84c1..01c6e9c43f 100644
--- a/datafusion/functions-nested/src/range.rs
+++ b/datafusion/functions-nested/src/range.rs
@@ -22,20 +22,23 @@ use arrow::array::{
     builder::{Date32Builder, TimestampNanosecondBuilder},
     temporal_conversions::as_datetime_with_timezone,
     timezone::Tz,
-    types::{Date32Type, IntervalMonthDayNanoType, TimestampNanosecondType as 
TSNT},
-    Array, ArrayRef, Int64Array, ListArray, ListBuilder, NullArray, 
NullBufferBuilder,
-    TimestampNanosecondArray,
+    types::{Date32Type, IntervalMonthDayNanoType, TimestampNanosecondType},
+    Array, ArrayRef, Int64Array, ListArray, ListBuilder, NullBufferBuilder,
 };
 use arrow::buffer::OffsetBuffer;
 use arrow::datatypes::{
     DataType, DataType::*, Field, IntervalUnit::MonthDayNano, 
TimeUnit::Nanosecond,
 };
-use datafusion_common::cast::{
-    as_date32_array, as_int64_array, as_interval_mdn_array, 
as_timestamp_nanosecond_array,
+use datafusion_common::{
+    cast::{
+        as_date32_array, as_int64_array, as_interval_mdn_array,
+        as_timestamp_nanosecond_array,
+    },
+    DataFusionError, ScalarValue,
 };
 use datafusion_common::{
-    exec_datafusion_err, exec_err, internal_err, not_impl_datafusion_err,
-    utils::take_function_args, Result,
+    exec_datafusion_err, exec_err, not_impl_datafusion_err, 
utils::take_function_args,
+    Result,
 };
 use datafusion_expr::{
     ColumnarValue, Documentation, ScalarUDFImpl, Signature, Volatility,
@@ -53,13 +56,24 @@ make_udf_expr_and_func!(
     range,
     start stop step,
     "create a list of values in the range between start and stop",
-    range_udf
+    range_udf,
+    Range::new
+);
+
+make_udf_expr_and_func!(
+    GenSeries,
+    gen_series,
+    start stop step,
+    "create a list of values in the range between start and stop, include 
upper bound",
+    gen_series_udf,
+    Range::generate_series
 );
 
 #[user_doc(
     doc_section(label = "Array Functions"),
     description = "Returns an Arrow array between start and stop with step. 
The range start..end contains all values with start <= x < end. It is empty if 
start >= end. Step cannot be 0.",
-    syntax_example = "range(start, stop, step)",
+    syntax_example = "range(stop)
+range(start, stop[, step])",
     sql_example = r#"```sql
 > select range(2, 10, 3);
 +-----------------------------------+
@@ -69,11 +83,11 @@ make_udf_expr_and_func!(
 +-----------------------------------+
 
 > select range(DATE '1992-09-01', DATE '1993-03-01', INTERVAL '1' MONTH);
-+--------------------------------------------------------------+
-| range(DATE '1992-09-01', DATE '1993-03-01', INTERVAL '1' MONTH) |
-+--------------------------------------------------------------+
++--------------------------------------------------------------------------+
+| range(DATE '1992-09-01', DATE '1993-03-01', INTERVAL '1' MONTH)          |
++--------------------------------------------------------------------------+
 | [1992-09-01, 1992-10-01, 1992-11-01, 1992-12-01, 1993-01-01, 1993-02-01] |
-+--------------------------------------------------------------+
++--------------------------------------------------------------------------+
 ```"#,
     argument(
         name = "start",
@@ -88,115 +102,13 @@ make_udf_expr_and_func!(
         description = "Increase by step (cannot be 0). Steps less than a day 
are supported only for timestamp ranges."
     )
 )]
-#[derive(Debug, PartialEq, Eq, Hash)]
-pub struct Range {
-    signature: Signature,
-    aliases: Vec<String>,
-}
-
-impl Default for Range {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-impl Range {
-    pub fn new() -> Self {
-        Self {
-            signature: Signature::user_defined(Volatility::Immutable),
-            aliases: vec![],
-        }
-    }
-}
-impl ScalarUDFImpl for Range {
-    fn as_any(&self) -> &dyn Any {
-        self
-    }
-    fn name(&self) -> &str {
-        "range"
-    }
-
-    fn signature(&self) -> &Signature {
-        &self.signature
-    }
-
-    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
-        arg_types
-            .iter()
-            .map(|arg_type| match arg_type {
-                Null => Ok(Null),
-                Int8 => Ok(Int64),
-                Int16 => Ok(Int64),
-                Int32 => Ok(Int64),
-                Int64 => Ok(Int64),
-                UInt8 => Ok(Int64),
-                UInt16 => Ok(Int64),
-                UInt32 => Ok(Int64),
-                UInt64 => Ok(Int64),
-                Timestamp(_, tz) => Ok(Timestamp(Nanosecond, tz.clone())),
-                Date32 => Ok(Date32),
-                Date64 => Ok(Date32),
-                Utf8 => Ok(Date32),
-                LargeUtf8 => Ok(Date32),
-                Utf8View => Ok(Date32),
-                Interval(_) => Ok(Interval(MonthDayNano)),
-                _ => exec_err!("Unsupported DataType"),
-            })
-            .try_collect()
-    }
-
-    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-        if arg_types.iter().any(|t| t.is_null()) {
-            Ok(Null)
-        } else {
-            Ok(List(Arc::new(Field::new_list_field(
-                arg_types[0].clone(),
-                true,
-            ))))
-        }
-    }
-
-    fn invoke_with_args(
-        &self,
-        args: datafusion_expr::ScalarFunctionArgs,
-    ) -> Result<ColumnarValue> {
-        let args = &args.args;
-
-        if args.iter().any(|arg| arg.data_type().is_null()) {
-            return Ok(ColumnarValue::Array(Arc::new(NullArray::new(1))));
-        }
-        match args[0].data_type() {
-            Int64 => make_scalar_function(|args| gen_range_inner(args, 
false))(args),
-            Date32 => make_scalar_function(|args| gen_range_date(args, 
false))(args),
-            Timestamp(_, _) => {
-                make_scalar_function(|args| gen_range_timestamp(args, 
false))(args)
-            }
-            dt => {
-                exec_err!("unsupported type for RANGE. Expected Int64, Date32 
or Timestamp, got: {dt}")
-            }
-        }
-    }
-
-    fn aliases(&self) -> &[String] {
-        &self.aliases
-    }
-
-    fn documentation(&self) -> Option<&Documentation> {
-        self.doc()
-    }
-}
-
-make_udf_expr_and_func!(
-    GenSeries,
-    gen_series,
-    start stop step,
-    "create a list of values in the range between start and stop, include 
upper bound",
-    gen_series_udf
-);
+struct RangeDoc {}
 
 #[user_doc(
     doc_section(label = "Array Functions"),
     description = "Similar to the range function, but it includes the upper 
bound.",
-    syntax_example = "generate_series(start, stop, step)",
+    syntax_example = "generate_series(stop)
+generate_series(start, stop[, step])",
     sql_example = r#"```sql
 > select generate_series(1,3);
 +------------------------------------+
@@ -218,25 +130,50 @@ make_udf_expr_and_func!(
         description = "Increase by step (can not be 0). Steps less than a day 
are supported only for timestamp ranges."
     )
 )]
+struct GenerateSeriesDoc {}
+
 #[derive(Debug, PartialEq, Eq, Hash)]
-pub(super) struct GenSeries {
+pub struct Range {
     signature: Signature,
-    aliases: Vec<String>,
+    /// `false` for range, `true` for generate_series
+    include_upper_bound: bool,
+}
+
+impl Default for Range {
+    fn default() -> Self {
+        Self::new()
+    }
 }
-impl GenSeries {
+
+impl Range {
+    /// Generate `range()` function which excludes upper bound.
     pub fn new() -> Self {
         Self {
             signature: Signature::user_defined(Volatility::Immutable),
-            aliases: vec![],
+            include_upper_bound: false,
+        }
+    }
+
+    /// Generate `generate_series()` function which includes upper bound.
+    fn generate_series() -> Self {
+        Self {
+            signature: Signature::user_defined(Volatility::Immutable),
+            include_upper_bound: true,
         }
     }
 }
-impl ScalarUDFImpl for GenSeries {
+
+impl ScalarUDFImpl for Range {
     fn as_any(&self) -> &dyn Any {
         self
     }
+
     fn name(&self) -> &str {
-        "generate_series"
+        if self.include_upper_bound {
+            "generate_series"
+        } else {
+            "range"
+        }
     }
 
     fn signature(&self) -> &Signature {
@@ -286,107 +223,263 @@ impl ScalarUDFImpl for GenSeries {
         let args = &args.args;
 
         if args.iter().any(|arg| arg.data_type().is_null()) {
-            return Ok(ColumnarValue::Array(Arc::new(NullArray::new(1))));
+            return Ok(ColumnarValue::Scalar(ScalarValue::Null));
         }
         match args[0].data_type() {
-            Int64 => make_scalar_function(|args| gen_range_inner(args, 
true))(args),
-            Date32 => make_scalar_function(|args| gen_range_date(args, 
true))(args),
+            Int64 => make_scalar_function(|args| 
self.gen_range_inner(args))(args),
+            Date32 => make_scalar_function(|args| 
self.gen_range_date(args))(args),
             Timestamp(_, _) => {
-                make_scalar_function(|args| gen_range_timestamp(args, 
true))(args)
+                make_scalar_function(|args| 
self.gen_range_timestamp(args))(args)
             }
             dt => {
-                exec_err!(
-                    "unsupported type for GENERATE_SERIES. Expected Int64, 
Date32 or Timestamp, got: {}",
-                    dt
-                )
+                exec_err!("unsupported type for {}. Expected Int64, Date32 or 
Timestamp, got: {dt}", self.name())
             }
         }
     }
 
-    fn aliases(&self) -> &[String] {
-        &self.aliases
-    }
-
     fn documentation(&self) -> Option<&Documentation> {
-        self.doc()
+        if self.include_upper_bound {
+            GenerateSeriesDoc {}.doc()
+        } else {
+            RangeDoc {}.doc()
+        }
     }
 }
 
-/// Generates an array of integers from start to stop with a given step.
-///
-/// This function takes 1 to 3 ArrayRefs as arguments, representing start, 
stop, and step values.
-/// It returns a `Result<ArrayRef>` representing the resulting ListArray after 
the operation.
-///
-/// # Arguments
-///
-/// * `args` - An array of 1 to 3 ArrayRefs representing start, stop, and 
step(step value can not be zero.) values.
-///
-/// # Examples
-///
-/// gen_range(3) => [0, 1, 2]
-/// gen_range(1, 4) => [1, 2, 3]
-/// gen_range(1, 7, 2) => [1, 3, 5]
-pub(super) fn gen_range_inner(
-    args: &[ArrayRef],
-    include_upper: bool,
-) -> Result<ArrayRef> {
-    let (start_array, stop_array, step_array) = match args.len() {
-        1 => (None, as_int64_array(&args[0])?, None),
-        2 => (
-            Some(as_int64_array(&args[0])?),
-            as_int64_array(&args[1])?,
-            None,
-        ),
-        3 => (
-            Some(as_int64_array(&args[0])?),
-            as_int64_array(&args[1])?,
-            Some(as_int64_array(&args[2])?),
-        ),
-        _ => return exec_err!("gen_range expects 1 to 3 arguments"),
-    };
-
-    let mut values = vec![];
-    let mut offsets = vec![0];
-    let mut valid = NullBufferBuilder::new(stop_array.len());
-    for (idx, stop) in stop_array.iter().enumerate() {
-        match retrieve_range_args(start_array, stop, step_array, idx) {
-            Some((_, _, 0)) => {
-                return exec_err!(
-                    "step can't be 0 for function {}(start [, stop, step])",
-                    if include_upper {
-                        "generate_series"
-                    } else {
-                        "range"
-                    }
-                );
+impl Range {
+    /// Generates an array of integers from start to stop with a given step.
+    ///
+    /// This function takes 1 to 3 ArrayRefs as arguments, representing start, 
stop, and step values.
+    /// It returns a `Result<ArrayRef>` representing the resulting ListArray 
after the operation.
+    ///
+    /// # Arguments
+    ///
+    /// * `args` - An array of 1 to 3 ArrayRefs representing start, stop, and 
step(step value can not be zero.) values.
+    ///
+    /// # Examples
+    ///
+    /// gen_range(3) => [0, 1, 2]
+    /// gen_range(1, 4) => [1, 2, 3]
+    /// gen_range(1, 7, 2) => [1, 3, 5]
+    fn gen_range_inner(&self, args: &[ArrayRef]) -> Result<ArrayRef> {
+        let (start_array, stop_array, step_array) = match args {
+            [stop_array] => (None, as_int64_array(stop_array)?, None),
+            [start_array, stop_array] => (
+                Some(as_int64_array(start_array)?),
+                as_int64_array(stop_array)?,
+                None,
+            ),
+            [start_array, stop_array, step_array] => (
+                Some(as_int64_array(start_array)?),
+                as_int64_array(stop_array)?,
+                Some(as_int64_array(step_array)?),
+            ),
+            _ => return exec_err!("{} expects 1 to 3 arguments", self.name()),
+        };
+
+        let mut values = vec![];
+        let mut offsets = vec![0];
+        let mut valid = NullBufferBuilder::new(stop_array.len());
+        for (idx, stop) in stop_array.iter().enumerate() {
+            match retrieve_range_args(start_array, stop, step_array, idx) {
+                Some((_, _, 0)) => {
+                    return exec_err!(
+                        "step can't be 0 for function {}(start [, stop, 
step])",
+                        self.name()
+                    );
+                }
+                Some((start, stop, step)) => {
+                    // Below, we utilize `usize` to represent steps.
+                    // On 32-bit targets, the absolute value of `i64` may fail 
to fit into `usize`.
+                    let step_abs =
+                        usize::try_from(step.unsigned_abs()).map_err(|_| {
+                            not_impl_datafusion_err!("step {} can't fit into 
usize", step)
+                        })?;
+                    values.extend(
+                        gen_range_iter(start, stop, step < 0, 
self.include_upper_bound)
+                            .step_by(step_abs),
+                    );
+                    offsets.push(values.len() as i32);
+                    valid.append_non_null();
+                }
+                // If any of the arguments is NULL, append a NULL value to the 
result.
+                None => {
+                    offsets.push(values.len() as i32);
+                    valid.append_null();
+                }
+            };
+        }
+        let arr = Arc::new(ListArray::try_new(
+            Arc::new(Field::new_list_field(Int64, true)),
+            OffsetBuffer::new(offsets.into()),
+            Arc::new(Int64Array::from(values)),
+            valid.finish(),
+        )?);
+        Ok(arr)
+    }
+
+    fn gen_range_date(&self, args: &[ArrayRef]) -> Result<ArrayRef> {
+        let [start, stop, step] = take_function_args(self.name(), args)?;
+
+        let (start_array, stop_array, step_array) = (
+            as_date32_array(start)?,
+            as_date32_array(stop)?,
+            as_interval_mdn_array(step)?,
+        );
+
+        // values are date32s
+        let values_builder = Date32Builder::new();
+        let mut list_builder = ListBuilder::new(values_builder);
+
+        for idx in 0..stop_array.len() {
+            if start_array.is_null(idx)
+                || stop_array.is_null(idx)
+                || step_array.is_null(idx)
+            {
+                list_builder.append_null();
+                continue;
             }
-            Some((start, stop, step)) => {
-                // Below, we utilize `usize` to represent steps.
-                // On 32-bit targets, the absolute value of `i64` may fail to 
fit into `usize`.
-                let step_abs = 
usize::try_from(step.unsigned_abs()).map_err(|_| {
-                    not_impl_datafusion_err!("step {} can't fit into usize", 
step)
-                })?;
-                values.extend(
-                    gen_range_iter(start, stop, step < 0, include_upper)
-                        .step_by(step_abs),
-                );
-                offsets.push(values.len() as i32);
-                valid.append_non_null();
+
+            let start = start_array.value(idx);
+            let stop = stop_array.value(idx);
+            let step = step_array.value(idx);
+
+            let (months, days, _) = IntervalMonthDayNanoType::to_parts(step);
+            if months == 0 && days == 0 {
+                return exec_err!("Cannot generate date range less than 1 
day.");
+            }
+
+            let stop = if !self.include_upper_bound {
+                Date32Type::subtract_month_day_nano(stop, step)
+            } else {
+                stop
+            };
+
+            let neg = months < 0 || days < 0;
+            let mut new_date = start;
+
+            let values = from_fn(|| {
+                if (neg && new_date < stop) || (!neg && new_date > stop) {
+                    None
+                } else {
+                    let current_date = new_date;
+                    new_date = Date32Type::add_month_day_nano(new_date, step);
+                    Some(Some(current_date))
+                }
+            });
+
+            list_builder.append_value(values);
+        }
+
+        let arr = Arc::new(list_builder.finish());
+
+        Ok(arr)
+    }
+
+    fn gen_range_timestamp(&self, args: &[ArrayRef]) -> Result<ArrayRef> {
+        let [start, stop, step] = take_function_args(self.name(), args)?;
+
+        // coerce_types fn should coerce all types to Timestamp(Nanosecond, tz)
+        // TODO: remove these map_err once the signature is robust enough to 
guard against this
+        let start_arr = as_timestamp_nanosecond_array(start).map_err(|_e| {
+            DataFusionError::Internal(format!(
+                "Unexpected argument type for {} : {}",
+                self.name(),
+                start.data_type()
+            ))
+        })?;
+        let stop_arr = as_timestamp_nanosecond_array(stop).map_err(|_e| {
+            DataFusionError::Internal(format!(
+                "Unexpected argument type for {} : {}",
+                self.name(),
+                stop.data_type()
+            ))
+        })?;
+        let step_arr = as_interval_mdn_array(step)?;
+        let start_tz = parse_tz(&start_arr.timezone())?;
+        let stop_tz = parse_tz(&stop_arr.timezone())?;
+
+        // values are timestamps
+        let values_builder = start_arr
+            .timezone()
+            .map_or_else(TimestampNanosecondBuilder::new, |start_tz_str| {
+                TimestampNanosecondBuilder::new().with_timezone(start_tz_str)
+            });
+        let mut list_builder = ListBuilder::new(values_builder);
+
+        for idx in 0..start_arr.len() {
+            if start_arr.is_null(idx) || stop_arr.is_null(idx) || 
step_arr.is_null(idx) {
+                list_builder.append_null();
+                continue;
             }
-            // If any of the arguments is NULL, append a NULL value to the 
result.
-            None => {
-                offsets.push(values.len() as i32);
-                valid.append_null();
+
+            let start = start_arr.value(idx);
+            let stop = stop_arr.value(idx);
+            let step = step_arr.value(idx);
+
+            let (months, days, ns) = IntervalMonthDayNanoType::to_parts(step);
+            if months == 0 && days == 0 && ns == 0 {
+                return exec_err!("Interval argument to {} must not be 0", 
self.name());
             }
-        };
+
+            let neg = TimestampNanosecondType::add_month_day_nano(start, step, 
start_tz)
+                .ok_or(exec_datafusion_err!(
+                    "Cannot generate timestamp range where start + step 
overflows"
+                ))?
+                .cmp(&start)
+                == Ordering::Less;
+
+            let stop_dt =
+                as_datetime_with_timezone::<TimestampNanosecondType>(stop, 
stop_tz)
+                    .ok_or(exec_datafusion_err!(
+                        "Cannot generate timestamp for stop: {}: {:?}",
+                        stop,
+                        stop_tz
+                    ))?;
+
+            let mut current = start;
+            let mut current_dt =
+                as_datetime_with_timezone::<TimestampNanosecondType>(current, 
start_tz)
+                    .ok_or(exec_datafusion_err!(
+                    "Cannot generate timestamp for start: {}: {:?}",
+                    current,
+                    start_tz
+                ))?;
+
+            let values = from_fn(|| {
+                let generate_series_should_end = self.include_upper_bound
+                    && ((neg && current_dt < stop_dt) || (!neg && current_dt > 
stop_dt));
+                let range_should_end = !self.include_upper_bound
+                    && ((neg && current_dt <= stop_dt)
+                        || (!neg && current_dt >= stop_dt));
+                if generate_series_should_end || range_should_end {
+                    return None;
+                }
+
+                let prev_current = current;
+
+                if let Some(ts) =
+                    TimestampNanosecondType::add_month_day_nano(current, step, 
start_tz)
+                {
+                    current = ts;
+                    current_dt = 
as_datetime_with_timezone::<TimestampNanosecondType>(
+                        current, start_tz,
+                    )?;
+
+                    Some(Some(prev_current))
+                } else {
+                    // we failed to parse the timestamp here so terminate the 
series
+                    None
+                }
+            });
+
+            list_builder.append_value(values);
+        }
+
+        let arr = Arc::new(list_builder.finish());
+
+        Ok(arr)
     }
-    let arr = Arc::new(ListArray::try_new(
-        Arc::new(Field::new_list_field(Int64, true)),
-        OffsetBuffer::new(offsets.into()),
-        Arc::new(Int64Array::from(values)),
-        valid.finish(),
-    )?);
-    Ok(arr)
 }
 
 /// Get the (start, stop, step) args for the range and generate_series 
function.
@@ -436,201 +529,7 @@ fn gen_range_iter(
     }
 }
 
-fn gen_range_date(args: &[ArrayRef], include_upper_bound: bool) -> 
Result<ArrayRef> {
-    let [start, stop, step] = take_function_args("range", args)?;
-
-    let (start_array, stop_array, step_array) = (
-        Some(as_date32_array(start)?),
-        as_date32_array(stop)?,
-        Some(as_interval_mdn_array(step)?),
-    );
-
-    // values are date32s
-    let values_builder = Date32Builder::new();
-    let mut list_builder = ListBuilder::new(values_builder);
-
-    for idx in 0..stop_array.len() {
-        if stop_array.is_null(idx) {
-            list_builder.append_null();
-            continue;
-        }
-        let mut stop = stop_array.value(idx);
-
-        let start = if let Some(start_array_values) = start_array {
-            if start_array_values.is_null(idx) {
-                list_builder.append_null();
-                continue;
-            }
-            start_array_values.value(idx)
-        } else {
-            list_builder.append_null();
-            continue;
-        };
-
-        let step = if let Some(step) = step_array {
-            if step.is_null(idx) {
-                list_builder.append_null();
-                continue;
-            }
-            step.value(idx)
-        } else {
-            list_builder.append_null();
-            continue;
-        };
-
-        let (months, days, _) = IntervalMonthDayNanoType::to_parts(step);
-
-        if months == 0 && days == 0 {
-            return exec_err!("Cannot generate date range less than 1 day.");
-        }
-
-        let neg = months < 0 || days < 0;
-        if !include_upper_bound {
-            stop = Date32Type::subtract_month_day_nano(stop, step);
-        }
-        let mut new_date = start;
-
-        let values = from_fn(|| {
-            if (neg && new_date < stop) || (!neg && new_date > stop) {
-                None
-            } else {
-                let current_date = new_date;
-                new_date = Date32Type::add_month_day_nano(new_date, step);
-                Some(Some(current_date))
-            }
-        });
-
-        list_builder.append_value(values);
-    }
-
-    let arr = Arc::new(list_builder.finish());
-
-    Ok(arr)
-}
-
-fn gen_range_timestamp(args: &[ArrayRef], include_upper_bound: bool) -> 
Result<ArrayRef> {
-    let func_name = if include_upper_bound {
-        "GENERATE_SERIES"
-    } else {
-        "RANGE"
-    };
-    let [start, stop, step] = take_function_args(func_name, args)?;
-
-    // coerce_types fn should coerce all types to Timestamp(Nanosecond, tz)
-    let (start_arr, start_tz_opt) = cast_timestamp_arg(start, 
include_upper_bound)?;
-    let (stop_arr, stop_tz_opt) = cast_timestamp_arg(stop, 
include_upper_bound)?;
-    let step_arr = as_interval_mdn_array(step)?;
-    let start_tz = parse_tz(start_tz_opt)?;
-    let stop_tz = parse_tz(stop_tz_opt)?;
-
-    // values are timestamps
-    let values_builder = start_tz_opt
-        .clone()
-        .map_or_else(TimestampNanosecondBuilder::new, |start_tz_str| {
-            TimestampNanosecondBuilder::new().with_timezone(start_tz_str)
-        });
-    let mut list_builder = ListBuilder::new(values_builder);
-
-    for idx in 0..start_arr.len() {
-        if start_arr.is_null(idx) || stop_arr.is_null(idx) || 
step_arr.is_null(idx) {
-            list_builder.append_null();
-            continue;
-        }
-
-        let start = start_arr.value(idx);
-        let stop = stop_arr.value(idx);
-        let step = step_arr.value(idx);
-
-        let (months, days, ns) = IntervalMonthDayNanoType::to_parts(step);
-        if months == 0 && days == 0 && ns == 0 {
-            return exec_err!(
-                "Interval argument to {} must not be 0",
-                if include_upper_bound {
-                    "GENERATE_SERIES"
-                } else {
-                    "RANGE"
-                }
-            );
-        }
-
-        let neg = TSNT::add_month_day_nano(start, step, start_tz)
-            .ok_or(exec_datafusion_err!(
-                "Cannot generate timestamp range where start + step overflows"
-            ))?
-            .cmp(&start)
-            == Ordering::Less;
-
-        let stop_dt = as_datetime_with_timezone::<TSNT>(stop, stop_tz).ok_or(
-            exec_datafusion_err!(
-                "Cannot generate timestamp for stop: {}: {:?}",
-                stop,
-                stop_tz
-            ),
-        )?;
-
-        let mut current = start;
-        let mut current_dt = as_datetime_with_timezone::<TSNT>(current, 
start_tz).ok_or(
-            exec_datafusion_err!(
-                "Cannot generate timestamp for start: {}: {:?}",
-                current,
-                start_tz
-            ),
-        )?;
-
-        let values = from_fn(|| {
-            if (include_upper_bound
-                && ((neg && current_dt < stop_dt) || (!neg && current_dt > 
stop_dt)))
-                || (!include_upper_bound
-                    && ((neg && current_dt <= stop_dt)
-                        || (!neg && current_dt >= stop_dt)))
-            {
-                return None;
-            }
-
-            let prev_current = current;
-
-            if let Some(ts) = TSNT::add_month_day_nano(current, step, 
start_tz) {
-                current = ts;
-                current_dt = as_datetime_with_timezone::<TSNT>(current, 
start_tz)?;
-
-                Some(Some(prev_current))
-            } else {
-                // we failed to parse the timestamp here so terminate the 
series
-                None
-            }
-        });
-
-        list_builder.append_value(values);
-    }
-
-    let arr = Arc::new(list_builder.finish());
-
-    Ok(arr)
-}
-
-fn cast_timestamp_arg(
-    arg: &ArrayRef,
-    include_upper: bool,
-) -> Result<(&TimestampNanosecondArray, &Option<Arc<str>>)> {
-    match arg.data_type() {
-        Timestamp(Nanosecond, tz_opt) => {
-            Ok((as_timestamp_nanosecond_array(arg)?, tz_opt))
-        }
-        _ => {
-            internal_err!(
-                "Unexpected argument type for {} : {}",
-                if include_upper {
-                    "GENERATE_SERIES"
-                } else {
-                    "RANGE"
-                },
-                arg.data_type()
-            )
-        }
-    }
-}
-
-fn parse_tz(tz: &Option<Arc<str>>) -> Result<Tz> {
+fn parse_tz(tz: &Option<&str>) -> Result<Tz> {
     let tz = tz.as_ref().map_or_else(|| "+00", |s| s);
 
     Tz::from_str(tz)
diff --git a/datafusion/sqllogictest/test_files/array.slt 
b/datafusion/sqllogictest/test_files/array.slt
index 29f0241c88..144e3b757a 100644
--- a/datafusion/sqllogictest/test_files/array.slt
+++ b/datafusion/sqllogictest/test_files/array.slt
@@ -6054,7 +6054,7 @@ NULL NULL
 # array_has([], 1) -> 'false' (empty array should return false)
 # array_has(null, 1) -> 'null' (null array should return null)
 query ?T
-SELECT column1, COALESCE(CAST(array_has(column1, column2) AS VARCHAR), 'null') 
+SELECT column1, COALESCE(CAST(array_has(column1, column2) AS VARCHAR), 'null')
 from array_has_table_empty;
 ----
 [1, 3, 5] true
@@ -6315,7 +6315,7 @@ true false false true
 false true false false
 NULL NULL false false
 false false NULL false
-false false false NULL 
+false false false NULL
 
 query BBBB
 select array_has_all(make_array(1,2,3), []),
@@ -7131,7 +7131,7 @@ select generate_series(arrow_cast('2021-01-01T00:00:00', 
'Timestamp(Nanosecond,
 [2021-01-01T00:00:00-05:00, 2021-01-01T01:29:54.500-05:00, 
2021-01-01T02:59:49-05:00, 2021-01-01T04:29:43.500-05:00, 
2021-01-01T05:59:38-05:00]
 
 ## mixing types for timestamps is not supported
-query error DataFusion error: Internal error: Unexpected argument type for 
GENERATE_SERIES : Date32
+query error DataFusion error: Internal error: Unexpected argument type for 
generate_series : Date32
 select generate_series(arrow_cast('2021-01-01T00:00:00', 
'Timestamp(Nanosecond, Some("-05:00"))'), DATE '2021-01-02', INTERVAL '1' HOUR);
 
 
@@ -7239,7 +7239,7 @@ query error DataFusion error: Execution error: step can't 
be 0 for function gene
 select generate_series(1, 1, 0);
 
 # Test generate_series with zero step
-query error DataFusion error: Execution error: Interval argument to 
GENERATE_SERIES must not be 0
+query error DataFusion error: Execution error: Interval argument to 
generate_series must not be 0
 select generate_series(TIMESTAMP '2000-01-02', TIMESTAMP '2000-01-01', 
INTERVAL '0' MINUTE);
 
 # Test generate_series with big steps
@@ -8209,7 +8209,7 @@ select array_reverse(arrow_cast(make_array(1, 2, 3), 
'LargeList(Int64)')), array
 [3, 2, 1] [1]
 
 query ????
-select array_reverse(arrow_cast(make_array(1, 2, 3), 'FixedSizeList(3, 
Int64)')), 
+select array_reverse(arrow_cast(make_array(1, 2, 3), 'FixedSizeList(3, 
Int64)')),
   array_reverse(arrow_cast(make_array(1), 'FixedSizeList(1, Int64)')),
   array_reverse(arrow_cast(make_array(1, NULL, 3), 'FixedSizeList(3, Int64)')),
   array_reverse(arrow_cast(make_array(NULL, NULL, NULL), 'FixedSizeList(3, 
Int64)'));
diff --git a/docs/source/user-guide/sql/scalar_functions.md 
b/docs/source/user-guide/sql/scalar_functions.md
index 63a3eb2da8..0f091cb43e 100644
--- a/docs/source/user-guide/sql/scalar_functions.md
+++ b/docs/source/user-guide/sql/scalar_functions.md
@@ -4182,7 +4182,8 @@ flatten(array)
 Similar to the range function, but it includes the upper bound.
 
 ```sql
-generate_series(start, stop, step)
+generate_series(stop)
+generate_series(start, stop[, step])
 ```
 
 #### Arguments
@@ -4402,7 +4403,8 @@ _Alias of [make_array](#make_array)._
 Returns an Arrow array between start and stop with step. The range start..end 
contains all values with start <= x < end. It is empty if start >= end. Step 
cannot be 0.
 
 ```sql
-range(start, stop, step)
+range(stop)
+range(start, stop[, step])
 ```
 
 #### Arguments
@@ -4422,11 +4424,11 @@ range(start, stop, step)
 +-----------------------------------+
 
 > select range(DATE '1992-09-01', DATE '1993-03-01', INTERVAL '1' MONTH);
-+--------------------------------------------------------------+
-| range(DATE '1992-09-01', DATE '1993-03-01', INTERVAL '1' MONTH) |
-+--------------------------------------------------------------+
++--------------------------------------------------------------------------+
+| range(DATE '1992-09-01', DATE '1993-03-01', INTERVAL '1' MONTH)          |
++--------------------------------------------------------------------------+
 | [1992-09-01, 1992-10-01, 1992-11-01, 1992-12-01, 1993-01-01, 1993-02-01] |
-+--------------------------------------------------------------+
++--------------------------------------------------------------------------+
 ```
 
 ### `string_to_array`


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


Reply via email to