andygrove opened a new issue, #3106:
URL: https://github.com/apache/datafusion-comet/issues/3106

   ## What is the problem the feature request solves?
   
   > **Note:** This issue was generated with AI assistance. The specification 
details have been extracted from Spark documentation and may need verification.
   
   Comet does not currently support the Spark `make_timestamp` function, 
causing queries using this function to fall back to Spark's JVM execution 
instead of running natively on DataFusion.
   
   The `MakeTimestamp` expression constructs a timestamp value from separate 
year, month, day, hour, minute, and second components, with optional timezone 
specification. It supports microsecond precision through decimal seconds and 
can operate in both fail-on-error (ANSI) and null-on-error modes depending on 
configuration.
   
   Supporting this expression would allow more Spark workloads to benefit from 
Comet's native acceleration.
   
   ## Describe the potential solution
   
   ### Spark Specification
   
   **Syntax:**
   ```sql
   make_timestamp(year, month, day, hour, min, sec [, timezone])
   ```
   
   **Arguments:**
   | Argument | Type | Description |
   |----------|------|-------------|
   | year | IntegerType | The year component (e.g., 2023) |
   | month | IntegerType | The month component (1-12) |
   | day | IntegerType | The day component (1-31) |
   | hour | IntegerType | The hour component (0-23) |
   | min | IntegerType | The minute component (0-59) |
   | sec | DecimalType(16,6) | The second component with microsecond precision (0-59.999999) |
   | timezone | StringType (optional) | The timezone identifier (e.g., "UTC", "America/New_York") |
   
   **Return Type:** Returns the configured timestamp type 
(`SQLConf.get.timestampType`), which can be either `TimestampType` (timestamp 
with timezone) or `TimestampNTZType` (timestamp without timezone).
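
   The result type can be checked directly; a minimal sketch, assuming the standard `spark.sql.timestampType` configuration (values `TIMESTAMP_LTZ` / `TIMESTAMP_NTZ`) is what backs `SQLConf.get.timestampType`:
   
   ```scala
   // Sketch: the result type of make_timestamp follows the session timestamp type.
   // Assumes the spark.sql.timestampType config with values TIMESTAMP_LTZ / TIMESTAMP_NTZ.
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder().master("local[*]").getOrCreate()
   
   // Default TIMESTAMP_LTZ -> TimestampType (timestamp with local time zone)
   spark.sql("SELECT make_timestamp(2023, 12, 25, 14, 30, 45.123456) AS ts").printSchema()
   
   // TIMESTAMP_NTZ -> TimestampNTZType (timestamp without time zone)
   spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
   spark.sql("SELECT make_timestamp(2023, 12, 25, 14, 30, 45.123456) AS ts").printSchema()
   ```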
   
   **Supported Data Types:**
   - Year, month, day, hour, minute: Integer types that can be cast to 
`IntegerType`
   - Seconds: Numeric types that can be cast to `DecimalType(16,6)` to preserve 
microsecond precision
   - Timezone: String types with collation support
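
   As a quick illustration of the implicit casts above, integral seconds are accepted and widened to `DecimalType(16,6)`; the column names in this sketch are hypothetical:
   
   ```scala
   // Sketch: an integer seconds column is implicitly cast to DecimalType(16,6).
   // Column names (y, mo, d, h, mi, s) are hypothetical.
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.expr
   
   val spark = SparkSession.builder().master("local[*]").getOrCreate()
   import spark.implicits._
   
   val df = Seq((2023, 12, 25, 14, 30, 45)).toDF("y", "mo", "d", "h", "mi", "s")
   df.select(expr("make_timestamp(y, mo, d, h, mi, s)").as("ts")).show(false)
   ```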
   
   **Edge Cases:**
   - Null inputs: Returns null if any input is null (null intolerant)
   - Invalid dates: Returns null in non-ANSI mode, throws exception in ANSI 
mode (e.g., February 30th)
   - Seconds = 60: Supported only when nanoseconds = 0, adds one minute for 
PostgreSQL compatibility
   - Seconds = 60 with a non-zero fractional part: Throws `invalidFractionOfSecondError`
   - Invalid timezone strings: Throws exception during timezone parsing
   - Overflow conditions: Handled by underlying Java time libraries with 
appropriate exceptions
   
   **Examples:**
   ```sql
   -- Create timestamp with explicit timezone
   SELECT make_timestamp(2023, 12, 25, 14, 30, 45.123456, 'UTC');
   
   -- Create timestamp using session timezone
   SELECT make_timestamp(2023, 1, 1, 0, 0, 0.0);
   
   -- Handle leap seconds (PostgreSQL compatibility)
   SELECT make_timestamp(2023, 6, 30, 23, 59, 60.0, 'UTC');
   ```
   
   ```scala
   // DataFrame API usage
   import org.apache.spark.sql.functions._
   
   df.withColumn("timestamp",
     expr("make_timestamp(year_col, month_col, day_col, hour_col, min_col, sec_col, 'America/New_York')"))
   
   // Using literals
   df.select(expr("make_timestamp(2023, 12, 25, 14, 30, 45.123456)"))
   ```
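
   The edge cases listed above can be exercised the same way; a minimal sketch, assuming the standard `spark.sql.ansi.enabled` flag toggles between fail-on-error and null-on-error behavior:
   
   ```scala
   // Sketch of the edge-case behavior described above.
   // Assumes spark.sql.ansi.enabled switches between ANSI (fail) and non-ANSI (null) modes.
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder().master("local[*]").getOrCreate()
   
   // Null input -> null result
   spark.sql("SELECT make_timestamp(2023, NULL, 1, 0, 0, 0) AS ts").show()
   
   // Invalid date (February 30th) -> NULL in non-ANSI mode
   spark.conf.set("spark.sql.ansi.enabled", "false")
   spark.sql("SELECT make_timestamp(2023, 2, 30, 0, 0, 0) AS ts").show()
   
   // ...and an exception when ANSI mode is enabled
   spark.conf.set("spark.sql.ansi.enabled", "true")
   // spark.sql("SELECT make_timestamp(2023, 2, 30, 0, 0, 0) AS ts").show()  // throws
   ```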
   
   ### Implementation Approach
   
   See the [Comet guide on adding new 
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
 for detailed instructions.
   
   1. **Scala Serde**: Add expression handler in `spark/src/main/scala/org/apache/comet/serde/` (a rough sketch follows after this list)
   2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
   3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if 
needed
   4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has 
built-in support first)
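
   As a starting point for step 1, a rough, hypothetical sketch of a serde handler; the object shape, `convert` signature, and helper names mentioned in the comments are assumptions rather than Comet's actual API, so mirror an existing date/time handler and the guide above when implementing:
   
   ```scala
   // Hypothetical sketch only: the handler shape and signature below are assumptions,
   // not Comet's actual API. Mirror an existing handler (e.g. for MakeDate) under
   // spark/src/main/scala/org/apache/comet/serde/ and register it in QueryPlanSerde.scala.
   import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, MakeTimestamp}
   
   object CometMakeTimestamp {
   
     // Signature modeled loosely on Comet's per-expression serde handlers (assumed).
     def convert(
         expr: MakeTimestamp,
         inputs: Seq[Attribute],
         binding: Boolean): Option[AnyRef] = {
       // Catalyst's MakeTimestamp carries year, month, day, hour, min, sec, an optional
       // timezone child, plus timeZoneId and failOnError (ANSI) fields.
       val children: Seq[Expression] =
         Seq(expr.year, expr.month, expr.day, expr.hour, expr.min, expr.sec) ++ expr.timezone
   
       // 1. Convert each child with the serde framework's child-conversion helper.
       // 2. Build the protobuf message added in native/proto/src/proto/expr.proto, carrying
       //    expr.timeZoneId and expr.failOnError so the native kernel can match Spark semantics.
       // 3. Return None (fall back to Spark) for cases the native side does not cover yet,
       //    e.g. ANSI mode or a TimestampNTZ result type.
       None
     }
   }
   ```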
   
   
   ## Additional context
   
   **Difficulty:** Large
   **Spark Expression Class:** 
`org.apache.spark.sql.catalyst.expressions.MakeTimestamp`
   
   **Related:**
   - `MakeDate` - Creates date values from year, month, day components
   - `ToTimestamp` - Parses timestamp from string with format
   - `DateAdd` / `DateSub` - Arithmetic operations on dates
   - `FromUnixTime` - Converts Unix timestamp to formatted date string
   
   ---
   *This issue was auto-generated from Spark reference documentation.*
   

