parthchandra commented on code in PR #3730:
URL: https://github.com/apache/datafusion-comet/pull/3730#discussion_r2976308663


##########
native/spark-expr/src/conversion_funcs/string.rs:
##########
@@ -959,8 +974,20 @@ fn get_timestamp_values<T: TimeZone>(
     timestamp_type: &str,
     tz: &T,
 ) -> SparkResult<Option<i64>> {
-    let values: Vec<_> = value.split(['T', '-', ':', '.']).collect();
-    let year = values[0].parse::<i32>().unwrap_or_default();
+    // Handle negative year: strip leading '-' and remember the sign.
+    let (sign, date_part) = if let Some(stripped) = value.strip_prefix('-') {
+        (-1i32, stripped)
+    } else {
+        (1i32, value)
+    };
+    let values: Vec<_> = date_part.split(['T', ' ', '-', ':', '.']).collect();

Review Comment:
   Will try to improve this in a followup



##########
native/spark-expr/src/conversion_funcs/string.rs:
##########
@@ -1092,40 +1109,119 @@ fn timestamp_parser<T: TimeZone>(
     if value.is_empty() {
         return Ok(None);
     }
-    // Define regex patterns and corresponding parsing functions
-    let patterns = &[
+
+    // Handle Z or ±HH:MM offset suffix: strip it and parse with the explicit 
fixed offset.
+    if let Some((stripped, offset_secs)) = extract_offset_suffix(value) {
+        let fixed_tz = chrono::FixedOffset::east_opt(offset_secs)
+            .ok_or_else(|| SparkError::Internal("Invalid timezone 
offset".to_string()))?;
+        return timestamp_parser_with_tz(stripped, eval_mode, &fixed_tz);
+    }
+
+    timestamp_parser_with_tz(value, eval_mode, tz)
+}
+
+/// If `value` ends with a UTC offset suffix (`Z`, `+HH:MM`, or `-HH:MM`), 
returns the
+/// stripped string and the offset in seconds. Returns `None` if no offset 
suffix is present.

Review Comment:
   We'll return None and eventually fail. The corresponding code in Spark 
`SparkDateTimeUtils.getZoneId` indicates that these formats (H:mm and HH:m) are 
retained for backward compatibility with Spark before 3.0.  Let me create an 
issue to handle those as well as named timezones (e.g. 'America/New_York') 
which are less commonly used. 
   There is a Spark test ` test("SPARK-35780: support full range of timestamp 
string")` which currently passes. I'll double check in the next PR to make sure 
it is being exercised. The test does not include named timezones so perhaps 
that is not needed.



##########
native/spark-expr/src/conversion_funcs/string.rs:
##########
@@ -1092,40 +1109,119 @@ fn timestamp_parser<T: TimeZone>(
     if value.is_empty() {
         return Ok(None);
     }
-    // Define regex patterns and corresponding parsing functions
-    let patterns = &[
+
+    // Handle Z or ±HH:MM offset suffix: strip it and parse with the explicit 
fixed offset.
+    if let Some((stripped, offset_secs)) = extract_offset_suffix(value) {
+        let fixed_tz = chrono::FixedOffset::east_opt(offset_secs)
+            .ok_or_else(|| SparkError::Internal("Invalid timezone 
offset".to_string()))?;
+        return timestamp_parser_with_tz(stripped, eval_mode, &fixed_tz);
+    }
+
+    timestamp_parser_with_tz(value, eval_mode, tz)
+}
+
+/// If `value` ends with a UTC offset suffix (`Z`, `+HH:MM`, or `-HH:MM`), 
returns the
+/// stripped string and the offset in seconds. Returns `None` if no offset 
suffix is present.

Review Comment:
   Issue to support more formats - 
https://github.com/apache/datafusion-comet/issues/3775



##########
native/spark-expr/src/conversion_funcs/string.rs:
##########
@@ -1027,28 +1054,19 @@ fn parse_timestamp_to_micros<T: TimeZone>(
         timestamp_info.second,
     );
 
-    // Check if datetime is not None
-    let tz_datetime = match datetime.single() {
+    // Spark uses the offset before daylight savings change so we need to use 
earliest()
+    // Return None for LocalResult::None which is the invalid time in a DST 
spring forward gap).

Review Comment:
   There are tests in Sparks DateTimeUtilsSuite which do.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to