[
https://issues.apache.org/jira/browse/SPARK-54601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-54601:
-----------------------------------
Labels: pull-request-available (was: )
> Data loss in TimestampType when writing to MS SQL Server: Microsecond
> precision truncated to milliseconds
> ---------------------------------------------------------------------------------------------------------
>
> Key: SPARK-54601
> URL: https://issues.apache.org/jira/browse/SPARK-54601
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0, 4.0.0
> Reporter: Vinod KC
> Priority: Major
> Labels: pull-request-available
>
> Spark silently loses microsecond precision when writing *timestamps* to MS
> SQL Server via JDBC, corrupting data.
> Spark maps TimestampType and TimestampNTZType to SQL Server's legacy DATETIME
> type, which supports only millisecond precision (3 fractional digits). The
> microsecond digits (fractional digits 4-6) are truncated with no warning.
> *Data loss example - complete round-trip*
>
> {code:java}
> import java.sql.Timestamp
> import spark.implicits._ // needed for toDF on a local Seq
>
> // Create data with Spark's full microsecond precision
> val data = Seq(
>   (1, Timestamp.valueOf("2024-12-03 14:25:37.789123")),
>   (2, Timestamp.valueOf("2024-12-03 14:25:37.789456"))
> ).toDF("id", "timestamp")
>
> // Original microseconds: 789123 and 789456 (Spark stores 6 fractional digits)
> data.collect().foreach(r =>
>   println(s"Microseconds: ${r.getAs[Timestamp]("timestamp").getNanos / 1000}")
> )
> // Output: Microseconds: 789123, Microseconds: 789456
>
> // Write to SQL Server (the column is created as DATETIME - only 3 fractional digits!)
> data.write.mode("overwrite").jdbc(jdbcUrl, "test_table", props)
>
> // Read back
> val result = spark.read.jdbc(jdbcUrl, "test_table", props)
> result.collect().foreach(r =>
>   println(s"Microseconds: ${r.getAs[Timestamp]("timestamp").getNanos / 1000}")
> )
> // Output: Microseconds: 789000, Microseconds: 789000  ❌ DATA LOST!
> // Lost: 123 and 456 microseconds respectively (fractional digits 4-6)
> // Both records now appear simultaneous - ordering is corrupted
> {code}
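>
> The truncation arithmetic alone reproduces the values observed above (a minimal
> sketch, independent of any database):
>
> {code:java}
> // Keep only the first 3 fractional digits, as a millisecond-precision column does
> val micros = Seq(789123L, 789456L)        // microsecond fields from the example
> val stored = micros.map(_ / 1000 * 1000)  // integer division drops digits 4-6
> // stored == Seq(789000L, 789000L): the two rows become indistinguishable
> {code}
>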
> *Root cause:*
> Spark currently maps TimestampType and TimestampNTZType to SQL Server's
> legacy DATETIME type, which supports only 3 fractional digits (millisecond
> precision). Any microsecond data beyond the third fractional digit is
> automatically truncated.
>
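> One possible workaround, shown here only as a sketch against Spark's public
> JdbcDialect developer API (the dialect name is illustrative, not a confirmed
> fix): register a custom dialect that overrides the type mapping so that new
> tables are created with DATETIME2 instead of DATETIME.
>
> {code:java}
> import java.sql.Types
> import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
> import org.apache.spark.sql.types.{DataType, TimestampNTZType, TimestampType}
>
> // Illustrative dialect: map Spark timestamps to DATETIME2 (100 ns precision)
> // instead of the legacy DATETIME (~3.33 ms) when writing to SQL Server.
> object MsSqlDatetime2Dialect extends JdbcDialect {
>   override def canHandle(url: String): Boolean =
>     url.toLowerCase.startsWith("jdbc:sqlserver")
>
>   override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
>     case TimestampType | TimestampNTZType =>
>       Some(JdbcType("DATETIME2", Types.TIMESTAMP))
>     case _ => None // defer all other types to the built-in dialect
>   }
> }
>
> // Register before writing; registered dialects are consulted ahead of built-ins.
> JdbcDialects.registerDialect(MsSqlDatetime2Dialect)
> data.write.mode("overwrite").jdbc(jdbcUrl, "test_table", props)
> {code}
>
> With this mapping in place, the round-trip above should retain all 6
> fractional digits, since DATETIME2 stores up to 100 ns resolution.
>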
> *References:*
> Microsoft Documentation: "Avoid using datetime for new work. Instead, use the
> datetime2 data type."
> [https://learn.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql]
>
> {code}
> DATETIME precision: rounded to increments of .000, .003, or .007 seconds (~3.33 ms)
> DATETIME2 precision: 100 nanoseconds (0.0000001 seconds)
> {code}