[ https://issues.apache.org/jira/browse/SPARK-49529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879956#comment-17879956 ]
Bruce Robbins commented on SPARK-49529:
---------------------------------------

This actually matches Java 17's behavior. Try this code in the Scala REPL:

{noformat}
import java.time._

// Midnight on 0001-01-01 in UTC, viewed as the same instant in Asia/Calcutta.
val utcZid = ZoneId.of("UTC")
val istZid = ZoneId.of("Asia/Calcutta")
val utcZdt = ZonedDateTime.of(LocalDateTime.of(1, 1, 1, 0, 0, 0, 0), utcZid)
val istZdt = utcZdt.withZoneSameInstant(istZid)
println(istZdt)
{noformat}

The code prints the following:

{noformat}
0001-01-01T05:53:28+05:53:28[Asia/Calcutta]
{noformat}

Note that Java thinks the timezone offset is +05:53:28 at that point in time, likely because 0001-01-01 predates the first UTC offset transition recorded for Asia/Calcutta in the IANA tz database. For instants before a zone's first recorded transition, java.time falls back to the zone's local mean time (LMT), which for Calcutta is +05:53:28.
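The same fallback seems to explain the JST numbers in the report below: Java resolves the short ID "JST" to Asia/Tokyo (see ZoneId.SHORT_IDS), and Spark appears to do the same, so pre-transition instants get Tokyo's local mean time of +09:18:59. A minimal sketch of that check, assuming Java 17 with its bundled tzdata (the instants are picked to straddle Asia/Tokyo's first recorded transition in 1887/1888):

{noformat}
import java.time._

// Convert three UTC wall-clock times to Asia/Tokyo, the zone behind "JST".
val utcZid = ZoneId.of("UTC")
val tokyoZid = ZoneId.of("Asia/Tokyo")
Seq("0001-01-01T00:00", "1850-12-31T00:00", "1970-01-01T00:00").foreach { s =>
  val zdt = LocalDateTime.parse(s).atZone(utcZid).withZoneSameInstant(tokyoZid)
  println(s"$s UTC -> $zdt")
}
{noformat}

This should print a +09:18:59 offset for the two pre-1888 instants and +09:00 for 1970, matching the ts_trans column in the report.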
> Incorrect results from from_utc_timestamp function
> --------------------------------------------------
>
>                 Key: SPARK-49529
>                 URL: https://issues.apache.org/jira/browse/SPARK-49529
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0, 3.4.1, 4.0.0, 3.5.2, 3.4.4, 3.5.3
>            Reporter: Ankit Prakash Gupta
>            Priority: Major
>
> The values returned by from_utc_timestamp are erratic and inconsistent for timestamps before the year 1850.
>
ankit.gupta","logger":"SecurityManager"} > {"ts":"2024-09-06T02:42:49.026Z","level":"INFO","msg":"Changing modify acls > to: ankit.gupta","logger":"SecurityManager"} > {"ts":"2024-09-06T02:42:49.026Z","level":"INFO","msg":"Changing view acls > groups to: ","logger":"SecurityManager"} > {"ts":"2024-09-06T02:42:49.026Z","level":"INFO","msg":"Changing modify acls > groups to: ","logger":"SecurityManager"} > {"ts":"2024-09-06T02:42:49.026Z","level":"INFO","msg":"SecurityManager: > authentication disabled; ui acls disabled; users with view permissions: > ankit.gupta; groups with view permissions: EMPTY; users with modify > permissions: ankit.gupta; groups with modify permissions: EMPTY; RPC SSL > disabled","logger":"SecurityManager"} > {"ts":"2024-09-06T02:42:49.063Z","level":"INFO","msg":"Starting executor ID > driver on host 192.168.28.3","logger":"Executor"} > {"ts":"2024-09-06T02:42:49.063Z","level":"INFO","msg":"OS info Mac OS X, > 14.5, aarch64","logger":"Executor"} > {"ts":"2024-09-06T02:42:49.064Z","level":"INFO","msg":"Java version > 17.0.11","logger":"Executor"} > {"ts":"2024-09-06T02:42:49.067Z","level":"INFO","msg":"Starting executor with > user classpath (userClassPathFirst = false): ''","logger":"Executor"} > {"ts":"2024-09-06T02:42:49.067Z","level":"INFO","msg":"Using REPL class URI: > spark://192.168.28.3:58034/classes","logger":"Executor"} > {"ts":"2024-09-06T02:42:49.070Z","level":"INFO","msg":"Created or updated > repl class loader org.apache.spark.executor.ExecutorClassLoader@41bc501b for > default.","logger":"Executor"} > {"ts":"2024-09-06T02:42:49.079Z","level":"INFO","msg":"Successfully started > service 'org.apache.spark.network.netty.NettyBlockTransferService' on port > 58036.","logger":"Utils"} > {"ts":"2024-09-06T02:42:49.080Z","level":"INFO","msg":"Server created on > 192.168.28.3:58036","logger":"NettyBlockTransferService"} > {"ts":"2024-09-06T02:42:49.081Z","level":"INFO","msg":"Using > org.apache.spark.storage.RandomBlockReplicationPolicy for block replication > policy","logger":"BlockManager"} > {"ts":"2024-09-06T02:42:49.086Z","level":"INFO","msg":"Registering > BlockManager BlockManagerId(driver, 192.168.28.3, 58036, > None)","logger":"BlockManagerMaster"} > {"ts":"2024-09-06T02:42:49.089Z","level":"INFO","msg":"Registering block > manager 192.168.28.3:58036 with 434.4 MiB RAM, BlockManagerId(driver, > 192.168.28.3, 58036, None)","logger":"BlockManagerMasterEndpoint"} > {"ts":"2024-09-06T02:42:49.090Z","level":"INFO","msg":"Registered > BlockManager BlockManagerId(driver, 192.168.28.3, 58036, > None)","logger":"BlockManagerMaster"} > {"ts":"2024-09-06T02:42:49.091Z","level":"INFO","msg":"Initialized > BlockManager: BlockManagerId(driver, 192.168.28.3, 58036, > None)","logger":"BlockManager"} > Spark context Web UI available at http://192.168.28.3:4040 > Spark context available as 'sc' (master = local, app id = > local-1725590569043). > Spark session available as 'spark'.scala> :paste > // Entering paste mode (ctrl-D to > finish)java.util.TimeZone.setDefault(java.util.TimeZone.getTimeZone("UTC")) > val df = Seq(java.sql.Timestamp.valueOf("0001-01-01 00:00:00"), > java.sql.Timestamp.valueOf("1900-01-01 00:00:00"), > java.sql.Timestamp.valueOf("1799-12-31 00:00:00"), > java.sql.Timestamp.valueOf("1850-12-31 00:00:00"), new > java.sql.Timestamp(0)).toDF("ts") > df.withColumn("ts_trans", from_utc_timestamp($"ts", "JST")).show > // Exiting paste mode... now interpreting. 
> +-------------------+-------------------+
> |                 ts|           ts_trans|
> +-------------------+-------------------+
> |0001-01-01 00:00:00|0001-01-01 09:18:59|
> |1900-01-01 00:00:00|1900-01-01 09:00:00|
> |1799-12-31 00:00:00|1799-12-31 09:18:59|
> |1850-12-31 00:00:00|1850-12-31 09:18:59|
> |1970-01-01 00:00:00|1970-01-01 09:00:00|
> +-------------------+-------------------+
>
> val df: org.apache.spark.sql.DataFrame = [ts: timestamp]
> {code}
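A quicker way to see the same pre-1888 offset, without the DataFrame setup, is a single SQL expression. This is a sketch, assuming the same session as above (Java 17, spark.sql.session.timeZone=UTC); the expected value is taken from the ts_trans column in the report:

{code:java}
// Interprets the literal as UTC and renders it in JST (Asia/Tokyo).
spark.sql("SELECT from_utc_timestamp(timestamp'1850-12-31 00:00:00', 'JST') AS ts_trans").show(false)
// Expected, per the table above: 1850-12-31 09:18:59
{code}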