[ https://issues.apache.org/jira/browse/SPARK-25873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733558#comment-16733558 ]
Pablo Langa Blanco commented on SPARK-25873:
--------------------------------------------

Hello, this appears to be a duplicate of SPARK-25919, which has been resolved. Could this issue be closed as well? Thank you. Regards.

> Date corruption when Spark and Hive both are on different timezones
> -------------------------------------------------------------------
>
>                 Key: SPARK-25873
>                 URL: https://issues.apache.org/jira/browse/SPARK-25873
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell, Spark Submit
>    Affects Versions: 2.2.1
>            Reporter: Pawan
>            Priority: Major
>
> Dates are altered when loading data from one Hive table to another through Spark. This happens when Hive runs on a remote machine whose timezone differs from the one Spark is running on, and only when the source table's SerDe is 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'.
>
> Below are the steps to reproduce the issue:
>
> 1. In Hive, on a machine whose timezone is, say, EST, create two tables:
> {code}
> CREATE TABLE t_src(
>   name varchar(10),
>   dob timestamp
> )
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
> {code}
> {code}
> INSERT INTO t_src VALUES
>   ('p1', '0001-01-01 00:00:00.0'),
>   ('p2', '0002-01-01 00:00:00.0'),
>   ('p3', '0003-01-01 00:00:00.0'),
>   ('p4', '0004-01-01 00:00:00.0');
> {code}
> {code}
> CREATE TABLE t_tgt(
>   name varchar(10),
>   dob timestamp
> );
> {code}
> 2. Copy {{hive-site.xml}} into the {{spark-2.2.1-bin-hadoop2.7/conf}} folder, so that the {{sqlContext}} you create for Hive connects to your remote Hive server.
> 3. Start spark-shell on another machine whose timezone differs from Hive's, say, PDT.
> 4.
> Execute the code below:
> {code}
> import org.apache.spark.sql.hive.HiveContext
>
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>
> val q0 = "TRUNCATE TABLE t_tgt"
> val q1 = "SELECT CAST(alias.name AS String) AS a0, alias.dob AS a1 FROM t_src alias"
> val q2 = "INSERT OVERWRITE TABLE t_tgt SELECT tbl0.a0 AS c0, tbl0.a1 AS c1 FROM tbl0"
>
> sqlContext.sql(q0)
> sqlContext.sql(q1).select("a0", "a1").createOrReplaceTempView("tbl0")
> sqlContext.sql(q2)
> {code}
> 5. Now navigate to Hive and check the contents of the target table ({{t_tgt}}). The {{dob}} field will have incorrect values.
>
> Is this a known issue? Is there any workaround? Can it be fixed?
>
> Thanks & regards,
> Pawan Lawale

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
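The shift described in the report can be sketched in miniature without Hive or Spark. This is not Spark's actual code path, only an illustration of the likely mechanism: a wall-clock timestamp written under one timezone (here EST, assumed UTC-5) is stored as an absolute instant, and a reader in another timezone (here PDT, assumed UTC-7) converts that instant back to its own wall clock, producing a different displayed value. The fixed UTC offsets are simplifying assumptions for the sketch.

```python
from datetime import datetime, timezone, timedelta

# Simplified, fixed offsets (no DST) purely for illustration.
EST = timezone(timedelta(hours=-5))  # writer's timezone (Hive side)
PDT = timezone(timedelta(hours=-7))  # reader's timezone (Spark side)

# Wall-clock value the user inserted on the Hive side ('0002-01-01 00:00:00.0').
wall_clock_written = datetime(2, 1, 1, 0, 0, 0)

# The writer interprets that wall clock in its own zone, yielding an instant.
instant_stored = wall_clock_written.replace(tzinfo=EST)

# The reader converts the same instant to its own zone and drops the offset,
# which is what ends up displayed as the "corrupted" timestamp.
wall_clock_read = instant_stored.astimezone(PDT).replace(tzinfo=None)

print(wall_clock_read)  # a different wall-clock value than was written
```

The two machines agree on the underlying instant; only the rendered wall-clock value differs by the gap between the zones (two hours in this sketch), which matches the symptom of every `dob` value shifting by a fixed amount.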