[
https://issues.apache.org/jira/browse/PHOENIX-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani updated PHOENIX-6623:
----------------------------------
Fix Version/s: (was: 5.2.0)
(was: 5.1.4)
> Phoenix Spark reading DATE datatype value less than one day from phoenix table
> ------------------------------------------------------------------------------
>
> Key: PHOENIX-6623
> URL: https://issues.apache.org/jira/browse/PHOENIX-6623
> Project: Phoenix
> Issue Type: Bug
> Components: spark-connector
> Affects Versions: 4.7.0
> Reporter: Anand
> Assignee: Istvan Toth
> Priority: Blocker
> Attachments: image-2023-07-27-18-08-32-282.png,
> image-2023-07-27-18-10-11-670.png, image-2023-07-27-18-10-42-609.png,
> image-2023-07-27-18-11-47-404.png, image-2023-07-27-19-33-50-780.png
>
>
> We are using below versions of Phoenix, HBase and Spark.
> Phoenix - 4.7
> HBase - 2.6.5
> Spark - 2.4
> Created a phoenix table by mentioning one of the field datatype as DATE and
> TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below.
> CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL (
> "CID" INTEGER,
> *"CDATE" DATE,*
> "CTIMESTAMP" TIMESTAMP,
> CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("CID"));
> Upserted records using upsert command and below is the data in table.
>
> |CID|CDATE |CTIMESTAMP |
>
> |1 |*2021-11-21*|2022-01-18 18:30:33.896|
> |2 |*2021-11-18*|2022-01-18 18:45:59.336|
> |3 |*2021-11-17*|2022-01-18 19:01:04.265|
>
> Now, reading data from above created table in pyspark shell. We have set
> *spark.sql.session.timeZone=UTC* to spark while launching pyspark shell.
> Also, we have set *phoenix.query.dateFormatTimeZone=UTC* in *hbase-site.xml*
> file. **
> Below code snippet read data from phoenix via JDBC and it r{*}ead DATE
> datatype field as one day less{*}.
> {noformat}
> >>> val df = spark.read.format("jdbc")
> .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
> .option("url", "jdbc:phoenix:localhost:2181:/hbase-secure")
> .option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM
> NS_TEST.CUSTOMER_TBL) q")
> .load(){noformat}
> >>>df.printSchema()
> root
> |-- CID: integer (nullable = true)
> |-- *CDATE: date* (nullable = true)
> |-- CTIMESTAMP: timestamp (nullable = true)
> >>>{*}df.select('{*}').show(truncate=False)\{*}
> |CID|CDATE |CTIMESTAMP |
>
> |1 |*2021-11-20*|2022-01-18 18:30:33.896|
> |2 |*2021-11-17*|2022-01-18 18:45:59.336|
> |3 |*2021-11-16*|2022-01-18 19:01:04.265|
>
> We have also tried using phoenix data source instead of JDBC and below is the
> code snippet. It also read DATE datatype field as one day less.
> {noformat}
> val df2 = spark.read.format("org.apache.phoenix.spark")
> .option("table", "NS_TEST.CUSTOMER_TBL")
> .option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure")
> .load(){noformat}
> **
> >>>df.printSchema()
> root
> |-- CID: integer (nullable = true)
> |-- *CDATE: date* (nullable = true)
> |-- CTIMESTAMP: timestamp (nullable = true)
> >>>{*}df.select('{*}').show(truncate=False)\{*}
> |CID|CDATE |CTIMESTAMP |
>
> |1 |*2021-11-20*|2022-01-18 18:30:33.896|
> |2 |*2021-11-17*|2022-01-18 18:45:59.336|
> |3 |*2021-11-16*|2022-01-18 19:01:04.265|
>
> Please help us on this issue why Phoenix Spark reading DATE datatype field
> value as {*}one day less{*}.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)