[jira] [Updated] (PHOENIX-6623) Phoenix Spark reading DATE datatype value less than one day from phoenix table

Istvan Toth (Jira) Thu, 23 Mar 2023 00:13:08 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Istvan Toth updated PHOENIX-6623:
---------------------------------
    Description: 
We are using below versions of Phoenix, HBase and Spark.

Phoenix - 4.7
HBase - 2.6.5
Spark - 2.4

Created a phoenix table by mentioning one of the field datatype as DATE and 
TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below.

CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL (
"CID" INTEGER,
*"CDATE" DATE,*
"CTIMESTAMP" TIMESTAMP,
CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("CID"));

Upserted records using upsert command and below is the data in table.

 
|CID|CDATE         |CTIMESTAMP                 |

 
|1   |*2021-11-21*|2022-01-18 18:30:33.896|
|2   |*2021-11-18*|2022-01-18 18:45:59.336|
|3   |*2021-11-17*|2022-01-18 19:01:04.265|

 

Now, reading data from above created table in pyspark shell. We have set 
*spark.sql.session.timeZone=UTC* to spark while launching pyspark shell. Also, 
we have set *phoenix.query.dateFormatTimeZone=UTC* in *hbase-site.xml* file. ** 

Below code snippet read data from phoenix via JDBC and it r{*}ead DATE datatype 
field as one day less{*}.
{noformat}
>>> val df = spark.read.format("jdbc")
          .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
          .option("url", "jdbc:phoenix:localhost:2181:/hbase-secure")
          .option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM 
NS_TEST.CUSTOMER_TBL) q")
.load(){noformat}

>>>df.printSchema()
root
 |-- CID: integer (nullable = true)
 |-- *CDATE: date* (nullable = true)
 |-- CTIMESTAMP: timestamp (nullable = true)
>>>{*}df.select('{*}').show(truncate=False)\{*}
|CID|CDATE         |CTIMESTAMP                   |

 
|1     |*2021-11-20*|2022-01-18 18:30:33.896|
|2     |*2021-11-17*|2022-01-18 18:45:59.336|
|3     |*2021-11-16*|2022-01-18 19:01:04.265|

 

We have also tried using phoenix data source instead of JDBC and below is the 
code snippet. It also read DATE datatype field as one day less.
{noformat}
val df2 = spark.read.format("org.apache.phoenix.spark")
         .option("table", "NS_TEST.CUSTOMER_TBL")
         .option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure")
         .load(){noformat}
**
>>>df.printSchema()
root
 |-- CID: integer (nullable = true)
 |-- *CDATE: date* (nullable = true)
 |-- CTIMESTAMP: timestamp (nullable = true)
>>>{*}df.select('{*}').show(truncate=False)\{*}
|CID|CDATE         |CTIMESTAMP                   |

 
|1     |*2021-11-20*|2022-01-18 18:30:33.896|
|2     |*2021-11-17*|2022-01-18 18:45:59.336|
|3     |*2021-11-16*|2022-01-18 19:01:04.265|

 

Please help us on this issue why Phoenix Spark reading DATE datatype field 
value as {*}one day less{*}.

  was:
We are using below versions of Phoenix, HBase and Spark.

Phoenix - 4.7
HBase - 2.6.5
Spark - 2.4

Created a phoenix table by mentioning one of the field datatype as DATE and 
TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below.

CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL (
"CID" INTEGER,
*"CDATE" DATE,*
"CTIMESTAMP" TIMESTAMP,
CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("ID"));

Upserted records using upsert command and below is the data in table.

 
|CID|CDATE         |CTIMESTAMP                 |

 
|1   |*2021-11-21*|2022-01-18 18:30:33.896|
|2   |*2021-11-18*|2022-01-18 18:45:59.336|
|3   |*2021-11-17*|2022-01-18 19:01:04.265|

 

Now, reading data from above created table in pyspark shell. We have set 
*spark.sql.session.timeZone=UTC* to spark while launching pyspark shell. Also, 
we have set *phoenix.query.dateFormatTimeZone=UTC* in *hbase-site.xml* file. ** 

Below code snippet read data from phoenix via JDBC and it r{*}ead DATE datatype 
field as one day less{*}.

>>> *df = spark.read.format("jdbc") *
          *.option("driver", "org.apache.phoenix.jdbc.PhoenixDriver") *
          *.option("url", "jdbc:phoenix:localhost:2181:/hbase-secure") *
          *.option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM 
NS_TEST.CUSTOMER_TBL) q") *
          *.load()*
>>>df.printSchema()
root
 |-- CID: integer (nullable = true)
 |-- *CDATE: date* (nullable = true)
 |-- CTIMESTAMP: timestamp (nullable = true)
>>>{*}df.select('{*}').show(truncate=False)\{*}
|CID|CDATE         |CTIMESTAMP                   |

 
|1     |*2021-11-20*|2022-01-18 18:30:33.896|
|2     |*2021-11-17*|2022-01-18 18:45:59.336|
|3     |*2021-11-16*|2022-01-18 19:01:04.265|

 

We have also tried using phoenix data source instead of JDBC and below is the 
code snippet. It also read DATE datatype field as one day less.

>>>{*}df = spark.read.format("org.apache.phoenix.spark") {{*}}
         *.option("table", "NS_TEST.CUSTOMER_TBL") *
         *.option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure") *
         *.load()*
>>>df.printSchema()
root
 |-- CID: integer (nullable = true)
 |-- *CDATE: date* (nullable = true)
 |-- CTIMESTAMP: timestamp (nullable = true)
>>>{*}df.select('{*}').show(truncate=False)\{*}
|CID|CDATE         |CTIMESTAMP                   |

 
|1     |*2021-11-20*|2022-01-18 18:30:33.896|
|2     |*2021-11-17*|2022-01-18 18:45:59.336|
|3     |*2021-11-16*|2022-01-18 19:01:04.265|

 

Please help us on this issue why Phoenix Spark reading DATE datatype field 
value as {*}one day less{*}.


> Phoenix Spark reading DATE datatype value less than one day from phoenix table
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6623
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6623
>             Project: Phoenix
>          Issue Type: Bug
>          Components: spark-connector
>    Affects Versions: 4.7.0
>            Reporter: Anand
>            Assignee: Istvan Toth
>            Priority: Blocker
>
> We are using below versions of Phoenix, HBase and Spark.
> Phoenix - 4.7
> HBase - 2.6.5
> Spark - 2.4
> Created a phoenix table by mentioning one of the field datatype as DATE and 
> TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below.
> CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL (
> "CID" INTEGER,
> *"CDATE" DATE,*
> "CTIMESTAMP" TIMESTAMP,
> CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("CID"));
> Upserted records using upsert command and below is the data in table.
>  
> |CID|CDATE         |CTIMESTAMP                 |
>  
> |1   |*2021-11-21*|2022-01-18 18:30:33.896|
> |2   |*2021-11-18*|2022-01-18 18:45:59.336|
> |3   |*2021-11-17*|2022-01-18 19:01:04.265|
>  
> Now, reading data from above created table in pyspark shell. We have set 
> *spark.sql.session.timeZone=UTC* to spark while launching pyspark shell. 
> Also, we have set *phoenix.query.dateFormatTimeZone=UTC* in *hbase-site.xml* 
> file. ** 
> Below code snippet read data from phoenix via JDBC and it r{*}ead DATE 
> datatype field as one day less{*}.
> {noformat}
> >>> val df = spark.read.format("jdbc")
>           .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
>           .option("url", "jdbc:phoenix:localhost:2181:/hbase-secure")
>           .option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM 
> NS_TEST.CUSTOMER_TBL) q")
> .load(){noformat}
> >>>df.printSchema()
> root
>  |-- CID: integer (nullable = true)
>  |-- *CDATE: date* (nullable = true)
>  |-- CTIMESTAMP: timestamp (nullable = true)
> >>>{*}df.select('{*}').show(truncate=False)\{*}
> |CID|CDATE         |CTIMESTAMP                   |
>  
> |1     |*2021-11-20*|2022-01-18 18:30:33.896|
> |2     |*2021-11-17*|2022-01-18 18:45:59.336|
> |3     |*2021-11-16*|2022-01-18 19:01:04.265|
>  
> We have also tried using phoenix data source instead of JDBC and below is the 
> code snippet. It also read DATE datatype field as one day less.
> {noformat}
> val df2 = spark.read.format("org.apache.phoenix.spark")
>          .option("table", "NS_TEST.CUSTOMER_TBL")
>          .option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure")
>          .load(){noformat}
> **
> >>>df.printSchema()
> root
>  |-- CID: integer (nullable = true)
>  |-- *CDATE: date* (nullable = true)
>  |-- CTIMESTAMP: timestamp (nullable = true)
> >>>{*}df.select('{*}').show(truncate=False)\{*}
> |CID|CDATE         |CTIMESTAMP                   |
>  
> |1     |*2021-11-20*|2022-01-18 18:30:33.896|
> |2     |*2021-11-17*|2022-01-18 18:45:59.336|
> |3     |*2021-11-16*|2022-01-18 19:01:04.265|
>  
> Please help us on this issue why Phoenix Spark reading DATE datatype field 
> value as {*}one day less{*}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (PHOENIX-6623) Phoenix Spark reading DATE datatype value less than one day from phoenix table

Reply via email to