Lokesh Kumar created PHOENIX-4347:
-------------------------------------

             Summary: Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter issue
                 Key: PHOENIX-4347
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4347
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.11.0
         Environment: CentOS 6.5, Fedora 25
            Reporter: Lokesh Kumar
            Priority: Major
Created a Phoenix table with the below schema:

{code:java}
CREATE TABLE IF NOT EXISTS sample_table (
    id VARCHAR NOT NULL,
    metricid VARCHAR NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    metricvalue DOUBLE,
    CONSTRAINT st_pk PRIMARY KEY(id, metricid, timestamp))
SALT_BUCKETS = 20;
{code}

Inserted some data into this table and loaded it as a Spark Dataset using the Phoenix Spark datasource ('org.apache.phoenix.spark') options. The Spark Dataset's schema is as given below:

root
 |-- ID: string (nullable = true)
 |-- METRICID: string (nullable = true)
 |-- TIMESTAMP: timestamp (nullable = true)
 |-- METRICVALUE: double (nullable = true)

I apply the Dataset's filter operation on the TIMESTAMP column as given below:

{code:java}
Dataset<Row> ds = <Derived from Phoenix>
ds = ds.filter("TIMESTAMP >= CAST('2017-10-31 00:00:00.0' AS TIMESTAMP)")
{code}

This operation throws an exception:

testPhoenixTimestamp(DatasetTest): org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "00" at line 1, column 145.

The generated query looks like this:

{code:java}
2017-11-02 15:29:31,722 INFO [main] org.apache.phoenix.mapreduce.PhoenixInputFormat Select Statement: SELECT "ID","METRICID","TIMESTAMP","0"."METRICVALUE" FROM METRIC_TBR_DATA WHERE ( "TIMESTAMP" IS NOT NULL AND "TIMESTAMP" >= *2017-10-31 00:00:00.0*)
{code}

The issue is highlighted in bold above: the timestamp value is not wrapped in the to_timestamp() function, so Phoenix cannot parse the bare value as a timestamp literal. I have fixed this locally in the org.apache.phoenix.spark.PhoenixRelation class's compileValue() function by checking the value's class: if it is java.sql.Timestamp, I wrap the value with the to_timestamp() function.

Please let me know if there is another way of correctly querying Timestamp values in Phoenix through Spark's Dataset API.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
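For illustration, the local fix described above can be sketched as a small Java helper. This is a hypothetical standalone method, not the actual Scala PhoenixRelation.compileValue() implementation: it shows the idea of detecting a java.sql.Timestamp filter value and emitting a TO_TIMESTAMP('...') call instead of the raw, unquoted value that triggers ERROR 604.

```java
import java.sql.Timestamp;

public class TimestampFilterSketch {

    // Hypothetical helper mirroring the fix described above: when compiling
    // a pushed-down filter value into the generated Phoenix SELECT, wrap
    // java.sql.Timestamp values in TO_TIMESTAMP() so Phoenix parses them
    // as timestamp literals rather than bare tokens.
    static String compileValue(Object value) {
        if (value instanceof Timestamp) {
            // Timestamp.toString() yields "yyyy-mm-dd hh:mm:ss.f...",
            // e.g. "2017-10-31 00:00:00.0", which TO_TIMESTAMP accepts.
            return "TO_TIMESTAMP('" + value + "')";
        }
        if (value instanceof String) {
            // String values still need plain single quotes.
            return "'" + value + "'";
        }
        // Numeric and other values are emitted as-is.
        return value.toString();
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2017-10-31 00:00:00.0");
        // Without the wrapping, the WHERE clause would contain the bare
        // text 2017-10-31 00:00:00.0 and fail to parse.
        System.out.println(compileValue(ts));
        // prints: TO_TIMESTAMP('2017-10-31 00:00:00.0')
    }
}
```

With this wrapping, the generated clause becomes "TIMESTAMP" >= TO_TIMESTAMP('2017-10-31 00:00:00.0'), which the Phoenix parser accepts.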