[jira] [Commented] (PHOENIX-4347) Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter issue

Josh Mahonin (JIRA) Thu, 02 Nov 2017 08:28:23 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235907#comment-16235907
 ]


Josh Mahonin commented on PHOENIX-4347:
---------------------------------------

Can you post this question to the phoenix-users mailing list? I suspect someone 
may have run into this and found a way to do it already. However, if you're 
able to provide a reproducible unit test in PhoenixSparkIT [ 1 ] which 
necessitates a patch, a contribution would be most welcome.

Thanks! 

[ 1 ] 
https://github.com/apache/phoenix/blob/master/phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala

> Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter issue
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4347
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4347
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>         Environment: CentOS 6.5, Fedora 25
>            Reporter: Lokesh Kumar
>            Priority: Major
>              Labels: phoenix, spark-sql
>
> Created a Phoenix table with below schema:
> {code:java}
> CREATE TABLE IF NOT EXISTS sample_table (
>   id VARCHAR NOT NULL, 
>   metricid VARCHAR NOT NULL,
>   timestamp TIMESTAMP NOT NULL,
>   metricvalue DOUBLE, 
>   CONSTRAINT st_pk PRIMARY KEY(id,metricid,timestamp)) SALT_BUCKETS = 20;
> {code}
> Inserted some data into this and loaded as Spark Dataset using the Phoenix 
> spark datasource  ('org.apache.phoenix.spark') options.
> The Spark Dataset's schema is as given below:
> root
>  |-- ID: string (nullable = true)
>  |-- METRICID: string (nullable = true)
>  |-- TIMESTAMP: timestamp (nullable = true)
>  |-- METRICVALUE: double (nullable = true)
> I apply the Dataset's filter operation on Timestamp column as given below:
> {code:java}
> Dataset<Row> ds = <Derived from Phoenix>
> ds = ds.filter("TIMESTAMP >= CAST('2017-10-31 00:00:00.0' AS TIMESTAMP)")
> {code}
> This operation throws me an exception as:
>  testPhoenixTimestamp(DatasetTest): 
> org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): 
> Syntax error. Mismatched input. Expecting "RPAREN", got "00" at line 1, 
> column 145.
> The generated query looks like this:
> {code:java}
> 2017-11-02 15:29:31,722 INFO  [main] 
> org.apache.phoenix.mapreduce.PhoenixInputFormat
> Select Statement: SELECT "ID","METRICID","TIMESTAMP","0"."METRICVALUE" FROM 
> SAMPLE_TABLE WHERE ( "TIMESTAMP" IS NOT NULL AND "TIMESTAMP" >= 2017-10-31 
> 00:00:00.0)
> {code}
> The issue is with Timestamp filter condition, where the timestamp value is 
> not wrapped in to_timestamp() function.
> I have fixed this locally in org.apache.phoenix.spark.PhoenixRelation class 
> compileValue() function, by checking the value's class. If it is 
> java.sql.Timestamp then I am wrapping the value with to_timestamp() function.
> Please let me know if there is another way of correctly querying Timestamp 
> values in Phoenix through Spark's Dataset API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (PHOENIX-4347) Spark Dataset loaded using Phoenix Spark Datasource - Timestamp filter issue

Reply via email to