[jira] [Commented] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

Josh Mahonin (JIRA) Mon, 13 Feb 2017 06:06:32 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863720#comment-15863720
 ]


Josh Mahonin commented on PHOENIX-3664:
---------------------------------------

Hi [~pablo.castellanos]

I've not seen this before, although I wonder if there's perhaps a few issues at 
play.
1) Some sort of date translation issue between python datetime, pySpark and 
phoenix-spark
2) An issue with how Spark treats the 'java.sql.Date' type, and how Phoenix 
stores it internally

Re: 1) Is it possible to attempt a similar code block using Scala in the 
spark-shell? I think it should be pretty much the same code, just replace 
{{datetime.datetime.now}} with {{System.currentTimeMillis}}

Re: 2) You might have some success passing the 'dateAsTimestamp' flag to Spark. 
Effectively Spark truncates the HH:MM:SS part of a date off, even though it is 
present in the Phoenix data type. I wonder if pyspark is doing anything strange 
with that.

https://github.com/apache/phoenix/blob/a0e5efcec5a1a732b2dce9794251242c3d66eea6/phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala#L622-L633

> Pyspark: pushing filter by date against apache phoenix
> ------------------------------------------------------
>
>                 Key: PHOENIX-3664
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3664
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Azure HDIndight - pyspark using phoenix client.
>            Reporter: Pablo Castilla
>
> I am trying to filter by date in apache phoenix from pyspark. The column in 
> phoenix is created as Date and the filter is a datetime. When I use explain I 
> see spark doesn't push the filter to phoenix. I have tried a lot of 
> combinations without luck.
> Any way to do it?
> df = sqlContext.read \
>    .format("org.apache.phoenix.spark") \
>   .option("table", "TABLENAME") \
>   .option("zkUrl",zookepperServer +":2181:/hbase-unsecure" ) \
>   .load()
> print(df.printSchema())
> startValidation = datetime.datetime.now()
> print(df.filter(df['FH'] >startValidation).explain(True))
> Results:
> root
>  |-- METER_ID: string (nullable = true)
>  |-- FH: date (nullable = true)
> None
>    == Parsed Logical Plan ==
> 'Filter (FH#53 > 1486726683446150)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Analyzed Logical Plan ==
> METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, 
> ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: 
> int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
> Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Optimized Logical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Physical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- Scan 
> PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
> None
> if I set the FH column as timestamp it pushes the filter but throws an 
> exception:
> Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 
> (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 
> 1, column 219.
>     at 
> org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
>     at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
>     at 
> org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
>     ... 102 more
> Caused by: MismatchedTokenException(106!=129)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
>     at 
> org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6614)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6579)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4615)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4697)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.select_node(PhoenixSQLParser.java:4763)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:789)
>     at 
> org.apache.phoenix.parse.PhoenixSQLParser.statement(PhoenixSQLParser.java:508)
>     at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:108)
>     ... 107 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

Reply via email to