[ https://issues.apache.org/jira/browse/PHOENIX-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhongyuhai updated PHOENIX-5035:
--------------------------------
    Attachment:     (was: patch)

> phoenix-spark dataframe filters date or timestamp type with error
> ----------------------------------------------------------------
>
> Key: PHOENIX-5035
> URL: https://issues.apache.org/jira/browse/PHOENIX-5035
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.13.0, 4.14.0, 4.13.1, 5.0.0, 4.14.1
> Environment: HBase: apache 1.2
>              Phoenix: 4.13.1-HBase-1.2
>              Hadoop: CDH 2.6
>              Spark: 2.3.1
> Reporter: zhongyuhai
> Priority: Critical
> Labels: patch, pull-request-available
> Attachments: PHOENIX-5035.patch, table desc.png
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> *Table description:*
> See the attachment "table desc.png".
>
> *Code:*
> val df = SparkUtil.getActiveSession().read.format("org.apache.phoenix.spark").options(options).load()
> df.filter("INCREATEDDATE = date'2018-07-14'")
>
> *Exception:*
> java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997
>     at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:201)
>     at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:87)
>     at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:127)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
>
> *Analysis:*
> The problem is in org.apache.phoenix.spark.PhoenixRelation.compileValue(value: Any): Any:
>
> {code:java}
> private def compileValue(value: Any): Any = {
>   value match {
>     case stringValue: String => s"'${escapeStringConstant(stringValue)}'"
>     // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
>     // Spark 1.4
>     case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Spark 1.5
>     case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>
>     // Pass through anything else
>     case _ => value
>   }
> }
> {code}
>
> compileValue only quotes String values; every other type is passed through unchanged and ends up in the query as its toString. As a result, the Spark filter condition "INCREATEDDATE = date'2018-07-14'" is translated to the Phoenix filter condition "INCREATEDDATE = 2018-07-14". Phoenix evaluates the unquoted 2018-07-14 as integer arithmetic (2018 - 7 - 14 = 1997), so it compares a DATE column with a BIGINT and throws ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997.
>
> *Solution:*
> Add handling for the other types, such as Date and Timestamp:
>
> {code:java}
> // Imports needed by the new Date and Timestamp cases
> import java.text.SimpleDateFormat
> import org.apache.hadoop.conf.Configuration
> import org.apache.phoenix.query.{HBaseFactoryProvider, QueryServices}
> import org.apache.phoenix.util.DateUtil
>
> private def compileValue(value: Any): Any = {
>   value match {
>     case stringValue: String => s"'${escapeStringConstant(stringValue)}'"
>     // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
>     // Spark 1.4
>     case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Spark 1.5
>     case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Render dates as a Phoenix date literal, using the configured date format
>     case d if (isClass(d, "java.util.Date") || isClass(d, "java.sql.Date")) => {
>       val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
>       val dateFormat = config.get(QueryServices.DATE_FORMAT_ATTRIB, DateUtil.DEFAULT_DATE_FORMAT)
>       val df = new SimpleDateFormat(dateFormat)
>       s"date'${df.format(d)}'"
>     }
>     // Render timestamps as a Phoenix timestamp literal, using the configured timestamp format
>     case dt if (isClass(dt, "java.sql.Timestamp")) => {
>       val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
>       val dateTimeFormat = config.get(QueryServices.TIMESTAMP_FORMAT_ATTRIB, DateUtil.DEFAULT_TIMESTAMP_FORMAT)
>       val df = new SimpleDateFormat(dateTimeFormat)
>       s"timestamp'${df.format(dt)}'"
>     }
>     // Pass through anything else
>     case _ => value
>   }
> }
> {code}
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
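
For readers who want to try the literal formatting without a Phoenix or Spark setup, here is a minimal standalone sketch of the idea in the proposed fix. It is not the code from PHOENIX-5035.patch: the object name LiteralSketch, the helper escape, and the two hard-coded format patterns are assumptions made for illustration, whereas the proposed fix reads the patterns from the Phoenix configuration. It also matches on types rather than on class names, so the Timestamp case has to come before the java.util.Date case.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

// Hypothetical standalone sketch; not part of phoenix-spark.
object LiteralSketch {
  // Assumed patterns for illustration only. The proposed fix looks them up
  // via QueryServices.DATE_FORMAT_ATTRIB and TIMESTAMP_FORMAT_ATTRIB instead.
  val DatePattern = "yyyy-MM-dd"
  val TimestampPattern = "yyyy-MM-dd HH:mm:ss.SSS"

  // Double any single quote so the value is safe inside a SQL string literal.
  def escape(s: String): String = s.replace("'", "''")

  def compileValue(value: Any): Any = value match {
    case s: String => s"'${escape(s)}'"
    // java.sql.Timestamp extends java.util.Date, so it must be matched first.
    case ts: Timestamp =>
      s"timestamp'${new SimpleDateFormat(TimestampPattern).format(ts)}'"
    // Covers java.util.Date and java.sql.Date.
    case d: java.util.Date =>
      s"date'${new SimpleDateFormat(DatePattern).format(d)}'"
    // Pass through anything else (numbers, booleans, ...).
    case other => other
  }
}
```

With these assumed patterns, LiteralSketch.compileValue(java.sql.Date.valueOf("2018-07-14")) yields date'2018-07-14', so the filter keeps a quoted date literal instead of the bare 2018-07-14 that Phoenix evaluates as arithmetic. SimpleDateFormat formats in the JVM default time zone here; whether that matches the time zone Phoenix uses when parsing the literal is a separate question that this sketch does not address.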