Danke Liu created SPARK-47542: --------------------------------- Summary: spark cannot hit oracle's index when column type is DATE Key: SPARK-47542 URL: https://issues.apache.org/jira/browse/SPARK-47542 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.4 Reporter: Danke Liu
When I use spark's jdbc to pull data from oracle, it will not hit the index if the pushed filter's type in oralce is DATE. Here is my scenario: first I created a dataframe that read from oracle: val df = spark.read.format("jdbc"). option("url", url). option("driver", driver). option("user", user). option("password", passwd). option("dbtable", "select * from foobar.tbl1") .load() then I pushed a filter to the dataframe like this: df.filter("""`update_time` >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd HH:mm:ss') """).count() this will not hit the index on update_time column. Reason: the update_time column in oracle is DATE type, this mapped to spark has became Timestamp(because precision of DATE in oracle is second), and when I pushed a filter to oracle, it will hit the codes bellow: // class is org.apache.spark.sql.jdbc.OracleDialect override def compileValue(value: Any): Any = value match { // The JDBC drivers support date literals in SQL statements written in the // format: \{d 'yyyy-mm-dd'} and timestamp literals in SQL statements written // in the format: \{ts 'yyyy-mm-dd hh:mm:ss.f...'}. For details, see // 'Oracle Database JDBC Developer’s Guide and Reference, 11g Release 1 (11.1)' // Appendix A Reference Information. case stringValue: String => s"'${escapeSql(stringValue)}'" case timestampValue: Timestamp => "\{ts '" + timestampValue + "'}" case dateValue: Date => "\{d '" + dateValue + "'}" case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ") case _ => value } and this "update_time >= \{ts '2024-03-12 06:18:17'}" will never hit the index. In my case, as a work around, I just change the code to this: {color:#cc7832}case {color}timestampValue: Timestamp =>{color:#6a8759}s"{color}{color:#6a8759}to_date({color}{{color:#9876aa}dateFormat{color}.format(timestampValue)}{color:#6a8759},'yyyy-MM-dd HH:mi:ss'){color}{color:#6a8759}"{color} then it worked well. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org