[ https://issues.apache.org/jira/browse/SPARK-47542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danke Liu updated SPARK-47542:
------------------------------
Description:

When I use Spark's JDBC data source to pull data from Oracle, the query will not hit the index if the pushed-down filter's column type in Oracle is DATE.

Here is my scenario: first I created a DataFrame that reads from Oracle:

val df = spark.read.format("jdbc").
  option("url", url).
  option("driver", driver).
  option("user", user).
  option("password", passwd).
  option("dbtable", "select * from foobar.tbl1")
  .load()

Then I applied a filter to the DataFrame like this:

df.filter("""`update_time` >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd HH:mm:ss')""").count()

This does not hit the index on the update_time column.

Reason: the update_time column in Oracle has type DATE, which Spark maps to Timestamp (because the precision of Oracle's DATE is one second). When the filter is pushed down to Oracle, it goes through the code below:

// class is org.apache.spark.sql.jdbc.OracleDialect
override def compileValue(value: Any): Any = value match {
  // The JDBC drivers support date literals in SQL statements written in the
  // format: {d 'yyyy-mm-dd'} and timestamp literals in SQL statements written
  // in the format: {ts 'yyyy-mm-dd hh:mm:ss.f...'}. For details, see
  // 'Oracle Database JDBC Developer's Guide and Reference, 11g Release 1 (11.1)'
  // Appendix A Reference Information.
  case stringValue: String => s"'${escapeSql(stringValue)}'"
  case timestampValue: Timestamp => "{ts '" + timestampValue + "'}"
  case dateValue: Date => "{d '" + dateValue + "'}"
  case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ")
  case _ => value
}

The resulting predicate "update_time >= {ts '2024-03-12 06:18:17'}" will never hit the index, because the {ts ...} literal is a TIMESTAMP, so Oracle implicitly converts the DATE column before comparing and the index cannot be used.

In my case, as a workaround, I changed the Timestamp case to this:

case timestampValue: Timestamp => s"to_date('${dateFormat.format(timestampValue)}', 'yyyy-MM-dd HH:mi:ss')"

Then it worked well.
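For illustration, below is a minimal, self-contained Scala sketch of the two ways the pushed-down literal can be rendered. The SimpleDateFormat pattern, the object and method names, and the 'yyyy-mm-dd hh24:mi:ss' Oracle format model are assumptions made for this sketch (the report's dateFormat helper is not shown); this is a demonstration of the difference, not the actual patch.

import java.sql.Timestamp
import java.text.SimpleDateFormat

object LiteralRenderingSketch {
  // Assumed formatter standing in for the 'dateFormat' referenced in the workaround.
  private val dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // What OracleDialect.compileValue currently emits: a JDBC timestamp escape.
  def asTimestampEscape(ts: Timestamp): String = "{ts '" + ts + "'}"

  // The workaround's rendering: an Oracle DATE expression built with to_date,
  // so the DATE column is compared without implicit conversion.
  def asToDate(ts: Timestamp): String =
    s"to_date('${dateFormat.format(ts)}', 'yyyy-mm-dd hh24:mi:ss')"

  def main(args: Array[String]): Unit = {
    val ts = Timestamp.valueOf("2024-03-12 06:18:17")
    println(s"update_time >= ${asTimestampEscape(ts)}") // update_time >= {ts '2024-03-12 06:18:17.0'}
    println(s"update_time >= ${asToDate(ts)}")          // update_time >= to_date('2024-03-12 06:18:17', 'yyyy-mm-dd hh24:mi:ss')
  }
}

The essential point is that the second form reaches Oracle as a DATE expression rather than a TIMESTAMP escape, so no conversion is applied to the indexed DATE column and a normal index range scan remains possible.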
> spark cannot hit oracle's index when column type is DATE
> --------------------------------------------------------
>
>                 Key: SPARK-47542
>                 URL: https://issues.apache.org/jira/browse/SPARK-47542
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.4
>            Reporter: Danke Liu
>            Priority: Minor
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org