Danke Liu created SPARK-47542:
---------------------------------

             Summary: spark cannot hit oracle's index when column type is DATE
                 Key: SPARK-47542
                 URL: https://issues.apache.org/jira/browse/SPARK-47542
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.2.4
            Reporter: Danke Liu


When I use Spark's JDBC data source to pull data from Oracle, the query will not hit the index if the pushed filter references a column whose Oracle type is DATE.

Here is my scenario:

First I created a DataFrame that reads from Oracle:

val df = spark.read.format("jdbc")
  .option("url", url)
  .option("driver", driver)
  .option("user", user)
  .option("password", passwd)
  .option("dbtable", "select * from foobar.tbl1")
  .load()

Then I pushed a filter to the DataFrame like this:

df.filter("""`update_time` >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd HH:mm:ss')""").count()

This does not hit the index on the update_time column.
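
To confirm that the filter really is pushed down (so the problem is only how the value is rendered), explain() can be used; the plan text in the comments is abridged and its exact form varies by Spark version:

df.filter("""`update_time` >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd HH:mm:ss')""").explain()
// The JDBC scan node lists something like:
//   PushedFilters: [IsNotNull(update_time), GreaterThanOrEqual(update_time,2024-03-12 06:18:17.0)]
// so the pushdown itself happens; the issue is how OracleDialect turns that value into a SQL literal.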

 

Reason:

The update_time column in Oracle is of type DATE, which Spark maps to Timestamp (because Oracle's DATE has second precision). When the filter is pushed down to Oracle, it goes through the code below:

// class is org.apache.spark.sql.jdbc.OracleDialect

override def compileValue(value: Any): Any = value match {
    // The JDBC drivers support date literals in SQL statements written in the
    // format: {d 'yyyy-mm-dd'} and timestamp literals in SQL statements written
    // in the format: {ts 'yyyy-mm-dd hh:mm:ss.f...'}. For details, see
    // 'Oracle Database JDBC Developer’s Guide and Reference, 11g Release 1 (11.1)'
    // Appendix A Reference Information.
    case stringValue: String => s"'${escapeSql(stringValue)}'"
    case timestampValue: Timestamp => "{ts '" + timestampValue + "'}"
    case dateValue: Date => "{d '" + dateValue + "'}"
    case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ")
    case _ => value
  }

 

and this "update_time >= \{ts '2024-03-12 06:18:17'}" will never hit the index.

In my case, as a workaround, I changed the code to this:

case timestampValue: Timestamp => s"to_date('${dateFormat.format(timestampValue)}', 'yyyy-MM-dd HH:mi:ss')"

 

Then it worked well.
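
For reference, a similar workaround can be tried without patching Spark by registering a custom dialect. This is only a minimal sketch: the object name and date format below are my own, and because the built-in OracleDialect also matches the same URL, whether this override actually takes effect depends on how JdbcDialects resolves multiple matching dialects.

import java.sql.Timestamp
import java.text.SimpleDateFormat
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Sketch only: renders Timestamp values as to_date(...) so Oracle compares
// DATE to DATE instead of converting the indexed column to TIMESTAMP.
object OracleDateFriendlyDialect extends JdbcDialect {
  private val dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def compileValue(value: Any): Any = value match {
    case t: Timestamp => s"to_date('${dateFormat.format(t)}', 'yyyy-mm-dd hh24:mi:ss')"
    case other => super.compileValue(other)
  }
}

// Register before creating the DataFrame so pushed filters go through it.
JdbcDialects.registerDialect(OracleDateFriendlyDialect)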

 

 

 

 


