[
https://issues.apache.org/jira/browse/SPARK-47542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Danke Liu updated SPARK-47542:
------------------------------
Description:
When I use Spark's JDBC source to pull data from Oracle, the pushed-down filter will not hit the index if the filtered column's type in Oracle is DATE.
Here is my scenario:
First I created a DataFrame that reads from Oracle:
val df = spark.read.format("jdbc")
  .option("url", url)
  .option("driver", driver)
  .option("user", user)
  .option("password", passwd)
  .option("dbtable", "select * from foobar.tbl1")
  .load()
Then I applied a filter to the DataFrame like this:
df.filter("""`update_time` >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd HH:mm:ss')""").count()
This does not hit the index on the update_time column.
Reason:
The update_time column in Oracle is of type DATE, which Spark maps to Timestamp (because Oracle's DATE has second precision). When the filter is pushed down to Oracle, it goes through the following code in org.apache.spark.sql.jdbc.OracleDialect:
override def compileValue(value: Any): Any = value match {
  // The JDBC drivers support date literals in SQL statements written in the
  // format: {d 'yyyy-mm-dd'} and timestamp literals in SQL statements written
  // in the format: {ts 'yyyy-mm-dd hh:mm:ss.f...'}. For details, see
  // 'Oracle Database JDBC Developer's Guide and Reference, 11g Release 1 (11.1)'
  // Appendix A Reference Information.
  case stringValue: String => s"'${escapeSql(stringValue)}'"
  case timestampValue: Timestamp => "{ts '" + timestampValue + "'}"
  case dateValue: Date => "{d '" + dateValue + "'}"
  case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ")
  case _ => value
}
As a result, the pushed condition becomes update_time >= {ts '2024-03-12 06:18:17'}. Oracle treats a {ts ...} literal as a TIMESTAMP, so comparing it against a DATE column forces an implicit conversion on the column side, and the index on update_time is never used.
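To make the mismatch concrete, the Timestamp branch above can be reproduced in plain Scala (no Spark required); TsLiteral and compileTimestamp are hypothetical stand-ins for that single case of OracleDialect.compileValue:

```scala
import java.sql.Timestamp

object TsLiteral {
  // Mirrors the Timestamp branch of OracleDialect.compileValue: the JDBC
  // escape syntax {ts '...'} is what lands in the pushed-down WHERE clause.
  // Timestamp.toString appends the fractional seconds (".0" here).
  def compileTimestamp(ts: Timestamp): String = "{ts '" + ts + "'}"
}

// TsLiteral.compileTimestamp(Timestamp.valueOf("2024-03-12 06:18:17"))
// yields: {ts '2024-03-12 06:18:17.0'}
```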
In my case, as a workaround, I changed that case to the following:
case timestampValue: Timestamp =>
  s"to_date('${dateFormat.format(timestampValue)}', 'yyyy-MM-dd HH:mi:ss')"
After this modification, it worked well.
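For comparison, the workaround's literal formatting can be sketched in plain Scala without Spark on the classpath; OracleDateLiteral and oracleDateLiteral are hypothetical names introduced here, and HH24/MI/SS is assumed on the Oracle side so the hour parses in 24-hour form:

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

object OracleDateLiteral {
  // Render a Timestamp as an Oracle TO_DATE(...) literal. Because TO_DATE
  // yields an Oracle DATE, comparing it against a DATE column needs no
  // implicit conversion, so a normal index on that column stays usable.
  private val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  def oracleDateLiteral(ts: Timestamp): String =
    s"to_date('${fmt.format(ts)}', 'yyyy-MM-dd HH24:MI:SS')"
}
```

In a real job this logic could live in a custom JdbcDialect that overrides compileValue for the Timestamp case and is registered via JdbcDialects.registerDialect, which avoids patching Spark itself.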
> spark cannot hit oracle's index when column type is DATE
> --------------------------------------------------------
>
> Key: SPARK-47542
> URL: https://issues.apache.org/jira/browse/SPARK-47542
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.4
> Reporter: Danke Liu
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]