[jira] [Updated] (SPARK-47542) spark cannot hit oracle's index when column type is DATE

2024-03-25 Thread Danke Liu (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-47542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danke Liu updated SPARK-47542:
--
Description: 
When I use Spark's JDBC data source to pull data from Oracle, the query will not hit the index if the column referenced by the pushed filter is of type DATE in Oracle.

Here is my scenario:

First I create a DataFrame that reads from Oracle:

val df = spark.read.format("jdbc").                                             
                           
          option("url", url).                                                   
                       
          option("driver", driver).                                             
                       
          option("user", user).                                                 
                  
          option("password", passwd).                                           
                                                      
          option("dbtable", "select * from foobar.tbl1") 
          .load()

Then I apply a filter to the DataFrame like this:

df.filter("""`update_time` >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd HH:mm:ss')""").count()

This will not hit the index on the update_time column.
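
For what it's worth, the filter is pushed down to the JDBC source either way; the problem is only how the literal is rendered in the generated Oracle SQL. A quick, illustrative way to confirm the pushdown (the exact plan text depends on the Spark version):

df.filter("""`update_time` >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd HH:mm:ss')""").explain(true)
// The JDBC scan node in the physical plan lists the pushed predicate, e.g.
// PushedFilters: [IsNotNull(update_time), GreaterThanOrEqual(update_time, ...)]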

 

Reason:

The update_time column in Oracle is of type DATE, which Spark maps to Timestamp (because Oracle's DATE type has second precision). When the filter is pushed down to Oracle, it goes through the following code in org.apache.spark.sql.jdbc.OracleDialect:

 

override def compileValue(value: Any): Any = value match {
    // The JDBC drivers support date literals in SQL statements written in the
    // format: {d 'yyyy-mm-dd'} and timestamp literals in SQL statements written
    // in the format: {ts 'yyyy-mm-dd hh:mm:ss.f...'}. For details, see
    // 'Oracle Database JDBC Developer’s Guide and Reference, 11g Release 1 (11.1)'
    // Appendix A Reference Information.
    case stringValue: String => s"'${escapeSql(stringValue)}'"
    case timestampValue: Timestamp => "{ts '" + timestampValue + "'}"
    case dateValue: Date => "{d '" + dateValue + "'}"
    case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ")
    case _ => value
  }

 

As a result, the pushed condition becomes update_time >= {ts '2024-03-12 06:18:17'}, which will never hit the index.
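
The reason, as far as I can tell, is Oracle's implicit type conversion: when a DATE column is compared against a TIMESTAMP literal, Oracle converts the column to TIMESTAMP, so a normal index on update_time cannot be used. Roughly:

update_time >= {ts '2024-03-12 06:18:17'}                               -- column converted to TIMESTAMP, index skipped
update_time >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd HH:mi:ss')    -- DATE-to-DATE comparison, index usable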

In my case, as a workaround, I changed the code to this:

case timestampValue: Timestamp =>
  s"to_date('${dateFormat.format(timestampValue)}', 'yyyy-MM-dd HH:mi:ss')"

 

After this modification, it worked well.
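
An alternative that avoids patching Spark at all (I did not go this way, but it may help others) is to write the date predicate into the dbtable subquery so Oracle receives to_date() directly, and drop the Spark-side filter. A sketch only; the HH24 format mask and the parenthesized subquery form are my own choices here:

val pushedQuery =
  """(select * from foobar.tbl1
    |  where update_time >= to_date('2024-03-12 06:18:17', 'yyyy-mm-dd hh24:mi:ss'))""".stripMargin

val dfPrefiltered = spark.read.format("jdbc")
  .option("url", url)
  .option("driver", driver)
  .option("user", user)
  .option("password", passwd)
  .option("dbtable", pushedQuery)  // parenthesized subquery used as an inline view
  .load()

dfPrefiltered.count()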

 

 

 

 
