[
https://issues.apache.org/jira/browse/SPARK-47542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Danke Liu updated SPARK-47542:
------------------------------
Description:
When I use Spark's JDBC source to pull data from Oracle, the pushed-down filter will not hit the index if the filtered column's type in Oracle is DATE.
Here is my scenario:
First I created a DataFrame that reads from Oracle:
val df = spark.read.format("jdbc")
  .option("url", url)
  .option("driver", driver)
  .option("user", user)
  .option("password", passwd)
  .option("dbtable", "select * from foobar.tbl1")
  .load()
Then I applied a filter to the DataFrame like this:
df.filter("""`update_time` >= to_date('2024-03-12 06:18:17', 'yyyy-MM-dd HH:mm:ss')""").count()
This does not hit the index on the update_time column.
Reason:
The update_time column in Oracle is of type DATE, which Spark maps to Timestamp (because Oracle's DATE has second precision). When the filter is pushed down to Oracle, it goes through the following code in org.apache.spark.sql.jdbc.OracleDialect:
override def compileValue(value: Any): Any = value match {
  // The JDBC drivers support date literals in SQL statements written in the
  // format: {d 'yyyy-mm-dd'} and timestamp literals in SQL statements written
  // in the format: {ts 'yyyy-mm-dd hh:mm:ss.f...'}. For details, see
  // 'Oracle Database JDBC Developer's Guide and Reference, 11g Release 1 (11.1)'
  // Appendix A Reference Information.
  case stringValue: String => s"'${escapeSql(stringValue)}'"
  case timestampValue: Timestamp => "{ts '" + timestampValue + "'}"
  case dateValue: Date => "{d '" + dateValue + "'}"
  case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ")
  case _ => value
}
As a result, the pushed condition becomes update_time >= {ts '2024-03-12 06:18:17'}. Oracle treats a {ts ...} literal as a TIMESTAMP, so comparing it against a DATE column forces an implicit conversion on the column side, and the index on update_time is never used.
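To make the mismatch concrete, the Timestamp branch above can be reproduced in plain Scala (no Spark required); TsLiteral and compileTimestamp are hypothetical stand-ins for that single case of OracleDialect.compileValue:

```scala
import java.sql.Timestamp

object TsLiteral {
  // Mirrors the Timestamp branch of OracleDialect.compileValue: the JDBC
  // escape syntax {ts '...'} is what lands in the pushed-down WHERE clause.
  // Timestamp.toString appends the fractional seconds (".0" here).
  def compileTimestamp(ts: Timestamp): String = "{ts '" + ts + "'}"
}

// TsLiteral.compileTimestamp(Timestamp.valueOf("2024-03-12 06:18:17"))
// yields: {ts '2024-03-12 06:18:17.0'}
```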
In my case, as a workaround, I changed that case to the following:
case timestampValue: Timestamp =>
  s"to_date('${dateFormat.format(timestampValue)}', 'yyyy-MM-dd HH:mi:ss')"
After this modification, it worked well.
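For comparison, the workaround's literal formatting can be sketched in plain Scala without Spark on the classpath; OracleDateLiteral and oracleDateLiteral are hypothetical names introduced here, and HH24/MI/SS is assumed on the Oracle side so the hour parses in 24-hour form:

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

object OracleDateLiteral {
  // Render a Timestamp as an Oracle TO_DATE(...) literal. Because TO_DATE
  // yields an Oracle DATE, comparing it against a DATE column needs no
  // implicit conversion, so a normal index on that column stays usable.
  private val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  def oracleDateLiteral(ts: Timestamp): String =
    s"to_date('${fmt.format(ts)}', 'yyyy-MM-dd HH24:MI:SS')"
}
```

In a real job this logic could live in a custom JdbcDialect that overrides compileValue for the Timestamp case and is registered via JdbcDialects.registerDialect, which avoids patching Spark itself.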
> spark cannot hit oracle's index when column type is DATE
> --------------------------------------------------------
>
> Key: SPARK-47542
> URL: https://issues.apache.org/jira/browse/SPARK-47542
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.4
> Reporter: Danke Liu
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]