Look at the pushdown plans for all the TPCDS queries here 
<https://github.com/oracle/spark-oracle/wiki/TPCDS-Queries>
We push joins, aggregates, windowing, etc.; as I said, we can completely push 
down 95 of the 99 TPCDS queries.
The generic JDBC data source pushes only single-table scans, filters, and 
partial aggregates. In that case a lot of data is moved from the Oracle 
instance to Spark during query execution.
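To make the data-movement point concrete with a plain-Python analogy (this uses 
the stdlib sqlite3 module as a stand-in database, not Spark or Oracle): when an 
aggregate is pushed into the database, only one row per group crosses the wire; 
without pushdown, every row is shipped to the client, which aggregates itself.

```python
import sqlite3

# In-memory stand-in for the remote database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10), ("east", 20), ("west", 5)])

# Pushdown: the database computes the aggregate; one row per group is returned.
pushed = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

# No pushdown: every row is moved to the client, which aggregates by hand.
moved = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    moved[region] = moved.get(region, 0) + amount

assert pushed == moved == {"east": 30, "west": 5}
```

The results are identical; what differs is how many rows leave the database, 
which is the cost the Spark-on-Oracle pushdown avoids.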

Beyond this, the SQL Macro 
<https://github.com/oracle/spark-oracle/wiki/Spark_SQL_macros> feature can 
translate certain kinds of UDFs to Oracle expressions, which again avoids a lot 
of data movement: instead of the UDF executing in Spark, an equivalent Oracle 
expression is evaluated in Oracle.
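The same idea can be sketched with a stdlib sqlite3 analogy (this is not the 
actual SQL macro API): a client-side UDF registered with create_function runs 
row by row in the client process, whereas translating the call into an 
equivalent SQL expression lets the database engine evaluate it in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# UDF path: the function body executes in the client for every row.
conn.create_function("double_it", 1, lambda x: 2 * x)
udf_result = [r[0] for r in conn.execute("SELECT double_it(x) FROM t")]

# "Macro" path: the call is replaced by an equivalent SQL expression,
# so evaluation happens entirely inside the database engine.
expr_result = [r[0] for r in conn.execute("SELECT 2 * x FROM t")]

assert udf_result == expr_result == [2, 4, 6]
```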

This works with on-premises Oracle; it is currently tested on 19c.

regards,
Harish.

> On Jan 14, 2022, at 2:51 AM, Mich Talebzadeh <mich.talebza...@gmail.com> 
> wrote:
> 
> Hello,
> 
> Thanks for this info.
> 
> Have you tested this feature on on-premises Oracle, say 11c or 12c, besides 
> ADW in the Cloud?
> 
> I can see the transactional feature useful in terms of commit/rollback to 
> Oracle but I cannot figure out the performance gains in your blog etc.
> 
> My concern is we currently connect to Oracle, as well as many other 
> JDBC-compliant databases, through Spark generic JDBC connections 
> <https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html> with the 
> same look and feel. Unless there is an overriding reason, I don't see why 
> there is a need to switch to this feature.
> 
> 
> Cheers
> 
> 
> 
> On Fri, 14 Jan 2022 at 00:50, Harish Butani <rhbutani.sp...@gmail.com 
> <mailto:rhbutani.sp...@gmail.com>> wrote:
> Spark on Oracle is now available as an open source Apache licensed github 
> repo <https://github.com/oracle/spark-oracle>. Build and deploy it as an 
> extension jar in your Spark clusters.
> 
> Use it to combine Apache Spark programs with data in your existing Oracle 
> databases without expensive data copying or query time data movement. 
> 
> The core capability is a set of Optimizer extensions that collapse SQL 
> operator sub-graphs into an OraScan that executes equivalent SQL in Oracle. 
> Physical plan parallelism 
> <https://github.com/oracle/spark-oracle/wiki/Query-Splitting> can be 
> controlled to split Spark tasks to operate on Oracle data block ranges, 
> result-set pages, or table partitions.
> 
> We push down large parts of Spark SQL to Oracle; for example, 95 of the 99 
> TPCDS queries are completely pushed to Oracle. 
> <https://github.com/oracle/spark-oracle/wiki/TPCDS-Queries>
> 
> With Spark SQL macros 
> <https://github.com/oracle/spark-oracle/wiki/Spark_SQL_macros> you can write 
> custom Spark UDFs that get translated and pushed down as Oracle SQL 
> expressions. 
> 
> With DML pushdown <https://github.com/oracle/spark-oracle/wiki/DML-Support> 
> inserts in Spark SQL get pushed as transactionally consistent inserts/updates 
> on Oracle tables.
> 
> See the Quick Start Guide 
> <https://github.com/oracle/spark-oracle/wiki/Quick-Start-Guide> for how to 
> set up an Oracle free-tier ADW instance, load it with TPCDS data, and try out 
> the Spark on Oracle Demo <https://github.com/oracle/spark-oracle/wiki/Demo> 
> on your Spark cluster. 
> 
> More details can be found in our blog 
> <https://hbutani.github.io/blogs/blog/Spark_on_Oracle_Blog.html> and the 
> project wiki <https://github.com/oracle/spark-oracle/wiki>.
> 
> regards,
> Harish Butani
