Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Matei Zaharia
I’m pretty sure that Catalyst was built before Calcite, or at least in parallel. Calcite 1.0 was only released in 2015. From a technical standpoint, building Catalyst in Scala also made it more concise and easier to extend than an optimizer written in Java (you can find various presentations

Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Michael Mior
It's fairly common for adapters (Calcite's abstraction of a data source) to push down predicates. However, the API certainly looks a lot different than Catalyst's. -- Michael Mior mm...@apache.org On Mon, Jan 13, 2020 at 09:45, Jason Nerothin wrote: > > The implementation they chose supports

Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Jason Nerothin
The implementation they chose supports predicate pushdown, Datasets, and other features that are not available in Calcite: https://databricks.com/glossary/catalyst-optimizer On Mon, Jan 13, 2020 at 8:24 AM newroyker wrote: > Was there a qualitative or quantitative benchmark done before a

Why Apache Spark doesn't use Calcite?

2020-01-13 Thread newroyker
Was there a qualitative or quantitative benchmark done before a design decision was made not to use Calcite? Are there limitations (for heuristic-based, cost-based, or *-aware optimizers) in Calcite and in frameworks built on top of Calcite, in the context of big data / TPC-H benchmarks? I was unable

Reading 7z file in spark

2020-01-13 Thread HARSH TAKKAR
Hi, Is it possible to read a 7z-compressed file in Spark? Kind Regards Harsh Takkar