Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Matei Zaharia
I’m pretty sure that Catalyst was built before Calcite, or at least in 
parallel. Calcite 1.0 was only released in 2015. From a technical standpoint, 
building Catalyst in Scala also made it more concise and easier to extend than 
an optimizer written in Java (you can find various presentations about how 
Catalyst works).

Matei
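
For readers unfamiliar with the point about Scala, here is a minimal sketch of
the style being described: an optimizer rule written as a pattern match over an
immutable expression tree. The classes and the names ConstantFolding,
transformUp and foldConstants are hypothetical stand-ins chosen for
illustration, not Catalyst's actual classes, and the sketch has no Spark
dependency.

```scala
// Toy expression tree. These case classes are hypothetical stand-ins,
// not Catalyst's real Expression hierarchy.
sealed trait Expr
final case class Literal(value: Int) extends Expr
final case class Attribute(name: String) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object ConstantFolding {
  // Apply `rule` bottom-up: rewrite the children first, then the node itself.
  // Nodes the rule does not match are returned unchanged.
  def transformUp(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr = {
    val withNewChildren = e match {
      case Add(l, r) => Add(transformUp(l)(rule), transformUp(r)(rule))
      case leaf      => leaf
    }
    rule.applyOrElse(withNewChildren, identity[Expr])
  }

  // The whole optimization rule is a single pattern match.
  val foldConstants: PartialFunction[Expr, Expr] = {
    case Add(Literal(a), Literal(b)) => Literal(a + b)
  }

  def optimize(e: Expr): Expr = transformUp(e)(foldConstants)
}
```

With these definitions,
ConstantFolding.optimize(Add(Attribute("x"), Add(Literal(1), Literal(2))))
returns Add(Attribute("x"), Literal(3)). Adding another rewrite means adding
another case clause, which is the kind of conciseness the message refers to.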



-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Michael Mior
It's fairly common for adapters (Calcite's abstraction of a data
source) to push down predicates. However, the API certainly looks a
lot different than Catalyst's.
--
Michael Mior
mm...@apache.org
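
As a rough illustration of what "pushing down predicates" means in general,
the sketch below shows a data source that evaluates the filters it understands
during the scan and hands the rest back to the engine. All types here (Row,
Filter, InMemorySource, Engine) are hypothetical and deliberately simplified;
they mirror neither Calcite's adapter API nor Spark's data source API.

```scala
// Hypothetical toy types for illustration only.
final case class Row(id: Int, country: String)

sealed trait Filter
final case class CountryEquals(value: String) extends Filter
final case class IdGreaterThan(bound: Int) extends Filter

// A source that can evaluate only CountryEquals by itself.
class InMemorySource(rows: Seq[Row]) {
  // Returns the rows that survive the filters the source handled,
  // plus the residual filters the engine must still apply.
  def scan(filters: Seq[Filter]): (Seq[Row], Seq[Filter]) = {
    val (pushed, residual) = filters.partition {
      case CountryEquals(_) => true
      case _                => false
    }
    val scanned = rows.filter { r =>
      pushed.forall {
        case CountryEquals(v) => r.country == v
        case _                => true
      }
    }
    (scanned, residual)
  }
}

object Engine {
  // Engine-side evaluation of any filter against a row.
  def evaluate(f: Filter, r: Row): Boolean = f match {
    case CountryEquals(v) => r.country == v
    case IdGreaterThan(b) => r.id > b
  }

  // Offer all filters to the source, then apply whatever it handed back.
  def query(source: InMemorySource, filters: Seq[Filter]): Seq[Row] = {
    val (rows, residual) = source.scan(filters)
    rows.filter(r => residual.forall(evaluate(_, r)))
  }
}
```

The design point is the split between pushed and residual filters: the source
reduces the data it returns, and the engine stays correct by applying anything
the source could not handle.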





Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Jason Nerothin
The implementation they chose supports predicate pushdown, Datasets, and
other features that are not available in Calcite:

https://databricks.com/glossary/catalyst-optimizer


-- 
Thanks,
Jason


Why Apache Spark doesn't use Calcite?

2020-01-13 Thread newroyker
Was a qualitative or quantitative benchmark done before the design
decision was made not to use Calcite?

Are there limitations (for heuristic-based, cost-based, * aware optimizers)
in Calcite, or in frameworks built on top of Calcite, in the context of big
data / TPC-H benchmarks?

I was unable to dig up anything concrete from the user group or Jira. I would
appreciate it if any Catalyst veteran here could give me pointers. I am trying
to defend Spark/Catalyst.








Reading 7z file in spark

2020-01-13 Thread HARSH TAKKAR
Hi,


Is it possible to read a 7z-compressed file in Spark?


Kind Regards
Harsh Takkar