External hive metastore (remote) managed tables

2020-05-28 Thread Debajyoti Roy
Hi, anyone knows the behavior of dropping managed tables in case of external hive meta store: Deletion of the data (e.g. from object store) happens from Spark sql or, the external hive metastore ? Confused by local mode and remote mode codes.

Re: Why Apache Spark doesn't use Calcite?

2020-01-15 Thread Debajyoti Roy
> also can improve the existing CBO and make it more general. The paper of > Spark SQL was published 5 years ago. A lot of great contributions were made > in the past 5 years. > > Cheers, > > Xiao > > Debajyoti Roy 于2020年1月15日周三 上午9:23写道: > >> Thanks all, and Matei

Re: Why Apache Spark doesn't use Calcite?

2020-01-15 Thread Debajyoti Roy
Thanks all, and Matei. TL;DR of the conclusion for my particular case: Qualitatively, while Catalyst[1] tries to mitigate learning curve and maintenance burden, it lacks the dynamic programming approach used by Calcite[2] and risks falling into local minima. Quantitatively, there is no

Spark Dataset transformations for time based events

2018-12-25 Thread Debajyoti Roy
Hope everyone is enjoying their holidays. If anyone here ran into these time based event transformation patterns or have a strong opinion about the approach please let me know / reply in SO: 1. Enrich using as-of-time:

Given events with start and end times, how to count the number of simultaneous events using Spark?

2018-09-26 Thread Debajyoti Roy
The problem statement and an approach to solve it using windows is described here: https://stackoverflow.com/questions/52509498/given-events-with-start-and-end-times-how-to-count-the-number-of-simultaneous-e Looking for more elegant/performant solutions, if they exist. TIA !