Re: [DISCUSS][Catalog API] Deprecate 4 Catalog API that takes two parameters which are (dbName, tableName/functionName)

2022-07-08 Thread Rui Wang
Yes. The current goal is a purely educational deprecation. So given the proposal: 1. existing users or users who do not care about catalog names in table identifiers can still use all the APIs that maintain their past behavior. 2. new users who intend to use table identifiers with catalog names get

Re: [DISCUSS][Catalog API] Deprecate 4 Catalog API that takes two parameters which are (dbName, tableName/functionName)

2022-07-08 Thread Wenchen Fan
It's better to keep all APIs working. But in this case, I really have no idea how to make these 4 APIs reasonable. For example, tableExists(dbName: String, tableName: String) currently checks if table "dbName.tableName" exists in the Hive metastore, and does not work with v2 catalogs at all. It's

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Adam Binford
Dang, I was hoping it was the second one. In our case the data is too large to run the whole backfill for the aggregation in a single batch (the shuffle is too big). We currently resort to manually batching (i.e. not streaming) the backlog (anything older than the watermark) when we need to

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Jungtaek Lim
Thanks for the input, Adam! Replying inline.
On Fri, Jul 8, 2022 at 8:48 PM Adam Binford wrote:
> We use Trigger.Once a lot, usually for backfilling data for new streams. I
> feel like I could see a continuing use case for "ignore trigger limits for
> this batch" (ignoring the whole issue with

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Adam Binford
We use Trigger.Once a lot, usually for backfilling data for new streams. I feel like I could see a continuing use case for "ignore trigger limits for this batch" (ignoring the whole issue with re-running the last failed batch vs a new batch), but we haven't actually been able to upgrade yet and

Re: [DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-08 Thread Jungtaek Lim
Bump to get a chance to expose the proposal to a wider audience. Given that there are not many active contributors/maintainers in the Structured Streaming area, I'd consider the discussion as "lazy consensus" to avoid being stuck. I'll give a final reminder early next week, and move forward if there

Re: Apache Spark 3.2.2 Release?

2022-07-08 Thread Dongjoon Hyun
Thank you so much! :)
Dongjoon.
On Thu, Jul 7, 2022 at 6:51 PM Joshua Rosen wrote:
>
> +1; thanks for coordinating this!
>
> I have a few more correctness bugs to add to the list in your original email
> (these were originally missing the 'correctness' JIRA label):
>
> -

Re: [DISCUSS][Catalog API] Deprecate 4 Catalog API that takes two parameters which are (dbName, tableName/functionName)

2022-07-08 Thread Dongjoon Hyun
Thank you for starting the official discussion, Rui. 'Unneeded API' doesn't sound like a good frame for this discussion because it ignores the existing users and code completely. Technically, the above-mentioned reasons look irrelevant to any specific existing bugs or future maintenance cost