Re: [DISCUSS] Support temporary tables in SQL API

Dawid Wysakowicz Wed, 24 Jul 2019 03:04:56 -0700

Hi all,

Thank you Xuefu for clarifying your opinion. Now we have 3 votes for
each of the options. To conclude this discussion I am willing to change
my vote to option 3 as I had only a slight preference towards option 2.


Therefore the final results of the poll are as follows:

/option #2: 2 votes(Kurt, Aljoscha)/

/option #3: 4 votes(Timo, Jingsong, Xuefu, me)/

I will prepare appropriate PRs according to the decision (unless
somebody objects). We will revisit the long-term solution in a separate
thread as part of the 1.10 release after 1.9 is released.

Thank you all for your opinions!

Best,

Dawid

On 24/07/2019 09:35, Aljoscha Krettek wrote:
> Isn’t https://issues.apache.org/jira/browse/FLINK-13279 already a sign that
> there are surprises for users if we go with option #3?
>
> Aljoscha
>
>> On 24. Jul 2019, at 00:33, Xuefu Z <usxu...@gmail.com> wrote:
>>
>> I favored #3 if that wasn't obvious.
>>
>> The usability issue with #2 makes Hive too hard to use. #3 keeps the old
>> behavior for existing users who don't have Hive and thus have only one,
>> in-memory catalog. If a user does register Hive, he/she understands that
>> there are multiple catalogs and that a fully qualified table name is
>> necessary. Thus, #3 has no impact (and no surprises) for existing users,
>> and the new requirement of fully qualified names applies only to users of
>> the new feature (multiple catalogs), which seems very natural.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Jul 23, 2019 at 5:47 AM Dawid Wysakowicz <dwysakow...@apache.org>
>> wrote:
>>
>>> I think we all agree so far that we should implement one of the short-term
>>> solutions for the 1.9 release (#2 or #3) and continue the discussion on
>>> option #1 for the next release. Personally I prefer option #2, because it
>>> is closest to the current behavior and, as Kurt said, it is the most
>>> intuitive one, but I am also fine with option #3.
>>>
>>> To sum up the opinions so far:
>>>
>>> *option #2: 3 votes(Kurt, Aljoscha, me)*
>>>
>>> *option #3: 2 votes(Timo, Jingsong)*
>>>
>>> I wasn't sure which option out of the two Xuefu prefers.
>>>
>>> I would like to conclude the discussion by the end of tomorrow, so that we
>>> can prepare a proper fix as soon as possible. Therefore I would suggest
>>> proceeding with the option that has the most votes by tomorrow (*July
>>> 24th 12:00 CET*), unless there are some hard objections.
>>>
>>>
>>> Comment on option #1 concerns:
>>>
>>> I agree with Jingsong on that. I think there are some benefits of the
>>> approach, as it makes Flink in control of the temporary tables.
>>>
>>> 1. We have a unified behavior across all catalogs, including the catalogs
>>> that do not support temporary tables natively.
>>>
>>> 2. As Flink is in control of the temporary tables it makes it easier to
>>> control their lifecycle.
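
The two benefits listed above can be illustrated with a minimal sketch. It assumes temporary tables are kept in a single map owned by Flink and keyed by fully qualified path, which is how the option #1 proposal is described elsewhere in this thread. The class and method names are hypothetical and are not Flink API, and letting a temporary table shadow a permanent one with the same path is an assumption of the sketch, not a decision from the discussion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Flink code: temporary tables live in one map that
// is managed outside of any individual catalog.
final class TemporaryTableRegistrySketch {
    // Temporary tables, keyed by "catalog.database.table".
    private final Map<String, String> temporaryTables = new HashMap<>();
    // Stand-in for tables stored in the real catalogs (assumption).
    private final Map<String, String> permanentTables = new HashMap<>();

    void registerPermanent(String path, String definition) {
        permanentTables.put(path, definition);
    }

    // Works the same way for every catalog, even one without native
    // support for temporary tables.
    void createTemporaryTable(String path, String definition) {
        temporaryTables.put(path, definition);
    }

    // Assumption: a temporary table shadows a permanent one with the same path.
    Optional<String> lookup(String path) {
        String temporary = temporaryTables.get(path);
        if (temporary != null) {
            return Optional.of(temporary);
        }
        return Optional.ofNullable(permanentTables.get(path));
    }

    // Lifecycle control: all temporary tables go away with the session.
    void closeSession() {
        temporaryTables.clear();
    }
}
```

In this sketch, closing the session drops every temporary table in one place, regardless of which catalog its path points to.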
>>>
>>> Best,
>>>
>>> Dawid
>>> On 23/07/2019 11:40, JingsongLee wrote:
>>>
>>> I also think we should recommend that users use the Catalog API for
>>> createTable and createFunction (I guess most scenarios do not use
>>> temporary objects). With that recommendation, option #3 is a good fit.
>>>
>>> Best, JingsongLee
>>>
>>>
>>> ------------------------------------------------------------------
>>> From: JingsongLee <lzljs3620...@aliyun.com.INVALID>
>>> Send Time: Tuesday, July 23, 2019, 17:35
>>> To: dev <dev@flink.apache.org>
>>> Subject:Re: [DISCUSS] Support temporary tables in SQL API
>>>
>>> Thanks Dawid and everyone else.
>>> +1 for using option #3 for 1.9.0 and going with option #1
>>> in 1.10.0.
>>>
>>> Regarding Xuefu's concern, I don't know how necessary it is for each
>>> catalog to deal with tmpView. I think a Catalog is different from a DB; we
>>> can have a single concept for tmpView, which makes it easier for users to
>>> understand.
>>>
>>> Regarding option #2, it is hard to use if we make users use fully
>>> qualified names for the Hive catalog. Wouldn't that experience be too poor?
>>>
>>> Best, Jingsong Lee
>>>
>>>
>>> ------------------------------------------------------------------
>>> From: Kurt Young <ykt...@gmail.com>
>>> Send Time: Tuesday, July 23, 2019, 17:03
>>> To: dev <dev@flink.apache.org>
>>> Subject:Re: [DISCUSS] Support temporary tables in SQL API
>>>
>>> Thanks Dawid for driving this discussion.
>>> Personally, I would +1 for using option #2 for 1.9.0 and go with option #1
>>> in 1.10.0.
>>>
>>> Regarding Xuefu's concern about option #1, I think we could also try to
>>> reuse the in-memory catalog for the built-in temporary table storage.
>>>
>>> Regarding option #2 and option #3, from the user's perspective, IIUC option
>>> #2 allows users to reference temporary tables by simple name, while they
>>> must use fully qualified names for external catalogs. Option #3 provides
>>> the opposite behavior: users can use simple names for external tables after
>>> changing the current catalog and current database, but have to use fully
>>> qualified names for temporary tables. IMO, option #2 is more
>>> straightforward.
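
The contrast drawn above can be sketched as two name-resolution rules. This is only an illustration of the behavior as summarized in this message, not the actual implementation. The class, the method names, and the `default_catalog` / `default_database` constants are placeholders chosen for the sketch, and any name containing a dot is treated as already qualified to keep the example short.

```java
// Hypothetical sketch of how a table name could be expanded under each option.
final class NameResolutionSketch {
    // Placeholder names for the built-in catalog that holds temporary tables.
    static final String BUILTIN_CATALOG = "default_catalog";
    static final String BUILTIN_DATABASE = "default_database";

    // Simplification: any dotted name counts as qualified.
    static boolean isQualified(String name) {
        return name.indexOf('.') >= 0;
    }

    // Option #2 as summarized above: a simple name refers to the built-in
    // catalog, so external tables need a fully qualified name.
    static String resolveOption2(String name) {
        if (isQualified(name)) {
            return name;
        }
        return BUILTIN_CATALOG + "." + BUILTIN_DATABASE + "." + name;
    }

    // Option #3 as summarized above: a simple name refers to the current
    // catalog and database, so temporary tables need a fully qualified name
    // once the current catalog has been changed.
    static String resolveOption3(String name, String currentCatalog, String currentDatabase) {
        if (isQualified(name)) {
            return name;
        }
        return currentCatalog + "." + currentDatabase + "." + name;
    }
}
```

With a current catalog `hive` and database `db1`, the simple name `t` expands to the built-in path under the first rule and to `hive.db1.t` under the second.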
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Tue, Jul 23, 2019 at 4:01 PM Aljoscha Krettek <aljos...@apache.org>
>>> wrote:
>>>
>>>
>>> I would be fine with option 3) but I think option 2) is the more implicit
>>> solution that has less surprising behaviour.
>>>
>>> Aljoscha
>>>
>>>
>>> On 22. Jul 2019, at 23:59, Xuefu Zhang <xu...@apache.org> wrote:
>>>
>>> Thanks to Dawid for initiating the discussion. Overall, I agree with Timo
>>> that for 1.9 we should have some quick and simple solution, leaving time
>>> for more thorough discussions for 1.10.
>>>
>>> In particular, I'm not fully on board with solution #1. For one thing, it
>>> seems to propose storing all temporary objects in a memory map in
>>> CatalogManager, and the memory map duplicates the functionality of the
>>> in-memory catalog, which also stores temporary objects. For another, as
>>> pointed out by the google doc, different DBs may handle temporary tables
>>> differently, and accordingly it may make more sense to let each catalog
>>> handle its temporary objects.
>>>
>>> Therefore, postponing the fix buys us time to flesh out all the details.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Mon, Jul 22, 2019 at 7:19 AM Timo Walther <twal...@apache.org> wrote:
>>>
>>>
>>> Thanks for summarizing our offline discussion Dawid! Even though I would
>>> prefer solution 1 instead of releasing half-baked features, I also
>>> understand that the Table API should not further block the next release.
>>> Therefore, I would be fine with solution 3 but introduce the new
>>> user-facing `createTemporaryTable` methods as synonyms of the existing
>>> ones already. This allows us to deprecate the methods with undefined
>>> behavior as early as possible.
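
A minimal sketch of the synonym approach suggested above: the new, explicitly named method does the work, and the older method delegates to it so that it can be deprecated right away. Only `createTemporaryTable` is named in this message. The class, the `registerTable` name for the older method, and the string-based signatures are assumptions made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Table API: shows a new method introduced as a
// synonym of an existing one so the old name can be deprecated early.
final class TableRegistrationSketch {
    private final Map<String, String> tables = new HashMap<>();

    // New user-facing method with explicit "temporary" semantics.
    void createTemporaryTable(String name, String definition) {
        tables.put(name, definition);
    }

    // Older method (name assumed for this sketch), kept as a pure synonym.
    @Deprecated
    void registerTable(String name, String definition) {
        createTemporaryTable(name, definition);
    }

    boolean hasTable(String name) {
        return tables.containsKey(name);
    }
}
```

Because both methods share one code path, existing callers keep working while new code can move to the explicitly named method.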
>>>
>>> Thanks,
>>> Timo
>>>
>>>
>>> On 22.07.19 at 16:13, Dawid Wysakowicz wrote:
>>>
>>> Hi all,
>>>
>>> When working on FLINK-13279[1] we realized we could benefit from
>>> better support for temporary objects in the Catalog API/Table API.
>>> Unfortunately we are already long past the feature freeze, which is why I
>>> wanted to get some opinions from the community on how we should proceed
>>> with this topic. I tried to prepare a summary of the current state and 3
>>> different suggested approaches that we could take. Please see the
>>> attached document[2].
>>>
>>> I will appreciate your thoughts!
>>>
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-13279
>>>
>>> [2] https://docs.google.com/document/d/1RxLj4tDB9GXVjF5qrkM38SKUPkvJt_BSefGYTQ-cVX4/edit?usp=sharing
>>>
>>>
>> -- 
>> Xuefu Zhang
>>
>> "In Honey We Trust!"
>
