Hi devs,

I’d like to push FLIP-221 forward a little bit. Recently we had some offline 
discussions and updated the FLIP. Here’s the diff compared to the previous 
version:

1. (Async)LookupFunctionProvider is designed as the base interface for 
constructing lookup functions.
2. On top of it we extend PartialCachingLookupProvider / 
FullCachingLookupProvider for the partial and full caching modes.
3. Introduce CacheReloadTrigger for specifying the reload strategy in full 
caching mode, and provide 2 default implementations 
(PeriodicCacheReloadTrigger / TimedCacheReloadTrigger).
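To make the new hierarchy easier to picture, here is a rough sketch. It is only an illustration: the generics are simplified (Flink's lookup API works on RowData; String stands in here), and any signature not spelled out in this thread is an assumption.

```java
import java.util.Collection;
import java.util.Collections;

// Rough sketch of the updated FLIP-221 hierarchy. Types are simplified
// and cache/trigger bodies are elided, so signatures are assumptions.
interface LookupFunction {
    Collection<String> lookup(String key) throws Exception;
}

// Point 1: base interface for constructing lookup functions.
interface LookupFunctionProvider {
    LookupFunction createLookupFunction();
}

// Point 2: partial caching extends the base with a cache.
interface PartialCachingLookupProvider extends LookupFunctionProvider {
    Object getCache(); // LookupCache in the FLIP
}

// Points 2/3: full caching carries a reload trigger instead.
interface FullCachingLookupProvider extends LookupFunctionProvider {
    Object getCacheReloadTrigger(); // Periodic / TimedCacheReloadTrigger
}

public class Flip221Sketch {
    static Collection<String> demoLookup(String key) throws Exception {
        PartialCachingLookupProvider provider = new PartialCachingLookupProvider() {
            public LookupFunction createLookupFunction() {
                return k -> Collections.singletonList("row-for-" + k);
            }
            public Object getCache() { return null; } // no cache in this demo
        };
        return provider.createLookupFunction().lookup(key);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demoLookup("k1"));
    }
}
```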

Looking forward to your replies~

Best,
Qingsheng

> On Jun 2, 2022, at 17:15, Qingsheng Ren <renqs...@gmail.com> wrote:
> 
> Hi Becket,
> 
> Thanks for your feedback!
> 
> 1. An alternative way is to let the implementation of the cache decide
> whether to store a missing key, instead of the framework doing so.
> This sounds more reasonable and makes the LookupProvider interface
> cleaner. I can update the FLIP and clarify in the JavaDoc of
> LookupCache#put that the cache should decide whether to store an empty
> collection.
> 
> 2. Initially the builder pattern is for the extensibility of
> LookupProvider interfaces that we could need to add more
> configurations in the future. We can remove the builder now as we have
> resolved the issue in 1. As for the builder in DefaultLookupCache I
> prefer to keep it because we have a lot of arguments in the
> constructor.
> 
> 3. I think this might overturn the overall design. I agree with
> Becket's idea that the API design should be layered with
> extensibility in mind, and it would be great to have one unified interface
> supporting partial, full, and even mixed custom strategies, but we
> have some issues to resolve. The original purpose of treating full
> caching separately is that we'd like to reuse the ability of
> ScanRuntimeProvider. Developers just need to hand over a Source /
> SourceFunction / InputFormat so that the framework can
> compose the underlying topology and control the reload (maybe in a
> distributed way). Under your design we leave the reload operation
> entirely to the CacheStrategy, and I think it will be hard for
> developers to reuse the source in the initializeCache method.
> 
> Best regards,
> 
> Qingsheng
> 
> On Thu, Jun 2, 2022 at 1:50 PM Becket Qin <becket....@gmail.com> wrote:
>> 
>> Thanks for updating the FLIP, Qingsheng. A few more comments:
>> 
>> 1. I am still not sure about the use case for cacheMissingKey().
>> More specifically, when would users want to have getCache() return a
>> non-empty value and cacheMissingKey() return false?
>> 
>> 2. The builder pattern. Usually the builder pattern is used when there are
>> a lot of variations of constructors. For example, if a class has three
>> variables and all of them are optional, there could potentially be many
>> combinations of the variables. But in this FLIP, I don't see such a case.
>> What is the reason we have builders for all the classes?
>> 
>> 3. Should the caching strategy be excluded from the top level provider API?
>> Technically speaking, the Flink framework should only have two interfaces
>> to deal with:
>>    A) LookupFunction
>>    B) AsyncLookupFunction
>> Orthogonally, we *believe* there are two different strategies people can
>> use for caching. Note that the Flink framework does not care what the
>> caching strategy is here.
>>    a) partial caching
>>    b) full caching
>> 
>> Putting them together, we end up with 3 combinations that we think are
>> valid:
>>     Aa) PartialCachingLookupFunctionProvider
>>     Ba) PartialCachingAsyncLookupFunctionProvider
>>     Ab) FullCachingLookupFunctionProvider
>> 
>> However, the caching strategy could actually be quite flexible. E.g. an
>> initial full cache load followed by some partial updates. Also, I am not
>> 100% sure if the full caching will always use ScanTableSource. Including
>> the caching strategy in the top level provider API would make it harder to
>> extend.
>> 
>> One possible solution is to just have LookupFunctionProvider and
>> AsyncLookupFunctionProvider as the top level API, both with a
>> getCacheStrategy() method returning an optional CacheStrategy. The
>> CacheStrategy class would have the following methods:
>> 1. void open(Context), the context exposes some of the resources that may
>> be useful for the caching strategy, e.g. an ExecutorService that is
>> synchronized with the data processing, or a cache refresh trigger which
>> blocks data processing and refreshes the cache.
>> 2. void initializeCache(), a blocking method allows users to pre-populate
>> the cache before processing any data if they wish.
>> 3. void maybeCache(RowData key, Collection<RowData> value), blocking or
>> non-blocking method.
>> 4. void refreshCache(), a blocking / non-blocking method that is invoked by
>> the Flink framework when the cache refresh trigger is pulled.
>> 
>> In the above design, partial caching and full caching would be
>> implementations of the CacheStrategy, and it is OK for users to implement
>> their own CacheStrategy if they want to.
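As an editorial aside, the four methods listed above could be sketched roughly as below. RowData is replaced by String and the Context is reduced to a marker interface, so treat every signature as illustrative rather than the actual proposal.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Sketch of the CacheStrategy described above; types are simplified
// stand-ins, not the real Flink signatures.
interface CacheStrategy {
    interface Context { } // would expose ExecutorService, refresh trigger, ...

    void open(Context context);
    void initializeCache() throws Exception;  // blocking pre-population
    void maybeCache(String key, Collection<String> value);
    void refreshCache() throws Exception;     // invoked on the refresh trigger
}

// A trivial "partial caching" strategy: remember every looked-up key and
// drop all entries on refresh, letting lookups re-fill the cache lazily.
class PartialCachingStrategy implements CacheStrategy {
    final Map<String, Collection<String>> cache = new HashMap<>();

    public void open(Context context) { }
    public void initializeCache() { }  // nothing to pre-load
    public void maybeCache(String key, Collection<String> value) {
        cache.put(key, value);
    }
    public void refreshCache() {
        cache.clear();
    }
}
```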
>> 
>> Thanks,
>> 
>> Jiangjie (Becket) Qin
>> 
>> 
>> On Thu, Jun 2, 2022 at 12:14 PM Jark Wu <imj...@gmail.com> wrote:
>> 
>>> Thank Qingsheng for the detailed summary and updates,
>>> 
>>> The changes look good to me in general. I just have one minor improvement
>>> comment.
>>> Could we add static util methods to the "FullCachingReloadTrigger"
>>> interface for quick usage?
>>> 
>>> #periodicReloadAtFixedRate(Duration)
>>> #periodicReloadWithFixedDelay(Duration)
>>> 
>>> I think we can also do this for LookupCache, because users may not know
>>> where the default implementations are and how to use them.
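Such static shortcuts might look like the sketch below. The trigger is reduced to a value object here; the real CacheReloadTrigger contract is not shown in this thread, so the bodies are assumptions.

```java
import java.time.Duration;

// Sketch of the suggested static shortcut methods on the reload trigger
// interface; the trigger contract is a placeholder for illustration.
interface FullCachingReloadTrigger {
    Duration period();
    boolean fixedRate();

    // Simple value-object implementation backing both shortcuts.
    record Periodic(Duration period, boolean fixedRate)
            implements FullCachingReloadTrigger { }

    // Reload every `period`, measured start-to-start.
    static FullCachingReloadTrigger periodicReloadAtFixedRate(Duration period) {
        return new Periodic(period, true);
    }

    // Reload `delay` after the previous reload finished.
    static FullCachingReloadTrigger periodicReloadWithFixedDelay(Duration delay) {
        return new Periodic(delay, false);
    }
}
```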
>>> 
>>> Best,
>>> Jark
>>> 
>>> On Wed, 1 Jun 2022 at 18:32, Qingsheng Ren <renqs...@gmail.com> wrote:
>>> 
>>>> Hi Jingsong,
>>>> 
>>>> Thanks for your comments!
>>>> 
>>>>> AllCache definition is not flexible, for example, PartialCache can use
>>>> any custom storage, while the AllCache can not, AllCache can also be
>>>> considered to store memory or disk, also need a flexible strategy.
>>>> 
>>>> We had an offline discussion with Jark and Leonard. Basically we think
>>>> exposing the interface of the full cache storage to connector developers
>>>> might limit our future optimizations. The storage for full caching
>>>> shouldn't have too many variations across different lookup tables, so
>>>> making it pluggable might not help a lot. Also, I think it is not quite
>>>> easy for connector developers to implement such an optimized storage. We
>>>> can keep optimizing this storage in the future, and all full caching
>>>> lookup tables will benefit from it.
>>>> 
>>>>> We are more inclined to deprecate the connector `async` option when
>>>> discussing FLIP-234. Can we remove this option from this FLIP?
>>>> 
>>>> Thanks for the reminder! This option has been removed in the latest
>>>> version.
>>>> 
>>>> Best regards,
>>>> 
>>>> Qingsheng
>>>> 
>>>> 
>>>>> On Jun 1, 2022, at 15:28, Jingsong Li <jingsongl...@gmail.com> wrote:
>>>>> 
>>>>> Thanks Alexander for your reply. We can discuss the new interface when
>>> it
>>>>> comes out.
>>>>> 
>>>>> We are more inclined to deprecate the connector `async` option when
>>>>> discussing FLIP-234 [1]. We should use hints and let the planner decide.
>>>>> Although the discussion has not yet reached a conclusion, can we remove
>>>>> this option from this FLIP? It doesn't seem to be related to this FLIP,
>>>>> but more to FLIP-234, and we can form a conclusion over there.
>>>>> 
>>>>> [1] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
>>>>> 
>>>>> Best,
>>>>> Jingsong
>>>>> 
>>>>> On Wed, Jun 1, 2022 at 4:59 AM Jing Ge <j...@ververica.com> wrote:
>>>>> 
>>>>>> Hi Jark,
>>>>>> 
>>>>>> Thanks for clarifying it. It would be fine, as long as we can provide
>>>>>> the no-cache solution. I was just wondering whether the client-side
>>>>>> cache can really help when HBase is used, since the data to look up can
>>>>>> be huge. Depending on how much data is cached on the client side, the
>>>>>> data that should be LRU in e.g. LruBlockCache will not be LRU anymore.
>>>>>> In the worst-case scenario, once the cached data on the client side
>>>>>> expires, the request will hit disk, which will cause extra latency
>>>>>> temporarily, if I am not mistaken.
>>>>>> 
>>>>>> Best regards,
>>>>>> Jing
>>>>>> 
>>>>>> On Mon, May 30, 2022 at 9:59 AM Jark Wu <imj...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Jing Ge,
>>>>>>> 
>>>>>>> What do you mean about the "impact on the block cache used by HBase"?
>>>>>>> In my understanding, the connector cache and the HBase cache are two
>>>>>>> totally different things: the connector cache is a local/client-side
>>>>>>> cache, while the HBase cache is a server-side cache.
>>>>>>> 
>>>>>>>> does it make sense to have a no-cache solution as one of the
>>>>>>> default solutions so that customers will have no effort for the
>>>> migration
>>>>>>> if they want to stick with Hbase cache
>>>>>>> 
>>>>>>> The implementation migration should be transparent to users. Take the
>>>>>> HBase
>>>>>>> connector as
>>>>>>> an example,  it already supports lookup cache but is disabled by
>>>> default.
>>>>>>> After migration, the
>>>>>>> connector still disables cache by default (i.e. no-cache solution).
>>> No
>>>>>>> migration effort for users.
>>>>>>> 
>>>>>>> The HBase cache and the connector cache are two different things, and
>>>>>>> the HBase cache can't simply replace the connector cache, because one
>>>>>>> of the most important usages of the connector cache is reducing the
>>>>>>> I/O requests/responses and improving the throughput, which can't be
>>>>>>> achieved by just using a server cache.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jark
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, 27 May 2022 at 22:42, Jing Ge <j...@ververica.com> wrote:
>>>>>>> 
>>>>>>>> Thanks all for the valuable discussion. The new feature looks very
>>>>>>>> interesting.
>>>>>>>> 
>>>>>>>> According to the FLIP description: "*Currently we have JDBC, Hive
>>> and
>>>>>>> HBase
>>>>>>>> connector implemented lookup table source. All existing
>>>> implementations
>>>>>>>> will be migrated to the current design and the migration will be
>>>>>>>> transparent to end users*." I was only wondering if we should pay
>>>>>>>> attention to HBase and similar DBs. Since the lookup data will
>>>>>>>> commonly be huge when using HBase, partial caching will be used in
>>>>>>>> this case, if I am not mistaken, which might have an impact on the
>>>>>>>> block cache used by HBase, e.g. LruBlockCache.
>>>>>>>> Another question is that, since HBase provides a sophisticated cache
>>>>>>>> solution, does it make sense to have a no-cache solution as one of the
>>>>>>>> default solutions, so that customers will have no migration effort
>>>>>>>> if they want to stick with the HBase cache?
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Jing
>>>>>>>> 
>>>>>>>> On Fri, May 27, 2022 at 11:19 AM Jingsong Li <
>>> jingsongl...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I think the problems now are:
>>>>>>>>> 1. The AllCache and PartialCache interfaces are not uniform: one
>>>>>>>>> needs to provide a LookupProvider, the other a CacheBuilder.
>>>>>>>>> 2. The AllCache definition is not flexible. For example, PartialCache
>>>>>>>>> can use any custom storage while AllCache cannot; AllCache may also
>>>>>>>>> store to memory or disk, so it also needs a flexible strategy.
>>>>>>>>> 3. AllCache cannot customize its ReloadStrategy; currently there is
>>>>>>>>> only ScheduledReloadStrategy.
>>>>>>>>> 
>>>>>>>>> In order to solve the above problems, the following are my ideas.
>>>>>>>>> 
>>>>>>>>> ## Top level cache interfaces:
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> public interface CacheLookupProvider extends
>>>>>>>>> LookupTableSource.LookupRuntimeProvider {
>>>>>>>>> 
>>>>>>>>>   CacheBuilder createCacheBuilder();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> public interface CacheBuilder {
>>>>>>>>>   Cache create();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> public interface Cache {
>>>>>>>>> 
>>>>>>>>>   /**
>>>>>>>>>    * Returns the value associated with key in this cache, or null
>>>>>> if
>>>>>>>>> there is no cached value for
>>>>>>>>>    * key.
>>>>>>>>>    */
>>>>>>>>>   @Nullable
>>>>>>>>>   Collection<RowData> getIfPresent(RowData key);
>>>>>>>>> 
>>>>>>>>>   /** Returns the number of key-value mappings in the cache. */
>>>>>>>>>   long size();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> ## Partial cache
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> public interface PartialCacheLookupFunction extends CacheLookupProvider {
>>>>>>>>> 
>>>>>>>>>   @Override
>>>>>>>>>   PartialCacheBuilder createCacheBuilder();
>>>>>>>>> 
>>>>>>>>>   /** Creates a {@link LookupFunction} instance. */
>>>>>>>>>   LookupFunction createLookupFunction();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> public interface PartialCacheBuilder extends CacheBuilder {
>>>>>>>>> 
>>>>>>>>>   PartialCache create();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> public interface PartialCache extends Cache {
>>>>>>>>> 
>>>>>>>>>   /**
>>>>>>>>>    * Associates the specified value rows with the specified key
>>> row
>>>>>>>>> in the cache. If the cache
>>>>>>>>>    * previously contained value associated with the key, the old
>>>>>>>>> value is replaced by the
>>>>>>>>>    * specified value.
>>>>>>>>>    *
>>>>>>>>>    * @return the previous value rows associated with key, or null
>>>>>> if
>>>>>>>>> there was no mapping for key.
>>>>>>>>>    * @param key - key row with which the specified value is to be
>>>>>>>>> associated
>>>>>>>>>    * @param value – value rows to be associated with the specified
>>>>>>> key
>>>>>>>>>    */
>>>>>>>>>   Collection<RowData> put(RowData key, Collection<RowData> value);
>>>>>>>>> 
>>>>>>>>>   /** Discards any cached value for the specified key. */
>>>>>>>>>   void invalidate(RowData key);
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> ## All cache
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> public interface AllCacheLookupProvider extends CacheLookupProvider {
>>>>>>>>> 
>>>>>>>>>   void registerReloadStrategy(ScheduledExecutorService
>>>>>>>>> executorService, Reloader reloader);
>>>>>>>>> 
>>>>>>>>>   ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider();
>>>>>>>>> 
>>>>>>>>>   @Override
>>>>>>>>>   AllCacheBuilder createCacheBuilder();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> public interface AllCacheBuilder extends CacheBuilder {
>>>>>>>>> 
>>>>>>>>>   AllCache create();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> public interface AllCache extends Cache {
>>>>>>>>> 
>>>>>>>>>   void putAll(Iterator<Map<RowData, RowData>> allEntries);
>>>>>>>>> 
>>>>>>>>>   void clearAll();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> public interface Reloader {
>>>>>>>>> 
>>>>>>>>>   void reload();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> ```
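To make the Reloader idea above concrete, the framework side might wire it up roughly as below. A fixed-delay schedule is just one possible strategy; this is an editorial sketch, not part of the proposal.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative wiring for registerReloadStrategy: the provider schedules
// the framework-supplied Reloader on the framework-supplied executor.
interface Reloader {
    void reload();
}

class PeriodicReloadStrategy {
    // Schedule an immediate first load, then reload after every delay.
    static void register(ScheduledExecutorService executorService,
                         Reloader reloader, long delayMillis) {
        executorService.scheduleWithFixedDelay(
                reloader::reload, 0L, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

Because the executor and the Reloader both come from the framework, other strategies (ZooKeeper watcher, partition-completion events) could be registered the same way without the connector knowing the details.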
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Jingsong
>>>>>>>>> 
>>>>>>>>> On Fri, May 27, 2022 at 11:10 AM Jingsong Li <
>>> jingsongl...@gmail.com
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks Qingsheng and all for your discussion.
>>>>>>>>>> 
>>>>>>>>>> Very sorry to jump in so late.
>>>>>>>>>> 
>>>>>>>>>> Maybe I missed something?
>>>>>>>>>> My first impression when I saw the cache interface was: why don't we
>>>>>>>>>> provide an interface similar to the guava cache [1]? On top of the
>>>>>>>>>> guava cache, caffeine also adds extensions for asynchronous calls
>>>>>>>>>> [2], and there is bulk loading in caffeine too.
>>>>>>>>>> 
>>>>>>>>>> I am also confused about why we first go from
>>>>>>>>>> LookupCacheFactory.Builder to a Factory and then create the Cache.
>>>>>>>>>> 
>>>>>>>>>> [1] https://github.com/google/guava
>>>>>>>>>> [2] https://github.com/ben-manes/caffeine/wiki/Population
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Jingsong
>>>>>>>>>> 
>>>>>>>>>> On Thu, May 26, 2022 at 11:17 PM Jark Wu <imj...@gmail.com>
>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> After looking at the newly introduced ReloadTime and Becket's comment,
>>>>>>>>>>> I agree with Becket that we should have a pluggable reloading strategy.
>>>>>>>>>>> We can provide some common implementations, e.g., periodic reloading
>>>>>>>>>>> and daily reloading. But there will definitely be some connector- or
>>>>>>>>>>> business-specific reloading strategies, e.g. notification by a
>>>>>>>>>>> ZooKeeper watcher, or reloading once a new Hive partition is complete.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, 26 May 2022 at 11:52, Becket Qin <becket....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Qingsheng,
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for updating the FLIP. A few comments / questions below:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Is there a reason that we have both "XXXFactory" and
>>>>>>>>>>>> "XXXProvider"? What is the difference between them? If they are the
>>>>>>>>>>>> same, can we just use XXXFactory everywhere?
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. Regarding the FullCachingLookupProvider, should the reloading
>>>>>>>>>>>> policy also be pluggable? Periodic reloading can sometimes be tricky
>>>>>>>>>>>> in practice. For example, if a user sets 24 hours as the cache
>>>>>>>>>>>> refresh interval and some nightly batch job is delayed, the cache
>>>>>>>>>>>> update may still see the stale data.
>>>>>>>>>>>> 
>>>>>>>>>>>> 3. In DefaultLookupCacheFactory, it looks like InitialCapacity
>>>>>>>> should
>>>>>>>>> be
>>>>>>>>>>>> removed.
>>>>>>>>>>>> 
>>>>>>>>>>>> 4. The purpose of LookupFunctionProvider#cacheMissingKey() seems a
>>>>>>>>>>>> little confusing to me. If Optional<LookupCacheFactory>
>>>>>>>>>>>> getCacheFactory() returns a non-empty factory, doesn't that already
>>>>>>>>>>>> indicate to the framework to cache the missing keys? Also, why does
>>>>>>>>>>>> this method return an Optional<Boolean> instead of a boolean?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, May 25, 2022 at 5:07 PM Qingsheng Ren <
>>>>>> renqs...@gmail.com
>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Lincoln and Jark,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the comments! If the community reaches a consensus
>>>>>>> that
>>>>>>>> we
>>>>>>>>>>> use
>>>>>>>>>>>>> SQL hint instead of table options to decide whether to use sync
>>>>>>> or
>>>>>>>>>>> async
>>>>>>>>>>>>> mode, it’s indeed not necessary to introduce the “lookup.async”
>>>>>>>>> option.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think it's a good idea to let the decision on async be made at the
>>>>>>>>>>>>> query level, which enables better optimization with more information
>>>>>>>>>>>>> gathered by the planner. Is there any FLIP describing the issue in
>>>>>>>>>>>>> FLINK-27625? I thought FLIP-234 only proposes adding a SQL hint for
>>>>>>>>>>>>> retry on missing, rather than having the entire async mode
>>>>>>>>>>>>> controlled by a hint.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Qingsheng
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 25, 2022, at 15:13, Lincoln Lee <
>>>>>> lincoln.8...@gmail.com
>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Jark,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for your reply!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Currently 'lookup.async' only exists in the HBase connector, and I
>>>>>>>>>>>>>> have no idea whether or when to remove it (we can discuss it in
>>>>>>>>>>>>>> another issue for the HBase connector after FLINK-27625 is done);
>>>>>>>>>>>>>> I just suggest not adding it as a common option now.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Lincoln Lee
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jark Wu <imj...@gmail.com> 于2022年5月24日周二 20:14写道:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Lincoln,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I have taken a look at FLIP-234, and I agree with you that
>>>>>> the
>>>>>>>>>>>>> connectors
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> provide both async and sync runtime providers simultaneously
>>>>>>>>> instead
>>>>>>>>>>>>> of one
>>>>>>>>>>>>>>> of them.
>>>>>>>>>>>>>>> At that point, "lookup.async" looks redundant. If this
>>>>>> option
>>>>>>> is
>>>>>>>>>>>>> planned to
>>>>>>>>>>>>>>> be removed
>>>>>>>>>>>>>>> in the long term, I think it makes sense not to introduce it
>>>>>>> in
>>>>>>>>> this
>>>>>>>>>>>>> FLIP.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, 24 May 2022 at 11:08, Lincoln Lee <
>>>>>>>> lincoln.8...@gmail.com
>>>>>>>>>> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Qingsheng,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Sorry for jumping into the discussion so late. It's a good idea
>>>>>>>>>>>>>>>> that we can have common table options. I have one minor comment
>>>>>>>>>>>>>>>> on 'lookup.async': I suggest not making it a common option.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The table layer abstracts both sync and async lookup capabilities,
>>>>>>>>>>>>>>>> and connector implementers can choose one or both. In the case of
>>>>>>>>>>>>>>>> implementing only one capability (the status of most of the
>>>>>>>>>>>>>>>> existing builtin connectors), 'lookup.async' will not be used. And
>>>>>>>>>>>>>>>> when a connector has both capabilities, I think this choice is
>>>>>>>>>>>>>>>> more suitable for decisions at the query level: for example, the
>>>>>>>>>>>>>>>> table planner can choose the physical implementation of async or
>>>>>>>>>>>>>>>> sync lookup based on its cost model, or users can give a query
>>>>>>>>>>>>>>>> hint based on their own better understanding. If there is another
>>>>>>>>>>>>>>>> common table option 'lookup.async', it may confuse users in the
>>>>>>>>>>>>>>>> long run.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> So, I prefer to leave the 'lookup.async' option in a private
>>>>>>>>>>>>>>>> place (for the current HBase connector) and not turn it into a
>>>>>>>>>>>>>>>> common option.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Lincoln Lee
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Qingsheng Ren <renqs...@gmail.com> 于2022年5月23日周一 14:54写道:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Alexander,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for the review! We recently updated the FLIP and you can
>>>>>>>>>>>>>>>>> find those changes in my latest email. Since some terminology
>>>>>>>>>>>>>>>>> has changed, I'll use the new terms when replying to your
>>>>>>>>>>>>>>>>> comments.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 1. Builder vs ‘of’
>>>>>>>>>>>>>>>>> I'm OK with using the builder pattern if we have additional
>>>>>>>>>>>>>>>>> optional parameters for the full caching mode ("rescan"
>>>>>>>>>>>>>>>>> previously). The schedule-with-delay idea looks reasonable to
>>>>>>>>>>>>>>>>> me, but I think we need to redesign the builder API of full
>>>>>>>>>>>>>>>>> caching to make it more descriptive for developers. Would you
>>>>>>>>>>>>>>>>> mind sharing your ideas about the API? For accessing the FLIP
>>>>>>>>>>>>>>>>> workspace you can just provide your account ID and ping any PMC
>>>>>>>>>>>>>>>>> member, including Jark.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2. Common table options
>>>>>>>>>>>>>>>>> We have had some discussions over the past few days and propose
>>>>>>>>>>>>>>>>> to introduce 8 common table options for caching. The FLIP has
>>>>>>>>>>>>>>>>> been updated accordingly.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 3. Retries
>>>>>>>>>>>>>>>>> I think we are on the same page :-)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> For your additional concerns:
>>>>>>>>>>>>>>>>> 1) The table option has been updated.
>>>>>>>>>>>>>>>>> 2) We got “lookup.cache” back for configuring whether to
>>>>>> use
>>>>>>>>>>> partial
>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>> full caching mode.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Qingsheng
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On May 19, 2022, at 17:25, Александр Смирнов <
>>>>>>>>>>> smirale...@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Also I have a few additions:
>>>>>>>>>>>>>>>>>> 1) maybe rename 'lookup.cache.maximum-size' to
>>>>>>>>>>>>>>>>>> 'lookup.cache.max-rows'? I think it will be clearer that we are
>>>>>>>>>>>>>>>>>> talking not about bytes but about the number of rows. Plus it
>>>>>>>>>>>>>>>>>> fits better, considering my optimization with filters.
>>>>>>>>>>>>>>>>>> 2) How will users enable rescanning? Are we going to separate
>>>>>>>>>>>>>>>>>> caching and rescanning from the options point of view?
>>>>>>>>>>>>>>>>>> Initially we had one option 'lookup.cache' with the values
>>>>>>>>>>>>>>>>>> LRU / ALL. I think now we can make a boolean option
>>>>>>>>>>>>>>>>>> 'lookup.rescan', the rescan interval can be
>>>>>>>>>>>>>>>>>> 'lookup.rescan.interval', etc.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>> Alexander
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> чт, 19 мая 2022 г. в 14:50, Александр Смирнов <
>>>>>>>>>>> smirale...@gmail.com
>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi Qingsheng and Jark,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 1. Builders vs 'of'
>>>>>>>>>>>>>>>>>>> I understand that builders are used when we have multiple
>>>>>>>>>>>>>>>>>>> parameters. I suggested them because we could add parameters
>>>>>>>>>>>>>>>>>>> later. To prevent the Builder for ScanRuntimeProvider from
>>>>>>>>>>>>>>>>>>> looking redundant, I can suggest one more config now -
>>>>>>>>>>>>>>>>>>> "rescanStartTime".
>>>>>>>>>>>>>>>>>>> It's a time in UTC (LocalTime class) at which the first reload
>>>>>>>>>>>>>>>>>>> of the cache starts. This parameter can be thought of as the
>>>>>>>>>>>>>>>>>>> 'initialDelay' (the diff between the current time and
>>>>>>>>>>>>>>>>>>> rescanStartTime) in the method
>>>>>>>>>>>>>>>>>>> ScheduledExecutorService#scheduleWithFixedDelay [1]. It can be
>>>>>>>>>>>>>>>>>>> very useful when the dimension table is updated by some other
>>>>>>>>>>>>>>>>>>> scheduled job at a certain time, or when the user simply wants
>>>>>>>>>>>>>>>>>>> the second scan (the first cache reload) to be delayed. This
>>>>>>>>>>>>>>>>>>> option can be used even without 'rescanInterval' - in that
>>>>>>>>>>>>>>>>>>> case 'rescanInterval' will be one day.
>>>>>>>>>>>>>>>>>>> If you are fine with this option, I would be very glad if you
>>>>>>>>>>>>>>>>>>> would give me access to edit the FLIP page, so I could add it
>>>>>>>>>>>>>>>>>>> myself.
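The initial-delay arithmetic described above can be sketched as follows; the name rescanStartTime comes from this thread, while the helper itself is hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper for the rescanStartTime idea: compute the initial
// delay from "now" (UTC) to the configured start time, rolling over to
// the next day if the start time has already passed today.
class RescanInitialDelay {
    static Duration until(LocalTime nowUtc, LocalTime rescanStartTime) {
        Duration delay = Duration.between(nowUtc, rescanStartTime);
        return delay.isNegative() ? delay.plusDays(1) : delay;
    }
}
```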
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 2. Common table options
>>>>>>>>>>>>>>>>>>> I also think that FactoryUtil would be overloaded by all the
>>>>>>>>>>>>>>>>>>> cache options. But maybe unify all the suggested options, not
>>>>>>>>>>>>>>>>>>> only those for the default cache? I.e. a class 'LookupOptions'
>>>>>>>>>>>>>>>>>>> that unifies the default cache options, the rescan options,
>>>>>>>>>>>>>>>>>>> 'async', and 'maxRetries'. WDYT?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 3. Retries
>>>>>>>>>>>>>>>>>>> I'm fine with the suggestion close to
>>>>>>>>>>>>>>>>>>> RetryUtils#tryTimes(times, call).
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> [1] https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleWithFixedDelay-java.lang.Runnable-long-long-java.util.concurrent.TimeUnit-
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>> Alexander
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> ср, 18 мая 2022 г. в 16:04, Qingsheng Ren <
>>>>>>>> renqs...@gmail.com
>>>>>>>>>> :
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi Jark and Alexander,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks for your comments! I'm also OK to introduce common
>>>>>>>>>>>>>>>>>>>> table options. I prefer to introduce a new
>>>>>>>>>>>>>>>>>>>> DefaultLookupCacheOptions class for holding these option
>>>>>>>>>>>>>>>>>>>> definitions, because putting all the options into
>>>>>>>>>>>>>>>>>>>> FactoryUtil would make it a bit "crowded" and not well
>>>>>>>>>>>>>>>>>>>> categorized.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> The FLIP has been updated according to the suggestions above:
>>>>>>>>>>>>>>>>>>>> 1. Use a static "of" method for constructing
>>>>>>>>>>>>>>>>>>>> RescanRuntimeProvider, considering both arguments are
>>>>>>>>>>>>>>>>>>>> required.
>>>>>>>>>>>>>>>>>>>> 2. Introduce new table options matching
>>>>>>>>>>>>>>>>>>>> DefaultLookupCacheFactory.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Qingsheng
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Wed, May 18, 2022 at 2:57 PM Jark Wu <
>>>>>>> imj...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 1) retry logic
>>>>>>>>>>>>>>>>>>>>> I think we can extract some common retry logic into
>>>>>>>>>>>>>>>>>>>>> utilities, e.g. RetryUtils#tryTimes(times, call). This seems
>>>>>>>>>>>>>>>>>>>>> independent of this FLIP and can be reused by DataStream
>>>>>>>>>>>>>>>>>>>>> users. Maybe we can open an issue to discuss this and where
>>>>>>>>>>>>>>>>>>>>> to put it.
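A minimal version of that helper might look like the following; the name RetryUtils#tryTimes comes from the mail above, while the body is an assumption.

```java
import java.util.concurrent.Callable;

// Sketch of a RetryUtils#tryTimes(times, call) helper: retry the callable
// up to `times` attempts and rethrow the last failure.
class RetryUtils {
    static <T> T tryTimes(int times, Callable<T> call) throws Exception {
        if (times <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        Exception last = null;
        for (int i = 0; i < times; i++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // a real impl might also back off / reconnect here
            }
        }
        throw last;
    }
}
```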
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 2) cache ConfigOptions
>>>>>>>>>>>>>>>>>>>>> I'm fine with defining the cache config options in the
>>>>>>>>>>>>>>>>>>>>> framework. A candidate place to put them is FactoryUtil,
>>>>>>>>>>>>>>>>>>>>> which also includes the "sink.parallelism" and "format"
>>>>>>>>>>>>>>>>>>>>> options.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed, 18 May 2022 at 13:52, Александр Смирнов <
>>>>>>>>>>>>>>>> smirale...@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thank you for considering my comments.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> there might be custom logic before making retry, such as
>>>>>>>>>>>>>>>>>>>>>>> re-establishing the connection
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Yes, I understand that. I meant that such logic can be
>>>>>>>>>>>>>>>>>>>>>> placed in a separate function that can be implemented by
>>>>>>>>>>>>>>>>>>>>>> connectors. Just moving the retry logic would make the
>>>>>>>>>>>>>>>>>>>>>> connector's LookupFunction more concise and avoid
>>>>>>>>>>>>>>>>>>>>>> duplicated code. However, it's a minor change. The
>>>>>>>>>>>>>>>>>>>>>> decision is up to you.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> We decide not to provide common DDL options and let
>>>>>>>>>>> developers
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> define their own options as we do now per connector.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> What is the reason for that? One of the main goals of
>>>>>>>> this
>>>>>>>>>>> FLIP
>>>>>>>>>>>>>>> was
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> unify the configs, wasn't it? I understand that
>>>>>> current
>>>>>>>>> cache
>>>>>>>>>>>>>>>> design
>>>>>>>>>>>>>>>>>>>>>> doesn't depend on ConfigOptions, like was before. But
>>>>>>>> still
>>>>>>>>>>> we
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> put
>>>>>>>>>>>>>>>>>>>>>> these options into the framework, so connectors can
>>>>>>> reuse
>>>>>>>>>>> them
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> avoid code duplication, and, what is more
>>>>>> significant,
>>>>>>>>> avoid
>>>>>>>>>>>>>>>> possible
>>>>>>>>>>>>>>>>>>>>>> different options naming. This moment can be pointed
>>>>>>> out
>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>> documentation for connector developers.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>> Alexander
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> вт, 17 мая 2022 г. в 17:11, Qingsheng Ren <
>>>>>>>>>>> renqs...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hi Alexander,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks for the review and glad to see we are on the same page! I think you forgot to cc the dev mailing list, so I'm also quoting your reply under this email.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> We can add 'maxRetryTimes' option into this class
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> In my opinion the retry logic should be implemented in lookup() instead of in LookupFunction#eval(). Retrying is only meaningful under some specific retriable failures, and there might be custom logic before making a retry, such as re-establishing the connection (JdbcRowDataLookupFunction is an example), so it's more handy to leave it to the connector.
>>>>>>>>>>>>>>>>>>>>>>> 
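To illustrate keeping the retry loop (and any reconnection logic) inside lookup() rather than eval(), a connector-side sketch might look like the following. All names here are hypothetical stand-ins, not the actual JdbcRowDataLookupFunction code, and String stands in for RowData.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: the retry loop lives in lookup(), so eval() stays free of it.
class RetryingLookup {
    private final int maxRetryTimes;
    private final Supplier<List<String>> query;   // stand-in for the actual DB query
    private final Runnable reestablishConnection; // custom recovery logic before a retry

    RetryingLookup(int maxRetryTimes, Supplier<List<String>> query, Runnable reestablishConnection) {
        this.maxRetryTimes = maxRetryTimes;
        this.query = query;
        this.reestablishConnection = reestablishConnection;
    }

    // Plays the role of LookupFunction#lookup(keyRow) in the FLIP's design.
    Collection<String> lookup() {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetryTimes; attempt++) {
            try {
                return query.get();
            } catch (RuntimeException e) {
                last = e;
                reestablishConnection.run(); // e.g. reopen the JDBC connection
            }
        }
        throw last;
    }
}
```

The point of the design above is that only the connector knows which failures are retriable and what recovery (reconnection) is needed, so the framework-facing eval() can stay a thin delegation.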
>>>>>>>>>>>>>>>>>>>>>>>> I don't see DDL options that were in the previous version of the FLIP. Do you have any special plans for them?
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> We decided not to provide common DDL options and to let developers define their own options, as we do now per connector.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> The rest of the comments sound great and I'll update the FLIP. Hope we can finalize our proposal soon!
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Qingsheng
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On May 17, 2022, at 13:46, Александр Смирнов <smirale...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng and devs!
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I like the overall design of the updated FLIP, however I have several suggestions and questions.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 1) Introducing LookupFunction as a subclass of TableFunction is a good idea. We can add a 'maxRetryTimes' option into this class. The 'eval' method of the new LookupFunction is great for this purpose. The same goes for the 'async' case.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 2) There might be other configs in the future, such as 'cacheMissingKey' in LookupFunctionProvider or 'rescanInterval' in ScanRuntimeProvider. Maybe use the builder pattern in LookupFunctionProvider and RescanRuntimeProvider for more flexibility (use one 'build' method instead of many 'of' methods in the future)?
>>>>>>>>>>>>>>>>>>>>>>>> 
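A builder-style LookupFunctionProvider, as suggested in point 2, might be shaped as follows. This is only a sketch of the pattern being proposed: the method names are hypothetical, and a plain Runnable stands in for the actual lookup function.

```java
// Sketch of a builder-style provider: new options can be added later as defaulted
// builder fields instead of ever-growing static 'of(...)' overloads.
class LookupFunctionProvider {
    private final Runnable lookupFunction; // stand-in for the actual LookupFunction
    private final boolean cacheMissingKey;

    private LookupFunctionProvider(Runnable lookupFunction, boolean cacheMissingKey) {
        this.lookupFunction = lookupFunction;
        this.cacheMissingKey = cacheMissingKey;
    }

    static Builder newBuilder() { return new Builder(); }

    boolean isCacheMissingKey() { return cacheMissingKey; }

    static class Builder {
        private Runnable lookupFunction;
        private boolean cacheMissingKey = true; // a future option with a default value

        Builder withLookupFunction(Runnable fn) { this.lookupFunction = fn; return this; }
        Builder cacheMissingKey(boolean enabled) { this.cacheMissingKey = enabled; return this; }

        LookupFunctionProvider build() {
            return new LookupFunctionProvider(lookupFunction, cacheMissingKey);
        }
    }
}
```

The trade-off discussed in the thread is visible here: the builder keeps the public surface stable when options are added, at the cost of slightly more boilerplate than a set of 'of' factory methods.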
>>>>>>>>>>>>>>>>>>>>>>>> 3) What are the plans for the existing TableFunctionProvider and AsyncTableFunctionProvider? I think they should be deprecated.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 4) Am I right that the current design does not assume usage of a user-provided LookupCache in re-scanning? In this case, it is not very clear why we need methods such as 'invalidate' or 'putAll' in LookupCache.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 5) I don't see the DDL options that were in the previous version of the FLIP. Do you have any special plans for them?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> If you don't mind, I would be glad to be able to make small adjustments to the FLIP document too. I think it's worth mentioning what optimizations exactly are planned for the future.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Fri, 13 May 2022 at 20:27, Qingsheng Ren <renqs...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Alexander and devs,
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for the in-depth discussion! As Jark mentioned, we were inspired by Alexander's idea and made a refactor of our design. FLIP-221 [1] has been updated to reflect our design now and we are happy to hear more suggestions from you!
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Compared to the previous design:
>>>>>>>>>>>>>>>>>>>>>>>>> 1. The lookup cache serves at the table runtime level and is integrated as a component of LookupJoinRunner, as discussed previously.
>>>>>>>>>>>>>>>>>>>>>>>>> 2. Interfaces are renamed and re-designed to reflect the new design.
>>>>>>>>>>>>>>>>>>>>>>>>> 3. We treat the all-caching case separately and introduce a new RescanRuntimeProvider to reuse the ability of scanning. We are planning to support SourceFunction / InputFormat for now, considering the complexity of the FLIP-27 Source API.
>>>>>>>>>>>>>>>>>>>>>>>>> 4. A new interface LookupFunction is introduced to make the semantic of lookup more straightforward for developers.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> For replying to Alexander:
>>>>>>>>>>>>>>>>>>>>>>>>>> However I'm a little confused whether InputFormat is deprecated or not. Am I right that it will be so in the future, but currently it's not?
>>>>>>>>>>>>>>>>>>>>>>>>> Yes, you are right. InputFormat is not deprecated for now. I think it will be deprecated in the future, but we don't have a clear plan for that.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks again for the discussion on this FLIP and looking forward to cooperating with you after we finalize the design and interfaces!
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 13, 2022 at 12:12 AM Александр Смирнов <smirale...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jark, Qingsheng and Leonard!
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Glad to see that we came to a consensus on almost all points!
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> However, I'm a little confused whether InputFormat is deprecated or not. Am I right that it will be so in the future, but currently it's not? Actually I also think that for the first version it's OK to use InputFormat in the ALL cache implementation, because supporting the rescan ability seems like a very distant prospect. But for this decision we need a consensus among all discussion participants.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> In general, I don't have anything to argue with in your statements. All of them correspond to my ideas. Looking ahead, it would be nice to work on this FLIP cooperatively. I've already done a lot of work on lookup join caching with an implementation very close to the one we are discussing, and want to share the results of this work. Anyway, looking forward to the FLIP update!
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thu, 12 May 2022 at 17:38, Jark Wu <imj...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for summarizing your points.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> In the past week, Qingsheng, Leonard, and I have discussed it several times and we have totally refactored the design. I'm glad to say we have reached a consensus on many of your points! Qingsheng is still working on updating the design docs, and they may be available in the next few days. I will share some conclusions from our discussions:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) we have refactored the design towards the "cache in framework" way.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) a "LookupCache" interface for users to customize, and a default implementation with a builder for easy use. This makes it possible to have both flexibility and conciseness.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
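The split described in point 2 — a small user-facing cache interface plus a ready-made default built via a builder — could be sketched like this. These shapes are hypothetical: the actual FLIP-221 interfaces use RowData keys and values and carry more methods, while String stands in here to keep the sketch self-contained.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// User-customizable cache interface (simplified; String stands in for RowData).
interface LookupCache {
    Collection<String> getIfPresent(String key);
    void put(String key, Collection<String> rows);
}

// A default size-bounded LRU implementation constructed via a builder.
class DefaultLookupCache implements LookupCache {
    private final LinkedHashMap<String, Collection<String>> entries;

    private DefaultLookupCache(int maximumSize) {
        // accessOrder=true gives LRU iteration order; the eldest entry is evicted on overflow.
        entries = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Collection<String>> eldest) {
                return size() > maximumSize;
            }
        };
    }

    static Builder newBuilder() { return new Builder(); }

    @Override public Collection<String> getIfPresent(String key) { return entries.get(key); }
    @Override public void put(String key, Collection<String> rows) { entries.put(key, rows); }

    static class Builder {
        private int maximumSize = 10_000;
        Builder maximumSize(int size) { this.maximumSize = size; return this; }
        DefaultLookupCache build() { return new DefaultLookupCache(maximumSize); }
    }
}
```

A connector that is happy with the default only touches the builder; a connector with special needs implements LookupCache itself, which is the flexibility/conciseness balance the mail describes.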
>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Filter pushdown is important for both the ALL and LRU lookup caches, especially for reducing IO. Filter pushdown should be the final state and the unified way to support pruning both the ALL cache and the LRU cache, so I think we should make an effort in this direction. If we need to support filter pushdown for the ALL cache anyway, why not use it for the LRU cache as well? Either way, as we decided to implement the cache in the framework, we have the chance to support filters on the cache anytime. This is an optimization and it doesn't affect the public API. I think we can create a JIRA issue to discuss it when the FLIP is accepted.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 4) The idea to support the ALL cache is similar to your proposal. In the first version, we will only support InputFormat and SourceFunction for cache all (invoking the InputFormat in the join operator). For the FLIP-27 source, we need to join a true source operator instead of calling it embedded in the join operator. However, this needs another FLIP to support the re-scan ability for the FLIP-27 Source, and this can be a large piece of work. In order not to block this issue, we can put the effort of FLIP-27 source integration into future work and integrate InputFormat and SourceFunction for now.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think it's fine to use InputFormat and SourceFunction, as they are not deprecated; otherwise, we would have to introduce another function similar to them, which is meaningless. We need to plan the FLIP-27 source integration ASAP before InputFormat and SourceFunction are deprecated.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 12 May 2022 at 15:46, Александр Смирнов <smirale...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Martijn!
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Got it. Therefore, the implementation with InputFormat is not considered. Thanks for clearing that up!
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thu, 12 May 2022 at 14:23, Martijn Visser <mart...@ververica.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> With regards to:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But if there are plans to refactor all connectors to FLIP-27
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, FLIP-27 is the target for all connectors. The old interfaces will be deprecated and connectors will either be refactored to use the new ones or dropped.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The caching should work for connectors that are using the FLIP-27 interfaces; we should not introduce new features for old interfaces.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Martijn
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 12 May 2022 at 06:19, Александр Смирнов <smirale...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jark!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry for the late response. I would like to make some comments and clarify my points.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) I agree with your first statement. I think we can achieve both advantages this way: put the Cache interface in flink-table-common, but have implementations of it in flink-table-runtime. Therefore if a connector developer wants to use existing cache strategies and their implementations, he can just pass the lookupConfig to the planner, but if he wants to have his own cache implementation in his TableFunction, it will be possible for him to use the existing interface for this purpose (we can explicitly point this out in the documentation). In this way all configs and metrics will be unified. WDYT?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If a filter can prune 90% of data in the cache, we will have 90% of lookup requests that can never be cached
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Let me clarify the logic of the filters optimization in the case of the LRU cache. It looks like Cache<RowData, Collection<RowData>>. Here we always store the response of the dimension table in the cache, even after applying the calc function. I.e. if there are no rows left after applying filters to the result of the 'eval' method of the TableFunction, we store an empty list under the lookup keys. Therefore the cache line will be filled, but will require much less memory (in bytes). I.e. we don't completely filter out the keys whose result was pruned, but significantly reduce the memory required to store that result. If the user knows about this behavior, he can increase the 'max-rows' option before the start of the job. But actually I came up with the idea that we can do this automatically by using the 'maximumWeight' and 'weigher' methods of the Guava cache [1]. The weight can be the size of the collection of rows (the value in the cache). Therefore the cache can automatically fit many more records than before.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
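The row-count-based bounding described above — what Guava's CacheBuilder provides via maximumWeight and a weigher — can be illustrated with a simplified pure-Java stand-in. This is not the Guava implementation, only a sketch of the idea that an entry's weight is the number of rows in its value, so empty (filtered-out) results are nearly free to keep.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for Guava's maximumWeight/weigher: bound the cache by the
// total number of cached rows instead of the number of keys, evicting in LRU order.
class WeightBoundedCache {
    private final long maxTotalRows;
    private long totalRows = 0;
    private final LinkedHashMap<String, Collection<String>> map =
            new LinkedHashMap<>(16, 0.75f, /* accessOrder */ true);

    WeightBoundedCache(long maxTotalRows) { this.maxTotalRows = maxTotalRows; }

    void put(String key, Collection<String> rows) {
        Collection<String> old = map.put(key, rows);
        totalRows += rows.size() - (old == null ? 0 : old.size());
        // Evict least-recently-used entries until the row budget holds again.
        Iterator<Map.Entry<String, Collection<String>>> it = map.entrySet().iterator();
        while (totalRows > maxTotalRows && it.hasNext()) {
            Map.Entry<String, Collection<String>> eldest = it.next();
            totalRows -= eldest.getValue().size();
            it.remove();
        }
    }

    Collection<String> get(String key) { return map.get(key); }
    int keyCount() { return map.size(); }
}
```

Note how a key whose result was fully pruned by filters stores an empty collection with weight 0, which is exactly the "cache line filled but much cheaper" behavior described in the mail.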
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink SQL has provided a standard way to do filters and projects pushdown, i.e., SupportsFilterPushDown and SupportsProjectionPushDown. Jdbc/hive/HBase haven't implemented the interfaces; that doesn't mean it's hard to implement.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's debatable how difficult it will be to implement filter pushdown. But I think the fact that currently there is no database connector with filter pushdown at least means that this feature won't be supported in connectors soon. Moreover, if we talk about other connectors (not in the Flink repo), their databases might not support all Flink filters (or might not support filters at all). I think users are interested in supporting the cache filters optimization independently of supporting other features and solving more complex (or even unsolvable) problems.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) I agree with your third statement. Actually, in our internal version I also tried to unify the logic of scanning and reloading data from connectors. But unfortunately, I didn't find a way to unify the logic of all ScanRuntimeProviders (InputFormat, SourceFunction, Source, ...) and reuse it in reloading the ALL cache. As a result I settled on using InputFormat, because it was used for scanning in all lookup connectors. (I didn't know that there are plans to deprecate InputFormat in favor of the FLIP-27 Source.) IMO usage of the FLIP-27 source in ALL caching is not a good idea, because this source was designed to work in a distributed environment (SplitEnumerator on the JobManager and SourceReaders on the TaskManagers), not in one operator (the lookup join operator in our case). There is not even a direct way to pass splits from the SplitEnumerator to a SourceReader (this logic works through the SplitEnumeratorContext, which requires OperatorCoordinator.SubtaskGateway to send AddSplitEvents). Usage of InputFormat for the ALL cache seems much clearer and easier. But if there are plans to refactor all connectors to FLIP-27, I have the following idea: maybe we can drop the lookup join ALL cache in favor of a simple join with multiple scans of the batch source? The point is that the only difference between the lookup join ALL cache and a simple join with a batch source is that in the first case scanning is performed multiple times, in between which the state (cache) is cleared (correct me if I'm wrong). So what if we extend the functionality of the simple join to support state reloading, and extend the functionality of scanning the batch source to run multiple times (this one should be easy with the new FLIP-27 source, which unifies streaming/batch reading - we would only need to change the SplitEnumerator so that it passes the splits again after some TTL)? WDYT? I must say that this looks like a long-term goal and would make the scope of this FLIP even larger than you said. Maybe we can limit ourselves to a simpler solution now (InputFormats).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So to sum up, my points are like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) There is a way to make both concise and flexible interfaces for caching in lookup join.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) The cache filters optimization is important in both the LRU and ALL caches.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) It is unclear when filter pushdown will be supported in Flink connectors; some of the connectors might not have the opportunity to support filter pushdown. Also, as far as I know, filter pushdown currently works only for scanning (not lookup). So the cache filters + projections optimization should be independent from other features.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4) The ALL cache implementation is a complex topic that touches multiple aspects of how Flink is developing. Abandoning InputFormat in favor of the FLIP-27 Source would make the ALL cache implementation really complex and unclear, so maybe instead of that we can extend the functionality of the simple join, or keep InputFormat in the case of the lookup join ALL cache?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://guava.dev/releases/18.0/api/docs/com/google/common/cache/CacheBuilder.html#weigher(com.google.common.cache.Weigher)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thu, 5 May 2022 at 20:34, Jark Wu <imj...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's great to see the active discussion! I
>>>>>>> want
>>>>>>>> to
>>>>>>>>>>>>> share
>>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>>>> ideas:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) implement the cache in framework vs.
>>>>>>>> connectors
>>>>>>>>>>> base
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't have a strong opinion on this. Both
>>>>>>> ways
>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> work (e.g.,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pruning, compatibility).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The framework way can provide more concise
>>>>>>>>>>> interfaces.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The connector base way can define more
>>>>>>> flexible
>>>>>>>>>>> cache
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> strategies/implementations.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We are still investigating a way to see if
>>>>>> we
>>>>>>>> can
>>>>>>>>>>> have
>>>>>>>>>>>>>>>> both
>>>>>>>>>>>>>>>>>>>>>>>>>>>> advantages.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We should reach a consensus that the way
>>>>>>> should
>>>>>>>>> be a
>>>>>>>>>>>>>>> final
>>>>>>>>>>>>>>>>> state,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are on the path to it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) filters and projections pushdown:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with Alex that the filter pushdown
>>>>>>> into
>>>>>>>>>>> cache
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> benefit a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> lot
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ALL cache.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, this is not true for LRU cache.
>>>>>>>>> Connectors
>>>>>>>>>>> use
>>>>>>>>>>>>>>>>> cache to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IO
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requests to databases for better throughput.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If a filter can prune 90% of data in the
>>>>>>> cache,
>>>>>>>> we
>>>>>>>>>>> will
>>>>>>>>>>>>>>>>> have 90% of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lookup
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requests that can never be cached
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and hit directly to the databases. That
>>>>>> means
>>>>>>>> the
>>>>>>>>>>> cache
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> meaningless in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this case.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IMO, Flink SQL has provided a standard way
>>>>>> to
>>>>>>> do
>>>>>>>>>>>>> filters
>>>>>>>>>>>>>>>>> and projects
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pushdown, i.e., SupportsFilterPushDown and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> SupportsProjectionPushDown.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jdbc/hive/HBase haven't implemented the
>>>>>>>>> interfaces,
>>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>>>>> mean it's
>>>>>>>>>>>>>>>>>>>>>>>>>>>> hard
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implement.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> They should implement the pushdown
>>>>>> interfaces
>>>>>>> to
>>>>>>>>>>> reduce
>>>>>>>>>>>>>>> IO
>>>>>>>>>>>>>>>>> and the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> size.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> That should be a final state that the scan
>>>>>>>> source
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> lookup source
>>>>>>>>>>>>>>>>>>>>>>>>>>>> share
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the exact pushdown implementation.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't see why we need to duplicate the
>>>>>>>> pushdown
>>>>>>>>>>> logic
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> caches,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will complex the lookup join design.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
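Jark's LRU argument above can be illustrated with a tiny standalone simulation (plain Java, no Flink APIs; the capacity and key-space numbers are made up): if rows pruned by the filter are never admitted to an LRU cache, lookups for those keys always miss and go straight to the database.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class LruFilterEffect {
    /**
     * Simulates an LRU lookup cache where rows failing the pushed-down filter
     * are never admitted. Returns the fraction of lookups that hit the database.
     */
    public static double dbHitRatio(int capacity, int keySpace, int keysPassingFilter, int lookups) {
        // Access-ordered LinkedHashMap as a minimal LRU cache.
        Map<Integer, Integer> cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                return size() > capacity;
            }
        };
        Random rnd = new Random(42);
        int dbHits = 0;
        for (int i = 0; i < lookups; i++) {
            int key = rnd.nextInt(keySpace);
            if (cache.containsKey(key)) continue;              // answered from the cache
            dbHits++;                                          // miss: query the database
            if (key < keysPassingFilter) cache.put(key, key);  // pruned rows never enter the cache
        }
        return (double) dbHits / lookups;
    }

    public static void main(String[] args) {
        // Filter keeps only 10% of the key space: ~90% of lookups can never be cached.
        System.out.printf("db hit ratio = %.2f%n", dbHitRatio(100, 1000, 100, 10_000));
    }
}
```

With a uniform key distribution the database hit ratio stays close to the fraction of rows the filter prunes, which is exactly why pruning inside an LRU cache defeats its purpose.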
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) ALL cache abstraction
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The ALL cache might be the most challenging part of this FLIP. We have never provided a public reload-lookup interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently, we put the reload logic in the "eval" method of TableFunction. That's hard for some sources (e.g., Hive).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ideally, connector implementations should share the logic of reload and scan, i.e. ScanTableSource with InputFormat/SourceFunction/FLIP-27 Source.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, InputFormat/SourceFunction are deprecated, and the FLIP-27 source is deeply coupled with SourceOperator.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we want to invoke the FLIP-27 source in LookupJoin, this may make the scope of this FLIP much larger.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We are still investigating how to abstract the ALL cache logic and reuse the existing source interfaces.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 5 May 2022 at 20:22, Roman Boyko <ro.v.bo...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's a much more complicated activity and lies outside the scope of this improvement, because such pushdowns should be done for all ScanTableSource implementations (not only for the lookup ones).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 5 May 2022 at 19:02, Martijn Visser <martijnvis...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One question regarding "And Alexander correctly mentioned that filter pushdown still is not implemented for jdbc/hive/hbase." -> Would an alternative solution be to actually implement these filter pushdowns? I can imagine that there are many more benefits to doing that, outside of lookup caching and metrics.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Martijn Visser
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://twitter.com/MartijnVisser82
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/MartijnVisser
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 5 May 2022 at 13:58, Roman Boyko <ro.v.bo...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving such a valuable improvement!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I do think that a single cache implementation would be a nice opportunity for users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics anyway - it doesn't matter how it is implemented.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Putting myself in the user's shoes, I can say that:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) I would prefer to have the opportunity to cut off the cache size by simply filtering out unnecessary data. And the handiest way to do that is to apply it inside the LookupRunners. It would be a bit harder to pass it through the LookupJoin node to the TableFunction. And Alexander correctly mentioned that filter pushdown still is not implemented for jdbc/hive/hbase.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) The ability to set different caching parameters for different tables is quite important. So I would prefer to set them through DDL rather than have the same TTL, strategy and other options for all lookup tables.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Providing the cache in the framework really deprives us of extensibility (users won't be able to implement their own cache). But most probably this might be solved by creating more cache strategies and a wider set of configurations.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> All these points are much closer to the schema proposed by Alexander. Qingsheng Ren, please correct me if I'm wrong - can all these facilities be simply implemented in your architecture?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Roman Boyko
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> e.: ro.v.bo...@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 4 May 2022 at 21:01, Martijn Visser <martijnvis...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't have much to chip in, but just wanted to express that I really appreciate the in-depth discussion on this topic and I hope that others will join the conversation.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Martijn
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 3 May 2022 at 10:15, Александр Смирнов <smirale...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng, Leonard and Jark,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your detailed feedback! However, I have questions about some of your statements (maybe I didn't get something?).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time" is not fully implemented with caching, but as you said, users accept this consciously to achieve better performance (no one proposed to enable caching by default, etc.). Or by users do you mean other developers of connectors? In that case, developers explicitly specify whether their connector supports caching or not (in the list of supported options); no one makes them do that if they don't want to. So what exactly is the difference between implementing caching in modules flink-table-runtime and flink-table-common from this point of view? How does it affect breaking or not breaking the semantics of "FOR SYSTEM_TIME AS OF proc_time"?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> confront a situation that allows table options in DDL to control the behavior of the framework, which has never happened previously and should be cautious
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we talk about the main differences in semantics between DDL options and config options ("table.exec.xxx"), isn't it about limiting the scope of the options, plus their importance for the user's business logic, rather than the specific location of the corresponding logic in the framework? I mean that in my design, for example, putting an option with the lookup cache strategy in configurations would be the wrong decision, because it directly affects the user's business logic (not just performance optimization) and touches just several functions of ONE table (there can be multiple tables with different caches). Does it really matter for the user (or anyone else) where the logic affected by the applied option is located? Also I can remember the DDL option 'sink.parallelism', which in some way "controls the behavior of the framework", and I don't see any problem there.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> introduce a new interface for this all-caching scenario and the design would become more complex
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is a subject for a separate discussion, but actually in our internal version we solved this problem quite easily - we reused the InputFormat class (so there is no need for a new API). The point is that currently all lookup connectors use InputFormat for scanning the data in batch mode: HBase, JDBC and even Hive - it uses the class PartitionReader, which is actually just a wrapper around InputFormat. The advantage of this solution is the ability to reload cache data in parallel (the number of threads depends on the number of InputSplits, but has an upper limit). As a result, the cache reload time reduces significantly (as well as the time the input stream is blocked). I know that usually we try to avoid concurrency in Flink code, but maybe this one can be an exception. BTW I don't say that it's an ideal solution; maybe there are better ones.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
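The parallel reload Alexander describes could be sketched roughly as follows (plain Java with an `ExecutorService`, not the actual Flink InputFormat/InputSplit API; the `reload` method and the per-split lists are illustrative only): one loader task per split fills a fresh map, and the fresh map replaces the old cache only after every split has finished loading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCacheReload {
    /**
     * Loads every split concurrently into a fresh map. The caller swaps the
     * returned map in as the new ALL cache once the method returns.
     */
    static Map<Integer, String> reload(List<List<Integer>> splits, int threads)
            throws InterruptedException, ExecutionException {
        ConcurrentHashMap<Integer, String> fresh = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (List<Integer> split : splits) {
                tasks.add(pool.submit(() -> {
                    for (int key : split) {           // read one split, e.g. one partition
                        fresh.put(key, "row-" + key); // stand-in for a fetched row
                    }
                }));
            }
            for (Future<?> t : tasks) t.get();        // wait until every split is loaded
        } finally {
            pool.shutdown();
        }
        return fresh;
    }

    public static void main(String[] args) throws Exception {
        List<List<Integer>> splits = List.of(List.of(1, 2), List.of(3, 4), List.of(5));
        Map<Integer, String> cache = reload(splits, 3);
        System.out.println(cache.size()); // 5
    }
}
```

Building into a fresh map and swapping after all tasks complete also keeps the old snapshot readable during the reload, which is one way to shorten the input-stream blocking he mentions.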
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Providing the cache in the framework might introduce compatibility issues
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> That's possible only when the developer of the connector doesn't properly refactor his code and uses the new cache options incorrectly (i.e. explicitly provides the same options in two different code places). For correct behavior, all he needs to do is redirect the existing options to the framework's LookupConfig (+ maybe add an alias for options, if there was different naming); everything will be transparent for users. If the developer doesn't do the refactoring at all, nothing will change for the connector because of backward compatibility. Also, if a developer wants to use his own cache logic, he can simply refuse to pass some of the configs into the framework and instead make his own implementation with the already existing configs and metrics (but actually I think that's a rare case).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filters and projections should be pushed all the way down to the table function, like what we do in the scan source
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> That's a great goal. But the truth is that the ONLY connector that currently supports filter pushdown is FileSystemTableSource (no database connector supports it). Also, for some databases it's simply impossible to push down such complex filters as we have in Flink.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only applying these optimizations to the cache seems not quite useful
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Filters can cut off an arbitrarily large amount of data from the dimension table. For a simple example, suppose in the dimension table 'users' we have a column 'age' with values from 20 to 40, and an input stream 'clicks' that is roughly uniformly distributed by age of users. If we have the filter 'age > 30', there will be twice less data in the cache. This means the user can increase 'lookup.cache.max-rows' by almost 2 times, which will gain a huge performance boost. Moreover, this optimization really starts to shine with the 'ALL' cache, where tables that can't fit in memory without filters and projections can fit with them. This opens up additional possibilities for users. And that doesn't sound like 'not quite useful'.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
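Alexander's 'age > 30' arithmetic can be shown with a minimal sketch (the `loadCache` helper is hypothetical, not connector code): applying the pushed-down filter while loading an ALL cache admits only the rows the join can ever match, so roughly half of the rows in his example never consume cache memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

public class FilteredCacheLoad {
    /**
     * Loads a dimension table into an ALL cache, applying the pushed-down
     * filter during the load so pruned rows never take up memory.
     */
    static Map<Integer, Integer> loadCache(int[] userIds, int[] ages, IntPredicate ageFilter) {
        Map<Integer, Integer> cache = new HashMap<>();
        for (int i = 0; i < userIds.length; i++) {
            if (ageFilter.test(ages[i])) {   // skip rows the lookup join would never match
                cache.put(userIds[i], ages[i]);
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        int n = 21; // ages 20..40 inclusive, one user per age
        int[] ids = new int[n], ages = new int[n];
        for (int i = 0; i < n; i++) { ids[i] = i; ages[i] = 20 + i; }
        Map<Integer, Integer> cache = loadCache(ids, ages, age -> age > 30);
        System.out.println(cache.size()); // 10 of 21 rows (ages 31..40) are kept
    }
}
```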
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It would be great to hear other voices regarding this topic! Because we have quite a lot of controversial points, and I think with the help of others it will be easier for us to come to a consensus.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fri, 29 Apr 2022 at 22:33, Qingsheng Ren <renqs...@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Alexander and Arvid,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the discussion and sorry for my late response! We had an internal discussion together with Jark and Leonard, and I'd like to summarize our ideas. Instead of implementing the cache logic in the table runtime layer or wrapping it around the user-provided table function, we prefer to introduce some new APIs extending TableFunction, with these concerns:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time", because it couldn't truly reflect the content of the lookup table at the moment of querying. If users choose to enable caching on the lookup table, they implicitly indicate that this breakage is acceptable in exchange for the performance. So we prefer not to provide caching at the table runtime level.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. If we put the cache implementation in the framework (whether in a runner or a wrapper around TableFunction), we have to confront a situation that allows table options in DDL to control the behavior of the framework, which has never happened previously and should be treated cautiously. Under the current design, the behavior of the framework should only be specified by configurations ("table.exec.xxx"), and it's hard to apply these general configs to a specific table.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. We have use cases where the lookup source loads and refreshes all records periodically into memory to achieve high lookup performance (like the Hive connector in the community; this pattern is also widely used by our internal connectors). Wrapping the cache around the user's TableFunction works fine for LRU caches, but I think we would have to introduce a new interface for this all-caching scenario, and the design would become more complex.
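For point 3, one possible shape of such an all-caching interface might look like the following sketch. All names here (`FullCacheLoader`, `ReloadingCache`) are hypothetical illustrations, not the FLIP-221 API: the connector supplies only a bulk loader, while the framework decides when to reload and atomically swaps in the new snapshot.

```java
import java.util.Map;
import java.util.Optional;

public class FullCacheSketch {
    /** Connector side: loads the entire dimension table; the framework decides when to call it. */
    interface FullCacheLoader<K, V> {
        Map<K, V> loadAll();
    }

    /** Framework side: holds the current snapshot and swaps in a fresh one on each reload. */
    static final class ReloadingCache<K, V> {
        private final FullCacheLoader<K, V> loader;
        private volatile Map<K, V> snapshot = Map.of();

        ReloadingCache(FullCacheLoader<K, V> loader) {
            this.loader = loader;
        }

        /** Invoked by a periodic or timed trigger; readers keep using the old snapshot until the swap. */
        void reload() {
            snapshot = Map.copyOf(loader.loadAll());
        }

        Optional<V> lookup(K key) {
            return Optional.ofNullable(snapshot.get(key));
        }
    }

    public static void main(String[] args) {
        ReloadingCache<Integer, String> cache =
                new ReloadingCache<>(() -> Map.of(1, "a", 2, "b"));
        cache.reload();
        System.out.println(cache.lookup(1).orElse("miss")); // a
    }
}
```

Separating "how to load" (connector) from "when to reload" (framework) is the layering question being debated in this thread; the sketch only shows that the two responsibilities can sit behind a small interface boundary.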
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4. Providing the cache in the framework might introduce compatibility issues to existing lookup sources: there might exist two caches with totally different strategies if the user incorrectly configures the table (one in the framework and another implemented by the lookup source).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As for the optimization mentioned by Alexander, I think filters and projections should be pushed all the way down to the table function, like what we do in the scan source, instead of into the runner with the cache. The goal of using a cache is to reduce the network I/O and the pressure on the external system, and only applying these optimizations to the cache seems not quite useful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I made some updates to the FLIP[1] to
>>>>>>>>> reflect
>>>>>>>>>>> our
>>>>>>>>>>>>>>>>> ideas.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> prefer to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> keep the cache implementation as a part
>>>>>>> of
>>>>>>>>>>>>>>>>> TableFunction,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> provide some helper classes
>>>>>>>>>>> (CachingTableFunction,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AllCachingTableFunction,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CachingAsyncTableFunction) to
>>>>>> developers
>>>>>>>> and
>>>>>>>>>>>>>>> regulate
>>>>>>>>>>>>>>>>>>>>>>>>>>>> metrics
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, I made a POC[2] for your
>>>>>> reference.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to your ideas!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>> https://github.com/PatrickRen/flink/tree/FLIP-221
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
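The full-caching pattern from point 3 above (load all records, refresh them periodically, serve lookups purely from memory) can be sketched roughly as follows. All class and method names here are illustrative, not the FLIP's actual API.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Minimal sketch of the all-caching lookup pattern: the whole table is
 * scanned into memory and refreshed periodically, so lookups never hit
 * the external system. Names are illustrative, not the FLIP-221 API.
 */
class AllCachingLookup {
    // Stands in for a full scan of the external table (e.g. a Hive table).
    private final Supplier<Map<String, List<String>>> scanAll;
    // volatile so readers always see a complete, atomically swapped snapshot.
    private volatile Map<String, List<String>> snapshot;

    AllCachingLookup(Supplier<Map<String, List<String>>> scanAll) {
        this.scanAll = scanAll;
        reload(); // initial full load
    }

    /** Invoked by a reload trigger (e.g. a timer) to refresh the snapshot. */
    void reload() {
        snapshot = scanAll.get();
    }

    /** Lookup is a pure in-memory read; an absent key yields no rows. */
    List<String> lookup(String key) {
        return snapshot.getOrDefault(key, List.of());
    }
}
```

A real implementation would reuse the connector's scan ability for the full load and let the framework drive `reload()`, which is the reuse of ScanRuntimeProvider argued for above.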
> On Tue, Apr 26, 2022 at 4:45 PM Александр Смирнов <smirale...@gmail.com> wrote:
>>
>> Thanks for the response, Arvid!
>>
>> I have a few comments on your message.
>>
>>> but could also live with an easier solution as the first step:
>>
>> I think that these two ways (the one originally proposed by Qingsheng and
>> mine) are mutually exclusive, because conceptually they follow the same
>> goal, but the implementation details are different. If we go one way,
>> moving to the other way in the future will mean deleting existing code
>> and once again changing the API for connectors. So I think we should
>> reach a consensus with the community about that and then work together on
>> this FLIP, i.e. divide the work into tasks for the different parts of the
>> FLIP (for example, LRU cache unification / introducing the proposed set
>> of metrics / further work…). WDYT, Qingsheng?
>>
>>> as the source will only receive the requests after filter
>>
>> Actually, if filters are applied to fields of the lookup table, we must
>> first do the requests, and only after that can we filter the responses,
>> because lookup connectors don't have filter pushdown. So if filtering is
>> done before caching, there will be far fewer rows in the cache.
>>
>>> @Alexander unfortunately, your architecture is not shared. I don't know
>>> the solution to share images to be honest.
>>
>> Sorry for that, I'm a bit new to such kinds of conversations :)
>> I have no write access to the Confluence, so I made a Jira issue where I
>> described the proposed changes in more detail:
>> https://issues.apache.org/jira/browse/FLINK-27411
>>
>> Will be happy to get more feedback!
>>
>> Best,
>> Smirnov Alexander
>>
>> On Mon, Apr 25, 2022 at 19:49, Arvid Heise <ar...@apache.org> wrote:
>>>
>>> Hi Qingsheng,
>>>
>>> Thanks for driving this; the inconsistency was not satisfying for me.
>>>
>>> I second Alexander's idea, though I could also live with an easier
>>> solution as the first step: instead of making caching an implementation
>>> detail of TableFunction X, rather devise a caching layer around X. So
>>> the proposal would be a CachingTableFunction that delegates to X in case
>>> of misses and otherwise manages the cache. Lifting it into the operator
>>> model as proposed would be even better, but is probably unnecessary in
>>> the first step for a lookup source (as the source will only receive the
>>> requests after filter; applying projection may be more interesting to
>>> save memory).
>>>
>>> Another advantage is that all the changes of this FLIP would be limited
>>> to options, with no need for new public interfaces. Everything else
>>> remains an implementation detail of the Table runtime. That means we can
>>> easily incorporate the optimization potential that Alexander pointed out
>>> later.
>>>
>>> @Alexander unfortunately, your architecture is not shared. I don't know
>>> the solution to share images to be honest.
>>>
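The "caching layer around X" idea above can be illustrated with a minimal delegating wrapper. This is a sketch with invented names, not Flink's real interfaces: on a cache hit the wrapped lookup function is never called.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch of a caching layer around a lookup function "X": answers lookups
 * from its cache and delegates to the wrapped function only on a miss.
 * Names are illustrative; the real FLIP-221 interfaces differ.
 */
class CachingLookupWrapper {
    private final Function<String, Collection<String>> delegate; // the "X" being wrapped
    private final Map<String, Collection<String>> cache = new HashMap<>();
    int delegateCalls = 0; // exposed for illustration; real code would emit metrics

    CachingLookupWrapper(Function<String, Collection<String>> delegate) {
        this.delegate = delegate;
    }

    Collection<String> lookup(String key) {
        // computeIfAbsent: on a miss, fetch from the delegate and cache the
        // result (including an empty collection, i.e. a "missing key").
        return cache.computeIfAbsent(key, k -> {
            delegateCalls++;
            return delegate.apply(k);
        });
    }
}
```

Because the wrapper owns the cache, the framework can also attach standard cache metrics (hits, misses, size) in one place instead of per connector.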
>>> On Fri, Apr 22, 2022 at 5:04 PM Александр Смирнов <smirale...@gmail.com> wrote:
>>>>
>>>> Hi Qingsheng! My name is Alexander, I'm not a committer yet, but I'd
>>>> really like to become one, and this FLIP really interested me.
>>>> Actually, I have worked on a similar feature in my company's Flink
>>>> fork, and we would like to share our thoughts on this and make the code
>>>> open source.
>>>>
>>>> I think there is a better alternative than introducing an abstract
>>>> class for TableFunction (CachingTableFunction). As you know,
>>>> TableFunction lives in the flink-table-common module, which provides
>>>> only an API for working with tables – that's very convenient to import
>>>> in connectors. In turn, CachingTableFunction contains logic for runtime
>>>> execution, so this class and everything connected with it should be
>>>> located in another module, probably flink-table-runtime. But that would
>>>> require connectors to depend on another module that contains a lot of
>>>> runtime logic, which doesn't sound good.
>>>>
>>>> I suggest adding a new method 'getLookupConfig' to LookupTableSource or
>>>> LookupRuntimeProvider to allow connectors to pass only configurations
>>>> to the planner, so they won't depend on a runtime implementation. Based
>>>> on these configs the planner will construct a lookup join operator with
>>>> the corresponding runtime logic (ProcessFunctions in the
>>>> flink-table-runtime module). The architecture looks like the pinned
>>>> image (the LookupConfig class there is actually your CacheConfig).
>>>>
>>>> The classes in flink-table-planner that will be responsible for this
>>>> are CommonPhysicalLookupJoin and its inheritors. The current classes
>>>> for lookup join in flink-table-runtime are LookupJoinRunner,
>>>> AsyncLookupJoinRunner, LookupJoinRunnerWithCalc, and
>>>> AsyncLookupJoinRunnerWithCalc.
>>>>
>>>> I suggest adding classes LookupJoinCachingRunner,
>>>> LookupJoinCachingRunnerWithCalc, etc.
>>>>
>>>> And here comes another, more powerful advantage of such a solution. If
>>>> we have the caching logic at a lower level, we can apply some
>>>> optimizations to it. LookupJoinRunnerWithCalc was named like this
>>>> because it uses the 'calc' function, which actually consists mostly of
>>>> filters and projections.
>>>>
>>>> For example, when joining table A with lookup table B with the
>>>> condition 'JOIN … ON A.id = B.id AND A.age = B.age + 10 WHERE
>>>> B.salary > 1000', the 'calc' function will contain the filters
>>>> A.age = B.age + 10 and B.salary > 1000.
>>>>
>>>> If we apply this function before storing records in the cache, the size
>>>> of the cache will be significantly reduced: filters = avoid storing
>>>> useless records in the cache; projections = reduce the records' size.
>>>> So the initial max number of records in the cache can be increased by
>>>> the user.
>>>>
>>>> What do you think about it?
>>>>
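The filter-before-cache optimization above can be illustrated with a small sketch. The row layout and predicate are invented for the 'B.salary > 1000' example; note that the A.age = B.age + 10 predicate needs the probe-side row, so only the lookup-side filter can run before caching.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative lookup-side row for table B; fields are invented. */
record Row(int id, int age, int salary) {}

/**
 * Sketch of applying the join's 'calc' (here just the lookup-side filter)
 * to fetched rows *before* they enter the cache, so rows that can never
 * match (salary <= 1000) are not stored at all.
 */
class FilterBeforeCache {
    // Lookup-side part of: ... WHERE B.salary > 1000
    static List<Row> calc(List<Row> fetched) {
        return fetched.stream()
                .filter(r -> r.salary() > 1000)
                .collect(Collectors.toList());
    }
}
```

With the filter applied first, a cache sized for N entries effectively holds only candidate matches, so the user can raise the max-rows limit without raising memory use.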
>>>> On 2022/04/19 02:47:11 Qingsheng Ren wrote:
>>>>> Hi devs,
>>>>>
>>>>> Yuan and I would like to start a discussion about FLIP-221[1], which
>>>>> introduces an abstraction of the lookup table cache and its standard
>>>>> metrics.
>>>>>
>>>>> Currently each lookup table source has to implement its own cache to
>>>>> store lookup results, and there isn't a standard set of metrics for
>>>>> users and developers to tune their jobs with lookup joins, which is a
>>>>> quite common use case in Flink Table / SQL.
>>>>>
>>>>> Therefore we propose some new APIs, including a cache, metrics,
>>>>> wrapper classes of TableFunction, and new table options. Please take a
>>>>> look at the FLIP page [1] for more details. Any suggestions and
>>>>> comments would be appreciated!
>>>>>
>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Qingsheng
>>>>>
>
> --
> Best Regards,
>
> Qingsheng Ren
>
> Real-time Computing Team
> Alibaba Cloud
>
> Email: renqs...@gmail.com
>
> --
> Best regards,
> Roman Boyko
> e.: ro.v.bo...@gmail.com
