Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

Alexander Smirnov Wed, 22 Jun 2022 10:10:49 -0700

Hi Qingsheng,

I like the current design, thanks for your efforts! I have no
objections at the moment.
+1 to start the vote.


Best regards,
Alexander

ср, 22 июн. 2022 г. в 15:59, Qingsheng Ren <re...@apache.org>:
>
> Hi Jingsong,
>
> 1. Updated and thanks for the reminder!
>
> 2. We could do so for implementation but as public interface I prefer not to 
> introduce another layer and expose too much since this FLIP is already a huge 
> one with bunch of classes and interfaces.
>
> Best,
> Qingsheng
>
> > On Jun 22, 2022, at 11:16, Jingsong Li <jingsongl...@gmail.com> wrote:
> >
> > Thanks Qingsheng and all.
> >
> > I like this design.
> >
> > Some comments:
> >
> > 1. LookupCache implements Serializable?
> >
> > 2. Minor: After FLIP-234 [1], there should be many connectors that
> > implement both PartialCachingLookupProvider and
> > PartialCachingAsyncLookupProvider. Can we extract a common interface
> > for `LookupCache getCache();` to ensure consistency?
> >
> > [1] 
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> >
> > Best,
> > Jingsong
> >
> > On Tue, Jun 21, 2022 at 4:09 PM Qingsheng Ren <re...@apache.org> wrote:
> >>
> >> Hi devs,
> >>
> >> I’d like to push FLIP-221 forward a little bit. Recently we had some 
> >> offline discussions and updated the FLIP. Here’s the diff compared to the 
> >> previous version:
> >>
> >> 1. (Async)LookupFunctionProvider is designed as a base interface for 
> >> constructing lookup functions.
> >> 2. From the LookupFunction we extend PartialCaching / 
> >> FullCachingLookupProvider for partial and full caching mode.
> >> 3. Introduce CacheReloadTrigger for specifying reload stratrgy in full 
> >> caching mode, and provide 2 default implementations (Periodic / 
> >> TimedCacheReloadTrigger)
> >>
> >> Looking forward to your replies~
> >>
> >> Best,
> >> Qingsheng
> >>
> >>> On Jun 2, 2022, at 17:15, Qingsheng Ren <renqs...@gmail.com> wrote:
> >>>
> >>> Hi Becket,
> >>>
> >>> Thanks for your feedback!
> >>>
> >>> 1. An alternative way is to let the implementation of cache to decide
> >>> whether to store a missing key in the cache instead of the framework.
> >>> This sounds more reasonable and makes the LookupProvider interface
> >>> cleaner. I can update the FLIP and clarify in the JavaDoc of
> >>> LookupCache#put that the cache should decide whether to store an empty
> >>> collection.
> >>>
> >>> 2. Initially the builder pattern is for the extensibility of
> >>> LookupProvider interfaces that we could need to add more
> >>> configurations in the future. We can remove the builder now as we have
> >>> resolved the issue in 1. As for the builder in DefaultLookupCache I
> >>> prefer to keep it because we have a lot of arguments in the
> >>> constructor.
> >>>
> >>> 3. I think this might overturn the overall design. I agree with
> >>> Becket's idea that the API design should be layered considering
> >>> extensibility and it'll be great to have one unified interface
> >>> supporting both partial, full and even mixed custom strategies, but we
> >>> have some issues to resolve. The original purpose of treating full
> >>> caching separately is that we'd like to reuse the ability of
> >>> ScanRuntimeProvider. Developers just need to hand over Source /
> >>> SourceFunction / InputFormat so that the framework could be able to
> >>> compose the underlying topology and control the reload (maybe in a
> >>> distributed way). Under your design we leave the reload operation
> >>> totally to the CacheStrategy and I think it will be hard for
> >>> developers to reuse the source in the initializeCache method.
> >>>
> >>> Best regards,
> >>>
> >>> Qingsheng
> >>>
> >>> On Thu, Jun 2, 2022 at 1:50 PM Becket Qin <becket....@gmail.com> wrote:
> >>>>
> >>>> Thanks for updating the FLIP, Qingsheng. A few more comments:
> >>>>
> >>>> 1. I am still not sure about what is the use case for cacheMissingKey().
> >>>> More specifically, when would users want to have getCache() return a
> >>>> non-empty value and cacheMissingKey() returns false?
> >>>>
> >>>> 2. The builder pattern. Usually the builder pattern is used when there 
> >>>> are
> >>>> a lot of variations of constructors. For example, if a class has three
> >>>> variables and all of them are optional, so there could potentially be 
> >>>> many
> >>>> combinations of the variables. But in this FLIP, I don't see such case.
> >>>> What is the reason we have builders for all the classes?
> >>>>
> >>>> 3. Should the caching strategy be excluded from the top level provider 
> >>>> API?
> >>>> Technically speaking, the Flink framework should only have two interfaces
> >>>> to deal with:
> >>>>   A) LookupFunction
> >>>>   B) AsyncLookupFunction
> >>>> Orthogonally, we *believe* there are two different strategies people can 
> >>>> do
> >>>> caching. Note that the Flink framework does not care what is the caching
> >>>> strategy here.
> >>>>   a) partial caching
> >>>>   b) full caching
> >>>>
> >>>> Putting them together, we end up with 3 combinations that we think are
> >>>> valid:
> >>>>    Aa) PartialCachingLookupFunctionProvider
> >>>>    Ba) PartialCachingAsyncLookupFunctionProvider
> >>>>    Ab) FullCachingLookupFunctionProvider
> >>>>
> >>>> However, the caching strategy could actually be quite flexible. E.g. an
> >>>> initial full cache load followed by some partial updates. Also, I am not
> >>>> 100% sure if the full caching will always use ScanTableSource. Including
> >>>> the caching strategy in the top level provider API would make it harder 
> >>>> to
> >>>> extend.
> >>>>
> >>>> One possible solution is to just have *LookupFunctionProvider* and
> >>>> *AsyncLookupFunctionProvider
> >>>> *as the top level API, both with a getCacheStrategy() method returning an
> >>>> optional CacheStrategy. The CacheStrategy class would have the following
> >>>> methods:
> >>>> 1. void open(Context), the context exposes some of the resources that may
> >>>> be useful for the the caching strategy, e.g. an ExecutorService that is
> >>>> synchronized with the data processing, or a cache refresh trigger which
> >>>> blocks data processing and refresh the cache.
> >>>> 2. void initializeCache(), a blocking method allows users to pre-populate
> >>>> the cache before processing any data if they wish.
> >>>> 3. void maybeCache(RowData key, Collection<RowData> value), blocking or
> >>>> non-blocking method.
> >>>> 4. void refreshCache(), a blocking / non-blocking method that is invoked 
> >>>> by
> >>>> the Flink framework when the cache refresh trigger is pulled.
> >>>>
> >>>> In the above design, partial caching and full caching would be
> >>>> implementations of the CachingStrategy. And it is OK for users to 
> >>>> implement
> >>>> their own CachingStrategy if they want to.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Jiangjie (Becket) Qin
> >>>>
> >>>>
> >>>> On Thu, Jun 2, 2022 at 12:14 PM Jark Wu <imj...@gmail.com> wrote:
> >>>>
> >>>>> Thank Qingsheng for the detailed summary and updates,
> >>>>>
> >>>>> The changes look good to me in general. I just have one minor 
> >>>>> improvement
> >>>>> comment.
> >>>>> Could we add a static util method to the "FullCachingReloadTrigger"
> >>>>> interface for quick usage?
> >>>>>
> >>>>> #periodicReloadAtFixedRate(Duration)
> >>>>> #periodicReloadWithFixedDelay(Duration)
> >>>>>
> >>>>> I think we can also do this for LookupCache. Because users may not know
> >>>>> where is the default
> >>>>> implementations and how to use them.
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, 1 Jun 2022 at 18:32, Qingsheng Ren <renqs...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Jingsong,
> >>>>>>
> >>>>>> Thanks for your comments!
> >>>>>>
> >>>>>>> AllCache definition is not flexible, for example, PartialCache can use
> >>>>>> any custom storage, while the AllCache can not, AllCache can also be
> >>>>>> considered to store memory or disk, also need a flexible strategy.
> >>>>>>
> >>>>>> We had an offline discussion with Jark and Leonard. Basically we think
> >>>>>> exposing the interface of full cache storage to connector developers
> >>>>> might
> >>>>>> limit our future optimizations. The storage of full caching shouldn’t
> >>>>> have
> >>>>>> too many variations for different lookup tables so making it pluggable
> >>>>>> might not help a lot. Also I think it is not quite easy for connector
> >>>>>> developers to implement such an optimized storage. We can keep 
> >>>>>> optimizing
> >>>>>> this storage in the future and all full caching lookup tables would
> >>>>> benefit
> >>>>>> from this.
> >>>>>>
> >>>>>>> We are more inclined to deprecate the connector `async` option when
> >>>>>> discussing FLIP-234. Can we remove this option from this FLIP?
> >>>>>>
> >>>>>> Thanks for the reminder! This option has been removed in the latest
> >>>>>> version.
> >>>>>>
> >>>>>> Best regards,
> >>>>>>
> >>>>>> Qingsheng
> >>>>>>
> >>>>>>
> >>>>>>> On Jun 1, 2022, at 15:28, Jingsong Li <jingsongl...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Thanks Alexander for your reply. We can discuss the new interface when
> >>>>> it
> >>>>>>> comes out.
> >>>>>>>
> >>>>>>> We are more inclined to deprecate the connector `async` option when
> >>>>>>> discussing FLIP-234 [1]. We should use hint to let planner decide.
> >>>>>>> Although the discussion has not yet produced a conclusion, can we
> >>>>> remove
> >>>>>>> this option from this FLIP? It doesn't seem to be related to this 
> >>>>>>> FLIP,
> >>>>>> but
> >>>>>>> more to FLIP-234, and we can form a conclusion over there.
> >>>>>>>
> >>>>>>> [1] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jingsong
> >>>>>>>
> >>>>>>> On Wed, Jun 1, 2022 at 4:59 AM Jing Ge <j...@ververica.com> wrote:
> >>>>>>>
> >>>>>>>> Hi Jark,
> >>>>>>>>
> >>>>>>>> Thanks for clarifying it. It would be fine. as long as we could
> >>>>> provide
> >>>>>> the
> >>>>>>>> no-cache solution. I was just wondering if the client side cache 
> >>>>>>>> could
> >>>>>>>> really help when HBase is used, since the data to look up should be
> >>>>>> huge.
> >>>>>>>> Depending how much data will be cached on the client side, the data
> >>>>> that
> >>>>>>>> should be lru in e.g. LruBlockCache will not be lru anymore. In the
> >>>>>> worst
> >>>>>>>> case scenario, once the cached data at client side is expired, the
> >>>>>> request
> >>>>>>>> will hit disk which will cause extra latency temporarily, if I am not
> >>>>>>>> mistaken.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Jing
> >>>>>>>>
> >>>>>>>> On Mon, May 30, 2022 at 9:59 AM Jark Wu <imj...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Jing Ge,
> >>>>>>>>>
> >>>>>>>>> What do you mean about the "impact on the block cache used by 
> >>>>>>>>> HBase"?
> >>>>>>>>> In my understanding, the connector cache and HBase cache are totally
> >>>>>> two
> >>>>>>>>> things.
> >>>>>>>>> The connector cache is a local/client cache, and the HBase cache is 
> >>>>>>>>> a
> >>>>>>>>> server cache.
> >>>>>>>>>
> >>>>>>>>>> does it make sense to have a no-cache solution as one of the
> >>>>>>>>> default solutions so that customers will have no effort for the
> >>>>>> migration
> >>>>>>>>> if they want to stick with Hbase cache
> >>>>>>>>>
> >>>>>>>>> The implementation migration should be transparent to users. Take 
> >>>>>>>>> the
> >>>>>>>> HBase
> >>>>>>>>> connector as
> >>>>>>>>> an example,  it already supports lookup cache but is disabled by
> >>>>>> default.
> >>>>>>>>> After migration, the
> >>>>>>>>> connector still disables cache by default (i.e. no-cache solution).
> >>>>> No
> >>>>>>>>> migration effort for users.
> >>>>>>>>>
> >>>>>>>>> HBase cache and connector cache are two different things. HBase 
> >>>>>>>>> cache
> >>>>>>>> can't
> >>>>>>>>> simply replace
> >>>>>>>>> connector cache. Because one of the most important usages for
> >>>>> connector
> >>>>>>>>> cache is reducing
> >>>>>>>>> the I/O request/response and improving the throughput, which can
> >>>>>> achieve
> >>>>>>>>> by just using a server cache.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jark
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, 27 May 2022 at 22:42, Jing Ge <j...@ververica.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks all for the valuable discussion. The new feature looks very
> >>>>>>>>>> interesting.
> >>>>>>>>>>
> >>>>>>>>>> According to the FLIP description: "*Currently we have JDBC, Hive
> >>>>> and
> >>>>>>>>> HBase
> >>>>>>>>>> connector implemented lookup table source. All existing
> >>>>>> implementations
> >>>>>>>>>> will be migrated to the current design and the migration will be
> >>>>>>>>>> transparent to end users*." I was only wondering if we should pay
> >>>>>>>>> attention
> >>>>>>>>>> to HBase and similar DBs. Since, commonly, the lookup data will be
> >>>>>> huge
> >>>>>>>>>> while using HBase, partial caching will be used in this case, if I
> >>>>> am
> >>>>>>>> not
> >>>>>>>>>> mistaken, which might have an impact on the block cache used by
> >>>>> HBase,
> >>>>>>>>> e.g.
> >>>>>>>>>> LruBlockCache.
> >>>>>>>>>> Another question is that, since HBase provides a sophisticated 
> >>>>>>>>>> cache
> >>>>>>>>>> solution, does it make sense to have a no-cache solution as one of
> >>>>> the
> >>>>>>>>>> default solutions so that customers will have no effort for the
> >>>>>>>> migration
> >>>>>>>>>> if they want to stick with Hbase cache?
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Jing
> >>>>>>>>>>
> >>>>>>>>>> On Fri, May 27, 2022 at 11:19 AM Jingsong Li <
> >>>>> jingsongl...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> I think the problem now is below:
> >>>>>>>>>>> 1. AllCache and PartialCache interface on the non-uniform, one
> >>>>> needs
> >>>>>>>> to
> >>>>>>>>>>> provide LookupProvider, the other needs to provide CacheBuilder.
> >>>>>>>>>>> 2. AllCache definition is not flexible, for example, PartialCache
> >>>>> can
> >>>>>>>>> use
> >>>>>>>>>>> any custom storage, while the AllCache can not, AllCache can also
> >>>>> be
> >>>>>>>>>>> considered to store memory or disk, also need a flexible strategy.
> >>>>>>>>>>> 3. AllCache can not customize ReloadStrategy, currently only
> >>>>>>>>>>> ScheduledReloadStrategy.
> >>>>>>>>>>>
> >>>>>>>>>>> In order to solve the above problems, the following are my ideas.
> >>>>>>>>>>>
> >>>>>>>>>>> ## Top level cache interfaces:
> >>>>>>>>>>>
> >>>>>>>>>>> ```
> >>>>>>>>>>>
> >>>>>>>>>>> public interface CacheLookupProvider extends
> >>>>>>>>>>> LookupTableSource.LookupRuntimeProvider {
> >>>>>>>>>>>
> >>>>>>>>>>>  CacheBuilder createCacheBuilder();
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> public interface CacheBuilder {
> >>>>>>>>>>>  Cache create();
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> public interface Cache {
> >>>>>>>>>>>
> >>>>>>>>>>>  /**
> >>>>>>>>>>>   * Returns the value associated with key in this cache, or null
> >>>>>>>> if
> >>>>>>>>>>> there is no cached value for
> >>>>>>>>>>>   * key.
> >>>>>>>>>>>   */
> >>>>>>>>>>>  @Nullable
> >>>>>>>>>>>  Collection<RowData> getIfPresent(RowData key);
> >>>>>>>>>>>
> >>>>>>>>>>>  /** Returns the number of key-value mappings in the cache. */
> >>>>>>>>>>>  long size();
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> ```
> >>>>>>>>>>>
> >>>>>>>>>>> ## Partial cache
> >>>>>>>>>>>
> >>>>>>>>>>> ```
> >>>>>>>>>>>
> >>>>>>>>>>> public interface PartialCacheLookupFunction extends
> >>>>>>>>> CacheLookupProvider {
> >>>>>>>>>>>
> >>>>>>>>>>>  @Override
> >>>>>>>>>>>  PartialCacheBuilder createCacheBuilder();
> >>>>>>>>>>>
> >>>>>>>>>>> /** Creates an {@link LookupFunction} instance. */
> >>>>>>>>>>> LookupFunction createLookupFunction();
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> public interface PartialCacheBuilder extends CacheBuilder {
> >>>>>>>>>>>
> >>>>>>>>>>>  PartialCache create();
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> public interface PartialCache extends Cache {
> >>>>>>>>>>>
> >>>>>>>>>>>  /**
> >>>>>>>>>>>   * Associates the specified value rows with the specified key
> >>>>> row
> >>>>>>>>>>> in the cache. If the cache
> >>>>>>>>>>>   * previously contained value associated with the key, the old
> >>>>>>>>>>> value is replaced by the
> >>>>>>>>>>>   * specified value.
> >>>>>>>>>>>   *
> >>>>>>>>>>>   * @return the previous value rows associated with key, or null
> >>>>>>>> if
> >>>>>>>>>>> there was no mapping for key.
> >>>>>>>>>>>   * @param key - key row with which the specified value is to be
> >>>>>>>>>>> associated
> >>>>>>>>>>>   * @param value – value rows to be associated with the specified
> >>>>>>>>> key
> >>>>>>>>>>>   */
> >>>>>>>>>>>  Collection<RowData> put(RowData key, Collection<RowData> value);
> >>>>>>>>>>>
> >>>>>>>>>>>  /** Discards any cached value for the specified key. */
> >>>>>>>>>>>  void invalidate(RowData key);
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> ```
> >>>>>>>>>>>
> >>>>>>>>>>> ## All cache
> >>>>>>>>>>> ```
> >>>>>>>>>>>
> >>>>>>>>>>> public interface AllCacheLookupProvider extends
> >>>>> CacheLookupProvider {
> >>>>>>>>>>>
> >>>>>>>>>>>  void registerReloadStrategy(ScheduledExecutorService
> >>>>>>>>>>> executorService, Reloader reloader);
> >>>>>>>>>>>
> >>>>>>>>>>>  ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider();
> >>>>>>>>>>>
> >>>>>>>>>>>  @Override
> >>>>>>>>>>>  AllCacheBuilder createCacheBuilder();
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> public interface AllCacheBuilder extends CacheBuilder {
> >>>>>>>>>>>
> >>>>>>>>>>>  AllCache create();
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> public interface AllCache extends Cache {
> >>>>>>>>>>>
> >>>>>>>>>>>  void putAll(Iterator<Map<RowData, RowData>> allEntries);
> >>>>>>>>>>>
> >>>>>>>>>>>  void clearAll();
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> public interface Reloader {
> >>>>>>>>>>>
> >>>>>>>>>>>  void reload();
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> ```
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Jingsong
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, May 27, 2022 at 11:10 AM Jingsong Li <
> >>>>> jingsongl...@gmail.com
> >>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks Qingsheng and all for your discussion.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Very sorry to jump in so late.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Maybe I missed something?
> >>>>>>>>>>>> My first impression when I saw the cache interface was, why don't
> >>>>>>>> we
> >>>>>>>>>>>> provide an interface similar to guava cache [1], on top of guava
> >>>>>>>>> cache,
> >>>>>>>>>>>> caffeine also makes extensions for asynchronous calls.[2]
> >>>>>>>>>>>> There is also the bulk load in caffeine too.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am also more confused why first from LookupCacheFactory.Builder
> >>>>>>>> and
> >>>>>>>>>>> then
> >>>>>>>>>>>> to Factory to create Cache.
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1] https://github.com/google/guava
> >>>>>>>>>>>> [2] https://github.com/ben-manes/caffeine/wiki/Population
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, May 26, 2022 at 11:17 PM Jark Wu <imj...@gmail.com>
> >>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> After looking at the new introduced ReloadTime and Becket's
> >>>>>>>> comment,
> >>>>>>>>>>>>> I agree with Becket we should have a pluggable reloading
> >>>>> strategy.
> >>>>>>>>>>>>> We can provide some common implementations, e.g., periodic
> >>>>>>>>> reloading,
> >>>>>>>>>>> and
> >>>>>>>>>>>>> daily reloading.
> >>>>>>>>>>>>> But there definitely be some connector- or business-specific
> >>>>>>>>> reloading
> >>>>>>>>>>>>> strategies, e.g.
> >>>>>>>>>>>>> notify by a zookeeper watcher, reload once a new Hive partition
> >>>>> is
> >>>>>>>>>>>>> complete.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, 26 May 2022 at 11:52, Becket Qin <becket....@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Qingsheng,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for updating the FLIP. A few comments / questions below:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1. Is there a reason that we have both "XXXFactory" and
> >>>>>>>>>> "XXXProvider".
> >>>>>>>>>>>>>> What is the difference between them? If they are the same, can
> >>>>>>>> we
> >>>>>>>>>> just
> >>>>>>>>>>>>> use
> >>>>>>>>>>>>>> XXXFactory everywhere?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2. Regarding the FullCachingLookupProvider, should the 
> >>>>>>>>>>>>>> reloading
> >>>>>>>>>>> policy
> >>>>>>>>>>>>>> also be pluggable? Periodical reloading could be sometimes be
> >>>>>>>>> tricky
> >>>>>>>>>>> in
> >>>>>>>>>>>>>> practice. For example, if user uses 24 hours as the cache
> >>>>>>>> refresh
> >>>>>>>>>>>>> interval
> >>>>>>>>>>>>>> and some nightly batch job delayed, the cache update may still
> >>>>>>>> see
> >>>>>>>>>> the
> >>>>>>>>>>>>>> stale data.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 3. In DefaultLookupCacheFactory, it looks like InitialCapacity
> >>>>>>>>>> should
> >>>>>>>>>>> be
> >>>>>>>>>>>>>> removed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 4. The purpose of LookupFunctionProvider#cacheMissingKey()
> >>>>>>>> seems a
> >>>>>>>>>>>>> little
> >>>>>>>>>>>>>> confusing to me. If Optional<LookupCacheFactory>
> >>>>>>>> getCacheFactory()
> >>>>>>>>>>>>> returns
> >>>>>>>>>>>>>> a non-empty factory, doesn't that already indicates the
> >>>>>>>> framework
> >>>>>>>>> to
> >>>>>>>>>>>>> cache
> >>>>>>>>>>>>>> the missing keys? Also, why is this method returning an
> >>>>>>>>>>>>> Optional<Boolean>
> >>>>>>>>>>>>>> instead of boolean?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, May 25, 2022 at 5:07 PM Qingsheng Ren <
> >>>>>>>> renqs...@gmail.com
> >>>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi Lincoln and Jark,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for the comments! If the community reaches a consensus
> >>>>>>>>> that
> >>>>>>>>>> we
> >>>>>>>>>>>>> use
> >>>>>>>>>>>>>>> SQL hint instead of table options to decide whether to use 
> >>>>>>>>>>>>>>> sync
> >>>>>>>>> or
> >>>>>>>>>>>>> async
> >>>>>>>>>>>>>>> mode, it’s indeed not necessary to introduce the 
> >>>>>>>>>>>>>>> “lookup.async”
> >>>>>>>>>>> option.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think it’s a good idea to let the decision of async made on
> >>>>>>>>> query
> >>>>>>>>>>>>>>> level, which could make better optimization with more
> >>>>>>>> infomation
> >>>>>>>>>>>>> gathered
> >>>>>>>>>>>>>>> by planner. Is there any FLIP describing the issue in
> >>>>>>>>> FLINK-27625?
> >>>>>>>>>> I
> >>>>>>>>>>>>>>> thought FLIP-234 is only proposing adding SQL hint for retry 
> >>>>>>>>>>>>>>> on
> >>>>>>>>>>> missing
> >>>>>>>>>>>>>>> instead of the entire async mode to be controlled by hint.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On May 25, 2022, at 15:13, Lincoln Lee <
> >>>>>>>> lincoln.8...@gmail.com
> >>>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Jark,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for your reply!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Currently 'lookup.async' just lies in HBase connector, I have
> >>>>>>>>> no
> >>>>>>>>>>> idea
> >>>>>>>>>>>>>>>> whether or when to remove it (we can discuss it in another
> >>>>>>>>> issue
> >>>>>>>>>>> for
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> HBase connector after FLINK-27625 is done), just not add it
> >>>>>>>>> into
> >>>>>>>>>> a
> >>>>>>>>>>>>>>> common
> >>>>>>>>>>>>>>>> option now.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Lincoln Lee
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Jark Wu <imj...@gmail.com> 于2022年5月24日周二 20:14写道：
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Lincoln,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I have taken a look at FLIP-234, and I agree with you that
> >>>>>>>> the
> >>>>>>>>>>>>>>> connectors
> >>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>> provide both async and sync runtime providers simultaneously
> >>>>>>>>>>> instead
> >>>>>>>>>>>>>>> of one
> >>>>>>>>>>>>>>>>> of them.
> >>>>>>>>>>>>>>>>> At that point, "lookup.async" looks redundant. If this
> >>>>>>>> option
> >>>>>>>>> is
> >>>>>>>>>>>>>>> planned to
> >>>>>>>>>>>>>>>>> be removed
> >>>>>>>>>>>>>>>>> in the long term, I think it makes sense not to introduce it
> >>>>>>>>> in
> >>>>>>>>>>> this
> >>>>>>>>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, 24 May 2022 at 11:08, Lincoln Lee <
> >>>>>>>>>> lincoln.8...@gmail.com
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Qingsheng,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Sorry for jumping into the discussion so late. It's a good
> >>>>>>>>> idea
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>> have a common table option. I have a minor comments on
> >>>>>>>>>>>>> 'lookup.async'
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> not make it a common option:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The table layer abstracts both sync and async lookup
> >>>>>>>>>>> capabilities,
> >>>>>>>>>>>>>>>>>> connectors implementers can choose one or both, in the case
> >>>>>>>>> of
> >>>>>>>>>>>>>>>>> implementing
> >>>>>>>>>>>>>>>>>> only one capability(status of the most of existing builtin
> >>>>>>>>>>>>> connectors)
> >>>>>>>>>>>>>>>>>> 'lookup.async' will not be used.  And when a connector has
> >>>>>>>>> both
> >>>>>>>>>>>>>>>>>> capabilities, I think this choice is more suitable for
> >>>>>>>> making
> >>>>>>>>>>>>>>> decisions
> >>>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>> the query level, for example, table planner can choose the
> >>>>>>>>>>> physical
> >>>>>>>>>>>>>>>>>> implementation of async lookup or sync lookup based on its
> >>>>>>>>> cost
> >>>>>>>>>>>>>>> model, or
> >>>>>>>>>>>>>>>>>> users can give query hint based on their own better
> >>>>>>>>>>>>> understanding.  If
> >>>>>>>>>>>>>>>>>> there is another common table option 'lookup.async', it may
> >>>>>>>>>>> confuse
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> users in the long run.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So, I prefer to leave the 'lookup.async' option in private
> >>>>>>>>>> place
> >>>>>>>>>>>>> (for
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> current hbase connector) and not turn it into a common
> >>>>>>>>> option.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>> Lincoln Lee
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Qingsheng Ren <renqs...@gmail.com> 于2022年5月23日周一 14:54写道：
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi Alexander,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks for the review! We recently updated the FLIP and
> >>>>>>>> you
> >>>>>>>>>> can
> >>>>>>>>>>>>> find
> >>>>>>>>>>>>>>>>>> those
> >>>>>>>>>>>>>>>>>>> changes from my latest email. Since some terminologies has
> >>>>>>>>>>>>> changed so
> >>>>>>>>>>>>>>>>>> I’ll
> >>>>>>>>>>>>>>>>>>> use the new concept for replying your comments.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1. Builder vs ‘of’
> >>>>>>>>>>>>>>>>>>> I’m OK to use builder pattern if we have additional
> >>>>>>>> optional
> >>>>>>>>>>>>>>> parameters
> >>>>>>>>>>>>>>>>>>> for full caching mode (“rescan” previously). The
> >>>>>>>>>>>>> schedule-with-delay
> >>>>>>>>>>>>>>>>> idea
> >>>>>>>>>>>>>>>>>>> looks reasonable to me, but I think we need to redesign
> >>>>>>>> the
> >>>>>>>>>>>>> builder
> >>>>>>>>>>>>>>> API
> >>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> full caching to make it more descriptive for developers.
> >>>>>>>>> Would
> >>>>>>>>>>> you
> >>>>>>>>>>>>>>> mind
> >>>>>>>>>>>>>>>>>>> sharing your ideas about the API? For accessing the FLIP
> >>>>>>>>>>> workspace
> >>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> just provide your account ID and ping any PMC member
> >>>>>>>>> including
> >>>>>>>>>>>>> Jark.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 2. Common table options
> >>>>>>>>>>>>>>>>>>> We have some discussions these days and propose to
> >>>>>>>>> introduce 8
> >>>>>>>>>>>>> common
> >>>>>>>>>>>>>>>>>>> table options about caching. It has been updated on the
> >>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 3. Retries
> >>>>>>>>>>>>>>>>>>> I think we are on the same page :-)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> For your additional concerns:
> >>>>>>>>>>>>>>>>>>> 1) The table option has been updated.
> >>>>>>>>>>>>>>>>>>> 2) We got “lookup.cache” back for configuring whether to
> >>>>>>>> use
> >>>>>>>>>>>>> partial
> >>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>> full caching mode.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On May 19, 2022, at 17:25, Александр Смирнов <
> >>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Also I have a few additions:
> >>>>>>>>>>>>>>>>>>>> 1) maybe rename 'lookup.cache.maximum-size' to
> >>>>>>>>>>>>>>>>>>>> 'lookup.cache.max-rows'? I think it will be more clear
> >>>>>>>> that
> >>>>>>>>>> we
> >>>>>>>>>>>>> talk
> >>>>>>>>>>>>>>>>>>>> not about bytes, but about the number of rows. Plus it
> >>>>>>>> fits
> >>>>>>>>>>> more,
> >>>>>>>>>>>>>>>>>>>> considering my optimization with filters.
> >>>>>>>>>>>>>>>>>>>> 2) How will users enable rescanning? Are we going to
> >>>>>>>>> separate
> >>>>>>>>>>>>>>> caching
> >>>>>>>>>>>>>>>>>>>> and rescanning from the options point of view? Like
> >>>>>>>>> initially
> >>>>>>>>>>> we
> >>>>>>>>>>>>> had
> >>>>>>>>>>>>>>>>>>>> one option 'lookup.cache' with values LRU / ALL. I think
> >>>>>>>>> now
> >>>>>>>>>> we
> >>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>> make a boolean option 'lookup.rescan'. RescanInterval can
> >>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>> 'lookup.rescan.interval', etc.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>> Alexander
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> чт, 19 мая 2022 г. в 14:50, Александр Смирнов <
> >>>>>>>>>>>>> smirale...@gmail.com
> >>>>>>>>>>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Qingsheng and Jark,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 1. Builders vs 'of'
> >>>>>>>>>>>>>>>>>>>>> I understand that builders are used when we have
> >>>>>>>> multiple
> >>>>>>>>>>>>>>>>> parameters.
> >>>>>>>>>>>>>>>>>>>>> I suggested them because we could add parameters later.
> >>>>>>>> To
> >>>>>>>>>>>>> prevent
> >>>>>>>>>>>>>>>>>>>>> Builder for ScanRuntimeProvider from looking redundant I
> >>>>>>>>> can
> >>>>>>>>>>>>>>> suggest
> >>>>>>>>>>>>>>>>>>>>> one more config now - "rescanStartTime".
> >>>>>>>>>>>>>>>>>>>>> It's a time in UTC (LocalTime class) when the first
> >>>>>>>> reload
> >>>>>>>>>> of
> >>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>> starts. This parameter can be thought of as
> >>>>>>>> 'initialDelay'
> >>>>>>>>>>> (diff
> >>>>>>>>>>>>>>>>>>>>> between current time and rescanStartTime) in method
> >>>>>>>>>>>>>>>>>>>>> ScheduleExecutorService#scheduleWithFixedDelay [1] . It
> >>>>>>>>> can
> >>>>>>>>>> be
> >>>>>>>>>>>>> very
> >>>>>>>>>>>>>>>>>>>>> useful when the dimension table is updated by some other
> >>>>>>>>>>>>> scheduled
> >>>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>>> at a certain time. Or when the user simply wants a
> >>>>>>>> second
> >>>>>>>>>> scan
> >>>>>>>>>>>>>>>>> (first
> >>>>>>>>>>>>>>>>>>>>> cache reload) be delayed. This option can be used even
> >>>>>>>>>> without
> >>>>>>>>>>>>>>>>>>>>> 'rescanInterval' - in this case 'rescanInterval' will be
> >>>>>>>>> one
> >>>>>>>>>>>>> day.
> >>>>>>>>>>>>>>>>>>>>> If you are fine with this option, I would be very glad
> >>>>>>>> if
> >>>>>>>>>> you
> >>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>>>> give me access to edit FLIP page, so I could add it
> >>>>>>>> myself
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 2. Common table options
> >>>>>>>>>>>>>>>>>>>>> I also think that FactoryUtil would be overloaded by all
> >>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>> options. But maybe unify all suggested options, not only
> >>>>>>>>> for
> >>>>>>>>>>>>>>> default
> >>>>>>>>>>>>>>>>>>>>> cache? I.e. class 'LookupOptions', that unifies default
> >>>>>>>>>> cache
> >>>>>>>>>>>>>>>>> options,
> >>>>>>>>>>>>>>>>>>>>> rescan options, 'async', 'maxRetries'. WDYT?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 3. Retries
> >>>>>>>>>>>>>>>>>>>>> I'm fine with suggestion close to
> >>>>>>>>> RetryUtils#tryTimes(times,
> >>>>>>>>>>>>> call)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>> https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleWithFixedDelay-java.lang.Runnable-long-long-java.util.concurrent.TimeUnit-
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>> Alexander
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> ср, 18 мая 2022 г. в 16:04, Qingsheng Ren <
> >>>>>>>>>> renqs...@gmail.com
> >>>>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi Jark and Alexander,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks for your comments! I’m also OK to introduce
> >>>>>>>> common
> >>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>> options. I prefer to introduce a new
> >>>>>>>>> DefaultLookupCacheOptions
> >>>>>>>>>>>>> class
> >>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> holding these option definitions because putting all
> >>>>>>>> options
> >>>>>>>>>>> into
> >>>>>>>>>>>>>>>>>>> FactoryUtil would make it a bit ”crowded” and not well
> >>>>>>>>>>>>> categorized.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> FLIP has been updated according to suggestions above:
> >>>>>>>>>>>>>>>>>>>>>> 1. Use static “of” method for constructing
> >>>>>>>>>>>>> RescanRuntimeProvider
> >>>>>>>>>>>>>>>>>>> considering both arguments are required.
> >>>>>>>>>>>>>>>>>>>>>> 2. Introduce new table options matching
> >>>>>>>>>>>>> DefaultLookupCacheFactory
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Wed, May 18, 2022 at 2:57 PM Jark Wu <
> >>>>>>>>> imj...@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 1) retry logic
> >>>>>>>>>>>>>>>>>>>>>>> I think we can extract some common retry logic into
> >>>>>>>>>>> utilities,
> >>>>>>>>>>>>>>>>> e.g.
> >>>>>>>>>>>>>>>>>>> RetryUtils#tryTimes(times, call).
> >>>>>>>>>>>>>>>>>>>>>>> This seems independent of this FLIP and can be reused
> >>>>>>>> by
> >>>>>>>>>>>>>>>>> DataStream
> >>>>>>>>>>>>>>>>>>> users.
> >>>>>>>>>>>>>>>>>>>>>>> Maybe we can open an issue to discuss this and where
> >>>>>>>> to
> >>>>>>>>>> put
> >>>>>>>>>>>>> it.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 2) cache ConfigOptions
> >>>>>>>>>>>>>>>>>>>>>>> I'm fine with defining cache config options in the
> >>>>>>>>>>> framework.
> >>>>>>>>>>>>>>>>>>>>>>> A candidate place to put is FactoryUtil which also
> >>>>>>>>>> includes
> >>>>>>>>>>>>>>>>>>> "sink.parallelism", "format" options.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Wed, 18 May 2022 at 13:52, Александр Смирнов <
> >>>>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Thank you for considering my comments.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> there might be custom logic before making retry,
> >>>>>>>> such
> >>>>>>>>> as
> >>>>>>>>>>>>>>>>>>> re-establish the connection
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Yes, I understand that. I meant that such logic can
> >>>>>>>> be
> >>>>>>>>>>>>> placed in
> >>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>>>> separate function, that can be implemented by
> >>>>>>>>> connectors.
> >>>>>>>>>>>>> Just
> >>>>>>>>>>>>>>>>>> moving
> >>>>>>>>>>>>>>>>>>>>>>>> the retry logic would make connector's LookupFunction
> >>>>>>>>>> more
> >>>>>>>>>>>>>>>>> concise
> >>>>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>>>>>>>>> avoid duplicate code. However, it's a minor change.
> >>>>>>>> The
> >>>>>>>>>>>>> decision
> >>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>> up
> >>>>>>>>>>>>>>>>>>>>>>>> to you.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> We decide not to provide common DDL options and let
> >>>>>>>>>>>>> developers
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> define their own options as we do now per connector.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> What is the reason for that? One of the main goals of
> >>>>>>>>>> this
> >>>>>>>>>>>>> FLIP
> >>>>>>>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> unify the configs, wasn't it? I understand that
> >>>>>>>> current
> >>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>> design
> >>>>>>>>>>>>>>>>>>>>>>>> doesn't depend on ConfigOptions, like was before. But
> >>>>>>>>>> still
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> put
> >>>>>>>>>>>>>>>>>>>>>>>> these options into the framework, so connectors can
> >>>>>>>>> reuse
> >>>>>>>>>>>>> them
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> avoid code duplication, and, what is more
> >>>>>>>> significant,
> >>>>>>>>>>> avoid
> >>>>>>>>>>>>>>>>>> possible
> >>>>>>>>>>>>>>>>>>>>>>>> different options naming. This moment can be pointed
> >>>>>>>>> out
> >>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>> documentation for connector developers.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>> Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> вт, 17 мая 2022 г. в 17:11, Qingsheng Ren <
> >>>>>>>>>>>>> renqs...@gmail.com>:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi Alexander,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the review and glad to see we are on the
> >>>>>>>>> same
> >>>>>>>>>>>>> page!
> >>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>> think you forgot to cc the dev mailing list so I’m also
> >>>>>>>>>> quoting
> >>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>> reply
> >>>>>>>>>>>>>>>>>>> under this email.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> We can add 'maxRetryTimes' option into this class
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> In my opinion the retry logic should be implemented
> >>>>>>>> in
> >>>>>>>>>>>>> lookup()
> >>>>>>>>>>>>>>>>>>> instead of in LookupFunction#eval(). Retrying is only
> >>>>>>>>>> meaningful
> >>>>>>>>>>>>>>> under
> >>>>>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>> specific retriable failures, and there might be custom
> >>>>>>>> logic
> >>>>>>>>>>>>> before
> >>>>>>>>>>>>>>>>>> making
> >>>>>>>>>>>>>>>>>>> retry, such as re-establish the connection
> >>>>>>>>>>>>> (JdbcRowDataLookupFunction
> >>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>> example), so it's more handy to leave it to the connector.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I don't see DDL options, that were in previous
> >>>>>>>>> version
> >>>>>>>>>> of
> >>>>>>>>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>>>> Do
> >>>>>>>>>>>>>>>>>>> you have any special plans for them?
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> We decide not to provide common DDL options and let
> >>>>>>>>>>>>> developers
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> define their own options as we do now per connector.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> The rest of comments sound great and I’ll update the
> >>>>>>>>>> FLIP.
> >>>>>>>>>>>>> Hope
> >>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>> can finalize our proposal soon!
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On May 17, 2022, at 13:46, Александр Смирнов <
> >>>>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng and devs!
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I like the overall design of updated FLIP, however
> >>>>>>>> I
> >>>>>>>>>> have
> >>>>>>>>>>>>>>>>> several
> >>>>>>>>>>>>>>>>>>>>>>>>>> suggestions and questions.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 1) Introducing LookupFunction as a subclass of
> >>>>>>>>>>>>> TableFunction
> >>>>>>>>>>>>>>>>> is a
> >>>>>>>>>>>>>>>>>>> good
> >>>>>>>>>>>>>>>>>>>>>>>>>> idea. We can add 'maxRetryTimes' option into this
> >>>>>>>>>> class.
> >>>>>>>>>>>>>>> 'eval'
> >>>>>>>>>>>>>>>>>>> method
> >>>>>>>>>>>>>>>>>>>>>>>>>> of new LookupFunction is great for this purpose.
> >>>>>>>> The
> >>>>>>>>>> same
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>> 'async' case.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 2) There might be other configs in future, such as
> >>>>>>>>>>>>>>>>>>> 'cacheMissingKey'
> >>>>>>>>>>>>>>>>>>>>>>>>>> in LookupFunctionProvider or 'rescanInterval' in
> >>>>>>>>>>>>>>>>>>> ScanRuntimeProvider.
> >>>>>>>>>>>>>>>>>>>>>>>>>> Maybe use Builder pattern in LookupFunctionProvider
> >>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>> RescanRuntimeProvider for more flexibility (use one
> >>>>>>>>>>> 'build'
> >>>>>>>>>>>>>>>>>> method
> >>>>>>>>>>>>>>>>>>>>>>>>>> instead of many 'of' methods in future)?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 3) What are the plans for existing
> >>>>>>>>>> TableFunctionProvider
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>> AsyncTableFunctionProvider? I think they should be
> >>>>>>>>>>>>> deprecated.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 4) Am I right that the current design does not
> >>>>>>>> assume
> >>>>>>>>>>>>> usage of
> >>>>>>>>>>>>>>>>>>>>>>>>>> user-provided LookupCache in re-scanning? In this
> >>>>>>>>> case,
> >>>>>>>>>>> it
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>> very
> >>>>>>>>>>>>>>>>>>>>>>>>>> clear why do we need methods such as 'invalidate'
> >>>>>>>> or
> >>>>>>>>>>>>> 'putAll'
> >>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>> LookupCache.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 5) I don't see DDL options, that were in previous
> >>>>>>>>>> version
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>>>>> Do
> >>>>>>>>>>>>>>>>>>>>>>>>>> you have any special plans for them?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> If you don't mind, I would be glad to be able to
> >>>>>>>> make
> >>>>>>>>>>> small
> >>>>>>>>>>>>>>>>>>>>>>>>>> adjustments to the FLIP document too. I think it's
> >>>>>>>>>> worth
> >>>>>>>>>>>>>>>>>> mentioning
> >>>>>>>>>>>>>>>>>>>>>>>>>> about what exactly optimizations are planning in
> >>>>>>>> the
> >>>>>>>>>>>>> future.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> пт, 13 мая 2022 г. в 20:27, Qingsheng Ren <
> >>>>>>>>>>>>> renqs...@gmail.com
> >>>>>>>>>>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Alexander and devs,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for the in-depth discussion!
> >>>>>>>> As
> >>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>> mentioned we were inspired by Alexander's idea and made a
> >>>>>>>>>>>>> refactor on
> >>>>>>>>>>>>>>>>> our
> >>>>>>>>>>>>>>>>>>> design. FLIP-221 [1] has been updated to reflect our
> >>>>>>>> design
> >>>>>>>>>> now
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>> happy to hear more suggestions from you!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Compared to the previous design:
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 1. The lookup cache serves at table runtime level
> >>>>>>>>> and
> >>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>> integrated as a component of LookupJoinRunner as discussed
> >>>>>>>>>>>>>>> previously.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Interfaces are renamed and re-designed to
> >>>>>>>> reflect
> >>>>>>>>>> the
> >>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>>> design.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 3. We separate the all-caching case individually
> >>>>>>>> and
> >>>>>>>>>>>>>>>>> introduce a
> >>>>>>>>>>>>>>>>>>> new RescanRuntimeProvider to reuse the ability of
> >>>>>>>> scanning.
> >>>>>>>>> We
> >>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>> planning
> >>>>>>>>>>>>>>>>>>> to support SourceFunction / InputFormat for now
> >>>>>>>> considering
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> complexity
> >>>>>>>>>>>>>>>>>>> of FLIP-27 Source API.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 4. A new interface LookupFunction is introduced to
> >>>>>>>>>> make
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> semantic of lookup more straightforward for developers.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> For replying to Alexander:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> However I'm a little confused whether InputFormat
> >>>>>>>>> is
> >>>>>>>>>>>>>>>>> deprecated
> >>>>>>>>>>>>>>>>>>> or not. Am I right that it will be so in the future, but
> >>>>>>>>>>> currently
> >>>>>>>>>>>>>>> it's
> >>>>>>>>>>>>>>>>>> not?
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Yes you are right. InputFormat is not deprecated
> >>>>>>>> for
> >>>>>>>>>>> now.
> >>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>> it will be deprecated in the future but we don't have a
> >>>>>>>>> clear
> >>>>>>>>>>> plan
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>> that.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks again for the discussion on this FLIP and
> >>>>>>>>>> looking
> >>>>>>>>>>>>>>>>> forward
> >>>>>>>>>>>>>>>>>>> to cooperating with you after we finalize the design and
> >>>>>>>>>>>>> interfaces!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 13, 2022 at 12:12 AM Александр
> >>>>>>>> Смирнов <
> >>>>>>>>>>>>>>>>>>> smirale...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jark, Qingsheng and Leonard!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Glad to see that we came to a consensus on almost
> >>>>>>>>> all
> >>>>>>>>>>>>>>> points!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> However I'm a little confused whether InputFormat
> >>>>>>>>> is
> >>>>>>>>>>>>>>>>> deprecated
> >>>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> not. Am I right that it will be so in the future,
> >>>>>>>>> but
> >>>>>>>>>>>>>>>>> currently
> >>>>>>>>>>>>>>>>>>> it's
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> not? Actually I also think that for the first
> >>>>>>>>> version
> >>>>>>>>>>>>> it's
> >>>>>>>>>>>>>>> OK
> >>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> use
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> InputFormat in ALL cache realization, because
> >>>>>>>>>>> supporting
> >>>>>>>>>>>>>>>>> rescan
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ability seems like a very distant prospect. But
> >>>>>>>> for
> >>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>> decision we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> need a consensus among all discussion
> >>>>>>>> participants.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> In general, I don't have something to argue with
> >>>>>>>>> your
> >>>>>>>>>>>>>>>>>>> statements. All
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> of them correspond my ideas. Looking ahead, it
> >>>>>>>>> would
> >>>>>>>>>> be
> >>>>>>>>>>>>> nice
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> work
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> on this FLIP cooperatively. I've already done a
> >>>>>>>> lot
> >>>>>>>>>> of
> >>>>>>>>>>>>> work
> >>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> join caching with realization very close to the
> >>>>>>>> one
> >>>>>>>>>> we
> >>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>> discussing,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> and want to share the results of this work.
> >>>>>>>> Anyway
> >>>>>>>>>>>>> looking
> >>>>>>>>>>>>>>>>>>> forward for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> the FLIP update!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> чт, 12 мая 2022 г. в 17:38, Jark Wu <
> >>>>>>>>>> imj...@gmail.com
> >>>>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for summarizing your points.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the past week, Qingsheng, Leonard, and I have
> >>>>>>>>>>>>> discussed
> >>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>> several times
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> and we have totally refactored the design.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm glad to say we have reached a consensus on
> >>>>>>>>> many
> >>>>>>>>>> of
> >>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>>> points!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng is still working on updating the
> >>>>>>>> design
> >>>>>>>>>> docs
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> maybe can be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> available in the next few days.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will share some conclusions from our
> >>>>>>>>> discussions:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) we have refactored the design towards to
> >>>>>>>> "cache
> >>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> framework" way.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) a "LookupCache" interface for users to
> >>>>>>>>> customize
> >>>>>>>>>>> and
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> default
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation with builder for users to
> >>>>>>>> easy-use.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> This can both make it possible to both have
> >>>>>>>>>>> flexibility
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> conciseness.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Filter pushdown is important for ALL and LRU
> >>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>> cache,
> >>>>>>>>>>>>>>>>>>> esp reducing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> IO.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Filter pushdown should be the final state and
> >>>>>>>> the
> >>>>>>>>>>>>> unified
> >>>>>>>>>>>>>>>>> way
> >>>>>>>>>>>>>>>>>>> to both
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> support pruning ALL cache and LRU cache,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> so I think we should make effort in this
> >>>>>>>>> direction.
> >>>>>>>>>> If
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>>>> to support
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> filter pushdown for ALL cache anyway, why not
> >>>>>>>> use
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> it for LRU cache as well? Either way, as we
> >>>>>>>> decide
> >>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> implement
> >>>>>>>>>>>>>>>>>>> the cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the framework, we have the chance to support
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> filter on cache anytime. This is an optimization
> >>>>>>>>> and
> >>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>> doesn't
> >>>>>>>>>>>>>>>>>>> affect the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> public API. I think we can create a JIRA issue
> >>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> discuss it when the FLIP is accepted.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4) The idea to support ALL cache is similar to
> >>>>>>>>> your
> >>>>>>>>>>>>>>>>> proposal.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the first version, we will only support
> >>>>>>>>>>> InputFormat,
> >>>>>>>>>>>>>>>>>>> SourceFunction for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache all (invoke InputFormat in join operator).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> For FLIP-27 source, we need to join a true
> >>>>>>>> source
> >>>>>>>>>>>>> operator
> >>>>>>>>>>>>>>>>>>> instead of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> calling it embedded in the join operator.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, this needs another FLIP to support the
> >>>>>>>>>>> re-scan
> >>>>>>>>>>>>>>>>>> ability
> >>>>>>>>>>>>>>>>>>> for FLIP-27
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Source, and this can be a large work.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> In order to not block this issue, we can put the
> >>>>>>>>>>> effort
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> FLIP-27 source
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> integration into future work and integrate
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> InputFormat&SourceFunction for now.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think it's fine to use
> >>>>>>>>> InputFormat&SourceFunction,
> >>>>>>>>>>> as
> >>>>>>>>>>>>>>> they
> >>>>>>>>>>>>>>>>>>> are not
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> deprecated, otherwise, we have to introduce
> >>>>>>>>> another
> >>>>>>>>>>>>>>> function
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> similar to them which is meaningless. We need to
> >>>>>>>>>> plan
> >>>>>>>>>>>>>>>>> FLIP-27
> >>>>>>>>>>>>>>>>>>> source
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> integration ASAP before InputFormat &
> >>>>>>>>> SourceFunction
> >>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>> deprecated.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 12 May 2022 at 15:46, Александр Смирнов
> >>>>>>>> <
> >>>>>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Martijn!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Got it. Therefore, the realization with
> >>>>>>>>> InputFormat
> >>>>>>>>>>> is
> >>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>> considered.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for clearing that up!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> чт, 12 мая 2022 г. в 14:23, Martijn Visser <
> >>>>>>>>>>>>>>>>>>> mart...@ververica.com>:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> With regards to:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But if there are plans to refactor all
> >>>>>>>>> connectors
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> FLIP-27
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, FLIP-27 is the target for all connectors.
> >>>>>>>>> The
> >>>>>>>>>>> old
> >>>>>>>>>>>>>>>>>>> interfaces will be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deprecated and connectors will either be
> >>>>>>>>>> refactored
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>> use
> >>>>>>>>>>>>>>>>>>> the new ones
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dropped.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The caching should work for connectors that
> >>>>>>>> are
> >>>>>>>>>>> using
> >>>>>>>>>>>>>>>>>> FLIP-27
> >>>>>>>>>>>>>>>>>>> interfaces,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we should not introduce new features for old
> >>>>>>>>>>>>> interfaces.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Martijn
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 12 May 2022 at 06:19, Александр
> >>>>>>>> Смирнов
> >>>>>>>>> <
> >>>>>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jark!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry for the late response. I would like to
> >>>>>>>>> make
> >>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>> comments and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> clarify my points.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) I agree with your first statement. I think
> >>>>>>>>> we
> >>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>> achieve
> >>>>>>>>>>>>>>>>>>> both
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> advantages this way: put the Cache interface
> >>>>>>>> in
> >>>>>>>>>>>>>>>>>>> flink-table-common,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but have implementations of it in
> >>>>>>>>>>>>> flink-table-runtime.
> >>>>>>>>>>>>>>>>>>> Therefore if a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector developer wants to use existing
> >>>>>>>> cache
> >>>>>>>>>>>>>>>>> strategies
> >>>>>>>>>>>>>>>>>>> and their
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementations, he can just pass
> >>>>>>>> lookupConfig
> >>>>>>>>> to
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> planner, but if
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> he wants to have its own cache implementation
> >>>>>>>>> in
> >>>>>>>>>>> his
> >>>>>>>>>>>>>>>>>>> TableFunction, it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be possible for him to use the existing
> >>>>>>>>>>>>> interface
> >>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> purpose (we can explicitly point this out in
> >>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> documentation). In
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this way all configs and metrics will be
> >>>>>>>>> unified.
> >>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If a filter can prune 90% of data in the
> >>>>>>>>> cache,
> >>>>>>>>>> we
> >>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>> have 90% of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lookup requests that can never be cached
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Let me clarify the logic filters
> >>>>>>>>> optimization
> >>>>>>>>>> in
> >>>>>>>>>>>>> case
> >>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> LRU cache.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like Cache<RowData,
> >>>>>>>>>> Collection<RowData>>.
> >>>>>>>>>>>>> Here
> >>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>> always
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> store the response of the dimension table in
> >>>>>>>>>> cache,
> >>>>>>>>>>>>> even
> >>>>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> applying calc function. I.e. if there are no
> >>>>>>>>> rows
> >>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>> applying
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filters to the result of the 'eval' method of
> >>>>>>>>>>>>>>>>>> TableFunction,
> >>>>>>>>>>>>>>>>>>> we store
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the empty list by lookup keys. Therefore the
> >>>>>>>>>> cache
> >>>>>>>>>>>>> line
> >>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filled, but will require much less memory (in
> >>>>>>>>>>> bytes).
> >>>>>>>>>>>>>>>>> I.e.
> >>>>>>>>>>>>>>>>>>> we don't
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> completely filter keys, by which result was
> >>>>>>>>>> pruned,
> >>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>> significantly
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reduce required memory to store this result.
> >>>>>>>> If
> >>>>>>>>>> the
> >>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>> knows about
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this behavior, he can increase the 'max-rows'
> >>>>>>>>>>> option
> >>>>>>>>>>>>>>>>> before
> >>>>>>>>>>>>>>>>>>> the start
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the job. But actually I came up with the
> >>>>>>>>> idea
> >>>>>>>>>>>>> that we
> >>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> do this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> automatically by using the 'maximumWeight'
> >>>>>>>> and
> >>>>>>>>>>>>> 'weigher'
> >>>>>>>>>>>>>>>>>>> methods of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> GuavaCache [1]. Weight can be the size of the
> >>>>>>>>>>>>> collection
> >>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> rows
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (value of cache). Therefore cache can
> >>>>>>>>>> automatically
> >>>>>>>>>>>>> fit
> >>>>>>>>>>>>>>>>>> much
> >>>>>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> records than before.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink SQL has provided a standard way to do
> >>>>>>>>>>> filters
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> projects
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pushdown, i.e., SupportsFilterPushDown and
> >>>>>>>>>>>>>>>>>>> SupportsProjectionPushDown.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jdbc/hive/HBase haven't implemented the
> >>>>>>>>>>> interfaces,
> >>>>>>>>>>>>>>>>> don't
> >>>>>>>>>>>>>>>>>>> mean it's
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hard
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to implement.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's debatable how difficult it will be to
> >>>>>>>>>>> implement
> >>>>>>>>>>>>>>>>> filter
> >>>>>>>>>>>>>>>>>>> pushdown.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But I think the fact that currently there is
> >>>>>>>> no
> >>>>>>>>>>>>> database
> >>>>>>>>>>>>>>>>>>> connector
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with filter pushdown at least means that this
> >>>>>>>>>>> feature
> >>>>>>>>>>>>>>>>> won't
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supported soon in connectors. Moreover, if we
> >>>>>>>>>> talk
> >>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>> other
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connectors (not in Flink repo), their
> >>>>>>>> databases
> >>>>>>>>>>> might
> >>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>> support all
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink filters (or not support filters at
> >>>>>>>> all).
> >>>>>>>>> I
> >>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>> users
> >>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interested in supporting cache filters
> >>>>>>>>>> optimization
> >>>>>>>>>>>>>>>>>>> independently of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supporting other features and solving more
> >>>>>>>>>> complex
> >>>>>>>>>>>>>>>>> problems
> >>>>>>>>>>>>>>>>>>> (or
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> unsolvable at all).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) I agree with your third statement.
> >>>>>>>> Actually
> >>>>>>>>> in
> >>>>>>>>>>> our
> >>>>>>>>>>>>>>>>>>> internal version
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I also tried to unify the logic of scanning
> >>>>>>>> and
> >>>>>>>>>>>>>>> reloading
> >>>>>>>>>>>>>>>>>>> data from
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connectors. But unfortunately, I didn't find
> >>>>>>>> a
> >>>>>>>>>> way
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> unify
> >>>>>>>>>>>>>>>>>>> the logic
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of all ScanRuntimeProviders (InputFormat,
> >>>>>>>>>>>>>>> SourceFunction,
> >>>>>>>>>>>>>>>>>>> Source,...)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and reuse it in reloading ALL cache. As a
> >>>>>>>>> result
> >>>>>>>>>> I
> >>>>>>>>>>>>>>>>> settled
> >>>>>>>>>>>>>>>>>>> on using
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> InputFormat, because it was used for scanning
> >>>>>>>>> in
> >>>>>>>>>>> all
> >>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connectors. (I didn't know that there are
> >>>>>>>> plans
> >>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> deprecate
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> InputFormat in favor of FLIP-27 Source). IMO
> >>>>>>>>>> usage
> >>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> FLIP-27 source
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in ALL caching is not good idea, because this
> >>>>>>>>>>> source
> >>>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>>>> designed to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> work in distributed environment
> >>>>>>>>> (SplitEnumerator
> >>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>> JobManager and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SourceReaders on TaskManagers), not in one
> >>>>>>>>>> operator
> >>>>>>>>>>>>>>>>> (lookup
> >>>>>>>>>>>>>>>>>>> join
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operator in our case). There is even no
> >>>>>>>> direct
> >>>>>>>>>> way
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> pass
> >>>>>>>>>>>>>>>>>>> splits from
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SplitEnumerator to SourceReader (this logic
> >>>>>>>>> works
> >>>>>>>>>>>>>>> through
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SplitEnumeratorContext, which requires
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OperatorCoordinator.SubtaskGateway to send
> >>>>>>>>>>>>>>>>> AddSplitEvents).
> >>>>>>>>>>>>>>>>>>> Usage of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> InputFormat for ALL cache seems much more
> >>>>>>>>> clearer
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> easier. But if
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there are plans to refactor all connectors to
> >>>>>>>>>>>>> FLIP-27, I
> >>>>>>>>>>>>>>>>>>> have the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following ideas: maybe we can refuse from
> >>>>>>>>> lookup
> >>>>>>>>>>> join
> >>>>>>>>>>>>>>> ALL
> >>>>>>>>>>>>>>>>>>> cache in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> favor of simple join with multiple scanning
> >>>>>>>> of
> >>>>>>>>>>> batch
> >>>>>>>>>>>>>>>>>> source?
> >>>>>>>>>>>>>>>>>>> The point
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is that the only difference between lookup
> >>>>>>>> join
> >>>>>>>>>> ALL
> >>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>> and simple
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> join with batch source is that in the first
> >>>>>>>>> case
> >>>>>>>>>>>>>>> scanning
> >>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>> performed
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> multiple times, in between which state
> >>>>>>>> (cache)
> >>>>>>>>> is
> >>>>>>>>>>>>>>> cleared
> >>>>>>>>>>>>>>>>>>> (correct me
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if I'm wrong). So what if we extend the
> >>>>>>>>>>>>> functionality of
> >>>>>>>>>>>>>>>>>>> simple join
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to support state reloading + extend the
> >>>>>>>>>>>>> functionality of
> >>>>>>>>>>>>>>>>>>> scanning
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> batch source multiple times (this one should
> >>>>>>>> be
> >>>>>>>>>>> easy
> >>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> new FLIP-27
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> source, that unifies streaming/batch reading
> >>>>>>>> -
> >>>>>>>>> we
> >>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>>>> to change
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only SplitEnumerator, which will pass splits
> >>>>>>>>>> again
> >>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>> some TTL).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> WDYT? I must say that this looks like a
> >>>>>>>>> long-term
> >>>>>>>>>>>>> goal
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> will make
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the scope of this FLIP even larger than you
> >>>>>>>>> said.
> >>>>>>>>>>>>> Maybe
> >>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>> can limit
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ourselves to a simpler solution now
> >>>>>>>>>> (InputFormats).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So to sum up, my points is like this:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) There is a way to make both concise and
> >>>>>>>>>> flexible
> >>>>>>>>>>>>>>>>>>> interfaces for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caching in lookup join.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Cache filters optimization is important
> >>>>>>>> both
> >>>>>>>>>> in
> >>>>>>>>>>>>> LRU
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> ALL caches.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) It is unclear when filter pushdown will be
> >>>>>>>>>>>>> supported
> >>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connectors, some of the connectors might not
> >>>>>>>>> have
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> opportunity to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> support filter pushdown + as I know,
> >>>>>>>> currently
> >>>>>>>>>>> filter
> >>>>>>>>>>>>>>>>>>> pushdown works
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only for scanning (not lookup). So cache
> >>>>>>>>> filters
> >>>>>>>>>> +
> >>>>>>>>>>>>>>>>>>> projections
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optimization should be independent from other
> >>>>>>>>>>>>> features.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4) ALL cache realization is a complex topic
> >>>>>>>>> that
> >>>>>>>>>>>>>>> involves
> >>>>>>>>>>>>>>>>>>> multiple
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aspects of how Flink is developing. Refusing
> >>>>>>>>> from
> >>>>>>>>>>>>>>>>>>> InputFormat in favor
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of FLIP-27 Source will make ALL cache
> >>>>>>>>> realization
> >>>>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>> complex and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not clear, so maybe instead of that we can
> >>>>>>>>> extend
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> functionality of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> simple join or not refuse from InputFormat in
> >>>>>>>>>> case
> >>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>> join ALL
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>> https://guava.dev/releases/18.0/api/docs/com/google/common/cache/CacheBuilder.html#weigher(com.google.common.cache.Weigher)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> чт, 5 мая 2022 г. в 20:34, Jark Wu <
> >>>>>>>>>>> imj...@gmail.com
> >>>>>>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's great to see the active discussion! I
> >>>>>>>>> want
> >>>>>>>>>> to
> >>>>>>>>>>>>>>> share
> >>>>>>>>>>>>>>>>>> my
> >>>>>>>>>>>>>>>>>>> ideas:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) implement the cache in framework vs.
> >>>>>>>>>> connectors
> >>>>>>>>>>>>> base
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't have a strong opinion on this. Both
> >>>>>>>>> ways
> >>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>> work (e.g.,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pruning, compatibility).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The framework way can provide more concise
> >>>>>>>>>>>>> interfaces.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The connector base way can define more
> >>>>>>>>> flexible
> >>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> strategies/implementations.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We are still investigating a way to see if
> >>>>>>>> we
> >>>>>>>>>> can
> >>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>> both
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> advantages.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We should reach a consensus that the way
> >>>>>>>>> should
> >>>>>>>>>>> be a
> >>>>>>>>>>>>>>>>> final
> >>>>>>>>>>>>>>>>>>> state,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are on the path to it.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) filters and projections pushdown:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with Alex that the filter pushdown
> >>>>>>>>> into
> >>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> benefit a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lot
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ALL cache.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, this is not true for LRU cache.
> >>>>>>>>>>> Connectors
> >>>>>>>>>>>>> use
> >>>>>>>>>>>>>>>>>>> cache to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reduce
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IO
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requests to databases for better throughput.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If a filter can prune 90% of data in the
> >>>>>>>>> cache,
> >>>>>>>>>> we
> >>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>> have 90% of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requests that can never be cached
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and hit directly to the databases. That
> >>>>>>>> means
> >>>>>>>>>> the
> >>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> meaningless in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this case.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IMO, Flink SQL has provided a standard way
> >>>>>>>> to
> >>>>>>>>> do
> >>>>>>>>>>>>>>> filters
> >>>>>>>>>>>>>>>>>>> and projects
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pushdown, i.e., SupportsFilterPushDown and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SupportsProjectionPushDown.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jdbc/hive/HBase haven't implemented the
> >>>>>>>>>>> interfaces,
> >>>>>>>>>>>>>>>>> don't
> >>>>>>>>>>>>>>>>>>> mean it's
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hard
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implement.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> They should implement the pushdown
> >>>>>>>> interfaces
> >>>>>>>>> to
> >>>>>>>>>>>>> reduce
> >>>>>>>>>>>>>>>>> IO
> >>>>>>>>>>>>>>>>>>> and the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> size.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> That should be a final state that the scan
> >>>>>>>>>> source
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> lookup source
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> share
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the exact pushdown implementation.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't see why we need to duplicate the
> >>>>>>>>>> pushdown
> >>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> caches,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will complex the lookup join design.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) ALL cache abstraction
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> All cache might be the most challenging part
> >>>>>>>>> of
> >>>>>>>>>>> this
> >>>>>>>>>>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>>>>> We have
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> never
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> provided a reload-lookup public interface.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently, we put the reload logic in the
> >>>>>>>>> "eval"
> >>>>>>>>>>>>> method
> >>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TableFunction.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> That's hard for some sources (e.g., Hive).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ideally, connector implementation should
> >>>>>>>> share
> >>>>>>>>>> the
> >>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> reload
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scan, i.e. ScanTableSource with
> >>>>>>>>>>>>>>>>>>> InputFormat/SourceFunction/FLIP-27
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Source.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, InputFormat/SourceFunction are
> >>>>>>>>>>> deprecated,
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> FLIP-27
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> source
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is deeply coupled with SourceOperator.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we want to invoke the FLIP-27 source in
> >>>>>>>>>>>>> LookupJoin,
> >>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>> may make
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scope of this FLIP much larger.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We are still investigating how to abstract
> >>>>>>>> the
> >>>>>>>>>> ALL
> >>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>> logic and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reuse
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the existing source interfaces.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 5 May 2022 at 20:22, Roman Boyko <
> >>>>>>>>>>>>>>>>>>> ro.v.bo...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's a much more complicated activity and
> >>>>>>>>> lies
> >>>>>>>>>>> out
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> scope of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> improvement. Because such pushdowns should
> >>>>>>>> be
> >>>>>>>>>>> done
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ScanTableSource
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementations (not only for Lookup ones).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 5 May 2022 at 19:02, Martijn
> >>>>>>>> Visser <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> martijnvis...@apache.org>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One question regarding "And Alexander
> >>>>>>>>>> correctly
> >>>>>>>>>>>>>>>>>> mentioned
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filter
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pushdown still is not implemented for
> >>>>>>>>>>>>>>>>> jdbc/hive/hbase."
> >>>>>>>>>>>>>>>>>>> -> Would
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> alternative solution be to actually
> >>>>>>>>> implement
> >>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>> filter
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pushdowns?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> imagine that there are many more benefits
> >>>>>>>> to
> >>>>>>>>>>> doing
> >>>>>>>>>>>>>>>>> that,
> >>>>>>>>>>>>>>>>>>> outside
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caching and metrics.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Martijn Visser
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://twitter.com/MartijnVisser82
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/MartijnVisser
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 5 May 2022 at 13:58, Roman Boyko <
> >>>>>>>>>>>>>>>>>>> ro.v.bo...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving such a valuable
> >>>>>>>>>> improvement!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I do think that single cache
> >>>>>>>> implementation
> >>>>>>>>>>>>> would be
> >>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> nice
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> opportunity
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users. And it will break the "FOR
> >>>>>>>>> SYSTEM_TIME
> >>>>>>>>>>> AS
> >>>>>>>>>>>>> OF
> >>>>>>>>>>>>>>>>>>> proc_time"
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> semantics
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> anyway - doesn't matter how it will be
> >>>>>>>>>>>>> implemented.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Putting myself in the user's shoes, I can
> >>>>>>>>> say
> >>>>>>>>>>>>> that:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) I would prefer to have the opportunity
> >>>>>>>>> to
> >>>>>>>>>>> cut
> >>>>>>>>>>>>> off
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> size
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> simply filtering unnecessary data. And
> >>>>>>>> the
> >>>>>>>>>> most
> >>>>>>>>>>>>>>> handy
> >>>>>>>>>>>>>>>>>>> way to do
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> apply
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it inside LookupRunners. It would be a
> >>>>>>>> bit
> >>>>>>>>>>>>> harder to
> >>>>>>>>>>>>>>>>>>> pass it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> through the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LookupJoin node to TableFunction. And
> >>>>>>>>>> Alexander
> >>>>>>>>>>>>>>>>>> correctly
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filter pushdown still is not implemented
> >>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> jdbc/hive/hbase.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) The ability to set the different
> >>>>>>>> caching
> >>>>>>>>>>>>>>>>> parameters
> >>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tables
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is quite important. So I would prefer to
> >>>>>>>>> set
> >>>>>>>>>> it
> >>>>>>>>>>>>>>>>> through
> >>>>>>>>>>>>>>>>>>> DDL
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rather
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have similar ttla, strategy and other
> >>>>>>>>> options
> >>>>>>>>>>> for
> >>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tables.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Providing the cache into the framework
> >>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>> deprives us of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extensibility (users won't be able to
> >>>>>>>>>> implement
> >>>>>>>>>>>>>>> their
> >>>>>>>>>>>>>>>>>> own
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> most
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> probably it might be solved by creating
> >>>>>>>>> more
> >>>>>>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> strategies
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a wider set of configurations.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> All these points are much closer to the
> >>>>>>>>>> schema
> >>>>>>>>>>>>>>>>> proposed
> >>>>>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Alexander.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Qingshen Ren, please correct me if I'm
> >>>>>>>> not
> >>>>>>>>>>> right
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> facilities
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> might be simply implemented in your
> >>>>>>>>>>> architecture?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Roman Boyko
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> e.: ro.v.bo...@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 4 May 2022 at 21:01, Martijn
> >>>>>>>>> Visser <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> martijnvis...@apache.org>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't have much to chip in, but just
> >>>>>>>>>> wanted
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> express that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> appreciate the in-depth discussion on
> >>>>>>>> this
> >>>>>>>>>>> topic
> >>>>>>>>>>>>>>>>> and I
> >>>>>>>>>>>>>>>>>>> hope
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> others
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will join the conversation.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Martijn
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 3 May 2022 at 10:15, Александр
> >>>>>>>>>>> Смирнов <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng, Leonard and Jark,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your detailed feedback!
> >>>>>>>>>> However, I
> >>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>>> questions
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some of your statements (maybe I didn't
> >>>>>>>>> get
> >>>>>>>>>>>>>>>>>>> something?).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Caching actually breaks the semantic
> >>>>>>>> of
> >>>>>>>>>> "FOR
> >>>>>>>>>>>>>>>>>>> SYSTEM_TIME
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AS OF
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proc_time”
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree that the semantics of "FOR
> >>>>>>>>>>> SYSTEM_TIME
> >>>>>>>>>>>>> AS
> >>>>>>>>>>>>>>>>> OF
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proc_time"
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fully implemented with caching, but as
> >>>>>>>>> you
> >>>>>>>>>>>>> said,
> >>>>>>>>>>>>>>>>>> users
> >>>>>>>>>>>>>>>>>>> go
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consciously to achieve better
> >>>>>>>> performance
> >>>>>>>>>> (no
> >>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>> proposed
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> enable
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caching by default, etc.). Or by users
> >>>>>>>> do
> >>>>>>>>>> you
> >>>>>>>>>>>>> mean
> >>>>>>>>>>>>>>>>>>> other
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> developers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connectors? In this case developers
> >>>>>>>>>>> explicitly
> >>>>>>>>>>>>>>>>>> specify
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> whether
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> their
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector supports caching or not (in
> >>>>>>>> the
> >>>>>>>>>>> list
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> supported
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> options),
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> no one makes them do that if they don't
> >>>>>>>>>> want
> >>>>>>>>>>>>> to.
> >>>>>>>>>>>>>>> So
> >>>>>>>>>>>>>>>>>>> what
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> exactly is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the difference between implementing
> >>>>>>>>> caching
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>> modules
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink-table-runtime and in
> >>>>>>>>>> flink-table-common
> >>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> considered
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point of view? How does it affect on
> >>>>>>>>>>>>>>>>>>> breaking/non-breaking
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> semantics of "FOR SYSTEM_TIME AS OF
> >>>>>>>>>>> proc_time"?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> confront a situation that allows table
> >>>>>>>>>>>>> options in
> >>>>>>>>>>>>>>>>>> DDL
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> control
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> behavior of the framework, which has
> >>>>>>>>> never
> >>>>>>>>>>>>>>> happened
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> previously
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be cautious
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we talk about main differences of
> >>>>>>>>>>> semantics
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> DDL
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> options
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> config options("table.exec.xxx"), isn't
> >>>>>>>>> it
> >>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>> limiting
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scope
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the options + importance for the user
> >>>>>>>>>>> business
> >>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>> rather
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specific location of corresponding
> >>>>>>>> logic
> >>>>>>>>> in
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> framework? I
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mean
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in my design, for example, putting an
> >>>>>>>>>> option
> >>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> strategy in configurations would  be
> >>>>>>>> the
> >>>>>>>>>>> wrong
> >>>>>>>>>>>>>>>>>>> decision,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> directly affects the user's business
> >>>>>>>>> logic
> >>>>>>>>>>> (not
> >>>>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> performance
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optimization) + touches just several
> >>>>>>>>>>> functions
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> ONE
> >>>>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (there
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be multiple tables with different
> >>>>>>>>> caches).
> >>>>>>>>>>>>> Does it
> >>>>>>>>>>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> matter for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the user (or someone else) where the
> >>>>>>>>> logic
> >>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> located,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> affected by the applied option?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also I can remember DDL option
> >>>>>>>>>>>>> 'sink.parallelism',
> >>>>>>>>>>>>>>>>>>> which in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some way
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "controls the behavior of the
> >>>>>>>> framework"
> >>>>>>>>>> and
> >>>>>>>>>>> I
> >>>>>>>>>>>>>>>>> don't
> >>>>>>>>>>>>>>>>>>> see any
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problem
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> introduce a new interface for this
> >>>>>>>>>>> all-caching
> >>>>>>>>>>>>>>>>>>> scenario
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> design
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would become more complex
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is a subject for a separate
> >>>>>>>>>> discussion,
> >>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>> actually
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in our
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal version we solved this problem
> >>>>>>>>>> quite
> >>>>>>>>>>>>>>>>> easily
> >>>>>>>>>>>>>>>>>> -
> >>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reused
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> InputFormat class (so there is no need
> >>>>>>>>> for
> >>>>>>>>>> a
> >>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>> API).
> >>>>>>>>>>>>>>>>>>> The
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that currently all lookup connectors
> >>>>>>>> use
> >>>>>>>>>>>>>>>>> InputFormat
> >>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scanning
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data in batch mode: HBase, JDBC and
> >>>>>>>> even
> >>>>>>>>>> Hive
> >>>>>>>>>>>>> - it
> >>>>>>>>>>>>>>>>>> uses
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> class
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PartitionReader, that is actually just
> >>>>>>>> a
> >>>>>>>>>>>>> wrapper
> >>>>>>>>>>>>>>>>>> around
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> InputFormat.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The advantage of this solution is the
> >>>>>>>>>> ability
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> reload
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parallel (number of threads depends on
> >>>>>>>>>> number
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> InputSplits,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an upper limit). As a result cache
> >>>>>>>> reload
> >>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>> significantly
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reduces
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (as well as time of input stream
> >>>>>>>>>> blocking). I
> >>>>>>>>>>>>> know
> >>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> usually
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> try
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to avoid usage of concurrency in Flink
> >>>>>>>>>> code,
> >>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>> maybe
> >>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> one
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an exception. BTW I don't say that it's
> >>>>>>>>> an
> >>>>>>>>>>>>> ideal
> >>>>>>>>>>>>>>>>>>> solution,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> maybe
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are better ones.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Providing the cache in the framework
> >>>>>>>>> might
> >>>>>>>>>>>>>>>>> introduce
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issues
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's possible only in cases when the
> >>>>>>>>>>> developer
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> won't
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> properly refactor his code and will use
> >>>>>>>>> new
> >>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>> options
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> incorrectly
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (i.e. explicitly provide the same
> >>>>>>>> options
> >>>>>>>>>>> into
> >>>>>>>>>>>>> 2
> >>>>>>>>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> places). For correct behavior all he
> >>>>>>>> will
> >>>>>>>>>>> need
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> do
> >>>>>>>>>>>>>>>>>>> is to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> existing options to the framework's
> >>>>>>>>>>>>> LookupConfig
> >>>>>>>>>>>>>>> (+
> >>>>>>>>>>>>>>>>>>> maybe
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> add an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> alias
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for options, if there was different
> >>>>>>>>>> naming),
> >>>>>>>>>>>>>>>>>> everything
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> transparent for users. If the developer
> >>>>>>>>>> won't
> >>>>>>>>>>>>> do
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> refactoring at
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nothing will be changed for the
> >>>>>>>> connector
> >>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backward
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility. Also if a developer
> >>>>>>>> wants
> >>>>>>>>> to
> >>>>>>>>>>> use
> >>>>>>>>>>>>>>> his
> >>>>>>>>>>>>>>>>>> own
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logic,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> he just can refuse to pass some of the
> >>>>>>>>>>> configs
> >>>>>>>>>>>>>>> into
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> framework,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instead make his own implementation
> >>>>>>>> with
> >>>>>>>>>>>>> already
> >>>>>>>>>>>>>>>>>>> existing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configs
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metrics (but actually I think that
> >>>>>>>> it's a
> >>>>>>>>>>> rare
> >>>>>>>>>>>>>>>>> case).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filters and projections should be
> >>>>>>>> pushed
> >>>>>>>>>> all
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> way
> >>>>>>>>>>>>>>>>>>> down
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> function, like what we do in the scan
> >>>>>>>>>> source
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's the great purpose. But the truth
> >>>>>>>> is
> >>>>>>>>>> that
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> ONLY
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supports filter pushdown is
> >>>>>>>>>>>>> FileSystemTableSource
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (no database connector supports it
> >>>>>>>>>>> currently).
> >>>>>>>>>>>>>>> Also
> >>>>>>>>>>>>>>>>>>> for some
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> databases
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it's simply impossible to pushdown such
> >>>>>>>>>>> complex
> >>>>>>>>>>>>>>>>>> filters
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in Flink.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only applying these optimizations to
> >>>>>>>> the
> >>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>> seems
> >>>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> quite
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> useful
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Filters can cut off an arbitrarily
> >>>>>>>> large
> >>>>>>>>>>>>> amount of
> >>>>>>>>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dimension table. For a simple example,
> >>>>>>>>>>> suppose
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> dimension
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'users'
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we have column 'age' with values from
> >>>>>>>> 20
> >>>>>>>>> to
> >>>>>>>>>>> 40,
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> input
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stream
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'clicks' that is ~uniformly distributed
> >>>>>>>>> by
> >>>>>>>>>>> age
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> users. If
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filter 'age > 30',
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there will be twice less data in cache.
> >>>>>>>>>> This
> >>>>>>>>>>>>> means
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> increase 'lookup.cache.max-rows' by
> >>>>>>>>> almost
> >>>>>>>>>> 2
> >>>>>>>>>>>>>>> times.
> >>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> gain a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> huge
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> performance boost. Moreover, this
> >>>>>>>>>>> optimization
> >>>>>>>>>>>>>>>>> starts
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shine
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in 'ALL' cache, where tables without
> >>>>>>>>>> filters
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> projections
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fit
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in memory, but with them - can. This
> >>>>>>>>> opens
> >>>>>>>>>> up
> >>>>>>>>>>>>>>>>>>> additional
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> possibilities
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for users. And this doesn't sound as
> >>>>>>>> 'not
> >>>>>>>>>>> quite
> >>>>>>>>>>>>>>>>>>> useful'.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It would be great to hear other voices
> >>>>>>>>>>>>> regarding
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>> topic!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we have quite a lot of controversial
> >>>>>>>>>> points,
> >>>>>>>>>>>>> and I
> >>>>>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> help
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of others it will be easier for us to
> >>>>>>>>> come
> >>>>>>>>>>> to a
> >>>>>>>>>>>>>>>>>>> consensus.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> пт, 29 апр. 2022 г. в 22:33, Qingsheng
> >>>>>>>>> Ren
> >>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> renqs...@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Alexander and Arvid,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the discussion and sorry
> >>>>>>>> for
> >>>>>>>>> my
> >>>>>>>>>>>>> late
> >>>>>>>>>>>>>>>>>>> response!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> had
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal discussion together with Jark
> >>>>>>>>> and
> >>>>>>>>>>>>> Leonard
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> I’d
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> summarize our ideas. Instead of
> >>>>>>>>>> implementing
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logic in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runtime layer or wrapping around the
> >>>>>>>>>>>>> user-provided
> >>>>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> function,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> prefer to introduce some new APIs
> >>>>>>>>> extending
> >>>>>>>>>>>>>>>>>>> TableFunction
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> concerns:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Caching actually breaks the
> >>>>>>>> semantic
> >>>>>>>>> of
> >>>>>>>>>>>>> "FOR
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SYSTEM_TIME
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AS OF
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proc_time”, because it couldn’t truly
> >>>>>>>>>> reflect
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> content
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table at the moment of querying. If
> >>>>>>>> users
> >>>>>>>>>>>>> choose
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> enable
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caching
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lookup table, they implicitly indicate
> >>>>>>>>> that
> >>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>> breakage is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> acceptable
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> exchange for the performance. So we
> >>>>>>>>> prefer
> >>>>>>>>>>> not
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> provide
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caching on
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table runtime level.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. If we make the cache implementation
> >>>>>>>>> in
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> framework
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (whether
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runner or a wrapper around
> >>>>>>>>> TableFunction),
> >>>>>>>>>> we
> >>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> confront a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> situation
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that allows table options in DDL to
> >>>>>>>>> control
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> behavior of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> framework,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which has never happened previously and
> >>>>>>>>>>> should
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>> cautious.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Under
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> current design the behavior of the
> >>>>>>>>>> framework
> >>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>> only be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specified
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configurations (“table.exec.xxx”), and
> >>>>>>>>> it’s
> >>>>>>>>>>>>> hard
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> apply
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> general
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configs to a specific table.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. We have use cases that lookup
> >>>>>>>> source
> >>>>>>>>>>> loads
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> refresh
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> records
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> periodically into the memory to achieve
> >>>>>>>>>> high
> >>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> performance
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (like
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hive
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector in the community, and also
> >>>>>>>>> widely
> >>>>>>>>>>>>> used
> >>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>> our
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connectors). Wrapping the cache around
> >>>>>>>>> the
> >>>>>>>>>>>>> user’s
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TableFunction
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> works
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fine
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for LRU caches, but I think we have to
> >>>>>>>>>>>>> introduce a
> >>>>>>>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interface for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all-caching scenario and the design
> >>>>>>>> would
> >>>>>>>>>>>>> become
> >>>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> complex.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4. Providing the cache in the
> >>>>>>>> framework
> >>>>>>>>>>> might
> >>>>>>>>>>>>>>>>>>> introduce
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issues to existing lookup sources like
> >>>>>>>>>> there
> >>>>>>>>>>>>> might
> >>>>>>>>>>>>>>>>>>> exist two
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caches
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> totally different strategies if the
> >>>>>>>> user
> >>>>>>>>>>>>>>>>> incorrectly
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configures
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (one in the framework and another
> >>>>>>>>>> implemented
> >>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> source).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As for the optimization mentioned by
> >>>>>>>>>>>>> Alexander, I
> >>>>>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filters
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> projections should be pushed all the
> >>>>>>>> way
> >>>>>>>>>> down
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> function,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what we do in the scan source, instead
> >>>>>>>> of
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>>> runner
> >>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> goal of using cache is to reduce the
> >>>>>>>>>> network
> >>>>>>>>>>>>> I/O
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pressure
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> external system, and only applying
> >>>>>>>> these
> >>>>>>>>>>>>>>>>>> optimizations
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not quite useful.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I made some updates to the FLIP[1] to
> >>>>>>>>>>> reflect
> >>>>>>>>>>>>> our
> >>>>>>>>>>>>>>>>>>> ideas.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> prefer to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> keep the cache implementation as a part
> >>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> TableFunction,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> provide some helper classes
> >>>>>>>>>>>>> (CachingTableFunction,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AllCachingTableFunction,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CachingAsyncTableFunction) to
> >>>>>>>> developers
> >>>>>>>>>> and
> >>>>>>>>>>>>>>>>> regulate
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metrics
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, I made a POC[2] for your
> >>>>>>>> reference.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to your ideas!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>>>>>> https://github.com/PatrickRen/flink/tree/FLIP-221
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 26, 2022 at 4:45 PM
> >>>>>>>>> Александр
> >>>>>>>>>>>>> Смирнов
> >>>>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the response, Arvid!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have few comments on your message.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but could also live with an easier
> >>>>>>>>>>> solution
> >>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> step:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that these 2 ways are
> >>>>>>>> mutually
> >>>>>>>>>>>>> exclusive
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (originally
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposed
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> by Qingsheng and mine), because
> >>>>>>>>>>> conceptually
> >>>>>>>>>>>>>>> they
> >>>>>>>>>>>>>>>>>>> follow
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> same
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> goal, but implementation details are
> >>>>>>>>>>>>> different.
> >>>>>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> go one
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> way,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> moving to another way in the future
> >>>>>>>>> will
> >>>>>>>>>>> mean
> >>>>>>>>>>>>>>>>>>> deleting
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> existing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and once again changing the API for
> >>>>>>>>>>>>> connectors.
> >>>>>>>>>>>>>>>>> So
> >>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reach a consensus with the community
> >>>>>>>>>> about
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> work
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> together
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on this FLIP, i.e. divide the work on
> >>>>>>>>>> tasks
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parts
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flip (for example, LRU cache
> >>>>>>>>> unification
> >>>>>>>>>> /
> >>>>>>>>>>>>>>>>>>> introducing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposed
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> set
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metrics / further work…). WDYT,
> >>>>>>>>>> Qingsheng?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as the source will only receive the
> >>>>>>>>>>> requests
> >>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filter
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Actually if filters are applied to
> >>>>>>>>> fields
> >>>>>>>>>>> of
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table, we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> firstly must do requests, and only
> >>>>>>>>> after
> >>>>>>>>>>>>> that we
> >>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filter
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> responses,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because lookup connectors don't have
> >>>>>>>>>> filter
> >>>>>>>>>>>>>>>>>>> pushdown. So
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filtering
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is done before caching, there will be
> >>>>>>>>>> much
> >>>>>>>>>>>>> less
> >>>>>>>>>>>>>>>>>> rows
> >>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> @Alexander unfortunately, your
> >>>>>>>>>>> architecture
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shared.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solution to share images to be
> >>>>>>>> honest.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry for that, I’m a bit new to such
> >>>>>>>>>> kinds
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> conversations
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> :)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have no write access to the
> >>>>>>>>> confluence,
> >>>>>>>>>>> so
> >>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>> made a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jira
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> where described the proposed changes
> >>>>>>>> in
> >>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>> details
> >>>>>>>>>>>>>>>>>>> -
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-27411.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Will happy to get more feedback!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> пн, 25 апр. 2022 г. в 19:49, Arvid
> >>>>>>>>> Heise
> >>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ar...@apache.org>:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this; the
> >>>>>>>>>> inconsistency
> >>>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> satisfying
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Alexander's idea though but
> >>>>>>>>>> could
> >>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>>>> live
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> easier
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solution as the first step: Instead
> >>>>>>>> of
> >>>>>>>>>>>>> making
> >>>>>>>>>>>>>>>>>>> caching
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> detail of TableFunction X, rather
> >>>>>>>>>> devise a
> >>>>>>>>>>>>>>>>> caching
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> layer
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> around X.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposal would be a
> >>>>>>>>> CachingTableFunction
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> delegates to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> X in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> case
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> misses and else manages the cache.
> >>>>>>>>>> Lifting
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> into
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operator
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> model
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposed would be even better but is
> >>>>>>>>>>>>> probably
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> unnecessary
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first step
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for a lookup source (as the source
> >>>>>>>>> will
> >>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>> receive
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requests
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filter; applying projection may be
> >>>>>>>>> more
> >>>>>>>>>>>>>>>>>> interesting
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> save
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> memory).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Another advantage is that all the
> >>>>>>>>>> changes
> >>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>> FLIP
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> limited to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> options, no need for new public
> >>>>>>>>>>> interfaces.
> >>>>>>>>>>>>>>>>>>> Everything
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> else
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> remains
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation of Table runtime.
> >>>>>>>> That
> >>>>>>>>>>> means
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> easily
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> incorporate
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optimization potential that
> >>>>>>>> Alexander
> >>>>>>>>>>>>> pointed
> >>>>>>>>>>>>>>>>> out
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> later.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> @Alexander unfortunately, your
> >>>>>>>>>>> architecture
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shared.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solution to share images to be
> >>>>>>>> honest.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Apr 22, 2022 at 5:04 PM
> >>>>>>>>>> Александр
> >>>>>>>>>>>>>>>>> Смирнов
> >>>>>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng! My name is Alexander,
> >>>>>>>>> I'm
> >>>>>>>>>>>>> not a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> committer
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> yet,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> really like to become one. And this
> >>>>>>>>>> FLIP
> >>>>>>>>>>>>>>> really
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interested
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Actually I have worked on a similar
> >>>>>>>>>>>>> feature in
> >>>>>>>>>>>>>>>>> my
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> company’s
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fork, and we would like to share
> >>>>>>>> our
> >>>>>>>>>>>>> thoughts
> >>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> make
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> open source.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think there is a better
> >>>>>>>> alternative
> >>>>>>>>>>> than
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> introducing an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abstract
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> class for TableFunction
> >>>>>>>>>>>>>>> (CachingTableFunction).
> >>>>>>>>>>>>>>>>>> As
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TableFunction exists in the
> >>>>>>>>>>>>> flink-table-common
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> module,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> provides
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only an API for working with
> >>>>>>>> tables –
> >>>>>>>>>>> it’s
> >>>>>>>>>>>>>>> very
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> convenient
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> importing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in connectors. In turn,
> >>>>>>>>>>>>> CachingTableFunction
> >>>>>>>>>>>>>>>>>>> contains
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runtime execution,  so this class
> >>>>>>>> and
> >>>>>>>>>>>>>>>>> everything
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connected
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should be located in another
> >>>>>>>> module,
> >>>>>>>>>>>>> probably
> >>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink-table-runtime.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this will require connectors to
> >>>>>>>>>>> depend
> >>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>> another
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> module,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> contains a lot of runtime logic,
> >>>>>>>>> which
> >>>>>>>>>>>>> doesn’t
> >>>>>>>>>>>>>>>>>>> sound
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> good.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I suggest adding a new method
> >>>>>>>>>>>>>>> ‘getLookupConfig’
> >>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LookupTableSource
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or LookupRuntimeProvider to allow
> >>>>>>>>>>>>> connectors
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pass
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configurations to the planner,
> >>>>>>>>>> therefore
> >>>>>>>>>>>>> they
> >>>>>>>>>>>>>>>>>> won’t
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> depend on
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runtime
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> realization. Based on these configs
> >>>>>>>>>>> planner
> >>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> construct a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> join operator with corresponding
> >>>>>>>>>> runtime
> >>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (ProcessFunctions
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> module flink-table-runtime).
> >>>>>>>>>> Architecture
> >>>>>>>>>>>>>>> looks
> >>>>>>>>>>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pinned
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> image (LookupConfig class there is
> >>>>>>>>>>> actually
> >>>>>>>>>>>>>>>>> yours
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CacheConfig).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Classes in flink-table-planner,
> >>>>>>>> that
> >>>>>>>>>> will
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> responsible
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> –
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CommonPhysicalLookupJoin and his
> >>>>>>>>>>>>> inheritors.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Current classes for lookup join in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink-table-runtime
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LookupJoinRunner,
> >>>>>>>>>> AsyncLookupJoinRunner,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LookupJoinRunnerWithCalc,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AsyncLookupJoinRunnerWithCalc.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I suggest adding classes
> >>>>>>>>>>>>>>>>> LookupJoinCachingRunner,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LookupJoinCachingRunnerWithCalc,
> >>>>>>>> etc.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And here comes another more
> >>>>>>>> powerful
> >>>>>>>>>>>>> advantage
> >>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> such a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> solution.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we have caching logic on a lower
> >>>>>>>>> level,
> >>>>>>>>>>> we
> >>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> apply
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optimizations to it.
> >>>>>>>>>>>>> LookupJoinRunnerWithCalc
> >>>>>>>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> named
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because it uses the ‘calc’
> >>>>>>>> function,
> >>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>> actually
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mostly
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consists
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> filters and projections.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, in join table A with
> >>>>>>>>>> lookup
> >>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>> B
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> condition
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ‘JOIN …
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ON
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A.id = B.id AND A.age = B.age + 10
> >>>>>>>>>> WHERE
> >>>>>>>>>>>>>>>>>> B.salary >
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1000’
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ‘calc’
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> function will contain filters
> >>>>>>>> A.age =
> >>>>>>>>>>>>> B.age +
> >>>>>>>>>>>>>>>>> 10
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> B.salary >
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1000.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we apply this function before
> >>>>>>>>>> storing
> >>>>>>>>>>>>>>>>> records
> >>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> size
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache will be significantly
> >>>>>>>> reduced:
> >>>>>>>>>>>>> filters =
> >>>>>>>>>>>>>>>>>>> avoid
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> storing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> useless
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> records in cache, projections =
> >>>>>>>>> reduce
> >>>>>>>>>>>>>>> records’
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> size. So
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> initial
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> max number of records in cache can
> >>>>>>>> be
> >>>>>>>>>>>>>>> increased
> >>>>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What do you think about it?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2022/04/19 02:47:11 Qingsheng
> >>>>>>>> Ren
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yuan and I would like to start a
> >>>>>>>>>>>>> discussion
> >>>>>>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP-221[1],
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> introduces an abstraction of lookup
> >>>>>>>>>> table
> >>>>>>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> its
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> standard
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metrics.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently each lookup table source
> >>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>> implement
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> their
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> own
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> store lookup results, and there
> >>>>>>>>> isn’t a
> >>>>>>>>>>>>>>>>> standard
> >>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metrics
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> developers to tuning their jobs
> >>>>>>>> with
> >>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>> joins,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> quite
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> common
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use case in Flink table / SQL.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Therefore we propose some new APIs
> >>>>>>>>>>>>> including
> >>>>>>>>>>>>>>>>>>> cache,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metrics,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrapper
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> classes of TableFunction and new
> >>>>>>>>> table
> >>>>>>>>>>>>>>> options.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> take a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> look
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP page [1] to get more details.
> >>>>>>>>> Any
> >>>>>>>>>>>>>>>>>> suggestions
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comments
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> appreciated!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng Ren
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Real-time Computing Team
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Alibaba Cloud
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Email: renqs...@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Roman Boyko
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> e.: ro.v.bo...@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng Ren
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Real-time Computing Team
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Alibaba Cloud
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Email: renqs...@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>
>

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

Reply via email to