Hi Jingsong, 1. Updated and thanks for the reminder!
2. We could do so for the implementation, but as a public interface I prefer not to introduce another layer and expose too much, since this FLIP is already a huge one with a bunch of classes and interfaces.

Best,
Qingsheng

> On Jun 22, 2022, at 11:16, Jingsong Li <jingsongl...@gmail.com> wrote:
> 
> Thanks Qingsheng and all.
> 
> I like this design.
> 
> Some comments:
> 
> 1. Should LookupCache implement Serializable?
> 
> 2. Minor: After FLIP-234 [1], there should be many connectors that
> implement both PartialCachingLookupProvider and
> PartialCachingAsyncLookupProvider. Can we extract a common interface
> for `LookupCache getCache();` to ensure consistency?
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> 
> Best,
> Jingsong
> 
> On Tue, Jun 21, 2022 at 4:09 PM Qingsheng Ren <re...@apache.org> wrote:
>> 
>> Hi devs,
>> 
>> I’d like to push FLIP-221 forward a little bit. Recently we had some offline
>> discussions and updated the FLIP. Here’s the diff compared to the previous
>> version:
>> 
>> 1. (Async)LookupFunctionProvider is designed as a base interface for
>> constructing lookup functions.
>> 2. From (Async)LookupFunctionProvider we extend PartialCaching /
>> FullCachingLookupProvider for the partial and full caching modes.
>> 3. Introduce CacheReloadTrigger for specifying the reload strategy in full
>> caching mode, and provide 2 default implementations (Periodic /
>> TimedCacheReloadTrigger).
>> 
>> Looking forward to your replies~
>> 
>> Best,
>> Qingsheng
>> 
>>> On Jun 2, 2022, at 17:15, Qingsheng Ren <renqs...@gmail.com> wrote:
>>> 
>>> Hi Becket,
>>> 
>>> Thanks for your feedback!
>>> 
>>> 1. An alternative way is to let the implementation of the cache decide
>>> whether to store a missing key in the cache, instead of the framework.
>>> This sounds more reasonable and makes the LookupProvider interface
>>> cleaner.
I can update the FLIP and clarify in the JavaDoc of
>>> LookupCache#put that the cache should decide whether to store an empty
>>> collection.
>>> 
>>> 2. Initially the builder pattern was chosen for the extensibility of the
>>> LookupProvider interfaces, as we might need to add more
>>> configurations in the future. We can remove the builder now that we have
>>> resolved the issue in 1. As for the builder in DefaultLookupCache, I
>>> prefer to keep it because we have a lot of arguments in the
>>> constructor.
>>> 
>>> 3. I think this might overturn the overall design. I agree with
>>> Becket's idea that the API design should be layered with
>>> extensibility in mind, and it would be great to have one unified interface
>>> supporting partial, full, and even mixed custom strategies, but we
>>> have some issues to resolve. The original purpose of treating full
>>> caching separately is that we'd like to reuse the ability of
>>> ScanRuntimeProvider. Developers just need to hand over a Source /
>>> SourceFunction / InputFormat so that the framework can
>>> compose the underlying topology and control the reload (maybe in a
>>> distributed way). Under your design we leave the reload operation
>>> entirely to the CacheStrategy, and I think it will be hard for
>>> developers to reuse the source in the initializeCache method.
>>> 
>>> Best regards,
>>> 
>>> Qingsheng
>>> 
>>> On Thu, Jun 2, 2022 at 1:50 PM Becket Qin <becket....@gmail.com> wrote:
>>>> 
>>>> Thanks for updating the FLIP, Qingsheng. A few more comments:
>>>> 
>>>> 1. I am still not sure what the use case for cacheMissingKey() is.
>>>> More specifically, when would users want to have getCache() return a
>>>> non-empty value and cacheMissingKey() return false?
>>>> 
>>>> 2. The builder pattern. Usually the builder pattern is used when there are
>>>> a lot of variations of constructors.
For example, if a class has three
>>>> variables and all of them are optional, there could potentially be many
>>>> combinations of the variables. But in this FLIP, I don't see such a case.
>>>> What is the reason we have builders for all the classes?
>>>> 
>>>> 3. Should the caching strategy be excluded from the top level provider API?
>>>> Technically speaking, the Flink framework should only have two interfaces
>>>> to deal with:
>>>> A) LookupFunction
>>>> B) AsyncLookupFunction
>>>> Orthogonally, we *believe* there are two different strategies people can use
>>>> for caching. Note that the Flink framework does not care what the caching
>>>> strategy is here.
>>>> a) partial caching
>>>> b) full caching
>>>> 
>>>> Putting them together, we end up with 3 combinations that we think are
>>>> valid:
>>>> Aa) PartialCachingLookupFunctionProvider
>>>> Ba) PartialCachingAsyncLookupFunctionProvider
>>>> Ab) FullCachingLookupFunctionProvider
>>>> 
>>>> However, the caching strategy could actually be quite flexible. E.g. an
>>>> initial full cache load followed by some partial updates. Also, I am not
>>>> 100% sure that full caching will always use ScanTableSource. Including
>>>> the caching strategy in the top level provider API would make it harder to
>>>> extend.
>>>> 
>>>> One possible solution is to just have *LookupFunctionProvider* and
>>>> *AsyncLookupFunctionProvider* as the top level API, both with a
>>>> getCacheStrategy() method returning an
>>>> optional CacheStrategy. The CacheStrategy class would have the following
>>>> methods:
>>>> 1. void open(Context), where the context exposes some of the resources that
>>>> may be useful for the caching strategy, e.g. an ExecutorService that is
>>>> synchronized with the data processing, or a cache refresh trigger which
>>>> blocks data processing and refreshes the cache.
>>>> 2. void initializeCache(), a blocking method that allows users to pre-populate
>>>> the cache before processing any data if they wish.
>>>> 3.
void maybeCache(RowData key, Collection<RowData> value), blocking or >>>> non-blocking method. >>>> 4. void refreshCache(), a blocking / non-blocking method that is invoked by >>>> the Flink framework when the cache refresh trigger is pulled. >>>> >>>> In the above design, partial caching and full caching would be >>>> implementations of the CachingStrategy. And it is OK for users to implement >>>> their own CachingStrategy if they want to. >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> >>>> On Thu, Jun 2, 2022 at 12:14 PM Jark Wu <imj...@gmail.com> wrote: >>>> >>>>> Thank Qingsheng for the detailed summary and updates, >>>>> >>>>> The changes look good to me in general. I just have one minor improvement >>>>> comment. >>>>> Could we add a static util method to the "FullCachingReloadTrigger" >>>>> interface for quick usage? >>>>> >>>>> #periodicReloadAtFixedRate(Duration) >>>>> #periodicReloadWithFixedDelay(Duration) >>>>> >>>>> I think we can also do this for LookupCache. Because users may not know >>>>> where is the default >>>>> implementations and how to use them. >>>>> >>>>> Best, >>>>> Jark >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, 1 Jun 2022 at 18:32, Qingsheng Ren <renqs...@gmail.com> wrote: >>>>> >>>>>> Hi Jingsong, >>>>>> >>>>>> Thanks for your comments! >>>>>> >>>>>>> AllCache definition is not flexible, for example, PartialCache can use >>>>>> any custom storage, while the AllCache can not, AllCache can also be >>>>>> considered to store memory or disk, also need a flexible strategy. >>>>>> >>>>>> We had an offline discussion with Jark and Leonard. Basically we think >>>>>> exposing the interface of full cache storage to connector developers >>>>> might >>>>>> limit our future optimizations. The storage of full caching shouldn’t >>>>> have >>>>>> too many variations for different lookup tables so making it pluggable >>>>>> might not help a lot. 
Also I think it is not quite easy for connector >>>>>> developers to implement such an optimized storage. We can keep optimizing >>>>>> this storage in the future and all full caching lookup tables would >>>>> benefit >>>>>> from this. >>>>>> >>>>>>> We are more inclined to deprecate the connector `async` option when >>>>>> discussing FLIP-234. Can we remove this option from this FLIP? >>>>>> >>>>>> Thanks for the reminder! This option has been removed in the latest >>>>>> version. >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Qingsheng >>>>>> >>>>>> >>>>>>> On Jun 1, 2022, at 15:28, Jingsong Li <jingsongl...@gmail.com> wrote: >>>>>>> >>>>>>> Thanks Alexander for your reply. We can discuss the new interface when >>>>> it >>>>>>> comes out. >>>>>>> >>>>>>> We are more inclined to deprecate the connector `async` option when >>>>>>> discussing FLIP-234 [1]. We should use hint to let planner decide. >>>>>>> Although the discussion has not yet produced a conclusion, can we >>>>> remove >>>>>>> this option from this FLIP? It doesn't seem to be related to this FLIP, >>>>>> but >>>>>>> more to FLIP-234, and we can form a conclusion over there. >>>>>>> >>>>>>> [1] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h >>>>>>> >>>>>>> Best, >>>>>>> Jingsong >>>>>>> >>>>>>> On Wed, Jun 1, 2022 at 4:59 AM Jing Ge <j...@ververica.com> wrote: >>>>>>> >>>>>>>> Hi Jark, >>>>>>>> >>>>>>>> Thanks for clarifying it. It would be fine. as long as we could >>>>> provide >>>>>> the >>>>>>>> no-cache solution. I was just wondering if the client side cache could >>>>>>>> really help when HBase is used, since the data to look up should be >>>>>> huge. >>>>>>>> Depending how much data will be cached on the client side, the data >>>>> that >>>>>>>> should be lru in e.g. LruBlockCache will not be lru anymore. 
In the >>>>>> worst >>>>>>>> case scenario, once the cached data at client side is expired, the >>>>>> request >>>>>>>> will hit disk which will cause extra latency temporarily, if I am not >>>>>>>> mistaken. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Jing >>>>>>>> >>>>>>>> On Mon, May 30, 2022 at 9:59 AM Jark Wu <imj...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Jing Ge, >>>>>>>>> >>>>>>>>> What do you mean about the "impact on the block cache used by HBase"? >>>>>>>>> In my understanding, the connector cache and HBase cache are totally >>>>>> two >>>>>>>>> things. >>>>>>>>> The connector cache is a local/client cache, and the HBase cache is a >>>>>>>>> server cache. >>>>>>>>> >>>>>>>>>> does it make sense to have a no-cache solution as one of the >>>>>>>>> default solutions so that customers will have no effort for the >>>>>> migration >>>>>>>>> if they want to stick with Hbase cache >>>>>>>>> >>>>>>>>> The implementation migration should be transparent to users. Take the >>>>>>>> HBase >>>>>>>>> connector as >>>>>>>>> an example, it already supports lookup cache but is disabled by >>>>>> default. >>>>>>>>> After migration, the >>>>>>>>> connector still disables cache by default (i.e. no-cache solution). >>>>> No >>>>>>>>> migration effort for users. >>>>>>>>> >>>>>>>>> HBase cache and connector cache are two different things. HBase cache >>>>>>>> can't >>>>>>>>> simply replace >>>>>>>>> connector cache. Because one of the most important usages for >>>>> connector >>>>>>>>> cache is reducing >>>>>>>>> the I/O request/response and improving the throughput, which can >>>>>> achieve >>>>>>>>> by just using a server cache. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jark >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, 27 May 2022 at 22:42, Jing Ge <j...@ververica.com> wrote: >>>>>>>>> >>>>>>>>>> Thanks all for the valuable discussion. The new feature looks very >>>>>>>>>> interesting. 
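As an aside to the client-cache discussion above: the connector cache sits entirely in front of the external system, so the server-side cache (e.g. HBase's LruBlockCache) is only touched on a connector-cache miss. A minimal sketch of that flow, with `String` standing in for Flink's `RowData`, a plain map standing in for a real evicting cache, and all names illustrative rather than FLIP API:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Illustrative client-side lookup cache: only a cache miss reaches the
 * external system (and thus its server-side cache). A real cache would
 * apply eviction/TTL; an unbounded map keeps the sketch short.
 */
public class CachedLookup {
    private final Map<String, Collection<String>> cache = new HashMap<>();
    private final Function<String, Collection<String>> externalLookup;
    int externalCalls = 0; // counts requests that actually hit the external system

    public CachedLookup(Function<String, Collection<String>> externalLookup) {
        this.externalLookup = externalLookup;
    }

    public Collection<String> lookup(String key) {
        Collection<String> cached = cache.get(key);
        if (cached != null) {
            return cached; // served locally; external system untouched
        }
        externalCalls++;
        Collection<String> rows = externalLookup.apply(key);
        // Whether an empty result is cached is the cache's own decision,
        // per the discussion in this thread.
        cache.put(key, rows);
        return rows;
    }

    public static void main(String[] args) {
        CachedLookup lookup = new CachedLookup(k -> List.of(k + "-row"));
        lookup.lookup("a");
        lookup.lookup("a"); // second call is a cache hit
        System.out.println(lookup.externalCalls); // prints 1
    }
}
```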
>>>>>>>>>> >>>>>>>>>> According to the FLIP description: "*Currently we have JDBC, Hive >>>>> and >>>>>>>>> HBase >>>>>>>>>> connector implemented lookup table source. All existing >>>>>> implementations >>>>>>>>>> will be migrated to the current design and the migration will be >>>>>>>>>> transparent to end users*." I was only wondering if we should pay >>>>>>>>> attention >>>>>>>>>> to HBase and similar DBs. Since, commonly, the lookup data will be >>>>>> huge >>>>>>>>>> while using HBase, partial caching will be used in this case, if I >>>>> am >>>>>>>> not >>>>>>>>>> mistaken, which might have an impact on the block cache used by >>>>> HBase, >>>>>>>>> e.g. >>>>>>>>>> LruBlockCache. >>>>>>>>>> Another question is that, since HBase provides a sophisticated cache >>>>>>>>>> solution, does it make sense to have a no-cache solution as one of >>>>> the >>>>>>>>>> default solutions so that customers will have no effort for the >>>>>>>> migration >>>>>>>>>> if they want to stick with Hbase cache? >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Jing >>>>>>>>>> >>>>>>>>>> On Fri, May 27, 2022 at 11:19 AM Jingsong Li < >>>>> jingsongl...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> I think the problem now is below: >>>>>>>>>>> 1. AllCache and PartialCache interface on the non-uniform, one >>>>> needs >>>>>>>> to >>>>>>>>>>> provide LookupProvider, the other needs to provide CacheBuilder. >>>>>>>>>>> 2. AllCache definition is not flexible, for example, PartialCache >>>>> can >>>>>>>>> use >>>>>>>>>>> any custom storage, while the AllCache can not, AllCache can also >>>>> be >>>>>>>>>>> considered to store memory or disk, also need a flexible strategy. >>>>>>>>>>> 3. AllCache can not customize ReloadStrategy, currently only >>>>>>>>>>> ScheduledReloadStrategy. >>>>>>>>>>> >>>>>>>>>>> In order to solve the above problems, the following are my ideas. 
>>>>>>>>>>> >>>>>>>>>>> ## Top level cache interfaces: >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> public interface CacheLookupProvider extends >>>>>>>>>>> LookupTableSource.LookupRuntimeProvider { >>>>>>>>>>> >>>>>>>>>>> CacheBuilder createCacheBuilder(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface CacheBuilder { >>>>>>>>>>> Cache create(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface Cache { >>>>>>>>>>> >>>>>>>>>>> /** >>>>>>>>>>> * Returns the value associated with key in this cache, or null >>>>>>>> if >>>>>>>>>>> there is no cached value for >>>>>>>>>>> * key. >>>>>>>>>>> */ >>>>>>>>>>> @Nullable >>>>>>>>>>> Collection<RowData> getIfPresent(RowData key); >>>>>>>>>>> >>>>>>>>>>> /** Returns the number of key-value mappings in the cache. */ >>>>>>>>>>> long size(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> ## Partial cache >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> public interface PartialCacheLookupFunction extends >>>>>>>>> CacheLookupProvider { >>>>>>>>>>> >>>>>>>>>>> @Override >>>>>>>>>>> PartialCacheBuilder createCacheBuilder(); >>>>>>>>>>> >>>>>>>>>>> /** Creates an {@link LookupFunction} instance. */ >>>>>>>>>>> LookupFunction createLookupFunction(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface PartialCacheBuilder extends CacheBuilder { >>>>>>>>>>> >>>>>>>>>>> PartialCache create(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface PartialCache extends Cache { >>>>>>>>>>> >>>>>>>>>>> /** >>>>>>>>>>> * Associates the specified value rows with the specified key >>>>> row >>>>>>>>>>> in the cache. If the cache >>>>>>>>>>> * previously contained value associated with the key, the old >>>>>>>>>>> value is replaced by the >>>>>>>>>>> * specified value. >>>>>>>>>>> * >>>>>>>>>>> * @return the previous value rows associated with key, or null >>>>>>>> if >>>>>>>>>>> there was no mapping for key. 
>>>>>>>>>>> * @param key - key row with which the specified value is to be >>>>>>>>>>> associated >>>>>>>>>>> * @param value – value rows to be associated with the specified >>>>>>>>> key >>>>>>>>>>> */ >>>>>>>>>>> Collection<RowData> put(RowData key, Collection<RowData> value); >>>>>>>>>>> >>>>>>>>>>> /** Discards any cached value for the specified key. */ >>>>>>>>>>> void invalidate(RowData key); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> ## All cache >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> public interface AllCacheLookupProvider extends >>>>> CacheLookupProvider { >>>>>>>>>>> >>>>>>>>>>> void registerReloadStrategy(ScheduledExecutorService >>>>>>>>>>> executorService, Reloader reloader); >>>>>>>>>>> >>>>>>>>>>> ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider(); >>>>>>>>>>> >>>>>>>>>>> @Override >>>>>>>>>>> AllCacheBuilder createCacheBuilder(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface AllCacheBuilder extends CacheBuilder { >>>>>>>>>>> >>>>>>>>>>> AllCache create(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface AllCache extends Cache { >>>>>>>>>>> >>>>>>>>>>> void putAll(Iterator<Map<RowData, RowData>> allEntries); >>>>>>>>>>> >>>>>>>>>>> void clearAll(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface Reloader { >>>>>>>>>>> >>>>>>>>>>> void reload(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jingsong >>>>>>>>>>> >>>>>>>>>>> On Fri, May 27, 2022 at 11:10 AM Jingsong Li < >>>>> jingsongl...@gmail.com >>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks Qingsheng and all for your discussion. >>>>>>>>>>>> >>>>>>>>>>>> Very sorry to jump in so late. >>>>>>>>>>>> >>>>>>>>>>>> Maybe I missed something? 
>>>>>>>>>>>> My first impression when I saw the cache interface was, why don't >>>>>>>> we >>>>>>>>>>>> provide an interface similar to guava cache [1], on top of guava >>>>>>>>> cache, >>>>>>>>>>>> caffeine also makes extensions for asynchronous calls.[2] >>>>>>>>>>>> There is also the bulk load in caffeine too. >>>>>>>>>>>> >>>>>>>>>>>> I am also more confused why first from LookupCacheFactory.Builder >>>>>>>> and >>>>>>>>>>> then >>>>>>>>>>>> to Factory to create Cache. >>>>>>>>>>>> >>>>>>>>>>>> [1] https://github.com/google/guava >>>>>>>>>>>> [2] https://github.com/ben-manes/caffeine/wiki/Population >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Jingsong >>>>>>>>>>>> >>>>>>>>>>>> On Thu, May 26, 2022 at 11:17 PM Jark Wu <imj...@gmail.com> >>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> After looking at the new introduced ReloadTime and Becket's >>>>>>>> comment, >>>>>>>>>>>>> I agree with Becket we should have a pluggable reloading >>>>> strategy. >>>>>>>>>>>>> We can provide some common implementations, e.g., periodic >>>>>>>>> reloading, >>>>>>>>>>> and >>>>>>>>>>>>> daily reloading. >>>>>>>>>>>>> But there definitely be some connector- or business-specific >>>>>>>>> reloading >>>>>>>>>>>>> strategies, e.g. >>>>>>>>>>>>> notify by a zookeeper watcher, reload once a new Hive partition >>>>> is >>>>>>>>>>>>> complete. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Jark >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, 26 May 2022 at 11:52, Becket Qin <becket....@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Qingsheng, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for updating the FLIP. A few comments / questions below: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. Is there a reason that we have both "XXXFactory" and >>>>>>>>>> "XXXProvider". >>>>>>>>>>>>>> What is the difference between them? If they are the same, can >>>>>>>> we >>>>>>>>>> just >>>>>>>>>>>>> use >>>>>>>>>>>>>> XXXFactory everywhere? >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2. 
Regarding the FullCachingLookupProvider, should the reloading policy
>>>>>>>>>>>>>> also be pluggable? Periodic reloading can sometimes be tricky in
>>>>>>>>>>>>>> practice. For example, if a user uses 24 hours as the cache refresh
>>>>>>>>>>>>>> interval and some nightly batch job is delayed, the cache update may
>>>>>>>>>>>>>> still see stale data.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 3. In DefaultLookupCacheFactory, it looks like InitialCapacity should
>>>>>>>>>>>>>> be removed.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 4. The purpose of LookupFunctionProvider#cacheMissingKey() seems a
>>>>>>>>>>>>>> little confusing to me. If Optional<LookupCacheFactory> getCacheFactory()
>>>>>>>>>>>>>> returns a non-empty factory, doesn't that already indicate to the
>>>>>>>>>>>>>> framework that it should cache the missing keys? Also, why is this
>>>>>>>>>>>>>> method returning an Optional<Boolean> instead of boolean?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, May 25, 2022 at 5:07 PM Qingsheng Ren <renqs...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Lincoln and Jark,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for the comments! If the community reaches a consensus that we
>>>>>>>>>>>>>>> use a SQL hint instead of table options to decide whether to use sync
>>>>>>>>>>>>>>> or async mode, it’s indeed not necessary to introduce the “lookup.async”
>>>>>>>>>>>>>>> option.
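Becket's point above about periodic reload staleness is what the rescanStartTime / TimedCacheReloadTrigger ideas elsewhere in this thread address: reload at a fixed time of day rather than at a fixed interval from job start. A minimal sketch of the initial-delay computation behind such a trigger (class and method names are illustrative, not FLIP API):

```java
import java.time.Duration;
import java.time.LocalTime;

/**
 * Sketch of the "reload at a fixed time of day" idea: compute how long to
 * wait from now until the next occurrence of the configured reload time.
 */
public class DailyReloadDelay {

    /** Millis to wait from 'now' until the next occurrence of reloadTime (same clock/zone). */
    public static long initialDelayMillis(LocalTime now, LocalTime reloadTime) {
        Duration delay = Duration.between(now, reloadTime);
        if (delay.isNegative()) {
            delay = delay.plus(Duration.ofDays(1)); // reload time already passed today
        }
        return delay.toMillis();
    }

    public static void main(String[] args) {
        // e.g. now 01:00, reload at 03:00 -> wait 2 hours
        System.out.println(initialDelayMillis(LocalTime.of(1, 0), LocalTime.of(3, 0))); // prints 7200000
    }
}
```

The returned delay could then feed `ScheduledExecutorService#scheduleWithFixedDelay` with a one-day period, as referenced later in the thread.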
I think it’s a good idea to let the decision about async be made at the
>>>>>>>>>>>>>>> query level, which could enable better optimization with more information
>>>>>>>>>>>>>>> gathered by the planner. Is there any FLIP describing the issue in
>>>>>>>>>>>>>>> FLINK-27625? I thought FLIP-234 only proposes adding a SQL hint for retry
>>>>>>>>>>>>>>> on missing, rather than having the entire async mode controlled by a hint.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Qingsheng
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On May 25, 2022, at 15:13, Lincoln Lee <lincoln.8...@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Jark,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for your reply!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Currently 'lookup.async' only lives in the HBase connector; I have no
>>>>>>>>>>>>>>>> idea whether or when to remove it (we can discuss it in another issue
>>>>>>>>>>>>>>>> for the HBase connector after FLINK-27625 is done), just not add it into
>>>>>>>>>>>>>>>> a common option now.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Lincoln Lee
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jark Wu <imj...@gmail.com> wrote on Tue, May 24, 2022 at 20:14:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Lincoln,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I have taken a look at FLIP-234, and I agree with you that the
>>>>>>>>>>>>>>>>> connectors can provide both async and sync runtime providers
>>>>>>>>>>>>>>>>> simultaneously instead of one of them.
>>>>>>>>>>>>>>>>> At that point, "lookup.async" looks redundant.
If this >>>>>>>> option >>>>>>>>> is >>>>>>>>>>>>>>> planned to >>>>>>>>>>>>>>>>> be removed >>>>>>>>>>>>>>>>> in the long term, I think it makes sense not to introduce it >>>>>>>>> in >>>>>>>>>>> this >>>>>>>>>>>>>>> FLIP. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, 24 May 2022 at 11:08, Lincoln Lee < >>>>>>>>>> lincoln.8...@gmail.com >>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Qingsheng, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sorry for jumping into the discussion so late. It's a good >>>>>>>>> idea >>>>>>>>>>>>> that >>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>>> have a common table option. I have a minor comments on >>>>>>>>>>>>> 'lookup.async' >>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>> not make it a common option: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The table layer abstracts both sync and async lookup >>>>>>>>>>> capabilities, >>>>>>>>>>>>>>>>>> connectors implementers can choose one or both, in the case >>>>>>>>> of >>>>>>>>>>>>>>>>> implementing >>>>>>>>>>>>>>>>>> only one capability(status of the most of existing builtin >>>>>>>>>>>>> connectors) >>>>>>>>>>>>>>>>>> 'lookup.async' will not be used. And when a connector has >>>>>>>>> both >>>>>>>>>>>>>>>>>> capabilities, I think this choice is more suitable for >>>>>>>> making >>>>>>>>>>>>>>> decisions >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> the query level, for example, table planner can choose the >>>>>>>>>>> physical >>>>>>>>>>>>>>>>>> implementation of async lookup or sync lookup based on its >>>>>>>>> cost >>>>>>>>>>>>>>> model, or >>>>>>>>>>>>>>>>>> users can give query hint based on their own better >>>>>>>>>>>>> understanding. If >>>>>>>>>>>>>>>>>> there is another common table option 'lookup.async', it may >>>>>>>>>>> confuse >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> users in the long run. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So, I prefer to leave the 'lookup.async' option in private >>>>>>>>>> place >>>>>>>>>>>>> (for >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> current hbase connector) and not turn it into a common >>>>>>>>> option. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>> Lincoln Lee >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Qingsheng Ren <renqs...@gmail.com> 于2022年5月23日周一 14:54写道: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Alexander, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for the review! We recently updated the FLIP and >>>>>>>> you >>>>>>>>>> can >>>>>>>>>>>>> find >>>>>>>>>>>>>>>>>> those >>>>>>>>>>>>>>>>>>> changes from my latest email. Since some terminologies has >>>>>>>>>>>>> changed so >>>>>>>>>>>>>>>>>> I’ll >>>>>>>>>>>>>>>>>>> use the new concept for replying your comments. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 1. Builder vs ‘of’ >>>>>>>>>>>>>>>>>>> I’m OK to use builder pattern if we have additional >>>>>>>> optional >>>>>>>>>>>>>>> parameters >>>>>>>>>>>>>>>>>>> for full caching mode (“rescan” previously). The >>>>>>>>>>>>> schedule-with-delay >>>>>>>>>>>>>>>>> idea >>>>>>>>>>>>>>>>>>> looks reasonable to me, but I think we need to redesign >>>>>>>> the >>>>>>>>>>>>> builder >>>>>>>>>>>>>>> API >>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>> full caching to make it more descriptive for developers. >>>>>>>>> Would >>>>>>>>>>> you >>>>>>>>>>>>>>> mind >>>>>>>>>>>>>>>>>>> sharing your ideas about the API? For accessing the FLIP >>>>>>>>>>> workspace >>>>>>>>>>>>>>> you >>>>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>>>> just provide your account ID and ping any PMC member >>>>>>>>> including >>>>>>>>>>>>> Jark. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2. Common table options >>>>>>>>>>>>>>>>>>> We have some discussions these days and propose to >>>>>>>>> introduce 8 >>>>>>>>>>>>> common >>>>>>>>>>>>>>>>>>> table options about caching. 
It has been updated on the >>>>>>>>> FLIP. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 3. Retries >>>>>>>>>>>>>>>>>>> I think we are on the same page :-) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> For your additional concerns: >>>>>>>>>>>>>>>>>>> 1) The table option has been updated. >>>>>>>>>>>>>>>>>>> 2) We got “lookup.cache” back for configuring whether to >>>>>>>> use >>>>>>>>>>>>> partial >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>>>>>> full caching mode. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Qingsheng >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On May 19, 2022, at 17:25, Александр Смирнов < >>>>>>>>>>>>> smirale...@gmail.com> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Also I have a few additions: >>>>>>>>>>>>>>>>>>>> 1) maybe rename 'lookup.cache.maximum-size' to >>>>>>>>>>>>>>>>>>>> 'lookup.cache.max-rows'? I think it will be more clear >>>>>>>> that >>>>>>>>>> we >>>>>>>>>>>>> talk >>>>>>>>>>>>>>>>>>>> not about bytes, but about the number of rows. Plus it >>>>>>>> fits >>>>>>>>>>> more, >>>>>>>>>>>>>>>>>>>> considering my optimization with filters. >>>>>>>>>>>>>>>>>>>> 2) How will users enable rescanning? Are we going to >>>>>>>>> separate >>>>>>>>>>>>>>> caching >>>>>>>>>>>>>>>>>>>> and rescanning from the options point of view? Like >>>>>>>>> initially >>>>>>>>>>> we >>>>>>>>>>>>> had >>>>>>>>>>>>>>>>>>>> one option 'lookup.cache' with values LRU / ALL. I think >>>>>>>>> now >>>>>>>>>> we >>>>>>>>>>>>> can >>>>>>>>>>>>>>>>>>>> make a boolean option 'lookup.rescan'. RescanInterval can >>>>>>>>> be >>>>>>>>>>>>>>>>>>>> 'lookup.rescan.interval', etc. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>> Alexander >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> чт, 19 мая 2022 г. 
в 14:50, Александр Смирнов < >>>>>>>>>>>>> smirale...@gmail.com >>>>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Qingsheng and Jark, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1. Builders vs 'of' >>>>>>>>>>>>>>>>>>>>> I understand that builders are used when we have >>>>>>>> multiple >>>>>>>>>>>>>>>>> parameters. >>>>>>>>>>>>>>>>>>>>> I suggested them because we could add parameters later. >>>>>>>> To >>>>>>>>>>>>> prevent >>>>>>>>>>>>>>>>>>>>> Builder for ScanRuntimeProvider from looking redundant I >>>>>>>>> can >>>>>>>>>>>>>>> suggest >>>>>>>>>>>>>>>>>>>>> one more config now - "rescanStartTime". >>>>>>>>>>>>>>>>>>>>> It's a time in UTC (LocalTime class) when the first >>>>>>>> reload >>>>>>>>>> of >>>>>>>>>>>>> cache >>>>>>>>>>>>>>>>>>>>> starts. This parameter can be thought of as >>>>>>>> 'initialDelay' >>>>>>>>>>> (diff >>>>>>>>>>>>>>>>>>>>> between current time and rescanStartTime) in method >>>>>>>>>>>>>>>>>>>>> ScheduleExecutorService#scheduleWithFixedDelay [1] . It >>>>>>>>> can >>>>>>>>>> be >>>>>>>>>>>>> very >>>>>>>>>>>>>>>>>>>>> useful when the dimension table is updated by some other >>>>>>>>>>>>> scheduled >>>>>>>>>>>>>>>>> job >>>>>>>>>>>>>>>>>>>>> at a certain time. Or when the user simply wants a >>>>>>>> second >>>>>>>>>> scan >>>>>>>>>>>>>>>>> (first >>>>>>>>>>>>>>>>>>>>> cache reload) be delayed. This option can be used even >>>>>>>>>> without >>>>>>>>>>>>>>>>>>>>> 'rescanInterval' - in this case 'rescanInterval' will be >>>>>>>>> one >>>>>>>>>>>>> day. >>>>>>>>>>>>>>>>>>>>> If you are fine with this option, I would be very glad >>>>>>>> if >>>>>>>>>> you >>>>>>>>>>>>> would >>>>>>>>>>>>>>>>>>>>> give me access to edit FLIP page, so I could add it >>>>>>>> myself >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2. Common table options >>>>>>>>>>>>>>>>>>>>> I also think that FactoryUtil would be overloaded by all >>>>>>>>>> cache >>>>>>>>>>>>>>>>>>>>> options. 
But maybe unify all suggested options, not only >>>>>>>>> for >>>>>>>>>>>>>>> default >>>>>>>>>>>>>>>>>>>>> cache? I.e. class 'LookupOptions', that unifies default >>>>>>>>>> cache >>>>>>>>>>>>>>>>> options, >>>>>>>>>>>>>>>>>>>>> rescan options, 'async', 'maxRetries'. WDYT? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 3. Retries >>>>>>>>>>>>>>>>>>>>> I'm fine with suggestion close to >>>>>>>>> RetryUtils#tryTimes(times, >>>>>>>>>>>>> call) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleWithFixedDelay-java.lang.Runnable-long-long-java.util.concurrent.TimeUnit- >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>> Alexander >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ср, 18 мая 2022 г. в 16:04, Qingsheng Ren < >>>>>>>>>> renqs...@gmail.com >>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Jark and Alexander, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks for your comments! I’m also OK to introduce >>>>>>>> common >>>>>>>>>>> table >>>>>>>>>>>>>>>>>>> options. I prefer to introduce a new >>>>>>>>> DefaultLookupCacheOptions >>>>>>>>>>>>> class >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> holding these option definitions because putting all >>>>>>>> options >>>>>>>>>>> into >>>>>>>>>>>>>>>>>>> FactoryUtil would make it a bit ”crowded” and not well >>>>>>>>>>>>> categorized. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> FLIP has been updated according to suggestions above: >>>>>>>>>>>>>>>>>>>>>> 1. Use static “of” method for constructing >>>>>>>>>>>>> RescanRuntimeProvider >>>>>>>>>>>>>>>>>>> considering both arguments are required. >>>>>>>>>>>>>>>>>>>>>> 2. 
2. Introduce new table options matching DefaultLookupCacheFactory.

Best,
Qingsheng

On Wed, May 18, 2022 at 2:57 PM Jark Wu <imj...@gmail.com> wrote:

Hi Alex,

1) Retry logic
I think we can extract some common retry logic into utilities, e.g. RetryUtils#tryTimes(times, call). This seems independent of this FLIP and could be reused by DataStream users. Maybe we can open an issue to discuss this and where to put it.

2) Cache ConfigOptions
I'm fine with defining cache config options in the framework. A candidate place to put them is FactoryUtil, which also includes the "sink.parallelism" and "format" options.

Best,
Jark

On Wed, 18 May 2022 at 13:52, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Qingsheng,

Thank you for considering my comments.

> there might be custom logic before making retry, such as re-establish the connection

Yes, I understand that.
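The RetryUtils#tryTimes(times, call) helper discussed above does not exist in Flink at this point; a minimal sketch of what such a utility could look like (the names come from the discussion, not from an actual API):

```java
import java.util.concurrent.Callable;

public final class RetryUtils {

    private RetryUtils() {}

    /** Invoke call up to `times` attempts, rethrowing the last failure. */
    public static <T> T tryTimes(int times, Callable<T> call) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= times; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                // A connector could hook custom recovery here before the next
                // attempt, e.g. re-establishing a database connection.
            }
        }
        throw last;
    }
}
```

A connector's lookup() could then wrap its I/O in something like `tryTimes(maxRetryTimes, () -> queryDatabase(key))` instead of duplicating the retry loop, while still keeping connector-specific recovery logic inside the callable.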
I meant that such logic can be placed in a separate function that can be implemented by connectors. Just moving the retry logic would make a connector's LookupFunction more concise and avoid duplicated code. However, it's a minor change; the decision is up to you.

> We decide not to provide common DDL options and let developers to define their own options as we do now per connector.

What is the reason for that? One of the main goals of this FLIP was to unify the configs, wasn't it? I understand that the current cache design doesn't depend on ConfigOptions, like it did before. But we can still put these options into the framework, so connectors can reuse them and avoid code duplication and, what is more significant, avoid possibly different option naming. This point can be called out in the documentation for connector developers.

Best regards,
Alexander

On Tue, May 17, 2022
at 17:11, Qingsheng Ren <renqs...@gmail.com> wrote:

Hi Alexander,

Thanks for the review, and glad to see we are on the same page! I think you forgot to cc the dev mailing list, so I'm also quoting your reply under this email.

> We can add 'maxRetryTimes' option into this class

In my opinion the retry logic should be implemented in lookup() instead of in LookupFunction#eval(). Retrying is only meaningful under some specific retriable failures, and there might be custom logic before making a retry, such as re-establishing the connection (JdbcRowDataLookupFunction is an example), so it's handier to leave it to the connector.

> I don't see DDL options, that were in previous version of FLIP. Do you have any special plans for them?

We decided not to provide common DDL options and to let developers define their own options, as we do now per connector.

The rest of the comments sound great and I'll update the FLIP. Hope we can finalize our proposal soon!
Best,

Qingsheng

On May 17, 2022, at 13:46, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Qingsheng and devs!

I like the overall design of the updated FLIP, however I have several suggestions and questions.

1) Introducing LookupFunction as a subclass of TableFunction is a good idea. We can add a 'maxRetryTimes' option into this class; the 'eval' method of the new LookupFunction is great for this purpose. The same goes for the 'async' case.

2) There might be other configs in the future, such as 'cacheMissingKey' in LookupFunctionProvider or 'rescanInterval' in ScanRuntimeProvider. Maybe use the Builder pattern in LookupFunctionProvider and RescanRuntimeProvider for more flexibility (use one 'build' method instead of many 'of' methods in the future)?

3) What are the plans for the existing TableFunctionProvider and AsyncTableFunctionProvider? I think they should be deprecated.
4) Am I right that the current design does not assume usage of a user-provided LookupCache in re-scanning? In that case, it is not very clear why we need methods such as 'invalidate' or 'putAll' in LookupCache.

5) I don't see the DDL options that were in the previous version of the FLIP. Do you have any special plans for them?

If you don't mind, I would be glad to be able to make small adjustments to the FLIP document too. I think it's worth mentioning what exactly the optimizations planned for the future are.

Best regards,
Smirnov Alexander

On Fri, May 13, 2022 at 20:27, Qingsheng Ren <renqs...@gmail.com> wrote:

Hi Alexander and devs,

Thank you very much for the in-depth discussion! As Jark mentioned, we were inspired by Alexander's idea and made a refactor of our design. FLIP-221 [1] has been updated to reflect our design now, and we are happy to hear more suggestions from you!
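To make question 4 concrete, here is a rough sketch of a cache with the method set under discussion (get/put plus 'invalidate' and 'putAll'). Keys and rows are Strings here instead of RowData, and the signatures are illustrative, not the FLIP's exact interface:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative shape of the LookupCache discussed above (not the FLIP's exact API). */
interface LookupCache {
    Collection<String> getIfPresent(String key);
    void put(String key, Collection<String> rows);
    void invalidate(String key);              // drop one entry, e.g. on expiry
    void putAll(Map<String, Collection<String>> entries); // bulk load, e.g. a full reload
}

class MapLookupCache implements LookupCache {
    private final Map<String, Collection<String>> store = new HashMap<>();

    @Override public Collection<String> getIfPresent(String key) { return store.get(key); }
    @Override public void put(String key, Collection<String> rows) { store.put(key, rows); }
    @Override public void invalidate(String key) { store.remove(key); }
    @Override public void putAll(Map<String, Collection<String>> entries) { store.putAll(entries); }

    int size() { return store.size(); }
}
```

Under this shape, 'putAll' is mainly interesting for bulk reloads (the re-scan case), which is exactly why Alexander asks whether it belongs on a cache that re-scanning never uses.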
Compared to the previous design:
1. The lookup cache serves at the table runtime level and is integrated as a component of LookupJoinRunner, as discussed previously.
2. Interfaces are renamed and re-designed to reflect the new design.
3. We separate the all-caching case individually and introduce a new RescanRuntimeProvider to reuse the ability of scanning. We are planning to support SourceFunction / InputFormat for now, considering the complexity of the FLIP-27 Source API.
4. A new interface LookupFunction is introduced to make the semantics of lookup more straightforward for developers.

Replying to Alexander:

> However I'm a little confused whether InputFormat is deprecated or not. Am I right that it will be so in the future, but currently it's not?

Yes, you are right. InputFormat is not deprecated for now. I think it will be deprecated in the future, but we don't have a clear plan for that.

Thanks again for the discussion on this FLIP, and looking forward to cooperating with you after we finalize the design and interfaces!
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric

Best regards,

Qingsheng

On Fri, May 13, 2022 at 12:12 AM Александр Смирнов <smirale...@gmail.com> wrote:

Hi Jark, Qingsheng and Leonard!

Glad to see that we came to a consensus on almost all points!

However, I'm a little confused whether InputFormat is deprecated or not. Am I right that it will be so in the future, but currently it's not? Actually, I also think that for the first version it's OK to use InputFormat in the ALL cache realization, because supporting the rescan ability seems like a very distant prospect. But for this decision we need a consensus among all discussion participants.

In general, I don't have anything to argue with in your statements. All of them correspond to my ideas.
Looking ahead, it would be nice to work on this FLIP cooperatively. I've already done a lot of work on lookup join caching, with a realization very close to the one we are discussing, and I want to share the results of this work. Anyway, looking forward to the FLIP update!

Best regards,
Smirnov Alexander

On Thu, May 12, 2022 at 17:38, Jark Wu <imj...@gmail.com> wrote:

Hi Alex,

Thanks for summarizing your points.

In the past week, Qingsheng, Leonard, and I have discussed it several times and we have totally refactored the design. I'm glad to say we have reached a consensus on many of your points! Qingsheng is still working on updating the design docs, which may be available in the next few days. I will share some conclusions from our discussions:

1) We have refactored the design towards the "cache in framework" way.
2) A "LookupCache" interface for users to customize, and a default implementation with a builder for ease of use. This makes it possible to have both flexibility and conciseness.

3) Filter pushdown is important for both the ALL and LRU lookup caches, especially for reducing IO. Filter pushdown should be the final state and the unified way to support pruning both the ALL cache and the LRU cache, so I think we should make an effort in this direction. If we need to support filter pushdown for the ALL cache anyway, why not use it for the LRU cache as well? Either way, as we decided to implement the cache in the framework, we have the chance to support filtering on the cache anytime. This is an optimization and it doesn't affect the public API. I think we can create a JIRA issue to discuss it when the FLIP is accepted.

4) The idea to support the ALL cache is similar to your proposal.
In the first version, we will only support InputFormat and SourceFunction for cache-all (invoking the InputFormat in the join operator). For a FLIP-27 source, we need to join a true source operator instead of calling it embedded in the join operator. However, this needs another FLIP to support the re-scan ability for the FLIP-27 Source, and that can be a large piece of work. In order not to block this issue, we can put the effort of FLIP-27 source integration into future work and integrate InputFormat & SourceFunction for now.

I think it's fine to use InputFormat & SourceFunction, as they are not deprecated; otherwise, we would have to introduce another function similar to them, which is meaningless. We need to plan FLIP-27 source integration ASAP, before InputFormat & SourceFunction are deprecated.
Best,
Jark

On Thu, 12 May 2022 at 15:46, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Martijn!

Got it. Therefore, the realization with InputFormat is not considered. Thanks for clearing that up!

Best regards,
Smirnov Alexander

On Thu, May 12, 2022 at 14:23, Martijn Visser <mart...@ververica.com> wrote:

Hi,

With regards to:

> But if there are plans to refactor all connectors to FLIP-27

Yes, FLIP-27 is the target for all connectors. The old interfaces will be deprecated and connectors will either be refactored to use the new ones or dropped.
The caching should work for connectors that are using the FLIP-27 interfaces; we should not introduce new features for the old interfaces.

Best regards,

Martijn

On Thu, 12 May 2022 at 06:19, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Jark!

Sorry for the late response. I would like to make some comments and clarify my points.

1) I agree with your first statement. I think we can achieve both advantages this way: put the Cache interface in flink-table-common, but have the implementations of it in flink-table-runtime.
Therefore, if a connector developer wants to use the existing cache strategies and their implementations, he can just pass a lookupConfig to the planner; but if he wants to have his own cache implementation in his TableFunction, it will be possible for him to use the existing interface for this purpose (we can explicitly point this out in the documentation). In this way all configs and metrics will be unified. WDYT?

> If a filter can prune 90% of data in the cache, we will have 90% of lookup requests that can never be cached

2) Let me clarify the logic of the filters optimization in the case of an LRU cache. It looks like Cache<RowData, Collection<RowData>>. Here we always store the response of the dimension table in the cache, even after applying the calc function. I.e.
if there are no rows left after applying the filters to the result of the 'eval' method of the TableFunction, we store an empty list under the lookup keys. The cache line will therefore still be filled, but will require much less memory (in bytes). I.e. we don't completely filter out the keys for which the result was pruned, but we significantly reduce the memory required to store this result. If the user knows about this behavior, he can increase the 'max-rows' option before the start of the job. But actually I came up with the idea that we can do this automatically by using the 'maximumWeight' and 'weigher' methods of the Guava Cache [1]. The weight can be the size of the collection of rows (the cache value). The cache can therefore automatically fit many more records than before.

> Flink SQL has provided a standard way to do filters and projects pushdown, i.e., SupportsFilterPushDown and SupportsProjectionPushDown.
> Jdbc/hive/HBase haven't implemented the interfaces; that doesn't mean it's hard to implement.

It's debatable how difficult it will be to implement filter pushdown. But I think the fact that there is currently no database connector with filter pushdown at least means that this feature won't be supported soon in connectors. Moreover, if we talk about other connectors (not in the Flink repo), their databases might not support all Flink filters (or might not support filters at all). I think users are interested in having the cache filters optimization independently of the support for other features and the solving of more complex (or unsolvable) problems.

3) I agree with your third statement. Actually, in our internal version I also tried to unify the logic of scanning and reloading data from connectors.
But unfortunately, I didn't find a way to unify the logic of all the ScanRuntimeProviders (InputFormat, SourceFunction, Source, ...) and reuse it for reloading the ALL cache. As a result I settled on using InputFormat, because it was used for scanning in all lookup connectors. (I didn't know that there are plans to deprecate InputFormat in favor of the FLIP-27 Source.) IMO the usage of a FLIP-27 source in ALL caching is not a good idea, because this source was designed to work in a distributed environment (a SplitEnumerator on the JobManager and SourceReaders on the TaskManagers), not in one operator (the lookup join operator in our case). There is not even a direct way to pass splits from the SplitEnumerator to a SourceReader (this logic works through the SplitEnumeratorContext, which requires OperatorCoordinator.SubtaskGateway to send AddSplitEvents). Usage of InputFormat for the ALL cache seems much clearer and easier.

But if there are plans to refactor all connectors to FLIP-27, I have the following idea: maybe we can give up the lookup join ALL cache in favor of a simple join with multiple scans of a batch source? The point is that the only difference between a lookup join ALL cache and a simple join with a batch source is that in the first case scanning is performed multiple times, in between which the state (cache) is cleared (correct me if I'm wrong). So what if we extend the functionality of the simple join to support state reloading, plus extend the functionality of scanning a batch source multiple times (this one should be easy with the new FLIP-27 source, which unifies streaming/batch reading; we would only need to change the SplitEnumerator, which would pass the splits again after some TTL)? WDYT? I must say that this looks like a long-term goal and would make the scope of this FLIP even larger than you said. Maybe we can limit ourselves to a simpler solution now (InputFormats).

So to sum up, my points are these:
1) There is a way to make both concise and flexible interfaces for caching in lookup join.
2) The cache filters optimization is important in both the LRU and ALL caches.
3) It is unclear when filter pushdown will be supported in Flink connectors, and some connectors might not have the opportunity to support filter pushdown at all; as far as I know, filter pushdown currently works only for scanning (not lookup). So the cache filters + projections optimization should be independent from other features.
4) The ALL cache realization is a complex topic that involves multiple aspects of how Flink is developing. Giving up InputFormat in favor of the FLIP-27 Source would make the ALL cache realization really complex and unclear, so maybe instead of that we can extend the functionality of the simple join, or keep InputFormat for the lookup join ALL cache?

Best regards,
Smirnov Alexander

[1] https://guava.dev/releases/18.0/api/docs/com/google/common/cache/CacheBuilder.html#weigher(com.google.common.cache.Weigher)

On Thu, May 5, 2022 at 20:34, Jark Wu <imj...@gmail.com> wrote:

It's great to see the active discussion! I want to share my ideas:

1) Implement the cache in the framework vs. in a connector base
I don't have a strong opinion on this. Both ways should work (e.g., for cache pruning and compatibility). The framework way can provide more concise interfaces. The connector-base way can define more flexible cache strategies/implementations. We are still investigating whether we can have both advantages. We should reach a consensus that the chosen way is a final state, and that we are on the path to it.

2) Filters and projections pushdown
I agree with Alex that filter pushdown into the cache can benefit the ALL cache a lot. However, this is not true for the LRU cache. Connectors use the cache to reduce IO requests to the databases for better throughput. If a filter can prune 90% of the data in the cache, we will have 90% of lookup requests that can never be cached and hit the databases directly.
That >>>>>>>> means >>>>>>>>>> the >>>>>>>>>>>>> cache >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> meaningless in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this case. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IMO, Flink SQL has provided a standard way >>>>>>>> to >>>>>>>>> do >>>>>>>>>>>>>>> filters >>>>>>>>>>>>>>>>>>> and projects >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pushdown, i.e., SupportsFilterPushDown and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SupportsProjectionPushDown. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jdbc/hive/HBase haven't implemented the >>>>>>>>>>> interfaces, >>>>>>>>>>>>>>>>> don't >>>>>>>>>>>>>>>>>>> mean it's >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hard >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implement. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> They should implement the pushdown >>>>>>>> interfaces >>>>>>>>> to >>>>>>>>>>>>> reduce >>>>>>>>>>>>>>>>> IO >>>>>>>>>>>>>>>>>>> and the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> size. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> That should be a final state that the scan >>>>>>>>>> source >>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> lookup source >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> share >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the exact pushdown implementation. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't see why we need to duplicate the >>>>>>>>>> pushdown >>>>>>>>>>>>> logic >>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>> caches, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will complex the lookup join design. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) ALL cache abstraction >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> All cache might be the most challenging part >>>>>>>>> of >>>>>>>>>>> this >>>>>>>>>>>>>>>>> FLIP. >>>>>>>>>>>>>>>>>>> We have >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> never >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> provided a reload-lookup public interface. 
Currently, we put the reload logic in the "eval" method of TableFunction. That's hard for some sources (e.g., Hive). Ideally, a connector implementation should share the logic of reload and scan, i.e. ScanTableSource with an InputFormat/SourceFunction/FLIP-27 Source. However, InputFormat/SourceFunction are deprecated, and the FLIP-27 source is deeply coupled with the SourceOperator. If we want to invoke the FLIP-27 source in LookupJoin, it may make the scope of this FLIP much larger. We are still investigating how to abstract the ALL cache logic and reuse the existing source interfaces.
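Jark's point 2 (an LRU cache becoming useless under a cache-side filter) can be made concrete with a small simulation. This is plain, self-contained Java with made-up names (`LruFilterDemo`, `databaseHits`, a `Map` standing in for the partial cache), not Flink API: rows rejected by the filter are never stored, so every lookup for such a key falls through to the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Toy model of a partial (LRU-style) lookup cache with a cache-side filter.
// Rows rejected by the filter are never cached, so lookups for those keys
// always fall through to the database.
public class LruFilterDemo {

    // Counts how many lookups reach the "database" for a stream of lookup keys.
    static int databaseHits(int[] keyStream, IntPredicate filter) {
        Map<Integer, Integer> cache = new HashMap<>();
        int dbHits = 0;
        for (int key : keyStream) {
            if (cache.containsKey(key)) {
                continue;                  // served from cache, no I/O
            }
            dbHits++;                      // one I/O request to the database
            int row = key;                 // pretend the DB returns the row
            if (filter.test(row)) {
                cache.put(key, row);       // only rows passing the filter are cached
            }
        }
        return dbHits;
    }

    public static void main(String[] args) {
        // 10 distinct keys, each looked up 10 times.
        int[] stream = new int[100];
        for (int i = 0; i < 100; i++) {
            stream[i] = i % 10;
        }
        // Without a filter: one miss per key, then every lookup is served from cache.
        System.out.println(databaseHits(stream, row -> true));     // 10
        // Filter keeps only key 0: the other 9 keys hit the DB on every lookup.
        System.out.println(databaseHits(stream, row -> row == 0)); // 91
    }
}
```

In this toy run, pruning 90% of the keys pushes database traffic from 10 requests up to 91 out of 100 lookups, which is exactly the effect Jark describes for the LRU case.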
Best,
Jark

On Thu, 5 May 2022 at 20:22, Roman Boyko <ro.v.bo...@gmail.com> wrote:

It's a much more complicated activity and lies out of the scope of this improvement, because such pushdowns should be done for all ScanTableSource implementations (not only the lookup ones).

On Thu, 5 May 2022 at 19:02, Martijn Visser <martijnvis...@apache.org> wrote:

Hi everyone,

One question regarding "And Alexander correctly mentioned that filter pushdown still is not implemented for jdbc/hive/hbase." -> Would an alternative solution be to actually implement these filter pushdowns? I can imagine that there are many more benefits to doing that, outside of lookup caching and metrics.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

On Thu, 5 May 2022 at 13:58, Roman Boyko <ro.v.bo...@gmail.com> wrote:

Hi everyone!

Thanks for driving such a valuable improvement!

I do think that a single cache implementation would be a nice opportunity for users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics anyway - no matter how it is implemented.

Putting myself in the user's shoes, I can say that:
1) I would prefer to have the opportunity to cut the cache size down by simply filtering out unnecessary data, and the handiest way is to apply the filter inside the LookupRunners. It would be a bit harder to pass it through the LookupJoin node to the TableFunction. And Alexander correctly mentioned that filter pushdown still is not implemented for jdbc/hive/hbase.
2) The ability to set different caching parameters for different tables is quite important, so I would prefer to set them through DDL rather than have the same ttl, strategy and other options for all lookup tables.
3) Providing the cache in the framework really deprives us of extensibility (users won't be able to implement their own cache). But most probably this can be solved by providing more cache strategies and a wider set of configurations.

All these points are much closer to the schema proposed by Alexander. Qingsheng Ren, please correct me if I'm wrong - can all these facilities be simply implemented in your architecture?

Best regards,
Roman Boyko
e.: ro.v.bo...@gmail.com

On Wed, 4 May 2022 at 21:01, Martijn Visser <martijnvis...@apache.org> wrote:

Hi everyone,

I don't have much to chip in, but just wanted to express that I really appreciate the in-depth discussion on this topic and I hope that others will join the conversation.

Best regards,

Martijn

On Tue, 3 May 2022 at 10:15, Alexander Smirnov <smirale...@gmail.com> wrote:

Hi Qingsheng, Leonard and Jark,

Thanks for your detailed feedback! However, I have questions about some of your statements (maybe I didn't get something?).

> Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time"

I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time" is not fully preserved with caching, but as you said, users opt into it consciously to achieve better performance (no one proposed to enable caching by default, etc.). Or by users do you mean other developers of connectors? In that case, developers explicitly specify whether their connector supports caching or not (in the list of supported options); no one makes them do it if they don't want to. So what exactly is the difference between implementing caching in flink-table-runtime versus flink-table-common from this point of view? How does it affect breaking/not breaking the semantics of "FOR SYSTEM_TIME AS OF proc_time"?

> confront a situation that allows table options in DDL to control the behavior of the framework, which has never happened previously and should be cautious

If we talk about the main semantic difference between DDL options and config options ("table.exec.xxx"), isn't it about the scope of the option and its importance to the user's business logic, rather than the specific location of the corresponding logic in the framework? I mean that in my design, for example, putting an option with the lookup cache strategy into configurations would be the wrong decision, because it directly affects the user's business logic (not just a performance optimization) and touches just several functions of ONE table (there can be multiple tables with different caches). Does it really matter for the user (or anyone else) where the logic affected by the applied option is located? Also, I can recall the DDL option 'sink.parallelism', which in some way "controls the behavior of the framework", and I don't see any problem there.

> introduce a new interface for this all-caching scenario and the design would become more complex

This is a subject for a separate discussion, but in our internal version we actually solved this problem quite easily - we reused the InputFormat class (so there is no need for a new API). The point is that currently all lookup connectors use InputFormat for scanning the data in batch mode: HBase, JDBC and even Hive - it uses the class PartitionReader, which is actually just a wrapper around InputFormat. The advantage of this solution is the ability to reload cache data in parallel (the number of threads depends on the number of InputSplits, but has an upper limit). As a result, the cache reload time reduces significantly (as well as the time the input stream is blocked). I know that we usually try to avoid concurrency in Flink code, but maybe this one can be an exception. BTW, I don't say it's an ideal solution; maybe there are better ones.
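A minimal sketch of the parallel reload idea described above, assuming a list of splits and a per-split loader. `Split`, `reload` and `loadSplit` are stand-in names invented for this example, not Flink's InputSplit/InputFormat API: one loader task runs per split, with the thread count capped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch: reload an ALL cache in parallel, one task per split, thread count
// capped. Split/loadSplit are illustrative stand-ins for the
// InputSplit/InputFormat machinery mentioned in the thread.
public class ParallelReloadDemo {

    record Split(int id) {}

    static Map<Integer, String> reload(List<Split> splits,
                                       Function<Split, Map<Integer, String>> loadSplit,
                                       int maxThreads) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.min(maxThreads, splits.size()));
        try {
            Map<Integer, String> cache = new ConcurrentHashMap<>();
            List<Future<?>> tasks = new ArrayList<>();
            for (Split split : splits) {
                tasks.add(pool.submit(() -> cache.putAll(loadSplit.apply(split))));
            }
            for (Future<?> task : tasks) {
                try {
                    task.get(); // block until this split has been loaded
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException("cache reload failed", e);
                }
            }
            return cache;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(new Split(0), new Split(1), new Split(2));
        // Each "split" contributes 10 rows keyed by id * 10 + i.
        Map<Integer, String> cache = reload(splits, split -> {
            Map<Integer, String> rows = new HashMap<>();
            for (int i = 0; i < 10; i++) {
                rows.put(split.id() * 10 + i, "row-" + i);
            }
            return rows;
        }, 4);
        System.out.println(cache.size()); // 30
    }
}
```

Because `reload` builds a fresh map, the live cache could be swapped for the new one atomically (e.g. via a volatile reference) once all tasks finish, which is one way to keep the input-stream blocking window short.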
> Providing the cache in the framework might introduce compatibility issues

This is possible only if the developer of the connector doesn't refactor his code properly and uses the new cache options incorrectly (i.e. explicitly provides the same options in 2 different places in the code). For correct behavior, all he needs to do is redirect the existing options to the framework's LookupConfig (+ maybe add an alias for options if the naming differed), and everything will be transparent for users. If the developer doesn't do any refactoring, nothing changes for the connector because of backward compatibility. Also, if a developer wants to use his own cache logic, he can simply refuse to pass some of the configs to the framework and instead provide his own implementation based on the already existing configs and metrics (but I actually think that's a rare case).

> filters and projections should be pushed all the way down to the table function, like what we do in the scan source

That is a great goal. But the truth is that the ONLY connector that supports filter pushdown is FileSystemTableSource (no database connector supports it currently). Also, for some databases it's simply impossible to push down filters as complex as the ones we have in Flink.

> only applying these optimizations to the cache seems not quite useful

Filters can cut off an arbitrarily large amount of data from the dimension table. For a simple example, suppose the dimension table 'users' has a column 'age' with values from 20 to 40, and the input stream 'clicks' is roughly uniformly distributed over the users' ages. If we have the filter 'age > 30', there will be half as much data in the cache. This means the user can increase 'lookup.cache.max-rows' by almost 2 times, which will give a huge performance boost.
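The arithmetic behind this example can be written down as a tiny sketch (names like `coveredSourceRows` are illustrative, not connector options): with a fixed row budget, a filter with selectivity s lets the same cache cover 1/s times more of the dimension table.

```java
// Back-of-the-envelope arithmetic for the 'age > 30' example: a cache
// budget of maxRows covers maxRows / selectivity source rows when only a
// `selectivity` fraction of rows passes the cache-side filter.
public class CacheBudgetDemo {

    static long coveredSourceRows(long maxRows, double selectivity) {
        return (long) (maxRows / selectivity);
    }

    public static void main(String[] args) {
        // 'age' uniform in 20..40; 'age > 30' keeps roughly half the rows,
        // so the same budget effectively covers twice as many source rows.
        System.out.println(coveredSourceRows(1_000_000, 0.5)); // 2000000
    }
}
```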
Moreover, this >>>>>>>>>>> optimization >>>>>>>>>>>>>>>>> starts >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> really >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shine >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in 'ALL' cache, where tables without >>>>>>>>>> filters >>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> projections >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fit >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in memory, but with them - can. This >>>>>>>>> opens >>>>>>>>>> up >>>>>>>>>>>>>>>>>>> additional >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> possibilities >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for users. And this doesn't sound as >>>>>>>> 'not >>>>>>>>>>> quite >>>>>>>>>>>>>>>>>>> useful'. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It would be great to hear other voices >>>>>>>>>>>>> regarding >>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>> topic! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we have quite a lot of controversial >>>>>>>>>> points, >>>>>>>>>>>>> and I >>>>>>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of others it will be easier for us to >>>>>>>>> come >>>>>>>>>>> to a >>>>>>>>>>>>>>>>>>> consensus. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> пт, 29 апр. 2022 г. 
в 22:33, Qingsheng >>>>>>>>> Ren >>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> renqs...@gmail.com >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Alexander and Arvid, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the discussion and sorry >>>>>>>> for >>>>>>>>> my >>>>>>>>>>>>> late >>>>>>>>>>>>>>>>>>> response! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> had >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal discussion together with Jark >>>>>>>>> and >>>>>>>>>>>>> Leonard >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> I’d >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> summarize our ideas. Instead of >>>>>>>>>> implementing >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> cache >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logic in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runtime layer or wrapping around the >>>>>>>>>>>>> user-provided >>>>>>>>>>>>>>>>>>> table >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> function, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> prefer to introduce some new APIs >>>>>>>>> extending >>>>>>>>>>>>>>>>>>> TableFunction >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> these >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> concerns: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. 
1. Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time", because it couldn't truly reflect the content of the lookup table at the moment of querying. If users choose to enable caching on the lookup table, they implicitly indicate that this breakage is acceptable in exchange for the performance. So we prefer not to provide caching on the table runtime level.
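To make the staleness concrete, here is a minimal, self-contained sketch (plain Java maps, not Flink's actual API; all names are illustrative) of why a cached lookup no longer reflects the table "as of proc_time":

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Flink API: a lookup that caches results.
// Once a key is cached, later lookups no longer see updates in the
// backing table, so the result is not the table "as of proc_time".
public class StaleLookupDemo {
    private final Map<String, String> backingTable = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();

    public void upsert(String key, String value) {
        backingTable.put(key, value);
    }

    /** Lookup through the cache: hits skip the external system entirely. */
    public String lookup(String key) {
        return cache.computeIfAbsent(key, backingTable::get);
    }

    public static String demo() {
        StaleLookupDemo t = new StaleLookupDemo();
        t.upsert("user1", "v1");
        t.lookup("user1");        // caches "v1"
        t.upsert("user1", "v2");  // the dimension table changes...
        return t.lookup("user1"); // ...but the cached "v1" is still returned
    }

    public static void main(String[] args) {
        System.out.println(demo()); // "v1", not the current "v2"
    }
}
```

By enabling caching the user accepts exactly this trade-off: the second lookup answers from memory instead of the current table content.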
2. If we make the cache implementation in the framework (whether in a runner or a wrapper around TableFunction), we have to confront a situation that allows table options in DDL to control the behavior of the framework, which has never happened previously and should be treated cautiously. Under the current design the behavior of the framework should only be specified by configurations ("table.exec.xxx"), and it's hard to apply these general configs to a specific table.
3. We have use cases where the lookup source loads and refreshes all records periodically into memory to achieve high lookup performance (like the Hive connector in the community, a pattern also widely used by our internal connectors). Wrapping the cache around the user's TableFunction works fine for LRU caches, but I think we would have to introduce a new interface for this all-caching scenario, and the design would become more complex.
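The all-caching scenario described here could be sketched as follows (a simplified stand-in for the FLIP's actual classes; the names and the `Supplier`-based full scan are assumptions for illustration): the whole table is scanned into an in-memory snapshot that serves every lookup until the next periodic reload swaps in a fresh snapshot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of an "all cache": no per-entry eviction, just
// periodic full reloads of the dimension table into memory.
public class AllCacheDemo {
    private final Supplier<Map<String, String>> scanAll; // full scan of the source
    private volatile Map<String, String> snapshot = new HashMap<>();

    public AllCacheDemo(Supplier<Map<String, String>> scanAll) {
        this.scanAll = scanAll;
    }

    /** Would be driven by a timer, e.g. every N minutes. */
    public void reload() {
        snapshot = new HashMap<>(scanAll.get());
    }

    /** Pure in-memory lookup; never touches the external system. */
    public String lookup(String key) {
        return snapshot.get(key);
    }

    public static String demo() {
        Map<String, String> source = new HashMap<>();
        source.put("k", "v1");
        AllCacheDemo cache = new AllCacheDemo(() -> source);
        cache.reload();
        source.put("k", "v2");             // invisible until the next reload
        String before = cache.lookup("k"); // still "v1"
        cache.reload();
        String after = cache.lookup("k");  // now "v2"
        return before + "," + after;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note how different the lifecycle is from an LRU wrapper: there are no misses to delegate, which is why a separate interface would be needed.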
4. Providing the cache in the framework might introduce compatibility issues for existing lookup sources: there might exist two caches with totally different strategies if the user incorrectly configures the table (one in the framework and another implemented by the lookup source).

As for the optimization mentioned by Alexander, I think filters and projections should be pushed all the way down to the table function, like what we do in the scan source, instead of to the runner with the cache. The goal of using a cache is to reduce the network I/O and the pressure on the external system, and applying these optimizations only to the cache seems not quite useful.

I made some updates to the FLIP [1] to reflect our ideas. We prefer to keep the cache implementation as a part of TableFunction, and we could provide some helper classes (CachingTableFunction, AllCachingTableFunction, CachingAsyncTableFunction) to developers and regulate the metrics of the cache. Also, I made a POC [2] for your reference.

Looking forward to your ideas!
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
[2] https://github.com/PatrickRen/flink/tree/FLIP-221

Best regards,
Qingsheng

On Tue, Apr 26, 2022 at 4:45 PM Александр Смирнов <smirale...@gmail.com> wrote:

Thanks for the response, Arvid!

I have a few comments on your message.
> but could also live with an easier solution as the first step

I think that these two ways (the one originally proposed by Qingsheng and mine) are mutually exclusive, because conceptually they follow the same goal but the implementation details are different. If we go one way, moving to the other way in the future will mean deleting existing code and once again changing the API for connectors. So I think we should reach a consensus with the community about that first and then work together on this FLIP, i.e. divide the work into tasks for the different parts of the FLIP (for example, LRU cache unification / introducing the proposed set of metrics / further work…). WDYT, Qingsheng?

> as the source will only receive the requests after filter

Actually, if filters are applied to fields of the lookup table, we first must do the requests, and only after that can we filter the responses, because lookup connectors don't have filter pushdown. So if filtering is done before caching, there will be far fewer rows in the cache.

> @Alexander unfortunately, your architecture is not shared.
> I don't know the solution to share images to be honest.

Sorry for that, I'm a bit new to these kinds of conversations :) I have no write access to the Confluence, so I made a Jira issue where I described the proposed changes in more detail: https://issues.apache.org/jira/browse/FLINK-27411.

Will be happy to get more feedback!

Best,
Smirnov Alexander

On Mon, Apr 25, 2022 at 19:49, Arvid Heise <ar...@apache.org> wrote:

Hi Qingsheng,

Thanks for driving this; the inconsistency was not satisfying for me.
I second Alexander's idea but could also live with an easier solution as the first step: instead of making caching an implementation detail of TableFunction X, rather devise a caching layer around X. So the proposal would be a CachingTableFunction that delegates to X in case of misses and otherwise manages the cache. Lifting it into the operator model as proposed would be even better, but is probably unnecessary in the first step for a lookup source (as the source will only receive the requests after the filter; applying the projection may be more interesting to save memory).

Another advantage is that all the changes of this FLIP would be limited to options, with no need for new public interfaces. Everything else remains an implementation detail of the Table runtime. That means we can easily incorporate the optimization potential that Alexander pointed out later.
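The delegation idea could look roughly like this, with a simplified single-column lookup standing in for the real TableFunction (all names here are illustrative assumptions, not Flink API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of the "caching layer around X" idea: serve hits from
// the cache and delegate to the wrapped lookup "X" only on a miss.
public class CachingLookup {
    private final Function<String, String> delegate; // "X": the real lookup
    private final Map<String, String> cache = new HashMap<>();
    public int delegateCalls = 0; // exposed only for the demo below

    public CachingLookup(Function<String, String> inner) {
        this.delegate = key -> {
            delegateCalls++;          // count trips to the external system
            return inner.apply(key);
        };
    }

    /** Cache hit: X is not called. Cache miss: delegate and remember. */
    public String eval(String key) {
        return cache.computeIfAbsent(key, delegate);
    }

    public static int demo() {
        CachingLookup f = new CachingLookup(k -> "row-for-" + k);
        f.eval("a");
        f.eval("a"); // hit: no second delegate call
        f.eval("b");
        return f.delegateCalls; // 2 external calls for 3 lookups
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The wrapper knows nothing about the connector, which is what keeps the FLIP's surface limited to options rather than new public interfaces.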
@Alexander unfortunately, your architecture is not shared. I don't know the solution to share images, to be honest.

On Fri, Apr 22, 2022 at 5:04 PM Александр Смирнов <smirale...@gmail.com> wrote:

Hi Qingsheng! My name is Alexander, I'm not a committer yet, but I'd really like to become one, and this FLIP really interested me. I have actually worked on a similar feature in my company's Flink fork, and we would like to share our thoughts on this and make the code open source.
I think there is a better alternative than introducing an abstract class for TableFunction (CachingTableFunction). As you know, TableFunction lives in the flink-table-common module, which provides only an API for working with tables – that's very convenient for importing in connectors. In turn, CachingTableFunction contains logic for runtime execution, so this class and everything connected with it should be located in another module, probably flink-table-runtime. But this would require connectors to depend on another module that contains a lot of runtime logic, which doesn't sound good.
I suggest adding a new method 'getLookupConfig' to LookupTableSource or LookupRuntimeProvider to allow connectors to only pass configurations to the planner, so that they won't depend on the runtime implementation. Based on these configs the planner will construct a lookup join operator with the corresponding runtime logic (ProcessFunctions in the flink-table-runtime module). The architecture looks like in the pinned image (the LookupConfig class there is actually your CacheConfig). The classes in flink-table-planner that would be responsible for this are CommonPhysicalLookupJoin and its inheritors.
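A rough sketch of the proposed hand-off, where every class and method name is hypothetical and chosen only to show the separation: the connector exposes pure configuration data (which could live in flink-table-common), and the planner alone maps that config to a concrete runtime operator.

```java
// Hypothetical sketch of the 'getLookupConfig' proposal; none of these
// names are actual Flink API.
public class LookupConfigDemo {

    /** Pure data, no runtime logic: safe for connectors to depend on. */
    public static final class LookupConfig {
        final boolean cacheEnabled;
        final long maxRows;

        public LookupConfig(boolean cacheEnabled, long maxRows) {
            this.cacheEnabled = cacheEnabled;
            this.maxRows = maxRows;
        }
    }

    /** What a connector would implement, e.g. on LookupTableSource. */
    public interface ConfigurableLookupSource {
        LookupConfig getLookupConfig();
    }

    /** Planner side: choose the runtime operator from the config alone. */
    public static String planRunner(ConfigurableLookupSource source) {
        LookupConfig config = source.getLookupConfig();
        return config.cacheEnabled ? "LookupJoinCachingRunner" : "LookupJoinRunner";
    }

    public static void main(String[] args) {
        // A JDBC-like connector only states "cache up to 10k rows".
        ConfigurableLookupSource jdbcLike = () -> new LookupConfig(true, 10_000);
        System.out.println(planRunner(jdbcLike)); // planner picks the caching runner
    }
}
```

The point of the design is visible in the dependency direction: the connector never references a runner class, so flink-table-runtime stays out of connector classpaths.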
The current classes for lookup join in flink-table-runtime are LookupJoinRunner, AsyncLookupJoinRunner, LookupJoinRunnerWithCalc and AsyncLookupJoinRunnerWithCalc. I suggest adding classes LookupJoinCachingRunner, LookupJoinCachingRunnerWithCalc, etc.

And here comes another, more powerful advantage of such a solution. If we have the caching logic on a lower level, we can apply some optimizations to it. LookupJoinRunnerWithCalc was named like this because it uses the 'calc' function, which mostly consists of filters and projections.
For example, when joining table A with lookup table B under the condition 'JOIN … ON A.id = B.id AND A.age = B.age + 10 WHERE B.salary > 1000', the 'calc' function will contain the filters A.age = B.age + 10 and B.salary > 1000.

If we apply this function before storing records in the cache, the size of the cache will be significantly reduced: the filters avoid storing useless records in the cache, and the projections reduce the records' size. So the initial maximum number of records in the cache can be increased by the user.

What do you think about it?
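The calc-before-cache effect above can be sketched with plain Java collections instead of Flink's row types (only the lookup-side filter B.salary > 1000 and a projection down to 'id' are shown; the probe-side filter on A.age is omitted for brevity, and all names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: run the 'calc' (filter + projection) on looked-up rows BEFORE
// caching, so non-matching rows never occupy cache space and surviving
// rows are narrowed to the needed columns.
public class CalcBeforeCacheDemo {
    private final Map<String, List<Map<String, Object>>> cache = new HashMap<>();

    /** Filter and project the looked-up rows, then cache only the survivors. */
    public void put(String key, List<Map<String, Object>> lookedUpRows) {
        List<Map<String, Object>> calced = new ArrayList<>();
        for (Map<String, Object> row : lookedUpRows) {
            if ((int) row.get("salary") > 1000) {          // filter: B.salary > 1000
                Map<String, Object> projected = new HashMap<>();
                projected.put("id", row.get("id"));        // projection: keep only id
                calced.add(projected);
            }
        }
        cache.put(key, calced);
    }

    public int cachedRowCount(String key) {
        return cache.get(key).size();
    }

    public static int demo() {
        CalcBeforeCacheDemo c = new CalcBeforeCacheDemo();
        Map<String, Object> rich = new HashMap<>();
        rich.put("id", 1);
        rich.put("salary", 2000);
        Map<String, Object> poor = new HashMap<>();
        poor.put("id", 2);
        poor.put("salary", 500);
        c.put("key", List.of(rich, poor));
        return c.cachedRowCount("key"); // only the salary > 1000 row survives
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Of the two looked-up rows, only one survives the filter, and it is stored without the salary column: both the row count and the per-row footprint in the cache shrink.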
On 2022/04/19 02:47:11 Qingsheng Ren wrote:

Hi devs,

Yuan and I would like to start a discussion about FLIP-221 [1], which introduces an abstraction of the lookup table cache and its standard metrics.

Currently each lookup table source has to implement its own cache to store lookup results, and there isn't a standard set of metrics for users and developers to tune their jobs with lookup joins, which is a quite common use case in Flink Table / SQL.

Therefore we propose some new APIs including the cache, metrics, wrapper classes of TableFunction and new table options. Please take a look at the FLIP page [1] to get more details. Any suggestions and comments would be appreciated!

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric

Best regards,
Qingsheng

--
Best Regards,
Qingsheng Ren
Real-time Computing Team
Alibaba Cloud
Email: renqs...@gmail.com

--
Best regards,
Roman Boyko
e.: ro.v.bo...@gmail.com