Hi Jingsong, 1. Updated and thanks for the reminder!
2. We could do so for the implementation, but as a public interface I prefer not to introduce another layer and expose too much, since this FLIP is already a huge one with a bunch of classes and interfaces.

Best,
Qingsheng

> On Jun 22, 2022, at 11:16, Jingsong Li <jingsongl...@gmail.com> wrote:
> 
> Thanks Qingsheng and all.
> 
> I like this design.
> 
> Some comments:
> 
> 1. Should LookupCache implement Serializable?
> 
> 2. Minor: After FLIP-234 [1], there should be many connectors that
> implement both PartialCachingLookupProvider and
> PartialCachingAsyncLookupProvider. Can we extract a common interface
> for `LookupCache getCache();` to ensure consistency?
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> 
> Best,
> Jingsong
> 
> On Tue, Jun 21, 2022 at 4:09 PM Qingsheng Ren <re...@apache.org> wrote:
>> 
>> Hi devs,
>> 
>> I’d like to push FLIP-221 forward a little bit. Recently we had some offline
>> discussions and updated the FLIP. Here’s the diff compared to the previous
>> version:
>> 
>> 1. (Async)LookupFunctionProvider is designed as a base interface for
>> constructing lookup functions.
>> 2. From (Async)LookupFunctionProvider we extend PartialCaching /
>> FullCachingLookupProvider for the partial and full caching modes.
>> 3. Introduce CacheReloadTrigger for specifying the reload strategy in full
>> caching mode, and provide 2 default implementations (Periodic /
>> TimedCacheReloadTrigger).
>> 
>> Looking forward to your replies~
>> 
>> Best,
>> Qingsheng
>> 
>>> On Jun 2, 2022, at 17:15, Qingsheng Ren <renqs...@gmail.com> wrote:
>>> 
>>> Hi Becket,
>>> 
>>> Thanks for your feedback!
>>> 
>>> 1. An alternative way is to let the implementation of the cache decide
>>> whether to store a missing key in the cache, instead of the framework.
>>> This sounds more reasonable and makes the LookupProvider interface
>>> cleaner.
I can update the FLIP and clarify in the JavaDoc of
>>> LookupCache#put that the cache should decide whether to store an empty
>>> collection.
>>> 
>>> 2. Initially the builder pattern was chosen for the extensibility of the
>>> LookupProvider interfaces, as we might need to add more
>>> configurations in the future. We can remove the builder now that we have
>>> resolved the issue in 1. As for the builder in DefaultLookupCache, I
>>> prefer to keep it because we have a lot of arguments in the
>>> constructor.
>>> 
>>> 3. I think this might overturn the overall design. I agree with
>>> Becket's idea that the API design should be layered with
>>> extensibility in mind, and it would be great to have one unified interface
>>> supporting partial, full, and even mixed custom strategies, but we
>>> have some issues to resolve. The original purpose of treating full
>>> caching separately is that we'd like to reuse the ability of
>>> ScanRuntimeProvider. Developers just need to hand over a Source /
>>> SourceFunction / InputFormat so that the framework can
>>> compose the underlying topology and control the reload (maybe in a
>>> distributed way). Under your design we leave the reload operation
>>> entirely to the CacheStrategy, and I think it will be hard for
>>> developers to reuse the source in the initializeCache method.
>>> 
>>> Best regards,
>>> 
>>> Qingsheng
>>> 
>>> On Thu, Jun 2, 2022 at 1:50 PM Becket Qin <becket....@gmail.com> wrote:
>>>> 
>>>> Thanks for updating the FLIP, Qingsheng. A few more comments:
>>>> 
>>>> 1. I am still not sure what the use case for cacheMissingKey() is.
>>>> More specifically, when would users want to have getCache() return a
>>>> non-empty value and cacheMissingKey() return false?
>>>> 
>>>> 2. The builder pattern. Usually the builder pattern is used when there are
>>>> a lot of variations of constructors.
For example, if a class has three
>>>> variables and all of them are optional, there could potentially be many
>>>> combinations of the variables. But in this FLIP, I don't see such a case.
>>>> What is the reason we have builders for all the classes?
>>>> 
>>>> 3. Should the caching strategy be excluded from the top level provider API?
>>>> Technically speaking, the Flink framework should only have two interfaces
>>>> to deal with:
>>>> A) LookupFunction
>>>> B) AsyncLookupFunction
>>>> Orthogonally, we *believe* there are two different strategies people can use
>>>> for caching. Note that the Flink framework does not care what the caching
>>>> strategy is here.
>>>> a) partial caching
>>>> b) full caching
>>>> 
>>>> Putting them together, we end up with 3 combinations that we think are
>>>> valid:
>>>> Aa) PartialCachingLookupFunctionProvider
>>>> Ba) PartialCachingAsyncLookupFunctionProvider
>>>> Ab) FullCachingLookupFunctionProvider
>>>> 
>>>> However, the caching strategy could actually be quite flexible. E.g. an
>>>> initial full cache load followed by some partial updates. Also, I am not
>>>> 100% sure that full caching will always use ScanTableSource. Including
>>>> the caching strategy in the top level provider API would make it harder to
>>>> extend.
>>>> 
>>>> One possible solution is to just have *LookupFunctionProvider* and
>>>> *AsyncLookupFunctionProvider* as the top level API, both with a
>>>> getCacheStrategy() method returning an
>>>> optional CacheStrategy. The CacheStrategy class would have the following
>>>> methods:
>>>> 1. void open(Context), where the context exposes some of the resources that
>>>> may be useful for the caching strategy, e.g. an ExecutorService that is
>>>> synchronized with the data processing, or a cache refresh trigger which
>>>> blocks data processing and refreshes the cache.
>>>> 2. void initializeCache(), a blocking method that allows users to pre-populate
>>>> the cache before processing any data if they wish.
>>>> 3.
void maybeCache(RowData key, Collection<RowData> value), blocking or >>>> non-blocking method. >>>> 4. void refreshCache(), a blocking / non-blocking method that is invoked by >>>> the Flink framework when the cache refresh trigger is pulled. >>>> >>>> In the above design, partial caching and full caching would be >>>> implementations of the CachingStrategy. And it is OK for users to implement >>>> their own CachingStrategy if they want to. >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> >>>> On Thu, Jun 2, 2022 at 12:14 PM Jark Wu <imj...@gmail.com> wrote: >>>> >>>>> Thank Qingsheng for the detailed summary and updates, >>>>> >>>>> The changes look good to me in general. I just have one minor improvement >>>>> comment. >>>>> Could we add a static util method to the "FullCachingReloadTrigger" >>>>> interface for quick usage? >>>>> >>>>> #periodicReloadAtFixedRate(Duration) >>>>> #periodicReloadWithFixedDelay(Duration) >>>>> >>>>> I think we can also do this for LookupCache. Because users may not know >>>>> where is the default >>>>> implementations and how to use them. >>>>> >>>>> Best, >>>>> Jark >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, 1 Jun 2022 at 18:32, Qingsheng Ren <renqs...@gmail.com> wrote: >>>>> >>>>>> Hi Jingsong, >>>>>> >>>>>> Thanks for your comments! >>>>>> >>>>>>> AllCache definition is not flexible, for example, PartialCache can use >>>>>> any custom storage, while the AllCache can not, AllCache can also be >>>>>> considered to store memory or disk, also need a flexible strategy. >>>>>> >>>>>> We had an offline discussion with Jark and Leonard. Basically we think >>>>>> exposing the interface of full cache storage to connector developers >>>>> might >>>>>> limit our future optimizations. The storage of full caching shouldn’t >>>>> have >>>>>> too many variations for different lookup tables so making it pluggable >>>>>> might not help a lot. 
Also I think it is not quite easy for connector >>>>>> developers to implement such an optimized storage. We can keep optimizing >>>>>> this storage in the future and all full caching lookup tables would >>>>> benefit >>>>>> from this. >>>>>> >>>>>>> We are more inclined to deprecate the connector `async` option when >>>>>> discussing FLIP-234. Can we remove this option from this FLIP? >>>>>> >>>>>> Thanks for the reminder! This option has been removed in the latest >>>>>> version. >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Qingsheng >>>>>> >>>>>> >>>>>>> On Jun 1, 2022, at 15:28, Jingsong Li <jingsongl...@gmail.com> wrote: >>>>>>> >>>>>>> Thanks Alexander for your reply. We can discuss the new interface when >>>>> it >>>>>>> comes out. >>>>>>> >>>>>>> We are more inclined to deprecate the connector `async` option when >>>>>>> discussing FLIP-234 [1]. We should use hint to let planner decide. >>>>>>> Although the discussion has not yet produced a conclusion, can we >>>>> remove >>>>>>> this option from this FLIP? It doesn't seem to be related to this FLIP, >>>>>> but >>>>>>> more to FLIP-234, and we can form a conclusion over there. >>>>>>> >>>>>>> [1] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h >>>>>>> >>>>>>> Best, >>>>>>> Jingsong >>>>>>> >>>>>>> On Wed, Jun 1, 2022 at 4:59 AM Jing Ge <j...@ververica.com> wrote: >>>>>>> >>>>>>>> Hi Jark, >>>>>>>> >>>>>>>> Thanks for clarifying it. It would be fine. as long as we could >>>>> provide >>>>>> the >>>>>>>> no-cache solution. I was just wondering if the client side cache could >>>>>>>> really help when HBase is used, since the data to look up should be >>>>>> huge. >>>>>>>> Depending how much data will be cached on the client side, the data >>>>> that >>>>>>>> should be lru in e.g. LruBlockCache will not be lru anymore. 
In the >>>>>> worst >>>>>>>> case scenario, once the cached data at client side is expired, the >>>>>> request >>>>>>>> will hit disk which will cause extra latency temporarily, if I am not >>>>>>>> mistaken. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Jing >>>>>>>> >>>>>>>> On Mon, May 30, 2022 at 9:59 AM Jark Wu <imj...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Jing Ge, >>>>>>>>> >>>>>>>>> What do you mean about the "impact on the block cache used by HBase"? >>>>>>>>> In my understanding, the connector cache and HBase cache are totally >>>>>> two >>>>>>>>> things. >>>>>>>>> The connector cache is a local/client cache, and the HBase cache is a >>>>>>>>> server cache. >>>>>>>>> >>>>>>>>>> does it make sense to have a no-cache solution as one of the >>>>>>>>> default solutions so that customers will have no effort for the >>>>>> migration >>>>>>>>> if they want to stick with Hbase cache >>>>>>>>> >>>>>>>>> The implementation migration should be transparent to users. Take the >>>>>>>> HBase >>>>>>>>> connector as >>>>>>>>> an example, it already supports lookup cache but is disabled by >>>>>> default. >>>>>>>>> After migration, the >>>>>>>>> connector still disables cache by default (i.e. no-cache solution). >>>>> No >>>>>>>>> migration effort for users. >>>>>>>>> >>>>>>>>> HBase cache and connector cache are two different things. HBase cache >>>>>>>> can't >>>>>>>>> simply replace >>>>>>>>> connector cache. Because one of the most important usages for >>>>> connector >>>>>>>>> cache is reducing >>>>>>>>> the I/O request/response and improving the throughput, which can >>>>>> achieve >>>>>>>>> by just using a server cache. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jark >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, 27 May 2022 at 22:42, Jing Ge <j...@ververica.com> wrote: >>>>>>>>> >>>>>>>>>> Thanks all for the valuable discussion. The new feature looks very >>>>>>>>>> interesting. 
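As an aside to the client-cache discussion above: the connector cache sits entirely in front of the external system, so the server-side cache (e.g. HBase's LruBlockCache) is only touched on a connector-cache miss. A minimal sketch of that flow, with `String` standing in for Flink's `RowData`, a plain map standing in for a real evicting cache, and all names illustrative rather than FLIP API:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Illustrative client-side lookup cache: only a cache miss reaches the
 * external system (and thus its server-side cache). A real cache would
 * apply eviction/TTL; an unbounded map keeps the sketch short.
 */
public class CachedLookup {
    private final Map<String, Collection<String>> cache = new HashMap<>();
    private final Function<String, Collection<String>> externalLookup;
    int externalCalls = 0; // counts requests that actually hit the external system

    public CachedLookup(Function<String, Collection<String>> externalLookup) {
        this.externalLookup = externalLookup;
    }

    public Collection<String> lookup(String key) {
        Collection<String> cached = cache.get(key);
        if (cached != null) {
            return cached; // served locally; external system untouched
        }
        externalCalls++;
        Collection<String> rows = externalLookup.apply(key);
        // Whether an empty result is cached is the cache's own decision,
        // per the discussion in this thread.
        cache.put(key, rows);
        return rows;
    }

    public static void main(String[] args) {
        CachedLookup lookup = new CachedLookup(k -> List.of(k + "-row"));
        lookup.lookup("a");
        lookup.lookup("a"); // second call is a cache hit
        System.out.println(lookup.externalCalls); // prints 1
    }
}
```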
>>>>>>>>>> >>>>>>>>>> According to the FLIP description: "*Currently we have JDBC, Hive >>>>> and >>>>>>>>> HBase >>>>>>>>>> connector implemented lookup table source. All existing >>>>>> implementations >>>>>>>>>> will be migrated to the current design and the migration will be >>>>>>>>>> transparent to end users*." I was only wondering if we should pay >>>>>>>>> attention >>>>>>>>>> to HBase and similar DBs. Since, commonly, the lookup data will be >>>>>> huge >>>>>>>>>> while using HBase, partial caching will be used in this case, if I >>>>> am >>>>>>>> not >>>>>>>>>> mistaken, which might have an impact on the block cache used by >>>>> HBase, >>>>>>>>> e.g. >>>>>>>>>> LruBlockCache. >>>>>>>>>> Another question is that, since HBase provides a sophisticated cache >>>>>>>>>> solution, does it make sense to have a no-cache solution as one of >>>>> the >>>>>>>>>> default solutions so that customers will have no effort for the >>>>>>>> migration >>>>>>>>>> if they want to stick with Hbase cache? >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Jing >>>>>>>>>> >>>>>>>>>> On Fri, May 27, 2022 at 11:19 AM Jingsong Li < >>>>> jingsongl...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> I think the problem now is below: >>>>>>>>>>> 1. AllCache and PartialCache interface on the non-uniform, one >>>>> needs >>>>>>>> to >>>>>>>>>>> provide LookupProvider, the other needs to provide CacheBuilder. >>>>>>>>>>> 2. AllCache definition is not flexible, for example, PartialCache >>>>> can >>>>>>>>> use >>>>>>>>>>> any custom storage, while the AllCache can not, AllCache can also >>>>> be >>>>>>>>>>> considered to store memory or disk, also need a flexible strategy. >>>>>>>>>>> 3. AllCache can not customize ReloadStrategy, currently only >>>>>>>>>>> ScheduledReloadStrategy. >>>>>>>>>>> >>>>>>>>>>> In order to solve the above problems, the following are my ideas. 
>>>>>>>>>>> >>>>>>>>>>> ## Top level cache interfaces: >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> public interface CacheLookupProvider extends >>>>>>>>>>> LookupTableSource.LookupRuntimeProvider { >>>>>>>>>>> >>>>>>>>>>> CacheBuilder createCacheBuilder(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface CacheBuilder { >>>>>>>>>>> Cache create(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface Cache { >>>>>>>>>>> >>>>>>>>>>> /** >>>>>>>>>>> * Returns the value associated with key in this cache, or null >>>>>>>> if >>>>>>>>>>> there is no cached value for >>>>>>>>>>> * key. >>>>>>>>>>> */ >>>>>>>>>>> @Nullable >>>>>>>>>>> Collection<RowData> getIfPresent(RowData key); >>>>>>>>>>> >>>>>>>>>>> /** Returns the number of key-value mappings in the cache. */ >>>>>>>>>>> long size(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> ## Partial cache >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> public interface PartialCacheLookupFunction extends >>>>>>>>> CacheLookupProvider { >>>>>>>>>>> >>>>>>>>>>> @Override >>>>>>>>>>> PartialCacheBuilder createCacheBuilder(); >>>>>>>>>>> >>>>>>>>>>> /** Creates an {@link LookupFunction} instance. */ >>>>>>>>>>> LookupFunction createLookupFunction(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface PartialCacheBuilder extends CacheBuilder { >>>>>>>>>>> >>>>>>>>>>> PartialCache create(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface PartialCache extends Cache { >>>>>>>>>>> >>>>>>>>>>> /** >>>>>>>>>>> * Associates the specified value rows with the specified key >>>>> row >>>>>>>>>>> in the cache. If the cache >>>>>>>>>>> * previously contained value associated with the key, the old >>>>>>>>>>> value is replaced by the >>>>>>>>>>> * specified value. >>>>>>>>>>> * >>>>>>>>>>> * @return the previous value rows associated with key, or null >>>>>>>> if >>>>>>>>>>> there was no mapping for key. 
>>>>>>>>>>> * @param key - key row with which the specified value is to be >>>>>>>>>>> associated >>>>>>>>>>> * @param value – value rows to be associated with the specified >>>>>>>>> key >>>>>>>>>>> */ >>>>>>>>>>> Collection<RowData> put(RowData key, Collection<RowData> value); >>>>>>>>>>> >>>>>>>>>>> /** Discards any cached value for the specified key. */ >>>>>>>>>>> void invalidate(RowData key); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> ## All cache >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> public interface AllCacheLookupProvider extends >>>>> CacheLookupProvider { >>>>>>>>>>> >>>>>>>>>>> void registerReloadStrategy(ScheduledExecutorService >>>>>>>>>>> executorService, Reloader reloader); >>>>>>>>>>> >>>>>>>>>>> ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider(); >>>>>>>>>>> >>>>>>>>>>> @Override >>>>>>>>>>> AllCacheBuilder createCacheBuilder(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface AllCacheBuilder extends CacheBuilder { >>>>>>>>>>> >>>>>>>>>>> AllCache create(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface AllCache extends Cache { >>>>>>>>>>> >>>>>>>>>>> void putAll(Iterator<Map<RowData, RowData>> allEntries); >>>>>>>>>>> >>>>>>>>>>> void clearAll(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> public interface Reloader { >>>>>>>>>>> >>>>>>>>>>> void reload(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jingsong >>>>>>>>>>> >>>>>>>>>>> On Fri, May 27, 2022 at 11:10 AM Jingsong Li < >>>>> jingsongl...@gmail.com >>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks Qingsheng and all for your discussion. >>>>>>>>>>>> >>>>>>>>>>>> Very sorry to jump in so late. >>>>>>>>>>>> >>>>>>>>>>>> Maybe I missed something? 
>>>>>>>>>>>> My first impression when I saw the cache interface was, why don't >>>>>>>> we >>>>>>>>>>>> provide an interface similar to guava cache [1], on top of guava >>>>>>>>> cache, >>>>>>>>>>>> caffeine also makes extensions for asynchronous calls.[2] >>>>>>>>>>>> There is also the bulk load in caffeine too. >>>>>>>>>>>> >>>>>>>>>>>> I am also more confused why first from LookupCacheFactory.Builder >>>>>>>> and >>>>>>>>>>> then >>>>>>>>>>>> to Factory to create Cache. >>>>>>>>>>>> >>>>>>>>>>>> [1] https://github.com/google/guava >>>>>>>>>>>> [2] https://github.com/ben-manes/caffeine/wiki/Population >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Jingsong >>>>>>>>>>>> >>>>>>>>>>>> On Thu, May 26, 2022 at 11:17 PM Jark Wu <imj...@gmail.com> >>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> After looking at the new introduced ReloadTime and Becket's >>>>>>>> comment, >>>>>>>>>>>>> I agree with Becket we should have a pluggable reloading >>>>> strategy. >>>>>>>>>>>>> We can provide some common implementations, e.g., periodic >>>>>>>>> reloading, >>>>>>>>>>> and >>>>>>>>>>>>> daily reloading. >>>>>>>>>>>>> But there definitely be some connector- or business-specific >>>>>>>>> reloading >>>>>>>>>>>>> strategies, e.g. >>>>>>>>>>>>> notify by a zookeeper watcher, reload once a new Hive partition >>>>> is >>>>>>>>>>>>> complete. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Jark >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, 26 May 2022 at 11:52, Becket Qin <becket....@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Qingsheng, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for updating the FLIP. A few comments / questions below: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. Is there a reason that we have both "XXXFactory" and >>>>>>>>>> "XXXProvider". >>>>>>>>>>>>>> What is the difference between them? If they are the same, can >>>>>>>> we >>>>>>>>>> just >>>>>>>>>>>>> use >>>>>>>>>>>>>> XXXFactory everywhere? >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2. 
Regarding the FullCachingLookupProvider, should the reloading policy
>>>>>>>>>>>>>> also be pluggable? Periodic reloading can sometimes be tricky in
>>>>>>>>>>>>>> practice. For example, if a user uses 24 hours as the cache refresh
>>>>>>>>>>>>>> interval and some nightly batch job is delayed, the cache update may
>>>>>>>>>>>>>> still see stale data.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 3. In DefaultLookupCacheFactory, it looks like InitialCapacity should
>>>>>>>>>>>>>> be removed.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 4. The purpose of LookupFunctionProvider#cacheMissingKey() seems a
>>>>>>>>>>>>>> little confusing to me. If Optional<LookupCacheFactory> getCacheFactory()
>>>>>>>>>>>>>> returns a non-empty factory, doesn't that already indicate to the
>>>>>>>>>>>>>> framework that it should cache the missing keys? Also, why is this
>>>>>>>>>>>>>> method returning an Optional<Boolean> instead of boolean?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, May 25, 2022 at 5:07 PM Qingsheng Ren <renqs...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Lincoln and Jark,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for the comments! If the community reaches a consensus that we
>>>>>>>>>>>>>>> use a SQL hint instead of table options to decide whether to use sync
>>>>>>>>>>>>>>> or async mode, it’s indeed not necessary to introduce the “lookup.async”
>>>>>>>>>>>>>>> option.
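Becket's point above about periodic reload staleness is what the rescanStartTime / TimedCacheReloadTrigger ideas elsewhere in this thread address: reload at a fixed time of day rather than at a fixed interval from job start. A minimal sketch of the initial-delay computation behind such a trigger (class and method names are illustrative, not FLIP API):

```java
import java.time.Duration;
import java.time.LocalTime;

/**
 * Sketch of the "reload at a fixed time of day" idea: compute how long to
 * wait from now until the next occurrence of the configured reload time.
 */
public class DailyReloadDelay {

    /** Millis to wait from 'now' until the next occurrence of reloadTime (same clock/zone). */
    public static long initialDelayMillis(LocalTime now, LocalTime reloadTime) {
        Duration delay = Duration.between(now, reloadTime);
        if (delay.isNegative()) {
            delay = delay.plus(Duration.ofDays(1)); // reload time already passed today
        }
        return delay.toMillis();
    }

    public static void main(String[] args) {
        // e.g. now 01:00, reload at 03:00 -> wait 2 hours
        System.out.println(initialDelayMillis(LocalTime.of(1, 0), LocalTime.of(3, 0))); // prints 7200000
    }
}
```

The returned delay could then feed `ScheduledExecutorService#scheduleWithFixedDelay` with a one-day period, as referenced later in the thread.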
I think it’s a good idea to let the decision about async be made at the
>>>>>>>>>>>>>>> query level, which could enable better optimization with more information
>>>>>>>>>>>>>>> gathered by the planner. Is there any FLIP describing the issue in
>>>>>>>>>>>>>>> FLINK-27625? I thought FLIP-234 only proposes adding a SQL hint for retry
>>>>>>>>>>>>>>> on missing, rather than having the entire async mode controlled by a hint.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Qingsheng
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On May 25, 2022, at 15:13, Lincoln Lee <lincoln.8...@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Jark,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for your reply!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Currently 'lookup.async' only lives in the HBase connector; I have no
>>>>>>>>>>>>>>>> idea whether or when to remove it (we can discuss it in another issue
>>>>>>>>>>>>>>>> for the HBase connector after FLINK-27625 is done), just not add it into
>>>>>>>>>>>>>>>> a common option now.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Lincoln Lee
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jark Wu <imj...@gmail.com> wrote on Tue, May 24, 2022 at 20:14:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Lincoln,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I have taken a look at FLIP-234, and I agree with you that the
>>>>>>>>>>>>>>>>> connectors can provide both async and sync runtime providers
>>>>>>>>>>>>>>>>> simultaneously instead of one of them.
>>>>>>>>>>>>>>>>> At that point, "lookup.async" looks redundant.
If this >>>>>>>> option >>>>>>>>> is >>>>>>>>>>>>>>> planned to >>>>>>>>>>>>>>>>> be removed >>>>>>>>>>>>>>>>> in the long term, I think it makes sense not to introduce it >>>>>>>>> in >>>>>>>>>>> this >>>>>>>>>>>>>>> FLIP. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, 24 May 2022 at 11:08, Lincoln Lee < >>>>>>>>>> lincoln.8...@gmail.com >>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Qingsheng, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sorry for jumping into the discussion so late. It's a good >>>>>>>>> idea >>>>>>>>>>>>> that >>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>>> have a common table option. I have a minor comments on >>>>>>>>>>>>> 'lookup.async' >>>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>> not make it a common option: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The table layer abstracts both sync and async lookup >>>>>>>>>>> capabilities, >>>>>>>>>>>>>>>>>> connectors implementers can choose one or both, in the case >>>>>>>>> of >>>>>>>>>>>>>>>>> implementing >>>>>>>>>>>>>>>>>> only one capability(status of the most of existing builtin >>>>>>>>>>>>> connectors) >>>>>>>>>>>>>>>>>> 'lookup.async' will not be used. And when a connector has >>>>>>>>> both >>>>>>>>>>>>>>>>>> capabilities, I think this choice is more suitable for >>>>>>>> making >>>>>>>>>>>>>>> decisions >>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> the query level, for example, table planner can choose the >>>>>>>>>>> physical >>>>>>>>>>>>>>>>>> implementation of async lookup or sync lookup based on its >>>>>>>>> cost >>>>>>>>>>>>>>> model, or >>>>>>>>>>>>>>>>>> users can give query hint based on their own better >>>>>>>>>>>>> understanding. If >>>>>>>>>>>>>>>>>> there is another common table option 'lookup.async', it may >>>>>>>>>>> confuse >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> users in the long run. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So, I prefer to leave the 'lookup.async' option in private >>>>>>>>>> place >>>>>>>>>>>>> (for >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> current hbase connector) and not turn it into a common >>>>>>>>> option. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>> Lincoln Lee >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Qingsheng Ren <renqs...@gmail.com> 于2022年5月23日周一 14:54写道: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Alexander, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks for the review! We recently updated the FLIP and >>>>>>>> you >>>>>>>>>> can >>>>>>>>>>>>> find >>>>>>>>>>>>>>>>>> those >>>>>>>>>>>>>>>>>>> changes from my latest email. Since some terminologies has >>>>>>>>>>>>> changed so >>>>>>>>>>>>>>>>>> I’ll >>>>>>>>>>>>>>>>>>> use the new concept for replying your comments. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 1. Builder vs ‘of’ >>>>>>>>>>>>>>>>>>> I’m OK to use builder pattern if we have additional >>>>>>>> optional >>>>>>>>>>>>>>> parameters >>>>>>>>>>>>>>>>>>> for full caching mode (“rescan” previously). The >>>>>>>>>>>>> schedule-with-delay >>>>>>>>>>>>>>>>> idea >>>>>>>>>>>>>>>>>>> looks reasonable to me, but I think we need to redesign >>>>>>>> the >>>>>>>>>>>>> builder >>>>>>>>>>>>>>> API >>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>> full caching to make it more descriptive for developers. >>>>>>>>> Would >>>>>>>>>>> you >>>>>>>>>>>>>>> mind >>>>>>>>>>>>>>>>>>> sharing your ideas about the API? For accessing the FLIP >>>>>>>>>>> workspace >>>>>>>>>>>>>>> you >>>>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>>>> just provide your account ID and ping any PMC member >>>>>>>>> including >>>>>>>>>>>>> Jark. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2. Common table options >>>>>>>>>>>>>>>>>>> We have some discussions these days and propose to >>>>>>>>> introduce 8 >>>>>>>>>>>>> common >>>>>>>>>>>>>>>>>>> table options about caching. 
It has been updated on the >>>>>>>>> FLIP. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 3. Retries >>>>>>>>>>>>>>>>>>> I think we are on the same page :-) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> For your additional concerns: >>>>>>>>>>>>>>>>>>> 1) The table option has been updated. >>>>>>>>>>>>>>>>>>> 2) We got “lookup.cache” back for configuring whether to >>>>>>>> use >>>>>>>>>>>>> partial >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>>>>>> full caching mode. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Qingsheng >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On May 19, 2022, at 17:25, Александр Смирнов < >>>>>>>>>>>>> smirale...@gmail.com> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Also I have a few additions: >>>>>>>>>>>>>>>>>>>> 1) maybe rename 'lookup.cache.maximum-size' to >>>>>>>>>>>>>>>>>>>> 'lookup.cache.max-rows'? I think it will be more clear >>>>>>>> that >>>>>>>>>> we >>>>>>>>>>>>> talk >>>>>>>>>>>>>>>>>>>> not about bytes, but about the number of rows. Plus it >>>>>>>> fits >>>>>>>>>>> more, >>>>>>>>>>>>>>>>>>>> considering my optimization with filters. >>>>>>>>>>>>>>>>>>>> 2) How will users enable rescanning? Are we going to >>>>>>>>> separate >>>>>>>>>>>>>>> caching >>>>>>>>>>>>>>>>>>>> and rescanning from the options point of view? Like >>>>>>>>> initially >>>>>>>>>>> we >>>>>>>>>>>>> had >>>>>>>>>>>>>>>>>>>> one option 'lookup.cache' with values LRU / ALL. I think >>>>>>>>> now >>>>>>>>>> we >>>>>>>>>>>>> can >>>>>>>>>>>>>>>>>>>> make a boolean option 'lookup.rescan'. RescanInterval can >>>>>>>>> be >>>>>>>>>>>>>>>>>>>> 'lookup.rescan.interval', etc. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>> Alexander >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> чт, 19 мая 2022 г. 
в 14:50, Александр Смирнов < >>>>>>>>>>>>> smirale...@gmail.com >>>>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Qingsheng and Jark, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1. Builders vs 'of' >>>>>>>>>>>>>>>>>>>>> I understand that builders are used when we have >>>>>>>> multiple >>>>>>>>>>>>>>>>> parameters. >>>>>>>>>>>>>>>>>>>>> I suggested them because we could add parameters later. >>>>>>>> To >>>>>>>>>>>>> prevent >>>>>>>>>>>>>>>>>>>>> Builder for ScanRuntimeProvider from looking redundant I >>>>>>>>> can >>>>>>>>>>>>>>> suggest >>>>>>>>>>>>>>>>>>>>> one more config now - "rescanStartTime". >>>>>>>>>>>>>>>>>>>>> It's a time in UTC (LocalTime class) when the first >>>>>>>> reload >>>>>>>>>> of >>>>>>>>>>>>> cache >>>>>>>>>>>>>>>>>>>>> starts. This parameter can be thought of as >>>>>>>> 'initialDelay' >>>>>>>>>>> (diff >>>>>>>>>>>>>>>>>>>>> between current time and rescanStartTime) in method >>>>>>>>>>>>>>>>>>>>> ScheduleExecutorService#scheduleWithFixedDelay [1] . It >>>>>>>>> can >>>>>>>>>> be >>>>>>>>>>>>> very >>>>>>>>>>>>>>>>>>>>> useful when the dimension table is updated by some other >>>>>>>>>>>>> scheduled >>>>>>>>>>>>>>>>> job >>>>>>>>>>>>>>>>>>>>> at a certain time. Or when the user simply wants a >>>>>>>> second >>>>>>>>>> scan >>>>>>>>>>>>>>>>> (first >>>>>>>>>>>>>>>>>>>>> cache reload) be delayed. This option can be used even >>>>>>>>>> without >>>>>>>>>>>>>>>>>>>>> 'rescanInterval' - in this case 'rescanInterval' will be >>>>>>>>> one >>>>>>>>>>>>> day. >>>>>>>>>>>>>>>>>>>>> If you are fine with this option, I would be very glad >>>>>>>> if >>>>>>>>>> you >>>>>>>>>>>>> would >>>>>>>>>>>>>>>>>>>>> give me access to edit FLIP page, so I could add it >>>>>>>> myself >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2. Common table options >>>>>>>>>>>>>>>>>>>>> I also think that FactoryUtil would be overloaded by all >>>>>>>>>> cache >>>>>>>>>>>>>>>>>>>>> options. 
But maybe unify all suggested options, not only >>>>>>>>> for >>>>>>>>>>>>>>> default >>>>>>>>>>>>>>>>>>>>> cache? I.e. class 'LookupOptions', that unifies default >>>>>>>>>> cache >>>>>>>>>>>>>>>>> options, >>>>>>>>>>>>>>>>>>>>> rescan options, 'async', 'maxRetries'. WDYT? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 3. Retries >>>>>>>>>>>>>>>>>>>>> I'm fine with suggestion close to >>>>>>>>> RetryUtils#tryTimes(times, >>>>>>>>>>>>> call) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleWithFixedDelay-java.lang.Runnable-long-long-java.util.concurrent.TimeUnit- >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>> Alexander >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ср, 18 мая 2022 г. в 16:04, Qingsheng Ren < >>>>>>>>>> renqs...@gmail.com >>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Jark and Alexander, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks for your comments! I’m also OK to introduce >>>>>>>> common >>>>>>>>>>> table >>>>>>>>>>>>>>>>>>> options. I prefer to introduce a new >>>>>>>>> DefaultLookupCacheOptions >>>>>>>>>>>>> class >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> holding these option definitions because putting all >>>>>>>> options >>>>>>>>>>> into >>>>>>>>>>>>>>>>>>> FactoryUtil would make it a bit ”crowded” and not well >>>>>>>>>>>>> categorized. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> FLIP has been updated according to suggestions above: >>>>>>>>>>>>>>>>>>>>>> 1. Use static “of” method for constructing >>>>>>>>>>>>> RescanRuntimeProvider >>>>>>>>>>>>>>>>>>> considering both arguments are required. >>>>>>>>>>>>>>>>>>>>>> 2. 
2. Introduce new table options matching DefaultLookupCacheFactory.

Best,
Qingsheng

On Wed, May 18, 2022 at 2:57 PM Jark Wu <imj...@gmail.com> wrote:

Hi Alex,

1) Retry logic
I think we can extract some common retry logic into utilities, e.g. RetryUtils#tryTimes(times, call). This seems independent of this FLIP and could be reused by DataStream users. Maybe we can open an issue to discuss this and where to put it.

2) Cache ConfigOptions
I'm fine with defining cache config options in the framework. A candidate place to put them is FactoryUtil, which also includes the "sink.parallelism" and "format" options.

Best,
Jark

On Wed, 18 May 2022 at 13:52, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Qingsheng,

Thank you for considering my comments.

> there might be custom logic before making retry, such as re-establish the connection

Yes, I understand that.
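The RetryUtils#tryTimes(times, call) helper discussed above does not exist in Flink at this point; a minimal sketch of what such a utility could look like (the names come from the discussion, not from an actual API):

```java
import java.util.concurrent.Callable;

public final class RetryUtils {

    private RetryUtils() {}

    /** Invoke call up to `times` attempts, rethrowing the last failure. */
    public static <T> T tryTimes(int times, Callable<T> call) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= times; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                // A connector could hook custom recovery here before the next
                // attempt, e.g. re-establishing a database connection.
            }
        }
        throw last;
    }
}
```

A connector's lookup() could then wrap its I/O in something like `tryTimes(maxRetryTimes, () -> queryDatabase(key))` instead of duplicating the retry loop, while still keeping connector-specific recovery logic inside the callable.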
I meant that such logic can be placed in a separate function that can be implemented by connectors. Just moving the retry logic would make a connector's LookupFunction more concise and avoid duplicated code. However, it's a minor change; the decision is up to you.

> We decide not to provide common DDL options and let developers to define their own options as we do now per connector.

What is the reason for that? One of the main goals of this FLIP was to unify the configs, wasn't it? I understand that the current cache design doesn't depend on ConfigOptions, like it did before. But we can still put these options into the framework, so connectors can reuse them and avoid code duplication and, what is more significant, avoid possibly different option naming. This point can be called out in the documentation for connector developers.

Best regards,
Alexander

On Tue, May 17, 2022
at 17:11, Qingsheng Ren <renqs...@gmail.com> wrote:

Hi Alexander,

Thanks for the review, and glad to see we are on the same page! I think you forgot to cc the dev mailing list, so I'm also quoting your reply under this email.

> We can add 'maxRetryTimes' option into this class

In my opinion the retry logic should be implemented in lookup() instead of in LookupFunction#eval(). Retrying is only meaningful under some specific retriable failures, and there might be custom logic before making a retry, such as re-establishing the connection (JdbcRowDataLookupFunction is an example), so it's handier to leave it to the connector.

> I don't see DDL options, that were in previous version of FLIP. Do you have any special plans for them?

We decided not to provide common DDL options and to let developers define their own options, as we do now per connector.

The rest of the comments sound great and I'll update the FLIP. Hope we can finalize our proposal soon!
Best,

Qingsheng

On May 17, 2022, at 13:46, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Qingsheng and devs!

I like the overall design of the updated FLIP, however I have several suggestions and questions.

1) Introducing LookupFunction as a subclass of TableFunction is a good idea. We can add a 'maxRetryTimes' option into this class; the 'eval' method of the new LookupFunction is great for this purpose. The same goes for the 'async' case.

2) There might be other configs in the future, such as 'cacheMissingKey' in LookupFunctionProvider or 'rescanInterval' in ScanRuntimeProvider. Maybe use the Builder pattern in LookupFunctionProvider and RescanRuntimeProvider for more flexibility (use one 'build' method instead of many 'of' methods in the future)?

3) What are the plans for the existing TableFunctionProvider and AsyncTableFunctionProvider? I think they should be deprecated.
4) Am I right that the current design does not assume usage of a user-provided LookupCache in re-scanning? In that case, it is not very clear why we need methods such as 'invalidate' or 'putAll' in LookupCache.

5) I don't see the DDL options that were in the previous version of the FLIP. Do you have any special plans for them?

If you don't mind, I would be glad to be able to make small adjustments to the FLIP document too. I think it's worth mentioning what exactly the optimizations planned for the future are.

Best regards,
Smirnov Alexander

On Fri, May 13, 2022 at 20:27, Qingsheng Ren <renqs...@gmail.com> wrote:

Hi Alexander and devs,

Thank you very much for the in-depth discussion! As Jark mentioned, we were inspired by Alexander's idea and made a refactor of our design. FLIP-221 [1] has been updated to reflect our design now, and we are happy to hear more suggestions from you!
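To make question 4 concrete, here is a rough sketch of a cache with the method set under discussion (get/put plus 'invalidate' and 'putAll'). Keys and rows are Strings here instead of RowData, and the signatures are illustrative, not the FLIP's exact interface:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative shape of the LookupCache discussed above (not the FLIP's exact API). */
interface LookupCache {
    Collection<String> getIfPresent(String key);
    void put(String key, Collection<String> rows);
    void invalidate(String key);              // drop one entry, e.g. on expiry
    void putAll(Map<String, Collection<String>> entries); // bulk load, e.g. a full reload
}

class MapLookupCache implements LookupCache {
    private final Map<String, Collection<String>> store = new HashMap<>();

    @Override public Collection<String> getIfPresent(String key) { return store.get(key); }
    @Override public void put(String key, Collection<String> rows) { store.put(key, rows); }
    @Override public void invalidate(String key) { store.remove(key); }
    @Override public void putAll(Map<String, Collection<String>> entries) { store.putAll(entries); }

    int size() { return store.size(); }
}
```

Under this shape, 'putAll' is mainly interesting for bulk reloads (the re-scan case), which is exactly why Alexander asks whether it belongs on a cache that re-scanning never uses.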
Compared to the previous design:
1. The lookup cache serves at the table runtime level and is integrated as a component of LookupJoinRunner, as discussed previously.
2. Interfaces are renamed and re-designed to reflect the new design.
3. We separate the all-caching case individually and introduce a new RescanRuntimeProvider to reuse the ability of scanning. We are planning to support SourceFunction / InputFormat for now, considering the complexity of the FLIP-27 Source API.
4. A new interface LookupFunction is introduced to make the semantics of lookup more straightforward for developers.

Replying to Alexander:

> However I'm a little confused whether InputFormat is deprecated or not. Am I right that it will be so in the future, but currently it's not?

Yes, you are right. InputFormat is not deprecated for now. I think it will be deprecated in the future, but we don't have a clear plan for that.

Thanks again for the discussion on this FLIP, and looking forward to cooperating with you after we finalize the design and interfaces!
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric

Best regards,

Qingsheng

On Fri, May 13, 2022 at 12:12 AM Александр Смирнов <smirale...@gmail.com> wrote:

Hi Jark, Qingsheng and Leonard!

Glad to see that we came to a consensus on almost all points!

However, I'm a little confused whether InputFormat is deprecated or not. Am I right that it will be so in the future, but currently it's not? Actually, I also think that for the first version it's OK to use InputFormat in the ALL cache realization, because supporting the rescan ability seems like a very distant prospect. But for this decision we need a consensus among all discussion participants.

In general, I don't have anything to argue with in your statements. All of them correspond to my ideas.
Looking ahead, it would be nice to work on this FLIP cooperatively. I've already done a lot of work on lookup join caching, with a realization very close to the one we are discussing, and I want to share the results of this work. Anyway, looking forward to the FLIP update!

Best regards,
Smirnov Alexander

On Thu, May 12, 2022 at 17:38, Jark Wu <imj...@gmail.com> wrote:

Hi Alex,

Thanks for summarizing your points.

In the past week, Qingsheng, Leonard, and I have discussed it several times and we have totally refactored the design. I'm glad to say we have reached a consensus on many of your points! Qingsheng is still working on updating the design docs, which may be available in the next few days. I will share some conclusions from our discussions:

1) We have refactored the design towards the "cache in framework" way.
2) A "LookupCache" interface for users to customize, and a default implementation with a builder for ease of use. This makes it possible to have both flexibility and conciseness.

3) Filter pushdown is important for both the ALL and LRU lookup caches, especially for reducing IO. Filter pushdown should be the final state and the unified way to support pruning both the ALL cache and the LRU cache, so I think we should make an effort in this direction. If we need to support filter pushdown for the ALL cache anyway, why not use it for the LRU cache as well? Either way, as we decided to implement the cache in the framework, we have the chance to support filtering on the cache anytime. This is an optimization and it doesn't affect the public API. I think we can create a JIRA issue to discuss it when the FLIP is accepted.

4) The idea to support the ALL cache is similar to your proposal.
In the first version, we will only support InputFormat and SourceFunction for cache-all (invoking the InputFormat in the join operator). For a FLIP-27 source, we need to join a true source operator instead of calling it embedded in the join operator. However, this needs another FLIP to support the re-scan ability for the FLIP-27 Source, and that can be a large piece of work. In order not to block this issue, we can put the effort of FLIP-27 source integration into future work and integrate InputFormat & SourceFunction for now.

I think it's fine to use InputFormat & SourceFunction, as they are not deprecated; otherwise, we would have to introduce another function similar to them, which is meaningless. We need to plan FLIP-27 source integration ASAP, before InputFormat & SourceFunction are deprecated.
Best,
Jark

On Thu, 12 May 2022 at 15:46, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Martijn!

Got it. Therefore, the realization with InputFormat is not considered. Thanks for clearing that up!

Best regards,
Smirnov Alexander

On Thu, May 12, 2022 at 14:23, Martijn Visser <mart...@ververica.com> wrote:

Hi,

With regards to:

> But if there are plans to refactor all connectors to FLIP-27

Yes, FLIP-27 is the target for all connectors. The old interfaces will be deprecated and connectors will either be refactored to use the new ones or dropped.
The caching should work for connectors that are using the FLIP-27 interfaces; we should not introduce new features for the old interfaces.

Best regards,

Martijn

On Thu, 12 May 2022 at 06:19, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Jark!

Sorry for the late response. I would like to make some comments and clarify my points.

1) I agree with your first statement. I think we can achieve both advantages this way: put the Cache interface in flink-table-common, but have the implementations of it in flink-table-runtime.
Therefore, if a connector developer wants to use the existing cache strategies and their implementations, he can just pass a lookupConfig to the planner; but if he wants to have his own cache implementation in his TableFunction, it will be possible for him to use the existing interface for this purpose (we can explicitly point this out in the documentation). In this way all configs and metrics will be unified. WDYT?

> If a filter can prune 90% of data in the cache, we will have 90% of lookup requests that can never be cached

2) Let me clarify the logic of the filters optimization in the case of an LRU cache. It looks like Cache<RowData, Collection<RowData>>. Here we always store the response of the dimension table in the cache, even after applying the calc function. I.e.
if there are no rows left after applying the filters to the result of the 'eval' method of the TableFunction, we store an empty list under the lookup keys. The cache line will therefore still be filled, but will require much less memory (in bytes). I.e. we don't completely filter out the keys for which the result was pruned, but we significantly reduce the memory required to store this result. If the user knows about this behavior, he can increase the 'max-rows' option before the start of the job. But actually I came up with the idea that we can do this automatically by using the 'maximumWeight' and 'weigher' methods of the Guava Cache [1]. The weight can be the size of the collection of rows (the cache value). The cache can therefore automatically fit many more records than before.

> Flink SQL has provided a standard way to do filters and projects pushdown, i.e., SupportsFilterPushDown and SupportsProjectionPushDown.
> Jdbc/hive/HBase haven't implemented the interfaces; that doesn't mean it's hard to implement.

It's debatable how difficult it will be to implement filter pushdown. But I think the fact that there is currently no database connector with filter pushdown at least means that this feature won't be supported soon in connectors. Moreover, if we talk about other connectors (not in the Flink repo), their databases might not support all Flink filters (or might not support filters at all). I think users are interested in having the cache filters optimization independently of the support for other features and the solving of more complex (or unsolvable) problems.

3) I agree with your third statement. Actually, in our internal version I also tried to unify the logic of scanning and reloading data from connectors.
But unfortunately, I didn't find a way to unify the logic of all the ScanRuntimeProviders (InputFormat, SourceFunction, Source, ...) and reuse it for reloading the ALL cache. As a result I settled on using InputFormat, because it was used for scanning in all lookup connectors. (I didn't know that there are plans to deprecate InputFormat in favor of the FLIP-27 Source.) IMO the usage of a FLIP-27 source in ALL caching is not a good idea, because this source was designed to work in a distributed environment (a SplitEnumerator on the JobManager and SourceReaders on the TaskManagers), not in one operator (the lookup join operator in our case). There is not even a direct way to pass splits from the SplitEnumerator to a SourceReader (this logic works through the SplitEnumeratorContext, which requires OperatorCoordinator.SubtaskGateway to send AddSplitEvents). Usage of InputFormat for the ALL cache seems much clearer and easier.

But if there are plans to refactor all connectors to FLIP-27, I have the following idea: maybe we can give up the lookup join ALL cache in favor of a simple join with multiple scans of a batch source? The point is that the only difference between a lookup join ALL cache and a simple join with a batch source is that in the first case scanning is performed multiple times, in between which the state (cache) is cleared (correct me if I'm wrong). So what if we extend the functionality of the simple join to support state reloading, plus extend the functionality of scanning a batch source multiple times (this one should be easy with the new FLIP-27 source, which unifies streaming/batch reading; we would only need to change the SplitEnumerator, which would pass the splits again after some TTL)? WDYT? I must say that this looks like a long-term goal and would make the scope of this FLIP even larger than you said. Maybe we can limit ourselves to a simpler solution now (InputFormats).

So to sum up, my points are these:
1) There is a way to make both concise and flexible interfaces for caching in lookup join.
2) The cache filters optimization is important in both the LRU and ALL caches.
3) It is unclear when filter pushdown will be supported in Flink connectors, and some connectors might not have the opportunity to support filter pushdown at all; as far as I know, filter pushdown currently works only for scanning (not lookup). So the cache filters + projections optimization should be independent from other features.
4) The ALL cache realization is a complex topic that involves multiple aspects of how Flink is developing. Giving up InputFormat in favor of the FLIP-27 Source would make the ALL cache realization really complex and unclear, so maybe instead of that we can extend the functionality of the simple join, or keep InputFormat for the lookup join ALL cache?

Best regards,
Smirnov Alexander

[1] https://guava.dev/releases/18.0/api/docs/com/google/common/cache/CacheBuilder.html#weigher(com.google.common.cache.Weigher)

On Thu, May 5, 2022 at 20:34, Jark Wu <imj...@gmail.com> wrote:

It's great to see the active discussion! I want to share my ideas:

1) Implement the cache in the framework vs. in a connector base
I don't have a strong opinion on this. Both ways should work (e.g., for cache pruning and compatibility). The framework way can provide more concise interfaces. The connector-base way can define more flexible cache strategies/implementations. We are still investigating whether we can have both advantages. We should reach a consensus that the chosen way is a final state, and that we are on the path to it.

2) Filters and projections pushdown
I agree with Alex that filter pushdown into the cache can benefit the ALL cache a lot. However, this is not true for the LRU cache. Connectors use the cache to reduce IO requests to the databases for better throughput. If a filter can prune 90% of the data in the cache, we will have 90% of lookup requests that can never be cached and hit the databases directly.
That >>>>>>>> means >>>>>>>>>> the >>>>>>>>>>>>> cache >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> meaningless in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this case. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> IMO, Flink SQL has provided a standard way >>>>>>>> to >>>>>>>>> do >>>>>>>>>>>>>>> filters >>>>>>>>>>>>>>>>>>> and projects >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pushdown, i.e., SupportsFilterPushDown and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SupportsProjectionPushDown. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jdbc/hive/HBase haven't implemented the >>>>>>>>>>> interfaces, >>>>>>>>>>>>>>>>> don't >>>>>>>>>>>>>>>>>>> mean it's >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hard >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implement. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> They should implement the pushdown >>>>>>>> interfaces >>>>>>>>> to >>>>>>>>>>>>> reduce >>>>>>>>>>>>>>>>> IO >>>>>>>>>>>>>>>>>>> and the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> size. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> That should be a final state that the scan >>>>>>>>>> source >>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> lookup source >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> share >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the exact pushdown implementation. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't see why we need to duplicate the >>>>>>>>>> pushdown >>>>>>>>>>>>> logic >>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>> caches, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will complex the lookup join design. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) ALL cache abstraction >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> All cache might be the most challenging part >>>>>>>>> of >>>>>>>>>>> this >>>>>>>>>>>>>>>>> FLIP. >>>>>>>>>>>>>>>>>>> We have >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> never >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> provided a reload-lookup public interface. 
Currently, we put the reload logic in the "eval" method of TableFunction. That's hard for some sources (e.g., Hive). Ideally, a connector implementation should share the logic of reload and scan, i.e. ScanTableSource with an InputFormat/SourceFunction/FLIP-27 Source. However, InputFormat/SourceFunction are deprecated, and the FLIP-27 source is deeply coupled with the SourceOperator. If we want to invoke the FLIP-27 source in LookupJoin, it may make the scope of this FLIP much larger. We are still investigating how to abstract the ALL cache logic and reuse the existing source interfaces.
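Jark's point 2 (an LRU cache becoming useless under a cache-side filter) can be made concrete with a small simulation. This is plain, self-contained Java with made-up names (`LruFilterDemo`, `databaseHits`, a `Map` standing in for the partial cache), not Flink API: rows rejected by the filter are never stored, so every lookup for such a key falls through to the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Toy model of a partial (LRU-style) lookup cache with a cache-side filter.
// Rows rejected by the filter are never cached, so lookups for those keys
// always fall through to the database.
public class LruFilterDemo {

    // Counts how many lookups reach the "database" for a stream of lookup keys.
    static int databaseHits(int[] keyStream, IntPredicate filter) {
        Map<Integer, Integer> cache = new HashMap<>();
        int dbHits = 0;
        for (int key : keyStream) {
            if (cache.containsKey(key)) {
                continue;                  // served from cache, no I/O
            }
            dbHits++;                      // one I/O request to the database
            int row = key;                 // pretend the DB returns the row
            if (filter.test(row)) {
                cache.put(key, row);       // only rows passing the filter are cached
            }
        }
        return dbHits;
    }

    public static void main(String[] args) {
        // 10 distinct keys, each looked up 10 times.
        int[] stream = new int[100];
        for (int i = 0; i < 100; i++) {
            stream[i] = i % 10;
        }
        // Without a filter: one miss per key, then every lookup is served from cache.
        System.out.println(databaseHits(stream, row -> true));     // 10
        // Filter keeps only key 0: the other 9 keys hit the DB on every lookup.
        System.out.println(databaseHits(stream, row -> row == 0)); // 91
    }
}
```

In this toy run, pruning 90% of the keys pushes database traffic from 10 requests up to 91 out of 100 lookups, which is exactly the effect Jark describes for the LRU case.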
Best,
Jark

On Thu, 5 May 2022 at 20:22, Roman Boyko <ro.v.bo...@gmail.com> wrote:

It's a much more complicated activity and lies out of the scope of this improvement, because such pushdowns should be done for all ScanTableSource implementations (not only the lookup ones).

On Thu, 5 May 2022 at 19:02, Martijn Visser <martijnvis...@apache.org> wrote:

Hi everyone,

One question regarding "And Alexander correctly mentioned that filter pushdown still is not implemented for jdbc/hive/hbase." -> Would an alternative solution be to actually implement these filter pushdowns? I can imagine that there are many more benefits to doing that, outside of lookup caching and metrics.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

On Thu, 5 May 2022 at 13:58, Roman Boyko <ro.v.bo...@gmail.com> wrote:

Hi everyone!

Thanks for driving such a valuable improvement!

I do think that a single cache implementation would be a nice opportunity for users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics anyway - no matter how it is implemented.

Putting myself in the user's shoes, I can say that:
1) I would prefer to have the opportunity to cut the cache size down by simply filtering out unnecessary data, and the handiest way is to apply the filter inside the LookupRunners. It would be a bit harder to pass it through the LookupJoin node to the TableFunction. And Alexander correctly mentioned that filter pushdown still is not implemented for jdbc/hive/hbase.
2) The ability to set different caching parameters for different tables is quite important, so I would prefer to set them through DDL rather than have the same ttl, strategy and other options for all lookup tables.
3) Providing the cache in the framework really deprives us of extensibility (users won't be able to implement their own cache). But most probably this can be solved by providing more cache strategies and a wider set of configurations.

All these points are much closer to the schema proposed by Alexander. Qingsheng Ren, please correct me if I'm wrong - can all these facilities be simply implemented in your architecture?

Best regards,
Roman Boyko
e.: ro.v.bo...@gmail.com

On Wed, 4 May 2022 at 21:01, Martijn Visser <martijnvis...@apache.org> wrote:

Hi everyone,

I don't have much to chip in, but just wanted to express that I really appreciate the in-depth discussion on this topic and I hope that others will join the conversation.

Best regards,

Martijn

On Tue, 3 May 2022 at 10:15, Alexander Smirnov <smirale...@gmail.com> wrote:

Hi Qingsheng, Leonard and Jark,

Thanks for your detailed feedback! However, I have questions about some of your statements (maybe I didn't get something?).

> Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time"

I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time" is not fully preserved with caching, but as you said, users opt into it consciously to achieve better performance (no one proposed to enable caching by default, etc.). Or by users do you mean other developers of connectors? In that case, developers explicitly specify whether their connector supports caching or not (in the list of supported options); no one makes them do it if they don't want to. So what exactly is the difference between implementing caching in flink-table-runtime versus flink-table-common from this point of view? How does it affect breaking/not breaking the semantics of "FOR SYSTEM_TIME AS OF proc_time"?

> confront a situation that allows table options in DDL to control the behavior of the framework, which has never happened previously and should be cautious

If we talk about the main semantic difference between DDL options and config options ("table.exec.xxx"), isn't it about the scope of the option and its importance to the user's business logic, rather than the specific location of the corresponding logic in the framework? I mean that in my design, for example, putting an option with the lookup cache strategy into configurations would be the wrong decision, because it directly affects the user's business logic (not just a performance optimization) and touches just several functions of ONE table (there can be multiple tables with different caches). Does it really matter for the user (or anyone else) where the logic affected by the applied option is located? Also, I can recall the DDL option 'sink.parallelism', which in some way "controls the behavior of the framework", and I don't see any problem there.

> introduce a new interface for this all-caching scenario and the design would become more complex

This is a subject for a separate discussion, but in our internal version we actually solved this problem quite easily - we reused the InputFormat class (so there is no need for a new API). The point is that currently all lookup connectors use InputFormat for scanning the data in batch mode: HBase, JDBC and even Hive - it uses the class PartitionReader, which is actually just a wrapper around InputFormat. The advantage of this solution is the ability to reload cache data in parallel (the number of threads depends on the number of InputSplits, but has an upper limit). As a result, the cache reload time reduces significantly (as well as the time the input stream is blocked). I know that we usually try to avoid concurrency in Flink code, but maybe this one can be an exception. BTW, I don't say it's an ideal solution; maybe there are better ones.
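A minimal sketch of the parallel reload idea described above, assuming a list of splits and a per-split loader. `Split`, `reload` and `loadSplit` are stand-in names invented for this example, not Flink's InputSplit/InputFormat API: one loader task runs per split, with the thread count capped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch: reload an ALL cache in parallel, one task per split, thread count
// capped. Split/loadSplit are illustrative stand-ins for the
// InputSplit/InputFormat machinery mentioned in the thread.
public class ParallelReloadDemo {

    record Split(int id) {}

    static Map<Integer, String> reload(List<Split> splits,
                                       Function<Split, Map<Integer, String>> loadSplit,
                                       int maxThreads) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.min(maxThreads, splits.size()));
        try {
            Map<Integer, String> cache = new ConcurrentHashMap<>();
            List<Future<?>> tasks = new ArrayList<>();
            for (Split split : splits) {
                tasks.add(pool.submit(() -> cache.putAll(loadSplit.apply(split))));
            }
            for (Future<?> task : tasks) {
                try {
                    task.get(); // block until this split has been loaded
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException("cache reload failed", e);
                }
            }
            return cache;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(new Split(0), new Split(1), new Split(2));
        // Each "split" contributes 10 rows keyed by id * 10 + i.
        Map<Integer, String> cache = reload(splits, split -> {
            Map<Integer, String> rows = new HashMap<>();
            for (int i = 0; i < 10; i++) {
                rows.put(split.id() * 10 + i, "row-" + i);
            }
            return rows;
        }, 4);
        System.out.println(cache.size()); // 30
    }
}
```

Because `reload` builds a fresh map, the live cache could be swapped for the new one atomically (e.g. via a volatile reference) once all tasks finish, which is one way to keep the input-stream blocking window short.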
> Providing the cache in the framework might introduce compatibility issues

This is possible only if the developer of the connector doesn't refactor his code properly and uses the new cache options incorrectly (i.e. explicitly provides the same options in 2 different places in the code). For correct behavior, all he needs to do is redirect the existing options to the framework's LookupConfig (+ maybe add an alias for options if the naming differed), and everything will be transparent for users. If the developer doesn't do any refactoring, nothing changes for the connector because of backward compatibility. Also, if a developer wants to use his own cache logic, he can simply refuse to pass some of the configs to the framework and instead provide his own implementation based on the already existing configs and metrics (but I actually think that's a rare case).

> filters and projections should be pushed all the way down to the table function, like what we do in the scan source

That is a great goal. But the truth is that the ONLY connector that supports filter pushdown is FileSystemTableSource (no database connector supports it currently). Also, for some databases it's simply impossible to push down filters as complex as the ones we have in Flink.

> only applying these optimizations to the cache seems not quite useful

Filters can cut off an arbitrarily large amount of data from the dimension table. For a simple example, suppose the dimension table 'users' has a column 'age' with values from 20 to 40, and the input stream 'clicks' is roughly uniformly distributed over the users' ages. If we have the filter 'age > 30', there will be half as much data in the cache. This means the user can increase 'lookup.cache.max-rows' by almost 2 times, which will give a huge performance boost.
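The arithmetic behind this example can be written down as a tiny sketch (names like `coveredSourceRows` are illustrative, not connector options): with a fixed row budget, a filter with selectivity s lets the same cache cover 1/s times more of the dimension table.

```java
// Back-of-the-envelope arithmetic for the 'age > 30' example: a cache
// budget of maxRows covers maxRows / selectivity source rows when only a
// `selectivity` fraction of rows passes the cache-side filter.
public class CacheBudgetDemo {

    static long coveredSourceRows(long maxRows, double selectivity) {
        return (long) (maxRows / selectivity);
    }

    public static void main(String[] args) {
        // 'age' uniform in 20..40; 'age > 30' keeps roughly half the rows,
        // so the same budget effectively covers twice as many source rows.
        System.out.println(coveredSourceRows(1_000_000, 0.5)); // 2000000
    }
}
```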
Moreover, this >>>>>>>>>>> optimization >>>>>>>>>>>>>>>>> starts >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> really >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shine >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in 'ALL' cache, where tables without >>>>>>>>>> filters >>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> projections >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can't >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fit >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in memory, but with them - can. This >>>>>>>>> opens >>>>>>>>>> up >>>>>>>>>>>>>>>>>>> additional >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> possibilities >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for users. And this doesn't sound as >>>>>>>> 'not >>>>>>>>>>> quite >>>>>>>>>>>>>>>>>>> useful'. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It would be great to hear other voices >>>>>>>>>>>>> regarding >>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>> topic! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we have quite a lot of controversial >>>>>>>>>> points, >>>>>>>>>>>>> and I >>>>>>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of others it will be easier for us to >>>>>>>>> come >>>>>>>>>>> to a >>>>>>>>>>>>>>>>>>> consensus. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> пт, 29 апр. 2022 г. 
в 22:33, Qingsheng >>>>>>>>> Ren >>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> renqs...@gmail.com >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Alexander and Arvid, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the discussion and sorry >>>>>>>> for >>>>>>>>> my >>>>>>>>>>>>> late >>>>>>>>>>>>>>>>>>> response! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> had >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal discussion together with Jark >>>>>>>>> and >>>>>>>>>>>>> Leonard >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> I’d >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> summarize our ideas. Instead of >>>>>>>>>> implementing >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> cache >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logic in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runtime layer or wrapping around the >>>>>>>>>>>>> user-provided >>>>>>>>>>>>>>>>>>> table >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> function, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> prefer to introduce some new APIs >>>>>>>>> extending >>>>>>>>>>>>>>>>>>> TableFunction >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> these >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> concerns: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. 
1. Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time", because it couldn't truly reflect the content of the lookup table at the moment of querying. If users choose to enable caching on the lookup table, they implicitly indicate that this breakage is acceptable in exchange for the performance. So we prefer not to provide caching on the table runtime level.
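To make the staleness concrete, here is a minimal, self-contained sketch (plain Java maps, not Flink's actual API; all names are illustrative) of why a cached lookup no longer reflects the table "as of proc_time":

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Flink API: a lookup that caches results.
// Once a key is cached, later lookups no longer see updates in the
// backing table, so the result is not the table "as of proc_time".
public class StaleLookupDemo {
    private final Map<String, String> backingTable = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();

    public void upsert(String key, String value) {
        backingTable.put(key, value);
    }

    /** Lookup through the cache: hits skip the external system entirely. */
    public String lookup(String key) {
        return cache.computeIfAbsent(key, backingTable::get);
    }

    public static String demo() {
        StaleLookupDemo t = new StaleLookupDemo();
        t.upsert("user1", "v1");
        t.lookup("user1");        // caches "v1"
        t.upsert("user1", "v2");  // the dimension table changes...
        return t.lookup("user1"); // ...but the cached "v1" is still returned
    }

    public static void main(String[] args) {
        System.out.println(demo()); // "v1", not the current "v2"
    }
}
```

By enabling caching the user accepts exactly this trade-off: the second lookup answers from memory instead of the current table content.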
2. If we make the cache implementation in the framework (whether in a runner or a wrapper around TableFunction), we have to confront a situation that allows table options in DDL to control the behavior of the framework, which has never happened previously and should be treated cautiously. Under the current design the behavior of the framework should only be specified by configurations ("table.exec.xxx"), and it's hard to apply these general configs to a specific table.
3. We have use cases where the lookup source loads and refreshes all records periodically into memory to achieve high lookup performance (like the Hive connector in the community, a pattern also widely used by our internal connectors). Wrapping the cache around the user's TableFunction works fine for LRU caches, but I think we would have to introduce a new interface for this all-caching scenario, and the design would become more complex.
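The all-caching scenario described here could be sketched as follows (a simplified stand-in for the FLIP's actual classes; the names and the `Supplier`-based full scan are assumptions for illustration): the whole table is scanned into an in-memory snapshot that serves every lookup until the next periodic reload swaps in a fresh snapshot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of an "all cache": no per-entry eviction, just
// periodic full reloads of the dimension table into memory.
public class AllCacheDemo {
    private final Supplier<Map<String, String>> scanAll; // full scan of the source
    private volatile Map<String, String> snapshot = new HashMap<>();

    public AllCacheDemo(Supplier<Map<String, String>> scanAll) {
        this.scanAll = scanAll;
    }

    /** Would be driven by a timer, e.g. every N minutes. */
    public void reload() {
        snapshot = new HashMap<>(scanAll.get());
    }

    /** Pure in-memory lookup; never touches the external system. */
    public String lookup(String key) {
        return snapshot.get(key);
    }

    public static String demo() {
        Map<String, String> source = new HashMap<>();
        source.put("k", "v1");
        AllCacheDemo cache = new AllCacheDemo(() -> source);
        cache.reload();
        source.put("k", "v2");             // invisible until the next reload
        String before = cache.lookup("k"); // still "v1"
        cache.reload();
        String after = cache.lookup("k");  // now "v2"
        return before + "," + after;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note how different the lifecycle is from an LRU wrapper: there are no misses to delegate, which is why a separate interface would be needed.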
4. Providing the cache in the framework might introduce compatibility issues for existing lookup sources: there might exist two caches with totally different strategies if the user incorrectly configures the table (one in the framework and another implemented by the lookup source).

As for the optimization mentioned by Alexander, I think filters and projections should be pushed all the way down to the table function, like what we do in the scan source, instead of to the runner with the cache. The goal of using a cache is to reduce the network I/O and the pressure on the external system, and applying these optimizations only to the cache seems not quite useful.

I made some updates to the FLIP [1] to reflect our ideas. We prefer to keep the cache implementation as a part of TableFunction, and we could provide some helper classes (CachingTableFunction, AllCachingTableFunction, CachingAsyncTableFunction) to developers and regulate the metrics of the cache. Also, I made a POC [2] for your reference.

Looking forward to your ideas!
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
[2] https://github.com/PatrickRen/flink/tree/FLIP-221

Best regards,
Qingsheng

On Tue, Apr 26, 2022 at 4:45 PM Александр Смирнов <smirale...@gmail.com> wrote:

Thanks for the response, Arvid!

I have a few comments on your message.
> but could also live with an easier solution as the first step

I think that these two ways (the one originally proposed by Qingsheng and mine) are mutually exclusive, because conceptually they follow the same goal but the implementation details are different. If we go one way, moving to the other way in the future will mean deleting existing code and once again changing the API for connectors. So I think we should reach a consensus with the community about that first and then work together on this FLIP, i.e. divide the work into tasks for the different parts of the FLIP (for example, LRU cache unification / introducing the proposed set of metrics / further work…). WDYT, Qingsheng?

> as the source will only receive the requests after filter

Actually, if filters are applied to fields of the lookup table, we first must do the requests, and only after that can we filter the responses, because lookup connectors don't have filter pushdown. So if filtering is done before caching, there will be far fewer rows in the cache.

> @Alexander unfortunately, your architecture is not shared.
> I don't know the solution to share images to be honest.

Sorry for that, I'm a bit new to these kinds of conversations :) I have no write access to the Confluence, so I made a Jira issue where I described the proposed changes in more detail: https://issues.apache.org/jira/browse/FLINK-27411.

Will be happy to get more feedback!

Best,
Smirnov Alexander

On Mon, Apr 25, 2022 at 19:49, Arvid Heise <ar...@apache.org> wrote:

Hi Qingsheng,

Thanks for driving this; the inconsistency was not satisfying for me.
I second Alexander's idea but could also live with an easier solution as the first step: instead of making caching an implementation detail of TableFunction X, rather devise a caching layer around X. So the proposal would be a CachingTableFunction that delegates to X in case of misses and otherwise manages the cache. Lifting it into the operator model as proposed would be even better, but is probably unnecessary in the first step for a lookup source (as the source will only receive the requests after the filter; applying the projection may be more interesting to save memory).

Another advantage is that all the changes of this FLIP would be limited to options, with no need for new public interfaces. Everything else remains an implementation detail of the Table runtime. That means we can easily incorporate the optimization potential that Alexander pointed out later.
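The delegation idea could look roughly like this, with a simplified single-column lookup standing in for the real TableFunction (all names here are illustrative assumptions, not Flink API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of the "caching layer around X" idea: serve hits from
// the cache and delegate to the wrapped lookup "X" only on a miss.
public class CachingLookup {
    private final Function<String, String> delegate; // "X": the real lookup
    private final Map<String, String> cache = new HashMap<>();
    public int delegateCalls = 0; // exposed only for the demo below

    public CachingLookup(Function<String, String> inner) {
        this.delegate = key -> {
            delegateCalls++;          // count trips to the external system
            return inner.apply(key);
        };
    }

    /** Cache hit: X is not called. Cache miss: delegate and remember. */
    public String eval(String key) {
        return cache.computeIfAbsent(key, delegate);
    }

    public static int demo() {
        CachingLookup f = new CachingLookup(k -> "row-for-" + k);
        f.eval("a");
        f.eval("a"); // hit: no second delegate call
        f.eval("b");
        return f.delegateCalls; // 2 external calls for 3 lookups
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The wrapper knows nothing about the connector, which is what keeps the FLIP's surface limited to options rather than new public interfaces.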
@Alexander unfortunately, your architecture is not shared. I don't know the solution to share images, to be honest.

On Fri, Apr 22, 2022 at 5:04 PM Александр Смирнов <smirale...@gmail.com> wrote:

Hi Qingsheng! My name is Alexander, I'm not a committer yet, but I'd really like to become one, and this FLIP really interested me. I have actually worked on a similar feature in my company's Flink fork, and we would like to share our thoughts on this and make the code open source.
I think there is a better alternative than introducing an abstract class for TableFunction (CachingTableFunction). As you know, TableFunction lives in the flink-table-common module, which provides only an API for working with tables – that's very convenient for importing in connectors. In turn, CachingTableFunction contains logic for runtime execution, so this class and everything connected with it should be located in another module, probably flink-table-runtime. But this would require connectors to depend on another module that contains a lot of runtime logic, which doesn't sound good.
I suggest adding a new method 'getLookupConfig' to LookupTableSource or LookupRuntimeProvider to allow connectors to only pass configurations to the planner, so that they won't depend on the runtime implementation. Based on these configs the planner will construct a lookup join operator with the corresponding runtime logic (ProcessFunctions in the flink-table-runtime module). The architecture looks like in the pinned image (the LookupConfig class there is actually your CacheConfig). The classes in flink-table-planner that would be responsible for this are CommonPhysicalLookupJoin and its inheritors.
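A rough sketch of the proposed hand-off, where every class and method name is hypothetical and chosen only to show the separation: the connector exposes pure configuration data (which could live in flink-table-common), and the planner alone maps that config to a concrete runtime operator.

```java
// Hypothetical sketch of the 'getLookupConfig' proposal; none of these
// names are actual Flink API.
public class LookupConfigDemo {

    /** Pure data, no runtime logic: safe for connectors to depend on. */
    public static final class LookupConfig {
        final boolean cacheEnabled;
        final long maxRows;

        public LookupConfig(boolean cacheEnabled, long maxRows) {
            this.cacheEnabled = cacheEnabled;
            this.maxRows = maxRows;
        }
    }

    /** What a connector would implement, e.g. on LookupTableSource. */
    public interface ConfigurableLookupSource {
        LookupConfig getLookupConfig();
    }

    /** Planner side: choose the runtime operator from the config alone. */
    public static String planRunner(ConfigurableLookupSource source) {
        LookupConfig config = source.getLookupConfig();
        return config.cacheEnabled ? "LookupJoinCachingRunner" : "LookupJoinRunner";
    }

    public static void main(String[] args) {
        // A JDBC-like connector only states "cache up to 10k rows".
        ConfigurableLookupSource jdbcLike = () -> new LookupConfig(true, 10_000);
        System.out.println(planRunner(jdbcLike)); // planner picks the caching runner
    }
}
```

The point of the design is visible in the dependency direction: the connector never references a runner class, so flink-table-runtime stays out of connector classpaths.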
The current classes for lookup join in flink-table-runtime are LookupJoinRunner, AsyncLookupJoinRunner, LookupJoinRunnerWithCalc and AsyncLookupJoinRunnerWithCalc. I suggest adding classes LookupJoinCachingRunner, LookupJoinCachingRunnerWithCalc, etc.

And here comes another, more powerful advantage of such a solution. If we have the caching logic on a lower level, we can apply some optimizations to it. LookupJoinRunnerWithCalc was named like this because it uses the 'calc' function, which mostly consists of filters and projections.
For example, when joining table A with lookup table B under the condition 'JOIN … ON A.id = B.id AND A.age = B.age + 10 WHERE B.salary > 1000', the 'calc' function will contain the filters A.age = B.age + 10 and B.salary > 1000.

If we apply this function before storing records in the cache, the size of the cache will be significantly reduced: the filters avoid storing useless records in the cache, and the projections reduce the records' size. So the initial maximum number of records in the cache can be increased by the user.

What do you think about it?
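The calc-before-cache effect above can be sketched with plain Java collections instead of Flink's row types (only the lookup-side filter B.salary > 1000 and a projection down to 'id' are shown; the probe-side filter on A.age is omitted for brevity, and all names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: run the 'calc' (filter + projection) on looked-up rows BEFORE
// caching, so non-matching rows never occupy cache space and surviving
// rows are narrowed to the needed columns.
public class CalcBeforeCacheDemo {
    private final Map<String, List<Map<String, Object>>> cache = new HashMap<>();

    /** Filter and project the looked-up rows, then cache only the survivors. */
    public void put(String key, List<Map<String, Object>> lookedUpRows) {
        List<Map<String, Object>> calced = new ArrayList<>();
        for (Map<String, Object> row : lookedUpRows) {
            if ((int) row.get("salary") > 1000) {          // filter: B.salary > 1000
                Map<String, Object> projected = new HashMap<>();
                projected.put("id", row.get("id"));        // projection: keep only id
                calced.add(projected);
            }
        }
        cache.put(key, calced);
    }

    public int cachedRowCount(String key) {
        return cache.get(key).size();
    }

    public static int demo() {
        CalcBeforeCacheDemo c = new CalcBeforeCacheDemo();
        Map<String, Object> rich = new HashMap<>();
        rich.put("id", 1);
        rich.put("salary", 2000);
        Map<String, Object> poor = new HashMap<>();
        poor.put("id", 2);
        poor.put("salary", 500);
        c.put("key", List.of(rich, poor));
        return c.cachedRowCount("key"); // only the salary > 1000 row survives
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Of the two looked-up rows, only one survives the filter, and it is stored without the salary column: both the row count and the per-row footprint in the cache shrink.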
On 2022/04/19 02:47:11 Qingsheng Ren wrote:

Hi devs,

Yuan and I would like to start a discussion about FLIP-221 [1], which introduces an abstraction of the lookup table cache and its standard metrics.

Currently each lookup table source has to implement its own cache to store lookup results, and there isn't a standard set of metrics for users and developers to tune their jobs with lookup joins, which is a quite common use case in Flink Table / SQL.

Therefore we propose some new APIs including the cache, metrics, wrapper classes of TableFunction and new table options. Please take a look at the FLIP page [1] to get more details. Any suggestions and comments would be appreciated!

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric

Best regards,
Qingsheng

--
Best Regards,
Qingsheng Ren
Real-time Computing Team
Alibaba Cloud
Email: renqs...@gmail.com

--
Best regards,
Roman Boyko
e.: ro.v.bo...@gmail.com