Re: Where is the "Partitioned All Cache" doc?

2022-03-28 Thread Qingsheng Ren
Hi, 

The optimization you mentioned applies only to the product provided by 
Alibaba Cloud. Open-source Apache Flink has no unified caching 
abstraction for lookup tables; each connector has its own cache 
implementation. For example, JDBC uses a Guava cache and FileSystem uses an 
in-memory HashMap, and neither of them loads all records of the dim table 
into the cache. 
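
As a rough sketch of what a per-connector cache looks like in practice, a
JDBC lookup table could be declared roughly as below. The table name, columns,
and URL are hypothetical placeholders. The `lookup.cache.max-rows` and
`lookup.cache.ttl` options are, to my knowledge, the JDBC connector's cache
options in Flink releases around 1.14; option names may differ in other
versions, so please check the connector docs for your release.

```sql
-- Hypothetical dimension table; names and URL are placeholders.
CREATE TABLE dim_customers (
  id BIGINT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/mydb',
  'table-name' = 'customers',
  -- Cache at most 10000 looked-up rows; rows are cached on demand
  -- as they are looked up, not loaded all at once.
  'lookup.cache.max-rows' = '10000',
  -- Expire cached rows 10 minutes after they were cached.
  'lookup.cache.ttl' = '10min'
);
```

This is a bounded, on-demand cache rather than an equivalent of cache = 'ALL'.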

Best, 

Qingsheng


> On Mar 28, 2022, at 12:26, dz902 wrote:
> 
> Hi,
> 
> I've read some docs
> (https://help.aliyun.com/document_detail/182011.html) describing Flink
> optimization techniques that use:
> 
> - partitionedJoin = 'true'
> - cache = 'ALL'
> - blink.partialAgg.enabled=true
> 
> However I could not find any official doc references. Are these
> supported at all?
> 
> Also, "partitionedJoin" seems to shuffle the input by join key so that
> it can fit into memory. I read this
> (https://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html)
> and believe this is already the default behavior of Flink.
> 
> Is this optimization not needed even for huge input tables?
> 
> Thanks,
> Dai



Re: Where is the "Partitioned All Cache" doc?

2022-03-28 Thread dz902
This is interesting. Thanks for the clarification!
