Hisoka-X commented on PR #7148:
URL: https://github.com/apache/seatunnel/pull/7148#issuecomment-2230060784
> > > Cache is often used for frequently querying hot data. In data
synchronization read scenarios, data is usually queried only once. Is it
effective to configure cache-related settings in this case? Looking forward to
your reply.
> >
> >
> > > You are right. In a data synchronization job, each row only needs to be
queried once, so I set the default value of `cache_blocks` to false, even
though the default block-caching value of an HBase Scan is true. Since I am not
sure whether any user will need cache_blocks = true, I made cache_blocks an
optional parameter for special cases.
> > ```
> > public static final Option<Boolean> HBASE_CACHE_BLOCKS_CONFIG =
> >         Options.key("cache_blocks")
> >                 .booleanType()
> >                 .defaultValue(false)
> >                 .withDescription(
> >                         "When it is false, data blocks are not cached. "
> >                                 + "When it is true, data blocks are cached. "
> >                                 + "This value should be set to false when scanning a large "
> >                                 + "amount of data to reduce memory consumption. "
> >                                 + "The default value is false");
> > ```
>
> Thank you for your reply. I believe setting the batch size is necessary.
Should we keep the cache enabled? I would like to hear your suggestions.
@Hisoka-X @hailin0
OK for me. Give the choice to users.
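
To make the agreed behaviour concrete, here is a minimal, dependency-free sketch of how the option could resolve: an absent `cache_blocks` falls back to false, and an explicit user value wins. The class name, the `resolveCacheBlocks` helper, and the plain `Map` config are hypothetical stand-ins for illustration, not the connector's actual code. In the real source the resolved value would presumably be handed to HBase's `Scan#setCacheBlocks(boolean)`.

```java
import java.util.Map;

// Hypothetical sketch: models only the default/override behaviour of cache_blocks.
public class CacheBlocksDefaultSketch {
    // Option key as discussed in this PR.
    static final String CACHE_BLOCKS_KEY = "cache_blocks";
    // Connector default proposed in this PR (HBase's own Scan default is true).
    static final boolean CACHE_BLOCKS_DEFAULT = false;

    // Returns the user's setting if present, otherwise the connector default.
    static boolean resolveCacheBlocks(Map<String, String> userConfig) {
        String raw = userConfig.get(CACHE_BLOCKS_KEY);
        return raw == null ? CACHE_BLOCKS_DEFAULT : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        // No setting supplied: block caching stays off for one-pass scans.
        System.out.println(resolveCacheBlocks(Map.of()));
        // Explicit opt-in for special cases.
        System.out.println(resolveCacheBlocks(Map.of(CACHE_BLOCKS_KEY, "true")));
    }
}
```

With this shape, users who re-read the same hot rows can opt in, while ordinary one-pass sync jobs avoid filling the block cache.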