Re: [PR] [hotfix][connector-v2-hbase]fix and optimize hbase source problem [seatunnel]

via GitHub Mon, 15 Jul 2024 22:34:04 -0700


Hisoka-X commented on PR #7148:
URL: https://github.com/apache/seatunnel/pull/7148#issuecomment-2230060784


   > > > Cache is often used for frequently querying hot data. In data 
synchronization read scenarios, data is usually queried only once. Is it 
effective to configure cache-related settings in this case? Looking forward to 
your reply.
   > > 
   > > 
   > > You are right. In data synchronization, the data only needs to be 
queried once, so I set the default value of `cache_blocks` to false. (The 
default for an HBase Scan is true.) I am not sure whether any user will need 
`cache_blocks = true`, so I made `cache_blocks` an optional parameter for 
special cases.
   > > ```
   > > public static final Option<Boolean> HBASE_CACHE_BLOCKS_CONFIG =
   > >         Options.key("cache_blocks")
   > >                 .booleanType()
   > >                 .defaultValue(false)
   > >                 .withDescription(
   > >                         "When it is false, data blocks are not cached. "
   > >                                 + "When it is true, data blocks are cached. "
   > >                                 + "This value should be set to false when scanning a large "
   > >                                 + "amount of data to reduce memory consumption. "
   > >                                 + "The default value is false");
   > > ```
   > 
   > Thank you for your reply. I believe setting the batch size is necessary. 
Should we keep the cache enabled? I would like to hear your suggestions. 
@Hisoka-X @hailin0
   
   OK for me. Give the choice to users.
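
To illustrate what "give the choice to users" could look like in practice, here is a minimal, self-contained sketch of resolving an optional `cache_blocks` setting that defaults to false. The class `CacheBlocksOption` and its `resolve` method are hypothetical names used only for illustration; they are not SeaTunnel code, and the real connector reads the option through its own config API.

```java
import java.util.Map;

/** Hypothetical stand-in for the connector's option handling; not SeaTunnel code. */
final class CacheBlocksOption {
    static final String KEY = "cache_blocks";
    // Default chosen in the PR discussion: one-pass sync reads gain little from block caching.
    static final boolean DEFAULT = false;

    private CacheBlocksOption() {}

    /** Returns the user's value when the key is present, otherwise the default (false). */
    static boolean resolve(Map<String, String> userConfig) {
        String raw = userConfig.get(KEY);
        return raw == null ? DEFAULT : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        // No user setting: falls back to the default.
        System.out.println(resolve(Map.of()));            // prints "false"
        // Explicit opt-in for special cases.
        System.out.println(resolve(Map.of(KEY, "true"))); // prints "true"
    }
}
```

In a real source reader, the resolved value would presumably be handed to the HBase client's `Scan.setCacheBlocks(boolean)` before the scan is opened.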


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.