[GitHub] [hbase] bbeaudreault commented on pull request #5428: HBASE-28107 Limit max count of rows filtered per scan request.

via GitHub Wed, 27 Sep 2023 05:38:49 -0700


bbeaudreault commented on PR #5428:
URL: https://github.com/apache/hbase/pull/5428#issuecomment-1737316350


   > > Are you sure this is necessary? I agree that heavily filtered scans can 
cause problems. As of recently, this should be mostly mitigated with the 
improvements around blockBytesScanned 
([HBASE-27227](https://issues.apache.org/jira/browse/HBASE-27227), 
[HBASE-27532](https://issues.apache.org/jira/browse/HBASE-27532), 
[HBASE-27558](https://issues.apache.org/jira/browse/HBASE-27558)).
   > > So with those jiras, a heavily filtered scan will have a high volume of 
blockBytesScanned. This will cause those scans to checkpoint more often due to 
max scan size limits, so they won't hold up RPC handlers. You can then use 
quotas to limit these scans, so they are slowed down instead of being hard 
failed.
   > > I prefer that approach over this one because in my experience failing a 
request is an extreme response which can have consequences for users. Fixing 
the scan requires figuring out why and pushing code to production, which is 
time consuming and in the meantime the user queries are failing. Another big 
problem with filtered rows is it doesn't necessarily happen right as you deploy 
a new scan workload. It may work fine for a while but over time rows get 
written that don't match your filters, so one day your scan just starts failing 
due to the limits.
   > > What do you think? Do you want to give the above jiras a try?
   > 
   > @bbeaudreault Hi, thank you very much for your reply. I took a quick look at 
the above jiras, and they are indeed very useful. But I found a scenario that 
does not seem to be covered: if the data that needs to be filtered exists only 
in the memstore, can the user's scan request still be restricted? Besides, my 
implementation is relatively simple, so maybe it could also be an option for 
users to quickly kill heavily filtered scan requests. What do you think? cc @Apache9
   
   Yeah, the blockBytesScanned approach currently doesn't handle the memstore, 
but we are hoping to add that in the future. Since this feature is optional, I 
don't have a problem with adding it. I just think that, in general, hard 
failures like this are hard to react to in production, so hopefully it is 
disabled by default.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
