bbeaudreault commented on PR #5428: URL: https://github.com/apache/hbase/pull/5428#issuecomment-1737316350
> > Are you sure this is necessary? I agree that heavily filtered scans can cause problems. As of recently, this should be mostly mitigated by the improvements around blockBytesScanned ([HBASE-27227](https://issues.apache.org/jira/browse/HBASE-27227), [HBASE-27532](https://issues.apache.org/jira/browse/HBASE-27532), [HBASE-27558](https://issues.apache.org/jira/browse/HBASE-27558)).
> >
> > So with those jiras, a heavily filtered scan will have a high volume of blockBytesScanned. This will cause those scans to checkpoint more often due to max scan size limits, so they won't hold up RPC handlers. You can then use quotas to limit these scans, so they are slowed down rather than hard failed.
> >
> > I prefer that approach over this one because, in my experience, failing a request is an extreme response which can have consequences for users. Fixing the scan requires figuring out why it fails and pushing code to production, which is time consuming, and in the meantime the user's queries are failing. Another big problem with filtered rows is that it doesn't necessarily happen right as you deploy a new scan workload. It may work fine for a while, but over time rows get written that don't match your filters, so one day your scan just starts failing due to the limits.
> >
> > What do you think? Do you want to give the above jiras a try?
>
> @bbeaudreault hi, thank you very much for your reply. I took a quick look at the above jiras, and they are indeed very useful. But I found a scenario that does not seem to be covered: if the data that needs to be filtered only exists in the memstore, can the user's scan request be restricted? Besides, I think my implementation is relatively simple, so maybe it can also be an option for users to quickly kill heavily filtered scan requests? What do you think? cc @Apache9

Yeah, currently it doesn't handle the memstore, but we are hoping to add that in the future. Since this feature here is optional, I don't have a problem with adding it.
I just think that, in general, hard failures like this are hard to react to in production, so hopefully it is disabled by default.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org