This is indeed quite a small change. The PR is up at https://github.com/apache/hbase/pull/5955
On Wed, May 29, 2024 at 10:07 AM Istvan Toth <st...@cloudera.com> wrote: > Thanks for the detailed reply, Andrew. > > I was also considering default methods, but it turns out that Filter is > not an interface, but an abstract class, so it doesn't apply. > > Children not implementing a marker interface or marker method would > inherit the marker method implementation from the closest parent the same > way they would inherit the marker interface, so I think they are equivalent > in this aspect, too. > > I think that marker interface(s) and overridable non-abstract getter(s) in > Filter are mostly equivalent from both logical and source compatibility > aspects. > The only difference is that the marker interfaces cannot be removed in a > subclass, while the getter can be overridden anywhere, but with well-chosen > defaults it shouldn't be much of a limitation. > > Now that I think about it, we could cache the markers' values in an array > when creating the filter lists, so even the cost of looking them up doesn't > matter as it wouldn't happen in the hot code path. > > Using the marker interfaces is more elegant, and discourages problematic > subclassing, so I am leaning towards that. > > Istvan > > On Wed, May 29, 2024 at 2:30 AM Andrew Purtell <apurt...@apache.org> > wrote: > >> Actually source compatibility with default methods would be fine too. I >> forget this is the main reason default methods were invented. The code of >> derived classes would not need to be changed, unless the returned value of >> the new method should be changed, and this is no worse than having a >> marker >> interface, which would also require code changes to implement non-default >> behaviors. >> >> A marker interface does remain as an option. It might make a difference in >> chained use cases. Consider a chain of filter instances that mixes derived >> code that is unaware of isHinting() and base code that is. The filter >> chain >> can be examined for the presence or absence of the marker interface and >> would not need to rely on every filter in the chain passing return values >> of isHinting back. >> >> Marker interfaces can also be added to denote stateful or stateless >> filters, if distinguishing between them would be useful, perhaps down the >> road. >> >> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell <apurt...@apache.org> >> wrote: >> >> > I think you've clearly put a lot of time into the analysis and it is >> > plausible. >> > >> > Adding isHinting as a default method will preserve binary compatibility. >> > Source compatibility for derived custom filters would be broken though >> and >> > that probably prevents this going back into a releasing code line. >> > >> > Have you considered adding a marker interface instead? That would >> preserve >> > both source and binary compatibility. It wouldn't require any changes to >> > derived custom filters. A runtime instanceof test would determine if the >> > filter is a hinting filter or not. No need for a new method, default or >> > otherwise. >> > >> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth <st...@apache.org> wrote: >> > >> >> I have recently opened HBASE-28622 >> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned >> >> out >> >> to be another aspect of the problem discussed in HBASE-20565 >> >> <https://issues.apache.org/jira/browse/HBASE-20565> . >> >> >> >> The problem is discussed in detail in HBASE-20565 >> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils >> down >> >> to >> >> the API design decision that the filters returning SEEK_NEXT_USING_HINT >> >> rely on filterCell() getting called. >> >> >> >> On the other hand, some filters maintain an internal row state that >> sets >> >> counters for calls of filterCell(), which interacts with the results of >> >> previous filters in a filterList. >> >> >> >> When filters return different results for filterRowkey(), then filters >> >> returning SEEK_NEXT_USING_HINT that have returned false must have >> >> filterCell() called, otherwise the scan will degenerate into a full >> scan. >> >> >> >> On the other hand, filters that maintain an internal row state must >> only >> >> be >> >> called if all previous filters have INCLUDEed the Cell, otherwise their >> >> internal state will be off. (This still has caveats, as described in >> >> HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>) >> >> >> >> In my opinion, the current code from HBASE-20565 >> >> <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad >> balance >> >> between features, as while it fixes some use cases for row stateful >> >> filters, it also often negates the performance benefits of the filters >> >> providing hints, which in practice makes them unusable in many filter >> list >> >> combinations. >> >> >> >> Without completely re-designing the filter system, I think that the >> best >> >> solution would be adding a method to distinguish the filters that can >> >> return hints from the rest of them. (This was also suggested in >> >> HBASE-20565 >> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it was not >> >> implemented) >> >> >> >> In theory, we have four combinations of hinting and row stateful >> filters, >> >> but currently we have no filters that are both hinting and row >> stateful, >> >> and I don't think that there is valid use case for those. The ones that >> >> are >> >> neither hinting nor stateful could be handled as either, but treating >> them >> >> as non-hinting seems faster. >> >> >> >> Once we have that, we can improve the filterList behaviour a lot: >> >> - in filterRowKey(), if any hinting filter returns false, then we could >> >> return false >> >> - in filterCell(), rather than returning on the first non-include >> result, >> >> we could process the remaining hinting filters, while skipping the >> >> non-hinting ones. >> >> >> >> The code changes are minimal, we just need to add a new method like >> >> isHinting() to the Filter class, and change the above two methods. >> >> >> >> We could add this even in 2.5, by defaulting isHinting() to return >> false >> >> in >> >> the Filter class, which would preserve the current API and behaviour >> for >> >> existing custom filters. >> >> >> >> I was looking at it from the AND filter perspective, but if needed, >> >> similar >> >> changes could be made to the OR filter. >> >> >> >> What do you think ? >> >> Is this a good idea ? >> >> >> >> Istvan >> >> >> > >> -- >> Best regards, >> Andrew >> >> Unrest, ignorance distilled, nihilistic imbeciles - >> It's what we’ve earned >> Welcome, apocalypse, what’s taken you so long? >> Bring us the fitting end that we’ve been counting on >> - A23, Welcome, Apocalypse >> > > > -- > *István Tóth* | Sr. Staff Software Engineer > *Email*: st...@cloudera.com > cloudera.com <https://www.cloudera.com> > [image: Cloudera] <https://www.cloudera.com/> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: > Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> > ------------------------------ > ------------------------------ > -- *István Tóth* | Sr. Staff Software Engineer *Email*: st...@cloudera.com cloudera.com <https://www.cloudera.com> [image: Cloudera] <https://www.cloudera.com/> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera> ------------------------------ ------------------------------