This is indeed quite a small change.
The PR is up at https://github.com/apache/hbase/pull/5955

On Wed, May 29, 2024 at 10:07 AM Istvan Toth <st...@cloudera.com> wrote:

> Thanks for the detailed reply, Andrew.
>
> I was also considering default methods, but it turns out that Filter is
> not an interface, but an abstract class, so it doesn't apply.
>
> Children not implementing a marker interface or marker method would
> inherit the marker method implementation from the closest parent the same
> way they would inherit the marker interface, so I think they are equivalent
> in this aspect, too.
>
> I think that marker interface(s) and overridable non-abstract getter(s) in
> Filter are mostly equivalent from both logical and source compatibility
> aspects.
> The only difference is that the marker interfaces cannot be removed in a
> subclass, while the getter can be overridden anywhere, but with well-chosen
> defaults it shouldn't be much of a limitation.
>
> Now that I think about it, we could cache the markers' values in an array
> when creating the filter lists, so even the cost of looking them up doesn't
> matter as it wouldn't happen in the hot code path.
>
> Using the marker interfaces is more elegant, and discourages problematic
> subclassing, so I am leaning towards that.
>
> Istvan
>
> On Wed, May 29, 2024 at 2:30 AM Andrew Purtell <apurt...@apache.org>
> wrote:
>
>> Actually source compatibility with default methods would be fine too. I
>> forget this is the main reason default methods were invented. The code of
>> derived classes would not need to be changed, unless the returned value of
>> the new method should be changed, and this is no worse than having a
>> marker
>> interface, which would also require code changes to implement non-default
>> behaviors.
>>
>> A marker interface does remain as an option. It might make a difference in
>> chained use cases. Consider a chain of filter instances that mixes derived
>> code that is unaware of isHinting() and base code that is. The filter
>> chain
>> can be examined for the presence or absence of the marker interface and
>> would not need to rely on every filter in the chain passing return values
>> of isHinting back.
>>
>> Marker interfaces can also be added to denote stateful or stateless
>> filters, if distinguishing between them would be useful, perhaps down the
>> road.
>>
>> On Tue, May 28, 2024 at 5:13 PM Andrew Purtell <apurt...@apache.org>
>> wrote:
>>
>> > I think you've clearly put a lot of time into the analysis and it is
>> > plausible.
>> >
>> > Adding isHinting as a default method will preserve binary compatibility.
>> > Source compatibility for derived custom filters would be broken though
>> and
>> > that probably prevents this going back into a releasing code line.
>> >
>> > Have you considered adding a marker interface instead? That would
>> preserve
>> > both source and binary compatibility. It wouldn't require any changes to
>> > derived custom filters. A runtime instanceof test would determine if the
>> > filter is a hinting filter or not. No need for a new method, default or
>> > otherwise.
>> >
>> > On Tue, May 28, 2024 at 12:41 AM Istvan Toth <st...@apache.org> wrote:
>> >
>> >> I have recently opened HBASE-28622
>> >> <https://issues.apache.org/jira/browse/HBASE-28622> , which has turned
>> >> out
>> >> to be another aspect of the problem discussed in HBASE-20565
>> >> <https://issues.apache.org/jira/browse/HBASE-20565> .
>> >>
>> >> The problem is discussed in detail in HBASE-20565
>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it boils
>> down
>> >> to
>> >> the API design decision that the filters returning SEEK_NEXT_USING_HINT
>> >> rely on filterCell() getting called.
>> >>
>> >> On the other hand, some filters maintain an internal row state that
>> sets
>> >> counters for calls of filterCell(), which interacts with the results of
>> >> previous filters in a filterList.
>> >>
>> >> When filters return different results for filterRowkey(), then filters
>> >> returning  SEEK_NEXT_USING_HINT that have returned false must have
>> >> filterCell() called, otherwise the scan will degenerate into a full
>> scan.
>> >>
>> >> On the other hand, filters that maintain an internal row state must
>> only
>> >> be
>> >> called if all previous filters have INCLUDEed the Cell, otherwise their
>> >> internal state will be off. (This still has caveats, as described in
>> >> HBASE-20565 <https://issues.apache.org/jira/browse/HBASE-20565>)
>> >>
>> >> In my opinion, the current code from HBASE-20565
>> >> <https://issues.apache.org/jira/browse/HBASE-20565> strikes a bad
>> balance
>> >> between features, as while it fixes some use cases for row stateful
>> >> filters, it also often negates the performance benefits of the filters
>> >> providing hints, which in practice makes them unusable in many filter
>> list
>> >> combinations.
>> >>
>> >> Without completely re-designing the filter system, I think that the
>> best
>> >> solution would be adding a method to distinguish the filters that can
>> >> return hints from the rest of them. (This was also suggested in
>> >> HBASE-20565
>> >> <https://issues.apache.org/jira/browse/HBASE-20565> , but it was not
>> >> implemented)
>> >>
>> >> In theory, we have four combinations of hinting and row stateful
>> filters,
>> >> but currently we have no filters that are both hinting and row
>> stateful,
>> >> and I don't think that there is valid use case for those. The ones that
>> >> are
>> >> neither hinting nor stateful could be handled as either, but treating
>> them
>> >> as non-hinting seems faster.
>> >>
>> >> Once we have that, we can improve the filterList behaviour a lot:
>> >> - in filterRowKey(), if any hinting filter returns false, then we could
>> >> return false
>> >> - in filterCell(), rather than returning on the first non-include
>> result,
>> >> we could process the remaining hinting filters, while skipping the
>> >> non-hinting ones.
>> >>
>> >> The code changes are minimal, we just need to add a new method like
>> >> isHinting() to the Filter class, and change the above two methods.
>> >>
>> >> We could add this even in 2.5, by defaulting isHinting() to return
>> false
>> >> in
>> >> the Filter class, which would preserve the current API and behaviour
>> for
>> >> existing custom filters.
>> >>
>> >> I was looking at it from the AND filter perspective, but if needed,
>> >> similar
>> >> changes could be made to the OR filter.
>> >>
>> >> What do you think ?
>> >> Is this a good idea ?
>> >>
>> >> Istvan
>> >>
>> >
>> --
>> Best regards,
>> Andrew
>>
>> Unrest, ignorance distilled, nihilistic imbeciles -
>>     It's what we’ve earned
>> Welcome, apocalypse, what’s taken you so long?
>> Bring us the fitting end that we’ve been counting on
>>    - A23, Welcome, Apocalypse
>>
>
>
> --
> *István Tóth* | Sr. Staff Software Engineer
> *Email*: st...@cloudera.com
> cloudera.com <https://www.cloudera.com>
> [image: Cloudera] <https://www.cloudera.com/>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> Cloudera on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
> ------------------------------
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: st...@cloudera.com
cloudera.com <https://www.cloudera.com>
[image: Cloudera] <https://www.cloudera.com/>
[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------
------------------------------

Reply via email to