Hi Ken,

This is a great input. Thank you for the feedback. I believe you are right
and we should check each case separately.
That said, I was just worried if Graph Providers start blindly replacing
steps with optimised versions of those steps then it becomes hard to track
how it affects TinkerPop side optimization strategies and custom Steps.
For example, let's take a look at any random optimization strategy on
TinkerPop's side. For example EarlyLimitStrategy
<https://github.com/apache/tinkerpop/blob/3.6-dev/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/EarlyLimitStrategy.java>
.
This strategy specifically has specific logic regarding usage of
`RangeGlobalStep`.
I.e.  `if (step instanceof RangeGlobalStep)`.
If we check this step it's `final` (`public final class RangeGlobalStep`).
In such a case nobody could actually easily replace this strategy without
affecting `EarlyLimitStrategy` because we can't extend / overwrite
RangeGlobalStep due to being final.
Moreover `if (step instanceof RangeGlobalStep)` could actually be replaced
by `if (RangeGlobalStep.class.equals(step.getClass()))` because we know
there are no instances of `RangeGlobalStep` other than `RangeGlobalStep`
itself, but the strategy tells that any instance of `RangeGlobalStep` is
valid for optimization.

The above relation of Optimizations and Steps affects many Step classes and
many optimization strategies. Moreover it's kind of inconsistent that some
of the steps are `final` and some of them are non-final even without having
any other classes extending them in the TinkerPop codebase. For example,
PropertiesStep and PropertyMapStep are non-final, but they don't have any
classes extending them inside TinkerPop codebase. Logically Graph Providers
extend and replace those steps, but then why do we keep those steps open if
their codebase is bigger and more likely to be changed then the other
`final` steps which are more likely to stay as is? (i.e. I would expect
PropertyMap step to be changed more often then the IsStep for example, even
so the latter one is non-final).

I do agree with you that we should probably be curious of any methods /
fields which we open with `protected` and don't blindly replace `private`
fields / methods to `protected` fields / methods (as I did in the PR), but
I doubt any Step should be `final`.
Taking your input into account I think the way we could do, is:
1. Open all Step classes for extension by removing `final`. (do this in a
single PR for all steps)
2. Move all `private` / `protected` utility methods out from Step classes
into some utility classes. (do this case by case depending on the step)
3. Add public getter methods and / or public constructors to be able to
copy necessary step fields from the original step into a new extending
step. (do this case by case depending on the step)

Thus, I believe the original PR
<https://github.com/apache/tinkerpop/pull/2027> should be closed and
separate PRs should be opened. So far I opened 3 separate small PRs to make
extensions for the specific steps.
https://github.com/apache/tinkerpop/pull/2029
https://github.com/apache/tinkerpop/pull/2030
https://github.com/apache/tinkerpop/pull/2031

Would be great to know what you think about it.

Thanks,
Oleksandr


On Thu, Apr 20, 2023 at 4:42 PM Ken Hu <k...@bitquilltech.com.invalid>
wrote:

> Hi Oleksandr,
>
> Thanks for bringing this idea to the community. However, I would be very
> hesitant to include such a large change to TinkerPop. The Gremlin language
> doesn't really have a specification which means that what a Step does is
> more susceptible to change. I also feel like the current implementation of
> Steps wasn't designed to be a stable and open interface so I would consider
> them to be internal TinkerPop classes. From what I've heard, some of these
> classes have intentionally been marked as final to prevent misusage which
> is something that has happened in the past. Another concern I have is the
> increased potential to break other's changes during refactoring or bug
> fixing. It's easy to run into a situation where you make a fix for an issue
> but don't realize that you have accidentally broken someone else's
> implementation because you didn't know their code had a specific functional
> dependency on the current implementation. This scenario is much more likely
> to occur when you open up this many classes and there is no way to predict
> what others will do with it.
>
> I think we can revisit this idea in the future once we have a firmer
> specification for the Gremlin language. For now I think it is better to
> take these changes on in a case by case basis.
>
> Thanks,
> Ken
>
> On Tue, Apr 18, 2023 at 6:27 PM Oleksandr Porunov <
> alexandr.poru...@gmail.com> wrote:
>
> > Hello everyone,
> >
> > I would like to get some opinion from the community regarding relaxing
> > strictness of Step classes for better extensibility.
> > The problem I encounter while making some optimizations for JanusGraph is
> > that many steps are not quite extensible due to being marked as `final`
> or
> > simply having important fields as `private` without any `public` getters.
> > Thus, making it complicated to replace original TinkerPop Steps with
> Graph
> > Provider's optimized Steps.
> > In this discussion I would like to know community opinion on relaxing
> Step
> > classes strictness with making the following changes:
> >
> > 1. Remove `final` keyword from all Step classes, so that it would be
> > possible to extend / overwrite them. Without this change Graph Providers
> > might need to duplicate some logic during their custom Step
> implementation
> > (because they won't be able to extend an already existing step and
> instead
> > will need to re-implement it from scratch).
> > Real example:
> > We want to optimize the `ProjectStep` step to be able to query all
> included
> > traversals in parallel instead of sequentially (
> > https://github.com/JanusGraph/janusgraph/issues/3559 ). Current
> > TinkerPop's ProjectStep if `final`. Thus, if we want to replace this step
> > with the optimized version of this step we will need to create our custom
> > step which will have most of the logic duplicated with ProjectStep (most
> > likely the only difference will be in the `map` step).
> >
> > 2. Convert all methods and fields from `private` to `protected` to be
> able
> > to access them in extended Step versions. In case the extended version
> > overwrites some of the original TinkerPop functionality then the child
> > class may potentially need to access some of the parent's class fields.
> If
> > those fields are private and don't have any getters then it might be
> > complicated to retrieve that information (it will require using
> > Reflection API).
> > Having fields as `protected` will allow for extended child classes to
> > access parent's fields.
> > Another solution could be to have `protected` getters available for those
> > fields.
> > Real example:
> > We would like to optimize PropertyMap step to pre-fetch elements
> properties
> > ( https://github.com/JanusGraph/janusgraph/issues/2444 ).
> > For that we have to overwrite the `map` method and re-implement it (could
> > be considered as anti-pattern, but it's complicated to do that
> otherwise).
> > In the overwritten method we may have some duplicated logic to be
> > re-implemented and for that we will need to access parent fields /
> methods.
> > This was fixed for PropertyMap here (
> > https://github.com/apache/tinkerpop/pull/2022 ), but there are many
> other
> > Steps which have private properties without getters.
> >
> > 3. (optional) Add public getters for fields which may be required to be
> > copied from an original Step into an extension Step.
> > Real example:
> > Again about `PropertyMapStep`. Even though we changed the fields to
> > `protected` (which now allows us to set and read those fields from the
> > child class), we can't easily replace the original Step with the extended
> > version of this Step because we will need to provide all the information
> we
> > have from the original Step into the new Step. Without having public
> > getters available for those fields we won't be able to retrieve data for
> > protected fields (unless we place the optimization class to the same
> > package).
> > I.e. If we implement a replacement Step for PropertyMap step we will need
> > to access `tokens` and `traversalRing` from the original step, but we
> won't
> > be able to do that because currently PropertyMap doesn't have public
> > getters for those fields. Thus, it will require us to place a new
> > implementation class to the same package, so that we could access
> protected
> > fields.
> >
> > Overall, I think relaxing Step classes' strictness can benefit Graph
> > Providers as they will be able to overwrite functionality they need. That
> > said, due to small optimization strategies experience, I might miss some
> > development patterns regarding developing optimizations. Potentially, it
> > could be that replacing original Steps is a bad idea but it's hard for me
> > to find good optimization solutions otherwise.
> >
> > These changes don't look to me as breaking changes because no code
> changes
> > will be required from current Graph Provides sides, but maybe I missed
> > something.
> >
> > As for now, the JanusGraph community has plans to optimize usage of:
> > PropertyMap, ElementMap, Coalesce, Has, Project steps. Thus it drived
> this
> > thinking and discussion.
> >
> > Ticket regarding relaxing step classes strictness:
> > https://issues.apache.org/jira/browse/TINKERPOP-2927
> > PR to relax step classes strictness:
> > https://github.com/apache/tinkerpop/pull/2027
> >
> > Any feedback is appreciated.
> >
> > Best regards,
> > Oleksandr
> >
>

Reply via email to