Hi Andrii,

It was great to see your response. I think we are mostly in agreement here.

> It would be even better, IMHO, if the TP project added an ANTLR4 parser for 
> GQL match statements

Agreed. I've been loosely following LDBC's Open GQL project, which has produced 
an Apache 2 licensed GQL ANTLR grammar that likely offers a good starting 
point:
https://github.com/opengql/grammar

> Except for obvious query injection cases, which, in the absence of query 
> parameters, should be handled by users themselves

I mostly considered this in the remote context, in which reliance on 
gremlin-server for parameters is not an issue. I suppose there may be embedded 
use cases in which query injection is a concern; however, these seem much rarer 
than the remote case.

> another important argument for the presence of query parameters is that query 
> parsing is quite a heavy process

I definitely agree on this front.

> >I would prefer to solve that problem at the broader gremlin level, instead 
> >of isolating it to the match step.
> 
> Would you happen to have any other applications in mind?

I suppose I'm approaching this one more from the perspective that I don't see 
why these parameters need to be isolated to just the match subquery.

Parameters are already a bit overloaded and messy in TinkerPop, and I hope to 
reduce that complexity over time. As noted above, remote gremlin scripts can 
already use parameters via gremlin-server. Bytecode requests currently have 
bindings, which serve a similar purpose. Internally we also have the 
Parameterizing interface, which is more about steps supporting things like 
`with()` modulation and is not related to query parameters.

I think it's easier for users if we simply have one set of query parameters 
instead of fractured gremlin parameters and match parameters. I expect there 
are some cases where it is useful to reference the same parameter in both the 
gremlin and GQL portions of a query, although it is admittedly not a common use 
case. The following query is a somewhat contrived example where the same 
parameters are used to match 2 nodes, and then the same parameters are 
concatenated together to form an id for a new edge which is added between the 
nodes:
g.match("MATCH (src:Airport {code:srcCode}), (dest:Airport {code:destCode}) RETURN src")
    .addE("Route").to("dest")
    .property(T.id, format("%{_}-%{_}").by(constant(srcCode)).by(constant(destCode)))

There may also be cases where it is useful to have multiple match steps in a 
single traversal which reuse the same parameters.

Taking the existing remote query parameters, reworking them to support the 
embedded case as well, then making those parameters available to the new match 
step would solve the query injection and parse cache problems without 
introducing an additional form of parameters for users to handle.
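
To make the parse cache point concrete, here is a minimal Java sketch. It is 
not TinkerPop code: ParseCacheSketch, ParsedQuery, and the params map are 
hypothetical names, and the "parse" is only a counter standing in for the 
expensive step. It assumes a cache keyed by the raw query text, as described 
earlier in the thread, and uses the $name parameter syntax from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

public class ParseCacheSketch {
    /** Hypothetical stand-in for a parse result; a real provider would hold an AST. */
    public static final class ParsedQuery {
        final String text;
        ParsedQuery(String text) { this.text = text; }
    }

    private final Map<String, ParsedQuery> cache = new HashMap<>();
    private int parseCount = 0;

    /**
     * Looks up the parse result by the exact query text and parses on a miss.
     * The params map is deliberately not part of the cache key.
     */
    public ParsedQuery parse(String queryText, Map<String, Object> params) {
        return cache.computeIfAbsent(queryText, text -> {
            parseCount++; // counts how often the expensive parse would run
            return new ParsedQuery(text);
        });
    }

    public int parseCount() { return parseCount; }

    public static void main(String[] args) {
        String[] codes = {"AUS", "DFW", "SEA"};

        // Values inlined into the text: every distinct value is a new cache key.
        ParseCacheSketch inlined = new ParseCacheSketch();
        for (String code : codes) {
            inlined.parse("MATCH (a:Airport {code:'" + code + "'}) RETURN a", Map.of());
        }

        // Values passed as parameters: the query text, and so the key, never changes.
        ParseCacheSketch parameterized = new ParseCacheSketch();
        for (String code : codes) {
            parameterized.parse("MATCH (a:Airport {code:$code}) RETURN a", Map.of("code", code));
        }

        System.out.println("inlined parses: " + inlined.parseCount());
        System.out.println("parameterized parses: " + parameterized.parseCount());
    }
}
```

With three distinct airport codes, the inlined variant parses three times and 
the parameterized variant parses once. The same separation of text and values 
is what keeps user input out of the query string in the injection case.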

> > I will take some time next week to work through some example queries and 
> > get a better sense of how I feel on each option here.
> 
> Looking forward to reading your conclusions.

I still haven't quite settled on a position regarding single non-element 
returns. I'll reply on this topic soon.

Thanks again for driving these discussions. In my opinion this will be one of 
the most exciting additions to gremlin in quite some time.

Regards,
Cole

On 2025/08/23 14:00:51 Andrii Lomakin wrote:
> Good day, Cole.
> 
> Glad to exchange more ideas with you in this thread.
> 
> >I think it would make sense for TinkerPop to adopt a default language for 
> >the new match step, which is some heavily restricted form of GQL (read-only, 
> >limited to basic MATCH, WHERE, and RETURN statements). This "standard" 
> >language could then be used in the new match step without a language 
> >with-modulator. Providers would still be free to support their own languages 
> >via that modulator if they choose.
> 
> That makes sense, I agree with you.
> It would be even better, IMHO, if the TP project added an ANTLR4
> parser for GQL match statements (there is already at least one ANTLR
> spec in the public domain) that vendors can use to work on the AST
> level. We can talk about possible collaboration on this task.
> 
> > I'd be interested if you have any examples where embedded parameters 
> > present a clear advantage.
> 
> I expected that this question would be raised :-)
> But decided to move the discussion to a follow-up thread to avoid
> polluting the main proposal.
> Except for obvious query injection cases, which, in the absence of
> query parameters, should be handled by users themselves, another
> important argument for the presence of query parameters is that query
> parsing is quite a heavy process, and the consumption of 20% of CPU
> resources on query parsing is not a rare exception.
> To avoid this overhead, query parsing results (likely ASTs) are cached
> by a simple string hash code (likely the only way, as they are not
> parsed in this phase). Of course, the absence of query parameters very
> often increases the variability of queries by several orders of
> magnitude and voids caching efforts.
> 
> >I would prefer to solve that problem at the broader gremlin level, instead 
> >of isolating it to the match step.
> 
> Would you happen to have any other applications in mind?
> 
> > I will take some time next week to work through some example queries and 
> > get a better sense of how I feel on each option here.
> 
> Looking forward to reading your conclusions.
> 
> >I think that all "variables" bound in the match query should be stored 
> >such that they are later selectable.
> 
> Yeah, cool idea!
> 
> >Overall I think this would be a great change to gremlin. I look forward to 
> >keeping this discussion going and ultimately seeing the changes land in 
> >TinkerPop.
> 
> Thank you, Cole!
> Once the discussion comes to a natural conclusion, I will summarize
> all the ideas again to ensure that we are all on the same page. Then,
> we will add it to our roadmap.
> 
> On Sat, Aug 23, 2025 at 12:01 AM Cole Greer <[email protected]> wrote:
> >
> > Hi Andrii,
> >
> > Thanks for starting this discussion and putting together this proposal. I 
> > want to start by saying that overall, I'm massively in favour of the 
> > proposed overhaul of match(). This is a topic that has come up many times 
> > in the past, and taking advantage of an established declarative language 
> > like GQL always seems to be the preferred solution.
> >
> > The idea of having the language configurable via something like 
> > `.with("language", "GQL")` is quite interesting, and something I haven't seen in previous 
> > discussions. There is clear value in allowing providers to support their 
> > own preferred declarative languages here, but I also worry about the loss 
> > of query portability if TinkerPop is too hands off on the choice of 
> > declarative language. I believe the vast majority of usages here will be 
> > seeing a traversal with a simple GQL-like match pattern. I think it would 
> > make sense for TinkerPop to adopt a default language for the new match 
> > step, which is some heavily restricted form of GQL (read-only, limited to 
> > basic MATCH, WHERE, and RETURN statements). This "standard" language could 
> > then be used in the new match step without a language with-modulator. 
> > Providers would still be free to support their own languages via that 
> > modulator if they choose.
> >
> > I will take a bit more time to consider the withParameter() proposal. My 
> > initial reaction is that I prefer to tie it into the existing parameter 
> > bindings included in remote requests to gremlin-server. I would like query 
> > parameters to function in a unified manner across the entire traversal if 
> > possible, instead of a separate detached system isolated to the new match 
> > step. I understand the current limitation of only supporting parameters in 
> > remote traversals. I'm not immediately seeing the need to support 
> > parameters for embedded traversals here, I'd be interested if you have any 
> > examples where embedded parameters present a clear advantage. If we do 
> > decide there is a need for embedded parameters, I would prefer to solve 
> > that problem at the broader gremlin level, instead of isolating it to the 
> > match step.
> >
> > I totally agree that the start and mid-step behaviour of the new match step 
> > should be modeled after V() and E().
> >
> > I think the trickiest part of getting this right is the return types. The 
> > most common use case I expect is one where the RETURN clause only includes a 
> > single node or edge. In this case I completely agree with returning the 
> > element itself. I definitely want to support usages such as g.match("MATCH 
> > (n{name:'Cole'}) RETURN n").out()... My main tenet here is that results 
> > should naturally flow from the declarative match into the subsequent 
> > gremlin and be easy to consume. If multiple objects are returned, I would 
> > agree that it is necessary to return a Map<String, ?> as in g.match("MATCH 
> > (p:person)-[e:created]->(s:software) RETURN *") -> {"p": V[1], "e": E[9], 
> > "s": V[3]} ...
> >
> > I'm still on the fence for how to handle single returns of non-elements. I 
> > see the value in your recommendation to return a map of size 1, but I also 
> > see some convenience to directly returning the value (usually a single 
> > property). I will take some time next week to work through some example 
> > queries and get a better sense of how I feel on each option here.
> >
> > There is one final item which I would like to see added to the proposal. I 
> > think that all "variables" bound in the match query should be stored such 
> > that they are later selectable. Essentially I think it's important to 
> > support something like this:
> >
> > g.match("MATCH (n1{name:'Cole'})-[]->(n2) RETURN n1").where(...)...select(n2).out()...
> >
> > The ability to select other bound variables later in the traversal should 
> > greatly limit the number of times users are forced to return multiple items 
> > at once, which reduces the amount of use cases where users will be forced 
> > to break down maps in gremlin to complete their query.
> >
> > Overall I think this would be a great change to gremlin. I look forward to 
> > keeping this discussion going and ultimately seeing the changes land in 
> > TinkerPop.
> >
> > Thanks,
> > Cole
> >
> > On 2025/08/22 15:46:10 Andrii Lomakin wrote:
> > > Good day.
> > >
> > > I propose new semantics for the match step in Gremlin, which we discussed
> > > briefly in the Discord chat. The current ideas listed partially summarize
> > > ideas suggested by several discussion participants.
> > >
> > > The current semantics of the match step are complex to optimize, so users
> > > do not use this step in practice, and DB vendors do not recommend using
> > > match step in queries.
> > >
> > > Instead, what is proposed is to provide a new match step based on
> > > declarative semantics.
> > >
> > > Signature of this step is quite simple: Traversal<S, E> match(String
> > > matchQuery).
> > >
> > > Where matchQuery is a match statement written in a declarative query
> > > language supported by the provider. I will use GQL as an example below.
> > >
> > > This step will require the language as a configuration parameter provided
> > > using the with() step.
> > >
> > > So the simplest query will look like:
> > >
> > > g.match("MATCH (person:Person)-[:knows]->(friend:Person)").with("language", "GQL")
> > >
> > > match step can accept query parameters, so if we provide a query like
> > > g.match("MATCH (p:Person WHERE p.name = $personName) RETURN p.email").with("language", "GQL")
> > >
> > > we may use parameter bindings, but it will work only for interaction with
> > > Gremlin Server, so instead, I propose an additional modulator step:
> > > withParameter(String
> > > name, Object value)
> > >
> > > In that case, the final version will look like:
> > > g.match("MATCH (p:Person WHERE p.name = $personName) RETURN p.email").with("language", "GQL").withParameter("personName", "Stephen")
> > >
> > > Alongside the version of withParameter step that provides the name of the
> > > query parameter, a version with the following signature should also be
> > > provided: withParameter(int index, Object value) for query languages that
> > > support indexed parameters with/instead of named parameters.
> > >
> > > Because we already introduced one modulator step, it is reasonable to
> > > consider replacing the with() step with a more specific withQueryLanguage()
> > > modulator step that will allow us to add more expressiveness to the
> > > resulting queries.
> > >
> > > In that case, the final version will look like:
> > > g.match("MATCH (p:Person WHERE p.name = $personName) RETURN p.email").withQueryLanguage("GQL").withParameter("personName", "Stephen")
> > >
> > > As for the scope of application of this step, I recommend making it behave
> > > exactly as it is implemented for the V() and E() steps. It could be added
> > > in the middle of GraphTraversal, but the execution result will be the same
> > > pattern matching execution applied to the whole graph stored in the
> > > database (not to the item filtered/transformed by the previous steps).
> > >
> > > It also means that match step will be added to the GraphTraversalSource.
> > >
> > > As for the format of the output of the match step, I would recommend the
> > > following:
> > >
> > > 1.  If the match statement returns an Element instance, it is returned as
> > > is.
> > >
> > > 2.  Otherwise, it should return any value that is allowed to be a property
> > > value in Element.
> > >
> > > 3. I would add an optional recommendation to return either an Element or a
> > > Map<String, ?> where each key of the map is a projection of the query
> > > result, which in the case of the query
> > > g.match("MATCH (p:Person WHERE p.name = $personName) RETURN p.email").withQueryLanguage("GQL").withParameter("personName", "Stephen")
> > >
> > > will look like {"p.email": "[email protected]"}. Following this optional
> > > recommendation will, IMHO, improve user experience.
> > >
> > > This step should be restricted to executing only idempotent queries.
> > >
> > > I would also recommend adding versions of withParameter() that accept
> > > Traversal as a value of the parameters, namely:
> > > 1.  withParameter(String name, TraversalSource value)
> > >
> > > 2.  withParameter(int index, TraversalSource value)
> > >
> > >
> > >
> > > The current version of the match step should be deprecated and then removed.
> > >
> > > I want to thank Stephen Mallette, whose initial idea closely aligned with
> > > ours and who actively contributed to our discussions.
> > >
> > > I'm looking forward to your thoughts, observations, and any other feedback
> > > you may have.
> > >
> > > Best Regards,
> > > YouTrackDB development lead
> > > Andrii Lomakin
> > >
> 
