[ 
https://issues.apache.org/jira/browse/TINKERPOP-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Mallette updated TINKERPOP-2635:
----------------------------------------
    Description: 
A {{by()}} is deemed "unproductive" when it does not produce a value (i.e. 
where the {{hasNext()}} is {{false}}). As of 3.5.x you can get an exception or 
a {{null}} in those cases:

{code}
gremlin> g.V().group().by('age') // null behavior
==>[null:[v[3],v[5]],32:[v[4]],35:[v[6]],27:[v[2]],29:[v[1]]]
gremlin> g.V().aggregate('a').by(out()).cap('a') // exception behavior
The provided traverser does not map to a value: v[2]->[VertexStep(OUT,vertex)]
Type ':help' or ':h' for help.
Display stack trace? [yN]n
{code}

The {{by(String)}} behavior for this introduce in 3.5.x has become a bit of a 
special case around {{by()}} and as we are slowly removing exceptions for 
cleaner behavior in Gremlin, there clearly needs to be a more consistent 
approach taken here. Here are some problems with how things are now:

1. {{by(String)}} is a bit too much of a special case and is now inconsistent 
with the other forms of {{by()}} which continue to throw runtime exceptions 
(which isn’t desirable either).
2. By propagating a {{null}} from an unproductive {{by()}}, it becomes 
impossible to distinguish between a {{null}} property value (from an existing 
property) and a property that is simply missing.
3. Traversals that use {{simplePath()}} or {{cyclicPath()}} with unproductive 
{{by()}} modulators might lead to confusing results where generated {{null}} 
values create an unexpected equality. Of course, this may be considered an 
orthogonal issue, as it might also be possible to change or parameterize these 
steps to handle {{null}} differently.
4. The {{dedup()}} step will return the first by() modulator that returns 
{{null}} which might be unexpected.

To bring some consistent behavior to this situation an unproductive {{by()}} 
will simply filter results for all steps. How that filtering is applied is 
specific to the step itself but generally speaking the filtering will either:

1. Filter the traverser or otherwise,
2. Ignore the traverser for purpose of constructing a side-effect.

As this sort of filtering might make Gremlin harder to debug, TINKERPOP-2634 
will improve the ability to debug traversals, via {{DebugStrategy}}, which in 
and of itself is a well requested feature. The {{DebugStrategy}} would let 
users know about unproductive {{by()}} modulators by simply throwing an 
exception as it does in 3.4.x. It will accomplish this by converting every 
{{by()}} modulator to this pattern: {{coalesce(<modulator>, fail())}} where 
{{fail()}} would be a new step that kills the traversal by raising an exception 
(a feature that has long been asked for). In addition to {{fail()}}, there 
could also be more of a soft warning in the form of a log or trace if that is 
desired. This pattern could be implemented as a GraphTraversal or a special 
case implementation of {{Traversal}} (like, {{ValueTraversal}}), but the key 
point is that there would now be a way to be aware of unproductive {{by()}} 
filters while developing your application. 

This change in behavior does not have to be a breaking change. The introduction 
of a new {{ProductiveByStrategy}} could unify all {by()}} behavior to produce a 
{{null}}. This strategy would wrap {{by()}} modulators in {{coalesce(<by>, 
constant(null))}} and be installed by default. It could even be made 
configurable to take the keys to which it would apply leaving Gremlin in a more 
optimized state in such cases. The default behavior without this strategy would 
be changed to filter. For 3.6.0, the {{ProductiveByStrategy}} could be removed 
as a default strategy with the option for users to add it back in to maintain 
the 3.5.x functionality.


  was:
A {{by()}} is deemed "unproductive" when it does not produce a value (i.e. 
where the {{hasNext()}} is {{false}}). As of 3.5.x you can get an exception or 
a {{null}} in those cases:

{code}
gremlin> g.V().group().by('age') // null behavior
==>[null:[v[3],v[5]],32:[v[4]],35:[v[6]],27:[v[2]],29:[v[1]]]
gremlin> g.V().aggregate('a').by(out()).cap('a') // exception behavior
The provided traverser does not map to a value: v[2]->[VertexStep(OUT,vertex)]
Type ':help' or ':h' for help.
Display stack trace? [yN]n
{code}

The {{by(String)}} behavior for this introduce in 3.5.x has become a bit of a 
special case around {{by()}} and as we are slowly removing exceptions for 
cleaner behavior in Gremlin, there clearly needs to be a more consistent 
approach taken here. Here are some problems with how things are now:

1. {{by(String)}} is a bit too much of a special case and is now inconsistent 
with the other forms of {{by()}} which continue to throw runtime exceptions 
(which isn’t desirable either).
2. By propagating a {{null}} from an unproductive {{by()}}, it becomes 
impossible to distinguish between a {{null}} property value (from an existing 
property) and a property that is simply missing.
3. Traversals that use {{simplePath()}} or {{cyclicPath()}} with unproductive 
{{by()}} modulators might lead to confusing results where generated {{null}} 
values create an unexpected equality. Of course, this may be considered an 
orthogonal issue, as it might also be possible to change or parameterize these 
steps to handle {{null}} differently.
4. The {{dedup()}} step will return the first by() modulator that returns 
{{null}} which might be unexpected.

To bring some consistent behavior to this situation an unproductive {{by()}} 
will simply filter results for all steps. How that filtering is applied is 
specific to the step itself but generally speaking the filtering will either:

1. Filter the traverser or otherwise,
2. Ignore the traverser for purpose of constructing a side-effect.

As this sort of filtering might make Gremlin harder to debug, TINKERPOP-2634 
will improve the ability to debug traversals, via {{DebugStrategy}}, which in 
and of itself is a well requested feature. The {{DebugStrategy}} would let 
users know about unproductive {{by()}} modulators by simply throwing an 
exception as it does in 3.4.x. It will accomplish this by converting every 
{{by()}} modulator to this pattern: {{coalesce(<modulator>, stop())}} where 
{{stop()}} would be a new step that kills the traversal by raising an exception 
(a feature that has long been asked for). In addition to {{stop()}}, there 
could also be more of a soft warning in the form of a log or trace if that is 
desired. This pattern could be implemented as a GraphTraversal or a special 
case implementation of {{Traversal}} (like, {{ValueTraversal}}), but the key 
point is that there would now be a way to be aware of unproductive {{by()}} 
filters while developing your application. 

This change in behavior does not have to be a breaking change. The introduction 
of a new {{ProductiveByStrategy}} could unify all {by()}} behavior to produce a 
{{null}}. This strategy would wrap {{by()}} modulators in {{coalesce(<by>, 
constant(null))}} and be installed by default. It could even be made 
configurable to take the keys to which it would apply leaving Gremlin in a more 
optimized state in such cases. The default behavior without this strategy would 
be changed to filter. For 3.6.0, the {{ProductiveByStrategy}} could be removed 
as a default strategy with the option for users to add it back in to maintain 
the 3.5.x functionality.



> Consistent by() behavior
> ------------------------
>
>                 Key: TINKERPOP-2635
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2635
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.5.1
>            Reporter: Stephen Mallette
>            Priority: Major
>
> A {{by()}} is deemed "unproductive" when it does not produce a value (i.e. 
> where the {{hasNext()}} is {{false}}). As of 3.5.x you can get an exception 
> or a {{null}} in those cases:
> {code}
> gremlin> g.V().group().by('age') // null behavior
> ==>[null:[v[3],v[5]],32:[v[4]],35:[v[6]],27:[v[2]],29:[v[1]]]
> gremlin> g.V().aggregate('a').by(out()).cap('a') // exception behavior
> The provided traverser does not map to a value: v[2]->[VertexStep(OUT,vertex)]
> Type ':help' or ':h' for help.
> Display stack trace? [yN]n
> {code}
> The {{by(String)}} behavior for this introduce in 3.5.x has become a bit of a 
> special case around {{by()}} and as we are slowly removing exceptions for 
> cleaner behavior in Gremlin, there clearly needs to be a more consistent 
> approach taken here. Here are some problems with how things are now:
> 1. {{by(String)}} is a bit too much of a special case and is now inconsistent 
> with the other forms of {{by()}} which continue to throw runtime exceptions 
> (which isn’t desirable either).
> 2. By propagating a {{null}} from an unproductive {{by()}}, it becomes 
> impossible to distinguish between a {{null}} property value (from an existing 
> property) and a property that is simply missing.
> 3. Traversals that use {{simplePath()}} or {{cyclicPath()}} with unproductive 
> {{by()}} modulators might lead to confusing results where generated {{null}} 
> values create an unexpected equality. Of course, this may be considered an 
> orthogonal issue, as it might also be possible to change or parameterize 
> these steps to handle {{null}} differently.
> 4. The {{dedup()}} step will return the first by() modulator that returns 
> {{null}} which might be unexpected.
> To bring some consistent behavior to this situation an unproductive {{by()}} 
> will simply filter results for all steps. How that filtering is applied is 
> specific to the step itself but generally speaking the filtering will either:
> 1. Filter the traverser or otherwise,
> 2. Ignore the traverser for purpose of constructing a side-effect.
> As this sort of filtering might make Gremlin harder to debug, TINKERPOP-2634 
> will improve the ability to debug traversals, via {{DebugStrategy}}, which in 
> and of itself is a well requested feature. The {{DebugStrategy}} would let 
> users know about unproductive {{by()}} modulators by simply throwing an 
> exception as it does in 3.4.x. It will accomplish this by converting every 
> {{by()}} modulator to this pattern: {{coalesce(<modulator>, fail())}} where 
> {{fail()}} would be a new step that kills the traversal by raising an 
> exception (a feature that has long been asked for). In addition to 
> {{fail()}}, there could also be more of a soft warning in the form of a log 
> or trace if that is desired. This pattern could be implemented as a 
> GraphTraversal or a special case implementation of {{Traversal}} (like, 
> {{ValueTraversal}}), but the key point is that there would now be a way to be 
> aware of unproductive {{by()}} filters while developing your application. 
> This change in behavior does not have to be a breaking change. The 
> introduction of a new {{ProductiveByStrategy}} could unify all {by()}} 
> behavior to produce a {{null}}. This strategy would wrap {{by()}} modulators 
> in {{coalesce(<by>, constant(null))}} and be installed by default. It could 
> even be made configurable to take the keys to which it would apply leaving 
> Gremlin in a more optimized state in such cases. The default behavior without 
> this strategy would be changed to filter. For 3.6.0, the 
> {{ProductiveByStrategy}} could be removed as a default strategy with the 
> option for users to add it back in to maintain the 3.5.x functionality.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to