[jira] [Comment Edited] (TINKERPOP-2917) Please make specification of "valueMap()" step more index-friendly

Jira Fri, 16 Jun 2023 01:04:03 -0700


    [ 
https://issues.apache.org/jira/browse/TINKERPOP-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733386#comment-17733386
 ]


Martin Häusler edited comment on TINKERPOP-2917 at 6/16/23 8:03 AM:
--------------------------------------------------------------------

[~spmallette] sorry for the late response, busy times. It's certainly possible 
to work with {{OptionsStrategy}}, or with global settings on the graph or the 
traversal. My original comment was going more in the direction of slightly 
altering the standard definition of the {{valueMap}} step to enable index 
support by default (turning the values of the map into {{Set}}s, regardless of 
the original property type). If the index support has to be enabled with an 
option (whether that is globally on the graph, on the traversal, or on the 
individual step), the user needs to be aware of its existence (running 
{{valueMap}} with the default semantics is almost guaranteed to force linear 
iteration, unless the graph has a custom schema definition which it can 
enforce). Furthermore, there is no standardized option which works across all 
graph implementations; every vendor is currently rolling their own solution to 
the problem I've outlined in the ticket.

So... does it solve the problem? It's the way I'm currently doing it, and it 
works, but it introduces a quirk specific to my implementation in order to 
solve a problem which exists across all implementations.
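
To make the proposed default concrete, here is a minimal sketch of the "always report a {{Set}}" rule in plain Java. It is not TinkerPop or ChronoGraph code; the class {{ValueMapNormalizer}} and its method {{toSet}} are hypothetical names chosen for illustration only.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of TinkerPop or ChronoGraph.
final class ValueMapNormalizer {

    private ValueMapNormalizer() {}

    // Reports any property value as a Set: a missing or null value becomes
    // the empty set, a collection is de-duplicated, a single value is wrapped.
    static Set<Object> toSet(Object propertyValue) {
        if (propertyValue == null) {
            return Collections.emptySet();
        }
        if (propertyValue instanceof Collection) {
            return new LinkedHashSet<>((Collection<?>) propertyValue);
        }
        return Collections.singleton(propertyValue);
    }

    public static void main(String[] args) {
        // A raw value, a list with duplicates and a set all normalize
        // to the same result under this rule.
        System.out.println(toSet("John"));
        System.out.println(toSet(List.of("John", "John")));
        System.out.println(toSet(Set.of("John")));
        System.out.println(toSet(null));
    }
}
```

With this rule, the three shapes an index cannot tell apart all produce the same answer, so a result served from the index would match one served from the element itself.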



> Please make specification of "valueMap()" step more index-friendly
> ------------------------------------------------------------------
>
>                 Key: TINKERPOP-2917
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2917
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: language
>    Affects Versions: 3.6.2
>            Reporter: Martin Häusler
>            Priority: Major
>
> The {{valueMap(...)}} step states that it returns the property values found 
> in the graph element. There's nothing inherently wrong with this definition; 
> however, it doesn't lend itself well to index-supported execution. For example:
> {code:java}
> traversal().V().has("type", "person").valueMap("firstName", "lastName")
> {code}
> This traversal may be executed on very large graphs entirely without loading 
> a single element from disk, provided that {{type}}, {{firstName}} and 
> {{lastName}} are indexed. That would be the ideal case, as it is roughly 
> equivalent to a (commonly used) SQL query like this:
> {code:sql}
> SELECT firstName, lastName
> FROM person
> {code}
> The problems come in when the {{valueMap}} step should be answered from the 
> secondary index:
> 1. Most secondary indices cannot distinguish between a list of "John", a set 
> of "John", or the raw string value "John" (each entry of a multi-valued 
> property is indexed separately).
> 2. Most secondary indices cannot distinguish between {{[ "John", "John" ]}} 
> and {{["John"]}} (same values are written to the index only once per vertex).
> Issue 1 could be overcome if we had a schema on the graph, but for a 
> schemaless graph there's no way to tell if the index result should be a list, 
> a set or a raw string. Issue 2 wouldn't be solved even if we had a schema (we 
> would know that the property is a List, but we would have no way to tell how 
> often a given value appears in the list).
> So, how can we answer {{valueMap()}} with index support? In ChronoGraph, I've 
> introduced a setting {{USE_SECONDARY_INDEX_FOR_VALUE_MAP_STEP}}. If this 
> setting is active, we deviate slightly from the gremlin {{valueMap()}} 
> definition:
>  - Each value will always be reported as a {{Set}}, even if the original 
> vertex property was a single value or a List (i.e. only distinct values will 
> be reported, in arbitrary order).
>  - If the value was {{null}} or the property didn't exist on the vertex at 
> all, an empty set is reported.
> This isn't harmful (as far as I can tell), but it is a deviation from the 
> standard behavior. In exchange, we get the possibility to scan our secondary 
> indices to answer {{valueMap()}} steps, which is a great advantage in 
> performance.
>  
> I would very much like to see an adaptation of the standard (for example, the 
> way I've described above) to make it more friendly towards index usage.
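
The two index limitations and the Set-based answer described in the ticket can be sketched with a toy in-memory index in plain Java. This is only an illustration under simplifying assumptions (string values, one map per property key); {{ToySecondaryIndex}} and its methods are hypothetical and do not reflect how ChronoGraph or any TinkerPop provider implements its indices.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy secondary index for illustration only; every name here is hypothetical.
final class ToySecondaryIndex {

    // property key -> vertex id -> distinct indexed values
    private final Map<String, Map<Long, Set<String>>> entries = new HashMap<>();

    // Indexes a property value. Each element of a multi-valued property is
    // written as its own entry, and equal values are stored once per vertex,
    // so the original shape (raw value, List or Set) is not recoverable.
    void index(long vertexId, String key, Object value) {
        if (value == null) {
            return; // nothing to index for a null value
        }
        Collection<?> values = (value instanceof Collection)
                ? (Collection<?>) value
                : Collections.singletonList(value);
        Set<String> indexed = entries
                .computeIfAbsent(key, k -> new HashMap<>())
                .computeIfAbsent(vertexId, id -> new TreeSet<>());
        for (Object v : values) {
            indexed.add(String.valueOf(v));
        }
    }

    // Answers a valueMap-like request from the index alone: every requested
    // key maps to a Set, and a key without entries maps to the empty set.
    Map<String, Set<String>> valueMap(long vertexId, String... keys) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String key : keys) {
            Set<String> values = entries
                    .getOrDefault(key, Collections.emptyMap())
                    .getOrDefault(vertexId, Collections.emptySet());
            result.put(key, new TreeSet<>(values)); // defensive copy
        }
        return result;
    }
}
```

Indexing the raw string "John", the list ["John", "John"] and the set ["John"] on three different vertices leaves identical entries for each of them, which is why only a Set-valued result can be reconstructed from the index without loading the element.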



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
