[jira] [Commented] (TINKERPOP-2917) Please make specification of "valueMap()" step more index-friendly

Stephen Mallette (Jira) Thu, 15 Jun 2023 12:16:04 -0700


    [ https://issues.apache.org/jira/browse/TINKERPOP-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733205#comment-17733205 ]


Stephen Mallette commented on TINKERPOP-2917:
---------------------------------------------

[~martin.haeusler] Did you get a chance to look at my comment above? Does that 
solve your problem?

> Please make specification of "valueMap()" step more index-friendly
> ------------------------------------------------------------------
>
>                 Key: TINKERPOP-2917
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2917
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: language
>    Affects Versions: 3.6.2
>            Reporter: Martin Häusler
>            Priority: Major
>
> The {{valueMap(...)}} step states that it returns the property values found 
> in the graph element. There's nothing inherently wrong with this definition; 
> however, it doesn't lend itself well to index-supported execution. For example:
> {code:java}
> traversal().V().has("type", "person").valueMap("firstName", "lastName")
> {code}
> This traversal may be executed on very large graphs entirely without loading 
> a single element from disk, provided that {{type}}, {{firstName}} and 
> {{lastName}} are indexed. That would be the ideal case, as it is roughly 
> equivalent to a (commonly used) SQL query like this:
> {code:sql}
> SELECT firstName, lastName
> FROM person
> {code}
> Problems arise when the {{valueMap}} step is to be answered from the 
> secondary index:
> 1. Most secondary indices cannot distinguish between a list of "John", a set 
> of "John", or the raw string value "John" (each entry of a multi-valued 
> property is indexed separately).
> 2. Most secondary indices cannot distinguish between {{[ "John", "John" ]}} 
> and {{["John"]}} (same values are written to the index only once per vertex).
> Issue 1 could be overcome if we had a schema on the graph, but for a 
> schemaless graph there's no way to tell whether the index result should be a 
> list, a set or a raw string. Issue 2 would remain unsolved even with a schema 
> (we would know that the property is a List, but we would have no way to tell 
> how often a given value appears in the list).
> So, how can we answer {{valueMap()}} with index support? In ChronoGraph, I've 
> introduced a setting {{USE_SECONDARY_INDEX_FOR_VALUE_MAP_STEP}}. If this 
> setting is active, we deviate slightly from the Gremlin {{valueMap()}} 
> definition:
>  - Each value will always be reported as a {{Set}}, even if the original 
> vertex property was a single value or a List (i.e. only distinct values will 
> be reported, in arbitrary order).
>  - If the value was {{null}} or the property didn't exist on the vertex at 
> all, an empty set is reported.
> This isn't harmful (as far as I can tell), but it is a deviation from the 
> standard behavior. In exchange, we get the possibility to scan our secondary 
> indices to answer {{valueMap()}} steps, which is a significant performance 
> advantage.
>  
> I would very much like to see an adaptation of the standard (for example, the 
> way I've described above) to make it more friendly towards index usage.
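
For readers following the thread, the set-based semantics proposed in the quoted description can be illustrated with a minimal sketch. It is a plain in-memory stand-in for a secondary index, not ChronoGraph or TinkerPop code; the class name SecondaryIndexSketch and its put/valueMap methods are hypothetical and exist only for this illustration. It assumes the index stores each entry of a multi-valued property separately and keeps equal values only once per vertex, as the description states.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a secondary index (not a real ChronoGraph/TinkerPop API).
final class SecondaryIndexSketch {

    // property key -> vertex id -> distinct indexed values
    private final Map<String, Map<String, Set<Object>>> index = new HashMap<>();

    // Index a property value: each entry of a multi-valued property is stored
    // separately, and equal values are stored only once per vertex.
    void put(String vertexId, String key, Object value) {
        Set<Object> bucket = index
                .computeIfAbsent(key, k -> new HashMap<>())
                .computeIfAbsent(vertexId, v -> new LinkedHashSet<>());
        if (value instanceof Collection) {
            for (Object entry : (Collection<?>) value) {
                if (entry != null) {
                    bucket.add(entry);
                }
            }
        } else if (value != null) {
            bucket.add(value);
        }
    }

    // Set-based valueMap answered from the index alone: every requested key
    // maps to a Set, and a missing or null property yields an empty set.
    Map<String, Set<Object>> valueMap(String vertexId, String... keys) {
        Map<String, Set<Object>> result = new LinkedHashMap<>();
        for (String key : keys) {
            Map<String, Set<Object>> byVertex = index.get(key);
            Set<Object> values = (byVertex == null) ? null : byVertex.get(vertexId);
            Set<Object> copy = (values == null)
                    ? new LinkedHashSet<>()
                    : new LinkedHashSet<>(values);
            result.put(key, Collections.unmodifiableSet(copy));
        }
        return result;
    }
}
```

In this sketch, a vertex indexed with the raw string "John", one indexed with a list of "John" and "John", and one indexed with a set containing "John" all produce the same index content, which is why the result can only be reported as a set of distinct values.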



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
