[ https://issues.apache.org/jira/browse/TINKERPOP-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733205#comment-17733205 ]
Stephen Mallette commented on TINKERPOP-2917:
---------------------------------------------

[~martin.haeusler] did you get a chance to look at my comment above? does that solve your problem?

> Please make specification of "valueMap()" step more index-friendly
> ------------------------------------------------------------------
>
>                 Key: TINKERPOP-2917
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2917
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: language
>    Affects Versions: 3.6.2
>            Reporter: Martin Häusler
>            Priority: Major
>
> The {{valueMap(...)}} step states that it returns the property values found
> in the graph element. There's nothing inherently wrong with this definition,
> however it doesn't lend itself well to index-supported execution. For example:
> {code:java}
> traversal().V().has("type", "person").valueMap("firstName", "lastName")
> {code}
> This traversal may be executed on very large graphs entirely without loading
> a single element from disk, provided that {{type}}, {{firstName}} and
> {{lastName}} are indexed. That would be the ideal case, as it is roughly
> equivalent to a (commonly used) SQL query like this:
> {code:sql}
> SELECT firstName, lastName
> FROM person
> {code}
> The problems come in when the {{valueMap}} step should be answered from the
> secondary index:
> 1. Most secondary indices cannot distinguish between a list of "John", a set
> of "John", or the raw string value "John" (each entry of a multi-valued
> property is indexed separately).
> 2. Most secondary indices cannot distinguish between {{[ "John", "John" ]}}
> and {{["John"]}} (same values are written to the index only once per vertex).
> Issue 1 could be overcome if we had a schema on the graph, but for a
> schemaless graph there's no way to tell if the index result should be a list,
> a set or a raw string. Issue 2 isn't even solved if we had a schema (we would
> know that the property is a List, but we would have no way to tell how often
> a given value appears in the list).
> So, how can we answer {{valueMap()}} with index support? In ChronoGraph, I've
> introduced a setting {{USE_SECONDARY_INDEX_FOR_VALUE_MAP_STEP}}. If this
> setting is active, we deviate slightly from the gremlin {{valueMap()}}
> definition:
> - Each value will always be reported as a {{Set}}, even if the original
> vertex property was a single value or a List (i.e. only distinct values will
> be reported, in arbitrary order).
> - If the value was {{null}} or the property didn't exist on the vertex at
> all, an empty set is reported.
> This isn't harmful (as far as I can tell), but it is a difference to the
> standard behavior. In exchange, we get the possibility to scan our secondary
> indices to answer {{valueMap()}} steps, which is a great advantage in
> performance.
>
> I would very much like to see an adaptation of the standard (for example, the
> way I've described above) to make it more friendly towards index usage.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
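
For illustration only, the index-backed {{valueMap()}} semantics proposed in the issue (always a Set, empty when the property is missing or null) might be sketched roughly as below. This is not ChronoGraph or TinkerPop code: the class {{ValueMapIndexSketch}} and its methods {{indexProperty}} and {{valueMapFromIndex}} are hypothetical names, and the "secondary index" is assumed to be a plain in-memory map that stores each distinct value once per vertex.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of any real graph library.
class ValueMapIndexSketch {

    // Assumed index layout: property key -> vertex id -> distinct indexed values.
    private final Map<String, Map<Long, Set<Object>>> index = new HashMap<>();

    // Writes a property into the index. Each entry of a multi-valued property
    // is indexed separately, and equal values are stored only once per vertex,
    // so the original cardinality (single value, List, Set) is lost.
    void indexProperty(long vertexId, String key, Object value) {
        if (value == null) {
            return; // null values leave no index entry
        }
        Set<Object> entries = index
                .computeIfAbsent(key, k -> new HashMap<>())
                .computeIfAbsent(vertexId, id -> new LinkedHashSet<>());
        if (value instanceof Collection) {
            entries.addAll((Collection<?>) value);
        } else {
            entries.add(value);
        }
    }

    // Answers a valueMap-like request from the index alone: every requested
    // key maps to a Set, which is empty if nothing was indexed for the vertex.
    Map<String, Set<Object>> valueMapFromIndex(long vertexId, String... keys) {
        Map<String, Set<Object>> result = new LinkedHashMap<>();
        for (String key : keys) {
            Map<Long, Set<Object>> byVertex = index.get(key);
            Set<Object> values = (byVertex == null) ? null : byVertex.get(vertexId);
            Set<Object> copy = new LinkedHashSet<>();
            if (values != null) {
                copy.addAll(values);
            }
            result.put(key, copy);
        }
        return result;
    }

    public static void main(String[] args) {
        ValueMapIndexSketch idx = new ValueMapIndexSketch();
        idx.indexProperty(1L, "firstName", "John");
        idx.indexProperty(2L, "firstName", Arrays.asList("John", "John"));
        // Both vertices yield the same index-backed result for firstName.
        System.out.println(idx.valueMapFromIndex(1L, "firstName", "lastName"));
        System.out.println(idx.valueMapFromIndex(2L, "firstName", "lastName"));
    }
}
```

In this sketch a raw string "John" and a list containing "John" twice become indistinguishable once indexed, which is exactly why the proposal reports every value as a Set rather than trying to reconstruct the original cardinality.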