Added duplicate vertex recipe

Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/d7ecfc05
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/d7ecfc05
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/d7ecfc05

Branch: refs/heads/TINKERPOP-1602
Commit: d7ecfc05f9c476fed2ebf987e6b26bee14f5db54
Parents: 3e54d89
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Tue Jan 10 10:19:59 2017 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Tue Jan 10 10:19:59 2017 -0500

----------------------------------------------------------------------
 docs/src/recipes/duplicate-vertex.asciidoc | 52 +++++++++++++++++++++++++
 docs/src/recipes/index.asciidoc            |  2 +
 2 files changed, 54 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/d7ecfc05/docs/src/recipes/duplicate-vertex.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/duplicate-vertex.asciidoc b/docs/src/recipes/duplicate-vertex.asciidoc
new file mode 100644
index 0000000..e0327f4
--- /dev/null
+++ b/docs/src/recipes/duplicate-vertex.asciidoc
@@ -0,0 +1,52 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+[[duplicate-vertex]]
+Duplicate Vertex Detection
+--------------------------
+
+The pattern for finding duplicate vertices is quite similar to the pattern defined in the <<duplicate-edge,Duplicate Edge>>
+section. The idea is to extract the relevant features of the vertex into a comparable list that can then be used to
+group for duplicates.
+
+Consider the following example with some duplicate vertices added to the "modern" graph:
+
+[gremlin-groovy,modern]
+----
+g.addV(label, 'person', 'name', 'vadas', 'age', 27)
+g.addV(label, 'person', 'name', 'vadas', 'age', 22) // not a duplicate because the "age" value differs
+g.addV(label, 'person', 'name', 'marko', 'age', 29)
+g.V().hasLabel("person").
+      group().
+        by(values("name", "age").fold()).
+      unfold()
+----
+
+In the above case, the "name" and "age" properties are the relevant features for identifying duplication. The key in
+the `Map` provided by the `group` is the list of features for comparison and the value is the list of vertices that
+match those features. To extract only the entries that contain duplicates, an additional filter can be added:
+
+[gremlin-groovy,existing]
+----
+g.V().hasLabel("person").
+      group().
+        by(values("name", "age").fold()).
+      unfold().
+      filter(select(values).count(local).is(gt(1)))
+----
+
+That filter extracts the values of the `Map` and counts the vertices within each list. If a list contains more than
+one vertex, then those vertices are duplicates of one another.

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/d7ecfc05/docs/src/recipes/index.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/index.asciidoc b/docs/src/recipes/index.asciidoc
index f77b929..31095c0 100644
--- a/docs/src/recipes/index.asciidoc
+++ b/docs/src/recipes/index.asciidoc
@@ -46,6 +46,8 @@ include::cycle-detection.asciidoc[]
 
 include::duplicate-edge.asciidoc[]
 
+include::duplicate-vertex.asciidoc[]
+
 include::if-then-based-grouping.asciidoc[]
 
 include::pagination.asciidoc[]
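
Note on the recipe added by this commit: the group-then-filter idea it describes can be illustrated outside of Gremlin. What follows is only a minimal plain-Python sketch of the same logic, not TinkerPop code and not part of the commit. The `people` list, its ids, and the `find_duplicates` helper are hypothetical names made up for illustration; the property values mirror the "person" vertices of the "modern" graph plus the three vertices the recipe adds.

```python
from collections import defaultdict

# Hypothetical stand-in for the "person" vertices: (id, properties) pairs.
# The ids are invented for this sketch and are not real graph ids.
people = [
    (1, {"name": "marko", "age": 29}),
    (2, {"name": "vadas", "age": 27}),
    (4, {"name": "josh", "age": 32}),
    (6, {"name": "peter", "age": 35}),
    (100, {"name": "vadas", "age": 27}),  # duplicate of vertex 2
    (101, {"name": "vadas", "age": 22}),  # not a duplicate: "age" differs
    (102, {"name": "marko", "age": 29}),  # duplicate of vertex 1
]


def find_duplicates(vertices, keys=("name", "age")):
    """Group vertex ids by their feature tuple and keep groups larger than one."""
    groups = defaultdict(list)
    for vertex_id, props in vertices:
        # Roughly analogous to by(values("name", "age").fold()):
        # the extracted feature list becomes the grouping key.
        feature = tuple(props[k] for k in keys)
        groups[feature].append(vertex_id)
    # Roughly analogous to filter(select(values).count(local).is(gt(1))).
    return {feature: ids for feature, ids in groups.items() if len(ids) > 1}


duplicates = find_duplicates(people)
print(duplicates)
```

Under these assumptions, only the `('marko', 29)` and `('vadas', 27)` groups survive the filter; the `('vadas', 22)` vertex forms a group of one and is dropped, which matches the comment in the recipe's first listing.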