Dylan Bethune-Waddell created TINKERPOP-1099:
------------------------------------------------

             Summary: IncrementalBulkLoader's getOrCreateEdge overwrites previously added self-loops
                 Key: TINKERPOP-1099
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1099
             Project: TinkerPop
          Issue Type: Bug
          Components: hadoop, io
    Affects Versions: 3.1.1-incubating
         Environment: Linux, CentOS 6.4
            Reporter: Dylan Bethune-Waddell


The traversal in this function assumes that at most one edge will be returned, 
and that if one is returned we should "get" rather than "create" that edge. 
However, any number of self-edges on a vertex will be picked up by this 
traversal. Further, they are allowed to have different properties than the edge 
we are about to load, so the first self-edge returned is repeatedly overwritten 
or has properties appended to it. On a fresh bulk load, only the very last 
self-edge of a given label, out of all the self-edges of that label that were 
to be added, appears in the graph.

{code:title=IncrementalBulkLoader.java (lines 51-77)}
    @Override
    public Edge getOrCreateEdge(final Edge edge, final Vertex outVertex, final Vertex inVertex,
                                final Graph graph, final GraphTraversalSource g) {
        final Edge e;
        final Traversal<Vertex, Edge> t = g.V(outVertex).outE(edge.label()).filter(__.inV().is(inVertex));
        if (t.hasNext()) {
            e = t.next();
            edge.properties().forEachRemaining(property -> {
                final Property<?> existing = e.property(property.key());
                if (!existing.isPresent() || !existing.value().equals(property.value())) {
                    e.property(property.key(), property.value());
                }
            });
        } else {
            e = createEdge(edge, outVertex, inVertex, graph, g);
        }
        return e;
    }
{code}

It would seem that the property values on the edge must also be compared before 
deciding to "get" the edge rather than create it, but if there are no properties 
on the (weird) self-edge, I have no idea what reasonable behaviour would be.
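
To make the idea concrete, here is a rough sketch in plain Java rather than 
against the TinkerPop API. SimpleEdge, getOrCreateByEndpoints and 
getOrCreateByProperties are hypothetical names invented for illustration. The 
first method mimics the current matching on label and endpoints only, the 
second shows one possible interim fix that also compares property maps. It is 
an assumption about how a fix could look, not a tested patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the loader logic. None of these types are TinkerPop
// classes; SimpleEdge and both getOrCreate methods are hypothetical.
final class SelfLoopDemo {

    // Minimal edge model: a label, two vertex ids, and a mutable property map.
    static final class SimpleEdge {
        final String label;
        final int outId;
        final int inId;
        final Map<String, Object> properties;

        SimpleEdge(String label, int outId, int inId, Map<String, Object> properties) {
            this.label = label;
            this.outId = outId;
            this.inId = inId;
            this.properties = new HashMap<>(properties); // defensive copy
        }
    }

    // Convenience helper: a single-entry property map.
    static Map<String, Object> props(String key, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    static boolean sameLabelAndEndpoints(SimpleEdge a, SimpleEdge b) {
        return a.label.equals(b.label) && a.outId == b.outId && a.inId == b.inId;
    }

    // Mimics the reported behaviour: the first edge with a matching label and
    // endpoints is taken as "the" edge and the incoming properties are merged in.
    static SimpleEdge getOrCreateByEndpoints(List<SimpleEdge> store, SimpleEdge incoming) {
        for (SimpleEdge e : store) {
            if (sameLabelAndEndpoints(e, incoming)) {
                e.properties.putAll(incoming.properties);
                return e;
            }
        }
        return create(store, incoming);
    }

    // Assumed interim fix: only "get" when the property maps are equal as well.
    static SimpleEdge getOrCreateByProperties(List<SimpleEdge> store, SimpleEdge incoming) {
        for (SimpleEdge e : store) {
            if (sameLabelAndEndpoints(e, incoming) && e.properties.equals(incoming.properties)) {
                return e;
            }
        }
        return create(store, incoming);
    }

    // Stores a copy so later changes to the incoming edge do not leak into the store.
    static SimpleEdge create(List<SimpleEdge> store, SimpleEdge incoming) {
        SimpleEdge copy = new SimpleEdge(incoming.label, incoming.outId, incoming.inId, incoming.properties);
        store.add(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<SimpleEdge> lossy = new ArrayList<>();
        List<SimpleEdge> kept = new ArrayList<>();
        // Load two self-loops on vertex 1 that differ only in their "weight" property.
        for (int weight = 1; weight <= 2; weight++) {
            SimpleEdge selfLoop = new SimpleEdge("knows", 1, 1, props("weight", weight));
            getOrCreateByEndpoints(lossy, selfLoop);
            getOrCreateByProperties(kept, selfLoop);
        }
        System.out.println("endpoint match keeps " + lossy.size() + " edge(s)");
        System.out.println("property match keeps " + kept.size() + " edge(s)");
    }
}
```

With endpoint-only matching the second self-loop overwrites the first, while 
the property-aware variant keeps both. Note that in this sketch two 
property-less self-edges compare equal and collapse into one edge, which is 
only one possible answer to the open question above.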

I may be able to submit a PR for this later today, so I'll assign myself for 
now, but feel free to put it in better hands. This seems relatively minor to 
provide a decent interim fix for, and it does overwrite user data in the graph, 
so I vote that a fix gets pushed into 3.1.1. I have flagged it as "important".

*Tested on*: 
- Linux/CentOS 6.4
- Titan 1.1 (6 nodes) and TinkerGraph 
- TinkerPop-3.1.1-SNAPSHOT
- Spark 1.5.2
- Hadoop 2.7.1 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
