Dylan Bethune-Waddell created TINKERPOP-1099:
------------------------------------------------
Summary: IncrementalBulkLoader's getOrCreateEdge overwrites
previously added self-loops
Key: TINKERPOP-1099
URL: https://issues.apache.org/jira/browse/TINKERPOP-1099
Project: TinkerPop
Issue Type: Bug
Components: hadoop, io
Affects Versions: 3.1.1-incubating
Environment: Linux, CentOS 6.4
Reporter: Dylan Bethune-Waddell
The traversal in this function assumes that at most one edge will be returned,
and that if one is returned we should "get" it rather than "create" a new edge.
However, any number of self-edges with the same label on a vertex will be
picked up by this traversal. Further, they are allowed to have different
properties than the edge we are about to load, so the first self-edge returned
is repeatedly overwritten or has properties appended to it. On a fresh bulk
load, only the very last self-edge of a given label, out of all the self-edges
of that label that were going to be added, appears in the graph.
{code:title=IncrementalBulkLoader.java (lines 51-77)}
@Override
public Edge getOrCreateEdge(final Edge edge, final Vertex outVertex, final Vertex inVertex,
                            final Graph graph, final GraphTraversalSource g) {
    final Edge e;
    final Traversal<Vertex, Edge> t =
            g.V(outVertex).outE(edge.label()).filter(__.inV().is(inVertex));
    if (t.hasNext()) {
        e = t.next();
        edge.properties().forEachRemaining(property -> {
            final Property<?> existing = e.property(property.key());
            if (!existing.isPresent() || !existing.value().equals(property.value())) {
                e.property(property.key(), property.value());
            }
        });
    } else {
        e = createEdge(edge, outVertex, inVertex, graph, g);
    }
    return e;
}
{code}
It would seem that the property values on the edge must also be compared
before choosing "get" over "create", but if there are no properties on the
(weird) self-edge, I have no idea what reasonable behaviour would be.
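To make the difference concrete, here is a minimal sketch that models the two matching rules without any TinkerPop dependency. Everything in it is an assumption for illustration: SketchEdge, SelfLoopSketch and both method names are hypothetical stand-ins, not TinkerPop classes, and the second method is only one possible reading of "compare the property values", not a proposed patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an edge; this is NOT the TinkerPop Edge interface.
final class SketchEdge {
    final String label;
    final String outId;
    final String inId;
    final Map<String, Object> properties;

    SketchEdge(String label, String outId, String inId, Map<String, Object> properties) {
        this.label = label;
        this.outId = outId;
        this.inId = inId;
        this.properties = new HashMap<>(properties); // mutable private copy
    }
}

// Hypothetical in-memory "graph" holding only edges.
final class SelfLoopSketch {
    private final List<SketchEdge> edges = new ArrayList<>();

    // Models the reported behaviour: the first edge with the same label and
    // endpoints is reused and its properties are overwritten.
    SketchEdge getOrCreateByEndpoints(SketchEdge incoming) {
        for (SketchEdge e : edges) {
            if (sameLabelAndEndpoints(e, incoming)) {
                e.properties.putAll(incoming.properties);
                return e;
            }
        }
        return create(incoming);
    }

    // Models the suggested rule: reuse an edge only if its properties are
    // also equal; otherwise create a new edge.
    SketchEdge getOrCreateByProperties(SketchEdge incoming) {
        for (SketchEdge e : edges) {
            if (sameLabelAndEndpoints(e, incoming)
                    && e.properties.equals(incoming.properties)) {
                return e;
            }
        }
        return create(incoming);
    }

    int size() {
        return edges.size();
    }

    private SketchEdge create(SketchEdge incoming) {
        SketchEdge copy = new SketchEdge(
                incoming.label, incoming.outId, incoming.inId, incoming.properties);
        edges.add(copy);
        return copy;
    }

    private static boolean sameLabelAndEndpoints(SketchEdge a, SketchEdge b) {
        return a.label.equals(b.label)
                && a.outId.equals(b.outId)
                && a.inId.equals(b.inId);
    }
}
```

Under these assumptions, loading two self-loops with the same label but different property values leaves one edge with the first rule and two edges with the second. Note that the second rule treats two property-less self-loops as the same edge, which is only one possible answer to the open question above.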
I may be able to submit a PR for this later today, so I'll assign myself for
now, but feel free to put it in better hands. A decent interim fix seems
relatively minor to provide, and the bug does overwrite user data in the graph,
so I vote that a fix gets pushed into 3.1.1. I have flagged it as "important".
*Tested on*:
- Linux/CentOS 6.4
- Titan 1.1 (6 nodes) and TinkerGraph
- TinkerPop-3.1.1-SNAPSHOT
- Spark 1.5.2
- Hadoop 2.7.1