Cole Greer created TINKERPOP-3071: ------------------------------------- Summary: barrier() reliant on undefined HashSet behavior Key: TINKERPOP-3071 URL: https://issues.apache.org/jira/browse/TINKERPOP-3071 Project: TinkerPop Issue Type: Improvement Components: process Affects Versions: 3.7.2, 3.6.7 Reporter: Cole Greer
Certain traversals can force the barrier step into relying on undefined behavior in Java. The barrier step operates by storing all incoming traversers into a HashSet, and only once the input is exhausted, will it release traversers to the next step. The contract of HashSet requires that users do not modify objects in any way which will change their hash code while it is stored in the HashSet. It is undefined how a HashSet will behave if the hash codes of a contained object changes. Some very specific types of traversals can force barrier() into such behavior. Consider the following traversal: {code:java} g.V().as("label").aggregate(local,"x").select("x").barrier().select("label") {code} The aggregate step will attach a BulkSet to each traverser, as "x". It is key here that each traverser is given a reference to the same BulkSet. The execution of this traverser will go as follows: 1. The first traverser moves through the traversal, is given the BulkSet (currently size 1) as "x" 2. The first traverser is stored in the barrier. At this time, hashcode(Traverser1) = hashcode(BulkSetSize1). 3. The second traverser moves through the traversal, as it passes through the aggregate step, it is added to the BulkSet, the BulkSet now has a size of 2. 4. The hashcode of Traverser1 has now changed. Now, hashcode(Traverser1) = hashcode(BulkSetSize2). What happens next will depend on the version of Java being used. Currently in Java 8, the HashSet can be iterated and Traverser1 can be read, but it can never be deleted as any attempt to find Traverser1 in the set by it's current hash code will fail. The result of this is that the iterator will get stuck on the first element. Repeated calls to .next() on the HashSet iterator will always return Traverser1. This causes the traversal to get stuck in a loop until it runs out of resources. In Java 9 and up, HashSet has been [patched|https://bugs.openjdk.org/browse/JDK-8170733] such that elements with changed hash codes can be deleted, which allows the above traversal to run as expected. We are still using HashSet wrong, but we aren't getting punished for it in Java 9+. There isn't a clear obvious solution for such traversals. The fix with the lowest impact that I'm currently aware of would be for barrier step to "box" each incoming traverser in some wrapper which computes the hash code of the internal object once, and stores it. Successive calls to hashcode() on the box, will always return the cached value regardless of if the internal object has been modified. This would prevent the hashcode from changing (from the perspective of the HashSet) while the traversers are being stored in the barrier. -- This message was sent by Atlassian Jira (v8.20.10#820010)