Cole Greer created TINKERPOP-3071:
-------------------------------------

             Summary: barrier() reliant on undefined HashSet behavior
                 Key: TINKERPOP-3071
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3071
             Project: TinkerPop
          Issue Type: Improvement
          Components: process
    Affects Versions: 3.7.2, 3.6.7
            Reporter: Cole Greer


Certain traversals can force the barrier step into relying on undefined 
behavior in Java. The barrier step operates by storing all incoming traversers 
into a HashSet, and only once the input is exhausted, will it release 
traversers to the next step. The contract of HashSet requires that users do not 
modify objects in any way which will change their hash code while it is stored 
in the HashSet. It is undefined how a HashSet will behave if the hash codes of 
a contained object changes.

Some very specific types of traversals can force barrier() into such behavior. 
Consider the following traversal:

{code:java}
g.V().as("label").aggregate(local,"x").select("x").barrier().select("label")
{code}

The aggregate step will attach a BulkSet to each traverser, as "x". It is key 
here that each traverser is given a reference to the same BulkSet. The 
execution of this traverser will go as follows:

1. The first traverser moves through the traversal, is given the BulkSet 
(currently size 1) as "x"
2. The first traverser is stored in the barrier. At this time, 
hashcode(Traverser1) = hashcode(BulkSetSize1).
3. The second traverser moves through the traversal, as it passes through the 
aggregate step, it is added to the BulkSet, the BulkSet now has a size of 2.
4. The hashcode of Traverser1 has now changed. Now, hashcode(Traverser1) = 
hashcode(BulkSetSize2).

What happens next will depend on the version of Java being used. 

Currently in Java 8, the HashSet can be iterated and Traverser1 can be read, 
but it can never be deleted as any attempt to find Traverser1 in the set by 
it's current hash code will fail. The result of this is that the iterator will 
get stuck on the first element. Repeated calls to .next() on the HashSet 
iterator will always return Traverser1. This causes the traversal to get stuck 
in a loop until it runs out of resources.

In Java 9 and up, HashSet has been 
[patched|https://bugs.openjdk.org/browse/JDK-8170733] such that elements with 
changed hash codes can be deleted, which allows the above traversal to run as 
expected. We are still using HashSet wrong, but we aren't getting punished for 
it in Java 9+.

There isn't a clear obvious solution for such traversals. The fix with the 
lowest impact that I'm currently aware of would be for barrier step to "box" 
each incoming traverser in some wrapper which computes the hash code of the 
internal object once, and stores it. Successive calls to hashcode() on the box, 
will always return the cached value regardless of if the internal object has 
been modified. This would prevent the hashcode from changing (from the 
perspective of the HashSet) while the traversers are being stored in the 
barrier.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to