[ 
https://issues.apache.org/jira/browse/TINKERPOP-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stephen mallette updated TINKERPOP-2081:
----------------------------------------
    Component/s: hadoop

> PersistedOutputRDD materialises rdd lazily with Spark 2.x
> ---------------------------------------------------------
>
>                 Key: TINKERPOP-2081
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2081
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop
>    Affects Versions: 3.3.4
>            Reporter: Artem Aliev
>            Priority: Major
>
> PersistedOutputRDD does not actually persist the RDD in Spark memory; it only 
> marks it for lazy caching in the future. It looks like caching was eager in 
> Spark 1.6, but in Spark 2.0 it is lazy.
> Lazy caching looks wrong for this case: the source graph could be changed 
> after the snapshot is created, and the snapshot should not be affected by 
> those changes.
> The fix itself is simple: PersistedOutputRDD should call any Spark action to 
> trigger eager caching, for example count().
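
The difference the report describes can be illustrated with a small toy model. This is a sketch in plain Python, not Spark or TinkerPop code: `LazyDataset` and `snapshot` are hypothetical names invented for the illustration, and a Python list stands in for the source graph. The model assumes, as the report does, that `persist()` only records the intent to cache and that the data is materialised by the first action such as `count()`.

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Hypothetical stand-in for an RDD: evaluated on demand, cached once computed."""

    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute
        self._persist_requested = False
        self._cached: Optional[List[int]] = None

    def persist(self) -> "LazyDataset":
        # Only records the intent to cache; nothing is evaluated here.
        self._persist_requested = True
        return self

    def collect(self) -> List[int]:
        # An "action": reads the cache if present, otherwise evaluates the source.
        if self._cached is not None:
            return list(self._cached)
        data = list(self._compute())
        if self._persist_requested:
            self._cached = data
        return list(data)

    def count(self) -> int:
        # Another "action"; as a side effect it fills the cache.
        return len(self.collect())


def snapshot(source: List[int], eager: bool) -> LazyDataset:
    """Create a persisted view of `source`, optionally forcing materialisation."""
    ds = LazyDataset(lambda: list(source)).persist()
    if eager:
        ds.count()  # the proposed fix: run any action right after persist()
    return ds


source = [1, 2, 3]
lazy_snap = snapshot(source, eager=False)
eager_snap = snapshot(source, eager=True)

source.append(4)  # the source changes after both snapshots were "created"

lazy_result = lazy_snap.collect()    # evaluated only now, so it sees the change
eager_result = eager_snap.collect()  # served from the cache filled by count()
print(lazy_result)
print(eager_result)
```

In this model the lazily persisted snapshot picks up the later modification, while the one that ran `count()` immediately after `persist()` does not. That is the isolation the report asks PersistedOutputRDD to provide.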



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
