[ https://issues.apache.org/jira/browse/TINKERPOP-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090666#comment-15090666 ]
ASF GitHub Bot commented on TINKERPOP-1072:
-------------------------------------------
Github user twilmes commented on a diff in the pull request:
https://github.com/apache/incubator-tinkerpop/pull/196#discussion_r49265086
--- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedInputOutputRDDTest.java ---
@@ -54,6 +56,44 @@
 public class PersistedInputOutputRDDTest extends AbstractSparkTest {
     @Test
+    public void shouldPersistRDDBasedOnStorageLevel() throws Exception {
+        Spark.create("local[4]");
+        int counter = 0;
+        for (final String storageLevel : Arrays.asList("MEMORY_ONLY", "DISK_ONLY","MEMORY_ONLY_SER","MEMORY_AND_DISK_SER","OFF_HEAP")) {
+            assertEquals(counter * 2, Spark.getRDDs().size());
+            counter++;
+            final String rddName = TestHelper.makeTestDataDirectory(PersistedInputOutputRDDTest.class, UUID.randomUUID().toString());
+            final Configuration configuration = new BaseConfiguration();
+            configuration.setProperty("spark.master", "local[4]");
+            configuration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
+            configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
+            configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));
+            configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());
+            configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_OUTPUT_RDD, PersistedOutputRDD.class.getCanonicalName());
+            configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_STORAGE_LEVEL, storageLevel);
+            configuration.setProperty(Constants.GREMLIN_HADOOP_JARS_IN_DISTRIBUTED_CACHE, false);
+            configuration.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, rddName);
+            configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_CONTEXT, true);
+            Graph graph = GraphFactory.open(configuration);
+            graph.compute(SparkGraphComputer.class)
+                    .result(GraphComputer.ResultGraph.NEW)
+                    .persist(GraphComputer.Persist.EDGES)
+                    .program(TraversalVertexProgram.build()
+                            .traversal(GraphTraversalSource.build().engine(ComputerTraversalEngine.build().computer(SparkGraphComputer.class)),
+                                    "gremlin-groovy",
+                                    "g.V()").create(graph)).submit().get();
+            ////////
+            assertTrue(Spark.hasRDD(Constants.getGraphLocation(rddName)));
+            assertEquals(StorageLevel.fromString(storageLevel), Spark.getRDD(Constants.getGraphLocation(rddName)).getStorageLevel());
+            assertTrue(Spark.hasRDD(Constants.getMemoryLocation(rddName, Graph.Hidden.hide("traversers"))));
+            assertEquals(StorageLevel.fromString(storageLevel), Spark.getRDD(Constants.getMemoryLocation(rddName, Graph.Hidden.hide("traversers"))).getStorageLevel());
+            assertEquals(counter * 2, Spark.getRDDs().size());
+            //System.out.println(SparkContextStorage.open().ls());
--- End diff --
Looks like there was a lingering debug println here that could be removed.
> Allow the user to set persistence options using StorageLevel.valueOf()
> ----------------------------------------------------------------------
>
> Key: TINKERPOP-1072
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1072
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop
> Affects Versions: 3.1.0-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
> Fix For: 3.1.1-incubating
>
>
> I always thought there was a Spark option to say stuff like
> {{default.persist=DISK_SER_1}}, but I can't seem to find it.
> If no such option exists, then we should add it to Spark-Gremlin. For
> instance:
> {code}
> gremlin.spark.storageLevel=DISK_ONLY
> {code}
> See:
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
> Then we would need to go through and change every {{...cache()}} call to
> {{...persist(StorageLevel.valueOf(conf.get("gremlin.spark.storageLevel","MEMORY_ONLY")))}}.
> The question then becomes: do we give the user the flexibility to have the
> program caching differ from the persisted RDD caching? :| Too many
> configurations sucks.
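
A rough sketch of the lookup the issue describes: read a storage-level name from configuration and fall back to MEMORY_ONLY when nothing is set. The class name, the `Level` enum, and the `resolve` helper are hypothetical, and the enum is only a stand-in for Spark's StorageLevel so the snippet runs without Spark on the classpath. The key is the one proposed in the issue text; the value actually behind Constants.GREMLIN_SPARK_PERSIST_STORAGE_LEVEL is not shown in the diff and may differ.

```java
import java.util.Properties;

public class StorageLevelConfigSketch {

    // Hypothetical stand-in for Spark's StorageLevel; it lists only the five
    // names exercised by the test in the diff above.
    public enum Level { MEMORY_ONLY, DISK_ONLY, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, OFF_HEAP }

    // Key as proposed in the issue description (an assumption, not the confirmed constant value).
    public static final String KEY = "gremlin.spark.storageLevel";

    // Resolve the configured level, defaulting to MEMORY_ONLY when the key is absent.
    public static Level resolve(final Properties conf) {
        final String name = conf.getProperty(KEY, "MEMORY_ONLY").trim();
        try {
            return Level.valueOf(name);
        } catch (IllegalArgumentException e) {
            // Fail with a message that names the offending key and value.
            throw new IllegalArgumentException("Unknown storage level for " + KEY + ": " + name, e);
        }
    }

    public static void main(final String[] args) {
        final Properties conf = new Properties();
        System.out.println(resolve(conf));   // no key set, prints MEMORY_ONLY

        conf.setProperty(KEY, "DISK_ONLY");
        System.out.println(resolve(conf));   // prints DISK_ONLY
    }
}
```

In real Spark code the resolved name would presumably be handed to `StorageLevel.fromString(...)`, which is the call the test in the diff uses, and the result passed to `persist(...)` in place of `cache()`.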