[jira] [Commented] (TINKERPOP-1072) Allow the user to set persistence options using StorageLevel.valueOf()

ASF GitHub Bot (JIRA) Sat, 09 Jan 2016 07:49:07 -0800

    [ https://issues.apache.org/jira/browse/TINKERPOP-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090666#comment-15090666 ]


ASF GitHub Bot commented on TINKERPOP-1072:
-------------------------------------------

Github user twilmes commented on a diff in the pull request:

    https://github.com/apache/incubator-tinkerpop/pull/196#discussion_r49265086
  
    --- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedInputOutputRDDTest.java ---
    @@ -54,6 +56,44 @@
     public class PersistedInputOutputRDDTest extends AbstractSparkTest {
     
         @Test
    +    public void shouldPersistRDDBasedOnStorageLevel() throws Exception {
    +        Spark.create("local[4]");
    +        int counter = 0;
    +        for (final String storageLevel : Arrays.asList("MEMORY_ONLY", "DISK_ONLY","MEMORY_ONLY_SER","MEMORY_AND_DISK_SER","OFF_HEAP")) {
    +            assertEquals(counter * 2, Spark.getRDDs().size());
    +            counter++;
    +            final String rddName = TestHelper.makeTestDataDirectory(PersistedInputOutputRDDTest.class, UUID.randomUUID().toString());
    +            final Configuration configuration = new BaseConfiguration();
    +            configuration.setProperty("spark.master", "local[4]");
    +            configuration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
    +            configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
    +            configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));
    +            configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());
    +            configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_OUTPUT_RDD, PersistedOutputRDD.class.getCanonicalName());
    +            configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_STORAGE_LEVEL, storageLevel);
    +            configuration.setProperty(Constants.GREMLIN_HADOOP_JARS_IN_DISTRIBUTED_CACHE, false);
    +            configuration.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, rddName);
    +            configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_CONTEXT, true);
    +            Graph graph = GraphFactory.open(configuration);
    +            graph.compute(SparkGraphComputer.class)
    +                    .result(GraphComputer.ResultGraph.NEW)
    +                    .persist(GraphComputer.Persist.EDGES)
    +                    .program(TraversalVertexProgram.build()
    +                            .traversal(GraphTraversalSource.build().engine(ComputerTraversalEngine.build().computer(SparkGraphComputer.class)),
    +                                    "gremlin-groovy",
    +                                    "g.V()").create(graph)).submit().get();
    +            ////////
    +            assertTrue(Spark.hasRDD(Constants.getGraphLocation(rddName)));
    +            assertEquals(StorageLevel.fromString(storageLevel), Spark.getRDD(Constants.getGraphLocation(rddName)).getStorageLevel());
    +            assertTrue(Spark.hasRDD(Constants.getMemoryLocation(rddName, Graph.Hidden.hide("traversers"))));
    +            assertEquals(StorageLevel.fromString(storageLevel), Spark.getRDD(Constants.getMemoryLocation(rddName, Graph.Hidden.hide("traversers"))).getStorageLevel());
    +            assertEquals(counter * 2, Spark.getRDDs().size());
    +            //System.out.println(SparkContextStorage.open().ls());
    --- End diff --
    
    Looks like a lingering (commented-out) debug println was left here; it could be removed.


> Allow the user to set persistence options using StorageLevel.valueOf()
> ----------------------------------------------------------------------
>
>                 Key: TINKERPOP-1072
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1072
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.1.1-incubating
>
>
> I always thought there was a Spark option to say stuff like 
> {{default.persist=DISK_SER_1}}, but I can't seem to find it.
> If no such option exists, then we should add it to Spark-Gremlin. For 
> instance:
> {code}
> gremlin.spark.storageLevel=DISK_ONLY
> {code}
> See: 
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
> Then we would need to go through the code and, wherever we have {{...cache()}} 
> calls, change them to 
> {{....persist(StorageLevel.valueOf(conf.get("gremlin.spark.storageLevel","MEMORY_ONLY")))}}.
> The question then becomes: do we provide the flexibility for the user to 
> configure program caching differently from persisted-RDD caching? :| Too many 
> configurations sucks.
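The proposal in the description boils down to two steps: read a storage-level name from the configuration with a default, then turn that name into a storage level. The Java sketch below shows only that lookup. It does not depend on Spark. The enum `StorageLevelName` is a hypothetical stand-in that mirrors the names of Spark's predefined `StorageLevel` constants (Spark 1.x era). The helper `resolve` is likewise hypothetical. The key `gremlin.spark.storageLevel` is the name proposed in the description, not necessarily the key the final patch uses. Note that the test in the diff above converts names with `StorageLevel.fromString(...)` rather than a `valueOf` method.

```java
import java.util.Map;

// Hypothetical sketch, not Spark-Gremlin code: shows "config key + default + name lookup".
public class StorageLevelConfig {

    // Hypothetical stand-in mirroring the names of Spark's predefined storage levels.
    public enum StorageLevelName {
        NONE, DISK_ONLY, DISK_ONLY_2,
        MEMORY_ONLY, MEMORY_ONLY_2, MEMORY_ONLY_SER, MEMORY_ONLY_SER_2,
        MEMORY_AND_DISK, MEMORY_AND_DISK_2, MEMORY_AND_DISK_SER, MEMORY_AND_DISK_SER_2,
        OFF_HEAP
    }

    // Key name as proposed in the issue description (an assumption, see lead-in).
    public static final String STORAGE_LEVEL_KEY = "gremlin.spark.storageLevel";
    // MEMORY_ONLY is the level RDD.cache() uses, so it keeps today's behavior by default.
    public static final String DEFAULT_LEVEL = "MEMORY_ONLY";

    // Reads the configured name (or the default) and fails fast on unknown names.
    public static StorageLevelName resolve(Map<String, String> conf) {
        String name = conf.getOrDefault(STORAGE_LEVEL_KEY, DEFAULT_LEVEL).trim();
        try {
            return StorageLevelName.valueOf(name); // Enum.valueOf is case-sensitive
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unknown storage level for " + STORAGE_LEVEL_KEY + ": " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of()));
        System.out.println(resolve(Map.of(STORAGE_LEVEL_KEY, "DISK_ONLY")));
    }
}
```

With an empty configuration this prints `MEMORY_ONLY`, and with `gremlin.spark.storageLevel=DISK_ONLY` it prints `DISK_ONLY`. A name such as `DISK_SER_1` from the description is rejected because it is not one of Spark's predefined levels. Failing at configuration time surfaces a typo immediately, rather than letting it fall back silently to in-memory caching. In Spark-Gremlin, the resolved level would then be passed to `persist(...)` where `cache()` is called today.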



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)