[
https://issues.apache.org/jira/browse/TINKERPOP-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089630#comment-15089630
]
ASF GitHub Bot commented on TINKERPOP-1072:
-------------------------------------------
GitHub user okram opened a pull request:
https://github.com/apache/incubator-tinkerpop/pull/196
TINKERPOP-1072: Allow the user to set persistence options using
StorageLevel.valueOf()
https://issues.apache.org/jira/browse/TINKERPOP-1072
I always thought Spark had some configuration like `default.storageLevel`
and that when a user called `cache()` it would use that default. I was wrong:
`cache()` is always `MEMORY_ONLY`. I made it so you can specify the storage
level for both persisted RDDs and runtime job RDDs, and thus we now (internally)
use `persist(STORAGE_LEVEL)` where `MEMORY_ONLY` is the default.
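A minimal, self-contained sketch of the pattern described above: read the storage level from configuration, fall back to `MEMORY_ONLY` (the behavior `cache()` hard-codes), and pass the result to `persist(...)`. Here a plain `Map` stands in for the configuration object and a local enum stands in for Spark's `StorageLevel`; names are illustrative, not the actual Spark-Gremlin implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class StorageLevelConfigDemo {
    // Stand-in for org.apache.spark.storage.StorageLevel (illustrative).
    enum StorageLevel { MEMORY_ONLY, MEMORY_AND_DISK, DISK_ONLY }

    // Resolve the storage level from configuration, defaulting to
    // MEMORY_ONLY -- the same level cache() always uses.
    static StorageLevel storageLevelFrom(Map<String, String> conf) {
        return StorageLevel.valueOf(
                conf.getOrDefault("gremlin.spark.storageLevel", "MEMORY_ONLY"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No key set: falls back to MEMORY_ONLY.
        System.out.println(storageLevelFrom(conf));
        // Key set: the configured level wins.
        conf.put("gremlin.spark.storageLevel", "DISK_ONLY");
        System.out.println(storageLevelFrom(conf));
    }
}
```

With this in place, internal `rdd.cache()` calls can become `rdd.persist(storageLevelFrom(conf))` without changing the default behavior for users who configure nothing.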
Test cases, docs, and Spark integration tests pass.
VOTE +1.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1072
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-tinkerpop/pull/196.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #196
----
commit 4082a4a043b54c102f49f220b14e2644817e1222
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-01-08T18:05:08Z
Allow the user to specify the persistence StorageLevel for both the
computed job graph and any PersistedOutputRDD data. Updated docs and the example
conf, and added a test case that validates that persistence to Spark storage is
correct as the configuration changes.
----
> Allow the user to set persistence options using StorageLevel.valueOf()
> ----------------------------------------------------------------------
>
> Key: TINKERPOP-1072
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1072
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop
> Affects Versions: 3.1.0-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
> Fix For: 3.1.1-incubating
>
>
> I always thought there was a Spark option to say stuff like
> {{default.persist=DISK_SER_1}}, but I can't seem to find it.
> If no such option exists, then we should add it to Spark-Gremlin. For
> instance:
> {code}
> gremlin.spark.storageLevel=DISK_ONLY
> {code}
> See:
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
> Then we would need to go through the code and change every {{...cache()}} call
> to
> {{...persist(StorageLevel.valueOf(conf.get("gremlin.spark.storageLevel", "MEMORY_ONLY")))}}.
> The question then becomes: do we provide flexibility so the user can configure
> the program caching differently from the persisted RDD caching? :| ... Too many
> configurations sucks.
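One way to answer the flexibility question above is two separate properties, one for the runtime job graph and one for persisted output. The property names below are illustrative only, not confirmed from the codebase:

```properties
# Storage level for the live job-graph RDDs during OLAP computation (illustrative name)
gremlin.spark.graphStorageLevel=MEMORY_ONLY
# Storage level for RDDs written via PersistedOutputRDD (illustrative name)
gremlin.spark.persistStorageLevel=DISK_ONLY
```

Both would default to MEMORY_ONLY when unset, so users who configure nothing keep today's `cache()` behavior.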
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)