[ https://issues.apache.org/jira/browse/TINKERPOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stephen mallette updated TINKERPOP-1117:
----------------------------------------
Component/s: hadoop
> InputFormatRDD.readGraphRDD requires a valid gremlin.hadoop.inputLocation,
> breaking InputFormats (Cassandra, HBase) that don't need one
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: TINKERPOP-1117
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1117
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop
> Affects Versions: 3.2.0-incubating
> Reporter: Dylan Bethune-Waddell
> Priority: Minor
> Fix For: 3.2.0-incubating
>
>
> On line 43 of InputFormatRDD, the call to Constants.getSearchGraphLocation
> returns Optional.empty() when gremlin.hadoop.inputLocation=none, which is the
> setting advised for Titan's CassandraInputFormat and HBaseInputFormat. Changing
> readGraphRDD to call .isPresent() and set the input location in the Hadoop
> configuration only when a value is present allows SparkGraphComputer from the
> 3.2.0-SNAPSHOT branch to work with Titan via CassandraInputFormat in a
> traversal source:
> {code}
> // Additional import needed in InputFormatRDD
> import java.util.Optional;
>
> @Override
> public JavaPairRDD<Object, VertexWritable> readGraphRDD(final Configuration configuration, final JavaSparkContext sparkContext) {
>     final org.apache.hadoop.conf.Configuration hadoopConfiguration = ConfUtil.makeHadoopConfiguration(configuration);
>     // This lookup was previously used directly in hadoopConfiguration.set(...),
>     // which breaks when it returns Optional.empty() (e.g. inputLocation=none).
>     final Optional<String> searchGraph = Constants.getSearchGraphLocation(
>             configuration.getString(Constants.GREMLIN_HADOOP_INPUT_LOCATION),
>             FileSystemStorage.open(hadoopConfiguration));
>     // Only set the input directory when a graph location was found; InputFormats
>     // such as Titan's CassandraInputFormat and HBaseInputFormat don't need one.
>     // The key must be the Hadoop input-dir property, not the inputLocation value itself.
>     if (searchGraph.isPresent()) {
>         hadoopConfiguration.set(Constants.MAPREDUCE_INPUT_FILEINPUTFORMAT_INPUTDIR, searchGraph.get());
>     }
>     return sparkContext.newAPIHadoopRDD(hadoopConfiguration,
>             (Class<InputFormat<NullWritable, VertexWritable>>) hadoopConfiguration.getClass(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, InputFormat.class),
>             NullWritable.class,
>             VertexWritable.class)
>             .mapToPair(tuple -> new Tuple2<>(tuple._2().get().id(), new VertexWritable(tuple._2().get())));
> }
> {code}
> I don't fully understand the intended behaviour, so this may not be the right
> fix. Would it work to add a configuration property such as
> "gremlin.hadoop.inputLocationRequired" that defaults to true and can be set to
> false for these other InputFormats?
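A minimal sketch of how the proposed flag could behave, under stated assumptions. Only `gremlin.hadoop.inputLocation` and the Hadoop key `mapreduce.input.fileinputformat.inputdir` are real; the property `gremlin.hadoop.inputLocationRequired`, the class `InputLocationCheck`, and its methods are hypothetical. A plain `Map` stands in for the Commons/Hadoop `Configuration` objects, and unlike `Constants.getSearchGraphLocation` the sketch does not search the file system: it simply treats a missing, empty, or `none` location as absent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the proposed "gremlin.hadoop.inputLocationRequired" flag.
// A plain Map stands in for the Commons/Hadoop Configuration objects.
final class InputLocationCheck {

    static final String INPUT_LOCATION = "gremlin.hadoop.inputLocation";
    // Proposed property from this issue (hypothetical; not an existing TinkerPop setting).
    static final String INPUT_LOCATION_REQUIRED = "gremlin.hadoop.inputLocationRequired";
    // Hadoop's FileInputFormat input-directory key.
    static final String INPUT_DIR = "mapreduce.input.fileinputformat.inputdir";

    /** Returns the configured input location, or Optional.empty() if none is set and none is required. */
    static Optional<String> resolveInputLocation(final Map<String, String> conf) {
        final String location = conf.get(INPUT_LOCATION);
        // Defaults to true so file-based InputFormats keep the current strict behaviour.
        final boolean required = Boolean.parseBoolean(conf.getOrDefault(INPUT_LOCATION_REQUIRED, "true"));
        // Simplification: "none" is treated as absent; no file-system lookup is done here.
        final boolean missing = location == null || location.isEmpty() || "none".equals(location);
        if (missing && required) {
            throw new IllegalStateException(
                    INPUT_LOCATION + " must be set unless " + INPUT_LOCATION_REQUIRED + "=false");
        }
        return missing ? Optional.empty() : Optional.of(location);
    }

    /** Copies the settings and sets the input directory only when a location is present. */
    static Map<String, String> toHadoopConf(final Map<String, String> conf) {
        final Map<String, String> hadoopConf = new HashMap<>(conf);
        resolveInputLocation(conf).ifPresent(location -> hadoopConf.put(INPUT_DIR, location));
        return hadoopConf;
    }

    public static void main(final String[] args) {
        // Cassandra/HBase-style setup: no input location, flag explicitly disabled.
        final Map<String, String> cassandra = new HashMap<>();
        cassandra.put(INPUT_LOCATION, "none");
        cassandra.put(INPUT_LOCATION_REQUIRED, "false");
        System.out.println(toHadoopConf(cassandra).containsKey(INPUT_DIR)); // false

        // File-based setup: location present, flag left at its default.
        final Map<String, String> gryo = new HashMap<>();
        gryo.put(INPUT_LOCATION, "data/tinkerpop-modern.kryo");
        System.out.println(toHadoopConf(gryo).get(INPUT_DIR)); // data/tinkerpop-modern.kryo
    }
}
```

Defaulting the flag to true keeps existing file-based InputFormats failing fast on a missing location, while InputFormats that read from a database opt out explicitly.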