[ https://issues.apache.org/jira/browse/TINKERPOP-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marko A. Rodriguez updated TINKERPOP-1304:
------------------------------------------
    Issue Type: Improvement  (was: Bug)

> Input format for OLAP jobs is changed during job execution
> ----------------------------------------------------------
>
>                 Key: TINKERPOP-1304
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1304
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Daniel Kuppitz
>            Assignee: Marko A. Rodriguez
>
> To reproduce the error, create the following config file:
> {noformat}
> # hadoop-script-output.properties
> gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
> gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
> gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptOutputFormat
> gremlin.hadoop.jarsInDistributedCache=true
> gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
> gremlin.hadoop.inputLocation=tinkerpop-modern.kryo
> gremlin.hadoop.outputLocation=output
> gremlin.hadoop.scriptOutputFormat.script=script-output.groovy
> spark.master=local[*]
> spark.executor.memory=1g
> spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
> {noformat}
> ...
> and this Groovy file:
> {code}
> // script-output.groovy
> def stringify(vertex) {
>     return "foo"
> }
> {code}
> Finally execute the following code in the Gremlin console:
> {noformat}
> :install org.apache.tinkerpop hadoop-gremlin 3.2.1-SNAPSHOT
> :install org.apache.tinkerpop spark-gremlin 3.2.1-SNAPSHOT
> :q
> {noformat}
> {noformat}
> :plugin use tinkerpop.hadoop
> :plugin use tinkerpop.spark
> hdfs.copyFromLocal("data/tinkerpop-modern.kryo", "tinkerpop-modern.kryo")
> hdfs.copyFromLocal("/tmp/script-output.groovy", "script-output.groovy")
> graph = GraphFactory.open("/tmp/hadoop-script-output.properties")
> g = graph.traversal().withComputer()
> g.V().hasLabel("person").program(BulkDumperVertexProgram.build().create(graph))
> {noformat}
> It will fail with the following exception:
> {noformat}
> ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>     at org.apache.hadoop.fs.Path.checkPathArg(Path.java:122)
>     at org.apache.hadoop.fs.Path.<init>(Path.java:134)
>     at org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptRecordReader.initialize(ScriptRecordReader.java:88)
>     at org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat.createRecordReader(ScriptInputFormat.java:39)
>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Other traversals and {{graph.compute().program(...).submit().get()}} work fine, only traversals using {{program()}} seem to cause this problem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)