[ https://issues.apache.org/jira/browse/TINKERPOP-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289591#comment-15289591 ]

Marko A. Rodriguez commented on TINKERPOP-1304:
-----------------------------------------------

This isn't a bug. Your output format is {{ScriptOutputFormat}}, so the next job in 
the chain assumes {{ScriptInputFormat}}, and since no parser script is provided, it 
fails with an NPE. There is really no way around this. You should use 
{{GryoOutputFormat}} as your {{graphWriter}}. 
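
For example (a minimal sketch; only the {{graphWriter}} line changes, the rest of the 
properties file stays as posted below):
{noformat}
# hadoop-script-output.properties (adjusted graphWriter)
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
{noformat}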

After chatting with [~dkuppitz] in HipChat, I realized that we can't even have 
"intermediate formats", as there is no way for the OLAP job to know it's part of 
a chain, except at the Traversal level, and that's way above the {{HadoopGraph}} 
stuff. :/ ... 
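
With the adjusted {{graphWriter}}, the reproduction from the description should go 
through, since the chained job can read the Gryo output back in. A sketch of the same 
console session with the edited properties file (not re-verified here):
{noformat}
graph = GraphFactory.open("/tmp/hadoop-script-output.properties")
g = graph.traversal().withComputer()
g.V().hasLabel("person").program(BulkDumperVertexProgram.build().create(graph))
{noformat}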

> Input format for OLAP jobs is changed during job execution
> ----------------------------------------------------------
>
>                 Key: TINKERPOP-1304
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1304
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Daniel Kuppitz
>            Assignee: Marko A. Rodriguez
>
> To reproduce the error, create the following config file:
> {noformat}
> # hadoop-script-output.properties
> gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
> gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
> gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptOutputFormat
> gremlin.hadoop.jarsInDistributedCache=true
> gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
> gremlin.hadoop.inputLocation=tinkerpop-modern.kryo
> gremlin.hadoop.outputLocation=output
> gremlin.hadoop.scriptOutputFormat.script=script-output.groovy
> spark.master=local[*]
> spark.executor.memory=1g
> spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
> {noformat}
> ... and this Groovy file:
> {code}
> // script-output.groovy
> def stringify(vertex) {
>     return "foo"
> }
> {code}
> Finally execute the following code in the Gremlin console:
> {noformat}
> :install org.apache.tinkerpop hadoop-gremlin 3.2.1-SNAPSHOT
> :install org.apache.tinkerpop spark-gremlin 3.2.1-SNAPSHOT
> :q
> {noformat}
> {noformat}
> :plugin use tinkerpop.hadoop
> :plugin use tinkerpop.spark
> hdfs.copyFromLocal("data/tinkerpop-modern.kryo", "tinkerpop-modern.kryo")
> hdfs.copyFromLocal("/tmp/script-output.groovy", "script-output.groovy")
> graph = GraphFactory.open("/tmp/hadoop-script-output.properties")
> g = graph.traversal().withComputer()
> g.V().hasLabel("person").program(BulkDumperVertexProgram.build().create(graph))
> {noformat}
> It will fail with the following exception:
> {noformat}
> ERROR org.apache.spark.executor.Executor  - Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>       at org.apache.hadoop.fs.Path.checkPathArg(Path.java:122)
>       at org.apache.hadoop.fs.Path.<init>(Path.java:134)
>       at org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptRecordReader.initialize(ScriptRecordReader.java:88)
>       at org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat.createRecordReader(ScriptInputFormat.java:39)
>       at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
>       at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
>       at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>       at org.apache.spark.scheduler.Task.run(Task.scala:89)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Other traversals and {{graph.compute().program(...).submit().get()}} work fine; 
> only traversals using {{program()}} seem to cause this problem.


