Kiyan Ahmadizadeh created CRUNCH-46:
---------------------------------------
Summary: Scrunch jobs launched from repl using PipelineLike#done
are not shipped with jar of repl code.
Key: CRUNCH-46
URL: https://issues.apache.org/jira/browse/CRUNCH-46
Project: Crunch
Issue Type: Bug
Reporter: Kiyan Ahmadizadeh
Suppose the following example code is run in the scrunch/scala repl:
val pipeline = Pipeline()
val textLines = pipeline.read(From.textFile("shakes.txt"))
val alphaNumericTextLines = textLines.map(line =>
  line.toLowerCase().replaceAll("[^A-Za-z ]", ""))
val words = alphaNumericTextLines.flatMap(line => line.split("""\W+"""))
val counts = words.count()
counts.write(To.textFile("/user/kiyan/counts"))
pipeline.done()
This code results in a ClassNotFoundException in MapReduce tasks. However,
changing the last line to pipeline.run() produces no errors.
The problem is that the method org.apache.crunch.scrunch.PipelineLike#run adds
a jar of generated repl code to the job's running tasks, but
org.apache.crunch.scrunch.PipelineLike#done does not. The done method should
be modified to take the same actions as the run method when launching a job
from the Scala repl. This will ensure users can launch jobs from the repl
regardless of how they conclude their pipelines.
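
One possible shape for the fix is sketched below. It is a standalone illustration that does not depend on Crunch: the names PipelineLikeSketch, inRepl, shipReplJar, and actions are hypothetical stand-ins, not Scrunch's actual internals. The only point it makes is that run() and done() go through the same jar-shipping step before the job is launched.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sketch: none of these names are taken from Scrunch itself.
trait PipelineLikeSketch {
  // Records the steps taken so their order can be inspected.
  val actions: ListBuffer[String] = ListBuffer.empty[String]

  // Assumed check for "is this pipeline being driven from the Scala REPL?"
  def inRepl: Boolean

  // Assumed helper standing in for packaging REPL-generated classes into
  // a jar and attaching that jar to the job before it is submitted.
  protected def shipReplJar(): Unit =
    if (inRepl) actions += "ship-repl-jar"

  // Existing behaviour described in the report: run() ships the jar first.
  def run(): Unit = {
    shipReplJar()
    actions += "run"
  }

  // Proposed behaviour: done() performs the same setup before executing.
  def done(): Unit = {
    shipReplJar()
    actions += "done"
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val p = new PipelineLikeSketch { val inRepl = true }
    p.done()
    println(p.actions.mkString(", ")) // prints "ship-repl-jar, done"
  }
}
```

Routing both methods through one shared helper, rather than copying the jar logic into done, would keep the two code paths from drifting apart again.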