[ https://issues.apache.org/jira/browse/CRUNCH-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kiyan Ahmadizadeh updated CRUNCH-9:
-----------------------------------
Attachment: CRUNCH-9.patch
This commit modifies the Scrunch project so that Scrunch jobs can be run from
a Scala REPL. Users can run a Scala REPL capable of launching Scrunch jobs by
building Scrunch using `mvn package` and running bin/scrunch from the
resulting distribution directory. Several changes have been made to the
project to accomplish this:
1. The project has been modified to produce a release distribution. The
distribution is created by Maven when `mvn package` is run. A distribution
folder and tarball are created. The distribution folder contains a bin dir
that contains scripts, a lib dir that contains all library jars, and a log dir
that contains a log4j configuration file.
2. A modified Scala REPL was added to the project. An object InterpreterRunner
was created that launches a Scala REPL. It's a modification of Scala's
MainGenericRunner. The new Scrunch version allows client code to determine if
a REPL is actually running, and includes methods for creating a jar from the
code compiled from REPL input. A script named "scrunch" was added to the
project that, when run, launches this modified Scala REPL. The script is a
modification of the script distributed with Scala that launches the Scala
REPL.
3. Scrunch's Pipeline class was modified so that any MapReduce pipeline
constructed automatically adds the Scrunch lib jars to the Distributed Cache
of the job and to the classpaths of run tasks.
4. Methods on PCollection/PTable/etc. that result in a job being launched were
modified to check if the REPL is running and, if so, create a jar of code
compiled from REPL input and ship that jar with the job so that it's on the
classpath of run tasks.
5. To facilitate extensions, the From/To/At objects were changed to traits,
and likewise-named singleton objects that extend those traits were created.
6. The examples in the examples directory, and the script scrunch.py for
running those examples, are included in the project distribution. The
scrunch.py script was renamed to scrunch-job.py and modified to cope with the
new project distribution structure and take advantage of the fact that Scrunch
lib jars are now automatically added to the classpath of run jobs.
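
To illustrate the pattern described in item 5, here is a minimal sketch of a
trait paired with a likewise-named singleton object. The Source case class,
the method names, and MyFrom are hypothetical and used for illustration only;
they are not the actual Scrunch signatures.

```scala
// Hypothetical names throughout; this is not Scrunch's actual API.
final case class Source(format: String, path: String)

// The factory methods live in a trait so that other code can mix them in.
trait From {
  def textFile(path: String): Source = Source("text", path)
}

// The likewise-named singleton object keeps call sites such as
// From.textFile(...) working as before.
object From extends From

// An extension inherits the existing factory methods and adds its own.
trait MyFrom extends From {
  def gzipTextFile(path: String): Source = Source("gzip-text", path)
}

object MyFrom extends MyFrom
```

With this layout, From.textFile("/in") still works unchanged, while
MyFrom offers both textFile and the added gzipTextFile. (When pasting this
into a REPL, use :paste so that each trait and its object are compiled
together.)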
I started an integration test for actually launching jobs, but the
MiniMRCluster testing framework does not behave properly when jars are added
to the distributed cache. The problem is related to MAPREDUCE-2884. I have
verified that jobs can be launched from the REPL using an actual cluster.
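
For readers who want a feel for the mechanism in item 4, the following is a
rough sketch of the "check for a REPL, then build a jar" step. It is not the
code in the patch: ReplJarSketch, replRunning, createJar, and jarForJob are
hypothetical names, and the sketch assumes the classes compiled from REPL
input have already been written to a directory on disk. Only standard
java.util.jar and java.nio.file calls are used.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files
import java.util.jar.{JarEntry, JarOutputStream}

object ReplJarSketch {
  // Hypothetical stand-in for the "is a REPL running?" check;
  // assumed to be set by whatever launches the REPL.
  @volatile var replRunning: Boolean = false

  /** Recursively lists the regular files under `dir`. */
  private def filesUnder(dir: File): Seq[File] = {
    val children = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    children.flatMap(f => if (f.isDirectory) filesUnder(f) else Seq(f))
  }

  /** Packs every file under `classDir` into `jarFile`, keeping relative paths. */
  def createJar(classDir: File, jarFile: File): Unit = {
    val out = new JarOutputStream(new FileOutputStream(jarFile))
    try {
      val root = classDir.toPath
      for (f <- filesUnder(classDir)) {
        // Jar entry names always use forward slashes.
        val name = root.relativize(f.toPath).toString
          .replace(File.separatorChar, '/')
        out.putNextEntry(new JarEntry(name))
        Files.copy(f.toPath, out)
        out.closeEntry()
      }
    } finally out.close()
  }

  /** Returns the jar to ship with a job, or None when no REPL is running. */
  def jarForJob(classDir: File, jarFile: File): Option[File] =
    if (replRunning) {
      createJar(classDir, jarFile)
      Some(jarFile)
    } else None
}
```

A job-launching method would call something like jarForJob just before
submission and, when it returns a jar, add that jar to the job's classpath.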
> Add support for launching Scrunch pipelines from a REPL
> -------------------------------------------------------
>
> Key: CRUNCH-9
> URL: https://issues.apache.org/jira/browse/CRUNCH-9
> Project: Crunch
> Issue Type: New Feature
> Components: Scrunch
> Reporter: Josh Wills
> Attachments: CRUNCH-9.patch
>
>
> It would be really, really cool and useful to be able to launch a Scrunch
> pipeline from a Scala-based REPL, which was one of the killer apps for
> Cascade, Google's Scala-based wrapper around FlumeJava.
> See the video from Scala Days 2011 for a reference:
> http://days2011.scala-lang.org/node/138/282