[
https://issues.apache.org/jira/browse/CRUNCH-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Wills resolved CRUNCH-73.
------------------------------
Resolution: Fixed
Fix Version/s: 0.4.0
Fixed. Thanks Kiyan!
> Scrunch applications using PipelineApp do not properly serialize closures to
> MapReduce tasks.
> ---------------------------------------------------------------------------------------------
>
> Key: CRUNCH-73
> URL: https://issues.apache.org/jira/browse/CRUNCH-73
> Project: Crunch
> Issue Type: Bug
> Components: Scrunch
> Affects Versions: 0.4.0
> Reporter: Kiyan Ahmadizadeh
> Assignee: Kiyan Ahmadizadeh
> Fix For: 0.4.0
>
> Attachments: CRUNCH-73-v1.patch, CRUNCH-73-v2.patch
>
>
> One of the great potential advantages of using Scala for writing MapReduce
> pipelines is the ability to send side data as part of function closures,
> rather than through Hadoop Configurations or the Distributed Cache. As an
> absurdly simple example, consider the following Scala PipelineApp that
> divides all elements of a numeric PCollection by an arbitrary argument:
> object DivideApp extends PipelineApp {
>   val divisor = Integer.valueOf(args(0))
>   val nums = read(From.textFile("numbers.txt"))
>   val dividedNums = nums.map { n => n / divisor }
>   dividedNums.write(To.textFile("dividedNums"))
>   run()
> }
> Executing this PipelineApp fails: MapReduce tasks see a value of null for
> divisor (or 0 if divisor is declared as a primitive numeric type). This
> indicates that an error occurs while serializing the Scala function closure,
> causing the closure's free variables to take on their default JVM values
> on the task side.
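For context, the capture-by-value behavior the reporter is relying on can be demonstrated in plain Scala, with no Crunch dependencies (this is an illustrative sketch, not part of the attached patch): round-tripping a closure over a local val through Java serialization preserves the captured value, which is exactly what should reach a MapReduce task.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Illustrative sketch (not from the CRUNCH-73 patch): round-trips an object
// through Java serialization, mimicking how a MapReduce task receives a
// function closure shipped from the client.
object ClosureCaptureDemo {
  def roundTrip[T](x: T): T = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(x)
    out.close()
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
      .readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val divisor = 5                          // local val: captured by value
    val divide: Int => Int = n => n / divisor
    val revived = roundTrip(divide)          // Scala function values are Serializable
    println(revived(10))                     // prints 2: the captured value survives
  }
}
```

By contrast, in the DivideApp example above, divisor is a field of the enclosing PipelineApp singleton rather than a local binding, so a faulty closure-serialization path can leave the free variable at null/0 on the task side, which matches the reported symptom.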
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira