Kiyan Ahmadizadeh created CRUNCH-73:
---------------------------------------

             Summary: Scrunch applications using PipelineApp do not properly serialize closures to MapReduce tasks.
                 Key: CRUNCH-73
                 URL: https://issues.apache.org/jira/browse/CRUNCH-73
             Project: Crunch
          Issue Type: Bug
          Components: Scrunch
    Affects Versions: 0.4.0
            Reporter: Kiyan Ahmadizadeh
            Assignee: Kiyan Ahmadizadeh


One of the great potential advantages of using Scala for writing MapReduce 
pipelines is the ability to send side data as part of function closures, rather 
than through Hadoop Configurations or the Distributed Cache.  As an absurdly 
simple example, consider the following Scala PipelineApp that divides all 
elements of a numeric PCollection by an arbitrary argument:

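object DivideApp extends PipelineApp {
  // Side data: captured by the closure passed to map below.
  val divisor = Integer.valueOf(args(0))
  // Each line of numbers.txt is read as a String.
  val nums = read(From.textFile("numbers.txt"))
  val dividedNums = nums.map { n => n.toInt / divisor }
  dividedNums.write(To.textFile("dividedNums"))
  run()
}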

Executing this PipelineApp fails.  MapReduce tasks see a value of "null" for 
divisor (or 0 if divisor is forced to be a primitive numeric type).  This 
suggests that the serialization of Scala function closures is faulty: free 
variables captured by the closure arrive in the tasks with their default JVM 
values instead of the values they held on the client.
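
For comparison, the expected behavior can be sketched outside of Hadoop.  The 
snippet below is only an illustration and is not Scrunch's actual code path: 
it assumes Scala 2.12 or later (where function literals are serializable) and 
uses plain java.io serialization.  ClosureDemo, roundTrip and makeDivider are 
hypothetical names introduced for this sketch.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{ObjectInputStream, ObjectOutputStream}

object ClosureDemo {
  // Hypothetical helper: push a value through Java serialization and back,
  // roughly what has to happen before a closure can run in a remote task.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // The returned closure captures the method parameter `divisor` by value.
  def makeDivider(divisor: Int): Int => Int = (n: Int) => n / divisor

  def main(args: Array[String]): Unit = {
    val divide = roundTrip(makeDivider(4))
    // The captured divisor should survive the round trip.
    println(divide(20))
  }
}
```

Here the divisor is a method parameter captured directly by the closure, so it 
is written out together with the function.  In DivideApp the divisor is a 
member of the PipelineApp object instead, which may be relevant to the failure 
reported above.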
