[ https://issues.apache.org/jira/browse/SPARK-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303003#comment-15303003 ]
Steve Loughran commented on SPARK-15582:
----------------------------------------

To clarify: have you ever got Groovy closures to go over the wire? Groovy is really a lazy interpreter; even @CompileStatic is only a hint. In the distant past I did some work ([grumpy|https://github.com/steveloughran/grumpy]) that sent the script text around and compiled it at the back end; that was for Hadoop 2.0.x MR, compiling the script into a subclass of the Hadoop classes and invoking it. Not the fastest thing in the world, but easy to play with, in a time before notebooks.
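For reference, a minimal sketch of that "ship the text, compile at the back end" pattern against Spark's Java API. This is an illustration, not grumpy's actual code; it assumes the Spark 2.x FlatMapFunction signature (1.6 returns an Iterable), and the helper class itself still has to be on the executors' classpath, e.g. inside the application jar. Only the script text travels:

{noformat}
// Hypothetical sketch: ship the Groovy source as a plain String and compile
// it lazily on each executor. Only the text goes over the wire, so no
// Script1$_run_closure1 class ever needs to be deserialized.
import org.apache.spark.api.java.function.FlatMapFunction

class ScriptFlatMap implements FlatMapFunction<String, String> {

    final String scriptText              // a plain String serializes without trouble

    private transient Closure compiled   // rebuilt on the executor, never serialized

    ScriptFlatMap(String scriptText) { this.scriptText = scriptText }

    @Override
    Iterator<String> call(String record) {
        if (compiled == null) {
            // compiled here, on the executor, from the shipped source text
            compiled = (Closure) new GroovyShell().evaluate(scriptText)
        }
        compiled.call(record).iterator()
    }
}

// Driver side (javaSparkContext and the input path are assumed to exist):
javaSparkContext.textFile('hdfs:///some/input').
    flatMap(new ScriptFlatMap('return { String line -> line.tokenize() }')).
    count()
{noformat}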
> Support for Groovy closures
> ---------------------------
>
>                  Key: SPARK-15582
>                  URL: https://issues.apache.org/jira/browse/SPARK-15582
>              Project: Spark
>           Issue Type: Improvement
>           Components: Input/Output, Java API
>     Affects Versions: 1.6.1, 1.6.2, 2.0.0
>          Environment: 6 node Debian 8 based Spark cluster
>             Reporter: Catalin Alexandru Zamfir
>
> After fixing SPARK-13599 and running one of our jobs against the fix (which did resolve the Groovy dependency problem), we see the Spark executors stuck at a ClassNotFoundException when the job runs as a script (via GroovyShell.evaluate(scriptText)). It seems Spark cannot de-serialize the closure, or the closure class never reaches the executor.
> {noformat}
> sparkContext.binaryFiles(ourPath).flatMap({ onePathEntry -> code-block } as FlatMapFunction).count()
> { onePathEntry -> code-block } denotes a Groovy closure.
> {noformat}
> There is a groovy-spark example at https://github.com/bunions1/groovy-spark-example, but it uses a modified Groovy. If my understanding is correct, Groovy compiles closures to ordinary JVM byte-code, which should be easy for Spark to pick up and use.
> The above example code fails with this stack-trace:
> {noformat}
> Caused by: java.lang.ClassNotFoundException: Script1$_run_closure1
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> {noformat}
> Any ideas on how to tackle this are welcome. I've Googled around for similar issues, but nobody seems to have found a solution.
> At the least, point me at where to "hack" to make Spark support these closures and I'd share some of my time to make it work. SPARK-2171 argues that support for this works out of the box, but for projects of a relatively complex size, where the driver application is contained in (or part of) a bigger application and runs on a cluster, things do not seem to work. I don't know whether SPARK-2171 was tried outside of a local[] cluster set-up, where such issues can arise.
> I saw a couple of people trying to make this work, but again they resort to work-arounds (e.g. distributing the byte-code manually ahead of time, adding a JAR with addJar, and so on).
> Can this be done? Where can we look?
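For completeness, a minimal sketch of the addJar work-around mentioned at the end of the description, assuming the Groovy sources are compiled ahead of time so the closure classes exist as .class files; every path and file name below is hypothetical:

{noformat}
// Hypothetical sketch of the addJar work-around: compile the Groovy script
// ahead of time so the closure classes (e.g. OurJob$_run_closure1) exist as
// .class files, package them into a jar, and register that jar with Spark.
import org.codehaus.groovy.control.CompilationUnit
import org.codehaus.groovy.control.CompilerConfiguration

def cfg = new CompilerConfiguration()
cfg.targetDirectory = new File('build/closure-classes')  // where the .class files land

def unit = new CompilationUnit(cfg)
unit.addSource(new File('src/jobs/OurJob.groovy'))       // the script holding the closures
unit.compile()                                           // writes OurJob.class, OurJob$_run_closure1.class, ...

// package build/closure-classes into a jar with the build tool, then:
sparkContext.addJar('build/libs/closure-classes.jar')    // executors fetch this jar for the job
{noformat}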