Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build
This is actually a false error message: the Maven build no longer requires SCALA_HOME, but the message/check was still there. This was fixed recently in master: https://github.com/apache/spark/commit/d8c005d5371f81a2a06c5d27c7021e1ae43d7193

I can back-port that fix into branch-1.0 so it will be in 1.0.1 as well. For other people running into this: you can export SCALA_HOME to any value and the build will work.

- Patrick

On Sat, May 31, 2014 at 8:34 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

Spark currently supports two build systems, sbt and Maven. sbt will download the correct version of Scala, but with Maven you need to supply it yourself and set SCALA_HOME. It sounds like the instructions need to be updated -- perhaps create a JIRA?

best,
Colin

On Sat, May 31, 2014 at 7:06 PM, Soren Macbeth so...@yieldbot.com wrote:

Hello,

Following the instructions for building Spark 1.0.0, I encountered the following error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (default) on project spark-core_2.10: An Ant BuildException has occured: Please set the SCALA_HOME (or SCALA_LIBRARY_PATH if scala is on the path) environment variables and retry.
[ERROR] around Ant part ...fail message=Please set the SCALA_HOME (or SCALA_LIBRARY_PATH if scala is on the path) environment variables and retry @ 6:126 in /Users/soren/src/spark-1.0.0/core/target/antrun/build-main.xml

Nowhere in the documentation does it mention that Scala must be installed, that either of these environment variables must be set, or what version should be used. Setting these env vars wasn't required for 0.9.1 with sbt. I was able to get past it by downloading the Scala 2.10.4 binary package to a temp dir and setting SCALA_HOME to that dir.

Ideally it would be nice not to require people to have a standalone Scala installation, but at a minimum this requirement should be documented in the build instructions, no?

-Soren
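For anyone who wants the concrete commands, the workaround looks like this (a sketch; per Patrick, the value of SCALA_HOME is never actually used, so anything will do):

    # The stale check only tests that SCALA_HOME is set; its value is ignored.
    export SCALA_HOME=/tmp
    mvn -DskipTests clean package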
Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build
I went ahead and created a JIRA for this and back-ported the improvement into branch-1.0. This wasn't a regression per se, because the behavior existed in all previous versions, but it's annoying behavior, so best to fix it.

https://issues.apache.org/jira/browse/SPARK-1984

- Patrick
Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build
Cheers, I didn't think it was needed, but just wanted to point it out.
ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda
https://github.com/apache/spark/blob/v1.0.0/core/src/main/scala/org/apache/spark/serializer/Serializer.scala#L64-L66

These changes to SerializerInstance make it really gross to call serialize and deserialize from non-Scala languages. I'm not sure what the purpose of a ClassTag is, but if we could get some overloads that don't require ClassTags, that would help a ton.
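For context, the methods in question (paraphrased from the linked Serializer.scala at the v1.0.0 tag) each take an implicit ClassTag, which a caller in a language without implicits must supply as an explicit extra argument:

    import java.nio.ByteBuffer
    import scala.reflect.ClassTag

    trait SerializerInstance {
      def serialize[T: ClassTag](t: T): ByteBuffer
      def deserialize[T: ClassTag](bytes: ByteBuffer): T
      def deserialize[T: ClassTag](bytes: ByteBuffer, loader: ClassLoader): T
    }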
Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda
Why do you need to call Serializer from your own program? It’s an internal developer API, so ideally it would only be called to extend Spark. Are you looking to implement a custom Serializer?

Matei
Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda
BTW, passing a ClassTag tells the Serializer, at compile time, the type of the object being serialized, which allows for more efficient serializers (especially on streams).

Matei
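To illustrate, a minimal sketch (not Spark code): a ClassTag captures the concrete runtime class of a type parameter even after erasure, so a serializer can inspect it and pick a specialized path:

    import scala.reflect.{classTag, ClassTag}

    // The ClassTag survives erasure and exposes the concrete runtime class.
    def runtimeClassOf[T: ClassTag]: Class[_] = classTag[T].runtimeClass

    runtimeClassOf[String]  // classOf[String]
    runtimeClassOf[Int]     // the primitive int class (Integer.TYPE)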
Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda
I'm writing a Clojure DSL for Spark. I use Kryo to serialize my Clojure functions, and for efficiency I hook into Spark's Kryo serializer. In order to do that, I get a SerializerInstance from SparkEnv and call the serialize and deserialize methods. I was able to work around it by making a ClassTag object in Clojure, but it's less than ideal.
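In Scala, the workflow described here looks roughly like the sketch below (myObject is a hypothetical payload, and SparkEnv.get and SerializerInstance are internal APIs that may change between releases):

    import org.apache.spark.SparkEnv
    import scala.reflect.ClassTag

    val myObject: AnyRef = "some value to round-trip"  // hypothetical payload
    // Grab a SerializerInstance from the running SparkEnv (internal API).
    val instance = SparkEnv.get.serializer.newInstance()
    // Without static type info, the ClassTag for AnyRef/Object is a safe default.
    val tag: ClassTag[AnyRef] = ClassTag.AnyRef
    val bytes = instance.serialize(myObject)(tag)       // java.nio.ByteBuffer
    val back  = instance.deserialize[AnyRef](bytes)(tag)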
Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda
Ah, got it. In general it will always be safe to pass the ClassTag for java.lang.Object here: this is what our Java API does to say that the type info is not known, so you can always pass that. Look at the Java code for how to get this ClassTag.

Matei
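For reference, two ways to obtain that ClassTag in Scala (a sketch; the same values are reachable from Java or Clojure through ClassTag$.MODULE$, as in the snippet in the next message):

    import scala.reflect.ClassTag

    // Predefined tag for AnyRef (java.lang.Object):
    val objectTag: ClassTag[AnyRef] = ClassTag.AnyRef
    // Equivalent explicit construction from a runtime Class:
    val viaApply: ClassTag[AnyRef] = ClassTag(classOf[Object])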
Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda
Yep, that's what I'm doing:

    (def OBJECT-CLASS-TAG (.apply ClassTag$/MODULE$ java.lang.Object))

ps - I'm planning to open-source this Clojure DSL soon as well.
Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda
Very cool, looking forward to it!

Matei