Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build

2014-06-01 Thread Patrick Wendell
This is actually a false error message: the Maven build no longer
requires SCALA_HOME, but the message/check was still there. This was
fixed recently in master:

https://github.com/apache/spark/commit/d8c005d5371f81a2a06c5d27c7021e1ae43d7193

I can backport that fix into branch-1.0 so it will be in 1.0.1 as
well. For other people running into this, you can export SCALA_HOME to
any value and it will work.

- Patrick

On Sat, May 31, 2014 at 8:34 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:
 Spark currently supports two build systems, sbt and Maven. sbt will
 download the correct version of Scala, but with Maven you need to supply it
 yourself and set SCALA_HOME.

 It sounds like the instructions need to be updated-- perhaps create a JIRA?

 best,
 Colin


 On Sat, May 31, 2014 at 7:06 PM, Soren Macbeth so...@yieldbot.com wrote:

 Hello,

 Following the instructions for building spark 1.0.0, I encountered the
 following error:

 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-antrun-plugin:1.7:run (default) on project
 spark-core_2.10: An Ant BuildException has occured: Please set the
 SCALA_HOME (or SCALA_LIBRARY_PATH if scala is on the path) environment
 variables and retry.
 [ERROR] around Ant part ...fail message=Please set the SCALA_HOME (or
 SCALA_LIBRARY_PATH if scala is on the path) environment variables and
 retry @ 6:126 in
 /Users/soren/src/spark-1.0.0/core/target/antrun/build-main.xml

 Nowhere in the documentation does it mention that Scala needs to be installed,
 that either of these env vars needs to be set, or which version should be
 installed. Setting these env vars wasn't required for 0.9.1 with sbt.

 I was able to get past it by downloading the Scala 2.10.4 binary package to
 a temp dir and setting SCALA_HOME to that dir.

 Ideally, it would be nice not to require people to have a standalone Scala
 installation, but at a minimum this requirement should be documented in the
 build instructions, no?

 -Soren



Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build

2014-06-01 Thread Patrick Wendell
I went ahead and created a JIRA for this and backported the
improvement into branch-1.0. This wasn't a regression per se, because
the behavior existed in all previous versions, but it's annoying
behavior, so best to fix it.

https://issues.apache.org/jira/browse/SPARK-1984

- Patrick

On Sun, Jun 1, 2014 at 11:13 AM, Patrick Wendell pwend...@gmail.com wrote:
 This is a false error message actually - the Maven build no longer
 requires SCALA_HOME but the message/check was still there. This was
 fixed recently in master:

 https://github.com/apache/spark/commit/d8c005d5371f81a2a06c5d27c7021e1ae43d7193

 I can back port that fix into branch-1.0 so it will be in 1.0.1 as
 well. For other people running into this, you can export SCALA_HOME to
 any value and it will work.

 - Patrick

 On Sat, May 31, 2014 at 8:34 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:
 Spark currently supports two build systems, sbt and maven.  sbt will
 download the correct version of scala, but with Maven you need to supply it
 yourself and set SCALA_HOME.

 It sounds like the instructions need to be updated-- perhaps create a JIRA?

 best,
 Colin


 On Sat, May 31, 2014 at 7:06 PM, Soren Macbeth so...@yieldbot.com wrote:

 Hello,

 Following the instructions for building spark 1.0.0, I encountered the
 following error:

 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-antrun-plugin:1.7:run (default) on project
 spark-core_2.10: An Ant BuildException has occured: Please set the
 SCALA_HOME (or SCALA_LIBRARY_PATH if scala is on the path) environment
 variables and retry.
 [ERROR] around Ant part ...fail message=Please set the SCALA_HOME (or
 SCALA_LIBRARY_PATH if scala is on the path) environment variables and
 retry @ 6:126 in
 /Users/soren/src/spark-1.0.0/core/target/antrun/build-main.xml

 No where in the documentation does it mention that having scala installed
 and either of these env vars set nor what version should be installed.
 Setting these env vars wasn't required for 0.9.1 with sbt.

 I was able to get past it by downloading the scala 2.10.4 binary package to
 a temp dir and setting SCALA_HOME to that dir.

 Ideally, it would be nice to not have to require people to have a
 standalone scala installation but at a minimum this requirement should be
 documented in the build instructions no?

 -Soren



Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build

2014-06-01 Thread Soren Macbeth
Cheers, I didn't think it was needed, but just wanted to point it out.




ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda

2014-06-01 Thread Soren Macbeth
https://github.com/apache/spark/blob/v1.0.0/core/src/main/scala/org/apache/spark/serializer/Serializer.scala#L64-L66

These changes to SerializerInstance make it really gross to call
serialize and deserialize from non-Scala languages. I'm not sure what the
purpose of a ClassTag is, but if we could get some other arities that don't
require ClassTags, that would help a ton.
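
For readers not familiar with context bounds, here is a minimal sketch (plain
Scala 2.10, no Spark dependency; the object and method names are made up for
illustration) of the pattern the linked lines use. A [T: ClassTag] parameter
is invisible to Scala callers, but it compiles down to an extra ClassTag
argument on the JVM, which is what Java and Clojure callers end up having to
supply by hand.

    import scala.reflect.ClassTag

    // Sketch only: a method with a ClassTag context bound. Scala callers
    // never see the extra parameter; other JVM languages must pass it.
    object ContextBoundSketch {
      def describe[T: ClassTag](t: T): String =
        t.toString + " : " + implicitly[ClassTag[T]].runtimeClass.getName

      def main(args: Array[String]): Unit = {
        println(describe("hello"))                             // tag filled in by scalac
        println(describe("hello")(ClassTag(classOf[String])))  // the desugared, explicit form
      }
    }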


Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda

2014-06-01 Thread Matei Zaharia
Why do you need to call Serializer from your own program? It’s an internal 
developer API so ideally it would only be called to extend Spark. Are you 
looking to implement a custom Serializer?

Matei




Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda

2014-06-01 Thread Matei Zaharia
BTW, passing a ClassTag tells the Serializer what the type of the object being
serialized is, as known when you compile your program, which allows for more
efficient serializers (especially on streams).

Matei
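
As a toy illustration of the point above (this is not Spark's serializer code,
just a generic example of what compile-time type information buys you): with a
ClassTag a method can allocate properly typed arrays and avoid boxing; erased
to Object, it cannot.

    import scala.reflect.ClassTag

    // Toy example only, not Spark's implementation. With a ClassTag the
    // element type is known at runtime, so fill can build an Array[Int]
    // with no boxing; erased to Any, the same call yields a boxed Object[].
    object WhyClassTags {
      def fill[T: ClassTag](n: Int, value: T): Array[T] = {
        val out = new Array[T](n)   // requires the ClassTag
        var i = 0
        while (i < n) { out(i) = value; i += 1 }
        out
      }

      def main(args: Array[String]): Unit = {
        val primitive = fill(4, 42)       // Array[Int], unboxed
        val erased    = fill[Any](4, 42)  // Array[Any], boxed Integers
        println(primitive.getClass.getSimpleName)  // int[]
        println(erased.getClass.getSimpleName)     // Object[]
      }
    }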




Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda

2014-06-01 Thread Soren Macbeth
I'm writing a Clojure DSL for Spark. I use Kryo to serialize my Clojure
functions, and for efficiency I hook into Spark's Kryo serializer. In order
to do that I get a SerializerInstance from SparkEnv and call the serialize
and deserialize methods. I was able to work around it by making a ClassTag
object in Clojure, but it's less than ideal.
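
For anyone following along, here is roughly what that flow looks like from
Scala. This is a sketch only: it assumes Spark 1.0.x on the classpath and a
live local SparkContext so that SparkEnv.get is populated, and the object and
value names are made up.

    import java.nio.ByteBuffer
    import scala.reflect.ClassTag
    import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

    // Sketch of the workaround described above: grab a SerializerInstance
    // from SparkEnv and pass an explicit ClassTag when the static type is
    // not known (the non-Scala case).
    object SerializerInstanceSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local").setAppName("serializer-sketch"))

        val instance = SparkEnv.get.serializer.newInstance()

        // Explicitly pass the ClassTag for java.lang.Object, i.e. "type unknown".
        val tag: ClassTag[AnyRef] = ClassTag.AnyRef
        val bytes: ByteBuffer = instance.serialize[AnyRef]("hello")(tag)
        val back = instance.deserialize[AnyRef](bytes)(tag)

        println(back)  // hello
        sc.stop()
      }
    }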




Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda

2014-06-01 Thread Matei Zaharia
Ah, got it. In general it will always be safe to pass the ClassTag for 
java.lang.Object here — this is what our Java API does to say that type info is 
not known. So you can always pass that. Look at the Java code for how to get 
this ClassTag.

Matei
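
For reference, a small sketch (plain Scala 2.10, illustrative names) showing
two equivalent ways to construct that "type info not known" tag; from Java the
same factory is reachable as scala.reflect.ClassTag$.MODULE$.apply(Object.class).

    import scala.reflect.ClassTag

    // Two ways to get the ClassTag for java.lang.Object, which is what the
    // Java API passes when the element type is not known at compile time.
    object ObjectTagSketch {
      val viaFactory: ClassTag[AnyRef]  = ClassTag(classOf[AnyRef])
      val viaConstant: ClassTag[AnyRef] = ClassTag.AnyRef

      def main(args: Array[String]): Unit = {
        // Both carry java.lang.Object as their runtime class.
        println(viaFactory.runtimeClass)   // class java.lang.Object
        println(viaConstant.runtimeClass)  // class java.lang.Object
      }
    }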




Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda

2014-06-01 Thread Soren Macbeth
Yep, that's what I'm doing.

(def OBJECT-CLASS-TAG (.apply ClassTag$/MODULE$ java.lang.Object))

PS: I'm planning to open-source this Clojure DSL soon as well.




Re: ClassTag in Serializer in 1.0.0 makes non-scala callers sad panda

2014-06-01 Thread Matei Zaharia
Very cool, looking forward to it!

Matei
