Re: Spark on Windows

2015-04-17 Thread Arun Lists
Thanks, Sree!

Are you able to run your applications using spark-submit? Even after we
were able to build successfully, we ran into problems with running the
spark-submit script. If everything worked correctly for you, we can hope
that things will be smoother when 1.4.0 is made generally available.

arun

On Thu, Apr 16, 2015 at 10:18 PM, Sree V sree_at_ch...@yahoo.com wrote:

 The Spark 'master' branch (i.e. v1.4.0) builds successfully on Windows 8.1,
 Intel i7 64-bit, with Oracle JDK 8u45,
 with MAVEN_OPTS set but without the -XX:ReservedCodeCacheSize=1g flag.
 It takes about 33 minutes.
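 For reference, a sketch of the kind of build invocation meant here; the
 memory setting is an assumption along the lines of the Spark 1.x building
 docs, not necessarily the exact command used:

 rem Windows cmd shell; ReservedCodeCacheSize deliberately left out, as above
 set MAVEN_OPTS=-Xmx2g
 mvn -DskipTests clean package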

 Thanking you.

 With Regards
 Sree




   On Thursday, April 16, 2015 9:07 PM, Arun Lists lists.a...@gmail.com
 wrote:


 Here is what I got from the engineer who worked on building Spark and
 using it on Windows:

 1)  Hadoop winutils.exe is needed on Windows, even for local files – and
 you have to set hadoop.home.dir in spark-class2.cmd (for the two lines
 with $RUNNER near the end) by adding a “-Dhadoop.home.dir=dir” flag,
 after downloading the Hadoop binaries + winutils. (A programmatic
 alternative is sketched after this list.)
 2)  Java/Spark cannot delete the spark temporary files and it throws an
 exception (program still works though).  Manual clean-up works just fine,
 and it is not a permissions issue as it has rights to create the file (I
 have also tried using my own directory rather than the default, same error).
 3)  tried building Spark again, and have attached the log – I don’t get
 any errors, just warnings.  However, when I try to use that JAR I just get
 the error message “Error: Could not find or load main class
 org.apache.spark.deploy.SparkSubmit”.
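 As a programmatic alternative to editing spark-class2.cmd, hadoop.home.dir
 can also be set as a system property before the SparkContext is created. A
 minimal sketch, not necessarily what the engineer did; the path and app
 name are placeholders:

 import org.apache.spark.{SparkConf, SparkContext}

 object WinutilsExample {
   def main(args: Array[String]): Unit = {
     // hadoop.home.dir must point at the directory containing bin\winutils.exe;
     // C:\hadoop is a placeholder.
     System.setProperty("hadoop.home.dir", "C:\\hadoop")
     val sc = new SparkContext(
       new SparkConf().setAppName("winutils-example").setMaster("local[*]"))
     // ... application code ...
     sc.stop()
   }
 }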

 On Thu, Apr 16, 2015 at 12:19 PM, Arun Lists lists.a...@gmail.com wrote:

 Thanks, Matei! We'll try that and let you know if it works. You are
 correct in inferring that some of the problems we had were with
 dependencies.

 We also had problems with the spark-submit scripts. I will get the details
 from the engineer who worked on the Windows builds and provide them to you.

 arun


 On Thu, Apr 16, 2015 at 10:44 AM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

 You could build Spark with Scala 2.11 on Mac / Linux and transfer it over
 to Windows. AFAIK it should build on Windows too, the only problem is that
 Maven might take a long time to download dependencies. What errors are you
 seeing?

 Matei

  On Apr 16, 2015, at 9:23 AM, Arun Lists lists.a...@gmail.com wrote:
 
  We run Spark on Mac and Linux but also need to run it on Windows 8.1
 and  Windows Server. We ran into problems with the Scala 2.10 binary bundle
 for Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we
 are on Scala 2.11.6 (we built Spark from the sources). On Windows, however
 despite our best efforts we cannot get Spark 1.3.0 as built from sources
 working for Scala 2.11.6. Spark has too many moving parts and dependencies!
 
  When can we expect to see a binary bundle for Spark 1.3.0 that is built
 for Scala 2.11.6?  I read somewhere that the only reason that Spark 1.3.0
 is still built for Scala 2.10 is because Kafka is still on Scala 2.10. For
  those of us who don't use Kafka, can we have a Scala 2.11 bundle?
 
  If there isn't an official bundle arriving any time soon, can someone
 who has built it for Windows 8.1 successfully please share with the group?
 
  Thanks,
  arun
 










Spark on Windows

2015-04-16 Thread Arun Lists
We run Spark on Mac and Linux but also need to run it on Windows 8.1 and
 Windows Server. We ran into problems with the Scala 2.10 binary bundle for
Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we are on
Scala 2.11.6 (we built Spark from the sources). On Windows, however, despite
our best efforts we cannot get Spark 1.3.0 built from sources to work
with Scala 2.11.6. Spark has too many moving parts and dependencies!

When can we expect to see a binary bundle for Spark 1.3.0 that is built for
Scala 2.11.6?  I read somewhere that the only reason that Spark 1.3.0 is
still built for Scala 2.10 is because Kafka is still on Scala 2.10. For
those of us who don't use Kafka, can we have a Scala 2.11 bundle?

If there isn't an official bundle arriving any time soon, can someone who
has built it for Windows 8.1 successfully please share with the group?

Thanks,
arun


Re: Spark on Windows

2015-04-16 Thread Arun Lists
Thanks, Matei! We'll try that and let you know if it works. You are correct
in inferring that some of the problems we had were with dependencies.

We also had problems with the spark-submit scripts. I will get the details
from the engineer who worked on the Windows builds and provide them to you.

arun


On Thu, Apr 16, 2015 at 10:44 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 You could build Spark with Scala 2.11 on Mac / Linux and transfer it over
 to Windows. AFAIK it should build on Windows too, the only problem is that
 Maven might take a long time to download dependencies. What errors are you
 seeing?

 Matei

  On Apr 16, 2015, at 9:23 AM, Arun Lists lists.a...@gmail.com wrote:
 
  We run Spark on Mac and Linux but also need to run it on Windows 8.1
 and  Windows Server. We ran into problems with the Scala 2.10 binary bundle
 for Spark 1.3.0 but managed to get it working. However, on Mac/Linux, we
 are on Scala 2.11.6 (we built Spark from the sources). On Windows, however
 despite our best efforts we cannot get Spark 1.3.0 as built from sources
 working for Scala 2.11.6. Spark has too many moving parts and dependencies!
 
  When can we expect to see a binary bundle for Spark 1.3.0 that is built
 for Scala 2.11.6?  I read somewhere that the only reason that Spark 1.3.0
 is still built for Scala 2.10 is because Kafka is still on Scala 2.10. For
  those of us who don't use Kafka, can we have a Scala 2.11 bundle?
 
  If there isn't an official bundle arriving any time soon, can someone
 who has built it for Windows 8.1 successfully please share with the group?
 
  Thanks,
  arun
 




Re: Registering classes with KryoSerializer

2015-04-14 Thread Arun Lists
Wow, it all works now! Thanks, Imran!

In case someone else finds this useful, here are the additional classes
that I had to register (in addition to my application specific classes):

val tuple3ArrayClass = classOf[Array[Tuple3[Any, Any, Any]]]
val anonClass = Class.forName("scala.reflect.ClassTag$$anon$1")
val javaClassClass = classOf[java.lang.Class[Any]]
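For completeness, a sketch of how these feed into the SparkConf; the app
name is a placeholder, and registrationRequired is assumed from the original
error, so treat this as an outline rather than my exact setup:

import org.apache.spark.SparkConf

val sparkConf = new SparkConf().setAppName("kryo-registration-example")
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrationRequired", "true")
// Register the extra classes above alongside your application-specific classes.
sparkConf.registerKryoClasses(Array(tuple3ArrayClass, anonClass, javaClassClass))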

arun

On Tue, Apr 14, 2015 at 6:23 PM, Imran Rashid iras...@cloudera.com wrote:

 hmm, I dunno why IntelliJ is unhappy, but you can always fall back to
 getting a class from the String:

 Class.forName("scala.reflect.ClassTag$$anon$1")

 perhaps the class is package private or something, and the repl somehow
 subverts it ...

 On Tue, Apr 14, 2015 at 5:44 PM, Arun Lists lists.a...@gmail.com wrote:

 Hi Imran,

 Thanks for the response! However, I am still not there yet.

 In the Scala interpreter, I can do:

 scala> classOf[scala.reflect.ClassTag$$anon$1]

 but when I try to do this in my program in IntelliJ, it indicates an
 error:

 Cannot resolve symbol ClassTag$$anon$1

 Hence I am not any closer to making this work. If you have any further
 suggestions, they would be most welcome.

 arun


 On Tue, Apr 14, 2015 at 2:33 PM, Imran Rashid iras...@cloudera.com
 wrote:

 Hi Arun,

 It can be hard to use kryo with required registration because of issues
 like this -- there isn't a good way to register all the classes that you
 need transitively.  In this case, it looks like one of your classes has a
 reference to a ClassTag, which in turn has a reference to some anonymous
 inner class.  I'd suggest

 (a) figuring out whether you really want to be serializing this thing --
 it's possible you're serializing an RDD which keeps a ClassTag, but normally
 you wouldn't want to serialize your RDDs
 (b) you might want to bring this up w/ chill -- spark offloads most of
 the kryo setup for all the scala internals to chill, I'm surprised they
 don't handle this already.  Looks like they still handle ClassManifests
 which are from pre-scala 2.10:
 https://github.com/twitter/chill/blob/master/chill-scala/src/main/scala/com/twitter/chill/ScalaKryoInstantiator.scala#L189

 (c) you can always register these classes yourself, despite the crazy
 names, though you'll just need to knock these out one-by-one:

 scala> classOf[scala.reflect.ClassTag$$anon$1]

 res0: Class[scala.reflect.ClassTag[T]{def unapply(x$1:
 scala.runtime.BoxedUnit): Option[_]; def arrayClass(x$1: Class[_]):
 Class[_]}] = class scala.reflect.ClassTag$$anon$1

 On Mon, Apr 13, 2015 at 6:09 PM, Arun Lists lists.a...@gmail.com
 wrote:

 Hi,

 I am trying to register classes with KryoSerializer. This has worked
 with other programs. Usually the error messages are helpful in indicating
 which classes need to be registered. But with my current program, I get the
 following cryptic error message:

 Caused by: java.lang.IllegalArgumentException: Class is not
 registered: scala.reflect.ClassTag$$anon$1

 Note: To register this class use:
 kryo.register(scala.reflect.ClassTag$$anon$1.class);

 How do I find out which class needs to be registered? I looked at my
 program and registered all classes used in RDDs. But clearly more classes
 remain to be registered if I can figure out which classes.

 Thanks for your help!

 arun








Re: Registering classes with KryoSerializer

2015-04-14 Thread Arun Lists
Hi Imran,

Thanks for the response! However, I am still not there yet.

In the Scala interpreter, I can do:

scala> classOf[scala.reflect.ClassTag$$anon$1]

but when I try to do this in my program in IntelliJ, it indicates an error:

Cannot resolve symbol ClassTag$$anon$1

Hence I am not any closer to making this work. If you have any further
suggestions, they would be most welcome.

arun


On Tue, Apr 14, 2015 at 2:33 PM, Imran Rashid iras...@cloudera.com wrote:

 Hi Arun,

 It can be hard to use kryo with required registration because of issues
 like this -- there isn't a good way to register all the classes that you
 need transitively.  In this case, it looks like one of your classes has a
 reference to a ClassTag, which in turn has a reference to some anonymous
 inner class.  I'd suggest

 (a) figuring out whether you really want to be serializing this thing --
 it's possible you're serializing an RDD which keeps a ClassTag, but normally
 you wouldn't want to serialize your RDDs
 (b) you might want to bring this up w/ chill -- spark offloads most of the
 kryo setup for all the scala internals to chill, I'm surprised they don't
 handle this already.  Looks like they still handle ClassManifests which are
 from pre-scala 2.10:
 https://github.com/twitter/chill/blob/master/chill-scala/src/main/scala/com/twitter/chill/ScalaKryoInstantiator.scala#L189

 (c) you can always register these classes yourself, despite the crazy
 names, though you'll just need to knock these out one-by-one:

 scala> classOf[scala.reflect.ClassTag$$anon$1]

 res0: Class[scala.reflect.ClassTag[T]{def unapply(x$1:
 scala.runtime.BoxedUnit): Option[_]; def arrayClass(x$1: Class[_]):
 Class[_]}] = class scala.reflect.ClassTag$$anon$1

 On Mon, Apr 13, 2015 at 6:09 PM, Arun Lists lists.a...@gmail.com wrote:

 Hi,

 I am trying to register classes with KryoSerializer. This has worked with
 other programs. Usually the error messages are helpful in indicating which
 classes need to be registered. But with my current program, I get the
 following cryptic error message:

 Caused by: java.lang.IllegalArgumentException: Class is not registered:
 scala.reflect.ClassTag$$anon$1

 Note: To register this class use:
 kryo.register(scala.reflect.ClassTag$$anon$1.class);

 How do I find out which class needs to be registered? I looked at my
 program and registered all classes used in RDDs. But clearly more classes
 remain to be registered if I can figure out which classes.

 Thanks for your help!

 arun






Registering classes with KryoSerializer

2015-04-13 Thread Arun Lists
Hi,

I am trying to register classes with KryoSerializer. This has worked with
other programs. Usually the error messages are helpful in indicating which
classes need to be registered. But with my current program, I get the
following cryptic error message:

Caused by: java.lang.IllegalArgumentException: Class is not registered:
scala.reflect.ClassTag$$anon$1

Note: To register this class use:
kryo.register(scala.reflect.ClassTag$$anon$1.class);

How do I find out which class needs to be registered? I looked at my
program and registered all classes used in RDDs. But clearly more classes
remain to be registered if I can figure out which classes.

Thanks for your help!

arun


Reading file with Unicode characters

2015-04-08 Thread Arun Lists
Hi,

Does SparkContext's textFile() method handle files with Unicode characters?
How about files in UTF-8 format?

Going further, is it possible to specify encodings to the method? If not,
what should one do if the files to be read are in some encoding?

Thanks,
arun


Re: Reading file with Unicode characters

2015-04-08 Thread Arun Lists
Thanks!

arun

On Wed, Apr 8, 2015 at 10:51 AM, java8964 java8...@hotmail.com wrote:

 Spark uses the Hadoop TextInputFormat to read the file. Since Hadoop
 effectively only supports Linux, UTF-8 is the only encoding supported, as
 it is the one used on Linux.

 If you have other encoding data, you may want to vote for this Jira:
 https://issues.apache.org/jira/browse/MAPREDUCE-232
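 In the meantime, a common workaround (just a sketch, not something Spark
 supports out of the box; sc and path are assumed to be in scope, and the
 charset is a placeholder) is to read the raw Text bytes via hadoopFile and
 decode them yourself:

 import org.apache.hadoop.io.{LongWritable, Text}
 import org.apache.hadoop.mapred.TextInputFormat

 // Decode each line with an explicit charset instead of Text.toString,
 // which assumes UTF-8.
 val lines = sc.hadoopFile(path, classOf[TextInputFormat],
     classOf[LongWritable], classOf[Text])
   .map { case (_, text) => new String(text.getBytes, 0, text.getLength, "ISO-8859-1") }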

 Yong

 --
 Date: Wed, 8 Apr 2015 10:35:18 -0700
 Subject: Reading file with Unicode characters
 From: lists.a...@gmail.com
 To: user@spark.apache.org
 CC: lists.a...@gmail.com


 Hi,

 Does SparkContext's textFile() method handle files with Unicode
 characters? How about files in UTF-8 format?

 Going further, is it possible to specify encodings to the method? If not,
 what should one do if the files to be read are in some encoding?

 Thanks,
 arun




Specifying Spark property from command line?

2015-04-07 Thread Arun Lists
Hi,

Is it possible to specify a Spark property like spark.local.dir from the
command line when running an application using spark-submit?

Thanks,
arun


Error when running Spark on Windows 8.1

2015-04-07 Thread Arun Lists
Hi,

We are trying to run a Spark application using spark-submit on Windows 8.1.
The application runs successfully to completion on MacOS 10.10 and on
Ubuntu Linux. On Windows, we get the following error messages (see below).
It appears that Spark is trying to delete some temporary directory that it
creates.

How do we solve this problem?

Thanks,
arun

15/04/07 10:55:14 ERROR Utils: Exception while deleting Spark temp dir:
C:\Users\JOSHMC~1\AppData\Local\Temp\spark-339bf2d9-8b89-46e9-b5c1-404caf9d3cd7\userFiles-62976ef7-ab56-41c0-a35b-793c7dca31c7

java.io.IOException: Failed to delete:
C:\Users\JOSHMC~1\AppData\Local\Temp\spark-339bf2d9-8b89-46e9-b5c1-404caf9d3cd7\userFiles-62976ef7-ab56-41c0-a35b-793c7dca31c7

  at
org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:932)

  at
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:181)

  at
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1$$anonfun$apply$mcV$sp$2.apply(Utils.scala:179)

  at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)

  at
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply$mcV$sp(Utils.scala:179)

  at
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:177)

  at
org.apache.spark.util.Utils$$anon$4$$anonfun$run$1.apply(Utils.scala:177)

  at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)

  at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:177)


Re: Specifying Spark property from command line?

2015-04-07 Thread Arun Lists
I just figured this out from the documentation:

--conf spark.local.dir=C:\Temp
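For context, a sketch of where that flag goes in a full spark-submit
invocation (the main class and jar name are placeholders, not my actual
application):

spark-submit --conf spark.local.dir=C:\Temp --class com.example.MyApp myapp.jar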


On Tue, Apr 7, 2015 at 5:00 PM, Arun Lists lists.a...@gmail.com wrote:

 Hi,

 Is it possible to specify a Spark property like spark.local.dir from the
 command line when running an application using spark-submit?

 Thanks,
 arun




Registering classes with KryoSerializer

2015-03-30 Thread Arun Lists
I am trying to register classes with KryoSerializer. I get the following
error message:

How do I find out what class is being referred to by OpenHashMap$mcI$sp?

com.esotericsoftware.kryo.KryoException:
java.lang.IllegalArgumentException: Class is not registered:
com.comp.common.base.OpenHashMap$mcI$sp

Note: To register this class use:
kryo.register(com.dtex.common.base.OpenHashMap$mcI$sp.class);

I have registered other classes with it by using:

sparkConf.registerKryoClasses(Array(
  classOf[MyClass]
))
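My best guess so far: $mcI$sp looks like the Int-specialized variant that
scalac generates for a @specialized class. If so, a sketch of registering it
by its runtime name (reusing the class name from the error message above;
adjust the package to match your build) would be:

// Registers the specialized variant by looking it up at runtime.
sparkConf.registerKryoClasses(Array(
  Class.forName("com.comp.common.base.OpenHashMap$mcI$sp")
))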


Thanks,

arun


ClassNotFoundException when registering classes with Kryo

2015-02-01 Thread Arun Lists
Here is the relevant snippet of code in my main program:

===

sparkConf.set("spark.serializer",
  "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrationRequired", "true")
val summaryDataClass = classOf[SummaryData]
val summaryViewClass = classOf[SummaryView]
sparkConf.registerKryoClasses(Array(
  summaryDataClass, summaryViewClass))

===

I get the following error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
...

Caused by: org.apache.spark.SparkException: Failed to load class to
register with Kryo
...

Caused by: java.lang.ClassNotFoundException:
com.dtex.analysis.transform.SummaryData


Note that the class in question SummaryData is in the same package as the
main program and hence in the same jar.

What do I need to do to make this work?

Thanks,
arun


Re: ClassNotFoundException when registering classes with Kryo

2015-02-01 Thread Arun Lists
Thanks for the notification!

For now, I'll use the Kryo serializer without registering classes until the
bug fix has been merged into the next version of Spark (I guess that will
be 1.3, right?).
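Concretely, the interim configuration amounts to the following (a sketch;
sparkConf refers to the SparkConf from the snippet above):

// Keep Kryo, but leave spark.kryo.registrationRequired at its default (false)
// so unregistered classes still serialize, at the cost of writing out their
// full class names.
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")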

arun


On Sun, Feb 1, 2015 at 10:58 PM, Shixiong Zhu zsxw...@gmail.com wrote:

 It's a bug that has been fixed in
 https://github.com/apache/spark/pull/4258 but has not yet been merged.

 Best Regards,
 Shixiong Zhu

 2015-02-02 10:08 GMT+08:00 Arun Lists lists.a...@gmail.com:

 Here is the relevant snippet of code in my main program:

 ===

 sparkConf.set("spark.serializer",
   "org.apache.spark.serializer.KryoSerializer")
 sparkConf.set("spark.kryo.registrationRequired", "true")
 val summaryDataClass = classOf[SummaryData]
 val summaryViewClass = classOf[SummaryView]
 sparkConf.registerKryoClasses(Array(
   summaryDataClass, summaryViewClass))

 ===

 I get the following error:

 Exception in thread "main" java.lang.reflect.InvocationTargetException
 ...

 Caused by: org.apache.spark.SparkException: Failed to load class to
 register with Kryo
 ...

 Caused by: java.lang.ClassNotFoundException:
 com.dtex.analysis.transform.SummaryData


 Note that the class in question SummaryData is in the same package as the
 main program and hence in the same jar.

 What do I need to do to make this work?

 Thanks,
 arun






Re: Reading resource files in a Spark application

2015-01-14 Thread Arun Lists
The problem is that it gives an error message saying something to the
effect that:

URI is not hierarchical

This is consistent with your explanation.

Thanks,
arun


On Wed, Jan 14, 2015 at 1:14 AM, Sean Owen so...@cloudera.com wrote:

 My hunch is that it is because the URI of a resource in a JAR file will
 necessarily be specific to where the JAR is on the local filesystem and
 that is not portable or the right way to read a resource. But you didn't
 specify the problem here.
 On Jan 14, 2015 5:15 AM, Arun Lists lists.a...@gmail.com wrote:

 I experimented with using getResourceAsStream(cls, fileName) instead of
 cls.getResource(fileName).toURI. That works!

 I have no idea why the latter method does not work in Spark. Any
 explanations would be welcome.

 Thanks,
 arun


 On Tue, Jan 13, 2015 at 6:35 PM, Arun Lists lists.a...@gmail.com wrote:

 In some classes, I initialize some values from resource files using the
 following snippet:

 new File(cls.getResource(fileName).toURI)

 This works fine in SBT. When I run it using spark-submit, I get a bunch of 
 errors because the classes cannot be initialized. What can I do to make 
 such initialization Spark-friendly?

 Thanks,
 arun






Re: Reading resource files in a Spark application

2015-01-13 Thread Arun Lists
I experimented with using getResourceAsStream(cls, fileName) instead of
cls.getResource(fileName).toURI. That works!
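In case it helps anyone else, a minimal sketch of the working pattern (the
object name and resource path are placeholders, and getResourceAsStream is
called directly on the class here rather than through the helper mentioned
above):

import scala.io.Source

object ResourceExample {
  // Read the resource as a stream instead of converting its URI to a File.
  def readResource(): String = {
    val stream = getClass.getResourceAsStream("/data/defaults.txt")
    Source.fromInputStream(stream, "UTF-8").mkString
  }
}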

I have no idea why the latter method does not work in Spark. Any
explanations would be welcome.

Thanks,
arun


On Tue, Jan 13, 2015 at 6:35 PM, Arun Lists lists.a...@gmail.com wrote:

 In some classes, I initialize some values from resource files using the
 following snippet:

 new File(cls.getResource(fileName).toURI)

 This works fine in SBT. When I run it using spark-submit, I get a bunch of 
 errors because the classes cannot be initialized. What can I do to make such 
 initialization Spark-friendly?

 Thanks,
 arun





Re: Running Spark application from command line

2015-01-13 Thread Arun Lists
Yes, I am running with Scala 2.11. Here is what I see when I do scala
-version

 scala -version

Scala code runner version 2.11.4 -- Copyright 2002-2013, LAMP/EPFL

On Tue, Jan 13, 2015 at 2:30 AM, Sean Owen so...@cloudera.com wrote:

 It sounds like possibly a Scala version mismatch? are you sure you're
 running with Scala 2.11 too?

 On Tue, Jan 13, 2015 at 6:58 AM, Arun Lists lists.a...@gmail.com wrote:
  I have a Spark application that was assembled using sbt 0.13.7, Scala 2.11,
  and Spark 1.2.0 in build.sbt. I am running on Mac OS X Yosemite.
 
  I use "provided" for the Spark dependencies. I can run the application fine
  within sbt.
 
  I run into problems when I try to run it from the command line. Here is
 the
  command I use:
 
  ADD_JARS=analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar scala -cp
 
 /Applications/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar
  com.dtex.analysis.transform.GenUserSummaryView ...
 
  I get the following error messages below. Please advise what I can do to
  resolve this issue. Thanks!
 
  arun
 
  15/01/12 22:47:18 WARN NativeCodeLoader: Unable to load native-hadoop
  library for your platform... using builtin-java classes where applicable
 
  15/01/12 22:47:18 WARN BlockManager: Putting block broadcast_0 failed
 
  java.lang.NoSuchMethodError:
  scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
 
  at
 
 org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:84)
 
  at
 
 org.apache.spark.util.collection.SizeTracker$class.resetSamples(SizeTracker.scala:61)
 
  at
 
 org.apache.spark.util.collection.SizeTrackingVector.resetSamples(SizeTrackingVector.scala:25)
 
  at
 
 org.apache.spark.util.collection.SizeTracker$class.$init$(SizeTracker.scala:51)
 
  at
 
 org.apache.spark.util.collection.SizeTrackingVector.init(SizeTrackingVector.scala:25)
 
  at
 org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:236)
 
  at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
 
  at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
 
  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
 
  at
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
 
  at
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
 
  at
 
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
 
  at
 
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
 
  at
 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
 
  at
 
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 
  at
 
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 
  at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
 
  at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
 
  at org.apache.spark.SparkContext.textFile(SparkContext.scala:540)
 
  at
 
 com.dtex.analysis.transform.TransformUtils$anonfun$2.apply(TransformUtils.scala:97)
 
  at
 
 com.dtex.analysis.transform.TransformUtils$anonfun$2.apply(TransformUtils.scala:97)
 
  at
 
 scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:245)
 
  at
 
 scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:245)
 
  at
 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
 
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
 
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
 
  at
 
 com.dtex.analysis.transform.TransformUtils$.generateUserSummaryData(TransformUtils.scala:97)
 
  at
 
 com.dtex.analysis.transform.GenUserSummaryView$.main(GenUserSummaryView.scala:77)
 
  at
 
 com.dtex.analysis.transform.GenUserSummaryView.main(GenUserSummaryView.scala)
 
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
  at java.lang.reflect.Method.invoke(Method.java:483)
 
  at
 
 scala.reflect.internal.util.ScalaClassLoader$anonfun$run$1.apply(ScalaClassLoader.scala:70)
 
  at
 
 scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
 
  at
 
 scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:101)
 
  at
 
 scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:70)
 
  at
 
 scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
 
  at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:22)
 
  at scala.tools.nsc.ObjectRunner$.run

Reading resource files in a Spark application

2015-01-13 Thread Arun Lists
In some classes, I initialize some values from resource files using the
following snippet:

new File(cls.getResource(fileName).toURI)

This works fine in SBT. When I run it using spark-submit, I get a
bunch of errors because the classes cannot be initialized. What can I
do to make such initialization Spark-friendly?

Thanks,
arun


Running Spark application from command line

2015-01-12 Thread Arun Lists
I have a Spark application that was assembled using sbt 0.13.7, Scala 2.11,
and Spark 1.2.0 in build.sbt. I am running on Mac OS X Yosemite.

I use "provided" for the Spark dependencies. I can run the application fine
within sbt.

I run into problems when I try to run it from the command line. Here is the
command I use:

ADD_JARS=analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar scala -cp
/Applications/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:analysis/target/scala-2.11/dtex-analysis_2.11-0.1.jar
com.dtex.analysis.transform.GenUserSummaryView ...

I get the following error messages below. Please advise what I can do to
resolve this issue. Thanks!

arun
15/01/12 22:47:18 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable

15/01/12 22:47:18 WARN BlockManager: Putting block broadcast_0 failed

java.lang.NoSuchMethodError:
scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;

at
org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:84)

at
org.apache.spark.util.collection.SizeTracker$class.resetSamples(SizeTracker.scala:61)

at
org.apache.spark.util.collection.SizeTrackingVector.resetSamples(SizeTrackingVector.scala:25)

at
org.apache.spark.util.collection.SizeTracker$class.$init$(SizeTracker.scala:51)

at
org.apache.spark.util.collection.SizeTrackingVector.init(SizeTrackingVector.scala:25)

at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:236)

at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)

at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)

at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)

at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)

at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)

at
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)

at
org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)

at
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)

at
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)

at
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)

at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)

at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)

at org.apache.spark.SparkContext.textFile(SparkContext.scala:540)

at
com.dtex.analysis.transform.TransformUtils$anonfun$2.apply(TransformUtils.scala:97)

at
com.dtex.analysis.transform.TransformUtils$anonfun$2.apply(TransformUtils.scala:97)

at
scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:245)

at
scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:245)

at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)

at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)

at
com.dtex.analysis.transform.TransformUtils$.generateUserSummaryData(TransformUtils.scala:97)

at
com.dtex.analysis.transform.GenUserSummaryView$.main(GenUserSummaryView.scala:77)

at
com.dtex.analysis.transform.GenUserSummaryView.main(GenUserSummaryView.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:483)

at
scala.reflect.internal.util.ScalaClassLoader$anonfun$run$1.apply(ScalaClassLoader.scala:70)

at
scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)

at
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:101)

at
scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:70)

at
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)

at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:22)

at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:39)

at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:29)

at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:39)

at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:65)

at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:87)

at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:98)

at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:103)

at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)