Re: mutable.LinkedHashMap kryo serialization issues

2016-08-26 Thread Rahul Palamuttam
Hi,

I apologize, I spoke too soon.
Those transient member variables may not be the issue.

To clarify my test case: I am creating a LinkedHashMap with two elements in
a map expression on an RDD.
Note that the LinkedHashMaps are being created on the worker JVMs (not the
driver JVM) and THEN collected to the driver JVM.
I am NOT creating LinkedHashMaps on the driver and then parallelizing them
(sending them to worker JVMs).

As Renato said, Spark requires us to register classes that aren't yet covered by
Chill.
As far as I know there are three ways to register them, all through API
calls on SparkConf (a combined sketch of all three follows the list):

1. sparkConf().registerKryoClasses(Array(classOf[...], classOf[...]))
* This is the method of registering classes as described in the Tuning page:
http://spark.apache.org/docs/latest/tuning.html#data-serialization

2. sparkConf().set("spark.kryo.classesToRegister", "cName1, cName2")

3. sparkConf().set("spark.kryo.registrator", "registrator1, registrator2")

In the first two methods, which register the classes with Kryo,
what I get are empty mutable.LinkedHashMaps after calling collect on the
RDD.
To the best of my understanding this should not happen (none of the other
collection classes I have used has this problem).

For the third method I created a registrator for mutable.LinkedHashMap
which can be found here :
https://gist.github.com/rahulpalamuttam/9f3bfa39a160efa80844d3a7a7bd87cd
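
For readers who don't want to click through, below is a minimal sketch of the
approach; it is my illustration of a custom serializer plus registrator, not
necessarily identical to the gist. It writes the map as (size, key, value, key,
value, ...) and rebuilds it entry by entry on read, so the transient
firstEntry/lastEntry links are reconstructed by the map itself.

import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.spark.serializer.KryoRegistrator
import scala.collection.mutable

// Illustrative serializer: write the entry count followed by each key/value pair.
class MutableLinkedHashMapSerializer[K, V]
    extends Serializer[mutable.LinkedHashMap[K, V]] {

  override def write(kryo: Kryo, output: Output,
                     map: mutable.LinkedHashMap[K, V]): Unit = {
    output.writeInt(map.size, true)
    map.foreach { case (k, v) =>
      kryo.writeClassAndObject(output, k)
      kryo.writeClassAndObject(output, v)
    }
  }

  override def read(kryo: Kryo, input: Input,
                    clazz: Class[mutable.LinkedHashMap[K, V]]): mutable.LinkedHashMap[K, V] = {
    val size = input.readInt(true)
    val map = mutable.LinkedHashMap.empty[K, V]
    (0 until size).foreach { _ =>
      val k = kryo.readClassAndObject(input).asInstanceOf[K]
      val v = kryo.readClassAndObject(input).asInstanceOf[V]
      map.put(k, v)   // re-inserting restores the transient linked-list structure
    }
    map
  }
}

// Registrator referenced by spark.kryo.registrator (the real one lives in org.dia.core).
class MutableLinkedHashMapRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[mutable.LinkedHashMap[_, _]],
      new MutableLinkedHashMapSerializer[Any, Any]())
  }
}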

I set the registrator like so :
sparkConf().set("spark.kryo.registrator",
"org.dia.core.MutableLinkedHashMapRegistrator").
Now, when I do the same test, I get an Array of LinkedHashMaps.
Each LinkedHashMap contains the entries I populated it with in the map task.

Why do the first two methods result in improper serialization of
mutable.LinkedHashMap?
Should I file a JIRA for it?

Much credit should be given to Martin Grotzke from EsotericSoftware/kryo
who helped me tremendously.

Best,

Rahul Palamuttam




On Fri, Aug 26, 2016 at 10:16 AM, Rahul Palamuttam <rahulpala...@gmail.com>
wrote:

> Thanks Renato.
>
> I forgot to reply all last time. I apologize for the rather confusing
> example.
> All that the snippet code did was:
> 1. Make an RDD of LinkedHashMaps, each of size 2
> 2. On the worker side get the sizes of the hash maps (via a map(hash =>
> hash.size))
> 3. On the driver call collect on the RDD[Int] (the RDD of hash map
> sizes), giving you an Array[Int]
> 4. On the driver call collect on the RDD[LinkedHashMap], giving you an
> Array[LinkedHashMap]
> 5. Check the size of a hash map in Array[LinkedHashMap] against any value
> in Array[Int] (they're all going to be the same size).
> 6. The sizes differ because the elements of the LinkedHashMap were never
> copied over.
>
> Anyway I think I've tracked down the issue and it doesn't seem to be a
> Spark or Kryo issue.
>
> For those it concerns: LinkedHashMap has this serialization issue because
> its firstEntry and lastEntry members are marked transient.
> Take a look here :
> https://github.com/scala/scala/blob/v2.11.8/src/library/scala/collection/mutable/LinkedHashMap.scala#L62
>
> Those attributes are not going to be serialized.
> Furthermore, the iterator on LinkedHashMap depends on the firstEntry
> variable; since that member is not serialized it is null.
> The iterator requires firstEntry to walk the LinkedHashMap:
> https://github.com/scala/scala/blob/v2.11.8/src/library/scala/collection/mutable/LinkedHashMap.scala#L94-L100
>
> I wonder why these two variables were made transient.
>
> Best,
> Rahul Palamuttam
>
>
> On Thu, Aug 25, 2016 at 11:13 PM, Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hi Rahul,
>>
>> You have probably already figured this one out, but anyway...
>> You need to register the classes that you'll be using with Kryo because
>> it does not support all Serializable types and requires you to register the
>> classes you’ll use in the program in advance. So when you don't register
>> the class, Kryo doesn't know how to serialize/deserialize it.
>>
>>
>> Best,
>>
>> Renato M.
>>
>> 2016-08-22 17:12 GMT+02:00 Rahul Palamuttam <rahulpala...@gmail.com>:
>>
>>> Hi,
>>>
>>> Just sending this again to see if others have had this issue.
>>>
>>> I recently switched to using kryo serialization and I've been running
>>> into errors
>>> with the mutable.LinkedHashMap class.
>>>
>>> If I don't register the mutable.LinkedHashMap class then I get an
>>> ArrayStoreException seen below.
>>> If I do register the class, then when the LinkedHashMap is collected on
>>> the driver, it does not contain any elements.
>>>

Re: mutable.LinkedHashMap kryo serialization issues

2016-08-26 Thread Rahul Palamuttam
Thanks Renato.

I forgot to reply all last time. I apologize for the rather confusing
example.
All that the snippet code did was:
1. Make an RDD of LinkedHashMaps, each of size 2
2. On the worker side get the sizes of the hash maps (via a map(hash =>
hash.size))
3. On the driver call collect on the RDD[Int] (the RDD of hash map
sizes), giving you an Array[Int]
4. On the driver call collect on the RDD[LinkedHashMap], giving you an
Array[LinkedHashMap]
5. Check the size of a hash map in Array[LinkedHashMap] against any value
in Array[Int] (they're all going to be the same size).
6. The sizes differ because the elements of the LinkedHashMap were never
copied over.

Anyway I think I've tracked down the issue and it doesn't seem to be a
Spark or Kryo issue.

For those it concerns: LinkedHashMap has this serialization issue because
its firstEntry and lastEntry members are marked transient.
Take a look here :
https://github.com/scala/scala/blob/v2.11.8/src/library/scala/collection/mutable/LinkedHashMap.scala#L62

Those attributes are not going to be serialized.
Furthermore, the iterator on LinkedHashMap depends on the firstEntry
variable; since that member is not serialized it is null.
The iterator requires firstEntry to walk the LinkedHashMap:
https://github.com/scala/scala/blob/v2.11.8/src/library/scala/collection/mutable/LinkedHashMap.scala#L94-L100

I wonder why these two variables were made transient.
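
For anyone who wants to reproduce the effect outside of Spark, here is a small,
illustrative round trip through plain Kryo (assuming Scala 2.11 and Kryo's
default FieldSerializer, which skips transient fields); the point is just that
the deserialized map comes back looking empty, mirroring what we see after
collect(). The object and variable names are mine, not from any real project.

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}
import scala.collection.mutable

object TransientFieldDemo {
  def main(args: Array[String]): Unit = {
    val kryo = new Kryo()
    kryo.setRegistrationRequired(false)   // fall back to FieldSerializer for unregistered classes

    val original = mutable.LinkedHashMap("hello" -> "bonjour", "good" -> "bueno")

    // Serialize with the default FieldSerializer, which ignores transient fields
    // such as firstEntry/lastEntry and the underlying hash table.
    val output = new Output(1024, -1)
    kryo.writeClassAndObject(output, original)

    val input = new Input(output.toBytes)
    val copy = kryo.readClassAndObject(input)
      .asInstanceOf[mutable.LinkedHashMap[String, String]]

    // Expected (on my understanding of 2.11's LinkedHashMap): 2 vs 0.
    println(s"original.size = ${original.size}, copy.size = ${copy.size}")
  }
}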

Best,
Rahul Palamuttam


On Thu, Aug 25, 2016 at 11:13 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Rahul,
>
> You have probably already figured this one out, but anyway...
> You need to register the classes that you'll be using with Kryo because it
> does not support all Serializable types and requires you to register the
> classes you’ll use in the program in advance. So when you don't register
> the class, Kryo doesn't know how to serialize/deserialize it.
>
>
> Best,
>
> Renato M.
>
> 2016-08-22 17:12 GMT+02:00 Rahul Palamuttam <rahulpala...@gmail.com>:
>
>> Hi,
>>
>> Just sending this again to see if others have had this issue.
>>
>> I recently switched to using kryo serialization and I've been running
>> into errors
>> with the mutable.LinkedHashMap class.
>>
>> If I don't register the mutable.LinkedHashMap class then I get an
>> ArrayStoreException seen below.
>> If I do register the class, then when the LinkedHashMap is collected on
>> the driver, it does not contain any elements.
>>
>> Here is the snippet of code I used :
>>
>> val sc = new SparkContext(new SparkConf()
>>   .setMaster("local[*]")
>>   .setAppName("Sample")
>>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>>   .registerKryoClasses(Array(classOf[mutable.LinkedHashMap[String, 
>> String]])))
>>
>> val collect = sc.parallelize(0 to 10)
>>   .map(p => new mutable.LinkedHashMap[String, String]() ++= Array(("hello", 
>> "bonjour"), ("good", "bueno")))
>>
>> val mapSideSizes = collect.map(p => p.size).collect()(0)
>> val driverSideSizes = collect.collect()(0).size
>>
>> println("The sizes before collect : " + mapSideSizes)
>> println("The sizes after collect : " + driverSideSizes)
>>
>>
>> ** The following only occurs if I did not register the
>> mutable.LinkedHashMap class **
>> 16/08/20 18:10:38 ERROR TaskResultGetter: Exception while getting task
>> result
>> java.lang.ArrayStoreException: scala.collection.mutable.HashMap
>> at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
>> at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>> at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:311)
>> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
>> at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> I hope this is a known issue and/or I'm missing something important in my
>> setup.
>> Appreciate any help or advice!
>>
>> Best,
>>
>> Rahul Palamuttam
>>
>
>


mutable.LinkedHashMap kryo serialization issues

2016-08-22 Thread Rahul Palamuttam
Hi,

Just sending this again to see if others have had this issue.

I recently switched to using kryo serialization and I've been running into 
errors
with the mutable.LinkedHashMap class.

If I don't register the mutable.LinkedHashMap class then I get an 
ArrayStoreException seen below.
If I do register the class, then when the LinkedHashMap is collected on the 
driver, it does not contain any elements.

Here is the snippet of code I used : 
val sc = new SparkContext(new SparkConf()
  .setMaster("local[*]")
  .setAppName("Sample")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[mutable.LinkedHashMap[String, String]])))

val collect = sc.parallelize(0 to 10)
  .map(p => new mutable.LinkedHashMap[String, String]() ++= Array(("hello", 
"bonjour"), ("good", "bueno")))

val mapSideSizes = collect.map(p => p.size).collect()(0)
val driverSideSizes = collect.collect()(0).size

println("The sizes before collect : " + mapSideSizes)
println("The sizes after collect : " + driverSideSizes)

** The following only occurs if I did not register the mutable.LinkedHashMap 
class **
16/08/20 18:10:38 ERROR TaskResultGetter: Exception while getting task result
java.lang.ArrayStoreException: scala.collection.mutable.HashMap
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at 
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:311)
at 
org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I hope this is a known issue and/or I'm missing something important in my setup.
Appreciate any help or advice!

Best,

Rahul Palamuttam

Re: Unsubscribe

2016-08-21 Thread Rahul Palamuttam
Hi Sudhanshu,

Try user-unsubscribe@spark.apache.org

- Rahul P



Sent from my iPhone
> On Aug 21, 2016, at 9:19 AM, Sudhanshu Janghel 
>  wrote:
> 
> Hello,
> 
> I wish to unsubscribe from the channel.
> 
> KIND REGARDS, 
> SUDHANSHU


mutable.LinkedHashMap kryo serialization issues

2016-08-20 Thread Rahul Palamuttam
Hi,

I recently switched to using kryo serialization and I've been running into
errors
with the mutable.LinkedHashMap class.

If I don't register the mutable.LinkedHashMap class then I get an
ArrayStoreException seen below.
If I do register the class, then when the LinkedHashMap is collected on the
driver, it does not contain any elements.

Here is the snippet of code I used :

val sc = new SparkContext(new SparkConf()
  .setMaster("local[*]")
  .setAppName("Sample")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[mutable.LinkedHashMap[String, String]])))

val collect = sc.parallelize(0 to 10)
  .map(p => new mutable.LinkedHashMap[String, String]() ++=
Array(("hello", "bonjour"), ("good", "bueno")))

val mapSideSizes = collect.map(p => p.size).collect()(0)
val driverSideSizes = collect.collect()(0).size

println("The sizes before collect : " + mapSideSizes)
println("The sizes after collect : " + driverSideSizes)


** The following only occurs if I did not register the
mutable.LinkedHashMap class **
16/08/20 18:10:38 ERROR TaskResultGetter: Exception while getting task
result
java.lang.ArrayStoreException: scala.collection.mutable.HashMap
at
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
at
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:311)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I hope this is a known issue and/or I'm missing something important in my
setup.
Appreciate any help or advice!

Best,

Rahul Palamuttam


Renaming sc variable in sparkcontext throws task not serializable

2016-03-02 Thread Rahul Palamuttam
Hi All,

We recently came across this issue when using the spark-shell and Zeppelin.
If we assign the SparkContext variable (sc) to a new variable and reference
another variable in an RDD lambda expression, we get a task-not-serializable
exception.

The following three lines of code illustrate this :

val temp = 10
val newSC = sc
val newRDD = newSC.parallelize(0 to 100).map(p => p + temp)

I am not sure if this is a known issue, or we should file a JIRA for it.
We originally came across this bug in the SciSpark project.
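
For what it's worth, here is a hedged sketch of how we currently sidestep it in
the shell, assuming the problem is the closure capturing the REPL wrapper object
that holds the aliased context; passing the needed values as plain parameters
keeps the SparkContext alias out of the serialized closure. The helper name is
made up.

// Illustrative spark-shell session. The lambda only captures the Int
// parameter `constant`, not the REPL line objects that hold `newSC`.
def addConstant(ctx: org.apache.spark.SparkContext, constant: Int) =
  ctx.parallelize(0 to 100).map(p => p + constant)

val temp = 10
val newSC = sc                        // the aliased SparkContext from the example above
val newRDD = addConstant(newSC, temp)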

Best,

Rahul P


Re: Support of other languages?

2015-09-17 Thread Rahul Palamuttam
Hi,

Thank you for both responses.
Sun, you pointed out the exact issue I was referring to, which is
copying, serializing, and deserializing the byte array between the JVM heap and
the worker memory.
It also isn't clear to me why the byte array should be kept on-heap, since
the data of the parent partition is just a byte array that only makes sense
to a Python environment.
Shouldn't we be writing the byte array off-heap and providing supporting
interfaces for outside processes to read and interact with the data?
I'm probably oversimplifying what is really required to do this.

There is a recent JIRA which I thought was interesting with respect to our
discussion:
https://issues.apache.org/jira/browse/SPARK-10399

There's also a suggestion at the bottom of the JIRA about exposing on-heap
memory, which is pretty interesting.

- Rahul Palamuttam


On Wed, Sep 9, 2015 at 4:52 AM, Sun, Rui <rui@intel.com> wrote:

> Hi, Rahul,
>
> To support a new language other than Java/Scala in spark, it is different
> between RDD API and DataFrame API.
>
> For RDD API:
>
> An RDD is a distributed collection of language-specific data types whose
> representation is unknown to the JVM. Also, the transformation functions for the RDD are
> written in the language and can't be executed on the JVM. That's why worker
> processes of the language runtime are needed in such cases. Generally, to
> support the RDD API in the language, a subclass of the Scala RDD is needed on the
> JVM side (for example, PythonRDD for Python, RRDD for R) where compute() is
> overridden to send the serialized parent partition data (yes, the data copy you
> mean happens here) and the serialized transformation function via a
> socket to the worker process. The worker process deserializes the partition
> data and the transformation function, then applies the function to the
> data. The result is sent back to the JVM via the socket after serialization as a byte
> array. From the JVM's viewpoint, the resulting RDD is a collection of byte
> arrays.
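
To make the pattern described above concrete, here is a very rough, hypothetical
Scala sketch of such a byte-array RDD. The names, wire format, and worker
management are invented for illustration; the real implementations (PythonRDD,
RRDD) are considerably more involved (for example, they write to the worker from
a separate thread to avoid blocking).

import java.io.{DataInputStream, DataOutputStream}
import java.net.Socket

import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch: an RDD of opaque byte arrays whose compute() ships the
// serialized transformation function and the parent partition's bytes to a
// language worker over a socket, then reads the results back as byte arrays.
class LanguageRDD(parent: RDD[Array[Byte]],
                  serializedFunc: Array[Byte],
                  workerPort: Int)
  extends RDD[Array[Byte]](parent) {

  override def getPartitions: Array[Partition] = parent.partitions

  override def compute(split: Partition, context: TaskContext): Iterator[Array[Byte]] = {
    val socket = new Socket("localhost", workerPort)   // worker process of the language runtime
    val out = new DataOutputStream(socket.getOutputStream)
    val in = new DataInputStream(socket.getInputStream)

    // Ship the serialized function, then each record of the parent partition.
    out.writeInt(serializedFunc.length)
    out.write(serializedFunc)
    parent.iterator(split, context).foreach { bytes =>
      out.writeInt(bytes.length)
      out.write(bytes)
    }
    out.writeInt(-1)                                   // end-of-data marker
    out.flush()

    // Read results back as opaque byte arrays; the JVM never interprets them.
    // (Resource cleanup omitted; a real implementation also overlaps the write
    // and read to avoid deadlocking on the socket buffers.)
    Iterator.continually {
      val len = in.readInt()
      if (len < 0) null
      else { val buf = new Array[Byte](len); in.readFully(buf); buf }
    }.takeWhile(_ != null)
  }
}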
>
> Performance is a concern in such cases, as there are overheads like the
> launching of worker processes, serialization/deserialization of partition
> data, and the bi-directional communication cost of the data.
> Besides, as the JVM can't know the real representation of the data in the RDD,
> it is difficult and complex to support shuffle and aggregation operations.
> Spark Core's built-in aggregator and shuffle can't be utilized
> directly; there have to be language-specific implementations to support these
> operations, which causes additional overhead.
>
> Additional memory occupation by the worker processes is also a concern.
>
> For DataFrame API:
>
> Things are much simpler than with the RDD API. For DataFrame, data is read through
> the Data Source API and is represented as native objects within the JVM, and
> there are no language-specific transformation functions. Basically, the
> DataFrame API in the language is just a set of method wrappers around the
> corresponding ones in the Scala DataFrame API.
>
> Performance is not a concern. The computation is done on native objects in the
> JVM, with virtually no performance lost.
>
> The only exception is a UDF on a DataFrame. A UDF has to rely on the language
> worker processes, similar to the RDD API.
>
> -Original Message-
> From: Rahul Palamuttam [mailto:rahulpala...@gmail.com]
> Sent: Tuesday, September 8, 2015 10:54 AM
> To: user@spark.apache.org
> Subject: Support of other languages?
>
> Hi,
> I wanted to know more about how Spark supports R and Python, with respect
> to what gets copied into the language environments.
>
> To clarify :
>
> I know that PySpark utilizes py4j sockets to pass pickled python functions
> between the JVM and the python daemons. However, I wanted to know how it
> passes the data from the JVM into the daemon environment. I assume it has
> to copy the data over into the new environment, since python can't exactly
> operate in JVM heap space, (or can it?).
>
> I had the same question with respect to SparkR, though I'm not completely
> familiar with how they pass around native R code through the worker JVM's.
>
> The primary question I wanted to ask is does Spark make a second copy of
> data, so language-specific daemons can operate on the data? What are some
> of the other limitations encountered when we try to offer multi-language
> support, whether it's in performance or in general software architecture.
> With python in particular the collect operation must be first written to
> disk and then read back from the python driver process.
>
> Would appreciate any insight on this, and if there is any work happening
> in this area.
>
> Thank you,
>
> Rahul Palamuttam
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.100156

Support of other languages?

2015-09-07 Thread Rahul Palamuttam
Hi, 
I wanted to know more about how Spark supports R and Python, with respect to
what gets copied into the language environments.

To clarify :

I know that PySpark utilizes py4j sockets to pass pickled python functions
between the JVM and the python daemons. However, I wanted to know how it
passes the data from the JVM into the daemon environment. I assume it has to
copy the data over into the new environment, since python can't exactly
operate in JVM heap space, (or can it?).  

I had the same question with respect to SparkR, though I'm not completely
familiar with how they pass around native R code through the worker JVM's. 

The primary question I wanted to ask is does Spark make a second copy of
data, so language-specific daemons can operate on the data? What are some of
the other limitations encountered when we try to offer multi-language
support, whether it's in performance or in general software architecture.
With python in particular the collect operation must be first written to
disk and then read back from the python driver process.

Would appreciate any insight on this, and if there is any work happening in
this area.

Thank you,

Rahul Palamuttam  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Support-of-other-languages-tp24599.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark build/sbt assembly

2015-07-30 Thread Rahul Palamuttam
Hi Akhil,

Yes, I did try to remove it, and I tried to build again.
However, that jar keeps getting recreated whenever I run ./build/sbt
assembly

Thanks,

Rahul P

On Thu, Jul 30, 2015 at 12:38 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Did you try removing this jar? build/sbt-launch-0.13.7.jar

 Thanks
 Best Regards

 On Tue, Jul 28, 2015 at 12:08 AM, Rahul Palamuttam rahulpala...@gmail.com
  wrote:

 Hi All,

 I hope this is the right place to post troubleshooting questions.
 I've been following the install instructions and I get the following error
 when running the following from Spark home directory

 $./build/sbt
 Using /usr/java/jdk1.8.0_20/ as default JAVA_HOME.
 Note, this will be overridden by -java-home if it is set.
 Attempting to fetch sbt
 Launching sbt from build/sbt-launch-0.13.7.jar
 Error: Invalid or corrupt jarfile build/sbt-launch-0.13.7.jar

 However when I run sbt assembly it compiles, with a couple of warnings,
 but
 it works nonetheless.
 Is the build/sbt script deprecated? I do notice on one node it works but
 on
 the other it gives me the above error.

 Thanks,

 Rahul P



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-build-sbt-assembly-tp24012.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Spark Number of Partitions Recommendations

2015-07-28 Thread Rahul Palamuttam
Hi All,

I was wondering why the recommended number for parallelism is 2-3 times
the number of cores in your cluster.
Is the heuristic explained in any of the Spark papers? Or is it more of an
agreed-upon rule of thumb?
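
For reference, the way I currently apply the rule of thumb looks roughly like
this (the cluster size and app name are made up):

import org.apache.spark.SparkConf

// Hypothetical cluster: 4 executors x 8 cores = 32 total cores,
// so the "2-3x" guideline suggests roughly 64-96 partitions.
val totalCores = 4 * 8
val conf = new SparkConf()
  .setAppName("PartitionSizingExample")
  .set("spark.default.parallelism", (totalCores * 3).toString)

// The same guideline can be applied per input, e.g.:
//   sc.textFile(path, minPartitions = totalCores * 3)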

Thanks,

Rahul P



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Number-of-Partitions-Recommendations-tp24022.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark build/sbt assembly

2015-07-27 Thread Rahul Palamuttam
So just to clarify, I have 4 nodes, all of which use Java 8.
Only one of them is able to successfully execute the build/sbt assembly
command.
However on the 3 others I get the error.

If I run sbt assembly in Spark Home, it works and I'm able to launch the
master and worker processes.

On Mon, Jul 27, 2015 at 11:48 AM, Rahul Palamuttam rahulpala...@gmail.com
wrote:

 All nodes are using java 8.
 I've tried to mimic the environments as much as possible among all nodes.


 On Mon, Jul 27, 2015 at 11:44 AM, Ted Yu yuzhih...@gmail.com wrote:

 bq. on one node it works but on the other it gives me the above error.

 Can you tell us the difference between the environments on the two nodes ?
 Does the other node use Java 8 ?

 Cheers

 On Mon, Jul 27, 2015 at 11:38 AM, Rahul Palamuttam 
 rahulpala...@gmail.com wrote:

 Hi All,

 I hope this is the right place to post troubleshooting questions.
 I've been following the install instructions and I get the following
 error
 when running the following from Spark home directory

 $./build/sbt
 Using /usr/java/jdk1.8.0_20/ as default JAVA_HOME.
 Note, this will be overridden by -java-home if it is set.
 Attempting to fetch sbt
 Launching sbt from build/sbt-launch-0.13.7.jar
 Error: Invalid or corrupt jarfile build/sbt-launch-0.13.7.jar

 However when I run sbt assembly it compiles, with a couple of warnings,
 but
 it works nonetheless.
 Is the build/sbt script deprecated? I do notice on one node it works but
 on
 the other it gives me the above error.

 Thanks,

 Rahul P



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-build-sbt-assembly-tp24012.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org






Re: Spark build/sbt assembly

2015-07-27 Thread Rahul Palamuttam
All nodes are using java 8.
I've tried to mimic the environments as much as possible among all nodes.


On Mon, Jul 27, 2015 at 11:44 AM, Ted Yu yuzhih...@gmail.com wrote:

 bq. on one node it works but on the other it gives me the above error.

 Can you tell us the difference between the environments on the two nodes ?
 Does the other node use Java 8 ?

 Cheers

 On Mon, Jul 27, 2015 at 11:38 AM, Rahul Palamuttam rahulpala...@gmail.com
  wrote:

 Hi All,

 I hope this is the right place to post troubleshooting questions.
 I've been following the install instructions and I get the following error
 when running the following from Spark home directory

 $./build/sbt
 Using /usr/java/jdk1.8.0_20/ as default JAVA_HOME.
 Note, this will be overridden by -java-home if it is set.
 Attempting to fetch sbt
 Launching sbt from build/sbt-launch-0.13.7.jar
 Error: Invalid or corrupt jarfile build/sbt-launch-0.13.7.jar

 However when I run sbt assembly it compiles, with a couple of warnings,
 but
 it works nonetheless.
 Is the build/sbt script deprecated? I do notice on one node it works but
 on
 the other it gives me the above error.

 Thanks,

 Rahul P



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-build-sbt-assembly-tp24012.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Spark build/sbt assembly

2015-07-27 Thread Rahul Palamuttam
Hi All,

I hope this is the right place to post troubleshooting questions.
I've been following the install instructions and I get the following error
when running the following from Spark home directory

$./build/sbt
Using /usr/java/jdk1.8.0_20/ as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Attempting to fetch sbt
Launching sbt from build/sbt-launch-0.13.7.jar
Error: Invalid or corrupt jarfile build/sbt-launch-0.13.7.jar

However when I run sbt assembly it compiles, with a couple of warnings, but
it works nonetheless.
Is the build/sbt script deprecated? I do notice on one node it works but on
the other it gives me the above error.

Thanks,

Rahul P



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-build-sbt-assembly-tp24012.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org