Hi All,
I have a setup which consists of 8 small machines (1 core, 8 GB RAM each)
and 1 large machine (8 cores, 100 GB RAM). Is there a way to enable
Spark to run multiple executors on the large machine, and a single
executor on each of the small machines?
Alternatively, is it possible to
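For reference, one way to get this in standalone mode is to run several worker instances on the large host so each hosts its own executor. A sketch of the spark-env.sh on the large machine only; the per-worker memory split is an illustrative assumption, not from the thread:

```shell
# spark-env.sh on the large (8-core, 100 GB) machine only:
# run 8 worker instances, each offering 1 core and ~12 GB, so the
# standalone master sees it like 8 of the small machines.
export SPARK_WORKER_INSTANCES=8
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=12g
```

The small machines keep the default of one worker each.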
Hi Community,
I'm running Spark in standalone mode, and in my current cluster each slave
has 8 GB of RAM.
I wanted to add one more powerful machine with 100 GB of RAM as a slave
to the cluster and encountered some difficulty.
If I don't set spark.executor.memory, all slaves will only allocate
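One constraint worth keeping in mind for a mixed cluster like this: executors are sized uniformly per application, so spark.executor.memory has to fit the smallest slave. A minimal sketch of setting it the way this era of Spark expected, as a system property before the SparkContext is created (the 6g value is an illustrative assumption):

```java
public class ExecutorMemoryConfig {
    public static void main(String[] args) {
        // Executors are sized uniformly per application, so pick a
        // value that fits the smallest slave (8 GB here, minus
        // headroom for the worker and the OS). This property had to
        // be set before the SparkContext was constructed.
        System.setProperty("spark.executor.memory", "6g");
        System.out.println(System.getProperty("spark.executor.memory"));
    }
}
```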
Hi Jie,
It seems that SparkPrefix is not serializable. You can try adding
implements Serializable and see if that solves the problem.
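A quick way to check that kind of fix locally, before running on the cluster, is a plain Java serialization round-trip. A generic sketch; Prefix below is a hypothetical stand-in for the actual class:

```java
import java.io.*;

// Hypothetical stand-in for a class like SparkPrefix: once it
// implements Serializable (and all of its fields are serializable
// too), Spark can ship instances to executors inside closures.
class Prefix implements Serializable {
    private static final long serialVersionUID = 1L;
    final String value;
    Prefix(String value) { this.value = value; }
}

public class SerializableCheck {
    public static void main(String[] args) throws Exception {
        // Round-trip the object through Java serialization, the same
        // mechanism Spark's closure serializer relies on by default.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(new Prefix("abc"));
        out.flush();
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        Prefix back = (Prefix) in.readObject();
        System.out.println(back.value);
    }
}
```

If any field of the class is itself non-serializable, this check fails with the same NotSerializableException Spark would throw.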
Yadid
On 12/13/13 5:10 AM, Jie Deng wrote:
Hi,all,
Thanks for your time to read this,
When I first trying to write a new Java class, and put spark in it,
Oops, meant to send to the entire list...
Original Message
Subject:Re: reading a specific key-value
Date: Fri, 13 Dec 2013 14:56:22 -0500
From: Yadid Ayzenberg ya...@media.mit.edu
To: K. Shankari shank...@eecs.berkeley.edu
It says it is more efficient if the RDD is
partitioned by
key, and after that only partition-preserving transformations have
been performed on the RDD[(K,V)].
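The intuition behind that note, sketched without Spark: with a known hash partitioner, a key always lives in exactly one partition, so lookup(key) only has to scan that partition instead of all of them. A toy illustration (the partitioning contract matches Spark's HashPartitioner; everything else is simplified):

```java
import java.util.*;

public class PartitionedLookup {
    // Same contract as a hash partitioner: a key always lands in
    // partition hash(key) mod numPartitions (made non-negative).
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());

        // "partitionBy": place each pair in its home partition.
        Map<String, Integer> data = Map.of("a", 1, "b", 2, "c", 3);
        data.forEach((k, v) ->
            partitions.get(partitionFor(k, numPartitions))
                      .add(Map.entry(k, v)));

        // "lookup": only the one matching partition is searched.
        String key = "b";
        List<Map.Entry<String, Integer>> only =
            partitions.get(partitionFor(key, numPartitions));
        only.stream().filter(e -> e.getKey().equals(key))
            .forEach(e -> System.out.println(e.getValue()));
    }
}
```

A partition-preserving transformation (e.g. mapValues) keeps that placement guarantee; a plain map on the keys would invalidate it.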
On Fri, Dec 13, 2013 at 12:07 PM, Yadid Ayzenberg ya...@media.mit.edu wrote:
Oops, meant to send to the entire list...
Original
Hi All,
I'm trying to understand the performance results I'm getting for the
following:
rdd = sc.newAPIHadoopRDD( ... )
rdd1 = rdd.keyBy( func1() )
rdd1.count()
rdd1.cache()
rdd2 = rdd1.map(func2())
rdd2.count()
rdd3 = rdd2.map(func2())
rdd3.count()
I would expect the 2 maps to be more or
it was marked as cached, so rdd1 does not need to be re-evaluated
within the rdd3.count job.
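Worth noting about the ordering above: cache() only marks the RDD; nothing is stored until the next action computes it, and after that the marked RDD is not re-evaluated. A compute-once analogy in plain Java, with no Spark involved (a memoizing supplier standing in for a cached RDD):

```java
import java.util.function.Supplier;

public class LazyCacheDemo {
    // A memoizing supplier: the analogue of an RDD marked with
    // cache() -- the expensive lineage above it runs at most once.
    static <T> Supplier<T> cached(Supplier<T> compute) {
        return new Supplier<T>() {
            T value;
            boolean done;
            public T get() {
                if (!done) { value = compute.get(); done = true; }
                return value;
            }
        };
    }

    static int evaluations = 0;

    public static void main(String[] args) {
        Supplier<Integer> rdd1 = cached(() -> { evaluations++; return 42; });
        // Two downstream "actions", like rdd2.count() and rdd3.count():
        rdd1.get();
        rdd1.get();
        // The expensive work above rdd1 ran only once.
        System.out.println(evaluations);
    }
}
```

In the snippet above, note that rdd1.count() runs before rdd1.cache(), so that first count does not populate the cache; the first action after the cache() call does.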
On Tue, Dec 10, 2013 at 9:11 AM, Yadid Ayzenberg ya...@media.mit.edu wrote:
Hi All,
I'm trying to understand the performance results I'm getting for
the following
Hi all,
I'm noticing some strange behavior when running mapPartitions. Pseudo code:
JavaPairRDD<Object, Tuple2<Object, BSONObject>> myRDD =
myRDD.mapPartitions( func )
myRDD.count()
ArrayList<Tuple2<Integer, Tuple2<List<Tuple2<Double, Double>>,
List<Tuple2<Double, Double>>>>> tempRDD =
-effecting mutator
-- you've just changed where in a lineage of transformations you are
pointing to with your mutable myRDD reference.
On Mon, Dec 9, 2013 at 11:06 AM, Yadid Ayzenberg ya...@media.mit.edu wrote:
Hi all,
I'm noticing some strange behavior when
-defined functions, so if you're caching mutable Java
objects and mutating them in a transformation, the effects of that
mutation might change the cached data and affect other RDDs. Is func2
mutating your cached objects?
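The concern in plain Java terms: cached partition data holds references to the same objects your functions receive, so an in-place mutation is visible to every later reader of the cache. A toy illustration with nothing Spark-specific in it:

```java
import java.util.*;

public class CachedMutationDemo {
    public static void main(String[] args) {
        // "Cached" partition data: mutable objects held in memory.
        List<StringBuilder> cached = new ArrayList<>(
            List.of(new StringBuilder("a"), new StringBuilder("b")));

        // A "transformation" that mutates its inputs in place
        // instead of returning new objects -- the anti-pattern
        // being asked about.
        for (StringBuilder sb : cached) sb.append("!");

        // Any later job reading the cached data now sees the
        // mutated values, not the originals.
        System.out.println(cached.get(0));
    }
}
```

Returning fresh objects from the transformation (or copying before mutating) avoids the problem.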
On Mon, Dec 9, 2013 at 11:25 AM, Yadid Ayzenberg ya...@media.mit.edu
. just doing one action after the looped
transformations
5. trying to unpersist() prior cached iterations can work, but it is
sensitive to where it occurs relative to actions
On Sat, Nov 30, 2013 at 6:39 PM, Yadid Ayzenberg ya...@media.mit.edu wrote:
step
Hi All,
I'm trying to implement the following and would like to know in which places I
should be calling RDD.cache():
Suppose I have a group of RDDs : RDD1 to RDDn as input.
1. create a single RDD_total = RDD1.union(RDD2)..union(RDDn)
2. for i = 0 to x:RDD_total = RDD_total.map (some
hasn't actually done anything yet.
On Sat, Nov 30, 2013 at 6:01 PM, Yadid Ayzenberg ya...@media.mit.edu wrote:
Hi All,
I'm trying to implement the following and would like to know in
which places I should be calling RDD.cache():
Suppose I have a group
on
worker nodes.
Anyway, Prashant's response about spreadOut is appropriate for
application-level scheduling.
On Tue, Nov 19, 2013 at 8:03 AM, Yadid Ayzenberg ya...@media.mit.edu wrote:
My bad - I should have stated that up front. I guess it was kind
up
at shuffle boundaries), and stages are associated with task sets
containing multiple tasks, the units of work that actually run on
worker nodes.
mapPartitions and ignore the result?
- Patrick
On Sun, Nov 17, 2013 at 4:45 PM, Yadid Ayzenberg ya...@media.mit.edu wrote:
Hi,
According to the API, foreachPartition() is not yet implemented
in Java.
Are there any workarounds to get
Hi,
According to the API, foreachPartition() is not yet implemented in Java.
Are there any workarounds to get the same functionality?
I have a non serializable DB connection and instantiating it is pretty
expensive, so I prefer to do it on a per partition basis.
thanks,
Yadid
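The workaround suggested later in this thread is to call mapPartitions and simply discard its output. The per-partition shape of that function, sketched with plain iterators and a hypothetical connection class so the sketch stands alone without Spark on the classpath:

```java
import java.util.*;

public class PerPartitionDemo {
    // Hypothetical stand-in for the expensive, non-serializable
    // DB connection described above.
    static class DbConnection {
        static int opened = 0;
        DbConnection() { opened++; }           // expensive to create
        void write(String record) { /* ... */ }
        void close() { }
    }

    // Body of the mapPartitions function: open one connection per
    // partition, reuse it for every element, return an empty
    // iterator so the caller can ignore the result.
    static Iterator<Object> handlePartition(Iterator<String> records) {
        DbConnection conn = new DbConnection();
        try {
            while (records.hasNext()) conn.write(records.next());
        } finally {
            conn.close();
        }
        return Collections.emptyIterator();
    }

    public static void main(String[] args) {
        // Two "partitions" of records:
        handlePartition(List.of("r1", "r2").iterator());
        handlePartition(List.of("r3").iterator());
        System.out.println(DbConnection.opened);  // one per partition
    }
}
```

In actual Spark code the connection is created inside the function shipped to mapPartitions (so it is never serialized), and the result still has to be forced with an action such as count(), since mapPartitions is lazy.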
8:19 PM, Patrick Wendell wrote:
Thanks that would help. This would be consistent with there being a
reference to the SparkContext itself inside of the closure. Just want
to make sure that's not the case.
On Sun, Nov 3, 2013 at 5:13 PM, Yadid Ayzenberg ya...@media.mit.edu wrote:
I'm running