Why not attach a bigger hard disk to the machines and point your
SPARK_LOCAL_DIRS to it?
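For example, assuming the larger disk is mounted at /mnt/bigdisk (a hypothetical path), you could set this per machine in conf/spark-env.sh:

```shell
# conf/spark-env.sh on each worker; /mnt/bigdisk is a hypothetical mount point.
# Spark writes shuffle spill files and disk-cached blocks under these
# directories, so they should live on the drive with the most free space.
export SPARK_LOCAL_DIRS=/mnt/bigdisk/spark-local
```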
Thanks
Best Regards
On Sat, Aug 29, 2015 at 1:13 AM, fsacerdoti
wrote:
> Hello,
>
> Similar to the thread below [1], when I tried to create an RDD from a 4GB
> pandas dataframe
There are two issues here:
1. Suppression of the true reason for failure. The Spark runtime reports a
"TypeError", but that is not why the operation failed.
2. The low performance of loading a pandas dataframe.
DISCUSSION
Number (1) is easily fixed, and is the primary purpose of my post.
Number
Sounds good, want me to create a jira and link it to SPARK-9697? Will put
down some ideas to start.
On Aug 31, 2015 4:14 AM, "Reynold Xin" wrote:
> BTW if you are interested in this, we could definitely get some help in
> terms of prototyping the feasibility, i.e. how we can
Thanks josh ... i'll take a look
On 31 Aug 2015 19:21, "Josh Rosen" wrote:
> There are currently a few known issues with using KryoSerializer as the
> closure serializer, so it's going to require some changes to Spark if we
> want to properly support this. See
>
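For context: Kryo for *data* serialization is already configurable today; it is the closure serializer that effectively remains JavaSerializer. A sketch of the supported setting (application jar name is hypothetical):

```shell
# Supported today: Kryo for data (shuffle/cache) serialization.
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  your-app.jar
# The closure serializer, by contrast, defaults to JavaSerializer and,
# per the known issues above, other serializers are not properly
# supported for closures without changes to Spark.
```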
Tested now against Spark 1.5.0 RC2; the same exceptions happen when
num-executors > 2:
15/08/25 10:31:10 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 5.0
(TID 501, xxx): java.lang.ClassCastException: java.lang.Double cannot
be cast to java.lang.Long
at
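For reference on what that exception means at the JVM level: a value boxed as java.lang.Double can never be cast to java.lang.Long, even though the primitive widening double -> long would be legal. A minimal, Spark-independent Java illustration:

```java
public class CastDemo {
    public static void main(String[] args) {
        // A value boxed as java.lang.Double...
        Object boxed = Double.valueOf(42.0);
        try {
            // ...cannot be cast to java.lang.Long: reference casts on
            // boxed types do not follow primitive widening rules.
            Long asLong = (Long) boxed;
            System.out.println(asLong);
        } catch (ClassCastException e) {
            System.out.println("caught ClassCastException");
        }
    }
}
```

In Spark this is typically a symptom of a type mismatch upstream (e.g. a column whose schema says long but whose values are doubles), not of the cast site shown in the stack trace.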
I'm going to -1 the release myself since the issue @yhuai identified is
pretty serious. It basically OOMs the driver for reading any files with a
large number of partitions. Looks like the patch for that has already been
merged.
I'm going to cut rc3 momentarily.
On Sun, Aug 30, 2015 at 11:30
It seems that the GitHub branch-1.5 has already changed the version to
1.5.1-SNAPSHOT.
I am a bit confused: are we still on 1.5.0 RC3, or are we on 1.5.1?
Chester
On Mon, Aug 31, 2015 at 3:52 PM, Reynold Xin wrote:
> I'm going to -1 the release myself since the issue @yhuai
On Sun, Aug 30, 2015 at 5:58 AM, Paul Weiss wrote:
>
> Also, is this work being done on a branch I could look into further and
> try out?
>
>
We don't have a branch yet, because there is no code or design for this
yet. As I said, it is one of the motivations behind
If you look at the recurrent issues in datacentre-scale computing systems, two
stand out:
- resilience to failure: that's algorithms and the layers underneath (storage,
work allocation & tracking ...)
- scheduling: maximising resource utilisation while prioritising high-SLA work
(interactive
Hi devs,
Currently the only supported serializer for serializing tasks in
DAGScheduler.scala is JavaSerializer.
val taskBinaryBytes: Array[Byte] = stage match {
  case stage: ShuffleMapStage =>
    closureSerializer.serialize((stage.rdd, stage.shuffleDep): AnyRef).array()
  case stage: ResultStage =>
    closureSerializer.serialize((stage.rdd, stage.func): AnyRef).array()
}
BTW if you are interested in this, we could definitely get some help in
terms of prototyping the feasibility, i.e. how we can have a native (e.g.
C++) API for data access shipped with Spark. There are a lot of questions
(e.g. build, portability) that need to be answered.
On Mon, Aug 31, 2015 at