the objects retrieved from the
sequence files, there's a ton more memory used than just building the objects
manually? It doesn't make sense to me. I'm theoretically performing the same
operation on both datasets.
Thanks, I'd definitely appreciate the help!
-Matt Cheah
From: Andrew Winings
there will be.
Thanks,
-Matt Cheah
will be (presumably
proportional to the size of the dataset).
Thanks for the quick response!
-Matt Cheah
From: Aaron Davidson ilike...@gmail.com
Reply-To: user@spark.incubator.apache.org
Thanks a lot for that. There are definitely a lot of subtleties that we need to
consider. We appreciate the thorough explanation!
-Matt Cheah
From: Aaron Davidson ilike...@gmail.com
Reply-To: user@spark.incubator.apache.org
the ramifications of turning up this value, but I was
wondering what the actual maximum number that could be set for it is. I'll
benchmark the performance hit accordingly.
Thanks!
-Matt Cheah
Actually, we want the opposite – we want as much data to be computed as
possible.
It's only for benchmarking purposes, of course.
-Matt Cheah
From: Matei Zaharia matei.zaha...@gmail.com
Reply-To: user@spark.incubator.apache.org
I'm reading the paper now, thanks. It states that 100-node clusters were used. Is
it typical in the field to use 100-node clusters at the 1TB scale? We were
expecting to use ~10 nodes.
I'm still pretty new to cluster computing, so just not sure how people have set
these up.
-Matt Cheah
to me that I'd have to do so, especially
since the tuning guide suggests using Externalizable:
http://spark.incubator.apache.org/docs/latest/tuning.html
-Matt Cheah
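For context on the Externalizable suggestion above: java.io.Externalizable lets a class define its own serialized form instead of relying on default Java serialization, which can shrink the bytes written per object. A minimal sketch of the pattern (the Point class and its fields are made-up illustrations, not from the thread):

```java
import java.io.*;

// A hypothetical class that controls its own wire format via Externalizable.
// A public no-arg constructor is required so the runtime can instantiate
// the object before calling readExternal on it.
public class Point implements Externalizable {
    public int x, y;

    public Point() {}                                  // required by Externalizable
    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);                               // write only the raw fields,
        out.writeInt(y);                               // no per-field metadata
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        x = in.readInt();                              // must read back in the
        y = in.readInt();                              // same order as written
    }

    public static void main(String[] args) throws Exception {
        // Round-trip through the standard Java serialization streams.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(new Point(3, 4));
        }
        ObjectInputStream ois =
            new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
        Point restored = (Point) ois.readObject();
        System.out.println(restored.x + "," + restored.y); // prints 3,4
    }
}
```

The trade-off is that every field must be written and read by hand, in matching order, which is the maintenance burden being weighed in this thread.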
From: Andrew Ash and...@andrewash.com
Reply-To: user@spark.incubator.apache.org
:153)
I'm running on a Spark cluster generated by the EC2 scripts. This doesn't
happen when I'm running with local[N]. Any ideas?
Thanks,
-Matt Cheah
to create a SparkContext per compute-session to sandbox the jars in each
user's job.
Is this a use case that could be done by only using one SparkContext in the JVM?
-Matt Cheah
From: Dmitriy Lyubimov dlie...@gmail.com
Reply-To: user@spark.incubator.apache.org
EC2 nodes could have their
firewalls configured to allow this.
We don't want to deploy the web server on the master node of the spark cluster.
Thanks,
-Matt Cheah
Hi everyone,
I see there is a take() function for RDDs, getting the first n elements. Is
there a way to get the last n elements?
Thanks,
-Matt Cheah
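One workaround commonly suggested for "last n elements" is to pair each element with its index and keep those with index >= count - n (on an RDD this would be zipWithIndex, available in later Spark releases, followed by a filter). A sketch of the pattern on a plain list, with all names illustrative:

```java
import java.util.*;
import java.util.stream.*;

// Emulate "take the last n elements" by filtering on position.
// On a real RDD, data.size() corresponds to rdd.count() and the
// index pairing corresponds to rdd.zipWithIndex().
public class LastN {
    static <T> List<T> lastN(List<T> data, int n) {
        long total = data.size();                  // rdd.count() equivalent
        return IntStream.range(0, data.size())
                .filter(i -> i >= total - n)       // keep only the tail indices
                .mapToObj(data::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(10, 20, 30, 40, 50);
        System.out.println(lastN(data, 2));        // prints [40, 50]
    }
}
```

Note this costs a full count plus a full pass, so it is noticeably heavier than take(n), which can stop after the first partitions.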
cases, we get a stack trace (running locally with 3 threads). I've
included the stack trace below.
Thanks,
-Matt Cheah
org.apache.spark.SparkException: Error communicating with MapOutputTracker
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:84
reduce functions need to be associative
and commutative.
On Tue, Oct 22, 2013 at 12:28 PM, Matt Cheah
mch...@palantir.com wrote:
Hi everyone,
I have a driver holding a reference to an RDD. The driver would like to visit
each item in the RDD in order, say
out again to get
this sequential behavior.
I appreciate the discussion though. Quite enlightening.
Thanks,
-Matt Cheah
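To illustrate the point above about reduce functions needing to be associative and commutative: partial results from partitions are combined in no guaranteed grouping or order, so a function like subtraction can give different answers depending on how the runtime happens to combine them. A local sketch, where the two fold helpers stand in for two possible combine orders (they are illustrative, not Spark API):

```java
import java.util.*;
import java.util.function.IntBinaryOperator;

// Two different combine orders over the same data: a left-to-right fold
// and a right-to-left fold. An associative, commutative function agrees
// under both; subtraction does not.
public class ReduceOrder {
    static int foldLeft(List<Integer> xs, IntBinaryOperator f) {
        int acc = xs.get(0);
        for (int i = 1; i < xs.size(); i++) acc = f.applyAsInt(acc, xs.get(i));
        return acc;
    }

    static int foldRight(List<Integer> xs, IntBinaryOperator f) {
        int acc = xs.get(xs.size() - 1);
        for (int i = xs.size() - 2; i >= 0; i--) acc = f.applyAsInt(xs.get(i), acc);
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        // Addition: both groupings give 10, so distributed reduce is safe.
        System.out.println(foldLeft(xs, Integer::sum) == foldRight(xs, Integer::sum));
        // Subtraction: ((1-2)-3)-4 = -8 vs 1-(2-(3-4)) = -2.
        System.out.println(foldLeft(xs, (a, b) -> a - b));  // prints -8
        System.out.println(foldRight(xs, (a, b) -> a - b)); // prints -2
    }
}
```

This is also why reduce cannot provide the sequential, in-order visit asked about earlier in the thread; order-dependent traversal needs a different mechanism than reduce.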
From: Christopher Nguyen c...@adatao.com
Date: Tuesday, October 22, 2013 2:23 PM
To: user@spark.incubator.apache.org
Ah, I misunderstood the functionality then – I was under the impression that
exactly that fraction would be returned.
Thanks,
-Matt Cheah
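To make the misunderstanding above concrete: Bernoulli-style sampling keeps each element with independent probability equal to the given fraction, so the sample size only matches the fraction in expectation, not exactly. A local simulation of that behavior (the seed and data are arbitrary; this is not the Spark implementation itself):

```java
import java.util.Random;
import java.util.stream.IntStream;

// Simulate fraction-based sampling: flip an independent coin per element.
// Each resulting size is Binomial(n, fraction) distributed, so repeated
// draws cluster around n * fraction without usually hitting it exactly.
public class ApproxSample {
    public static void main(String[] args) {
        Random rng = new Random(42);       // arbitrary seed for repeatability
        double fraction = 0.1;
        int n = 100;
        for (int trial = 0; trial < 5; trial++) {
            long size = IntStream.range(0, n)
                    .filter(i -> rng.nextDouble() < fraction)
                    .count();
            System.out.println(size);      // near 10 on average, not exact
        }
    }
}
```

When an exact count is needed, takeSample(withReplacement, num) returns precisely num elements, at the cost of collecting them to the driver.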
From: Aaron Davidson ilike...@gmail.com
Reply-To: user@spark.incubator.apache.org