Heya TD,
Thanks for the detailed answer! Much appreciated.
Regarding order among elements within an RDD, you're definitely right: it
would kill the parallelism and require synchronization, which is deliberately
avoided in a distributed environment.
That's why I won't push this constraint onto the RDDs.
Hi,
I am trying to understand how resource allocation happens in Spark. I
understand the resourceOffer method in the TaskScheduler. This method takes
the locality factor into account while allocating resources. This resourceOffer
method gets invoked by the corresponding cluster manager.
I am working
Hi Karthik,
The resourceOffer() method is invoked from a class implementing the
SchedulerBackend interface; in the case of a standalone cluster, it's
invoked from a CoarseGrainedSchedulerBackend (in the makeOffers() method).
If you look in TaskSchedulerImpl.submitTasks(), it calls
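For intuition, here is a toy, plain-Python sketch of the offer model being described (illustrative only, NOT the actual Spark source; the greedy locality rule and the `resource_offers` helper are my own simplifications): the cluster manager hands the scheduler a list of free executor slots, and the scheduler assigns pending tasks, preferring offers whose host matches a task's preferred location.

```python
# Toy sketch of the resource-offer model (plain Python, NOT Spark source):
# the cluster manager presents free executor slots ("offers"), and the
# scheduler greedily assigns pending tasks, preferring offers whose host
# matches a task's preferred location.
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkerOffer:          # mirrors the idea behind Spark's WorkerOffer
    executor_id: str
    host: str
    cores: int

@dataclass
class Task:
    task_id: int
    preferred_host: Optional[str] = None

def resource_offers(offers, pending_tasks):
    """Greedy, locality-first assignment: node-local tasks first, then any task."""
    assignments = []
    remaining = list(pending_tasks)
    for offer in offers:
        for _ in range(offer.cores):
            # Prefer a task whose preferred host matches this offer.
            local = [t for t in remaining if t.preferred_host == offer.host]
            pick = local[0] if local else (remaining[0] if remaining else None)
            if pick is None:
                break
            remaining.remove(pick)
            assignments.append((pick.task_id, offer.executor_id))
    return assignments

offers = [WorkerOffer("exec-1", "host-a", 1), WorkerOffer("exec-2", "host-b", 1)]
tasks = [Task(0, preferred_host="host-b"), Task(1, preferred_host="host-a")]
print(resource_offers(offers, tasks))  # [(1, 'exec-1'), (0, 'exec-2')]
```

Note how task 1 lands on exec-1 and task 0 on exec-2 purely because of host preference; the real scheduler layers delay scheduling and locality levels on top of this basic shape.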
I think it makes sense, though without a concrete implementation it's hard
to be sure. Applying sorting on the RDD according to the RDDs makes sense,
but I can think of two fundamental kinds of problems.
1. How do you deal with ordering across RDD boundaries. Say two consecutive
RDDs in the
Indeed, these two cases are tightly coupled (the first one is a special
case of the second).
Actually, these outliers could be handled by a dedicated function that I
named outliersManager -- I wasn't feeling very inspired ^^ -- but we could name
these outliers "outlaws", and thus the function would be
Hi, Sandy
We do have an issue with this. The difference is between Yarn-Alpha and
Yarn-Stable. (I noticed that in the latest build, the module names have
changed:
yarn-alpha -- yarn
yarn -- yarn-stable
)
For example: MRJobConfig.class
the field:
DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH
Hi guys,
am wondering how RDD checkpointing
(https://spark.apache.org/docs/latest/streaming-programming-guide.html#rdd-checkpointing)
works in Spark Streaming. When I use updateStateByKey, does Spark store the
entire state (at one point in time) into HDFS, or only put the transformation
Hmm,
looks like a build script issue.
I ran the command:
sbt/sbt clean yarn/test:compile
but the errors came from yarn-stable:
[error] 40 errors found
[error] (yarn-stable/compile:compile) Compilation failed
Chester
On Wed, Jul 16, 2014 at 5:18 PM, Chester Chen ches...@alpinenow.com wrote:
Hi,
After every checkpointing interval, the latest state RDD is stored to HDFS
in its entirety. Along with that, the series of DStream transformations
that was set up with the streaming context is also stored into HDFS (the
whole DAG of DStream objects is serialized and saved).
TD
On Wed, Jul 16,
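To make TD's point concrete, here is a minimal pure-Python sketch of updateStateByKey semantics (illustrative only, not Spark code; `update_state` and `run_batch` are made-up stand-ins): each batch, the update function folds new values into the per-key running state, and a checkpoint persists the whole resulting state, not just that batch's delta.

```python
# Pure-Python sketch of updateStateByKey semantics (NOT Spark code):
# per batch, update_state folds the batch's new values into each key's
# running state; a checkpoint then writes the *entire* state map.

def update_state(new_values, running_count):
    # Mirrors the shape of Spark's (Seq[V], Option[S]) => Option[S] function.
    return (running_count or 0) + sum(new_values)

def run_batch(state, batch):
    # Every key with old state or new values gets its state recomputed.
    keys = set(state) | set(batch)
    return {k: update_state(batch.get(k, []), state.get(k)) for k in sorted(keys)}

state = {}
state = run_batch(state, {"a": [1, 2], "b": [3]})
state = run_batch(state, {"a": [4]})
# A checkpoint would save this whole dict to HDFS, not just the "a" delta.
print(state)  # {'a': 7, 'b': 3}
```

Note that key "b" received no new values in the second batch, yet it is still part of the state that would be checkpointed, which is exactly why the entire state RDD goes to HDFS.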
Looking further, the yarn and yarn-stable modules are both for the stable
version of Yarn; that explains the compilation errors when using the
2.0.5-alpha version of Hadoop.
The yarn-alpha module (although it is still in SparkBuild.scala) is no
longer there in the sbt console.
projects
[info] In
Hey Reynold, just to clarify, users will still have to manually broadcast
objects that they want to use *across* operations (e.g. in multiple iterations
of an algorithm, or multiple map functions, or stuff like that). But they won't
have to broadcast something they only use once.
Matei
On Jul
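Matei's distinction can be illustrated with a plain-Python stand-in (no Spark here; `FakeBroadcast` is a made-up class, not the `sc.broadcast` API): a broadcast value is serialized and shipped once, then reused across many operations, whereas an object used only once can simply ride along with that one closure.

```python
# Plain-Python stand-in for the broadcast idea (illustrative, NOT Spark's API):
# the value is serialized once up front and then reused across operations,
# instead of being re-shipped with every closure that captures it.
import pickle

class FakeBroadcast:
    def __init__(self, value):
        self.payload = pickle.dumps(value)  # one-time serialization cost
        self.ship_count = 1                 # shipped to workers exactly once

    @property
    def value(self):
        return pickle.loads(self.payload)

lookup = {"x": 1, "y": 2}
bc = FakeBroadcast(lookup)

# Reused across several "operations" (e.g. iterations of an algorithm)
# without any additional shipping:
results = [[bc.value[k] for k in ("x", "y")] for _ in range(3)]
print(bc.ship_count, results)  # 1 [[1, 2], [1, 2], [1, 2]]
```

If `lookup` were used in a single map only, the automatic one-shot shipping Reynold described would already be enough, which is Matei's point.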
Yup - that is correct. Thanks for clarifying.
On Wed, Jul 16, 2014 at 10:12 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Hey Reynold, just to clarify, users will still have to manually broadcast
objects that they want to use *across* operations (e.g. in multiple
iterations of an
Wow. Great writeup.
I keep tabs on several open source projects that we use heavily, and
I'd be ecstatic if more major changes were this well and succinctly
explained, instead of the usual "just read the commit message/diff".
- Stephen
Hi Patrick, thanks for taking a look. I filed as
https://issues.apache.org/jira/browse/SPARK-2546
Would you recommend I pursue the cloned Configuration object approach now
and send in a PR?
Reynold's recent announcement of the broadcast RDD object patch may also
have implications for the right