I am going to make an edit to the download page on the web site to
start, as that much seems uncontroversial. Proposed change:
Reorder sections to put developer-oriented sections at the bottom,
including the info on nightly builds:
Download Spark
Link with Spark
All Releases
Spark Source
Hi,
I noticed that KinesisStreamSuite fails for both hadoop profiles in master
Jenkins builds.
From
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console
:
KinesisStreamSuite:*** RUN ABORTED *** java.lang.AssertionError:
The PR to fix this is out.
https://github.com/apache/spark/pull/7519
On Sun, Jul 19, 2015 at 6:41 PM, Tathagata Das t...@databricks.com wrote:
I am taking care of this right now.
On Sun, Jul 19, 2015 at 6:08 PM, Patrick Wendell pwend...@gmail.com
wrote:
I think we should just revert this
Yes.
Sent from my iPhone
On 19 Jul, 2015, at 10:52 pm, Jahagirdar, Madhu
madhu.jahagir...@philips.com wrote:
All,
Can we run different versions of Spark using the same Mesos Dispatcher? For
example, can we run drivers with Spark 1.3 and Spark 1.4 at the same time?
Regards,
Madhu
I think we should just revert this patch on all affected branches. No
reason to leave the builds broken until a fix is in place.
- Patrick
On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen rosenvi...@gmail.com wrote:
Yep, I emailed TD about it; I think that we may need to make a change to
the pull request builder to fix this. Pending that, we could just revert
the commit that added this.
On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu yuzhih...@gmail.com wrote:
Edit: the first line should read:
val groupedRdd = rdd.map((_, 1)).reduceByKey(_ + _)
On Sun, Jul 19, 2015 at 11:02 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
This functionality already basically exists in Spark. To create the
grouped RDD, one can run:
val groupedRdd = rdd.reduceByKey(_ + _)
The user gets to choose what they want to reside in memory. If they call
rdd.cache() on the original RDD, it will be in memory. If they call
rdd.cache() on the compact RDD, it will be in memory. If cache() is called
on both, they'll both be in memory.
-Sandy
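The caching behavior Sandy describes can be sketched with a plain-Scala analogy (no Spark involved; `originalRdd`, `compactRdd`, and the counters are all illustrative names, not Spark API): each RDD is modeled as a recomputable function, and "caching" one simply means materializing its result once, which does not implicitly cache the other.

```scala
// Plain-Scala sketch of the caching semantics described above (no Spark).
// Counters track how often each "RDD" is actually computed.
var originalComputes = 0
var compactComputes = 0

def originalRdd(): Seq[String] = {
  originalComputes += 1
  Seq("Spark", "Spark", "Spark", "Hadoop")
}

// The compact form recomputes the original each time, unless something is cached.
def compactRdd(): Map[String, Int] = {
  compactComputes += 1
  originalRdd().groupBy(identity).map { case (k, vs) => (k, vs.size) }
}

// "cache()" on the compact RDD: materialize once, then reuse the result.
val cachedCompact = compactRdd()
val use1 = cachedCompact
val use2 = cachedCompact
// Both computations ran exactly once, however many times the cached value is reused.
```

If instead only the original were "cached", repeated uses of the compact form would recompute the grouping but not the original data, mirroring the choice Sandy describes.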
On Sun, Jul 19, 2015 at 11:09 AM,
Sorry, maybe I am saying something completely wrong... we have a stream, and
we digitize it to create an RDD. The RDD in this case will be just an array of
Any. Then we apply a transformation to create a new grouped RDD, and GC should
remove the original RDD from memory (if we don't persist it). Will we have a GC step in
Sean B.,
Thank you for giving a thorough reply. I will work with Sean O. and
see what we can change to make us more in line with the stated policy.
I did some research and it appears that some time between October [1]
and December [2] 2006, this page was modified to include stricter
policy
Hey Sean,
One other thing I'd be okay doing is moving the main text about
nightly builds to the wiki and just having a header called Nightly
builds at the end of the downloads page that says: For developers,
Spark maintains nightly builds. More information is available on the
[Spark developer
Hi,
My two cents is that this could be interesting if all RDD and pair
RDD operations were lifted to work on a grouped RDD. For example, as
suggested, a map on grouped RDDs would be more efficient if the original RDD
had lots of duplicate entries, but for RDDs with few repetitions I guess
you
Thanks for the answer! Could you please answer one more question? Will we
have both the original RDD and the grouped RDD in memory at the same time?
2015-07-19 21:04 GMT+03:00 Sandy Ryza sandy.r...@cloudera.com:
This functionality already basically exists in Spark. To create the
grouped RDD, one can run:
val groupedRdd = rdd.reduceByKey(_ + _)
To get it back into the original form:
groupedRdd.flatMap(x => List.fill(x._2)(x._1))
-Sandy
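The round trip Sandy describes (using the corrected first line from his "Edit" message) can be sketched on a plain Scala collection, with the collection methods standing in for the RDD operations; `data`, `grouped`, and `restored` are illustrative names:

```scala
// Compact repeated values into (value, count) pairs, mirroring
// rdd.map((_, 1)).reduceByKey(_ + _) on a local collection.
val data = Seq("Spark", "Spark", "Spark", "Hadoop")
val grouped: Map[String, Int] =
  data.map((_, 1)).groupBy(_._1).map { case (k, ones) => (k, ones.map(_._2).sum) }

// Expand back to the original (unordered) contents, mirroring
// groupedRdd.flatMap(x => List.fill(x._2)(x._1)).
val restored: List[String] = grouped.toList.flatMap { case (v, n) => List.fill(n)(v) }
```

Note that the expansion fills count (`x._2`) copies of the value (`x._1`), so the restored collection has the same contents as the original, though not necessarily the same order.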
On Sun, Jul 19, 2015 at 10:40 AM, Сергей Лихоман
I have only used client mode with both the 1.3 and 1.4 versions on Mesos.
I skimmed through
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala.
I would actually backport the Cluster Mode feature. Sorry, I don't have an
answer for this.
On
Hi,
I am looking for a suitable issue for a Master's Degree project (something
like scalability problems and improvements for Spark Streaming), and it seems
like the introduction of a grouped RDD (for example: don't store
Spark, Spark, Spark; instead store (Spark, 3)) can:
1. Reduce the memory needed for an RDD
In the Spark model, constructing an RDD does not mean storing all its
contents in memory. Rather, an RDD is a description of a dataset that
enables iterating over its contents, record by record (in parallel). The
only time the full contents of an RDD are stored in memory is when a user
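The laziness described above can be illustrated with a rough plain-Scala analogy, an Iterator standing in for an RDD (no Spark involved; `describeDataset` and `recordsComputed` are hypothetical names for the sketch):

```scala
// An RDD is a description of a dataset, not its materialized contents.
// A rough local analogy: an Iterator computes records only when consumed.
var recordsComputed = 0

def describeDataset(): Iterator[Int] =
  Iterator.tabulate(5) { i =>
    recordsComputed += 1
    i * 2
  }

val description = describeDataset()
// Merely constructing the description computes nothing yet.
assert(recordsComputed == 0)

// Consuming it (loosely analogous to running an action) computes the records.
val contents = description.toList
```

The analogy is imperfect (real RDDs also partition the data and can recompute lost partitions), but it captures the point that construction and evaluation are separate steps.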