You might be interested in "Maximum Flow implementation on Spark GraphX" done
by a Colorado School of Mines grad student a couple of years ago.
http://datascienceassn.org/2016-01-27-maximum-flow-implementation-spark-graphx
From: Swapnil Shinde
Chapter 6 of my book implements Dijkstra's Algorithm. The source code is
available to download for free.
https://www.manning.com/books/spark-graphx-in-action
From: Brian Wilson
To: user@spark.apache.org
Sent: Monday, October 24, 2016 7:11 AM
Subject:
In chapter 10 of Spark GraphX In Action, we describe how to use Zeppelin with
d3.js to render graphs using d3's force-directed rendering algorithm. The
source code can be downloaded for free from
https://www.manning.com/books/spark-graphx-in-action
From: agc studio
Yes, it is possible to use GraphX from Java, but it requires 10x the amount of
code and involves obscure type declarations and predefined lambda interface
facilities. I give an example of it in my book, the source code for which can
be downloaded for free from
At first glance, it looks like the only streaming data sources available out of
the box from the github master branch are
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
and
http://go.databricks.com/apache-spark-2.0-presented-by-databricks-co-founder-reynold-xin
From: Sourav Mazumder
To: user
Sent: Wednesday, April 20, 2016 11:07 AM
Subject: Spark 2.0 forthcoming features
Hi All,
Is there
As with all history, "what if"s are not scientifically testable hypotheses, but
my speculation is that the difference was the energy (VCs, startups, big
Internet companies, universities) within Silicon Valley as contrasted with
Germany.
From: Mich Talebzadeh <mich.talebza...@gmail.com>
To: Michael
There have been commercial CEP solutions for decades, including from my
employer.
From: Mich Talebzadeh
To: Mark Hamstra
Cc: Corey Nolet ; "user @spark"
Sent: Sunday, April 17, 2016 3:48 PM
In terms of publication date, a paper on Nephele was published in 2009, prior
to the 2010 USENIX paper on Spark. Nephele is the execution engine of
Stratosphere, which became Flink.
From: Mark Hamstra
To: Mich Talebzadeh
Cc: Corey
Will Spark 2.0 Structured Streaming obviate some of the Druid/Spark use cases?
From: Raymond Honderdors
To: "yuzhih...@gmail.com"
Cc: "user@spark.apache.org"
Sent: Wednesday, March 23, 2016 8:43 AM
Subject:
Yes. And a paper that describes using grids (actually varying grids) is
http://research.microsoft.com/en-us/um/people/jingdw/pubs%5CCVPR12-GraphConstruction.pdf
In the Spark GraphX In Action book that Robin East and I are writing, we
implement a drastically simplified version of this in chapter
I would also add that, from a data-locality standpoint, mapPartitions()
provides for node-local computation that plain old map-reduce does not.
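A minimal sketch of that distinction (run as a local job; the per-partition
setup object is purely illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))
val rdd = sc.parallelize(1 to 100, 4)

// map(): the function runs once per element.
val perElement = rdd.map(_ * 2)

// mapPartitions(): the function runs once per partition, on the node
// holding that partition, so any expensive setup (a parser, a DB
// connection, a buffer) is paid once per partition rather than per element.
val perPartition = rdd.mapPartitions { iter =>
  val buffer = new scala.collection.mutable.ArrayBuffer[Int]()  // illustrative node-local setup
  iter.map(_ * 2)
}

assert(perElement.collect().sameElements(perPartition.collect()))
sc.stop()
```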
Original message
From: Ashic Mahtab as...@live.com
Date:
http://www.datascienceassn.org/content/making-sense-making-sense-performance-data-analytics-frameworks
From: bit1...@163.com bit1...@163.com
To: user user@spark.apache.org
Sent: Monday, April 27, 2015 8:33 PM
Subject: Why Spark is much faster than Hadoop MapReduce even on disk
You could have your receiver send a magic value when it is done. I discuss
this Spark Streaming pattern in my presentation Spark Gotchas and
Anti-Patterns. In the PDF version, it's slides 34-36.
http://www.datascienceassn.org/content/2014-11-05-spark-gotchas-and-anti-patterns-julia-language
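A sketch of that pattern (the sentinel value "DONE" and the names here are
illustrative, not taken from the slides): the receiver pushes a magic value
when its source is exhausted, and the driver stops the StreamingContext upon
seeing it.

```scala
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

// Watch the stream for the agreed-upon magic value and shut the
// streaming context down gracefully when it arrives.
def stopWhenDone(stream: DStream[String], ssc: StreamingContext): Unit = {
  stream.foreachRDD { rdd =>
    if (rdd.filter(_ == "DONE").count() > 0) {
      ssc.stop(stopSparkContext = false, stopGracefully = true)
    }
  }
}
```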
Can my new book, Spark GraphX In Action, which is currently in MEAP
http://manning.com/malak/, be added to
https://spark.apache.org/documentation.html and, if appropriate, to
https://spark.apache.org/graphx/ ?
Michael Malak
But isn't foldLeft() overkill for the originally stated use case of max diff of
adjacent pairs? Isn't foldLeft() for recursive, non-commutative,
non-associative accumulation, as opposed to an embarrassingly parallel
operation such as this one?
This use case reminds me of FIR filtering in DSP. It
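For the record, the adjacent-pair computation can be done in an embarrassingly
parallel way; a sketch with toy data, pairing each element with its successor
via shifted indices:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))

// Max absolute difference between adjacent elements, computed without a
// sequential fold: key each element by its index, re-key a copy to its
// predecessor's index, and join to form the adjacent pairs.
val data = sc.parallelize(Seq(3.0, 7.0, 2.0, 9.0))
val indexed = data.zipWithIndex().map { case (v, i) => (i, v) }
val shifted = indexed.map { case (i, v) => (i - 1, v) }
val maxDiff = indexed.join(shifted)              // (i, (x_i, x_{i+1}))
  .map { case (_, (a, b)) => math.abs(b - a) }
  .max()

assert(maxDiff == 7.0)  // from the adjacent pair (2.0, 9.0)
sc.stop()
```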
On Wednesday, October 22, 2014 9:06 AM, Sean Owen so...@cloudera.com wrote:
No, there's no such thing as an RDD of RDDs in Spark.
Here though, why not just operate on an RDD of Lists? or a List of RDDs?
Usually one of these two is the right approach whenever you feel
inclined to operate on an
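Concretely, the two shapes look like this (a sketch with made-up data):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))

// One distributed collection whose elements are small local lists:
val rddOfLists: RDD[List[Int]] = sc.parallelize(Seq(List(1, 2), List(3, 4, 5)))

// A local handful of independent distributed collections:
val listOfRdds: List[RDD[Int]] = List(sc.parallelize(1 to 2), sc.parallelize(3 to 5))

// An RDD[RDD[Int]] is not possible: RDD operations can only be invoked
// from the driver, never from inside another RDD's tasks.
assert(rddOfLists.map(_.sum).collect().sorted.sameElements(Array(3, 12)))
assert(listOfRdds.map(_.sum()).sum == 15)
sc.stop()
```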
Depending on the density of your keys, the alternative signature
def updateStateByKey[S](updateFunc: (Iterator[(K, Seq[V], Option[S])]) =>
Iterator[(K, S)], partitioner: Partitioner, rememberPartitioner:
Boolean)(implicit arg0: ClassTag[S]): DStream[(K, S)]
at least iterates by key rather than
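A sketch of what an update function for that iterator-based signature looks
like (a running count per key; all names here are illustrative):

```scala
// One call per batch over an iterator of keys, instead of one call per
// key. Here S = Long (running count) and V = Int.
val updateFunc = (iter: Iterator[(String, Seq[Int], Option[Long])]) =>
  iter.map { case (key, newValues, runningCount) =>
    (key, runningCount.getOrElse(0L) + newValues.size)
  }

// Wired into a hypothetical pairs: DStream[(String, Int)] as:
//   pairs.updateStateByKey(updateFunc, new HashPartitioner(8),
//                          rememberPartitioner = true)

// The function itself is pure, so it can be exercised locally:
val out = updateFunc(Iterator(("a", Seq(1, 2, 3), Some(2L)), ("b", Seq(), None))).toList
assert(out == List(("a", 5L), ("b", 0L)))
```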
It's really more of a Scala question than a Spark question, but the standard OO
(not Scala-specific) way is to create your own custom supertype (e.g.
MyCollectionTrait), inherited/implemented by two concrete classes (e.g. MyRDD
and MyArray), each of which manually forwards method calls to the
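A skeleton of that pattern (class names as in the e-mail; the forwarded
methods are illustrative):

```scala
import org.apache.spark.rdd.RDD

// Common supertype; callers program against this interface.
trait MyCollectionTrait[T] {
  def first: T
  def count: Long
}

// The RDD-backed implementation forwards to the RDD...
class MyRDD[T](rdd: RDD[T]) extends MyCollectionTrait[T] {
  def first: T = rdd.first()
  def count: Long = rdd.count()
}

// ...and the array-backed one forwards to the Array.
class MyArray[T](arr: Array[T]) extends MyCollectionTrait[T] {
  def first: T = arr.head
  def count: Long = arr.length.toLong
}
```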
How about a treeReduceByKey? :-)
On Friday, June 20, 2014 11:55 AM, DB Tsai dbt...@stanford.edu wrote:
Currently, the reduce operation combines the results from the mappers
sequentially, so it's O(n).
Xiangrui is working on treeReduce which is O(log(n)). Based on the
benchmark, it dramatically
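treeReduce later landed on RDD itself; a sketch of the difference in use (it
is a drop-in for reduce when the combining function is commutative and
associative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))
val rdd = sc.parallelize(1 to 1000, 16)

val flat = rdd.reduce(_ + _)                 // partial results combined sequentially
val tree = rdd.treeReduce(_ + _, depth = 2)  // combined in rounds: O(log n) in partitions

assert(flat == tree && tree == 500500)
sc.stop()
```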
Mohit Jaggi:
A workaround is to use zipWithIndex (to appear in Spark 1.0, but if you're
still on 0.9x you can swipe the code from
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
), map it to (x => (x._2, x._1)) and then sortByKey.
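Spelled out, the workaround looks like this (sketch with toy data):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))

val rdd = sc.parallelize(Seq("c", "a", "b"))
val byPosition = rdd.zipWithIndex()   // (value, originalIndex), index is a Long
  .map(x => (x._2, x._1))             // swap to (originalIndex, value)
  .sortByKey()                        // deterministic original order

assert(byPosition.collect().sameElements(Array((0L, "c"), (1L, "a"), (2L, "b"))))
sc.stop()
```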
as the ASF Jira system will let me
reset my password.
On Sunday, May 11, 2014 4:40 AM, Michael Malak michaelma...@yahoo.com wrote:
Is this a bug?
scala> sc.parallelize(1 to 2,4).zip(sc.parallelize(11 to 12,4)).collect
res0: Array[(Int, Int)] = Array((1,11), (2,12))
scala> sc.parallelize(1L to 2L,4
It looks like Spark outperforms Stratosphere fairly consistently in the
experiments.
There was one exception the paper noted, which was when memory resources were
constrained. In that case, Stratosphere seemed to have degraded more gracefully
than Spark, but the author did not explore it deeper.