On 27 Jul 2015, at 16:42, Ulanov, Alexander alexander.ula...@hp.com wrote:
It seems that the two joins mentioned can be rewritten as one outer join
You're right. In fact, the outer join can be streamlined further using a
method from GraphOps:
g = g.joinVertices(messages)(vprog).cache()
Then,
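For context, here is a minimal sketch of the joinVertices pattern above, assuming a Pregel-style vertex program (vprog) and a messages VertexRDD as in the surrounding discussion; the attribute types and the merge logic are placeholders:

```scala
import org.apache.spark.graphx._

// Sketch only: merge incoming messages into vertex state with joinVertices.
// Vertices that received no message keep their old attribute unchanged.
def step(g: Graph[Double, Int], messages: VertexRDD[Double]): Graph[Double, Int] = {
  def vprog(id: VertexId, attr: Double, msg: Double): Double = attr + msg
  g.joinVertices(messages)(vprog).cache()
}
```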
I think it would be good to have more basic operators like union or
difference, as long as they have an efficient distributed implementation
and are plausibly useful.
If they can be written in terms of the existing GraphX API, it would be
best to put them into GraphOps to keep the core GraphX
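As an illustration (not part of GraphX), an operator like union can be layered on the existing public API in the same style GraphOps uses, for example via an implicit class:

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Sketch: a GraphOps-style union built only from the public API.
// When both graphs define the same vertex id, the Graph constructor
// picks one of the duplicate attributes arbitrarily.
implicit class GraphUnionOps[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]) {
  def union(other: Graph[VD, ED]): Graph[VD, ED] =
    Graph(g.vertices.union(other.vertices), g.edges.union(other.edges))
}
```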
This is the latest GraphX-based ALS implementation that I'm aware of:
https://github.com/ankurdave/spark/blob/GraphXALS/graphx/src/main/scala/org/apache/spark/graphx/lib/ALS.scala
When I benchmarked it last year, it was about twice as slow as MLlib's ALS,
and I think the latter has gotten faster
No, the vertices are hash-partitioned onto workers independently of the
edges. It would be nice for each vertex to be on the worker with the most
adjacent edges, but we haven't done this yet since it would add a lot of
complexity to avoid load imbalance while reducing the overall communication
by
+1 (binding)
Ankur http://www.ankurdave.com/
On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
I'd like to formally call a [VOTE] on this model, to last 72 hours. The
[VOTE] will end on Nov 8, 2014 at 6 PM PST.
At 2014-09-15 08:59:48 -0700, Andrew Ash and...@andrewash.com wrote:
I'm seeing the same exception now on the Spark 1.1.0 release. Did you ever
get this figured out?
[...]
On Thu, Aug 21, 2014 at 2:14 PM, npanj nitinp...@gmail.com wrote:
I am getting PARSING_ERROR while running my job on
I posted the fix on the JIRA ticket
(https://issues.apache.org/jira/browse/SPARK-3190). To update the user list,
this is indeed an integer overflow problem when summing up the partition sizes.
The fix is to use Longs for the sum: https://github.com/apache/spark/pull/2106.
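The overflow is easy to reproduce in isolation; the sizes below are made up, but they show how an Int sum wraps negative while a Long sum does not:

```scala
// Hypothetical partition sizes whose total exceeds Int.MaxValue (2147483647).
val partitionSizes: Seq[Int] = Seq(1500000000, 1500000000)
val intSum: Int = partitionSizes.sum                   // wraps to -1294967296
val longSum: Long = partitionSizes.map(_.toLong).sum   // 3000000000, as intended
```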
Ankur
On Mon, Jul 28, 2014 at 4:29 AM, Larry Xiao xia...@sjtu.edu.cn wrote:
On 7/28/14, 3:41 PM, shijiaxin wrote:
There is a VertexPartition in the EdgePartition, which is created by
EdgePartitionBuilder.toEdgePartition, and there is also a
ShippableVertexPartition in the VertexRDD.
These two
Hi Larry,
GraphX's graph constructor leaves the edges in their original partitions by
default. To support arbitrary multipass graph partitioning, one idea is to
take advantage of that by partitioning the graph externally to GraphX
(though possibly using information from GraphX such as the
Oops, the code should be:
val unpartitionedGraph: Graph[Int, Int] = ...
val numPartitions: Int = 128
def getTripletPartition(e: EdgeTriplet[Int, Int]): PartitionID = ...
// Get the triplets using GraphX, then use Spark to repartition them
val partitionedEdges = unpartitionedGraph.triplets
  .map(e
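One plausible completion of that fragment is sketched below; the partition function remains a placeholder, and rebuilding Edge values from the triplets is an assumption about how the .map was meant to finish:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.graphx._

// Sketch: repartition a graph's edges with plain Spark, then rebuild the graph.
def repartitionByTriplet(
    unpartitionedGraph: Graph[Int, Int],
    numPartitions: Int,
    getTripletPartition: EdgeTriplet[Int, Int] => PartitionID): Graph[Int, Int] = {
  val partitionedEdges = unpartitionedGraph.triplets
    .map(e => (getTripletPartition(e), Edge(e.srcId, e.dstId, e.attr)))
    .partitionBy(new HashPartitioner(numPartitions))
    .map(_._2)
  Graph(unpartitionedGraph.vertices, partitionedEdges)
}
```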
We didn't provide an unpersist API for Graph because the internal
dependency structure of a graph can make it hard to unpersist correctly in
a way that avoids recomputation. However, you can directly unpersist a
graph's vertices and edges RDDs using graph.vertices.unpersist() and
graph.edges.unpersist().
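Wrapped as a helper, that advice might look like the following sketch; note that unpersisting may force recomputation if the graph is used again afterwards:

```scala
import org.apache.spark.graphx._

// Sketch: unpersist a graph by unpersisting its underlying RDDs directly.
// blocking = false returns immediately instead of waiting for block removal.
def unpersistGraph[VD, ED](graph: Graph[VD, ED]): Unit = {
  graph.vertices.unpersist(blocking = false)
  graph.edges.unpersist(blocking = false)
}
```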
I agree, let's go ahead and remove it.
Ankur http://www.ankurdave.com/
0
OK, I withdraw my downvote.
Ankur http://www.ankurdave.com/
This is probably due to SPARK-1931
(https://issues.apache.org/jira/browse/SPARK-1931), which I just fixed in
PR #885 (https://github.com/apache/spark/pull/885).
Is the problem resolved if you use the current Spark master?
Ankur http://www.ankurdave.com/