Re: [GraphX]: Prevent recomputation of DAG

2024-03-18 Thread Mich Talebzadeh
Hi, I must admit I don't know much about this Fruchterman-Reingold (call it FR) visualization using GraphX and Kubernetes. But you are saying this slowdown starts after the second iteration, and that caching/persisting the graph after each iteration does not help. FR involves many
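For reference, the per-iteration caching pattern that GraphX's own Pregel loop uses (cache the new graph, force materialization, then unpersist the old one so the growing lineage is not recomputed every iteration) can be sketched as below; the `mapVertices` update is a placeholder, not the actual FR force step:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Reuses the spark-shell context if present, else starts a local one.
val sc = SparkContext.getOrCreate(new SparkConf().setAppName("fr-sketch").setMaster("local[2]"))

val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
var g: Graph[Double, Int] = Graph.fromEdges(edges, 1.0)

for (i <- 1 to 10) {
  val prev = g
  // Placeholder for the real per-iteration update (e.g. one FR step).
  g = g.mapVertices((_, v) => v * 0.85).cache()
  g.vertices.count() // materialize BEFORE dropping the parent graph
  prev.unpersistVertices(blocking = false)
  prev.edges.unpersist(blocking = false)
}
```

For very long runs, periodically checkpointing (with `sc.setCheckpointDir` set) truncates the lineage entirely rather than just evicting the old copies.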

Re: GraphX Support

2022-03-25 Thread Bjørn Jørgensen
Yes, MLlib is actively developed. You can have a look at GitHub and filter on closed and ML. On Fri, 25 Mar 2022 at 22:15, Bitfox wrote: > BTW, is MLlib

Re: GraphX Support

2022-03-25 Thread Bitfox
BTW, is MLlib still in active development? Thanks On Tue, Mar 22, 2022 at 07:11 Sean Owen wrote: > GraphX is not active, though still there and does continue to build and > test with each Spark release. GraphFrames kind of superseded it, but is > also not super active FWIW. > > On Mon, Mar

Re: [EXTERNAL] Re: GraphX Support

2022-03-25 Thread Bjørn Jørgensen
graphs utils and documentation <https://www.arangodb.com/docs/stable/graphs.html> On Tue, 22 Mar 2022 at 00:49, Jacob Marquez wrote: > Awesome, thank you!

Re: GraphX Support

2022-03-22 Thread Enrico Minack
Right, GraphFrames is not very active and the maintainers don't even have the capacity to make releases. Enrico On 22.03.22 at 00:10, Sean Owen wrote: GraphX is not active, though still there and does continue to build and test with each Spark release. GraphFrames kind of superseded it, but is

RE: [EXTERNAL] Re: GraphX Support

2022-03-21 Thread Jacob Marquez
Awesome, thank you! From: Sean Owen Sent: Monday, March 21, 2022 4:11 PM To: Jacob Marquez Cc: user@spark.apache.org Subject: [EXTERNAL] Re: GraphX Support

Re: GraphX Support

2022-03-21 Thread Sean Owen
GraphX is not active, though still there and does continue to build and test with each Spark release. GraphFrames kind of superseded it, but is also not super active FWIW. On Mon, Mar 21, 2022 at 6:03 PM Jacob Marquez wrote: > Hello! > > > > My team and I are evaluating GraphX as a possible

Re: GraphX performance feedback

2019-11-28 Thread mahzad kalantari
Ok, thanks! On Thu, 28 Nov 2019 at 11:27, Phillip Henry wrote: > I saw a large improvement in my GraphX processing by: > > - using fewer partitions > - using fewer executors but with much more memory. > > YMMV. > > Phillip > > On Mon, 25 Nov 2019, 19:14 mahzad kalantari, > wrote: > >> Thanks

Re: GraphX performance feedback

2019-11-28 Thread Phillip Henry
I saw a large improvement in my GraphX processing by: - using fewer partitions - using fewer executors but with much more memory. YMMV. Phillip On Mon, 25 Nov 2019, 19:14 mahzad kalantari, wrote: > Thanks for your answer, my use case is friend recommendation for 200 > million profiles. > > On

Re: GraphX performance feedback

2019-11-25 Thread mahzad kalantari
Thanks for your answer, my use case is friend recommendation for 200 million profiles. On Mon, 25 Nov 2019 at 14:10, Jörn Franke wrote: > I think it depends what you want to do. Interactive big data graph analytics > are probably better off in JanusGraph or similar. > Batch processing (once-off)

Re: GraphX performance feedback

2019-11-25 Thread Jörn Franke
I think it depends what you want to do. Interactive big data graph analytics are probably better off in JanusGraph or similar. Batch processing (once-off) can still be fine in GraphX, though you have to carefully design the process. > On 25.11.2019 at 20:04, mahzad kalantari wrote: > > Hi

Re: graphx vs graphframes

2019-10-17 Thread Nicolas Paris
Hi Alastair, Cypher support looks promising and the dev list thread discussion is interesting. Thanks for your feedback. On Thu, Oct 17, 2019 at 09:19:28AM +0100, Alastair Green wrote: > Hi Nicolas, > > I was following the current thread on the dev channel about Spark > Graph, including

Re: graphx vs graphframes

2019-10-17 Thread Alastair Green
Hi Nicolas, I was following the current thread on the dev channel about Spark Graph, including Cypher support, http://apache-spark-developers-list.1001551.n3.nabble.com/Add-spark-dependency-on-on-org-opencypher-okapi-shade-okapi-td28118.html

Re: [GraphX] Preserving Partitions when reading from HDFS

2019-04-25 Thread M Bilal
If I understand correctly this would set the split size in the Hadoop configuration when reading a file. I can see that being useful when you want to create more partitions than what the block size in HDFS might dictate. Instead, what I want to do is create a single partition for each file written

Re: [GraphX] Preserving Partitions when reading from HDFS

2019-04-15 Thread Manu Zhang
You may try `sparkContext.hadoopConfiguration().set("mapred.max.split.size", "33554432")` to tune the partition size when reading from HDFS. Thanks, Manu Zhang On Mon, Apr 15, 2019 at 11:28 PM M Bilal wrote: > Hi, > > I have implemented a custom partitioning algorithm to partition graphs in >
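Manu's suggestion, sketched end-to-end on a tiny local file (the real input would of course be HDFS paths; the file contents here are made up):

```scala
import java.nio.file.Files
import java.nio.charset.StandardCharsets
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("split-size").setMaster("local[2]"))

// Tiny stand-in for an HDFS edge list.
val path = Files.createTempFile("edges", ".txt")
Files.write(path, "1 2\n2 3\n3 1\n".getBytes(StandardCharsets.UTF_8))

// 32 MB max split size => large files are read as more, smaller partitions.
sc.hadoopConfiguration.set("mapred.max.split.size", "33554432")

val graph = GraphLoader.edgeListFile(sc, path.toString)
```

Note that this caps the split size from above; it cannot force exactly one partition per file, which is what M Bilal is asking for.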

Re: GraphX subgraph from list of VertexIds

2017-05-12 Thread Robineast
it would be listVertices.contains(vid), wouldn't it? - Robin East Spark GraphX in Action Michael Malak and Robin East Manning Publications Co. http://www.manning.com/books/spark-graphx-in-action

Re: GraphX Pregel API: add vertices and edges

2017-03-23 Thread Robineast
From the section on the Pregel API in the GraphX programming guide: '... the Pregel operator in GraphX is a bulk-synchronous parallel messaging abstraction constrained to the topology of the graph.' Does that answer your question? Did you read the programming guide?

Re: GraphX Pregel API: add vertices and edges

2017-03-23 Thread Robineast
GraphX is not synonymous with Pregel. To quote the GraphX programming guide 'GraphX exposes a variant of the Pregel API.'. There is no compute() function in GraphX - see the Pregel API section of the programming

Re: GraphX Pregel API: add vertices and edges

2017-03-23 Thread Robineast
Not that I'm aware of. Where did you read that?

Re: Graphx Examples for ALS

2017-02-17 Thread Irving Duran
Not sure I follow your question. Do you want to use ALS or GraphX? Thank You, Irving Duran On Fri, Feb 17, 2017 at 7:07 AM, balaji9058 wrote: > Hi, > > Where can I find the ALS recommendation algorithm for large data sets? > > Please feel free to share your

Re: Graphx triplet comparison

2016-12-14 Thread Robineast
You are trying to invoke 1 RDD action inside another, that won't work. If you want to do what you are attempting you need to .collect() each triplet to the driver and iterate over that. HOWEVER you almost certainly don't want to do that, not if your data are anything other than a trivial size. In
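A minimal sketch of that advice, on a made-up graph: collect the small side to the driver first, then broadcast it for use inside the other traversal, instead of nesting one RDD action inside another.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("triplets").setMaster("local[2]"))

val g = Graph.fromEdges(sc.parallelize(Seq(Edge(1L, 2L, 10), Edge(2L, 3L, 5))), 0)

// Collect the (assumed small) side to the driver...
val heavyDsts: Set[VertexId] = g.triplets.filter(_.attr >= 10).map(_.dstId).collect().toSet
val bc = sc.broadcast(heavyDsts)

// ...then compare against it inside the other triplet traversal.
val out = g.triplets.filter(t => bc.value.contains(t.srcId)).map(_.dstId).collect()
```

If neither side is small enough to collect, a join on vertex ids is the scalable alternative.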

Re: Graphx triplet comparison

2016-12-13 Thread balaji9058
Hi, thanks for the reply. Here is my code: class BusStopNode(val name: String, val mode: String, val maxpasengers: Int) extends Serializable case class busstop(override val name: String, override val mode: String, val shelterId: String, override val maxpasengers: Int) extends

Re: Graphx triplet comparison

2016-12-13 Thread Robineast
Not sure what you are asking. What's wrong with: triplet1.filter(condition3) triplet2.filter(condition3)

Re: [GraphX] Extreme scheduler delay

2016-12-06 Thread Sean Owen
(For what it is worth, I happened to look into this with Anton earlier and am also pretty convinced it's related to GraphX rather than the app. It's somewhat difficult to debug what gets sent in the closure AFAICT.) On Tue, Dec 6, 2016 at 7:49 PM AntonIpp wrote: > Hi

Re: GraphX Pregel not update vertex state properly, cause messages loss

2016-11-28 Thread rohit13k
Found the exact issue. If the vertex attribute is a complex object containing mutable objects, the edge triplet does not pick up the new state once the vertex attributes have already been shipped; if the vertex attributes are immutable objects, there is no issue. Below is code for the same. Just
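This is not the poster's code, but a minimal sketch of the safe pattern: an immutable case class as the vertex attribute, rebuilt fresh in every superstep, so shipped triplets can never observe a half-updated object.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("immutable-attr").setMaster("local[2]"))

// Immutable vertex state: vprog returns a NEW object each superstep.
case class VState(value: Int)

val g = Graph.fromEdges(sc.parallelize(Seq(Edge(1L, 2L, 0))), VState(0))

val result = g.pregel(initialMsg = 0, maxIterations = 2)(
  (id, attr, msg) => VState(attr.value + msg),  // vprog: fresh object, never mutation
  triplet => Iterator((triplet.dstId, 1)),      // sendMsg
  (a, b) => a + b)                              // mergeMsg
```

With a mutable class, mutating `attr` in place inside vprog is what triggers the stale-triplet behaviour described above.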

Re: GraphX Pregel not update vertex state properly, cause messages loss

2016-11-24 Thread 吴 郎
From: "Dale Wang" <w.zhaok...@gmail.com>; Date: Thursday, 24 November 2016, 11:10 AM; To: "吴 郎" <fuz@qq.com>; Cc: "user" <user@spark.apache.org>; Subject: Re: GraphX Pregel not update vertex state properly, cause messages loss The prob

Re: GraphX Pregel not update vertex state properly, cause messages loss

2016-11-23 Thread Dale Wang
The problem comes from the inconsistency between the graph's triplet view and vertex view. The message may not be lost; rather, the message is just not sent in the sendMsg function, because sendMsg gets a wrong value of srcAttr! It is not a new bug. I met a similar bug that appeared in version 1.2.1

Re: GraphX Pregel not update vertex state properly, cause messages loss

2016-11-23 Thread rohit13k
Created a JIRA for the same: https://issues.apache.org/jira/browse/SPARK-18568

Re: GraphX Pregel not update vertex state properly, cause messages loss

2016-11-23 Thread rohit13k
Hi, I am facing a similar issue. It's not that the message is getting lost or something. The vertex 1 attributes change in superstep 1, but when sendMsg gets the vertex attribute from the edge triplet in the 2nd superstep it still has the old value of vertex 1 and not the latest value. So

Re: GraphX Connected Components

2016-11-08 Thread Robineast
Have you tried this? https://spark.apache.org/docs/2.0.1/api/scala/index.html#org.apache.spark.graphx.GraphLoader$
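The suggested GraphLoader approach, sketched end-to-end on a throwaway two-component edge list:

```scala
import java.nio.file.Files
import java.nio.charset.StandardCharsets
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("cc").setMaster("local[2]"))

// Two disconnected components: {1,2} and {3,4}.
val path = Files.createTempFile("edges", ".txt")
Files.write(path, "1 2\n3 4\n".getBytes(StandardCharsets.UTF_8))

val graph = GraphLoader.edgeListFile(sc, path.toString)
// Each vertex is labelled with the smallest vertex id in its component.
val cc = graph.connectedComponents().vertices.collect().toMap
```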

Re: GraphX drawing algorithm

2016-09-11 Thread Michael Malak
In chapter 10 of Spark GraphX In Action, we describe how to use Zeppelin with d3.js to render graphs using d3's force-directed rendering algorithm. The source code can be downloaded for free from https://www.manning.com/books/spark-graphx-in-action

Re: GraphX performance and settings

2016-07-22 Thread B YL
Hi, We are also running a Connected Components test with GraphX. We ran experiments using Spark 1.6.1 on machines which have 16 cores with 2-way, and run only a single executor per machine. We got this result: a Facebook-like graph with 2^24 edges, using 4 executors with 90GB each, took 100

Re: GraphX performance and settings

2016-06-22 Thread Maja Kabiljo
which we'd kill them. Maja From: Deepak Goel <deic...@gmail.com> Date: Wednesday, June 15, 2016 at 7:13 PM To: Maja Kabiljo <majakabi...@fb.com> Cc: "user @spark" <user@spark.apache.org>

Re: GraphX performance and settings

2016-06-15 Thread Deepak Goel
I am not an expert but some thoughts inline On Jun 16, 2016 6:31 AM, "Maja Kabiljo" wrote: > > Hi, > > We are running some experiments with GraphX in order to compare it with other systems. There are multiple settings which significantly affect performance, and we

RE: GraphX Java API

2016-06-08 Thread Felix Cheung
It's very much possible to use GraphX through Java, though some boilerplate may be needed. He

RE: GraphX Java API

2016-06-05 Thread Santoshakhilesh
Hey, • I see some graphx packages listed here: http://spark.apache.org/docs/latest/api/java/index.html • org.apache.spark.graphx

Re: GraphX Java API

2016-05-31 Thread Sonal Goyal

RE: GraphX Java API

2016-05-31 Thread Santoshakhilesh
Hey, • I see some graphx packages listed here: http://spark.apache.org/docs/latest/api/java/index.html • org.apache.spark.graphx

Re: GraphX Java API

2016-05-30 Thread Chris Fregly
hi > Subject: Re: GraphX Java API > > No, you

Re: GraphX Java API

2016-05-30 Thread Michael Malak
No, you can call any Scala API in Java. I

Re: GraphX Java API

2016-05-30 Thread Sean Owen
> Aren’t they meant to be used with JAVA? > > Thanks

Re: GraphX Java API

2016-05-29 Thread Takeshi Yamamuro
> Aren’t they meant to be used with JAVA? > > Thanks

RE: GraphX Java API

2016-05-29 Thread Kumar, Abhishek (US - Bengaluru)
java/org/apache/spark/graphx/util/package-frame.html> Aren’t they meant to be used with JAVA? Thanks

Re: GraphX Java API

2016-05-29 Thread Jules Damji
Also, this blog talks about GraphFrames' implementation of some GraphX algorithms, accessible from Java, Scala, and Python: https://databricks.com/blog/2016/03/03/introducing-graphframes.html Cheers Jules Sent from my iPhone Pardon the dumb thumb typos :) > On May 29, 2016, at 12:24 AM,

Re: GraphX Java API

2016-05-29 Thread Takeshi Yamamuro
Hi, Have you checked GraphFrames? See the related discussion: https://issues.apache.org/jira/browse/SPARK-3665 // maropu On Fri, May 27, 2016 at 8:22 PM, Santoshakhilesh < santosh.akhil...@huawei.com> wrote: > GraphX APIs are available only in Scala. If you need to use GraphX you > need to

RE: GraphX Java API

2016-05-27 Thread Santoshakhilesh
GraphX APIs are available only in Scala. If you need to use GraphX you need to switch to Scala. From: Kumar, Abhishek (US - Bengaluru) Sent: 27 May 2016 19:59 To: user@spark.apache.org Subject: GraphX Java API Hi, We are trying to consume the Java API for

Re: Graphx

2016-03-11 Thread Khaled Ammar
> Subject: Re: Graphx > > Also we

RE: Graphx

2016-03-11 Thread John Lilley
Also we keep the Node info minimal as needed for connected components and rejoin later. A

Re: Graphx

2016-03-11 Thread Alexis Roos

RE: Graphx

2016-03-11 Thread John Lilley
we use it in prod: 70 boxes, 61GB RAM each. GraphX Connected Components works fine on 250M vertices and 1B edges (takes about 5-10 min). Spark likes memory,

Re: Graphx

2016-03-11 Thread Alexander Pivovarov

RE: Graphx

2016-03-11 Thread John Lilley
Ovidiu, IMHO, this is one of the biggest issues facing GraphX and Spark. There are a lot of knobs and levers to pull to affec

RE: Graphx

2016-03-11 Thread John Lilley
Hi, I wonder what version of Spark and different parameter configuration you used. I was able to run CC for 1.8bn edges in about 8 minutes (23 iterations) using

RE: Graphx

2016-03-11 Thread John Lilley
Hi, I wonder what version of Spark and different parameter configuration you used. I was able to run CC for 1.8bn edges in about 8 minutes (23 iterations) using 16 nodes with around 80GB RAM each (Spark 1.5, default p

Re: Graphx

2016-03-11 Thread Ovidiu-Cristian MARCU

Re: Graphx

2016-03-11 Thread lihu
Hi John, I am very interested in your experiment. How can you tell that RDD serialization cost lots of time, from the log or some other tools? On Fri, Mar 11, 2016 at 8:46 PM, John Lilley wrote: > Andrew, > > > > We conducted some tests for using GraphX to solve

RE: Graphx

2016-03-11 Thread John Lilley
> Hi John, I am very interested in your experiment. How can you tell that RDD serialization cost lots of time, from the

RE: Graphx

2016-03-11 Thread John Lilley
Andrew, We conducted some tests for using GraphX to solve the connected-components problem and were disappointed. On 8 nodes of 16GB each, we could not get above 100M edges. On 8 nodes of 60GB each, we could not process 1bn edges. RDD serialization would take excessive time and then we

Re: GraphX can show graph?

2016-01-29 Thread Balachandar R.A.
Thanks... Will look into that - Bala On 28 January 2016 at 15:36, Sahil Sareen wrote: > Try Neo4j for visualization; GraphX does a pretty good job at distributed > graph processing. > > On Thu, Jan 28, 2016 at 12:42 PM, Balachandar R.A. < > balachandar...@gmail.com> wrote:

Re: GraphX can show graph?

2016-01-29 Thread Russell Jurney
Maybe check out Gephi. It is a program that does what you need out of the box. On Friday, January 29, 2016, Balachandar R.A. wrote: > Thanks... Will look into that > > - Bala > > On 28 January 2016 at 15:36, Sahil Sareen

Re: GraphX can show graph?

2016-01-28 Thread Sahil Sareen
Try Neo4j for visualization; GraphX does a pretty good job at distributed graph processing. On Thu, Jan 28, 2016 at 12:42 PM, Balachandar R.A. wrote: > Hi > > I am new to GraphX. I have a simple csv file which I could load and > compute a few graph statistics. However, I

Re: GraphX - How to make a directed graph an undirected graph?

2015-11-26 Thread Robineast
1. GraphX doesn't have a concept of undirected graphs, Edges are always specified with a srcId and dstId. However there is nothing to stop you adding in edges that point in the other direction i.e. if you have an edge with srcId -> dstId you can add an edge dstId -> srcId 2. In general APIs will
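Point 1 above can be sketched like this: union every edge with its reverse, so direction-sensitive operations see both orientations (the tiny graph here is made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("undirected").setMaster("local[2]"))

val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
// Add the reverse of every edge; attributes are simply copied.
val undirected = Graph.fromEdges(edges.union(edges.map(e => Edge(e.dstId, e.srcId, e.attr))), 0)
```

This doubles the edge count, so budget memory accordingly on large graphs.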

Re: graphx - mutable?

2015-10-14 Thread rohit13k
Hi, I am also working in the same area, where the graph evolves over time, and the current approach of rebuilding the graph again and again is very slow and memory consuming. Did you find any workaround? What was your use case?

Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-06 Thread Dino Fancellu
Ok, thanks, just wanted to make sure I wasn't missing something obvious. I've worked with Neo4j cypher as well, where it was rather more obvious. e.g. http://neo4j.com/docs/milestone/query-match.html#_shortest_path http://neo4j.com/docs/stable/cypher-refcard/ Dino. On 6 October 2015 at 06:43,

RE: Graphx hangs and crashes on EdgeRDD creation

2015-10-06 Thread William Saar
); graph.connectedComponents().vertices From: Robin East Sent: 5 October 2015 19:07 To: William Saar <william.s...@king.com>; user@spark.apache.org Subject: Re: Graphx hangs and crashes on EdgeRDD creation Have you tried using Graph.partitionBy? e.g.
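Robin's partitionBy suggestion, as a minimal sketch (EdgePartition2D is one of the built-in strategies; the graph here is made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("partition").setMaster("local[2]"))

val g = Graph.fromEdges(
  sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1))), 0)

// Repartition the edges before heavy algorithms such as connectedComponents.
val partitioned = g.partitionBy(PartitionStrategy.EdgePartition2D)
val cc = partitioned.connectedComponents().vertices.collect().toMap
```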

Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Dino Fancellu
Ah thanks, got it working with that. e.g. val (_,smap)=shortest.vertices.filter(_._1==src).first smap.contains(dest) Is there anything a little less eager? i.e. that doesn't compute all the distances from all source nodes, where I can supply the source vertex id, dest vertex id, and just get

Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Anwar Rizal
Maybe connected components is what you need? On Oct 5, 2015 19:02, "Robineast" wrote: > GraphX has a Shortest Paths algorithm implementation which will tell you, > for > all vertices in the graph, the shortest distance to a specific ('landmark') > vertex. The returned

Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Robineast
GraphX doesn't implement TinkerPop functionality but there is an external effort to provide an implementation. See https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4279

Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Robineast
GraphX has a Shortest Paths algorithm implementation which will tell you, for all vertices in the graph, the shortest distance to a specific ('landmark') vertex. The returned value is 'a graph where each vertex attribute is a map containing the shortest-path distance to each reachable landmark
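That usage can be sketched as below, with the landmark being the destination vertex: every vertex whose returned map contains the landmark has a directed path to it (the graph is made up; vertices 4 and 5 are deliberately unreachable from the landmark's component).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ShortestPaths

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("sp").setMaster("local[2]"))

val g = Graph.fromEdges(
  sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(4L, 5L, 1))), 0)

val landmark = 3L
val sp = ShortestPaths.run(g, Seq(landmark))
// Vertices with a directed path to the landmark have a distance in the map.
val canReach = sp.vertices.filter { case (_, m) => m.contains(landmark) }.keys.collect().toSet
```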

Re: GraphX create graph with multiple node attributes

2015-09-26 Thread JJ
Robineast wrote > 2) let GraphX supply a null instead > val graph = Graph(vertices, edges) // vertices found in 'edges' but > not in 'vertices' will be set to null Thank you! This method works. As a follow up (sorry I'm new to this, don't know if I should start a new thread?): if I have
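A third option alongside the two quoted above is to supply an explicit default attribute, so vertices that appear only in 'edges' get that value instead of null (the names here are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("defaults").setMaster("local[2]"))

val vertices = sc.parallelize(Seq((1L, "pageA"), (2L, "pageB")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1))) // 3L has no attribute row

// Vertices found in 'edges' but not in 'vertices' get "unknown" instead of null.
val graph = Graph(vertices, edges, defaultVertexAttr = "unknown")
val attrs = graph.vertices.collect().toMap
```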

Re: GraphX create graph with multiple node attributes

2015-09-26 Thread JJ
Here is all of my code. My first post had a simplified version. As I post this, I realize one issue may be that when I convert my Ids to long (I define a pageHash function to convert string Ids to long), the nodeIds are no longer the same between the 'vertices' object and the 'edges' object. Do

Re: GraphX create graph with multiple node attributes

2015-09-26 Thread Robineast
Vertices that aren't connected to anything are perfectly valid e.g. import org.apache.spark.graphx._ val vertices = sc.makeRDD(Seq((1L,1),(2L,1),(3L,1))) val edges = sc.makeRDD(Seq(Edge(1L,2L,1))) val g = Graph(vertices, edges) g.vertices.count gives 3 Not sure why vertices appear to be

Re: GraphX create graph with multiple node attributes

2015-09-26 Thread Nick Peterson
Have you checked to make sure that your hashing function doesn't have any collisions? Node ids have to be unique; so, if you're getting repeated ids out of your hasher, it could certainly lead to dropping of duplicate ids, and therefore loss of vertices. On Sat, Sep 26, 2015 at 10:37 AM JJ
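A quick collision check along those lines, with a hypothetical `pageHash` standing in for whatever hasher JJ actually defined:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("hash-check").setMaster("local[2]"))

// Hypothetical stand-in for the poster's string-to-Long hasher.
def pageHash(name: String): Long = name.hashCode.toLong

val names = sc.parallelize(Seq("pageA", "pageB", "pageC"))
// If there are fewer distinct hashes than distinct names, two names collided
// and their vertices would be silently merged when the graph is built.
val collisionFree = names.map(pageHash).distinct().count() == names.distinct().count()
```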

Re: Graphx CompactBuffer help

2015-08-28 Thread Robineast
My previous reply got mangled. This should work: coon.filter(x => x.exists(el => Seq(1,15).contains(el))) CompactBuffer is a specialised form of a Scala Iterator.

Re: graphx class not found error

2015-08-13 Thread Ted Yu
The code and error didn't go through. Mind sending them again? Which Spark release are you using? On Thu, Aug 13, 2015 at 6:17 PM, dizzy5112 dave.zee...@gmail.com wrote: the code below works perfectly on both cluster and local modes but when i try to create a graph in cluster mode (it works

Re: graphx class not found error

2015-08-13 Thread dizzy5112
Oh, forgot to note: using the Scala REPL for this.

Re: GraphX Synth Benchmark

2015-07-09 Thread Khaled Ammar
Hi, I am not a spark expert but I found that passing a small partitions value might help. Try to use this option --numEPart=$partitions where partitions=3 (number of workers) or at most 3*40 (total number of cores). Thanks, -Khaled On Thu, Jul 9, 2015 at 11:37 AM, AshutoshRaghuvanshi

Re: GraphX - ConnectedComponents (Pregel) - longer and longer interval between jobs

2015-06-29 Thread Thomas Gerber
It seems the root cause of the delay was the sheer size of the DAG for those jobs, which are towards the end of a long series of jobs. To reduce it, you can probably try to checkpoint (rdd.checkpoint) some previous RDDs. That will: 1. save the RDD on disk 2. remove all references to the parents
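The checkpoint suggestion, sketched on a plain RDD loop (the checkpoint interval of 10 is an arbitrary choice for illustration):

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("ckpt").setMaster("local[2]"))
sc.setCheckpointDir(Files.createTempDirectory("ckpt").toString)

var rdd = sc.parallelize(1 to 100)
for (i <- 1 to 50) {
  rdd = rdd.map(_ + 1).cache()
  if (i % 10 == 0) {
    rdd.checkpoint() // mark for checkpointing...
    rdd.count()      // ...and materialize, which writes it out and truncates the lineage
  }
}
```

Without the periodic checkpoint, the lineage after 50 iterations is 50 maps deep, and the scheduler has to walk all of it for every job.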

Re: GraphX - ConnectedComponents (Pregel) - longer and longer interval between jobs

2015-06-26 Thread Thomas Gerber
Note that this problem is probably NOT caused directly by GraphX, but GraphX reveals it because as you go further down the iterations, you get further and further away of a shuffle you can rely on. On Thu, Jun 25, 2015 at 7:43 PM, Thomas Gerber thomas.ger...@radius.com wrote: Hello, We run

Re: GraphX: unbalanced computation and slow runtime on livejournal network

2015-04-19 Thread hnahak
Hi Steve, I did Spark 1.3.0 PageRank benchmarking on soc-LiveJournal1 in a 4-node cluster with 16, 16, 8, 8 GB RAM respectively. The cluster has 4 workers including the master, with 4, 4, 2, 2 CPUs. I set executor memory to 3g and driver to 5g. No. of iterations -- GraphX (mins): 1 -- 1, 2

Re: [GraphX] aggregateMessages with active set

2015-04-13 Thread James
Hello, Great thanks for your reply. From the code I found that the reason why my program will scan all the edges is because the EdgeDirection I passed in is EdgeDirection.Either. However, I still have the problem that the time consumed by each iteration does not decrease over time. Thus I have two

Re: [GraphX] aggregateMessages with active set

2015-04-09 Thread James
In aggregateMessagesWithActiveSet, Spark still has to read all edges. It means that a fixed cost which scales with graph size is unavoidable in a Pregel-like iteration. But what if I have to run nearly 100 iterations and in the last 50 iterations only 0.1% of nodes need to be updated

Re: [GraphX] aggregateMessages with active set

2015-04-09 Thread Ankur Dave
Actually, GraphX doesn't need to scan all the edges, because it maintains a clustered index on the source vertex id (that is, it sorts the edges by source vertex id and stores the offsets in a hash table). If the activeDirection is appropriately set, it can then jump only to the clusters with
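The optimization Ankur describes surfaces through Pregel's `activeDirection` parameter; a toy max-propagation sketch (the graph and seeding are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("active-dir").setMaster("local[2]"))

val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
// Seed vertex 1 with value 1; everyone else starts at 0.
val g = Graph.fromEdges(edges, 0).mapVertices((id, _) => if (id == 1L) 1 else 0)

// With EdgeDirection.Out, sendMsg only runs on out-edges of vertices that
// received a message last round, so GraphX can seek directly into their
// source-sorted edge clusters instead of scanning every edge.
val result = g.pregel(0, activeDirection = EdgeDirection.Out)(
  (id, attr, msg) => math.max(attr, msg),
  t => if (t.srcAttr > t.dstAttr) Iterator((t.dstId, t.srcAttr)) else Iterator.empty,
  (a, b) => math.max(a, b))
```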

Re: [GraphX] aggregateMessages with active set

2015-04-07 Thread Ankur Dave
We thought it would be better to simplify the interface, since the active set is a performance optimization but the result is identical to calling subgraph before aggregateMessages. The active set option is still there in the package-private method aggregateMessagesWithActiveSet. You can actually

Re: Graphx gets slower as the iteration number increases

2015-03-24 Thread Ankur Dave
This might be because partitions are getting dropped from memory and needing to be recomputed. How much memory is in the cluster, and how large are the partitions? This information should be in the Executors and Storage pages in the web UI. Ankur http://www.ankurdave.com/ On Tue, Mar 24, 2015 at

Re: GraphX: Get edges for a vertex

2015-03-18 Thread Jeffrey Jedele
Hi Mas, I never actually worked with GraphX, but one idea: As far as I know, you can directly access the vertex and edge RDDs of your Graph object. Why not simply run a .filter() on the edge RDD to get all edges that originate from or end at your vertex? Regards, Jeff 2015-03-18 10:52 GMT+01:00
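Jeff's suggestion in code, on a made-up graph:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("incident").setMaster("local[2]"))

val g = Graph.fromEdges(
  sc.parallelize(Seq(Edge(1L, 2L, "a"), Edge(2L, 3L, "b"), Edge(4L, 5L, "c"))), 0)

val vid = 2L
// All edges that originate from or end at the vertex of interest.
val incident = g.edges.filter(e => e.srcId == vid || e.dstId == vid).collect()
```

This scans the whole edge RDD; for repeated per-vertex lookups an indexed store is the better fit.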

Re: GraphX Snapshot Partitioning

2015-03-14 Thread Takeshi Yamamuro
Large edge partitions could cause java.lang.OutOfMemoryError, and then spark tasks fails. FWIW, each edge partition can have at most 2^32 edges because 64-bit vertex IDs are mapped into 32-bit ones in each partitions. If #edges is over the limit, graphx could throw ArrayIndexOutOfBoundsException,

Re: [GRAPHX] could not process graph with 230M edges

2015-03-14 Thread Takeshi Yamamuro
Hi, If you have heap problems in Spark/GraphX, it'd be better to split partitions into smaller ones so that each partition fits in memory. On Sat, Mar 14, 2015 at 12:09 AM, Hlib Mykhailenko hlib.mykhaile...@inria.fr wrote: Hello, I cannot process a graph with 230M edges. I cloned

Re: GraphX Snapshot Partitioning

2015-03-11 Thread Matthew Bucci
Hi, Thanks for the response! That answered some questions I had, but the last one I was wondering is what happens if you run a partition strategy and one of the partitions ends up being too large? For example, let's say partitions can hold 64MB (actually knowing the maximum possible size of a

Re: GraphX Snapshot Partitioning

2015-03-09 Thread Takeshi Yamamuro
Hi, Vertices are simply hash-partitioned by their 64-bit IDs, so they are evenly spread over partitions. As for edges, GraphLoader#edgeList builds edge partitions through hadoopFile(), so the initial partitions depend on InputFormat#getSplits implementations (e.g., partitions are mostly equal to

Re: GraphX path traversal

2015-03-04 Thread Robin East
Actually your Pregel code works for me: import org.apache.spark._ import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD val vertexlist = Array((1L,"One"), (2L,"Two"), (3L,"Three"), (4L,"Four"), (5L,"Five"), (6L,"Six")) val edgelist = Array(Edge(6,5,"6 to 5"), Edge(5,4,"5 to 4"), Edge(4,3,"4 to 3"),

Re: GraphX path traversal

2015-03-03 Thread Madabhattula Rajesh Kumar
Hi, Could you please let me know how to do this? (or) Any suggestion Regards, Rajesh On Mon, Mar 2, 2015 at 4:47 PM, Madabhattula Rajesh Kumar mrajaf...@gmail.com wrote: Hi, I have a below edge list. How to find the parents path for every vertex? Example : Vertex 1 path : 2, 3, 4, 5, 6

Re: GraphX path traversal

2015-03-03 Thread Madabhattula Rajesh Kumar
Hi Robin, Thank you for your response. Please find my question below. I have the below edge file: Source Vertex / Destination Vertex: 1 2, 2 3, 3 4, 4 5, 5 6, 6 6. In this graph the 1st vertex is connected to the 2nd vertex, the 2nd vertex is connected to the 3rd vertex, ..., and the 6th vertex is connected to itself.

Re: GraphX path traversal

2015-03-03 Thread Madabhattula Rajesh Kumar
Hi, I have tried the below program using the Pregel API but I'm not able to get my required output. I'm getting exactly the reverse of the output I'm expecting. // Creating graph using the above-mentioned edge file val graph: Graph[Int, Int] = GraphLoader.edgeListFile(sc,

Re: GraphX path traversal

2015-03-03 Thread Robin East
Have you tried EdgeDirection.In? On 3 Mar 2015, at 16:32, Robin East robin.e...@xense.co.uk wrote: What about the following which can be run in spark shell: import org.apache.spark._ import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD val vertexlist = Array((1L,One),

Re: GraphX path traversal

2015-03-03 Thread Robin East
What about the following, which can be run in spark shell: import org.apache.spark._ import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD val vertexlist = Array((1L,"One"), (2L,"Two"), (3L,"Three"), (4L,"Four"), (5L,"Five"), (6L,"Six")) val edgelist = Array(Edge(6,5,"6 to 5"), Edge(5,4,"5 to

Re: GraphX path traversal

2015-03-03 Thread Robin East
Rajesh, I'm not sure if I can help you; I don't even understand the question. Could you restate what you are trying to do? Sent from my iPhone On 2 Mar 2015, at 11:17, Madabhattula Rajesh Kumar mrajaf...@gmail.com wrote: Hi, I have a below edge list. How to find the parents

Re: [GraphX] Excessive value recalculations during aggregateMessages cycles

2015-02-15 Thread Takeshi Yamamuro
Hi, I tried quick and simple tests, though; ISTM the vertices below were correctly cached. Could you give me the differences between my code and yours? import org.apache.spark.graphx._ import org.apache.spark.graphx.lib._ object Prog { def processInt(d: Int) = d * 2 } val g =

Re: [GraphX] Excessive value recalculations during aggregateMessages cycles

2015-02-08 Thread Kyle Ellrott
I changed the curGraph = curGraph.outerJoinVertices(curMessages)( (vid, vertex, message) => vertex.process(message.getOrElse(List[Message]()), ti) ).cache() to curGraph = curGraph.outerJoinVertices(curMessages)( (vid, vertex, message) => (vertex,
