RE: Graphx

2016-03-11 Thread John Lilley
We have almost zero node info – just an identifying integer. John Lilley From: Alexis Roos [mailto:alexis.r...@gmail.com] Sent: Friday, March 11, 2016 11:24 AM To: Alexander Pivovarov <apivova...@gmail.com> Cc: John Lilley <john.lil...@redpoint.net>; Ovidiu-Cristian MARCU <ovi

RE: Graphx

2016-03-11 Thread John Lilley
to run our software on 1bn edges. John Lilley From: Alexander Pivovarov [mailto:apivova...@gmail.com] Sent: Friday, March 11, 2016 11:13 AM To: John Lilley <john.lil...@redpoint.net> Cc: Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>; lihu <lihu...@gmail.com>;

RE: Graphx

2016-03-11 Thread John Lilley
currentGroupSize++; } if (currentGroupSize >= groupSize) { currentGroupSize = 0; currentEdge += 2; } else { currentEdge++; } } } } John Lilley Chief Architect, RedPoint Global Inc. T: +1 303 541 1516 | M: +1 720 938 5761 | F: +1 781-705-2077 Skype: jl

RE: Graphx

2016-03-11 Thread John Lilley
. It degrades gracefully along the O(N^2) curve and additional memory reduces time. John Lilley From: Ovidiu-Cristian MARCU [mailto:ovidiu-cristian.ma...@inria.fr] Sent: Friday, March 11, 2016 8:14 AM To: John Lilley <john.lil...@redpoint.net> Cc: lihu <lihu...@gmail.com>; Andrew

RE: Graphx

2016-03-11 Thread John Lilley
ay, March 11, 2016 7:58 AM To: John Lilley <john.lil...@redpoint.net> Cc: Andrew A <andrew.a...@gmail.com>; u...@spark.incubator.apache.org Subject: Re: Graphx Hi, John: I am very interested in your experiment. How can you tell that RDD serialization costs a lot of time, from the

RE: Graphx

2016-03-11 Thread John Lilley
would get failures. By contrast, we have a C++ algorithm that solves 1bn edges using memory+disk on a single 16GB node in about an hour. I think that a very large cluster will do better, but we did not explore that. John Lilley Chief Architect, RedPoint Global Inc. T: +1 303 541 1516 | M: +1 720

RE: Question about GraphX connected-components

2015-10-12 Thread John Lilley
is reached? Are there tuning parameters that optimize for data all fitting in memory vs. data that must spill? Thanks, John Lilley From: Igor Berman [mailto:igor.ber...@gmail.com] Sent: Saturday, October 10, 2015 12:06 PM To: John Lilley <john.lil...@redpoint.net> Cc: user@spark.apache.org;

Question about GraphX connected-components

2015-10-09 Thread John Lilley
happens when the data set exceeds memory: does it spill to disk "nicely" or degrade catastrophically? Thanks, John Lilley