date:20120307

[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-07 Thread Brian Femiano (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224762#comment-13224762
 ] 

Brian Femiano commented on GIRAPH-153:
--

More on the algorithms. This was using the LineRecordReader from flat text 
files in HDFS. Reading from HBase adds about 20min of setup overhead. 
My graph nodes take the form of 

001 -1 -1  06, 007 
where 001 = node guid
and 06, and 07 are children linked to that guid. A child can belong to 
more than 1 guid, hence the formation of a graph instead of a simple tree. 

I have tested these against 10 and 19 node clusters using m1.4xlarge instances 
via EC2. 70GB, 8 virtual cores per machine. GigE backplane bandwidth. 

Datasets included 98mil with 1050 possible hops from a given root to any leaf. 

The 98mil runs in about 1 hour using 143 workers (144 map tasks -1 for the 
master). Adding more workers with more concurrent map slots does not lead to 
better performance. In fact running with over 150 workers will cause FS open 
descriptor limit issues, even when increasing the ulimit for all users on each 
machine and bouncing the cluster. The algorithm and dataset exhibit the best 
performance on as few a workers as possible necessary to instantiate all 98mil 
nodes. 10 seems to be the sweet spot. There's just enough heap per JVM for all 
79 workers to load the graph in a responsible amount of time, without incurring 
the stack overhead of having too many workers to open a communication channel 
to. I understand there is 'num_flush_thread' and other configuration parameters 
designed to control this, but I haven't experimented with them yet. This is 
over Federa 8 AMI images with 7GB heap per worker JVM. 

Running just a small sample (1mil nodes) over 160+ workers leads to similar 
results. "Unable to create new native thread" or "Too many open FS". 
Throttle back the number of workers causes the symptoms to go away.  I've been 
following Giraph-45 which I believe is related. 

I'm happy to expand on these issues for anyone interested. 

I also worked on a variant algorithm that passes each node in the graph a copy 
of every spanning tree from each root of which the node is connected back to. 
It does not adhere to strict connected component rules, just for reference. 
Naturally this lead to a much higher volume of message traffic and gc limit 
overhead reached issues. The 98mil graph would produce around 11billion 
messages in the first 5 supersteps.  Concatenating the spanning trees together 
as one message to limit the overall # of messages would lead to OOM issues. I 
don't have a requirement to perform this nasty a BSP algorithm, but it was an 
interesting stress test. 


> HBase/Accumulo Input and Output formats
> ---
>
> Key: GIRAPH-153
> URL: https://issues.apache.org/jira/browse/GIRAPH-153
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp
>Affects Versions: 0.1.0
> Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>Reporter: Brian Femiano
> Attachments: AccumuloRootMarker.java, 
> AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java, 
> AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java, 
> ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java, 
> HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, 
> SetLongWritable.java, SetTextWritable.java, TableRootMarker.java, 
> TableRootMarkerInputFormat.java, TableRootMarkerOutputFormat.java
>
>
> Four abstract classes that wrap their respective delegate input/output 
> formats for
> easy hooks into vertex input format subclasses. I've included some sample 
> programs that show two very simple graph
> algorithms. I have a graph generator that builds out a very simple directed 
> structure, starting with a few 'root' nodes.
> Root nodes are defined as nodes which are not listed as a child anywhere in 
> the graph. 
> Algorithm 1) AccumuloRootMarker.java  --> Accumulo as read/write source. 
> Every vertex starts thinking it's a root. At superstep 0, send a message down 
> to each
> child as a non-root notification. After superstep 1, only root nodes will 
> have never been messaged. 
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by 
> bundling the notification logic followed by root node propagation. Once we've 
> marked the appropriate nodes as roots, tell every child which roots it can be 
> traced back to via one or more spanning trees. This will take N + 2 
> supersteps where N is the maximum number of hops from any root to any leaf, 
> plus 2 supersteps for the initial root flagging. 
> I've included all relevant code plus DistributedCacheHelper.java for 
> recursive cache file and archive searches. It is more hadoop centric t

[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-07 Thread Brian Femiano (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

 Labels:   (was: gir)
Description: 
Four abstract classes that wrap their respective delegate input/output formats 
for
easy hooks into vertex input format subclasses. I've included some sample 
programs that show two very simple graph
algorithms. I have a graph generator that builds out a very simple directed 
structure, starting with a few 'root' nodes.

Root nodes are defined as nodes which are not listed as a child anywhere in the 
graph. 

Algorithm 1) AccumuloRootMarker.java  --> Accumulo as read/write source. Every 
vertex starts thinking it's a root. At superstep 0, send a message down to each
child as a non-root notification. After superstep 1, only root nodes will have 
never been messaged. 

Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by 
bundling the notification logic followed by root node propagation. Once we've 
marked the appropriate nodes as roots, tell every child which roots it can be 
traced back to via one or more spanning trees. This will take N + 2 supersteps 
where N is the maximum number of hops from any root to any leaf, plus 2 
supersteps for the initial root flagging. 

I've included all relevant code plus DistributedCacheHelper.java for recursive 
cache file and archive searches. It is more hadoop centric than giraph, but 
these jobs use it so I figured why not commit here. 

These have been tested through local JobRunner, pseudo-distributed on the 
aforementioned hardware, and full distributed on EC2. More details in the 
comments.



  was:
Four abstract classes that wrap their respective delegate input/output formats 
for
easy hooks into vertex input format subclasses. I've included some sample 
programs that show two very simple graph
algorithms. I have a graph generator that builds out a very simple direct 
structure, starting with a few 'root' nodes.

Root nodes are defined as nodes that is not listed as a child anywhere in the 
graph. 

Algorithm 1) AccumuloRootMarker.java  --> Accumulo as read/write source. Every 
vertex starts thinking it's a root. At superstep 0, send a message down to each
child as a non-root notification. After superstep 1, only root nodes will have 
never been messaged. 

Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by 
bundling the notification logic followed by root node propagation. Once we've 
marked the appropriate nodes as roots, tell every child which roots it can be 
traced back to via one or more spanning trees. This will take N + 2 supersteps 
where N is the maximum number of hops from any root to any leaf, plus 2 
supersteps for the initial root flagging. 

I've included all relevant code plus DistributedCacheHelper.java for recursive 
cache file and archive searches. It is more hadoop centric than giraph in 
particular, but these jobs use it so I figured why not commit here. 

These have been tested through local JobRunner, pseudo-distributed on the 
aforementioned hardware, and full distributed on EC2. More details in the 
comments.




> HBase/Accumulo Input and Output formats
> ---
>
> Key: GIRAPH-153
> URL: https://issues.apache.org/jira/browse/GIRAPH-153
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp
>Affects Versions: 0.1.0
> Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>Reporter: Brian Femiano
> Attachments: AccumuloRootMarker.java, 
> AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java, 
> AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java, 
> ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java, 
> HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, 
> SetLongWritable.java, SetTextWritable.java, TableRootMarker.java, 
> TableRootMarkerInputFormat.java, TableRootMarkerOutputFormat.java
>
>
> Four abstract classes that wrap their respective delegate input/output 
> formats for
> easy hooks into vertex input format subclasses. I've included some sample 
> programs that show two very simple graph
> algorithms. I have a graph generator that builds out a very simple directed 
> structure, starting with a few 'root' nodes.
> Root nodes are defined as nodes which are not listed as a child anywhere in 
> the graph. 
> Algorithm 1) AccumuloRootMarker.java  --> Accumulo as read/write source. 
> Every vertex starts thinking it's a root. At superstep 0, send a message down 
> to each
> child as a non-root notification. After superstep 1, only root nodes will 
> have never been messaged. 
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by 
> bundling the notification logic followed by root node propagation. Once we've 
> marked t

[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-07 Thread Brian Femiano (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: TableRootMarkerOutputFormat.java
TableRootMarkerInputFormat.java
TableRootMarker.java
SetTextWritable.java
SetLongWritable.java
IdentifyAndMarkRoots.java
HBaseVertexOutputFormat.java
HBaseVertexInputFormat.java
DistributedCacheHelper.java
ComputeIsRoot.java
AccumuloVertexOutputFormat.java
AccumuloVertexInputFormat.java
AccumuloRootMarkerOutputFormat.java
AccumuloRootMarkerInputFormat.java
AccumuloRootMarker.java

> HBase/Accumulo Input and Output formats
> ---
>
> Key: GIRAPH-153
> URL: https://issues.apache.org/jira/browse/GIRAPH-153
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp
>Affects Versions: 0.1.0
> Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>Reporter: Brian Femiano
>  Labels: gir
> Attachments: AccumuloRootMarker.java, 
> AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java, 
> AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java, 
> ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java, 
> HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, 
> SetLongWritable.java, SetTextWritable.java, TableRootMarker.java, 
> TableRootMarkerInputFormat.java, TableRootMarkerOutputFormat.java
>
>
> Four abstract classes that wrap their respective delegate input/output 
> formats for
> easy hooks into vertex input format subclasses. I've included some sample 
> programs that show two very simple graph
> algorithms. I have a graph generator that builds out a very simple direct 
> structure, starting with a few 'root' nodes.
> Root nodes are defined as nodes that is not listed as a child anywhere in the 
> graph. 
> Algorithm 1) AccumuloRootMarker.java  --> Accumulo as read/write source. 
> Every vertex starts thinking it's a root. At superstep 0, send a message down 
> to each
> child as a non-root notification. After superstep 1, only root nodes will 
> have never been messaged. 
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by 
> bundling the notification logic followed by root node propagation. Once we've 
> marked the appropriate nodes as roots, tell every child which roots it can be 
> traced back to via one or more spanning trees. This will take N + 2 
> supersteps where N is the maximum number of hops from any root to any leaf, 
> plus 2 supersteps for the initial root flagging. 
> I've included all relevant code plus DistributedCacheHelper.java for 
> recursive cache file and archive searches. It is more hadoop centric than 
> giraph in particular, but these jobs use it so I figured why not commit here. 
> These have been tested through local JobRunner, pseudo-distributed on the 
> aforementioned hardware, and full distributed on EC2. More details in the 
> comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-07 Thread Brian Femiano (Created) (JIRA)

HBase/Accumulo Input and Output formats
---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: AccumuloRootMarker.java, 
AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java, 
AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java, 
ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java, 
HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, SetLongWritable.java, 
SetTextWritable.java, TableRootMarker.java, TableRootMarkerInputFormat.java, 
TableRootMarkerOutputFormat.java

Four abstract classes that wrap their respective delegate input/output formats 
for
easy hooks into vertex input format subclasses. I've included some sample 
programs that show two very simple graph
algorithms. I have a graph generator that builds out a very simple direct 
structure, starting with a few 'root' nodes.

Root nodes are defined as nodes that is not listed as a child anywhere in the 
graph. 

Algorithm 1) AccumuloRootMarker.java  --> Accumulo as read/write source. Every 
vertex starts thinking it's a root. At superstep 0, send a message down to each
child as a non-root notification. After superstep 1, only root nodes will have 
never been messaged. 

Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by 
bundling the notification logic followed by root node propagation. Once we've 
marked the appropriate nodes as roots, tell every child which roots it can be 
traced back to via one or more spanning trees. This will take N + 2 supersteps 
where N is the maximum number of hops from any root to any leaf, plus 2 
supersteps for the initial root flagging. 

I've included all relevant code plus DistributedCacheHelper.java for recursive 
cache file and archive searches. It is more hadoop centric than giraph in 
particular, but these jobs use it so I figured why not commit here. 

These have been tested through local JobRunner, pseudo-distributed on the 
aforementioned hardware, and full distributed on EC2. More details in the 
comments.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: Graph clustering via LinLog force directed layout

2012-03-07 Thread Claudio Martella

Ok, but why a bipartite graph? I can imagine you can have 2 or 3
coordinates associated with each vertices without actually having multiple
vertex types.


On Wednesday, March 7, 2012, Timmy Wilson  wrote:
>> while i'd expect from linlog something like an (x,y) coordinate. is
>> that correct?
>
> (x,y,z) is also common -- i think of it as extreme dimension reduction
> -- after which you're free to use your favorite gis tool --
> http://postgis.refractions.net/ -- you can also pull in tools from
> statistical mechanics/thermodynamics
>
> Without complicating the algo for efficiency -- the essence of the
> energy model is simple -- all nodes repulse each other -- connected
> nodes attract each other -- this works for any type of graph -- i'm
> using a weighted directed graph -- in which case the weighs influence
> the attractive force
>
> It's all very continuous/natural -- living in a 3d world we've build a
> lot of tools/methods to process this kind of information
>
> The problem is without efficiency methods like Barnes-Hut we have
> Cartesian(x,y) or Cartesian(x,y,z) -- because every vertex influences
> every other vertex
>
> An efficient giraph implementation would need 2 layers -- a bipartite
> graph -- assuming (x,y,z) reduction -- we would need an Octree graph
> interacting dynamically w/ our graph of interest
>
>
> On Wed, Mar 7, 2012 at 5:00 AM, Claudio Martella
>  wrote:
>> My personal take is that they do have similar function (they "extract"
>> communities), but they have a general different type of output. in
>> label propagation you'd end up with an id for each vertex (the vertex
>> id that is the centroid for the community each vertex belongs to),
>> while i'd expect from linlog something like an (x,y) coordinate. is
>> that correct?
>>
>> On Wed, Mar 7, 2012 at 2:45 AM, Timmy Wilson 
wrote:
>>> Thank you everyone!
>>>
>>> I would love to see a comparison of force directed layouts
>>> (specifically LinLog) and label propagation.
>>>
>>> I searched but alas nothing -- they seem to be oddly similar?
>>>
>>> The current, serial LinLog implementation --
>>> http://code.google.com/p/linloglayout/ -- uses Barnes-Hut simulation
>>> -- n*log(n):
>>>
>>> http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation
>>>
>>> I guess the root question is -- do you think it's reasonable to use
>>> giraph for Barnes-Hut simulation?
>>>
>>>
>>>
>>>
>>> On Tue, Mar 6, 2012 at 11:34 AM, Claudio Martella
>>>  wrote:
 Hi,

 I'm not definitely familiar with the algorithm or implementation of
 LinLog, I've been just a user. It should be doable with Giraph if you
 can express it in terms of message-passing between vertices and
 without a dependency on a global view of the graph (except for the
 convergence criteria, such as total energy).

 Please consider that Giraph's data model is based on a directed graph,
 this should be a quite "interesting" constraint for you, if your
 implementation is going to modify energy associated with edges (you'd
 have two views over the undirected edge, one in each endpoint).

 In general, a good way of doing community analysis would be to look at
 algorithms that belong to the family of label-propagation clustering
 algorithms.


 Hope this helps,
 Claudio

 On Tue, Mar 6, 2012 at 3:28 PM, Timmy Wilson 
wrote:
> Hi giraph community,
>
> I'm interested in using giraph for distributed n-body simulation.
>
> Initially, i'm interested in force directed layouts -- ie, graph
drawing:
>
> http://en.wikipedia.org/wiki/Force-based_algorithms_(graph_drawing)
>
> I'm interested specifically in Dr. Andreas Noack's LinLog energy model
> -- which performs well w/ community detection:
>
> http://www.informatik.tu-cottbus.de/~an/GD/linlog.html
>
> I have a few examples of a serial implementation here:
>
> http://www.smarttypes.org/
>
> The model maximizes the distance between all nodes while minimizing
> the distance between connected nodes.
>
> Without getting into too much detail, i'm curious if anyone has
> considered using giraph for force directed graph embedding (yet
> another name for it)?
>
> I'm also considering something like http://www.mcs.anl.gov/petsc/ or
> http://www.cs.cmu.edu/~scandal/alg/nbody.html -- which have fast
>

-- 
   Claudio Martella
   claudio.marte...@gmail.com

Re: Graph clustering via LinLog force directed layout

2012-03-07 Thread Timmy Wilson

> while i'd expect from linlog something like an (x,y) coordinate. is
> that correct?

(x,y,z) is also common -- i think of it as extreme dimension reduction
-- after which you're free to use your favorite gis tool --
http://postgis.refractions.net/ -- you can also pull in tools from
statistical mechanics/thermodynamics

Without complicating the algo for efficiency -- the essence of the
energy model is simple -- all nodes repulse each other -- connected
nodes attract each other -- this works for any type of graph -- i'm
using a weighted directed graph -- in which case the weighs influence
the attractive force

It's all very continuous/natural -- living in a 3d world we've build a
lot of tools/methods to process this kind of information

The problem is without efficiency methods like Barnes-Hut we have
Cartesian(x,y) or Cartesian(x,y,z) -- because every vertex influences
every other vertex

An efficient giraph implementation would need 2 layers -- a bipartite
graph -- assuming (x,y,z) reduction -- we would need an Octree graph
interacting dynamically w/ our graph of interest


On Wed, Mar 7, 2012 at 5:00 AM, Claudio Martella
 wrote:
> My personal take is that they do have similar function (they "extract"
> communities), but they have a general different type of output. in
> label propagation you'd end up with an id for each vertex (the vertex
> id that is the centroid for the community each vertex belongs to),
> while i'd expect from linlog something like an (x,y) coordinate. is
> that correct?
>
> On Wed, Mar 7, 2012 at 2:45 AM, Timmy Wilson  wrote:
>> Thank you everyone!
>>
>> I would love to see a comparison of force directed layouts
>> (specifically LinLog) and label propagation.
>>
>> I searched but alas nothing -- they seem to be oddly similar?
>>
>> The current, serial LinLog implementation --
>> http://code.google.com/p/linloglayout/ -- uses Barnes-Hut simulation
>> -- n*log(n):
>>
>> http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation
>>
>> I guess the root question is -- do you think it's reasonable to use
>> giraph for Barnes-Hut simulation?
>>
>>
>>
>>
>> On Tue, Mar 6, 2012 at 11:34 AM, Claudio Martella
>>  wrote:
>>> Hi,
>>>
>>> I'm not definitely familiar with the algorithm or implementation of
>>> LinLog, I've been just a user. It should be doable with Giraph if you
>>> can express it in terms of message-passing between vertices and
>>> without a dependency on a global view of the graph (except for the
>>> convergence criteria, such as total energy).
>>>
>>> Please consider that Giraph's data model is based on a directed graph,
>>> this should be a quite "interesting" constraint for you, if your
>>> implementation is going to modify energy associated with edges (you'd
>>> have two views over the undirected edge, one in each endpoint).
>>>
>>> In general, a good way of doing community analysis would be to look at
>>> algorithms that belong to the family of label-propagation clustering
>>> algorithms.
>>>
>>>
>>> Hope this helps,
>>> Claudio
>>>
>>> On Tue, Mar 6, 2012 at 3:28 PM, Timmy Wilson  wrote:
 Hi giraph community,

 I'm interested in using giraph for distributed n-body simulation.

 Initially, i'm interested in force directed layouts -- ie, graph drawing:

 http://en.wikipedia.org/wiki/Force-based_algorithms_(graph_drawing)

 I'm interested specifically in Dr. Andreas Noack's LinLog energy model
 -- which performs well w/ community detection:

 http://www.informatik.tu-cottbus.de/~an/GD/linlog.html

 I have a few examples of a serial implementation here:

 http://www.smarttypes.org/

 The model maximizes the distance between all nodes while minimizing
 the distance between connected nodes.

 Without getting into too much detail, i'm curious if anyone has
 considered using giraph for force directed graph embedding (yet
 another name for it)?

 I'm also considering something like http://www.mcs.anl.gov/petsc/ or
 http://www.cs.cmu.edu/~scandal/alg/nbody.html -- which have fast
 n-body simulation implementations (Barnes-Hut + Fast Multipole).

 That said, i think giraph may be a good fit -- curious what the
 community thinks?


 Thanks,
 Timmy Wilson
 Cleveland, OH
>>>
>>>
>>>
>>> --
>>>    Claudio Martella
>>>    claudio.marte...@gmail.com
>
>
>
> --
>    Claudio Martella
>    claudio.marte...@gmail.com

Re: Graph clustering via LinLog force directed layout

2012-03-07 Thread Claudio Martella

My personal take is that they do have similar function (they "extract"
communities), but they have a general different type of output. in
label propagation you'd end up with an id for each vertex (the vertex
id that is the centroid for the community each vertex belongs to),
while i'd expect from linlog something like an (x,y) coordinate. is
that correct?

On Wed, Mar 7, 2012 at 2:45 AM, Timmy Wilson  wrote:
> Thank you everyone!
>
> I would love to see a comparison of force directed layouts
> (specifically LinLog) and label propagation.
>
> I searched but alas nothing -- they seem to be oddly similar?
>
> The current, serial LinLog implementation --
> http://code.google.com/p/linloglayout/ -- uses Barnes-Hut simulation
> -- n*log(n):
>
> http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation
>
> I guess the root question is -- do you think it's reasonable to use
> giraph for Barnes-Hut simulation?
>
>
>
>
> On Tue, Mar 6, 2012 at 11:34 AM, Claudio Martella
>  wrote:
>> Hi,
>>
>> I'm not definitely familiar with the algorithm or implementation of
>> LinLog, I've been just a user. It should be doable with Giraph if you
>> can express it in terms of message-passing between vertices and
>> without a dependency on a global view of the graph (except for the
>> convergence criteria, such as total energy).
>>
>> Please consider that Giraph's data model is based on a directed graph,
>> this should be a quite "interesting" constraint for you, if your
>> implementation is going to modify energy associated with edges (you'd
>> have two views over the undirected edge, one in each endpoint).
>>
>> In general, a good way of doing community analysis would be to look at
>> algorithms that belong to the family of label-propagation clustering
>> algorithms.
>>
>>
>> Hope this helps,
>> Claudio
>>
>> On Tue, Mar 6, 2012 at 3:28 PM, Timmy Wilson  wrote:
>>> Hi giraph community,
>>>
>>> I'm interested in using giraph for distributed n-body simulation.
>>>
>>> Initially, i'm interested in force directed layouts -- ie, graph drawing:
>>>
>>> http://en.wikipedia.org/wiki/Force-based_algorithms_(graph_drawing)
>>>
>>> I'm interested specifically in Dr. Andreas Noack's LinLog energy model
>>> -- which performs well w/ community detection:
>>>
>>> http://www.informatik.tu-cottbus.de/~an/GD/linlog.html
>>>
>>> I have a few examples of a serial implementation here:
>>>
>>> http://www.smarttypes.org/
>>>
>>> The model maximizes the distance between all nodes while minimizing
>>> the distance between connected nodes.
>>>
>>> Without getting into too much detail, i'm curious if anyone has
>>> considered using giraph for force directed graph embedding (yet
>>> another name for it)?
>>>
>>> I'm also considering something like http://www.mcs.anl.gov/petsc/ or
>>> http://www.cs.cmu.edu/~scandal/alg/nbody.html -- which have fast
>>> n-body simulation implementations (Barnes-Hut + Fast Multipole).
>>>
>>> That said, i think giraph may be a good fit -- curious what the
>>> community thinks?
>>>
>>>
>>> Thanks,
>>> Timmy Wilson
>>> Cleveland, OH
>>
>>
>>
>> --
>>    Claudio Martella
>>    claudio.marte...@gmail.com



-- 
   Claudio Martella
   claudio.marte...@gmail.com

[jira] [Commented] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)

2012-03-07 Thread Avery Ching (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/GIRAPH-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224096#comment-13224096
 ] 

Avery Ching commented on GIRAPH-144:


@Jakob, any more thoughts?

> GiraphJob should not extend Job  (users should not be able to call Job 
> methods like waitForCompletion or setMapper..etc)
> 
>
> Key: GIRAPH-144
> URL: https://issues.apache.org/jira/browse/GIRAPH-144
> Project: Giraph
>  Issue Type: Bug
>Reporter: Dave
>Assignee: Avery Ching
> Attachments: GIRAPH-144.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

[jira] [Created] (GIRAPH-153) HBase/Accumulo Input and Output formats

Re: Graph clustering via LinLog force directed layout

Re: Graph clustering via LinLog force directed layout

Re: Graph clustering via LinLog force directed layout

[jira] [Commented] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)

8 matches

Site Navigation

Mail list logo

Footer information