Hi,

some time ago I started working on visualizing graph data stored in
Hadoop via Gephi. A first draft of the results is in this blog post:
http://blog.cloudera.com/blog/2014/05/how-to-manage-time-dependent-multilayer-networks-in-apache-hadoop/
We found that handling the metadata for graphs and building the appropriate
input converters was the major problem to solve. Now it is easy to retrieve
edge and node lists, even for time-dependent graphs. The current solution
works with Hive or Impala to retrieve the data via JDBC.

But I think it would be great to have an API in Giraph that allows
triggering a snapshot of the current state of a graph while it is being
processed. Once such a snapshot is done, an external tool loads this data,
e.g. into Gephi. Maybe in a second step we could load the data directly
from all worker nodes instead of from HDFS, but for a start it would be
fine to use HDFS to decouple the processing layer from the GUI.
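To make the HDFS decoupling idea concrete, here is a minimal sketch of how snapshot output could be laid out per job and superstep. The class and path convention are purely my invention for illustration, not anything that exists in Giraph today:

```java
// Hypothetical sketch: a fixed HDFS layout so an external tool (e.g. a
// Gephi loader) can find snapshots without talking to the workers.
public class SnapshotPaths {
    // Builds the HDFS directory for a snapshot of a given job at a
    // given superstep, e.g. /giraph/snapshots/job_42/superstep-7
    public static String snapshotDir(String baseDir, String jobId, long superstep) {
        return String.format("%s/%s/superstep-%d", baseDir, jobId, superstep);
    }

    public static void main(String[] args) {
        System.out.println(snapshotDir("/giraph/snapshots", "job_42", 7));
    }
}
```

An external tool would only need the base directory and could poll for new superstep subdirectories as snapshots appear.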

For really large graphs, I think a Java applet using the
"gephi-tools" project could do a great job of rendering them.

The snapshot could be triggered via ZooKeeper. A job registers its ability
to receive such an optional request, and via ZooKeeper a client can find
all graphs it can look into (based on such a snapshot) and then sends the
request. At the next superstep the job checks the snapshot status in
ZooKeeper and either creates a snapshot or just proceeds, and so on. This
would even allow exporting time-dependent intermediate results from running
graph algorithms without a restart.
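The handshake above could be sketched roughly as follows. All znode paths and the SnapshotCoordinator class are my own invention for illustration; a plain Set stands in for ZooKeeper here, where the real implementation would use zk.create(), zk.exists(), and zk.delete() on the corresponding znodes:

```java
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of the proposed ZooKeeper handshake, not existing
// Giraph code. The Set<String> simulates the znode namespace.
public class SnapshotCoordinator {
    private final Set<String> znodes = new HashSet<>();
    private final String jobId;

    public SnapshotCoordinator(String jobId) {
        this.jobId = jobId;
        // 1. On startup, the job registers its ability to take
        //    snapshots so clients can discover it.
        znodes.add("/giraph/snapshottable/" + jobId);
    }

    // 2. A client (e.g. the Gephi loader) requests a snapshot by
    //    creating a request znode for the job it found.
    public void requestSnapshot() {
        znodes.add("/giraph/snapshot-requests/" + jobId);
    }

    // 3. At each superstep boundary, the job checks for a pending
    //    request; if one exists it is consumed and a snapshot is
    //    written to HDFS, otherwise computation just proceeds.
    public boolean shouldSnapshot() {
        return znodes.remove("/giraph/snapshot-requests/" + jobId);
    }
}
```

Because the check happens once per superstep boundary, repeated requests over time would naturally yield a series of intermediate snapshots from the same running job.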

What do you think about such a feature? I think it is also related to the
"graph centric API" proposed a while ago.
Is it worth a JIRA, and do you see use cases for this feature?

Best wishes,
Mirko
