[
https://issues.apache.org/jira/browse/MAHOUT-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051739#comment-13051739
]
Sebastian Schelter commented on MAHOUT-710:
-------------------------------------------
@Tillmann and Sebastian
I committed your patch with some major modifications. While your generic
writable approach was very elegant, I removed them for more simple writables
and more expressive signatures. I tried to streamline and polish a lot of
things as well as refactor them to use Mahout-ish conventions.
Yet I did not change any piece of the algorithmic basis and I want to again
thank you for that awesome piece of work of yours! Kudos in the name of the
Mahout community!
@Sean
In short words, enumerating triangles (three interconnected nodes) in a graph
is used as a preprocessing step for finding strongly connected "communities" in
a graph. Theoretically you want to find "maximum cliques" in a graph but this
is NP-hard and real cliques are rare in reality. The final goal of this issue
is to create an algorithm to find k-trusses, which are a relaxation of those
cliques. Maybe Tillmann and Sebastian want to write a little more about the
usecase on the mailinglist, I'll ask them.
@Hector
Would be great to see you beta-test this code on large graphs. The running time
should be very dependent on the degree distribution in the graph as the number
of "open triads" (possible triangles with one missing edge) to consider per
vertex is the square of its degree.
> Implementing K-Trusses
> ----------------------
>
> Key: MAHOUT-710
> URL: https://issues.apache.org/jira/browse/MAHOUT-710
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.6
> Reporter: Tillmann Fiehn
> Assignee: Sebastian Schelter
> Attachments:
> 0001-SimplifyGraph_AugmentEdgesWithDegrees_EnumerateTriangles_workfine.patch
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> We are Tillmann Fiehn and Sebastian Arnold, IT students from TU Berlin. As
> Sebastian Schelter already announced, we are atteding Isabel's and
> Sebastian's class "Large scale data analysis and data mining" and picked an
> interesting project that we want to implement in Mahout. We are open for any
> hints and suggestions and would appreciate if you could share your thoughts
> on our proposal.
> Our goal is to implement a map/reduce algorithm for finding k-trusses in a
> given graph. A k-truss is a nontrivial, single-component maximal subgraph,
> such that every edge is contained in at least k-2 triangles in the subgraph.
> The algorithm was proposed in the IEEE paper J. Cohen 2009: "Graph Twiddling
> in a MapReduce World"
> (http://www.csee.usf.edu/~anda/CIS6930-S11/papers/graph-processing-w-mapreduce.pdf)
> and involves a number of graph algorithms that are to our knowledge
> currently not present in Mahout:
> *Goal: finding K-Trusses*
> * relaxation of k-member clique
> * non-trivial, single-component maximal subgraph, s.t.
> * every edge is contained in at least k-2 triangles in the subgraph
> *Algorithms to be implemented on top of Mahout / Hadoop:*
> *simplifyGraph*: Edges -> RepresentativeEdges
> * removes Loops (not cycles)
> * aggregate duplicate edges
> *augmentGraphWithDegrees*: RepresentativeEdges -> AugmentedEdges = (Edge (v,
> u) , d(v), d(u))
> * augements the edges with degree information for both nodes d(v) = |{E | E =
> (x,y) a. (x = v o. y = v) }|
> *enumerateTriangles*: AugmentedEdges = (Edge, d(v), d(u)) -> Triangles (v,
> u, s)
> * finds all triangles in a Graph
> *findComponents*: RepresentativeEdges -> ZoneAssignments (v, z)
> * finds all components of a graph, each identified as the order number of the
> lowest-order vertex contained
> * consists of:
> ** step 1: find adjacent zones: Edges x Zones -> InterzoneEdges (z, z)
> ** step 2: merge adjacent zones into one (the lowest-order neighbouring
> zone): InterzoneEdges, ZoneAssignments (v, z) -> Pairs (v, z)
> {noformat}
> while true do:
> step 1
> if empty set interzone edges break;
> step 2
> done
> {noformat}
> *findKTrusses*: Edges, k -> ZoneAssignments (v, z)
> * finds all k-trusses of the graph
> * each returned vertex v is part of a truss z
> {noformat}
> simplifyGraph
> while true do:
> augmentGraphWithDegrees
> enumerateTriangles
> keep only edges contained in k-2 triangles
> if all edges kept break;
> done
> findComponents
> {noformat}
> We suppose to create the package {{org.apache.mahout.graph.trusses}} and
> {{org.apache.mahout.graph.components}} in the {{core}} module.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira