Hi Shannon,

  I can't speak to your second question, about non-maximal suppression,
but as for package management, a good rule of thumb is this: math code
that depends on Hadoop has to go in the "core" Maven module, but package
structure is fairly flexible after that (we can always move things around
later). Putting most of it in o.a.m.clustering.spectral, and some of it in
o.a.m.math.hadoop (for things which are general operations on a
DistributedRowMatrix), sounds fine; it doesn't all need to go in the same
place.

  -jake

On Wed, Jul 14, 2010 at 6:26 PM, Shannon Quinn <[email protected]> wrote:

> Hi all,
>
> Just a couple questions:
>
> 1) The first and second iterations of my algorithm are, for all practical
> purposes, two independent spectral clustering algorithms. As such, I'd like
> to try to keep both intact. However, since many of their computations are
> identical, I have been considering different ways of re-using the utilities
> they share. One method Isabel suggested was to create a superpackage
> "o.a.m.clustering.spectral" with subpackages for the algorithms and shared
> utilities. Another method - the reason I wanted to pose this question to the
> list - is making these utilities global, perhaps in the mahout.math package.
> These utilities include a map/reduce task for reading an input graph into a
> DistributedRowMatrix, a task for creating a diagonal DRM by summing rows of
> an input DRM, and a task for normalizing the rows of a DRM to unit vectors.
> Would these tasks be useful on a global level? Or should I stick to
> keeping them in my own subpackage?
>
> 2) Is anyone familiar with non-maximal suppression, specifically within the
> context of graphs? I'm having a difficult time wrapping my mind around this
> step. Given a sensitivity S_ij (which is a function of edge weights within a
> graph, or in this case, a value within a matrix), it needs to be suppressed
> if there is a strictly more negative value S_mj or S_in for some vertex v_m
> in the neighborhood of v_j, or some v_n in the neighborhood of v_i. To me,
> this seems simply a case of finding all nodes connected to the vertices v_i
> and v_j, and if any of the edges to those other vertices yield an S_mj (if
> connected to v_j) or S_in (if connected to v_i) that is less than S_ij, then
> S_ij is to be "suppressed". Which, from what I can tell, is simply a method
> for flagging something; the value isn't changed if it is suppressed.
>
> tl;dr version: within an immediate neighborhood of nodes, we're looking for
> the local minimum, and flagging everything that isn't the local minimum. Is
> this accurate?
>
> Thanks!
>
> Regards,
> Shannon
>
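For readers following the first question, the two matrix utilities it
describes (a diagonal matrix built from row sums, and rows scaled to unit
length) might look roughly like the sketch below. This is plain Python on
dict-of-dict sparse rows, not Mahout or DistributedRowMatrix code; the
function names are hypothetical, and reading "unit vector length" as the
Euclidean (L2) norm is an assumption.

```python
import math


def degree_diagonal(rows):
    """Sum each sparse row; the sums become the diagonal of a new matrix.

    `rows` maps row index -> {column index: value} (an assumed layout).
    """
    return {i: {i: sum(row.values())} for i, row in rows.items()}


def unit_rows(rows):
    """Scale every row to Euclidean length 1 (assumed norm).

    All-zero rows are copied unchanged to avoid dividing by zero.
    """
    result = {}
    for i, row in rows.items():
        norm = math.sqrt(sum(v * v for v in row.values()))
        result[i] = {j: v / norm for j, v in row.items()} if norm else dict(row)
    return result
```

Both functions touch one row at a time, which is presumably why each maps
naturally onto a map/reduce task over the rows of a distributed matrix.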

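The suppression rule in the second question can be written down directly,
which may make it easier to check the reading given in the message. The
sketch below is one literal interpretation of that description and not a
confirmed statement of the algorithm: an entry S_ij is flagged when some
other edge sharing the endpoint v_i or v_j has a strictly smaller value.
The function name and the dict-of-dict layout are hypothetical, and the
neighborhood of a vertex is assumed to be the set of vertices it has a
stored entry for.

```python
def non_maximal_suppression(S):
    """Return the set of (i, j) pairs flagged as suppressed.

    `S` maps vertex i -> {neighbour j: sensitivity S_ij}. Values in `S`
    are never modified; suppression only flags entries.
    """
    suppressed = set()
    for i, row in S.items():
        for j, s_ij in row.items():
            # Rival entries S_mj: edges into v_j from its other neighbours.
            rivals = [S[m][j] for m in S[j] if m != i and j in S[m]]
            # Rival entries S_in: edges out of v_i to its other neighbours.
            rivals += [s_in for n, s_in in row.items() if n != j]
            if any(r < s_ij for r in rivals):
                suppressed.add((i, j))
    return suppressed


# Path graph 0 - 1 - 2 - 3 with symmetric sensitivities.
S = {
    0: {1: -1.0},
    1: {0: -1.0, 2: -3.0},
    2: {1: -3.0, 3: -2.0},
    3: {2: -2.0},
}
flagged = sorted(non_maximal_suppression(S))
# flagged == [(0, 1), (1, 0), (2, 3), (3, 2)]
```

In this example only the middle edge (1, 2), with the most negative value
among its neighbours, survives. That matches the "keep the local minimum,
flag everything else" summary in the message, under the assumptions above.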