Robin Anil wrote:
Other clusters have a notion of a center, which is the best command-line
visualisation one can do (can you think of another?). Dirichlet models are
so separate that (not in many cases) having a fixed center doesn't make
sense, so no cluster dump.
But Dirichlet clusters are clusters produced by one of our clustering jobs and so it is logical for ClusterDumper to be able to dump them too. (And, actually, all the models we have so far do have centers). The Printable interface unifies all clusters with asFormatString(bindings) and asJsonString(). Just need a small change to ClusterDumper to incorporate it.
Printing in Json format is a terrible way to show clusters.
I agree we need a better way to show clusters. Json is useful for some applications of the current asFormatString() methods (which, for historical reasons, now produce Json), where the output needs to be complete and usable as input to something standard. The formatting in VectorDumper and ClusterBase serves a use case that favors human readability over completeness.

Vectors and Clusters and Models are all Writable and so can be passed efficiently between processing steps. I think they should all be Printable too (the latter two are already). That would let us refactor VectorDumper into AbstractVector and clean up another code duplication.
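
To make the idea concrete, here is a minimal sketch of what dumping through a Printable-style interface could look like. It is not Mahout's actual code: the two method names come from this thread, but the exact signatures, the SimpleCluster class, and the printClusters helper are assumptions made for illustration only.

```java
import java.util.List;
import java.util.Locale;

final class PrintableSketch {

  /** Assumed shape of the Printable interface; signatures are a guess. */
  interface Printable {
    /** Human-readable form; bindings optionally map index to a label. */
    String asFormatString(String[] bindings);

    /** Complete form intended for machine consumption. */
    String asJsonString();
  }

  /** Hypothetical cluster that has an id and a dense center. */
  static final class SimpleCluster implements Printable {
    private final int id;
    private final double[] center;

    SimpleCluster(int id, double[] center) {
      this.id = id;
      this.center = center.clone();
    }

    @Override
    public String asFormatString(String[] bindings) {
      StringBuilder sb = new StringBuilder("C" + id + ": [");
      boolean first = true;
      for (int i = 0; i < center.length; i++) {
        if (center[i] == 0.0) {
          continue; // skip zeros to keep the output readable
        }
        if (!first) {
          sb.append(", ");
        }
        // Fall back to the numeric index when no label is bound.
        boolean bound = bindings != null && i < bindings.length && bindings[i] != null;
        sb.append(bound ? bindings[i] : String.valueOf(i));
        sb.append(':').append(String.format(Locale.ROOT, "%.3f", center[i]));
        first = false;
      }
      return sb.append(']').toString();
    }

    @Override
    public String asJsonString() {
      // Hand-rolled Json so the sketch needs no external library.
      StringBuilder sb = new StringBuilder("{\"id\":" + id + ",\"center\":[");
      for (int i = 0; i < center.length; i++) {
        if (i > 0) {
          sb.append(',');
        }
        sb.append(center[i]);
      }
      return sb.append("]}").toString();
    }
  }

  /** A dumper loop that depends only on Printable, not on a cluster base class. */
  static String printClusters(List<? extends Printable> clusters, String[] bindings) {
    StringBuilder out = new StringBuilder();
    for (Printable p : clusters) {
      out.append(p.asFormatString(bindings)).append('\n');
    }
    return out.toString();
  }
}
```

Because printClusters in this sketch only sees Printable, any cluster or model type that implements the two methods could be dumped the same way, whether or not it has a fixed center.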

On Tue, Mar 2, 2010 at 3:16 AM, Jeff Eastman <j...@windwardsolutions.com> wrote:

The loop still needs to be closed in order to unify DirichletCluster under
the ClusterDumper's domain. Specifically, the new Printable interface needs
to replace ClusterBase in printClusters.

Ideally, the VectorDumper utility should be moved to base (or better, the
functionality added to AbstractVector) so that ClusterBase can use it
legally. AbstractVector already supports asFormatString but it returns a
Json string. Printable adds asJsonString for users wanting a printable I/O
representation and asFormatString(bindings) for less formal applications
such as below.



Robin Anil wrote:

It already does this, I think. But floats can be formatted better.

On Tue, Mar 2, 2010 at 2:55 AM, Jeff Eastman <j...@windwardsolutions.com> wrote:

And check the asFormatString(bindings) implementation in ClusterBase. It
does this I think, though it has not yet been wired into
ClusterDumper.printClusters.  I wanted to give the ClusterDumper users a
chance to critique my formatting but it is like the below.

Jeff



Jake Mannix (JIRA) wrote:



VectorDumper should also do printing to simple {index : value, index :
value, ... } output, if no dictionary is specified.


--------------------------------------------------------------------------------------------------------------------------

               Key: MAHOUT-315
               URL: https://issues.apache.org/jira/browse/MAHOUT-315
           Project: Mahout
        Issue Type: Improvement
  Affects Versions: 0.2
          Reporter: Jake Mannix
          Assignee: Jake Mannix
           Fix For: 0.4


I've got a patch for this, tied up in other code.
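
For illustration only (this is not the patch mentioned above), the requested behavior could be sketched as follows. The VectorFormatSketch class, its format method, the use of a Map to stand in for a sparse vector, and the three-decimal float formatting are all assumptions, not part of VectorDumper.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class VectorFormatSketch {

  /**
   * Formats a sparse vector as {label : value, ...}. When dictionary is
   * null, or has no entry for an index, the numeric index is used as the
   * label, which gives the simple {index : value, ...} output.
   */
  static String format(Map<Integer, Double> vector, String[] dictionary) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    // TreeMap gives a stable, index-ordered output.
    for (Map.Entry<Integer, Double> e : new TreeMap<>(vector).entrySet()) {
      if (!first) {
        sb.append(", ");
      }
      int index = e.getKey();
      boolean hasTerm = dictionary != null
          && index >= 0
          && index < dictionary.length
          && dictionary[index] != null;
      sb.append(hasTerm ? dictionary[index] : Integer.toString(index));
      // Fixed precision keeps float values short and comparable.
      sb.append(" : ").append(String.format(Locale.ROOT, "%.3f", e.getValue()));
      first = false;
    }
    return sb.append('}').toString();
  }
}
```

Falling back per index, rather than only when the whole dictionary is missing, is a design choice of this sketch: it also covers a dictionary that is smaller than the vector.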






