Robin Anil wrote:
Other clusters have a notion of a center, which is the best command-line
visualisation one can do (can you think of another?). Dirichlet models are
so separate that (not in many cases) having a fixed center doesn't make
sense, so no cluster dump.
But Dirichlet clusters are clusters produced by one of our clustering
jobs and so it is logical for ClusterDumper to be able to dump them too.
(And, actually, all the models we have so far do have centers). The
Printable interface unifies all clusters with asFormatString(bindings)
and asJsonString(). Just need a small change to ClusterDumper to
incorporate it.
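
A minimal sketch of that idea, not the actual Mahout code: only the method names asFormatString(bindings) and asJsonString() come from this thread. The String[] type for bindings, the classes CenteredCluster and ModelCluster, and the dump helper are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PrintableSketch {

  /** Assumed shape of Printable; only the two method names come from the thread. */
  public interface Printable {
    String asFormatString(String[] bindings); // human-readable, may be lossy
    String asJsonString();                    // complete, machine-readable
  }

  /** Labels each index with its binding when one exists, else with the index itself. */
  static String formatVector(double[] v, String[] bindings) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < v.length; i++) {
      if (i > 0) sb.append(", ");
      String label = (bindings != null && i < bindings.length) ? bindings[i] : Integer.toString(i);
      sb.append(label).append(':').append(String.format(Locale.ROOT, "%.3f", v[i]));
    }
    return sb.append(']').toString();
  }

  /** Hand-rolled JSON array so the sketch needs no JSON library. */
  static String jsonArray(double[] v) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < v.length; i++) {
      if (i > 0) sb.append(',');
      sb.append(v[i]);
    }
    return sb.append(']').toString();
  }

  /** Hypothetical cluster with a fixed center. */
  public static final class CenteredCluster implements Printable {
    private final int id;
    private final double[] center;
    public CenteredCluster(int id, double[] center) { this.id = id; this.center = center; }
    @Override public String asFormatString(String[] bindings) {
      return "C-" + id + formatVector(center, bindings);
    }
    @Override public String asJsonString() {
      return "{\"id\":" + id + ",\"center\":" + jsonArray(center) + "}";
    }
  }

  /** Hypothetical model-based cluster: a mean plus one standard deviation. */
  public static final class ModelCluster implements Printable {
    private final int id;
    private final double[] mean;
    private final double stdDev;
    public ModelCluster(int id, double[] mean, double stdDev) {
      this.id = id; this.mean = mean; this.stdDev = stdDev;
    }
    @Override public String asFormatString(String[] bindings) {
      return "M-" + id + "{mean=" + formatVector(mean, bindings)
          + ", sd=" + String.format(Locale.ROOT, "%.3f", stdDev) + "}";
    }
    @Override public String asJsonString() {
      return "{\"id\":" + id + ",\"mean\":" + jsonArray(mean) + ",\"stdDev\":" + stdDev + "}";
    }
  }

  /** The dumper depends only on Printable, so any cluster type can be printed. */
  public static String dump(List<? extends Printable> clusters, String[] bindings) {
    StringBuilder out = new StringBuilder();
    for (Printable p : clusters) {
      out.append(p.asFormatString(bindings)).append('\n');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<Printable> clusters = Arrays.asList(
        new CenteredCluster(0, new double[] {1.5, 2.0}),
        new ModelCluster(1, new double[] {0.25, 4.0}, 0.5));
    System.out.print(dump(clusters, new String[] {"apple", "pear"}));
  }
}
```

The point of the sketch is that dump never needs to know which clustering job produced its input, which is the small change ClusterDumper would need.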
Printing in JSON format is a terrible way to show clusters.
I agree we need a better way to show clusters. Json is useful for some
applications of the current asFormatString() methods (which, for
historical reasons, now produce Json) where the output needs to be
complete and usable as input to something standard. The formatting in
VectorDumper and ClusterBase is aimed at a use case which favors
human-readability over completeness.
Vectors and Clusters and Models are all Writable and so can be passed
efficiently between processing steps. I think they should all be
Printable too (the latter two are already). That would let us refactor
VectorDumper into AbstractVector and clean up another code duplication.
On Tue, Mar 2, 2010 at 3:16 AM, Jeff Eastman <j...@windwardsolutions.com> wrote:
The loop still needs to be closed in order to unify DirichletCluster under
the ClusterDumper's domain. Specifically, the new Printable interface needs
to replace ClusterBase in printClusters.
Ideally, the VectorDumper utility should be moved to base (or better, the
functionality added to AbstractVector) so that ClusterBase can use it
legally. AbstractVector already supports asFormatString but it returns a
Json string. Printable adds asJsonString for users wanting a printable I/O
representation and asFormatString(bindings) for less formal applications
such as below.
Robin Anil wrote:
It already does this, I think. But floats can be formatted better.
On Tue, Mar 2, 2010 at 2:55 AM, Jeff Eastman <j...@windwardsolutions.com> wrote:
And check the asFormatString(bindings) implementation in ClusterBase. It
does this I think, though it has not yet been wired into
ClusterDumper.printClusters. I wanted to give the ClusterDumper users a
chance to critique my formatting but it is like the below.
Jeff
Jake Mannix (JIRA) wrote:
VectorDumper should also do printing to simple {index : value, index :
value, ... } output, if no dictionary is specified.
--------------------------------------------------------------------------------------------------------------------------
Key: MAHOUT-315
URL: https://issues.apache.org/jira/browse/MAHOUT-315
Project: Mahout
Issue Type: Improvement
Affects Versions: 0.2
Reporter: Jake Mannix
Assignee: Jake Mannix
Fix For: 0.4
I've got a patch for this, tied up in other code.
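
For illustration only, and not the patch mentioned above: a rough sketch of the requested fallback, where each entry is printed with its dictionary term when a dictionary is supplied and with its bare index otherwise. The class VectorFormatSketch, the Map-based vector representation, and the three-decimal float formatting are all assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class VectorFormatSketch {

  /**
   * Formats the non-zero entries as {label : value, ...}, in index order.
   * The label is the dictionary term when one is available, else the index.
   */
  public static String format(Map<Integer, Double> entries, String[] dictionary) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    // TreeMap gives a stable, index-ordered traversal.
    for (Map.Entry<Integer, Double> e : new TreeMap<>(entries).entrySet()) {
      if (e.getValue() == 0.0) {
        continue; // skip zeros, as a sparse dump would
      }
      if (!first) {
        sb.append(", ");
      }
      first = false;
      int index = e.getKey();
      boolean hasTerm = dictionary != null && index >= 0 && index < dictionary.length;
      String label = hasTerm ? dictionary[index] : Integer.toString(index);
      sb.append(label).append(" : ")
        .append(String.format(Locale.ROOT, "%.3f", e.getValue()));
    }
    return sb.append('}').toString();
  }

  public static void main(String[] args) {
    Map<Integer, Double> v = new TreeMap<>();
    v.put(3, 0.25);
    v.put(0, 1.0);
    System.out.println(format(v, null));
    System.out.println(format(v, new String[] {"apple", "kiwi", "plum", "pear"}));
  }
}
```

Passing null as the dictionary gives the plain index form, so callers without a dictionary file still get readable output.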