I will respond from the standpoint of an SGD partisan first.

What I think is needed next is some way to save clusterings as models that
are interoperable with SGD models.  That is, ModelSerializer.readBinary
should return a usable classifier when applied to whatever the clustering
algorithm saved.  This makes clustering models as deployable as SGD models
already are and abstracts away the origin of the clustering model.
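
To make that concrete, here is a rough sketch of the round trip I have in
mind. Everything in it is hypothetical: ModelSketch, the Classifier
interface, the integer type tags and this readBinary are stand-ins for
illustration only, not Mahout's actual classes or serialization format. The
only point is that the reader gets back "a classifier" and never needs to
know whether SGD or a clustering run produced it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: these types only illustrate the idea and are not Mahout classes.
final class ModelSketch {

  /** Anything that maps a vector to one score per category and can save itself. */
  interface Classifier {
    double[] classify(double[] x);
    void write(DataOutputStream out) throws IOException;
  }

  /** Stand-in for an SGD-trained binary logistic model. */
  static final class LogisticModel implements Classifier {
    final double[] weights;
    LogisticModel(double[] weights) { this.weights = weights; }

    @Override public double[] classify(double[] x) {
      double dot = 0;
      for (int i = 0; i < x.length; i++) dot += weights[i] * x[i];
      double p = 1 / (1 + Math.exp(-dot));
      return new double[] {1 - p, p};
    }

    @Override public void write(DataOutputStream out) throws IOException {
      out.writeInt(1);                       // type tag (assumed convention)
      writeVector(out, weights);
    }
  }

  /** Stand-in for a saved clustering: closer centroids get higher scores. */
  static final class CentroidModel implements Classifier {
    final double[][] centroids;
    CentroidModel(double[][] centroids) { this.centroids = centroids; }

    @Override public double[] classify(double[] x) {
      double[] score = new double[centroids.length];
      double min = Double.POSITIVE_INFINITY;
      for (int k = 0; k < centroids.length; k++) {
        for (int i = 0; i < x.length; i++) {
          double diff = x[i] - centroids[k][i];
          score[k] += diff * diff;           // squared distance for now
        }
        min = Math.min(min, score[k]);
      }
      double total = 0;
      for (int k = 0; k < score.length; k++) {
        score[k] = Math.exp(min - score[k]); // shift by min so exp() cannot underflow to all zeros
        total += score[k];
      }
      for (int k = 0; k < score.length; k++) score[k] /= total;
      return score;
    }

    @Override public void write(DataOutputStream out) throws IOException {
      out.writeInt(2);                       // type tag (assumed convention)
      out.writeInt(centroids.length);
      for (double[] c : centroids) writeVector(out, c);
    }
  }

  /** Reads whatever was saved and returns it as a plain Classifier. */
  static Classifier readBinary(DataInputStream in) throws IOException {
    int tag = in.readInt();
    switch (tag) {
      case 1: return new LogisticModel(readVector(in));
      case 2: {
        double[][] c = new double[in.readInt()][];
        for (int k = 0; k < c.length; k++) c[k] = readVector(in);
        return new CentroidModel(c);
      }
      default: throw new IOException("unknown model tag " + tag);
    }
  }

  static void writeVector(DataOutputStream out, double[] v) throws IOException {
    out.writeInt(v.length);
    for (double d : v) out.writeDouble(d);
  }

  static double[] readVector(DataInputStream in) throws IOException {
    double[] v = new double[in.readInt()];
    for (int i = 0; i < v.length; i++) v[i] = in.readDouble();
    return v;
  }

  /** Serializes a model to bytes and reads it back through readBinary. */
  static Classifier roundTrip(Classifier model) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      model.write(new DataOutputStream(bytes));
      return readBinary(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling ModelSketch.roundTrip on either a LogisticModel or a CentroidModel
gives back something the caller can only use through classify(), which is
the abstraction I am asking for.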

It would be a kick if the clustering driver could be merged with the
classification driver (if any) so that it would apply any model supported by
ModelSerializer.readBinary to specified data.

Then as a token offering to the gods of interoperability it would be kind
of cool if the initial state of k-means or other clustering algorithms could
also be such a serialized model.  That would allow an SGD model to be the
initial state for clustering which would give a vague kind of
semi-supervised learning at little cost.
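
As a rough illustration of that last idea, the sketch below seeds a plain
Lloyd-style k-means from any function that scores a vector per category.
The names (SeededKMeans, means, cluster) are made up for this email and the
"model" is just a java.util.function.Function standing in for a
deserialized classifier; none of it is existing Mahout code. Note that the
seeding step and each k-means iteration are the same operation with a
different scoring model, which is the isomorphism in miniature.

```java
import java.util.function.Function;

// Hypothetical sketch: a classifier is anything that returns one score per category.
final class SeededKMeans {

  static int argMax(double[] scores) {
    int best = 0;
    for (int i = 1; i < scores.length; i++) if (scores[i] > scores[best]) best = i;
    return best;
  }

  /**
   * Assigns each point to the model's top category and returns the per-category means.
   * Categories that receive no points keep the corresponding fallback centroid.
   */
  static double[][] means(Function<double[], double[]> model, double[][] data, double[][] fallback) {
    int k = fallback.length;
    int dim = data[0].length;
    double[][] sums = new double[k][dim];
    int[] counts = new int[k];
    for (double[] x : data) {
      int c = argMax(model.apply(x));
      counts[c]++;
      for (int i = 0; i < dim; i++) sums[c][i] += x[i];
    }
    for (int c = 0; c < k; c++) {
      if (counts[c] == 0) {
        sums[c] = fallback[c].clone();
      } else {
        for (int i = 0; i < dim; i++) sums[c][i] /= counts[c];
      }
    }
    return sums;
  }

  /** Scores each centroid by negative squared distance, so argMax picks the nearest one. */
  static double[] negSquaredDistances(double[][] centroids, double[] x) {
    double[] scores = new double[centroids.length];
    for (int c = 0; c < centroids.length; c++) {
      for (int i = 0; i < x.length; i++) {
        double diff = x[i] - centroids[c][i];
        scores[c] -= diff * diff;
      }
    }
    return scores;
  }

  /** Uses the seed model for the initial assignment, then runs ordinary k-means iterations. */
  static double[][] cluster(Function<double[], double[]> seed, double[][] data, int k, int iterations) {
    // Assumption: categories the seed never predicts start at the origin.
    double[][] centroids = means(seed, data, new double[k][data[0].length]);
    for (int it = 0; it < iterations; it++) {
      final double[][] current = centroids;
      centroids = means(x -> negSquaredDistances(current, x), data, current);
    }
    return centroids;
  }
}
```

The seed could just as well be a previously saved clustering as an SGD
model; cluster() cannot tell the difference, which is rather the point.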



On Tue, Apr 12, 2011 at 2:57 PM, Jeff Eastman <[email protected]> wrote:

> Hi Ted,
>
> We've been discussing this on and off and I'd like to pick up the thread
> again. Currently we have AbstractVectorClassifier (in pkg classifier) and
> VectorModelClassifier (in pkg clustering). This allows any set of Cluster
> Models (List<Model<VectorWritable>>) to function as a classifier. In your
> last email you indicated this as a step in the right direction. What else is
> needed?
>
> One thought I've had is this: Most clustering algorithms - the older ones
> anyway - have static Driver methods "buildClusters()" and "clusterData()".
> Would it help with the convergence process if these were simply renamed to
> "trainClusters()" and "classifyData()" (or something similar) respectively?
> I know it took me a while to see the isomorphism between clustering and
> classification, so perhaps something simple like this would be an
> improvement.
>
>
