Hi Ted,

Sure, consider the following, somewhat simplified implementation. The TODOs mark the places where encoding/decoding decisions need to be made. Some of these are non-trivial, such as in Attributes, where nameIndex and attributes need to refer to the same values. I got bogged down trying to invent more ad-hoc encodings for these things; it would be better to move them all to a uniform serialization/deserialization approach. There are about 70 references to asFormatString and asWritableComparable that would need to be checked, so this is not a trivial change.

We should discuss the alternative techniques (Java Serializable, XML, JSON, ...) and make this change before implementing element labels. That was where I ran out of steam, though I seem to have more now :).

Jeff

=====

class Attributes {
 Map<String, Attribute> nameIndex;
 List<Attribute> attributes;

 void setName(Attribute, String);
 String getName(Attribute);

   public String asFormatString() {
   // TODO: how to encode?
   }

   public static Attributes decodeFormat(String format) {
   // TODO: reverse the encoding
   }
}
class Attribute {
 enum Type {ordinal, numeric};
 Type type;
 List<String> ordinalValues;

   public String asFormatString() {
   // TODO: how to encode?
   }

   public static Attribute decodeFormat(String format) {
   // TODO: reverse the encoding
   }
}

class DenseVector {
   private double[] values;
   private Attributes attributes;

   public Attributes getAttributes() {
       if (attributes == null)
          attributes = new Attributes();
       return attributes;
  }

   public String asFormatString() {
       StringBuilder out = new StringBuilder();
       out.append("[, ");
      // TODO: how to delimit the Attributes?
      // use the lazy getter so a vector without attributes does not throw a NullPointerException
      out.append(getAttributes().asFormatString());
      for (int i = 0; i < values.length; i++)
         out.append(values[i]).append(", ");
      out.append("] ");
      return out.toString();
   }

   public static Vector decodeFormat(String formattedString) {
      // TODO: how to isolate the attribute substring?
       String[] pts = formattedString.split(",");
       double[] point = new double[pts.length - 2];
       for (int i = 1; i < pts.length - 1; i++)
          point[i - 1] = Double.parseDouble(pts[i]);
       return new DenseVector(point);
   }
}
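
For illustration only, here is a minimal sketch of the Java Serializable option mentioned above. It is not the Mahout implementation. The Attribute and Attributes classes are simplified stand-ins for the ones sketched above, the add() helper and the AttributesRoundTrip class are hypothetical, and wrapping the serialized bytes in Base64 (java.util.Base64, Java 8 or later) is just one assumed way to fit the result into a format string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical driver class: round-trips an Attributes instance through a string.
class AttributesRoundTrip {
    public static void main(String[] args) throws Exception {
        Attributes original = new Attributes();
        Attribute color = new Attribute();
        color.type = Attribute.Type.ordinal;
        color.ordinalValues.add("red");
        color.ordinalValues.add("green");
        original.add("color", color);

        Attributes copy = Attributes.decodeFormat(original.asFormatString());
        // Check that the list and the map still share one Attribute instance.
        System.out.println(copy.attributes.get(0) == copy.nameIndex.get("color"));
    }
}

// Simplified stand-in for the Attribute class sketched in the message.
class Attribute implements Serializable {
    private static final long serialVersionUID = 1L;
    enum Type { ordinal, numeric }
    Type type = Type.numeric;
    List<String> ordinalValues = new ArrayList<>();
}

// Simplified stand-in for the Attributes class sketched in the message.
class Attributes implements Serializable {
    private static final long serialVersionUID = 1L;
    Map<String, Attribute> nameIndex = new HashMap<>();
    List<Attribute> attributes = new ArrayList<>();

    // Hypothetical helper: registers one Attribute in both collections.
    void add(String name, Attribute attribute) {
        attributes.add(attribute);
        nameIndex.put(name, attribute);
    }

    // Serialize the whole object graph, then Base64-encode it as text.
    public String asFormatString() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse the encoding: Base64-decode, then deserialize.
    public static Attributes decodeFormat(String format)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(format);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Attributes) in.readObject();
        }
    }
}
```

Java object serialization writes each object once per stream and uses back-references for repeats, so nameIndex and attributes come back pointing at the same Attribute instances without any hand-written encoding. The basic Base64 alphabet contains no commas or brackets, so the encoded text would not collide with the delimiters DenseVector uses. The trade-off is that the result is no longer human-readable, which may matter if format strings are meant to be inspected by eye.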




Ted Dunning wrote:
Jeff,

Can you say just a bit more about what richness and what awkwardness
you ran into?

On Fri, Oct 17, 2008 at 11:33 AM, Jeff Eastman (JIRA) <[EMAIL PROTECTED]> wrote:

   [
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640616#action_12640616]

Jeff Eastman commented on MAHOUT-65:
------------------------------------

I did a test implementation of element labeling based upon Karl's
suggestion and Ted's use case. It used a lazy instantiation of a label map
rather than a wrapper that was similar to my earlier patch. The rub comes
when trying to serialize the label map for asFormatString(). Here, the
richness of the labeling semantics made serialization/deserialization of its
state most awkward.



