Hi Ted,

Sure, consider the following, somewhat simplified implementation. The various TODOs mark the places where encoding/decoding decisions need to be made. Some of these are non-trivial, such as in Attributes, where nameIndex and attributes need to refer to common values. I got bogged down trying to invent more ad-hoc encodings for these things; it would be better to move them all to a uniform serialization/deserialization approach. There are about 70 references to asFormatString and asWritableComparable that need to be checked, so this is not a trivial change.

We should discuss the alternative techniques (Java Serializable, XML, JSON, ...) and make this change before implementing element labels. That was where I ran out of steam, though I seem to have more now :).
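
To make the first alternative concrete, here is a minimal sketch assuming plain Java serialization. The SerializationSketch class, its encode/decode helpers, the add method and the Base64 text form are all assumptions made for illustration; none of them exist in the current code. The point it illustrates is that an object stream keeps shared references intact, so nameIndex and attributes still refer to the same Attribute instances after a round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SerializationSketch {

  // Hypothetical, cut-down stand-in for the Attribute class discussed in this thread.
  static class Attribute implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> ordinalValues = new ArrayList<>();
  }

  // Hypothetical stand-in for Attributes: both collections hold the same instances.
  static class Attributes implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<String, Attribute> nameIndex = new HashMap<>();
    final List<Attribute> attributes = new ArrayList<>();

    void add(String name, Attribute attribute) {
      attributes.add(attribute);
      nameIndex.put(name, attribute);
    }
  }

  // Serialize any object graph and wrap the bytes in Base64 so the result is a String.
  static String encode(Serializable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  // Reverse of encode: Base64 text back to bytes, then back to an object graph.
  static Object decode(String format) {
    byte[] raw = Base64.getDecoder().decode(format);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
      return in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Attributes original = new Attributes();
    Attribute color = new Attribute();
    color.ordinalValues.add("red");
    original.add("color", color);

    Attributes copy = (Attributes) decode(encode(original));
    // Identity check: the map entry and the list entry are one object in the copy too.
    System.out.println(copy.nameIndex.get("color") == copy.attributes.get(0));
  }
}
```

The trade-off is that the encoded string is opaque and tied to the Java class layout, which is one reason XML or JSON may still be preferable wherever the format string is meant to be read by people or by other tools.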

Jeff

=====

class Attributes {
  Map<String, Attribute> nameIndex;
  List<Attribute> attributes;

  void setName(Attribute, String);
  String getName(Attribute);

  public String asFormatString() {
    // TODO: how to encode?
  }

  public static Attributes decodeFormat(String format) {
    // TODO: reverse the encoding
  }
}

class Attribute {
  enum Type {ordinal, numeric};
  Type type;
  List<String> ordinalValues;

  public String asFormatString() {
    // TODO: how to encode?
  }

  public static Attribute decodeFormat(String format) {
    // TODO: reverse the encoding
  }
}

class DenseVector {
  private double[] values;
  private Attributes attributes;

  public Attributes getAttributes() {
    if (attributes == null)
      attributes = new Attributes();
    return attributes;
  }

  public String asFormatString() {
    StringBuilder out = new StringBuilder();
    out.append("[, ");
    // TODO: how to delimit the Attributes?
    out.append(attributes.asFormatString());
    for (int i = 0; i < values.length; i++)
      out.append(values[i]).append(", ");
    out.append("] ");
    return out.toString();
  }

  public static Vector decodeFormat(String formattedString) {
    // TODO: how to isolate the attribute substring?
    String[] pts = formattedString.split(",");
    double[] point = new double[pts.length - 2];
    for (int i = 1; i < pts.length - 1; i++)
      point[i - 1] = Double.parseDouble(pts[i]);
    return new DenseVector(point);
  }
}

Ted Dunning wrote:
Jeff,

Can you say just a bit more about what richness and what awkwardness that you ran into?

On Fri, Oct 17, 2008 at 11:33 AM, Jeff Eastman (JIRA) <[EMAIL PROTECTED]> wrote:

[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640616#action_12640616 ]

Jeff Eastman commented on MAHOUT-65:
------------------------------------

I did a test implementation of element labeling based upon Karl's suggestion and Ted's use case. It used a lazy instantiation of a label map rather than a wrapper that was similar to my earlier patch. The rub comes when trying to serialize the label map for asFormatString(). Here, the richness of the labeling semantics made serialization/deserialization of its state most awkward.