I agree that VectorWritable should handle construction of all vector types and it should understand how to do that.
BUT... there is one possible role for sub-classes of VectorWritable: avoiding the otherwise necessary cast of the object that the VectorWritable produces. A MumbleVectorWritable would delegate all reading to VectorWritable but would cast the result to a MumbleVector before returning it. That cast would fail if the objects being read don't sub-class MumbleVector, and the user code would not need a cast of its own.

That isn't a big deal, though, and I would be +epsilon for the final marking. It might be more maintainable in the long run, since anybody who hasn't heard this discussion would almost have to look at the comment if they tried to sub-class VectorWritable.

On Mon, Sep 13, 2010 at 8:36 AM, Sean Owen <[email protected]> wrote:

> No, and that's the issue, really. A file of MultiLableVectorWritable
> cannot be read by VectorWritable since the latter does not expect that
> extra data. It's not quite a Hadoop issue, but simply that the OO
> world's object representation in memory doesn't exactly translate to
> serializing to a stream neatly.
>
> Yes I would mark VectorWritable final.
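
To make the MumbleVectorWritable idea concrete, here is a rough sketch. Everything in it is a stand-in so that it compiles without Mahout or Hadoop: the Vector, DenseVector, MumbleVector and VectorWritable types below are simplified placeholders rather than the real classes, and the one-byte type tag is a wire format invented for illustration only. The point is just the covariant get() that does the cast in one place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-ins; these are NOT the real Mahout types.
interface Vector {
  int size();
  double get(int index);
}

class DenseVector implements Vector {
  private final double[] values;
  DenseVector(double[] values) { this.values = values; }
  public int size() { return values.length; }
  public double get(int index) { return values[index]; }
}

// "MumbleVector" is the placeholder name used in the discussion.
class MumbleVector extends DenseVector {
  MumbleVector(double[] values) { super(values); }
}

// Simplified base writable: it alone knows how to construct every vector type.
class VectorWritable {
  static final byte DENSE = 0;   // assumed type tags, for illustration only
  static final byte MUMBLE = 1;

  private Vector vector;

  public Vector get() { return vector; }

  public void readFields(DataInput in) throws IOException {
    byte tag = in.readByte();
    double[] values = new double[in.readInt()];
    for (int i = 0; i < values.length; i++) {
      values[i] = in.readDouble();
    }
    vector = (tag == MUMBLE) ? new MumbleVector(values) : new DenseVector(values);
  }
}

// Adds no extra serialized data; all reading is inherited from VectorWritable.
class MumbleVectorWritable extends VectorWritable {
  @Override
  public MumbleVector get() {
    // Throws ClassCastException if the stream held some other vector type.
    return (MumbleVector) super.get();
  }
}

class CastDemo {
  // Serializes a vector in the assumed tag/length/values layout.
  static byte[] encode(byte tag, double... values) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeByte(tag);
      out.writeInt(values.length);
      for (double v : values) {
        out.writeDouble(v);
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads an encoded vector into the given writable and returns that writable.
  static <W extends VectorWritable> W readInto(W writable, byte tag, double... values) {
    try {
      writable.readFields(new DataInputStream(new ByteArrayInputStream(encode(tag, values))));
      return writable;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // User code gets a MumbleVector back without writing a cast.
    MumbleVector v = readInto(new MumbleVectorWritable(), VectorWritable.MUMBLE, 1.0, 2.0).get();
    System.out.println("read a MumbleVector of size " + v.size());

    try {
      readInto(new MumbleVectorWritable(), VectorWritable.DENSE, 3.0).get();
    } catch (ClassCastException e) {
      System.out.println("cast failed for a non-Mumble vector, as expected");
    }
  }
}
```

Note that the sub-class here writes and reads nothing beyond what VectorWritable does, so it avoids the extra-data problem Sean describes. It only works while VectorWritable is not final, though, which is exactly the trade-off against the final marking.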
