On Fri, Sep 11, 2009 at 12:50:14PM -0600, Nathan Kurz wrote:
> On Fri, Sep 11, 2009 at 11:27 AM, Marvin Humphrey
> <[email protected]> wrote:
> > Huffman coding naming principles dictate that classes whose names are typed
> > most often should have the shortest names. Therefore, instead of locating
> > common classes within sub-trees, we should locate them at the first level --
> > directly underneath Lucy.
>
> I'm only heckling from the sidelines lately, but this produced an
> internal 'ug'.
Heh. I had a hunch that if any email to this list would generate responses,
it would be this one. :\
> > Schema, Doc, QueryParser, and probably Indexer will all descend from
> > Lucy::Obj.
>
> Might it be possible to rename Lucy::Obj to Lucy, so that everything
> in Lucy:: is a Lucy object?
That would degrade the clarity of the C code. Variants of "Lucy" are already
used as the first level namespace differentiator. Adding a "Lucy" type will
double up on our use of "lucy":

    lucy_Lucy *dupe = Lucy_Lucy_Clone(thing);

It's the "buffalo buffalo" problem, no? :)
> It's boring, but I really like 'Class::Subclass::SubSubClass' schemes.
There will be too many first-level descendants. Right now KinoSearch has
around 60 classes which extend Obj, not including test classes. If we dump
everything into Lucy/, we'll get a big mess. There's no choice but to break
stuff up into multiple directories by general topic.
And yet, we have a constraint imposed by our C naming scheme. In order to
avoid horrendously long symbols, we only use one level of namespacing
beyond the "lucy" prefix:

    lucy_HitCollector *collector = (lucy_HitCollector*)lucy_BitColl_new(bit_vec);

Consider the alternative:

    lucy_HitCollector_BitCollector *collector
        = (lucy_HitCollector*)lucy_HitColl_BitColl_new(bit_vec);

(Or something like that.)
For this reason, the final component of the class name has to convey the
identity of the class without any other context. Lucy::Search::Query::Term
would be ok for a pure Perl hierarchy, but it won't work for Lucy -- that class
has to end in "TermQuery".
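
To make the constraint concrete, here is a rough sketch of how a Perl-style
class name collapses to a one-level C symbol. The helper short_c_symbol() is
hypothetical and not part of Lucy or KinoSearch; it only assumes the rule
described above, namely the "lucy" prefix plus the final component of the
class name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Lucy: build the one-level C symbol for a
 * Perl-style class name by keeping only its final "::" component.
 * Returns 0 on success, -1 if buf is too small. */
int
short_c_symbol(const char *class_name, char *buf, size_t buf_size) {
    const char *last = class_name;
    const char *sep;

    /* Skip past every "::" so that `last` points at the final component. */
    while ((sep = strstr(last, "::")) != NULL) {
        last = sep + 2;
    }

    /* snprintf reports the length it wanted to write; detect truncation. */
    int needed = snprintf(buf, buf_size, "lucy_%s", last);
    return (needed >= 0 && (size_t)needed < buf_size) ? 0 : -1;
}
```

Under this assumption, Lucy::Search::TermQuery becomes lucy_TermQuery, while
Lucy::Search::Query::Term would become lucy_Term, which says nothing about
queries once the context is stripped away.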
And if we accept that all search-related components are going to start with
Lucy::Search, then an inheritance-driven subclass naming scheme starts to
yield painfully long fully-qualified class names.
"Lucy::Search::HitCollector::BitCollector" is 40 characters; a lot of people
limit their code to 78-80 characters per line, and class names that long start
to cause awkward wrappings. We don't want to have too many of those.
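
As a quick sanity check on the arithmetic, the sketch below computes how many
columns remain on a line once a fully-qualified class name has been written
once. LINE_WIDTH and columns_left() are names invented for illustration, and
the 78-column limit is just the lower end of the range mentioned above.

```c
#include <string.h>

/* Assumed limit: the lower end of the 78-80 column range. */
#define LINE_WIDTH 78

/* Columns left on a LINE_WIDTH-column line after writing the class name
 * once. A negative result means the name alone overflows the line. */
int
columns_left(const char *class_name) {
    return LINE_WIDTH - (int)strlen(class_name);
}
```

For "Lucy::Search::HitCollector::BitCollector" this leaves 38 columns, and for
a topic-grouped "Lucy::Search::BitCollector" it leaves 52.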
I think the primary principle guiding our class hierarchy organization has to
be grouping by topic, as in Lucene. A 'Class::Subclass::SubSubClass' scheme
just isn't workable.
To be fully consistent with Lucene, though, we'd have to put QueryParser under
Lucy::QueryParser::QueryParser, like Plucene and early versions of KinoSearch
did. That always bugged me, which is why it moved in later versions of
KinoSearch.
But QueryParser could also go under Lucy::Search. Maybe we should try to have
all second-level namespacing represent grouping only? In other words, there
would be no instantiable classes with the pattern Lucy::Xxxx -- only
Lucy::Xxxx::Xxxx and deeper.
That would change my initial proposal to this:

    Lucy::Object::Obj
    Lucy::Index::Indexer
    Lucy::Search::Searcher
    Lucy::Search::QueryParser
    Lucy::Document::Doc
    Lucy::Plan::Schema

It would also imply moving around some other classes I didn't mention in my
original proposal for brevity's sake:

    Lucy::Plan::Architecture
    Lucy::Plan::FieldType
    Lucy::Plan::TextType
    Lucy::Plan::FullTextType
    Lucy::Plan::StringType
    Lucy::Plan::Float32Type
    Lucy::Plan::Float64Type
    Lucy::Plan::Int32Type
    Lucy::Document::HitDoc

If we arrange things this way, at least no subclass is ever located above its
superclass in the hierarchy -- as was the case with Lucy::Searcher subclassing
Lucy::Search::Searchable. They're always at the same level or below.

    Lucy::Object::Obj
        Lucy::Search::Searchable
            Lucy::Search::Searcher
            Lucy::Search::PolySearcher

    Lucy::Object::Obj
        Lucy::Plan::FieldType
            Lucy::Plan::TextType
                Lucy::Plan::FullTextType
                Lucy::Plan::StringType

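
The "same level or below" rule is easy to check mechanically. The sketch
below is only an illustration: namespace_depth() and placement_ok() are
hypothetical helpers, and "level" is assumed to mean the number of "::"
separators in the class name.

```c
#include <string.h>

/* Hypothetical helper: number of "::" separators in a class name, i.e. how
 * far the class sits below the top-level "Lucy" namespace. */
int
namespace_depth(const char *class_name) {
    int depth = 0;
    const char *p = class_name;

    while ((p = strstr(p, "::")) != NULL) {
        depth++;
        p += 2;  /* step over the separator just found */
    }
    return depth;
}

/* Returns 1 if the subclass sits at the same namespace level as its
 * superclass or deeper, 0 if it sits above it. */
int
placement_ok(const char *subclass, const char *superclass) {
    return namespace_depth(subclass) >= namespace_depth(superclass);
}
```

With these definitions, placement_ok("Lucy::Searcher",
"Lucy::Search::Searchable") is 0, which is the case being avoided, while
placement_ok("Lucy::Search::Searcher", "Lucy::Search::Searchable") is 1.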
Additionally, we remove the ambiguity about what the second part of the class
name means -- it's always a grouping. Think of Lucy::Search as LucySearch and
Lucy::Index as LucyIndex, if you like.
Marvin Humphrey