Re: revisit naming for grouping/join?

Chris Hostetter Mon, 04 Jul 2011 10:35:09 -0700

: In my example the city was parent -- I raised this example to explain
: that index-time joining is more general than just nested docs (ie, I
: think we should keep the name "join" for this module... also because
: we should factor out more general search-time-only join capabilities
: into it).


i think that may be the wrong approach to take when discussing "examples". 
while it's great to say there are dozens of use cases that these features 
can all support in dozens of different ways, we should really focus on 
naming/demoing these features in terms of the use cases where they make 
the most sense.

In other words, i don't think we should say "All of these types of problems 
are different types of nails, and all of these modules are specialty 
hammers that are slightly distinct from each other in how they work, but 
you can use any of these hammers on any of these nails"; instead we should 
say "here are some specialty hammers, you can use them for lots of 
types of nails, but for each hammer here is the type of nail where it 
really shines".


"block-index-join", as i understand it, requires all the docs you want to 
join up to be in one contiguous range of docids in the index, so if you want 
to re-index one doc in a block you have to re-index the entire block. So 
the city/doctor example doesn't sound like a good generic example of 
when/why to use this: a doctor might change his office hours or address 
(maybe even leaving the city completely), while a city might change its 
population w/o the doctor being affected at all.

The "book and pages" example seems much more appropriate, since in the 
real world these things change in lock step: pages aren't added to or 
removed from a book, and pages don't change w/o the book itself being 
fundamentally changed. The fields of a page document are the text of that 
page, and that is inherently data about the book; the fields of a doctor 
document are metadata about the doctor, and that is not inherently data 
about the city the doctor lives in.
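To make the contiguous-docid requirement concrete, here is a minimal in-memory sketch. It assumes that each book's page docs are indexed immediately before the book doc, with the book (parent) docids marked in a bitset. This is an illustration of the layout idea, not the module's actual API; the class and helper names (BlockLayoutSketch, parentOf, childrenOf, parentsOfHits) are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BlockLayoutSketch {
    // Assumed layout: each block is a contiguous run of child docids (pages)
    // immediately followed by its parent docid (the book). The "parents"
    // BitSet marks which docids are parents.

    // Child -> parent: the first parent bit at or after the child's docid.
    static int parentOf(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    // Parent -> children: every docid after the previous parent, up to this one.
    // previousSetBit(-1) returns -1, so the first block starts at docid 0.
    static List<Integer> childrenOf(BitSet parents, int parentDoc) {
        int start = parents.previousSetBit(parentDoc - 1) + 1;
        List<Integer> kids = new ArrayList<>();
        for (int doc = start; doc < parentDoc; doc++) {
            kids.add(doc);
        }
        return kids;
    }

    // Map child hits (e.g. pages matching some text query) to their distinct parents.
    static Set<Integer> parentsOfHits(BitSet parents, int[] childHits) {
        Set<Integer> out = new LinkedHashSet<>();
        for (int hit : childHits) {
            int parent = parentOf(parents, hit);
            if (parent >= 0) {
                out.add(parent);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Book A = pages 0..2 + book doc 3; book B = pages 4..5 + book doc 6.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);
        System.out.println(parentsOfHits(parents, new int[] {1, 2, 5})); // [3, 6]
        System.out.println(childrenOf(parents, 6));                      // [4, 5]
    }
}
```

Because block membership is implied purely by docid order, changing a single page means deleting and re-adding the whole block so the pages stay adjacent to their book. That is cheap for books and pages, but awkward for doctors and cities.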

as for the name ... i understand why it's called "modules/join" and i 
understand why the classes are called "BlockJoinQuery" and 
"BlockJoinCollector" but i don't think those names really stand out and 
convey to end users what they do and how/why they are useful.

Personally, i think better names would be "modules/subdocuments", 
"ParentDocumentQuery" and "ChildDocumentsCollector".

I know McCandless isn't a fan of the name "Nested Documents" because this 
functionality *can* be used for use cases where the data being modeled is 
not strictly organized in a nested relationship, but that doesn't mean 
it's *optimal* or easy for a user to apply to other use cases, because they 
have to design their model (and their indexing strategy) in such a way 
that they think of them as nested or hierarchical documents.

Naming it "modules/subdocuments" would not only emphasize the use case where 
it really shines, it would more importantly draw attention to how users 
have to model their data in order to take advantage of it. Using 
"ParentDocument" and "ChildDocuments" in the names of the Query/Collector 
would also make it clear what they "match" on relative to the underlying 
query that they wrap/collect.

It would also help distinguish this from more general joins like what Solr 
does today; it seems like that should eventually take the name 
"modules/join".

At a minimum we should rename what we have now to "modules/block-join" or 
"modules/index-join" (but the latter is confusing) and eventually add 
"modules/query-join" (yes, yes, block joins provide a query, but the 
difference is when you have to make a decision about how you want to 
join your model: at index time or at query time).
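For contrast with the block layout, here is a minimal sketch of a query-time join over the city/doctor example. It is loosely modeled on the "from field / to field" idea behind Solr's join as i understand it: collect the join-key values from docs matching one query, then return docs whose other field holds one of those values. The documents are plain maps, and the class, method, and field names (QueryTimeJoinSketch, join, "city", "name") are hypothetical illustrations, not Solr's or Lucene's API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class QueryTimeJoinSketch {
    // Hypothetical documents: flat maps of field name -> single value.
    // No index-time ordering is assumed; docs can be updated independently.
    static List<Map<String, String>> join(List<Map<String, String>> docs,
                                          Predicate<Map<String, String>> fromQuery,
                                          String fromField, String toField) {
        // Pass 1: collect join keys from the "from" field of every matching doc.
        Set<String> keys = new HashSet<>();
        for (Map<String, String> doc : docs) {
            if (fromQuery.test(doc) && doc.containsKey(fromField)) {
                keys.add(doc.get(fromField));
            }
        }
        // Pass 2: return every doc whose "to" field holds one of those keys.
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (doc.containsKey(toField) && keys.contains(doc.get(toField))) {
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("type", "city", "name", "Springfield"),
            Map.of("type", "city", "name", "Shelbyville"),
            Map.of("type", "doctor", "name", "Dr. Hibbert", "city", "Springfield"));
        // Which cities have a doctor named "Dr. Hibbert"? Join doctor.city -> city.name.
        List<Map<String, String>> cities =
            join(docs, d -> "Dr. Hibbert".equals(d.get("name")), "city", "name");
        for (Map<String, String> city : cities) {
            System.out.println(city.get("name")); // Springfield
        }
    }
}
```

Here the relationship is resolved at query time from field values, so a doctor moving to Shelbyville only requires re-indexing that one doctor doc. The cost is paid per query rather than in how the index is laid out.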


-Hoss
