Yes: the idea of adding collections for every possible combination of dimensions seems a bit scary as you add (or remove - yipes!) dimensions. Even with only three, you need eight different collections to capture all the possible combinations (seven if you leave out the empty combination, which is just every document). It's 2^n in the number of dimensions, I guess, although perhaps that's not an issue if n < 10, which I do expect.
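To make the count concrete, here is a minimal Python sketch (not MarkLogic code) that lists the collection names one document would need if every combination of dimensions got its own collection. The function name and the naming scheme (name plus value, joined with "-") are assumptions for illustration, modeled on the "cs100-site50-status1" example from the original post. Each document ends up in 2^n - 1 collections, one per non-empty subset of its dimensions; the 2^n figure also counts the empty subset, which is simply all documents.

```python
from itertools import combinations


def combination_collections(dims):
    """Return one collection name per non-empty subset of the dimensions.

    `dims` maps a dimension name to this document's value, e.g.
    {"cs": "100", "site": "50", "status": "1"}. The naming scheme is
    hypothetical: sorted "name+value" parts joined with "-".
    """
    names = sorted(dims)  # fixed order so a given subset always yields the same name
    result = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            result.append("-".join(f"{name}{dims[name]}" for name in subset))
    return result


doc_dims = {"cs": "100", "site": "50", "status": "1"}
collections = combination_collections(doc_dims)

print(len(collections))  # 2**3 - 1 = 7 collections for this one document
for name in collections:
    print(name)
```

Adding a fourth dimension takes this to 15 collections per document, and every existing document would have to be re-tagged, which is the maintenance cost Geert is pointing at.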
I'll experiment a bit and report back.

-Mike

On 03/02/2011 11:06 AM, Geert Josten wrote:
> +1
>
> You might want to consider future change though. What would happen if you
> decide you need another facet, or want to drop one?
>
> Kind regards,
> Geert
>
> -----Original message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Michael Blakeley
> Sent: Wednesday, 2 March 2011 16:52
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] efficient storage/retrieval scheme
>
> I would use approach (2) for three-constraint lookups. Since specifying all
> three is "typical", that should be a good strategy. The one-constraint and
> two-constraint queries could fall back on XML representation. If (and only
> if) that isn't fast enough, add another three collections per document: one
> per constraint.
>
> Range indexes might speed up some use-cases, but I don't see the benefit for
> the uses presented, especially the three-constraint lookup. The uri lexicon
> would give you the URI, but you still have to call doc(). Calling
> collection() directly should be faster.
>
> -- Mike
>
> On 2 Mar 2011, at 06:30, Mike Sokolov wrote:
>
>> I need to design a data element for our platform with an eye to the most
>> efficient possible retrieval of documents in a collection defined by
>> this data element. Assume there could be millions of documents. It
>> will have at least three dimensions: site, content-set, and status;
>> these are all completely independent. None of these are likely to have
>> more than a few tens or hundreds of different values: status will have 2
>> or 3, definitely fewer than 10.
>>
>> I need to be able to retrieve documents based on the values of each
>> dimension independently (i.e. all documents in content set X), as well as
>> (and this could be more typical) a fully-specified vector (content-set,
>> site and status).
>>
>> I can think of several possibilities:
>>
>> 1. An element whose text includes all three values as words in some
>> predefined order:
>>
>>     <collection>cs100 site50 status1</collection>
>>
>> with word queries for single dimension queries and value (or maybe
>> phrase queries?) for joins.
>>
>> 2. A ML collection whose name is all three values concatenated in some
>> order:
>>
>>     collection("cs100-site50-status1")
>>
>> joins of all three dimensions become a simple collection lookup, and
>> cts:collection-match() for single- or dual-dimension queries.
>>
>> 3. An element with three attributes:
>>
>>     <collection cs="100" site="50" status="1" />
>>
>> This is attractive from the perspective of XML modeling and will expose
>> the values neatly for XPath (perhaps we could combine it with one of the
>> above), but I'm concerned that:
>> cts:element-query(collection, ...) might not be as efficient for retrieval?
>> Also: would we need to enable element-position indexes to make this
>> accurate as an unfiltered query?
>>
>> Would anyone care to comment on the "best" design? Other ideas?
>>
>> Thanks!
>>
>> --
>> Michael Sokolov
>> Engineering Director
>> www.ifactory.com
>> @iFactoryBoston
>>
>> PubFactory: the revolutionary e-publishing platform from iFactory
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
