Yes: the idea of adding collections for every possible combination of
dimensions seems a bit scary as you add (or remove, yipes!) dimensions.
Even with only three, you need eight different collections to capture
all the possible combinations. It's 2^n in the number of dimensions, I
guess, although perhaps that's not an issue if n < 10, which I do expect.
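
To put a number on that, here is a minimal Python sketch (not MarkLogic
code) that lists the collection names a single document would need if
every non-empty combination of its dimension values got its own
collection. The function name collection_names and the "-"-joined naming
are assumptions for illustration only; the values are the ones from the
example quoted below.

```python
from itertools import combinations


def collection_names(dims):
    """Return one collection name per non-empty combination of values.

    `dims` maps a dimension name to this document's value; insertion
    order fixes the order of the parts in each name. The "-"-joined
    naming scheme is an assumption, not something MarkLogic requires.
    """
    values = list(dims.values())
    names = []
    # Sizes 1..n: single dimensions first, the full vector last.
    for size in range(1, len(values) + 1):
        for combo in combinations(values, size):
            names.append("-".join(combo))
    return names


# Hypothetical document using the three dimensions from the thread.
doc_dims = {"content-set": "cs100", "site": "site50", "status": "status1"}
names = collection_names(doc_dims)

print(len(names))
for name in names:
    print(name)
```

For three dimensions this lists seven names, i.e. 2^n - 1; the eighth
combination in the 2^n count is the empty one, which matches every
document and so needs no collection of its own.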

I'll experiment a bit and report back.

-Mike

On 03/02/2011 11:06 AM, Geert Josten wrote:
> +1
>
> You might want to consider future change though. What would happen if you 
> decide you need another facet, or want to drop one?
>
> Kind regards,
> Geert
>
> -----Original message-----
> From: [email protected] 
> [mailto:[email protected]] On behalf of Michael Blakeley
> Sent: Wednesday, 2 March 2011 16:52
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] efficient storage/retrieval scheme
>
> I would use approach (2) for three-constraint lookups. Since specifying all 
> three is "typical", that should be a good strategy. The one-constraint and 
> two-constraint queries could fall back on XML representation. If (and only 
> if) that isn't fast enough, add another three collections per document: one 
> per constraint.
>
> Range indexes might speed up some use-cases, but I don't see the benefit for 
> the uses presented, especially the three-constraint lookup. The uri lexicon 
> would give you the URI, but you still have to call doc(). Calling 
> collection() directly should be faster.
>
> -- Mike
>
> On 2 Mar 2011, at 06:30 , Mike Sokolov wrote:
>
>    
>> I need to design a data element for our platform with an eye to the most
>> efficient possible retrieval of documents in a collection defined by
>> this data element.  Assume there could be millions of documents.  It
>> will have at least three dimensions: site, content-set, and status;
>> these are all completely independent.  None of these are likely to have
>> more than a few tens or hundreds of different values: status will have 2
>> or 3, definitely less than 10.
>>
>> I need to be able to retrieve documents based on the values of each
>> dimension independently (i.e., all documents in content set X), as well as
>> (and this could be more typical) a fully-specified vector (content-set,
>> site and status).
>>
>> I can think of several possibilities:
>>
>> 1. An element whose text includes all three values as words in some
>> predefined order:
>>
>> <collection>cs100 site50 status1</collection>
>>
>> with word queries for single-dimension queries and value (or maybe
>> phrase?) queries for joins.
>>
>> 2. A ML collection whose name is all three values concatenated in some
>> order:
>>
>> collection("cs100-site50-status1")
>>
>> joins of all three dimensions become a simple collection lookup, and
>> cts:collection-match() for single- or dual-dimension queries.
>>
>> 3. An element with three attributes:
>> <collection cs="100" site="50" status="1" />
>> This is attractive from the perspective of XML modeling and will expose
>> the values neatly for xpath (perhaps we could combine it with one of the
>> above), but I'm concerned that:
>> cts:element-query(collection, ...) might not be as efficient for retrieval?
>> Also: would we need to enable element-position indexes to make this
>> accurate as an unfiltered query?
>>
>> Would anyone care to comment on the "best" design?  Other ideas?
>>
>> Thanks!
>>
>> -- 
>> Michael Sokolov
>> Engineering Director
>> www.ifactory.com
>> @iFactoryBoston
>>
>> PubFactory: the revolutionary e-publishing platform from iFactory
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>>
>>      