Re: [MarkLogic Dev General] efficient storage/retrieval scheme

Michael Blakeley Wed, 02 Mar 2011 07:52:16 -0800

I would use approach (2) for three-constraint lookups. Since specifying all 
three is "typical", that should be a good strategy. The one-constraint and 
two-constraint queries could fall back on the XML representation. If (and only 
if) that isn't fast enough, add another three collections per document: one per 
constraint.
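
The following is a minimal sketch of the naming side of this scheme. It is written in Python rather than XQuery so it runs stand-alone. The helper names, the `cs`/`site`/`status` prefixes and the `-` separator follow the `cs100-site50-status1` example in the quoted question and are otherwise assumptions. In MarkLogic, the names from `collections_for()` would be assigned as the document's collections at insert time. Each name from `collections_to_intersect()` would become its own `cts:collection-query()`, and those queries would be combined with `cts:and-query()` when there is more than one. A single `cts:collection-query()` given several URIs matches documents in *any* of them, not all of them.

```python
# Hypothetical helpers modelling approach (2): one composite collection per
# document, plus (optionally) one collection per constraint.

DIMENSIONS = ("cs", "site", "status")  # fixed order, as in "cs100-site50-status1"


def composite_collection(cs: int, site: int, status: int) -> str:
    """Name of the single collection used for fully specified lookups."""
    return f"cs{cs}-site{site}-status{status}"


def collections_for(cs: int, site: int, status: int,
                    per_constraint: bool = False) -> list[str]:
    """Collections to assign to a document when it is inserted."""
    names = [composite_collection(cs, site, status)]
    if per_constraint:
        # The three extra collections, added only if the XML fallback
        # turns out to be too slow for one- and two-constraint queries.
        names += [f"cs{cs}", f"site{site}", f"status{status}"]
    return names


def collections_to_intersect(**constraints: int) -> list[str]:
    """Collections whose intersection answers a lookup on any subset of dimensions."""
    unknown = set(constraints) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    if len(constraints) == len(DIMENSIONS):
        # Fully specified vector: one composite collection, no intersection.
        return [composite_collection(constraints["cs"], constraints["site"],
                                     constraints["status"])]
    # Partial lookup: AND together the per-constraint collections.
    return [f"{dim}{constraints[dim]}" for dim in DIMENSIONS if dim in constraints]
```

For example, `collections_to_intersect(site=50, cs=100)` yields `["cs100", "site50"]`, while a fully specified lookup collapses to the single composite name. This is why the typical three-constraint case stays a plain collection lookup.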


Range indexes might speed up some use cases, but I don't see the benefit for 
the uses presented, especially the three-constraint lookup. The URI lexicon 
would give you the URIs, but you would still have to call doc() on each one. 
Calling collection() directly should be faster.

-- Mike

On 2 Mar 2011, at 06:30, Mike Sokolov wrote:

> I need to design a data element for our platform with an eye to the most 
> efficient possible retrieval of documents in a collection defined by 
> this data element.  Assume there could be millions of documents.  It 
> will have at least three dimensions: site, content-set, and status; 
> these are all completely independent.  None of these are likely to have 
> more than a few tens or hundreds of different values: status will have 2 
> or 3, definitely less than 10.
> 
> I need to be able to retrieve documents based on the values of each 
> dimension independently (i.e. all documents in content set X), as well as 
> (and this could be more typical) a fully specified vector (content-set, 
> site and status).
> 
> I can think of several possibilities:
> 
> 1. An element whose text includes all three values as words in some 
> predefined order:
> 
> <collection>cs100 site50 status1</collection>
> 
> with word queries for single-dimension queries and value queries (or maybe 
> phrase queries?) for joins.
> 
> 2. A ML collection whose name is all three values concatenated in some 
> order:
> 
> collection("cs100-site50-status1")
> 
> Joins of all three dimensions become a simple collection lookup, with 
> cts:collection-match() for single- or dual-dimension queries.
> 
> 3. An element with three attributes:
> 
> <collection cs="100" site="50" status="1" />
> 
> This is attractive from the perspective of XML modeling and will expose 
> the values neatly for XPath (perhaps we could combine it with one of the 
> above), but I'm concerned that cts:element-query(collection, ...) might 
> not be as efficient for retrieval. Also, would we need to enable 
> element-position indexes to make this accurate as an unfiltered query?
> 
> Would anyone care to comment on the "best" design?  Other ideas?
> 
> Thanks!
> 
> -- 
> Michael Sokolov
> Engineering Director
> www.ifactory.com
> @iFactoryBoston
> 
> PubFactory: the revolutionary e-publishing platform from iFactory
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 
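
Option (2) in the quoted message answers one- and two-dimension lookups with cts:collection-match(), which takes a wildcard pattern and reads the collection lexicon (so that lexicon would need to be enabled on the database). The sketch below builds such patterns from the fixed dimension order. It is in Python, uses hypothetical helper names, and lets `fnmatch.fnmatchcase` stand in for MarkLogic's wildcard matching. Keeping each `-` separator literal in the pattern stops `site50` from also matching `site500`. This assumes no value itself contains `-`.

```python
from fnmatch import fnmatchcase

DIMENSIONS = ("cs", "site", "status")  # same fixed order as the composite names


def collection_pattern(**constraints: int) -> str:
    """Wildcard pattern selecting composite names that match the given dimensions."""
    unknown = set(constraints) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    # A fixed value for constrained dimensions, a wildcard for the rest;
    # the literal "-" separators anchor each value to its own segment.
    parts = [f"{dim}{constraints[dim]}" if dim in constraints else f"{dim}*"
             for dim in DIMENSIONS]
    return "-".join(parts)


def matching_collections(names: list[str], **constraints: int) -> list[str]:
    """Local stand-in for running cts:collection-match() over the lexicon."""
    pattern = collection_pattern(**constraints)
    return [name for name in names if fnmatchcase(name, pattern)]
```

For example, `collection_pattern(site=50)` gives `cs*-site50-status*`. A looser pattern such as `*site5*` would also pick up `cs100-site500-status1`. The names that match can then go into a single cts:collection-query(), whose any-of semantics is exactly what a partial lookup wants.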
