Re: [MarkLogic Dev General] efficient storage/retrieval scheme

Mike Sokolov Wed, 02 Mar 2011 07:05:31 -0800

Thanks Darin - I do like the idea of using cts:uris() as the basis 
for the query.


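But I think there may be a problem with the cts:query you proposed.  My 
documents can be in multiple collections (perhaps I didn't mention 
that), so a document can carry more than one collection element.  I 
think your query doesn't restrict the matches to a single collection: 
the site, set and status values could each match on a different 
collection element of the same document.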
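
To make the concern concrete, here is a rough, untested sketch of one way to tie the three attribute tests to the same element: wrap them in a cts:element-query on ns:collection.  The namespace URI and the sample values are placeholders made up for illustration.  Whether this is exact without filtering probably depends on the position indexes, which I have not verified.

```xquery
xquery version "1.0-ml";
(: placeholder namespace URI (an assumption); substitute the real one :)
declare namespace ns = "http://example.com/ns";

(: sample values (assumptions); in practice these come from the caller :)
let $site := "50"
let $set := "100"
let $status := "1"
return
  cts:uris((), (),
    (: the outer element-query asks that all three attribute tests
       be satisfied within one ns:collection element :)
    cts:element-query(xs:QName("ns:collection"),
      cts:and-query((
        cts:element-attribute-value-query(xs:QName("ns:collection"),
          xs:QName("ns:site"), $site),
        cts:element-attribute-value-query(xs:QName("ns:collection"),
          xs:QName("ns:set"), $set),
        cts:element-attribute-value-query(xs:QName("ns:collection"),
          xs:QName("ns:status"), $status)
      ))))
```

Without the outer cts:element-query, the and-query only requires that each attribute value occur somewhere in the document.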


-Mike

On 03/02/2011 09:52 AM, McBeath, Darin W (ELS-STL) wrote:
> Here are my thoughts ...
>
> Create a URI Lexicon index.
>
> Then, write a query something like below ... you don't need to enable
> any full-text indexes (or any other indexes for that matter) for this
> specific query ... other than the URI Lexicon index.  This query assumes
> you are using the last approach of an element with attributes ... the
> query can be easily adjusted to use individual elements or collections
> instead.
>
> (: assumes the ns prefix and $site, $set, $status are declared elsewhere :)
> let $results :=
>   cts:uris((), (),
>     (: each dimension is optional: an empty value drops that constraint :)
>     cts:and-query((
>       if ($site)
>       then cts:element-attribute-value-query(xs:QName("ns:collection"),
>              xs:QName("ns:site"), $site)
>       else (),
>       if ($set)
>       then cts:element-attribute-value-query(xs:QName("ns:collection"),
>              xs:QName("ns:set"), $set)
>       else (),
>       if ($status)
>       then cts:element-attribute-value-query(xs:QName("ns:collection"),
>              xs:QName("ns:status"), $status)
>       else ()
>     )))
> return $results
>
> The above query should return the uris quickly ... granted, I didn't
> verify/test this.  You could then go and retrieve the individual
> documents using the returned uris.
>
> Darin.
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Mike
> Sokolov
> Sent: Wednesday, March 02, 2011 9:30 AM
> To: Mark Logic
> Subject: [MarkLogic Dev General] efficient storage/retrieval scheme
>
> I need to design a data element for our platform with an eye to the most
> efficient possible retrieval of documents in a collection defined by
> this data element.  Assume there could be millions of documents.  It
> will have at least three dimensions: site, content-set, and status;
> these are all completely independent.  None of these is likely to have
> more than a few tens or hundreds of different values: status will have 2
> or 3, definitely fewer than 10.
>
> I need to be able to retrieve documents based on the values of each
> dimension independently (i.e. all documents in content set X), as well as
> (and this could be more typical) by a fully-specified vector (content-set,
> site and status).
>
> I can think of several possibilities:
>
> 1. An element whose text includes all three values as words in some
> predefined order:
>
> <collection>cs100 site50 status1</collection>
>
> with word queries for single dimension queries and value (or maybe
> phrase queries?) for joins.
>
> 2. A ML collection whose name is all three values concatenated in some
> order:
>
> collection("cs100-site50-status1")
>
> joins of all three dimensions become a simple collection lookup, and
> cts:collection-match() for single- or dual-dimension queries.
>
> 3. An element with three attributes:
> <collection cs="100" site="50" status="1" />
> This is attractive from the perspective of XML modeling and will expose
> the values neatly for xpath (perhaps we could combine it with one of the
> above), but I'm concerned that
> cts:element-query(collection, ...) might not be as efficient for
> retrieval.
> Also: would we need to enable element-position indexes to make this
> accurate as an unfiltered query?
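>
> For comparison, options 1 and 2 might be queried roughly as follows.
> This is an untested sketch: it reuses the element name and values from
> the examples above, and assumes the URI lexicon and the collection
> lexicon are both enabled and that "cs100" tokenizes as a single word.
>
> ```xquery
> xquery version "1.0-ml";
>
> (: Option 1, one dimension: word query on the <collection> text :)
> cts:uris((), (),
>   cts:element-word-query(xs:QName("collection"), "cs100")),
>
> (: Option 1, all three dimensions: value query on the whole text :)
> cts:uris((), (),
>   cts:element-value-query(xs:QName("collection"), "cs100 site50 status1")),
>
> (: Option 2, all three dimensions: plain collection lookup :)
> cts:uris((), (),
>   cts:collection-query("cs100-site50-status1")),
>
> (: Option 2, one dimension: expand the wildcard pattern through the
>    collection lexicon, then query on the matching collection names :)
> cts:uris((), (),
>   cts:collection-query(cts:collection-match("cs100-*")))
> ```
>
> The single-dimension form of option 2 costs an extra lexicon scan to
> expand the pattern before the collection query itself runs.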
>
> Would anyone care to comment on the "best" design?  Other ideas?
>
> Thanks!
>
>    
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
