Re: [MarkLogic Dev General] efficient storage/retrieval scheme

Geert Josten Wed, 02 Mar 2011 13:27:48 -0800

(And pass the result of that into a cts:collection-query.)

-----Original message-----
From: [email protected] 
[mailto:[email protected]] On behalf of Geert Josten
Sent: Wednesday, 2 March 2011 22:23
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] efficient storage/retrieval scheme


Hi Mike,

How about passing a wildcard search for the partial cases to 
cts:collection-match?

Kind regards,
Geert
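
A minimal sketch of this suggestion combined with the follow-up above. It assumes the collection naming from option 2 of the original post quoted further down (cs100-site50-status1) and that the collection lexicon is enabled on the database, which cts:collection-match requires. The specific values are illustrative only.

```xquery
xquery version "1.0-ml";

(: Partial case: every status for content set 100 on site 50.
   The pattern relies on the cs-site-status ordering of the names. :)
let $names := cts:collection-match("cs100-site50-*")
return
  (: Handle the no-match case explicitly rather than depend on how
     cts:collection-query treats an empty sequence of URIs. :)
  if (empty($names)) then ()
  else cts:search(doc(), cts:collection-query($names))
```

Under the same naming assumption, a site-only query would use a pattern such as "*-site50-*". Keeping the "-" delimiter in the pattern avoids matching cs1000 when cs100 is meant.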

-----Original message-----
From: [email protected] 
[mailto:[email protected]] On behalf of Mike Sokolov
Sent: Wednesday, 2 March 2011 22:15
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] efficient storage/retrieval scheme

Thank you to everybody who responded.  I ran some tests on 100,000 docs 
with some random data.  The upshot is that collection() is about the 
same speed as estimating an element-value-query.  Doing 
element-query(and-query(attribute-query(), ...)) was about 5 times slower 
(and estimates are wrong in this case: you have to run filtered).
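
For reference, the three shapes being compared might look roughly like the sketch below. This is not the actual test harness: the element, attribute, and collection names are taken from the options in the original post quoted further down, and reading "attribute-query" as cts:element-attribute-value-query is an assumption.

```xquery
xquery version "1.0-ml";

(
  (: 1. Collection lookup, using the option 2 naming. :)
  xdmp:estimate(collection("cs100-site50-status1")),

  (: 2. Element value query, using the option 1 markup. :)
  xdmp:estimate(cts:search(doc(),
    cts:element-value-query(xs:QName("collection"), "cs100 site50 status1"))),

  (: 3. Element query wrapping attribute value queries, option 3 markup.
     As noted above, this estimate can disagree with the filtered result. :)
  xdmp:estimate(cts:search(doc(),
    cts:element-query(xs:QName("collection"),
      cts:and-query((
        cts:element-attribute-value-query(
          xs:QName("collection"), xs:QName("cs"), "100"),
        cts:element-attribute-value-query(
          xs:QName("collection"), xs:QName("site"), "50"),
        cts:element-attribute-value-query(
          xs:QName("collection"), xs:QName("status"), "1")
      )))))
)
```

Each expression returns an index-resolved count only, so running them one at a time is the simplest way to compare timings.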

So I think I would concur with Mike Blakeley:  collection() (or possibly 
value-query) for the fully-specified case, and a query based on 
attribute values for a single-dimension query.  I'm still up in the air 
about what to do for intermediate cases (i.e. querying two attributes 
only).  We'll see if that's an important use case...
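
A sketch of the single-dimension and intermediate cases against the option 3 markup from the original post, <collection cs="100" site="50" status="1" />. The variable names and values are hypothetical, and the two-attribute query is run filtered because of the estimate inaccuracy described above.

```xquery
xquery version "1.0-ml";

declare variable $collection := xs:QName("collection");

(: Single dimension: all documents on site 50. :)
let $by-site :=
  cts:element-attribute-value-query($collection, xs:QName("site"), "50")

(: Intermediate case: content set 100 with status 1, any site. :)
let $by-cs-and-status :=
  cts:element-query($collection,
    cts:and-query((
      cts:element-attribute-value-query($collection, xs:QName("cs"), "100"),
      cts:element-attribute-value-query($collection, xs:QName("status"), "1")
    )))

return (
  cts:search(doc(), $by-site),
  (: "filtered" is the default; it is spelled out here for emphasis. :)
  cts:search(doc(), $by-cs-and-status, "filtered")
)
```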

Thanks again!

-Mike

On 03/02/2011 09:30 AM, Mike Sokolov wrote:
> I need to design a data element for our platform with an eye to the most
> efficient possible retrieval of documents in a collection defined by
> this data element.  Assume there could be millions of documents.  It
> will have at least three dimensions: site, content-set, and status;
> these are all completely independent.  None of these are likely to have
> more than a few tens or hundreds of different values: status will have 2
> or 3, definitely less than 10.
>
> I need to be able to retrieve documents based on the values of each
> dimension independently (i.e. all documents in content set X), as well as
> (and this could be more typical) by a fully-specified vector (content-set,
> site and status).
>
> I can think of several possibilities:
>
> 1. An element whose text includes all three values as words in some
> predefined order:
>
> <collection>cs100 site50 status1</collection>
>
> with word queries for single-dimension queries and value (or maybe
> phrase?) queries for joins.
>
> 2. A ML collection whose name is all three values concatenated in some
> order:
>
> collection("cs100-site50-status1")
>
> joins of all three dimensions become a simple collection lookup, and
> cts:collection-match() for single- or dual-dimension queries.
>
> 3. An element with three attributes:
> <collection cs="100" site="50" status="1" />
> This is attractive from the perspective of XML modeling and will expose
> the values neatly for xpath (perhaps we could combine it with one of the
> above), but I'm concerned that:
> cts:element-query(collection, ...) might not be as efficient for retrieval?
> Also: would we need to enable element-position indexes to make this
> accurate as an unfiltered query?
>
> Would anyone care to comment on the "best" design?  Other ideas?
>
> Thanks!
>
>    
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general