TDE is Template Driven Extraction. Short version: you define templates, and matching data goes straight into the indexes without you having to modify your document structure.

Tutorial: http://developer.marklogic.com/learn/template-driven-extraction

--
Dave Cassel, @dmcassel <https://twitter.com/dmcassel>
Technical Community Manager
MarkLogic Corporation <http://www.marklogic.com/>
http://developer.marklogic.com/

On 5/23/17, 7:30 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" <general-boun...@developer.marklogic.com on behalf of ekim...@contrext.com> wrote:

>What is TDE? I’m not conversant with ML 9 features yet.
>
>Also, I’m currently working against an ML 4.2 server (don’t ask).
>
>TaskBot looks like just what I need but docs say it requires ML 7+ but
>could possibly be made to work with earlier releases. If someone can
>point me in the right direction I can take a stab at making it work with
>ML 4.
>
>Thanks,
>
>Eliot
>--
>Eliot Kimber
>http://contrext.com
>
>On 5/23/17, 8:56 AM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" <general-boun...@developer.marklogic.com on behalf of erik.hen...@marklogic.com> wrote:
>
>    Hi, Eliot:
>
>    On reflection, let me retract the range index suggestion. I wasn't considering
>    the domain implied by the element names -- it would never make sense
>    to blow out a range index with the value of all of the paragraphs.
>
>    The TDE suggestion for MarkLogic 9 would still work, however, because you
>    could have an xs:short column with a value of 1 for every paragraph.
>
>
>    Erik Hennum
>
>    ________________________________________
>    From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Erik Hennum [erik.hen...@marklogic.com]
>    Sent: Tuesday, May 23, 2017 6:21 AM
>    To: MarkLogic Developer Discussion
>    Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics
>
>    Hi, Eliot:
>
>    One alternative to Geert's good suggestion -- if and only if the number
>    of element names is small and you can create range indexes on them:
>
>    * add an element attribute range index on Article/@id
>    * add an element range index on p
>    * execute a cts:value-tuples() call with the constraining element query and directory query
>    * iterate over the tuples, incrementing the value of the id in a map
>    * remove the range index on p
>
>    In MarkLogic 9, that approach gets simpler. You can just use TDE
>    to project rows with columns for the id and element, group on
>    the id column, and count the rows in the group.
>
>    Hoping that's useful (and salutations in passing),
>
>
>    Erik Hennum
>
>    ________________________________________
>    From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Geert Josten [geert.jos...@marklogic.com]
>    Sent: Tuesday, May 23, 2017 12:53 AM
>    To: MarkLogic Developer Discussion
>    Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics
>
>    Hi Eliot,
>
>    I'd consider using taskbot
>    (http://registry.demo.marklogic.com/package/taskbot), and using that in
>    combination with either $tb:OPTIONS-SYNC or $tb:OPTIONS-SYNC-UPDATE. It
>    will make optimal use of the TaskServer of the host on which you initiate
>    the call. It doesn't scale endlessly, but it batches up the work
>    automatically for you, and will get you a lot further fairly easily.
>
>    Cheers,
>    Geert
>
>    On 5/23/17, 5:43 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" <general-boun...@developer.marklogic.com on behalf of ekim...@contrext.com> wrote:
>
>    >I haven't yet seen anything in the docs that directly addresses what I'm
>    >trying to do and suspect I'm simply missing some ML basics or just going
>    >about things the wrong way.
>    >
>    >I have a corpus of several hundred thousand docs (but could be millions,
>    >of course), where each doc is an average of 200K and several thousand
>    >elements.
>    >
>    >I want to analyze the corpus to get details about the number of specific
>    >subelements within each document, e.g.:
>    >
>    >
>    >for $article in cts:search(/Article, cts:directory-query("/Default/",
>    >"infinity"))[$start to $end]
>    > return <article-counts id="{$article/@id}"
>    >paras="{count($article//p)}"/>
>    >
>    >I'm running this as a query from Oxygen (so I can capture the results
>    >locally so I can do other stuff with them).
>    >
>    >On the server I'm using I blow the expanded tree cache if I try to
>    >request more than about 20,000 docs.
>    >
>    >Is there a way to do this kind of processing over an arbitrarily large
>    >set *and* get the results back from a single query request?
>    >
>    >I think the only solution is to write the results back to the database
>    >and then fetch that as the last thing but I was hoping there was
>    >something simpler.
>    >
>    >Have I missed an obvious solution?
>    >
>    >Thanks,
>    >
>    >Eliot
>    >
>    >--
>    >Eliot Kimber
>    >http://contrext.com
>    >
>    >_______________________________________________
>    >General mailing list
>    >General@developer.marklogic.com
>    >Manage your subscription at:
>    >http://developer.marklogic.com/mailman/listinfo/general
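
For anyone reading this thread later: two of the ideas above are easy to try on the client side, so here is a rough sketch in Python. It is only an illustration, not anything posted in the thread. The first helper splits a large result set into 1-based inclusive windows that could be passed as $start and $end to a query like Eliot's, staying under the roughly 20,000-document limit he mentions. The second performs the "iterate over the tuples, incrementing the value of the id in a map" step that Erik describes. The function names batch_windows and count_by_id, the window size, and the sample tuples are all made up for illustration, and the code makes no MarkLogic calls.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def batch_windows(total: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) pairs, like XQuery's [$start to $end]."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(1, total + 1, size):
        # The last window is clipped so it never runs past the total.
        yield start, min(start + size - 1, total)


def count_by_id(tuples: Iterable[Tuple[str, str]]) -> Counter:
    """Tally how many (id, value) tuples share each id."""
    counts: Counter = Counter()
    for article_id, _value in tuples:
        counts[article_id] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the thread.
    sample = [("a1", "first para"), ("a1", "second para"), ("a2", "only para")]
    print(count_by_id(sample))

    # Hypothetical corpus size, windowed at the 20,000-doc limit from the thread.
    print(list(batch_windows(45000, 20000)))
```

Each window would be sent as a separate request and the per-window results merged locally, so this trades the single-request goal for bounded memory use on the server.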