Unsubscribe me.

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of general-requ...@developer.marklogic.com
Sent: Tuesday, May 23, 2017 9:56 AM
To: general@developer.marklogic.com
Subject: General Digest, Vol 155, Issue 27
Send General mailing list submissions to
    general@developer.marklogic.com

To subscribe or unsubscribe via the World Wide Web, visit
    http://developer.marklogic.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
    general-requ...@developer.marklogic.com

You can reach the person managing the list at
    general-ow...@developer.marklogic.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of General digest..."

Today's Topics:

   1. Re: General Digest, Vol 155, Issue 24 (Shiv Shankar)
   2. Re: Processing Large Number of Docs to Get Statistics (Erik Hennum)

----------------------------------------------------------------------

Message: 1
Date: Tue, 23 May 2017 09:55:35 -0400
From: Shiv Shankar <shiv.shivshan...@gmail.com>
Subject: Re: [MarkLogic Dev General] General Digest, Vol 155, Issue 24
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <cafyr2h6prmjzd9plh85kp7xpkuqgn+kjmk8haygeaq9oavf...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Justin,

Thanks for introducing jsonPropertyReference. Some of the dob values are empty (""), and those throw an exception, so this is proving a bit challenging. I was able to aggregate counts using cts.values, but I am having difficulty aggregating by date and grouping the results.

The scenario is this content:

{"dob":"1977-06-20", "dob":"", "dob":"1980-06-20","dob":"1977-06-20"}

Expected result is {"0": 20, "1": 2, "4": 0};

I am using the workaround below for age, but we want to achieve something similar using dob, ignoring any missing dobs. Any help?
var ageQuery = cts.andQuery([
  cts.elementRangeQuery(xs.QName('age'), ">=", 0),
  cts.elementRangeQuery(xs.QName('age'), "<=", 100)
]);
var result = {};
// Restrict the lexicon values to documents matching ageQuery.
for (var agegroup of cts.values(cts.elementReference(xs.QName('age')), null, null, ageQuery)) {
  var query = cts.andQuery([
    ageQuery,
    cts.jsonPropertyValueQuery('age', agegroup)
  ]);
  // Estimated number of documents with this age value.
  result[agegroup] = cts.estimate(query);
}
result;

Thanks
Shan.

On Tue, May 23, 2017 at 3:24 AM, <general-requ...@developer.marklogic.com> wrote:

> Today's Topics:
>
>    1. Search by age wise from dob property (Shiv Shankar)
>    2. Re: Search by age wise from dob property (Justin Makeig)
>    3. Processing Large Number of Docs to Get Statistics (Eliot Kimber)
>    4. Re: Priorities for queries (Geert Josten)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 22 May 2017 16:08:11 -0400
> From: Shiv Shankar <shiv.shivshan...@gmail.com>
> Subject: [MarkLogic Dev General] Search by age wise from dob property
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Message-ID: <CAFyr2H5Y5JR7kVfg4NFa4i-4xH1wvNskJg4reihCyPO6oecXHQ@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
> There is a dob JSON property in the documents, and I need to search on
> dob by age, for example age > 30, or age > 30 and age < 50. Any samples
> to calculate age and compare in the search queries?
>
> Thanks
> Shan.
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: http://developer.marklogic.com/pipermail/general/attachments/20170522/34d2f34c/attachment-0001.html
>
> ------------------------------
>
> Message: 2
> Date: Mon, 22 May 2017 20:33:06 +0000
> From: Justin Makeig <justin.mak...@marklogic.com>
> Subject: Re: [MarkLogic Dev General] Search by age wise from dob property
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Message-ID: <600acf55-4384-4294-bbf9-74700a4a8...@marklogic.com>
> Content-Type: text/plain; charset="us-ascii"
>
> const person = { dob: xs.date("1979-02-03") };
> const thirtyYearsAgo = fn.currentDate().subtract(xs.yearMonthDuration("P30Y"));
> person.dob < thirtyYearsAgo; // true
>
> You can do date math with xs.duration types. In the above case, I'm
> subtracting 30 years from the current date.
> xs.date.prototype.subtract() returns an xs.date. You can compare that
> xs.date to any other xs.date. To do this comparison in MarkLogic's
> indexes you'll need to create a range index
> <https://docs.marklogic.com/guide/concepts/indexing#id_51573>. A range
> index, as its name implies, queries efficiently for ranges of typed values,
> for example, dates earlier than thirty years before today.
>
> cts.rangeQuery(cts.jsonPropertyReference('dob'), '<', thirtyYearsAgo);
> // requires a range index of type xs:date on the dob JSON property
>
> Justin
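The cutoff idea in the reply above can also be sketched outside MarkLogic in plain JavaScript. This is only an illustration under stated assumptions: `yearsAgo` and `dobsInAgeRange` are hypothetical helper names, dobs are assumed to be ISO YYYY-MM-DD strings, and empty strings are simply skipped. It does not use or stand in for any cts or xs function.

```javascript
// Plain-JavaScript sketch (not MarkLogic API): turn an age range into
// date-of-birth cutoffs, then filter ISO date strings against them.

// Hypothetical helper: the calendar date `years` before `today`, as YYYY-MM-DD.
function yearsAgo(today, years) {
  const d = new Date(Date.UTC(today.getUTCFullYear() - years,
                              today.getUTCMonth(),
                              today.getUTCDate()));
  return d.toISOString().slice(0, 10);
}

// Hypothetical helper: keep dobs strictly between the two cutoffs,
// i.e. born before (today - minAge) and after (today - maxAge).
function dobsInAgeRange(dobs, today, minAge, maxAge) {
  const latest = yearsAgo(today, minAge);
  const earliest = yearsAgo(today, maxAge);
  return dobs
    .filter((dob) => dob !== "")                       // ignore missing dobs
    .filter((dob) => dob < latest && dob > earliest);  // ISO strings sort by date
}

const today = new Date(Date.UTC(2017, 4, 23)); // 23 May 2017 (months are 0-based)
const dobs = ["1977-06-20", "", "1980-06-20", "1990-01-01", "1960-01-01"];
const matches = dobsInAgeRange(dobs, today, 30, 50);
console.log(matches); // [ '1977-06-20', '1980-06-20' ]
```

Comparing the strings directly works here because ISO dates sort lexicographically in date order. Inside MarkLogic the comparison would instead go through the range index, as the reply describes.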
>
> ------------------------------
>
> Message: 3
> Date: Mon, 22 May 2017 22:43:26 -0500
> From: Eliot Kimber <ekim...@contrext.com>
> Subject: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Message-ID: <bdf9d2b1-c160-455d-b836-bc11c1db7...@contrext.com>
> Content-Type: text/plain; charset="UTF-8"
>
> I haven't yet seen anything in the docs that directly addresses what I'm
> trying to do and suspect I'm simply missing some ML basics or just
> going about things the wrong way.
>
> I have a corpus of several hundred thousand docs (but could be
> millions, of course), where each doc is an average of 200K and several
> thousand elements.
>
> I want to analyze the corpus to get details about the number of
> specific subelements within each document, e.g.:
>
> for $article in cts:search(/Article, cts:directory-query("/Default/", "infinity"))[$start to $end]
> return <article-counts id="{$article/@id}" paras="{count($article//p)}"/>
>
> I'm running this as a query from Oxygen (so I can capture the results
> locally so I can do other stuff with them).
>
> On the server I'm using I blow the expanded tree cache if I try to
> request more than about 20,000 docs.
>
> Is there a way to do this kind of processing over an arbitrarily large
> set *and* get the results back from a single query request?
>
> I think the only solution is to write the results back to the
> database and then fetch that as the last thing, but I was hoping there
> was something simpler.
>
> Have I missed an obvious solution?
>
> Thanks,
>
> Eliot
>
> --
> Eliot Kimber
> http://contrext.com
>
> ------------------------------
>
> Message: 4
> Date: Tue, 23 May 2017 07:24:31 +0000
> From: Geert Josten <geert.jos...@marklogic.com>
> Subject: Re: [MarkLogic Dev General] Priorities for queries
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Message-ID: <d549af3b.117ddb%geert.jos...@marklogic.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi Oleksii,
>
> If you use xdmp:spawn or xdmp:spawn-function, you would be able to use
> the <priority> option. It takes "normal" and "higher" as values. These
> priorities have separate queues and worker threads, so they should
> interfere less with each other.
>
> It might also be worth looking into a way to push out low priority
> work to a dedicated host for longer running tasks. You could do that
> by writing such queries to the database and having a scheduled task on
> that particular host monitor for them, pick them up one by one, and
> write back results once done. It might be easiest to switch the
> script queries to an asynchronous process that polls regularly
> to see if results have been written. Makes sense?
>
> Cheers,
> Geert
>
> From: <general-boun...@developer.marklogic.com> on behalf of Oleksii Segeda <oseg...@worldbankgroup.org>
> Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Date: Monday, May 22, 2017 at 8:59 PM
> To: "general@developer.marklogic.com" <general@developer.marklogic.com>
> Subject: [MarkLogic Dev General] Priorities for queries
>
> Hi,
>
> Is there a way to give a lower priority to certain queries? We have
> two different types of API consumers: real users and various scripts.
> No matter how often scripts are hitting endpoints or how "heavy"
> their queries are, they should not affect API performance for real users.
> In other words, scripts are tolerant of high latency, but users are not.
>
> Regards,
>
> Oleksii Segeda
> IT Analyst
> Information and Technology Solutions
> www.worldbank.org
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: http://developer.marklogic.com/pipermail/general/attachments/20170523/c01547ba/attachment.html
>
> ------------------------------
>
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
>
>
> End of General Digest, Vol 155, Issue 24
> ****************************************
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://developer.marklogic.com/pipermail/general/attachments/20170523/bf09ba37/attachment-0001.html

------------------------------

Message: 2
Date: Tue, 23 May 2017 13:56:00 +0000
From: Erik Hennum <erik.hen...@marklogic.com>
Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <dfdf2fd50bf5aa42adaf93ff2e3ca1850c7f4...@exchg10-be02.marklogic.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi, Eliot:

On reflection, let me retract the range index suggestion. I wasn't considering the domain implied by the element names -- it would never make sense to blow out a range index with the value of all of the paragraphs.

The TDE suggestion for MarkLogic 9 would still work, however, because you could have an xs:short column with a value of 1 for every paragraph.

Erik Hennum

________________________________________
From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Erik Hennum [erik.hen...@marklogic.com]
Sent: Tuesday, May 23, 2017 6:21 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics

Hi, Eliot:

One alternative to Geert's good suggestion -- if and only if the number of element names is small and you can create range indexes on them:

* add an element attribute range index on Article/@id
* add an element range index on p
* execute a cts:value-tuples() call with the constraining element query and directory query
* iterate over the tuples, incrementing the value of the id in a map
* remove the range index on p

In MarkLogic 9, that approach gets simpler. You can just use TDE to project rows with columns for the id and element, group on the id column, and count the rows in the group.
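The tuple-counting step in the list above could look roughly like this in plain JavaScript. The input shape is an assumption: each tuple is modeled as an [articleId, paragraphValue] array with one entry per paragraph, and `countByArticleId` is a hypothetical helper, not a MarkLogic function. Real cts:value-tuples() results would need adapting to this shape.

```javascript
// Plain-JavaScript sketch of "iterate over the tuples, incrementing the
// value of the id in a map". The tuple shape below is assumed for
// illustration and is not the actual cts:value-tuples() return format.

// Hypothetical helper: count how many tuples share each article id.
function countByArticleId(tuples) {
  const counts = new Map();
  for (const [id] of tuples) {
    // Start at 0 for ids not seen yet, then add one for this tuple.
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return counts;
}

// Made-up sample data: two paragraphs in article a1, one in article a2.
const sampleTuples = [
  ["a1", "First paragraph"],
  ["a1", "Second paragraph"],
  ["a2", "Only paragraph"],
];

const paraCounts = countByArticleId(sampleTuples);
for (const [id, paras] of paraCounts) {
  console.log(id, paras);
}
```

A Map keeps the ids in first-seen order and avoids clashes with object prototype keys, which is why it is used here instead of a plain object.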
Hoping that's useful (and salutations in passing),

Erik Hennum

________________________________________
From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Geert Josten [geert.jos...@marklogic.com]
Sent: Tuesday, May 23, 2017 12:53 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics

Hi Eliot,

I'd consider using taskbot (http://registry.demo.marklogic.com/package/taskbot), and using that in combination with either $tb:OPTIONS-SYNC or $tb:OPTIONS-SYNC-UPDATE. It will make optimal use of the TaskServer of the host on which you initiate the call. It doesn't scale endlessly, but it batches up the work automatically for you, and will get you a lot further fairly easily.

Cheers,
Geert

------------------------------

_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general


End of General Digest, Vol 155, Issue 27
****************************************

This electronic mail (including any attachments) may contain information that is privileged, confidential, and/or otherwise protected from disclosure to anyone other than its intended recipient(s). Any dissemination or use of this electronic email or its contents (including any attachments) by persons other than the intended recipient(s) is strictly prohibited. If you have received this message in error, please notify the sender by reply email and delete the original message (including any attachments) in its entirety.