[MarkLogic Dev General] Unsubscribe

Hanumantharayappa, Shanthamurthy Tue, 23 May 2017 06:57:33 -0700

Unsubscribe me.

-----Original Message-----
From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of 
general-requ...@developer.marklogic.com
Sent: Tuesday, May 23, 2017 9:56 AM
To: general@developer.marklogic.com
Subject: General Digest, Vol 155, Issue 27


Send General mailing list submissions to
general@developer.marklogic.com

To subscribe or unsubscribe via the World Wide Web, visit
http://developer.marklogic.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
general-requ...@developer.marklogic.com

You can reach the person managing the list at
general-ow...@developer.marklogic.com

When replying, please edit your Subject line so it is more specific than "Re: 
Contents of General digest..."


Today's Topics:

   1. Re: General Digest, Vol 155, Issue 24 (Shiv Shankar)
   2. Re: Processing Large Number of Docs to Get Statistics
      (Erik Hennum)


----------------------------------------------------------------------

Message: 1
Date: Tue, 23 May 2017 09:55:35 -0400
From: Shiv Shankar <shiv.shivshan...@gmail.com>
Subject: Re: [MarkLogic Dev General] General Digest, Vol 155, Issue 24
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID:
<cafyr2h6prmjzd9plh85kp7xpkuqgn+kjmk8haygeaq9oavf...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi Justin,
Thanks for introducing jsonPropertyReference. Some of the dob values are empty 
("") and that throws an exception.
This is proving a bit challenging. I was able to aggregate counts by using 
cts.values, but I am having difficulty aggregating by date and grouping the 
results.

The scenario is
content: {"dob":"1977-06-20", "dob":"",
"dob":"1980-06-20","dob":"1977-06-20"}

Expected result is {"0": 20, "1": 2, "4": 0};

I am using the workaround below for age, but we want to achieve something 
similar using dob, ignoring any missing dobs. Any help?

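// Restrict to documents whose age is between 0 and 100.
var ageQuery = cts.andQuery([
    cts.elementRangeQuery(xs.QName('age'), ">=", 0),
    cts.elementRangeQuery(xs.QName('age'), "<=", 100)
  ]);
var result = {};
// Constrain the lexicon lookup with ageQuery; the loop-local `query`
// is not yet defined when cts.values is called.
for (var agegroup of cts.values(cts.elementReference(xs.QName('age')),
    null, null, ageQuery)) {
  var query = cts.andQuery([
    ageQuery,
    cts.jsonPropertyValueQuery('age', agegroup)
  ]);
  // Estimated number of documents with this age value.
  result[agegroup] = cts.estimate(query);
}
result;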
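
A minimal sketch of the grouping being asked for, written in plain JavaScript rather than against the MarkLogic API. It assumes the dob values have already been read out as "YYYY-MM-DD" strings (for example from a lexicon call), and the names ageInYears and countByAge are hypothetical. Empty or missing dobs are skipped instead of being cast to a date, which is the step that throws on "".

```javascript
// Hypothetical helper: whole years between a "YYYY-MM-DD" dob and a reference date.
function ageInYears(dob, today) {
  const [year, month, day] = dob.split('-').map(Number);
  let age = today.getUTCFullYear() - year;
  const thisMonth = today.getUTCMonth() + 1; // getUTCMonth() is zero-based
  const beforeBirthday =
    thisMonth < month || (thisMonth === month && today.getUTCDate() < day);
  if (beforeBirthday) age -= 1;
  return age;
}

// Count entries per age, ignoring missing or empty dob values.
function countByAge(dobs, today) {
  const result = {};
  for (const dob of dobs) {
    if (!dob) continue; // skips "", null and undefined
    const age = ageInYears(dob, today);
    result[age] = (result[age] || 0) + 1;
  }
  return result;
}

// Reference date fixed to 23 May 2017 so the result is reproducible.
const referenceDate = new Date(Date.UTC(2017, 4, 23));
const dobCounts = countByAge(
  ['1977-06-20', '', '1980-06-20', '1977-06-20'],
  referenceDate
);
console.log(dobCounts); // age 36 -> 1, age 39 -> 2
```

Inside MarkLogic the same skip-empty rule would have to be applied before the values reach an xs:date range index; how best to do that depends on the data model and is not shown here.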



Thanks
Shan.


On Tue, May 23, 2017 at 3:24 AM, <general-requ...@developer.marklogic.com>
wrote:

> Today's Topics:
>
>    1. Search by age wise from dob property (Shiv Shankar)
>    2. Re: Search by age wise from dob property (Justin Makeig)
>    3. Processing Large Number of Docs to Get    Statistics (Eliot Kimber)
>    4. Re: Priorities for queries (Geert Josten)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 22 May 2017 16:08:11 -0400
> From: Shiv Shankar <shiv.shivshan...@gmail.com>
> Subject: [MarkLogic Dev General] Search by age wise from dob property
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Message-ID:
>         <CAFyr2H5Y5JR7kVfg4NFa4i-4xH1wvNskJg4reihCyPO6oecXHQ@
> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
> There is a dob JSON property in the documents, and I need to search on
> dob by age, i.e. age > 30, or age > 30 and age < 50. Are there any
> samples that calculate age and compare it in search queries?
>
> Thanks
> Shan.
>
> ------------------------------
>
> Message: 2
> Date: Mon, 22 May 2017 20:33:06 +0000
> From: Justin Makeig <justin.mak...@marklogic.com>
> Subject: Re: [MarkLogic Dev General] Search by age wise from dob
>         property
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Message-ID: <600acf55-4384-4294-bbf9-74700a4a8...@marklogic.com>
> Content-Type: text/plain; charset="us-ascii"
>
> const person = { dob: xs.date("1979-02-03") };
> const thirtyYearsAgo =
>   fn.currentDate().subtract(xs.yearMonthDuration("P30Y"));
> person.dob < thirtyYearsAgo; // true
>
> You can do date math with xs.duration types. In the above case, I'm
> subtracting 30 years from the current date.
> xs.date.prototype.subtract() returns an xs.date. You can compare that
> xs.date to any other xs.date. To do this comparison in MarkLogic's
> indexes you'll need to create a range index
> <https://docs.marklogic.com/guide/concepts/indexing#id_51573>. A range
> index, as its name implies, queries efficiently for ranges of typed values,
> for example, dates more than thirty years before today.
>
> cts.rangeQuery(cts.jsonPropertyReference('dob'), '<',
>   thirtyYearsAgo); // requires a range index of type xs:date on the dob
>                    // JSON property
>
>
> Justin
>
>
> > _______________________________________________
> > General mailing list
> > General@developer.marklogic.com
> > Manage your subscription at:
> > http://developer.marklogic.com/mailman/listinfo/general
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 22 May 2017 22:43:26 -0500
> From: Eliot Kimber <ekim...@contrext.com>
> Subject: [MarkLogic Dev General] Processing Large Number of Docs to
>         Get     Statistics
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Message-ID: <bdf9d2b1-c160-455d-b836-bc11c1db7...@contrext.com>
> Content-Type: text/plain;       charset="UTF-8"
>
> I haven't yet seen anything in the docs that directly addresses what I'm
> trying to do and suspect I'm simply missing some ML basics or just
> going about things the wrong way.
>
> I have a corpus of several hundred thousand docs (but could be
> millions, of course), where each doc is an average of 200K and several
> thousand elements.
>
> I want to analyze the corpus to get details about the number of
> specific subelements within each document, e.g.:
>
>
> for $article in cts:search(/Article, cts:directory-query("/Default/",
> "infinity"))[$start to $end]
>      return <article-counts id="{$article/@id}"
> paras="{count($article//p)}"/>
>
> I'm running this as a query from Oxygen (so I can capture the results
> locally so I can do other stuff with them).
>
> On the server I'm using I blow the expanded tree cache if I try to
> request more than about 20,000 docs.
>
> Is there a way to do this kind of processing over an arbitrarily large
> set
> *and* get the results back from a single query request?
>
> I think the only solution is to write the results back to the
> database and then fetch that as the last thing but I was hoping there
> was something simpler.
>
> Have I missed an obvious solution?
>
> Thanks,
>
> Eliot
>
> --
> Eliot Kimber
> http://contrext.com
>
>
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 23 May 2017 07:24:31 +0000
> From: Geert Josten <geert.jos...@marklogic.com>
> Subject: Re: [MarkLogic Dev General] Priorities for queries
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Message-ID: <d549af3b.117ddb%geert.jos...@marklogic.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi Oleksii,
>
> If you use xdmp:spawn or xdmp:spawn-function, you would be able to use
> the <priority> option. It takes "normal" and "higher" as values. These
> priorities have separate queues and worker threads, so they should
> interfere less with each other.
>
> It might also be worth looking into a way to push low-priority work
> out to a dedicated host for longer-running tasks. You could do that
> by writing such queries to the database and having a scheduled task on
> that particular host monitor for them, pick them up one by one, and
> write back the results once done. It might be easiest to switch the
> script queries to an asynchronous process that polls regularly
> to see if results have been written. Does that make sense?
>
> Cheers,
> Geert
>
> From: <general-boun...@developer.marklogic.com> on behalf of Oleksii Segeda
> <oseg...@worldbankgroup.org>
> Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Date: Monday, May 22, 2017 at 8:59 PM
> To: "general@developer.marklogic.com" <general@developer.marklogic.com>
> Subject: [MarkLogic Dev General] Priorities for queries
>
> Hi,
>
> Is there a way to give a lower priority to certain queries? We have
> two different types of API consumers: real users and various scripts.
> No matter how often scripts hit endpoints or how "heavy" their
> queries are, they should not affect API performance for real users.
> In other words, scripts are tolerant of high latency, but users are not.
>
> Regards,
>
> Oleksii Segeda
>
> IT Analyst
>
> Information and Technology Solutions
>
>
> ------------------------------
>
>
>
> End of General Digest, Vol 155, Issue 24
> ****************************************
>

------------------------------

Message: 2
Date: Tue, 23 May 2017 13:56:00 +0000
From: Erik Hennum <erik.hen...@marklogic.com>
Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs
to Get Statistics
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID:
<dfdf2fd50bf5aa42adaf93ff2e3ca1850c7f4...@exchg10-be02.marklogic.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi, Eliot:

On reflection, let me retract the range index suggestion.  I wasn't considering 
the domain implied by the element names -- it would never make sense to blow 
out a range index with the value of all of the paragraphs.

The TDE suggestion for MarkLogic 9 would still work, however, because you could 
have an xs:short column with a value of 1 for every paragraph.


Erik Hennum

________________________________________
From: general-boun...@developer.marklogic.com 
[general-boun...@developer.marklogic.com] on behalf of Erik Hennum 
[erik.hen...@marklogic.com]
Sent: Tuesday, May 23, 2017 6:21 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs to Get 
Statistics

Hi, Eliot:

One alternative to Geert's good suggestion -- if and only if the number of 
element names is small and you can create range indexes on them:

*  add an element attribute range index on Article/@id
*  add an element range index on p
*  execute a cts:value-tuples() call with the constraining element query and 
directory query
*  iterate over the tuples, incrementing the value of the id in a map
*  remove the range index on p
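
The "iterate over the tuples, incrementing the value of the id in a map" step can be sketched in plain JavaScript as follows. This is an illustration under stated assumptions, not MarkLogic code: the tuples array stands in for whatever the tuples call returns, each tuple is assumed to be an [id, paragraph value] pair that counts once, and countPerId is a hypothetical name. Any per-tuple frequency the real lexicon reports is ignored here.

```javascript
// Count how many tuples carry each article id.
// Assumption: each tuple is [articleId, paragraphValue] and counts once.
function countPerId(tuples) {
  const counts = new Map();
  for (const [id] of tuples) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return counts;
}

// Made-up sample data standing in for the lexicon result.
const sampleTuples = [
  ['a1', 'First paragraph'],
  ['a1', 'Second paragraph'],
  ['a2', 'Only paragraph'],
  ['a1', 'Third paragraph'],
];

const paraCounts = countPerId(sampleTuples);
for (const [id, paras] of paraCounts) {
  console.log(`${id}: ${paras}`);
}
```

Because only the ids and a running count are held in memory, the work does not depend on loading the documents themselves.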

In MarkLogic 9, that approach gets simpler.  You can just use TDE to project 
rows with columns for the id and element, group on the id column, and count the 
rows in the group.

Hoping that's useful (and salutations in passing),


Erik Hennum

________________________________________
From: general-boun...@developer.marklogic.com 
[general-boun...@developer.marklogic.com] on behalf of Geert Josten 
[geert.jos...@marklogic.com]
Sent: Tuesday, May 23, 2017 12:53 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs to Get 
Statistics

Hi Eliot,

I'd consider using taskbot
(http://registry.demo.marklogic.com/package/taskbot), and using that in 
combination with either $tb:OPTIONS-SYNC or $tb:OPTIONS-SYNC-UPDATE. It will 
make optimal use of the TaskServer of the host on which you initiate the call. 
It doesn't scale endlessly, but it batches up the work automatically for you, 
and will get you a lot further fairly easily.
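
To illustrate the general idea of batching the work (this is not taskbot's API, which is not shown in this thread), here is a small plain-JavaScript sketch. The names toBatches and processInBatches, the batch size, and the sample URIs are all assumptions made for the example. Each batch is handled on its own and only the small per-item results are kept, so no single step has to hold the whole corpus.

```javascript
// Split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (size < 1) throw new RangeError('size must be at least 1');
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run `processBatch` on each batch and concatenate the per-batch results.
function processInBatches(items, size, processBatch) {
  return toBatches(items, size).flatMap(processBatch);
}

// Made-up document URIs; a real run would list them from the database.
const sampleUris = [
  '/Default/a.xml',
  '/Default/b.xml',
  '/Default/c.xml',
  '/Default/d.xml',
  '/Default/e.xml',
];

// The per-batch work here just records the URI length as a stand-in
// for a per-document statistic such as a paragraph count.
const summaries = processInBatches(sampleUris, 2, (batch) =>
  batch.map((uri) => ({ uri, length: uri.length }))
);
console.log(summaries.length);
```

In a real deployment each batch would run as its own task so that caches are released between batches; the sketch only shows the splitting and merging.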

Cheers,
Geert




------------------------------



End of General Digest, Vol 155, Issue 27
****************************************