Re: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention
Actually, I never sent an earlier mail, so my “earlier mail” statement is wrong (that is, I was wrong about being wrong here). Anyway, the original sample doc was (is) valid, and the injection can be done if you have access to the ML server’s file system, ML has read access to a directory you can write to, and you can run XQuery to load the file from the server’s file system. But short of disallowing entity reference resolution, I don’t see how this is something ML itself can prevent. I think it’s an application and server security issue.

I tried the same test but using an HTTP URL that is resolvable: https://github.com/dita-community/dita-ot-project-docker/raw/master/README.md"; ]> &xxe;

I verified that the URL is resolvable by using Oxygen’s open-URL function to get the file. Using xdmp:load() ML reports:

[1.0-ml] SVC-FILOPN: xdmp:eval("xquery version "1.0-ml"; let $source := xdmp:...", (), 13776655024510127060...) -- File open error: open 'https://github.com/dita-community/dita-ot-project-docker/raw/master/README.md': No such file or directory

So injection of HTTP URLs appears not to work in this case, and I think the injection can only happen from ML-executed XQuery reading files from the server’s file system. But in a normal server, file system access should be tightly controlled. Documents loaded using facilities outside ML, e.g., Java code that parses source XML to pass to XCC or something, are outside ML’s ability to control, and that is thus an application issue, not an ML issue. It does not appear to be possible to do this injection using documents supplied via, say, an HTTP response: even if you could provide an XML document with a DOCTYPE declaration, ML would not be able to resolve any entity references that were not to files in the local ML database.
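The list archive has stripped the DOCTYPE markup from the sample above, leaving only the `]> &xxe;` tail. The HTTP-URL variant being tested would have looked roughly like this (the root element name `doc` is a placeholder, not from the original mail):

```xml
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!-- external general entity with an HTTP system identifier -->
  <!ENTITY xxe SYSTEM
    "https://github.com/dita-community/dita-ot-project-docker/raw/master/README.md">
]>
<doc>&xxe;</doc>
```

As the SVC-FILOPN error shows, xdmp:load() treated the system identifier as a local file path rather than fetching it over HTTP, which is why the HTTP injection failed.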
Unquote did not handle the entity reference—it clearly tried to resolve it before doing the unquoting, resulting in an “illegal entity reference” message, and escaping the “&” resulted in a single “;” where the entity reference was (meaning the escaped “&amp;xxe;” was not converted back to “&xxe;” and then resolved—I'm not actually sure what the unquote processor is doing in that case).

Cheers, E. -- Eliot Kimber http://contrext.com

From: on behalf of Eliot Kimber
Reply-To: MarkLogic Developer Discussion
Date: Wednesday, March 14, 2018 at 2:49 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention

My earlier mail was wrong. I was able to replicate the behavior in ML 9 using Query Console. Here is my source doc:

]> &xxe;

And test.xml:

---
This is a text file injected.
---

Here’s my query:

xquery version "1.0-ml";
let $source := xdmp:load("/ekimber/test/injection-test.xml")
let $result := doc("/ekimber/test/injection-test.xml")
return $result

Result:

This is a text file injected.

So ML’s built-in parser will resolve entity references to files on the file system when loaded from the file system using xdmp:load(). But I’m not sure this is something ML can or should prevent. I think the security presumption is that if you have access to the server (the machine running ML) and rights to run XQuery on ML, you can do anything. I think it is up to a MarkLogic application to impose rules on what documents are allowed to be loaded and what the constraints on entity resolution are.

Cheers, Eliot -- Eliot Kimber http://contrext.com

From: on behalf of Keith Breinholt
Reply-To: MarkLogic Developer Discussion
Date: Wednesday, March 14, 2018 at 12:07 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention

Perhaps you could show the code that you used to insert the document into the database. I, personally, cannot get your code to work for a number of reasons.
1) Having both an XML processing instruction and an HTML doctype is invalid.
2) Trying to assign the “document” to a variable throws an error because of #1.
3) If I try to put the “document” below into a file on the file system and load it, I cannot use xdmp:document-insert() to insert the “document” into the database because there isn’t a valid node.

There may be something I have overlooked, so please share the code you used to insert this document into a database.

-Keith

From: general-boun...@developer.marklogic.com On Behalf Of Marcel de Kleine
Sent: Wednesday, March 14, 2018 6:43 AM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention

Hello, We have noticed MarkLogic is vulnerable to XXE (entity expansion) and XML bomb attacks. When loading a malicious document using xdmp:document-insert it won’t catch these, causing either loading of unwanted external documents (XXE) or lockup of the system (XML bomb).
Re: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention
My earlier mail was wrong. I was able to replicate the behavior in ML 9 using Query Console. Here is my source doc:

]> &xxe;

And test.xml:

---
This is a text file injected.
---

Here’s my query:

xquery version "1.0-ml";
let $source := xdmp:load("/ekimber/test/injection-test.xml")
let $result := doc("/ekimber/test/injection-test.xml")
return $result

Result:

This is a text file injected.

So ML’s built-in parser will resolve entity references to files on the file system when loaded from the file system using xdmp:load(). But I’m not sure this is something ML can or should prevent. I think the security presumption is that if you have access to the server (the machine running ML) and rights to run XQuery on ML, you can do anything. I think it is up to a MarkLogic application to impose rules on what documents are allowed to be loaded and what the constraints on entity resolution are.

Cheers, Eliot -- Eliot Kimber http://contrext.com

From: on behalf of Keith Breinholt
Reply-To: MarkLogic Developer Discussion
Date: Wednesday, March 14, 2018 at 12:07 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention

Perhaps you could show the code that you used to insert the document into the database. I, personally, cannot get your code to work for a number of reasons.

1) Having both an XML processing instruction and an HTML doctype is invalid.
2) Trying to assign the “document” to a variable throws an error because of #1.
3) If I try to put the “document” below into a file on the file system and load it, I cannot use xdmp:document-insert() to insert the “document” into the database because there isn’t a valid node.

There may be something I have overlooked, so please share the code you used to insert this document into a database.
-Keith

From: general-boun...@developer.marklogic.com On Behalf Of Marcel de Kleine
Sent: Wednesday, March 14, 2018 6:43 AM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Marklogic XXE and XML Bomb prevention

Hello,

We have noticed MarkLogic is vulnerable to XXE (entity expansion) and XML bomb attacks. When loading a malicious document using xdmp:document-insert it won’t catch these, causing either loading of unwanted external documents (XXE) or lockup of the system (XML bomb). For example, if I load this document:

]> &xxe;

the file test.xml gets nicely added to the XML document. See OWASP and others for examples. This is clearly an XML processing issue, so the question is: can we disable this? And if so, on what levels would this be possible? Best would be system-wide. (And if you cannot disable this, I think this is something ML should address immediately.)

Thank you in advance,

Marcel de Kleine, EPAM
Senior Software Engineer
Office: +31 20 241 6134 x 30530
Cell: +31 6 14806016
Email: marcel_de_kle...@epam.com
Delft, Netherlands
epam.com

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
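The “XML bomb” attack mentioned above refers to entity expansion, e.g. the classic “billion laughs” document described by OWASP. A minimal illustration (not from the original mail) looks like this; the file itself is tiny, but a parser that expands the entities naively produces an enormous string:

```xml
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!-- each level expands to ten copies of the previous one; with
       around nine levels the expansion is roughly a billion strings,
       which is the lockup scenario described -->
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
]>
<lolz>&lol3;</lolz>
```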
[MarkLogic Dev General] How to Reliably Tell if FlexRep Push is Done
I have a process that has to sync up a database between a master server and a number of remote servers before submitting tasks to those servers. In some cases the sync may be a complete copy of the database, which can take an hour or more. In many cases the sync will either be unnecessary or a quick sync of a few updated docs.

My task submission process cycles through the pool of remote servers, checking their general status and submitting tasks to them. However, I don't want to submit any tasks until any flexrep syncs are done. I've worked out this function:

declare function er:is-flexrep-in-progress(
  $server
) {
  let $domain-ids := flexrep:configuration-domain-ids()
  let $missing-counts as xs:integer* :=
    for $domain-id in $domain-ids
    let $cfg := flexrep:configuration-get($domain-id)
    let $targets := flexrep:configuration-targets($cfg)
    return
      for $target in $targets
      let $target-id as xs:unsignedLong := $target/flexrep:target-id
      let $status := flexrep:target-status($domain-id, $target-id)
      let $target-name as xs:string := $status/flexrep:target-name
      let $missing-count := xs:integer($status/flexrep:missing-count)
      return
        if (contains($target-name, $server) and $missing-count gt 0)
        then $missing-count
        else 0
  let $result := sum($missing-counts) gt 0
  return $result
};

But it's kind of slow--in my initial profiling through qconsole it could take as much as 800 ms and typically took about 500 ms. The time could be an unavoidable side effect of having 100s of 1000s of fragments in this db--my profiling showed that most of the time came from doing 1000s of small operations down in the flexrep code. I'm wondering if there's a better or more efficient way to determine if flexrep is still in progress?

Thanks, E. -- Eliot Kimber http://contrext.com
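One small restructuring worth trying (a sketch, not tested against a live cluster): use a `some … satisfies` quantifier so evaluation can stop at the first target with a non-zero missing count, rather than computing and summing a value for every target. The flexrep:* calls are the same ones used in the function above; whether this meaningfully helps depends on how much of the cost is inside flexrep:target-status itself.

```xquery
declare function er:is-flexrep-in-progress(
  $server as xs:string
) as xs:boolean {
  (: true as soon as any matching target still has missing documents;
     a quantified expression may stop at the first satisfying item :)
  some $domain-id in flexrep:configuration-domain-ids(),
       $target in flexrep:configuration-targets(
                    flexrep:configuration-get($domain-id))
  satisfies
    let $status :=
      flexrep:target-status($domain-id,
        xs:unsignedLong($target/flexrep:target-id))
    return
      contains($status/flexrep:target-name, $server)
        and xs:integer($status/flexrep:missing-count) gt 0
};
```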
[MarkLogic Dev General] Rsync-Like DB Contents Comparison and Update?
ML 9. I have a system of servers where a master server gets new remote servers allocated to it more or less randomly and dynamically. The remote servers need to have a correct copy of a database on the master server, but the database is pretty big (the previously mentioned 380K-doc, 3GB database). I can of course sync it with FlexRep, but when a new server comes available I don't know the current state of its local copy of the database (if it has one at all), so I'm forced to recreate my master server's replication targets and do a full push, which takes an hour or two.

In the case where the remote server already has a copy of the database, I would like to be able to compare its contents to the master's, determine what the deltas are, if any, and only handle those, which usually would be only a few docs out of the total set. Does there exist this kind of rsync- or git-like comparison mechanism, either out of the box or as a public project? I'm thinking of something comparable to what git does, which is to create hashes of each file and then compare hashes. I could do this in XQuery, but I suspect something more efficient could be done at the forest level, if one knew what one was doing.

Thanks, Eliot -- Eliot Kimber http://contrext.com
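There doesn't appear to be an out-of-the-box forest-level diff, but a coarse version of the git-style comparison can be sketched in XQuery: build a URI-to-hash manifest on each side using xdmp:md5() over the serialized documents, then diff the two manifests. This sketch assumes the URI lexicon is enabled (for cts:uris) and glosses over chunking for a 380K-doc database; note that hashing every document is itself a full scan, so this only wins when the transfer is the expensive part.

```xquery
xquery version "1.0-ml";

(: manifest of uri/hash pairs for every document in the database :)
declare function local:manifest() as element(manifest) {
  <manifest>{
    for $uri in cts:uris()
    return
      <doc uri="{$uri}"
           hash="{xdmp:md5(xdmp:quote(fn:doc($uri)))}"/>
  }</manifest>
};

(: URIs whose hash differs from, or is absent in, the other manifest :)
declare function local:deltas(
  $mine as element(manifest),
  $other as element(manifest)
) as xs:string* {
  for $doc in $mine/doc
  where fn:not($other/doc[@uri eq $doc/@uri and @hash eq $doc/@hash])
  return fn:string($doc/@uri)
};
```

The master would run local:manifest(), ship the result to the remote (or vice versa), and replicate only the URIs local:deltas() reports.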
[MarkLogic Dev General] Why Would FlexRep Pull be Dramatically Slower Than Push for Same Database and Server Pair?
I have a pair of ML 9 servers. On the master server I have a domain with a target configured with a docs-per-batch of 100, for a database with about 380K docs coming in at about 3GB as reported by the ML status page. When I use FlexRep push to another server with an empty database, the push takes about one to two hours depending on time of day (and thus overall network traffic). When I use FlexRep pull to pull from the master to the remote, it takes about 9 hours.

What would account for this time difference? I'm guessing it's that the pull process doesn't use the docs/batch setting (manually setting it to 1 for a push also results in about 9 hours). As it happens, I don't need to use pull, as I can use push just as easily, but I was curious about the time difference and whether it's an inherent aspect of FlexRep pull, indicates a bug, or could be some configuration error on my part (though I don't think so, since the target configuration is the same in both cases--the only variable is pull vs. push).

Cheers, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Good Way to Automatically Install CPF
I've taken the Admin install CPF script and reworked it as a function library, removing the code related to default domains (I don't want default domains in any case). What's left includes code to set up the pipelines and triggers, as described in the CPF Configuration chapter. *But* it also includes the loading of schemas that are used by the pipelines, and I didn't see anything that does that (or mentions it) in the CPF API. So unless I'm missing something (which is quite possible), I still need to do the schema loading.

What I've extracted from the Admin code seems like a convenient way to just get CPF in place so that you can then set up your custom domains. It could be optimized for the needs of FlexRep, namely only bothering to install the change and FlexRep pipelines, but it seems likely that other servers needing automatic conversion might use other pipelines, and there's no particular harm in having unused pipelines lying about.

Note my requirement is simply to have CPF available so that I can then configure FlexRep. The lack of a quick-and-easy way to programmatically install CPF is simply a roadblock to the real configuration I need to do, namely configuring FlexRep, which is otherwise easy enough (once one has understood how all the FlexRep parts fit together, which was a little harder than it should have been, but I think I've already commented on the FlexRep docs...). So I was really looking for a "call this one function to get CPF installed so you can continue on with your real task of getting FlexRep configured via a script", and I'm not seeing that out of the box.

Or said more directly: there's a one-button task in the Admin UI to get CPF installed for a database. There should be a corresponding single-call function to do it programmatically, and the FlexRep docs should make reference to that function at the same time they refer to the manual CPF installation process.

Cheers, E.
-- Eliot Kimber http://contrext.com

On 1/22/18, 5:32 PM, "general-boun...@developer.marklogic.com on behalf of Mary Holstege" wrote:

There isn't a single API that orchestrates all the pieces, but there are APIs to do all the necessary parts in the pipelines and domains modules. These should be executed against your triggers database. If you share a triggers database, you don't need to do it all over again.

- p:insert to put a pipeline XML into the right collections etc.
- dom:configuration-create to create the overall configuration object that defines your restart user etc. You need to do this before you create domains or things will go horribly wrong.
- dom:create to define your domains
- dom:add-pipeline to attach pipelines if you didn't put them in the domain in dom:create

All default pipelines are in the Installer directory. The thing in the admin GUI makes some default assumptions about some of this that aren't always the appropriate thing to do. I'd suggest making a script that creates the domains you want and loads and attaches the appropriate pipelines. //Mary

On Mon, 22 Jan 2018 14:09:23 -0800, Eliot Kimber wrote:

> I'm putting together a script that will do all the configuration for a
> server all the way through defining a FlexRep app server, domains, and
> targets. The requirement is to avoid the need for any manual intervention
> once the configuration is started.
>
> The one fly in this ointment is the CPF--since I'm creating new
> databases they of course won't have CPF installed, so I need to install
> the CPF into those that are involved in FlexRep.
>
> As far as I can tell there is no API for doing this (there should
> be), so I'm going to attempt to simply call the
> Admin/database-cpf-admin-go.xqy module, which seems simple enough (I
> only need to specify the database name as far as I can tell).
> > But calling an Admin module like this feels a little dirty and has some > risk since it's not a published API and there's no guarantee it will not > change without warning in the future (although the risk seems pretty > small since it's a module that hasn't changed in ages and it's only > called in one place in my code). > > Is there a better way to automate installation of the CPF than doing > what the "confirm CPF installation" UI form does? > > This is in the context of setting up new servers on demand, e.g., in a > Docker environment where this server has a very narrow use. > > Thanks, > > Eliot > -- > Eliot Kimber >
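Mary's steps can be sketched as a single script run against the triggers database. This is only an outline: the p:insert and dom:* function names come from her mail, but the argument lists, paths, and names shown here (restart user, modules database, pipeline file path, scope) are assumptions that should be checked against the pipelines.xqy and domains.xqy module documentation before use.

```xquery
xquery version "1.0-ml";
(: Run against the triggers database of the content database. :)
import module namespace p = "http://marklogic.com/cpf/pipelines"
  at "/MarkLogic/cpf/pipelines.xqy";
import module namespace dom = "http://marklogic.com/cpf/domains"
  at "/MarkLogic/cpf/domains.xqy";

(: 1. Load a pipeline definition; p:insert returns the pipeline id.
   The file path is a placeholder. :)
let $pipeline-id :=
  p:insert(xdmp:document-get("/path/to/flexible-replication.xml")/*)

(: Evaluation context for CPF actions; "Modules" and "/" are
   assumptions for a default install. :)
let $eval-context :=
  dom:evaluation-context(xdmp:database("Modules"), "/")

(: 2. Overall CPF configuration -- must exist before any domain.
   Argument list is from memory; verify against the dom module docs. :)
let $config :=
  dom:configuration-create("cpf-restart-user", $eval-context,
                           0, xdmp:default-permissions())

(: 3. A domain over the whole database with the pipeline attached. :)
return
  dom:create("flexrep-domain", "Domain for FlexRep change tracking",
             dom:domain-scope("directory", "/", "infinity"),
             $eval-context, $pipeline-id, xdmp:default-permissions())
```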
Re: [MarkLogic Dev General] Good Way to Automatically Install CPF
I was pointed to the Scripting Content Processing Framework (CPF) Configuration chapter, which does seem to have the guidance I seek. I was focused on scripting FlexRep configuration and didn't fully appreciate the underlying requirement on CPF. I just wanted an Easy button.

In looking more closely at the Admin code that does the CPF default installation, I see that it's not actually callable as a module anyway, as it expects to get values from HTTP request parameters (insert snarky comment about separation of concerns in code here). So it looks like the answer is "cut and paste what's in the Admin installer code, fix it to be callable as functions, and use that".

Cheers, E. -- Eliot Kimber http://contrext.com

On 1/22/18, 4:15 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

Correct subject line for this thread. Cheers, E. -- Eliot Kimber http://contrext.com

On 1/22/18, 4:09 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I'm putting together a script that will do all the configuration for a server all the way through defining a FlexRep app server, domains, and targets. The requirement is to avoid the need for any manual intervention once the configuration is started.

The one fly in this ointment is the CPF--since I'm creating new databases they of course won't have CPF installed, so I need to install the CPF into those that are involved in FlexRep.

As far as I can tell there is no API for doing this (there should be), so I'm going to attempt to simply call the Admin/database-cpf-admin-go.xqy module, which seems simple enough (I only need to specify the database name as far as I can tell).

But calling an Admin module like this feels a little dirty and has some risk since it's not a published API and there's no guarantee it will not change without warning in the future (although the risk seems pretty small since it's a module that hasn't changed in ages and it's only called in one place in my code).
Is there a better way to automate installation of the CPF than doing what the "confirm CPF installation" UI form does? This is in the context of setting up new servers on demand, e.g., in a Docker environment where this server has a very narrow use. Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Good Way to Automatically Install CPF
Correct subject line for this thread. Cheers, E. -- Eliot Kimber http://contrext.com

On 1/22/18, 4:09 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I'm putting together a script that will do all the configuration for a server all the way through defining a FlexRep app server, domains, and targets. The requirement is to avoid the need for any manual intervention once the configuration is started.

The one fly in this ointment is the CPF--since I'm creating new databases they of course won't have CPF installed, so I need to install the CPF into those that are involved in FlexRep.

As far as I can tell there is no API for doing this (there should be), so I'm going to attempt to simply call the Admin/database-cpf-admin-go.xqy module, which seems simple enough (I only need to specify the database name as far as I can tell).

But calling an Admin module like this feels a little dirty and has some risk since it's not a published API and there's no guarantee it will not change without warning in the future (although the risk seems pretty small since it's a module that hasn't changed in ages and it's only called in one place in my code).

Is there a better way to automate installation of the CPF than doing what the "confirm CPF installation" UI form does?

This is in the context of setting up new servers on demand, e.g., in a Docker environment where this server has a very narrow use.

Thanks, Eliot -- Eliot Kimber http://contrext.com
[MarkLogic Dev General] Good Way to Automatically Install CPF
I'm putting together a script that will do all the configuration for a server all the way through defining a FlexRep app server, domains, and targets. The requirement is to avoid the need for any manual intervention once the configuration is started.

The one fly in this ointment is the CPF--since I'm creating new databases they of course won't have CPF installed, so I need to install the CPF into those that are involved in FlexRep.

As far as I can tell there is no API for doing this (there should be), so I'm going to attempt to simply call the Admin/database-cpf-admin-go.xqy module, which seems simple enough (I only need to specify the database name as far as I can tell).

But calling an Admin module like this feels a little dirty and has some risk since it's not a published API and there's no guarantee it will not change without warning in the future (although the risk seems pretty small since it's a module that hasn't changed in ages and it's only called in one place in my code).

Is there a better way to automate installation of the CPF than doing what the "confirm CPF installation" UI form does?

This is in the context of setting up new servers on demand, e.g., in a Docker environment where this server has a very narrow use.

Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] General Digest, Vol 162, Issue 17
OK, good to know it should work except for the point-in-time stuff. Any idea why the forests I was trying to restore would have been greyed out in the restore-from-backup panel? Would that reflect an issue with there being point-in-time data in the ML4 data being restored, or would it be because of something else?

I checked the usual suspects, such as the forest names not matching or not having permissions on the files, and those all seemed to be correct, but of course there's still a strong chance it's my user error. I haven't had much time to work on this, so when it didn't work immediately I set it aside to pursue other avenues (in particular, I was able to make FlexRep push from ML4 to ML 9 work at appropriate speed, so that's good).

Thanks, E. -- Eliot Kimber http://contrext.com

From: on behalf of Rajesh Kumar
Reply-To: MarkLogic Developer Discussion
Date: Thursday, December 21, 2017 at 12:55 AM
To: "general@developer.marklogic.com"
Subject: Re: [MarkLogic Dev General] General Digest, Vol 162, Issue 17

Hi Eliot, You can restore the content in ML 9 from ML 4, but point-in-time data will not work. In ML 4 the timestamp was defined based on the transaction, but in later versions (from 6, I guess) it was a long. So point-in-time restore will not work. Regards, Rajesh

On Thu, Dec 21, 2017 at 1:30 AM, wrote:
Message: 1
Date: Tue, 19 Dec 2017 14:54:39 -0600
From: Eliot Kimber
Subject: Re: [MarkLogic Dev General] Possible to Restore ML 4 Backup To ML 9
To: MarkLogic Developer Discussion

I did not consider using the API. I will try that. Thanks, Eliot -- Eliot Kimber http://contrext.com

From: on behalf of Arthur Tsoi
Reply-To: MarkLogic Developer Discussion
Date: Tuesday, December 19, 2017 at 1:21 PM
To: "general@developer.marklogic.com"
Subject: Re: [MarkLogic Dev General] Possible to Restore ML 4 Backup To ML 9

On ML 9, can you try using the API xdmp:database-restore instead of the UI? Alternatively, since you can restore into an ML 8 server, you can do another backup from the ML 8 server and then restore that into an ML 9 server. Arthur
Re: [MarkLogic Dev General] Possible to Restore ML 4 Backup To ML 9
I did not consider using the API. I will try that. Thanks, Eliot -- Eliot Kimber http://contrext.com

From: on behalf of Arthur Tsoi
Reply-To: MarkLogic Developer Discussion
Date: Tuesday, December 19, 2017 at 1:21 PM
To: "general@developer.marklogic.com"
Subject: Re: [MarkLogic Dev General] Possible to Restore ML 4 Backup To ML 9

On ML 9, can you try using the API xdmp:database-restore instead of the UI? Alternatively, since you can restore into an ML 8 server, you can do another backup from the ML 8 server and then restore that into an ML 9 server. Arthur
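Arthur's suggestion can be sketched as below. xdmp:database-restore takes the forest IDs to restore and the backup directory path; the database name and path here are placeholders, and the optional arguments (journal archiving, etc.) are omitted.

```xquery
xquery version "1.0-ml";

(: kick off a restore of every forest of the target database from a
   backup directory; returns a restore job ID that can be checked
   with xdmp:database-restore-status() :)
let $forest-ids := xdmp:database-forests(xdmp:database("mydatabase"))
return xdmp:database-restore($forest-ids, "/marklogic/backup/20171214-1")
```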
Re: [MarkLogic Dev General] Possible to Restore ML 4 Backup To ML 9
Just pinging this issue again, as being able to do this would make a number of things easier than they would otherwise be. Thanks, Eliot -- Eliot Kimber http://contrext.com

On 12/14/17, 3:42 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I also made sure the files are world readable and writeable, since I thought at first it might be a permissions problem. Cheers, Eliot -- Eliot Kimber http://contrext.com

On 12/14/17, 3:41 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I have successfully restored an ML 4 backup into an ML 8 server. I'm now trying to restore an ML 4 backup into an ML 9 server and not having any luck.

If I try to do a full database restore, the forest directories in the backup are listed but their check boxes are not selectable. I have verified that the forest names match.

If I try to restore each forest individually, I can start the restore but it fails with this error:

Error: Forest label does not exist: /marklogic/backup/20171214-1/Forests/mydatabase-2/Label

There is a Label file:

# cat mydatabase-2/Label
??*h1?5
#

So I'm assuming this is a version incompatibility. Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Possible to Flexrep Pull From ML9 to ML4?
I am able to FlexRep push local forests from ML4 to ML 9, so that will let me do what I want by dynamically managing the FlexRep config on the master host. Cheers, Eliot -- Eliot Kimber http://contrext.com

On 12/14/17, 12:42 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I am trying to set up FlexRep pull where an ML 9 server is pulling from an ML 4 server. Is this actually possible? It looks like the code is different in ML 4 from what the ML 9 pull code expects. In particular, the ML 9 pull code calls the module pollBinaryChunk.xqy on the pulled-from server, but as far as I can tell no such module exists in the ML 4 code. Is there a workaround for this?

Because of the way my systems are set up it's problematic to use command-line tools like mlcp or corb, so I'm trying to keep everything in XQuery. My specific requirement is to sync part of a much larger database to a set of Docker-based servers whenever those servers are started up. Because the servers are not persistent I cannot use normal push FlexRep; because I only want part of the database, backup and restore is not ideal; and I also want to avoid the reindexing cost if possible.

Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Possible to Restore ML 4 Backup To ML 9
I also made sure the files are world readable and writeable, since I thought at first it might be a permissions problem. Cheers, Eliot -- Eliot Kimber http://contrext.com

On 12/14/17, 3:41 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I have successfully restored an ML 4 backup into an ML 8 server. I'm now trying to restore an ML 4 backup into an ML 9 server and not having any luck.

If I try to do a full database restore, the forest directories in the backup are listed but their check boxes are not selectable. I have verified that the forest names match.

If I try to restore each forest individually, I can start the restore but it fails with this error:

Error: Forest label does not exist: /marklogic/backup/20171214-1/Forests/mydatabase-2/Label

There is a Label file:

# cat mydatabase-2/Label
??*h1?5
#

So I'm assuming this is a version incompatibility. Thanks, Eliot -- Eliot Kimber http://contrext.com
[MarkLogic Dev General] Possible to Restore ML 4 Backup To ML 9
I have successfully restored an ML 4 backup into an ML 8 server. I'm now trying to restore an ML 4 backup into an ML 9 server and not having any luck. If I try to do a full database restore the forest directories in the backup are listed but their check boxes are not selectable. I have verified that the forest names match. If I try to restore each forest individually, I can start the backup but it fails with this error: Error: Forest label does not exist: /marklogic/backup/20171214-1/Forests/mydatabase-2/Label There is a Label file: # cat mydatabase-2/Label ??*h1?5 # _ So I'm assuming this is a version incompatibility. Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Possible to Flexrep Pull From ML9 to ML4?
I am trying to set up flexrep pull where an ML 9 server is pulling from an ML 4 server. Is this actually possible? It looks like the code is different in ML 4 from what the ML 9 pull code expects. In particular, the ML 9 pull code is calling the module pollBinaryChunk.xqy on the pulled-from server but as far as I can tell no such module exists in the ML 4 code. Is there a workaround for this? Because of the way my systems are set up it's problematic to use command-line tools like mlcp or corb, so I'm trying to keep everything in XQuery. My specific requirement is to sync part of a much larger database to a set of Docker-based servers whenever those servers are started up. Because the servers are not persistent I cannot use normal push flexrep and because I only want part of the database backup and restore is not ideal and I also want to avoid the reindexing cost if possible. Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Best Approach to Manage "Flags" That Might Change Within a Single Transaction
I think I've solved my problem by once again being more careful about holding elements in memory. By replacing global reads of my job doc with on-demand reads through xdmp:eval() I seem to have resolved my issue with changes to the job doc not being seen within the same separate transaction (e.g., my read loop). I seem to be unable to let go of my procedural language brain damage. Still, it seems like having a general, cross-application field or shared memory mechanism would be useful for this type of application where one app (e.g., my Web UI) spawns tasks that do the work and need a way to dynamically communicate within the scope of a single long-running transaction. At least that's the way I would go about building this type of application in a different environment. Cheers, E. -- Eliot Kimber http://contrext.com On 12/7/17, 10:48 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: I don't think server fields are going to work because they are per application server and I have different application servers at work. There is an HTTP server that gets the pause/resume request and then spawned tasks running the TaskServer that need to read the field. My experiments show that, per the docs, a field changed by one app is not seen by a different app. Cheers, Eliot -- Eliot Kimber http://contrext.com On 12/7/17, 10:13 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: I had not considered server fields--I'll check it out. Cheers, E. -- Eliot Kimber http://contrext.com On 12/7/17, 10:11 AM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote: Hi, Eliot: Have you considered a server field -- where any code that changes the status also updates the server field and the iterator checks the server field? The server fields are local to the host, so there's no concern about a separate iterator running on a different host. 
If multiple iterators run on the same host, each would need to distinguish its status by an id, which the iterator could generate from a random id when it starts. Hoping that helps, Erik Hennum From: general-boun...@developer.marklogic.com on behalf of Eliot Kimber Sent: Thursday, December 7, 2017 7:48:44 AM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Best Approach to Manage "Flags" That Might Change Within a Single Transaction In the context of my remote processing management system, where my client server is sending many tasks to a set of remote servers through a set of spawned tasks running in parallel, I need to be able to pause the client so that it stops sending new tasks to the remote servers. So far I've been using a single document stored in ML as my mechanism for indicating that a job is in progress and capturing the job details (job ID, start time, servers in use, etc.). This works fine because it was only updated at the start and end of the job. But for the pause/resume use case I need to have a flag that indicates that the job is paused and have other processes (e.g., my task-submission code) immediately respond to a change. For example, if I'm looping over 100 tasks to load up a remote task queue and the job is paused, I want that loop to end immediately. So basically, in this loop, for every iteration, check the "is paused" status, which requires reading the job doc to see if a @paused attribute is present (the @paused attribute captures the time the pause was requested and serves as the "is paused" flag). However, because the loop is a single transaction, it will see the same version of the job doc for every iteration, even if it's changed. I tried using xdmp:eval() to read the job doc but that didn't seem to change the behavior. E.g., doing this in query console: return (er:is-job-paused(), er:pause-job(), er:is-job-paused()) Results in (false, false) So this isn't going to work. 
So my question: what's the best way to manage this kind of dynamic flag in ML? I could use file system files instead of docs in the database, which would avoid the ML transaction behavior but that seems a little hackier than I'd like. What I'd really like is some kind of "shared memory" mechanism where I can set and reset variables at will across different modules running in parallel but I haven't seen anything like that in my study of the ML API. Is there such a mechanism that I've missed? Or am I just thinking about the problem the wrong way? Thanks, Eliot -- Eliot Kimber http://contrext.com
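[Editorial sketch] The on-demand read Eliot describes (replacing global reads of the job doc with reads through xdmp:eval()) could look roughly like the following. This is a hypothetical illustration, not code from Eliot's application: the document URI, the `job` element, and the function name are invented; only the @paused attribute convention comes from the thread.

```xquery
xquery version "1.0-ml";

(: Hypothetical sketch: read the job doc in its own transaction so each
   check sees the latest committed version, rather than the version at
   the enclosing transaction's timestamp. URI and job/@paused shape are
   assumptions for illustration. :)
declare function local:is-job-paused-now($job-uri as xs:string) as xs:boolean
{
  xdmp:eval(
    'declare variable $uri as xs:string external;
     fn:exists(fn:doc($uri)/job/@paused)',
    (xs:QName("uri"), $job-uri),
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
    </options>)
};

local:is-job-paused-now("/jobs/job-123.xml")
```

The key detail is the different-transaction isolation option, which forces the eval to run at a fresh timestamp; without it, an eval in the same transaction sees the same snapshot as the caller.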
Re: [MarkLogic Dev General] Best Approach to Manage "Flags" That Might Change Within a Single Transaction
I don't think server fields are going to work because they are per application server and I have different application servers at work. There is an HTTP server that gets the pause/resume request and then spawned tasks running the TaskServer that need to read the field. My experiments show that, per the docs, a field changed by one app is not seen by a different app. Cheers, Eliot -- Eliot Kimber http://contrext.com On 12/7/17, 10:13 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: I had not considered server fields--I'll check it out. Cheers, E. -- Eliot Kimber http://contrext.com On 12/7/17, 10:11 AM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote: Hi, Eliot: Have you considered a server field -- where any code that changes the status also updates the server field and the iterator checks the server field? The server fields are local to the host, so there's no concern about a separate iterator running on a different host. If multiple iterators run on the same host, each would need to distinguish its status by an id, which the iterator could generate from a random id when it starts. Hoping that helps, Erik Hennum From: general-boun...@developer.marklogic.com on behalf of Eliot Kimber Sent: Thursday, December 7, 2017 7:48:44 AM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Best Approach to Manage "Flags" That Might Change Within a Single Transaction In the context of my remote processing management system, where my client server is sending many tasks to a set of remote servers through a set of spawned tasks running in parallel, I need to be able to pause the client so that it stops sending new tasks to the remote servers. So far I've been using a single document stored in ML as my mechanism for indicating that a job is in progress and capturing the job details (job ID, start time, servers in use, etc.). This works fine because it was only updated at the start and end of the job. 
But for the pause/resume use case I need to have a flag that indicates that the job is paused and have other processes (e.g., my task-submission code) immediately respond to a change. For example, if I'm looping over 100 tasks to load up a remote task queue and the job is paused, I want that loop to end immediately. So basically, in this loop, for every iteration, check the "is paused" status, which requires reading the job doc to see if a @paused attribute is present (the @paused attribute captures the time the pause was requested and serves as the "is paused" flag). However, because the loop is a single transaction, it will see the same version of the job doc for every iteration, even if it's changed. I tried using xdmp:eval() to read the job doc but that didn't seem to change the behavior. E.g., doing this in query console: return (er:is-job-paused(), er:pause-job(), er:is-job-paused()) Results in (false, false) So this isn't going to work. So my question: what's the best way to manage this kind of dynamic flag in ML? I could use file system files instead of docs in the database, which would avoid the ML transaction behavior but that seems a little hackier than I'd like. What I'd really like is some kind of "shared memory" mechanism where I can set and reset variables at will across different modules running in parallel but I haven't seen anything like that in my study of the ML API. Is there such a mechanism that I've missed? Or am I just thinking about the problem the wrong way? Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Best Approach to Manage "Flags" That Might Change Within a Single Transaction
I had not considered server fields--I'll check it out. Cheers, E. -- Eliot Kimber http://contrext.com On 12/7/17, 10:11 AM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote: Hi, Eliot: Have you considered a server field -- where any code that changes the status also updates the server field and the iterator checks the server field? The server fields are local to the host, so there's no concern about a separate iterator running on a different host. If multiple iterators run on the same host, each would need to distinguish its status by an id, which the iterator could generate from a random id when it starts. Hoping that helps, Erik Hennum From: general-boun...@developer.marklogic.com on behalf of Eliot Kimber Sent: Thursday, December 7, 2017 7:48:44 AM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Best Approach to Manage "Flags" That Might Change Within a Single Transaction In the context of my remote processing management system, where my client server is sending many tasks to a set of remote servers through a set of spawned tasks running in parallel, I need to be able to pause the client so that it stops sending new tasks to the remote servers. So far I've been using a single document stored in ML as my mechanism for indicating that a job is in progress and capturing the job details (job ID, start time, servers in use, etc.). This works fine because it was only updated at the start and end of the job. But for the pause/resume use case I need to have a flag that indicates that the job is paused and have other processes (e.g., my task-submission code) immediately respond to a change. For example, if I'm looping over 100 tasks to load up a remote task queue and the job is paused, I want that loop to end immediately. 
So basically, in this loop, for every iteration, check the "is paused" status, which requires reading the job doc to see if a @paused attribute is present (the @paused attribute captures the time the pause was requested and serves as the "is paused" flag). However, because the loop is a single transaction, it will see the same version of the job doc for every iteration, even if it's changed. I tried using xdmp:eval() to read the job doc but that didn't seem to change the behavior. E.g., doing this in query console: return (er:is-job-paused(), er:pause-job(), er:is-job-paused()) Results in (false, false) So this isn't going to work. So my question: what's the best way to manage this kind of dynamic flag in ML? I could use file system files instead of docs in the database, which would avoid the ML transaction behavior but that seems a little hackier than I'd like. What I'd really like is some kind of "shared memory" mechanism where I can set and reset variables at will across different modules running in parallel but I haven't seen anything like that in my study of the ML API. Is there such a mechanism that I've missed? Or am I just thinking about the problem the wrong way? Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Best Approach to Manage "Flags" That Might Change Within a Single Transaction
In the context of my remote processing management system, where my client server is sending many tasks to a set of remote servers through a set of spawned tasks running in parallel, I need to be able to pause the client so that it stops sending new tasks to the remote servers. So far I've been using a single document stored in ML as my mechanism for indicating that a job is in progress and capturing the job details (job ID, start time, servers in use, etc.). This works fine because it was only updated at the start and end of the job. But for the pause/resume use case I need to have a flag that indicates that the job is paused and have other processes (e.g., my task-submission code) immediately respond to a change. For example, if I'm looping over 100 tasks to load up a remote task queue and the job is paused, I want that loop to end immediately. So basically, in this loop, for every iteration, check the "is paused" status, which requires reading the job doc to see if a @paused attribute is present (the @paused attribute captures the time the pause was requested and serves as the "is paused" flag). However, because the loop is a single transaction, it will see the same version of the job doc for every iteration, even if it's changed. I tried using xdmp:eval() to read the job doc but that didn't seem to change the behavior. E.g., doing this in query console: return (er:is-job-paused(), er:pause-job(), er:is-job-paused()) Results in (false, false) So this isn't going to work. So my question: what's the best way to manage this kind of dynamic flag in ML? I could use file system files instead of docs in the database, which would avoid the ML transaction behavior but that seems a little hackier than I'd like. What I'd really like is some kind of "shared memory" mechanism where I can set and reset variables at will across different modules running in parallel but I haven't seen anything like that in my study of the ML API. Is there such a mechanism that I've missed? 
Or am I just thinking about the problem the wrong way? Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
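[Editorial sketch] The server-field mechanism Erik Hennum suggests elsewhere in this thread is the closest thing ML offers to the "shared memory" Eliot is asking for. A minimal hypothetical sketch follows; the field name is invented, and note Eliot's later finding that a field set in one app server was not visible from another in his tests, so this only helps when the pausing request and the submission loop run in the same scope.

```xquery
xquery version "1.0-ml";

(: Hypothetical sketch of the server-field approach: the pause request
   sets a named host-local field, and the submission loop polls it on
   each iteration. Field name and values are illustrative only. :)
let $_ := xdmp:set-server-field("job-42-paused", fn:true())
return
  if (xdmp:get-server-field("job-42-paused"))
  then "paused: stop submitting tasks"
  else "keep going"
```

Unlike database documents, server fields are not subject to transaction snapshot isolation, which is why they can act as a mutable flag within a long-running request.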
[MarkLogic Dev General] Error in Typeswitch Syntax Diagram
In the topic on typeswitch in this doc (ML 9 version): http://docs.marklogic.com/guide/xquery/langoverview#id_75915 The syntax diagram shows a “,” separator for the repeat line for “case” clauses. The comma should not be there. Cheers, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
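[Editorial sketch] To illustrate the corrected grammar: case clauses in a typeswitch are simply juxtaposed, with no comma between them. The element name below is invented for the example.

```xquery
xquery version "1.0-ml";

(: Case clauses follow one another with no "," separator. :)
declare function local:kind($node as node()) as xs:string {
  typeswitch ($node)
    case element(chapter) return "chapter element"
    case text() return "text node"
    case comment() return "comment"
    default return "something else"
};

local:kind(<chapter/>)  (: returns "chapter element" :)
```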
Re: [MarkLogic Dev General] Bug in XSLT and XQuery Reference Guide
Here’s the code I have: for $map in $details order by map:get($map, 'active') ascending, map:get($map, 'queued') ascending return $map Cheers, E. -- Eliot Kimber http://contrext.com On 11/29/17, 11:46 AM, "general-boun...@developer.marklogic.com on behalf of Geert Josten" wrote: Thanks, looks like you are right. Can you elaborate on the multiple expressions? Cheers, Geert On 11/29/17, 5:30 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: >I didn’t see a place to submit comments in the guide like you can in the >reference topics so I’m posting here. > >In http://docs.marklogic.com/guide/xquery/langoverview#id_11626, in the >section on the order-by clause, the syntax diagram shows the repeat >returning to before the “order by” keyword. > >The correct syntax should have the repeat returning *after* the “order >by” keyword and before the $varExpr > >That is, order by is: > >order by expression1, expression2 > >not order by expression1, order by expression2 > >I also didn’t see any examples of order-by clauses with multiple >expressions—that would be useful to have. > >Cheers, > >E. > >-- >Eliot Kimber >http://contrext.com > > > > >___ >General mailing list >General@developer.marklogic.com >Manage your subscription at: >http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Bug in XSLT and XQuery Reference Guide
I didn’t see a place to submit comments in the guide like you can in the reference topics so I’m posting here. In http://docs.marklogic.com/guide/xquery/langoverview#id_11626, in the section on the order-by clause, the syntax diagram shows the repeat returning to before the “order by” keyword. The correct syntax should have the repeat returning *after* the “order by” keyword and before the $varExpr That is, order by is: order by expression1, expression2 not order by expression1, order by expression2 I also didn’t see any examples of order-by clauses with multiple expressions—that would be useful to have. Cheers, E. -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
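[Editorial sketch] A self-contained example of an order-by clause with multiple order specs, which follow a single order by keyword separated by commas. The data is invented for illustration.

```xquery
xquery version "1.0-ml";

(: Multiple order specs after one "order by" keyword: sort primarily by
   @p, then by @n within equal @p values. :)
for $item in (<t n="b" p="2"/>, <t n="a" p="2"/>, <t n="c" p="1"/>)
order by xs:integer($item/@p) ascending,
         string($item/@n) ascending
return string($item/@n)
(: returns "c", "a", "b" :)
```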
Re: [MarkLogic Dev General] How to Do Equivalent of While true() Loop In ML?
I can see how this form of recursive function using xdmp:set() is better than for 1 to 1,000,000. I was trying to find a pure XQuery solution, that is, one that didn’t rely on xdmp:set(), so my recursion was: declare function local:handle-task($task, $tasks) { if (empty($task)) then () else let $do-submit := local:submit-job($task) return local:handle-task(head($tasks), tail($tasks)) } (I could of course just pass $tasks but I think the function signature is clearer if the thing that is acted on is passed as a separate parameter.) But beyond that I have another loop or recursion that polls the remote servers until a server becomes available. That recursion I think was where I was getting out of memory even though tail recursion optimization should have prevented it. So in my mind there’s still a question of whether or not a pure XQuery solution is possible with ML 9 or do I have to use xdmp:set()? Of course I have to do whatever will solve the problem but I like to avoid proprietary extensions whenever possible (otherwise, what’s the point of using standards). Cheers, Eliot -- Eliot Kimber http://contrext.com On 11/28/17, 10:39 AM, "general-boun...@developer.marklogic.com on behalf of John Snelson" wrote: You should use recursive functions for this kind of thing: declare function local:while($test, $body) { if($test()) then ($body(), local:while($test,$body)) else () }; let $tasks := ... return local:while( function() { exists($tasks) }, function() { submit-task(head($tasks)), xdmp:set($tasks,tail($tasks)) } ) MarkLogic's tail call optimization will mean that the local:while() function will use a constant amount of stack space. However in your specific example you really just want to execute a function on each member of a sequence. In that specific case you can use fn:map: fn:map(submit-task#1,$tasks) John On 27/11/17 16:56, Eliot Kimber wrote: > I have a client-server system where the client is spawning 100s of 1000s of jobs on the client. 
The client polls the servers to see when each server’s task queue is ready for more jobs. This all works fine. > > Logically this polling is a while-true() loop that will continue until either all the servers are offline or all the tasks to be submitted are consumed. > > In a procedural language this is trivial, but in XQuery 2 I’m not finding a way to do it that works. In XQuery 3 I could use the new iterate operator but that doesn’t seem to be available in MarkLogic 9. > > My first attempt was to use a recursive process, relying on tail recursion optimization to avoid blowing the stack buffer. That worked logically but I still ran into out-of-memory on the server at some point (around 200K jobs submitted) and it seems likely that it was runaway recursion doing it. > > So I tried using a simple loop with xdmp:set() to iterate over the tasks and use an exception to break out when all the tasks are done: > > try { > for $i in 1 to 100 (: i.e., loop forever :) > return if (empty($tasks)) > then error() > else (submit-task(head($tasks)), > xdmp:set($tasks, tail($tasks))) > } catch ($e) { > (: We’re done. :) > } > > Is there a better way to do this kind of looping forever? > > I’m also having a very strange behavior where in my new looping code I’m getting what I think must be a pending commit deadlock that I didn’t get in my recursive version of the code. I can trace the code to the xdmp:eval() that would commit an update to the task and that code never returns. > > Each task is a document that I update to reflect the details of the task’s status (start and end times, current processing status, etc.). Those updates are all done either in separately-run modules or via xdmp:eval(), so as far as I can tell there shouldn’t be any issues with uncommitted updates. I didn’t change anything in the logic that updates the task documents, only the loop that iterates over the tasks. 
> > Could it be that the use of xdmp:set() to modify the $tasks variable (a sequence of elements) would be causing some kind of commit lock? > > Thanks, > > Eliot > > -- > Eliot Kimber > http://contrext.com > > > > > ___ > General mailing list > General@developer.marklogic.com > Manage your subscription at: > http://developer.marklogic.com/mailman/listinfo/general --
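[Editorial sketch] John's while-combinator pattern from this exchange, filled out into a self-contained form. The local:submit-task function is a stand-in stub for the thread's real task-submission code; everything else mirrors the pattern as posted.

```xquery
xquery version "1.0-ml";

(: John Snelson's while combinator: the recursive call is in tail
   position, so MarkLogic's tail-call optimization keeps stack use
   constant regardless of iteration count. :)
declare function local:while(
  $test as function(*),
  $body as function(*)
) as item()* {
  if ($test()) then ($body(), local:while($test, $body)) else ()
};

(: Stand-in for the thread's real task submission. :)
declare function local:submit-task($task) {
  xdmp:log("submitting task " || xs:string($task))
};

let $tasks := 1 to 5
return
  local:while(
    function() { fn:exists($tasks) },
    function() {
      local:submit-task(fn:head($tasks)),
      xdmp:set($tasks, fn:tail($tasks))
    })
```

The closures capture $tasks, and xdmp:set() mutates it between iterations; that mutation is what makes the loop terminate, which is exactly the proprietary-extension tradeoff Eliot notes above.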
Re: [MarkLogic Dev General] How to Do Equivalent of While true() Loop In ML?
My error was using the list of elements as the control for my loop rather than some lookup key. This was causing read locks on the elements that then blocked my subsequent update attempts. I rewrote my looping code to limit reads of the elements to just those places where it was doing an update so that no read lock was held outside the updating code. My process is now almost completed for its full 500K+ task set, so memory issues resolved. Now I just need to fix a bug in my queue loading logic and it might just work for production… Cheers, E. -- Eliot Kimber http://contrext.com On 11/27/17, 2:50 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: I’ve tried turning on the lock tracing and I do see deadlocks both with the recursive version and the loop-with-set version, but with the loop-with-set version basically every task doc is locked (which is what I would expect), while with the recursive version I get little bursts of five or six deadlocks at a time but the code generally runs. But this does suggest that my code is creating unexpected locks that I need to resolve. Cheers, E. -- Eliot Kimber http://contrext.com On 11/27/17, 11:45 AM, "general-boun...@developer.marklogic.com on behalf of Will Thompson" wrote: Eliot, Is the controller/while-loop transaction read-only (i.e.: is xdmp:request-timestamp() nonempty)? If it is, then I think you can be sure it's not holding locks. Otherwise, I would restructure that part of the application so that any transaction responsible for dispatching jobs doesn't make any updates. Generally, you don't want a long-running update transaction to touch lots of documents. If you turn on debug logging, ML will report to the error log when it detects a deadlock (and randomly kills and retries one of the deadlocking transactions). 
There is also a lock trace event you can enable to get detailed output about which transactions are holding locks and which ones are waiting on them (See: https://help.marklogic.com/knowledgebase/article/View/387/0/understanding-the-lock-trace-diagnostic-trace-event). All of the reporting IIRC is based on transaction IDs, so you generally have to do your own logging elsewhere to identify which IDs are associated with which transactions. As you might expect, this can get kind of hairy. In the past I have used a task server job to check for a condition, and if it hasn't been met, sleep for a few hundred ms and respawn. Similar behavior could also be accomplished with triggers or with CPF, but both are probably overkill for your case. -Will > On Nov 27, 2017, at 10:59 AM, William Sawyer wrote: > > You could recursively spawn or setup a schedule task to run every minute or faster if needed. > > -Will > > On Mon, Nov 27, 2017 at 9:56 AM, Eliot Kimber wrote: > I have a client-server system where the client is spawning 100s of 1000s of jobs on the client. The client polls the servers to see when each server’s task queue is ready for more jobs. This all works fine. > > Logically this polling is a while-true() loop that will continue until either all the servers are offline or all the tasks to be submitted are consumed. > > In a procedural language this is trivial, but in XQuery 2 I’m not finding a way to do it that works. In XQuery 3 I could use the new iterate operator but that doesn’t seem to be available in MarkLogic 9. > > My first attempt was to use a recursive process, relying on tail recursion optimization to avoid blowing the stack buffer. That worked logically but I still ran into out-of-memory on the server at some point (around 200K jobs submitted) and it seems likely that it was runaway recursion doing it. 
> > So I tried using a simple loop with xdmp:set() to iterate over the tasks and use an exception to break out when all the tasks are done: > > try { > for $i in 1 to 100 (: i.e., loop forever :) > if (empty($tasks)) > then error() > else submit-task(head($tasks)) > xdmp:set($tasks, tail($tasks)) > } catch ($e) { > (: We’re done. ( > } > > Is there a better way to do this kind of looping forever? > > I’m also having a very strange behavior where in my new looping code I’m getting what I think must be a pending commit deadlock that I didn’t get in my recursive version of the code. I can trace the code to the xdmp:eval() that would commit an update to the
Re: [MarkLogic Dev General] How to Do Equivalent of While true() Loop In ML?
I’ve tried turning on the lock tracing and I do see deadlocks both with the recursive version and the loop-with-set version, but with the loop-with-set version basically every task doc is locked (which is what I would expect), while with the recursive version I get little bursts of five or six deadlocks at a time but the code generally runs. But this does suggest that my code is creating unexpected locks that I need to resolve. Cheers, E. -- Eliot Kimber http://contrext.com On 11/27/17, 11:45 AM, "general-boun...@developer.marklogic.com on behalf of Will Thompson" wrote: Eliot, Is the controller/while-loop transaction read-only (i.e.: is xdmp:request-timestamp() nonempty)? If it is, then I think you can be sure it's not holding locks. Otherwise, I would restructure that part of the application so that any transaction responsible for dispatching jobs doesn't make any updates. Generally, you don't want a long-running update transaction to touch lots of documents. If you turn on debug logging, ML will report to the error log when it detects a deadlock (and randomly kills and retries one of the deadlocking transactions). There is also a lock trace event you can enable to get detailed output about which transactions are holding locks and which ones are waiting on them (See: https://help.marklogic.com/knowledgebase/article/View/387/0/understanding-the-lock-trace-diagnostic-trace-event). All of the reporting IIRC is based on transaction IDs, so you generally have to do your own logging elsewhere to identify which IDs are associated with which transactions. As you might expect, this can get kind of hairy. In the past I have used a task server job to check for a condition, and if it hasn't been met, sleep for a few hundred ms and respawn. Similar behavior could also be accomplished with triggers or with CPF, but both are probably overkill for your case. 
-Will > On Nov 27, 2017, at 10:59 AM, William Sawyer wrote: > > You could recursively spawn or setup a schedule task to run every minute or faster if needed. > > -Will > > On Mon, Nov 27, 2017 at 9:56 AM, Eliot Kimber wrote: > I have a client-server system where the client is spawning 100s of 1000s of jobs on the client. The client polls the servers to see when each server’s task queue is ready for more jobs. This all works fine. > > Logically this polling is a while-true() loop that will continue until either all the servers are offline or all the tasks to be submitted are consumed. > > In a procedural language this is trivial, but in XQuery 2 I’m not finding a way to do it that works. In XQuery 3 I could use the new iterate operator but that doesn’t seem to be available in MarkLogic 9. > > My first attempt was to use a recursive process, relying on tail recursion optimization to avoid blowing the stack buffer. That worked logically but I still ran into out-of-memory on the server at some point (around 200K jobs submitted) and it seems likely that it was runaway recursion doing it. > > So I tried using a simple loop with xdmp:set() to iterate over the tasks and use an exception to break out when all the tasks are done: > > try { > for $i in 1 to 100 (: i.e., loop forever :) > if (empty($tasks)) > then error() > else submit-task(head($tasks)) > xdmp:set($tasks, tail($tasks)) > } catch ($e) { > (: We’re done. ( > } > > Is there a better way to do this kind of looping forever? > > I’m also having a very strange behavior where in my new looping code I’m getting what I think must be a pending commit deadlock that I didn’t get in my recursive version of the code. I can trace the code to the xdmp:eval() that would commit an update to the task and that code never returns. > > Each task is a document that I update to reflect the details of the task’s status (start and end times, current processing status, etc.). 
Those updates are all done either in separately-run modules or via xdmp:eval(), so as far as I can tell there shouldn’t be any issues with uncommitted updates. I didn’t change anything in the logic that updates the task documents, only the loop that iterates over the tasks. > > Could it be that the use of xdmp:set() to modify the $tasks variable (a sequence of elements) would be causing some kind of commit lock? > > Thanks, > > Eliot > > -- > Eliot Kimber > http://contrext.com > > > > > ___ > General mailing list > General@developer.marklogic.com >
Re: [MarkLogic Dev General] How to Do Equivalent of While true() Loop In ML?
It looks like my deadlock issue is, not surprisingly, my own bug somewhere—the use of xdmp:set on a sequence of elements appears to be a red herring.

Cheers,

E.
--
Eliot Kimber
http://contrext.com

From: on behalf of Eliot Kimber
Reply-To: MarkLogic Developer Discussion
Date: Monday, November 27, 2017 at 11:24 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to Do Equivalent of While true() Loop In ML?

It does appear that using xdmp:set() on my sequence of elements leads to the apparent commit deadlock. I reworked my code to iterate over a sequence of task IDs and then in each iteration I fetch the task element using that ID. With that change, my loop succeeds.

Using a scheduled job in this case doesn’t really work because I need to load up the client with as many job-submission tasks as there are available threads in the task server to make sure the servers are fully loaded (most jobs complete in a few seconds). So my approach is to divide the total jobs among N tasks in the task server. I do have a scheduled job to recreate these tasks, for example, in the case of a server restart.

I could probably adjust my system so that the tasks only do a subset of the total jobs and I depend on the scheduled job to create a new batch of job-submission tasks, but that seems unnecessarily complicated.

Cheers,

E.
--
Eliot Kimber
http://contrext.com

From: on behalf of William Sawyer
Reply-To: MarkLogic Developer Discussion
Date: Monday, November 27, 2017 at 10:59 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to Do Equivalent of While true() Loop In ML?

You could recursively spawn or set up a scheduled task to run every minute or faster if needed.

-Will

On Mon, Nov 27, 2017 at 9:56 AM, Eliot Kimber wrote:

I have a client-server system where the client is spawning 100s of 1000s of jobs on the client. The client polls the servers to see when each server’s task queue is ready for more jobs. This all works fine.
Logically this polling is a while-true() loop that will continue until either all the servers are offline or all the tasks to be submitted are consumed.

In a procedural language this is trivial, but in XQuery 2 I’m not finding a way to do it that works. In XQuery 3 I could use the new iterate operator but that doesn’t seem to be available in MarkLogic 9.

My first attempt was to use a recursive process, relying on tail recursion optimization to avoid blowing the stack buffer. That worked logically but I still ran into out-of-memory on the server at some point (around 200K jobs submitted) and it seems likely that it was runaway recursion doing it.

So I tried using a simple loop with xdmp:set() to iterate over the tasks and use an exception to break out when all the tasks are done:

    try {
      for $i in 1 to 100 (: i.e., loop forever :)
      return
        if (empty($tasks))
        then error()
        else (
          submit-task(head($tasks)),
          xdmp:set($tasks, tail($tasks))
        )
    } catch ($e) {
      (: We’re done. :)
    }

Is there a better way to do this kind of looping forever?

I’m also having a very strange behavior where in my new looping code I’m getting what I think must be a pending commit deadlock that I didn’t get in my recursive version of the code. I can trace the code to the xdmp:eval() that would commit an update to the task and that code never returns.

Each task is a document that I update to reflect the details of the task’s status (start and end times, current processing status, etc.). Those updates are all done either in separately-run modules or via xdmp:eval(), so as far as I can tell there shouldn’t be any issues with uncommitted updates. I didn’t change anything in the logic that updates the task documents, only the loop that iterates over the tasks.

Could it be that the use of xdmp:set() to modify the $tasks variable (a sequence of elements) would be causing some kind of commit lock?
Thanks,

Eliot
--
Eliot Kimber
http://contrext.com

___
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
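For the archive: the ID-based rework described in this thread (iterate over lightweight task IDs and fetch each task document fresh inside the loop, instead of carrying a sequence of task elements across xdmp:set() calls) might look roughly like the following sketch. The "tasks" collection and the local:submit-task function are hypothetical stand-ins, and cts:uris() assumes the URI lexicon is enabled:

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in for the real job-submission logic. :)
declare function local:submit-task($task as element(task)) as empty-sequence()
{
  xdmp:log(fn:concat("submitting ", xdmp:node-uri($task)))
};

(: Iterate over lightweight URIs; re-fetch each task document inside the
   iteration rather than holding node references across iterations. :)
for $uri in cts:uris((), (), cts:collection-query("tasks"))
return local:submit-task(fn:doc($uri)/task)
```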
Re: [MarkLogic Dev General] How to Do Equivalent of While true() Loop In ML?
It does appear that using xdmp:set() on my sequence of elements leads to the apparent commit deadlock. I reworked my code to iterate over a sequence of task IDs and then in each iteration I fetch the task element using that ID. With that change, my loop succeeds.

Using a scheduled job in this case doesn’t really work because I need to load up the client with as many job-submission tasks as there are available threads in the task server to make sure the servers are fully loaded (most jobs complete in a few seconds). So my approach is to divide the total jobs among N tasks in the task server. I do have a scheduled job to recreate these tasks, for example, in the case of a server restart.

I could probably adjust my system so that the tasks only do a subset of the total jobs and I depend on the scheduled job to create a new batch of job-submission tasks, but that seems unnecessarily complicated.

Cheers,

E.
--
Eliot Kimber
http://contrext.com

From: on behalf of William Sawyer
Reply-To: MarkLogic Developer Discussion
Date: Monday, November 27, 2017 at 10:59 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to Do Equivalent of While true() Loop In ML?

You could recursively spawn or set up a scheduled task to run every minute or faster if needed.

-Will

On Mon, Nov 27, 2017 at 9:56 AM, Eliot Kimber wrote:

I have a client-server system where the client is spawning 100s of 1000s of jobs on the client. The client polls the servers to see when each server’s task queue is ready for more jobs. This all works fine.

Logically this polling is a while-true() loop that will continue until either all the servers are offline or all the tasks to be submitted are consumed.

In a procedural language this is trivial, but in XQuery 2 I’m not finding a way to do it that works. In XQuery 3 I could use the new iterate operator but that doesn’t seem to be available in MarkLogic 9.
My first attempt was to use a recursive process, relying on tail recursion optimization to avoid blowing the stack buffer. That worked logically but I still ran into out-of-memory on the server at some point (around 200K jobs submitted) and it seems likely that it was runaway recursion doing it.

So I tried using a simple loop with xdmp:set() to iterate over the tasks and use an exception to break out when all the tasks are done:

    try {
      for $i in 1 to 100 (: i.e., loop forever :)
      return
        if (empty($tasks))
        then error()
        else (
          submit-task(head($tasks)),
          xdmp:set($tasks, tail($tasks))
        )
    } catch ($e) {
      (: We’re done. :)
    }

Is there a better way to do this kind of looping forever?

I’m also having a very strange behavior where in my new looping code I’m getting what I think must be a pending commit deadlock that I didn’t get in my recursive version of the code. I can trace the code to the xdmp:eval() that would commit an update to the task and that code never returns.

Each task is a document that I update to reflect the details of the task’s status (start and end times, current processing status, etc.). Those updates are all done either in separately-run modules or via xdmp:eval(), so as far as I can tell there shouldn’t be any issues with uncommitted updates. I didn’t change anything in the logic that updates the task documents, only the loop that iterates over the tasks.

Could it be that the use of xdmp:set() to modify the $tasks variable (a sequence of elements) would be causing some kind of commit lock?

Thanks,

Eliot
--
Eliot Kimber
http://contrext.com

___
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] How to Do Equivalent of While true() Loop In ML?
I have a client-server system where the client is spawning 100s of 1000s of jobs on the client. The client polls the servers to see when each server’s task queue is ready for more jobs. This all works fine.

Logically this polling is a while-true() loop that will continue until either all the servers are offline or all the tasks to be submitted are consumed.

In a procedural language this is trivial, but in XQuery 2 I’m not finding a way to do it that works. In XQuery 3 I could use the new iterate operator but that doesn’t seem to be available in MarkLogic 9.

My first attempt was to use a recursive process, relying on tail recursion optimization to avoid blowing the stack buffer. That worked logically but I still ran into out-of-memory on the server at some point (around 200K jobs submitted) and it seems likely that it was runaway recursion doing it.

So I tried using a simple loop with xdmp:set() to iterate over the tasks and use an exception to break out when all the tasks are done:

    try {
      for $i in 1 to 100 (: i.e., loop forever :)
      return
        if (empty($tasks))
        then error()
        else (
          submit-task(head($tasks)),
          xdmp:set($tasks, tail($tasks))
        )
    } catch ($e) {
      (: We’re done. :)
    }

Is there a better way to do this kind of looping forever?

I’m also having a very strange behavior where in my new looping code I’m getting what I think must be a pending commit deadlock that I didn’t get in my recursive version of the code. I can trace the code to the xdmp:eval() that would commit an update to the task and that code never returns.

Each task is a document that I update to reflect the details of the task’s status (start and end times, current processing status, etc.). Those updates are all done either in separately-run modules or via xdmp:eval(), so as far as I can tell there shouldn’t be any issues with uncommitted updates. I didn’t change anything in the logic that updates the task documents, only the loop that iterates over the tasks.
Could it be that the use of xdmp:set() to modify the $tasks variable (a sequence of elements) would be causing some kind of commit lock? Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Ensuring That Tail Recursion Optimization Will Be Applied
In ML 9, I’m using recursive functions to process an arbitrarily large set of items where in a procedural language I would use a while true() loop. The number of items can be large, so tail recursion optimization has to be in place or I’ll eventually blow the call stack.

My question: How do I ensure that my functions are constructed so that tail recursion optimization will be applied? A search on “recursion” or “tail recursion” didn’t reveal anything on the ML docs site.

Thanks,

Eliot
--
Eliot Kimber
http://contrext.com

___
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
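As far as I know, MarkLogic’s documentation doesn’t spell out when tail-call optimization applies, but the usual rule of thumb for XQuery engines that optimize tail calls is that the recursive call must be the entire value of the final branch of the function body, with no work left to do after it returns. A sketch of the two shapes (illustrative only; whether the engine actually optimizes the first form is an assumption):

```xquery
xquery version "1.0-ml";

(: Tail-recursive shape: the recursive call is the whole result of the
   else branch, so nothing remains to compute after it returns. :)
declare function local:count-items($items as item()*, $acc as xs:integer) as xs:integer
{
  if (fn:empty($items))
  then $acc                                          (: base case :)
  else local:count-items(fn:tail($items), $acc + 1)  (: call in tail position :)
};

(: NOT tail-recursive: the "1 +" forces the engine to keep each stack
   frame alive until the recursion bottoms out. :)
declare function local:count-items-bad($items as item()*) as xs:integer
{
  if (fn:empty($items))
  then 0
  else 1 + local:count-items-bad(fn:tail($items))
};

local:count-items(1 to 5, 0)  (: 5 :)
```

The general technique is to thread any pending work through an accumulator parameter, as $acc does here, so the recursive call can be the last thing the function does.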
Re: [MarkLogic Dev General] [RESOLVED] Spawned Task Appears to Block Other Threads
I moved my response handler to a different web app than the web app that starts the job-submission task, but I was still having the issue where the response handler didn’t handle responses (or didn’t get responses; it’s hard for me to tell which) until the job-submission task was canceled.

I realized that my problem was that the initial job submission was updating the job record for each job, but I was doing the update as part of the main processing, rather than using eval(), so the commit wasn’t done until the task ended. The response handler also wants to update the job record, but because there is a pending commit, it is blocked. By having the job submission do the update via eval(), the commit is done immediately and everything works.

So the lesson is: you have to really understand the implications of updates and commits or you will go astray. Or maybe “concurrency is always hard”.

Thanks,

Eliot
--
Eliot Kimber
http://contrext.com

On 11/10/17, 9:39 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

Yes, the client process is started from a Web app, so I think your analysis is correct. I will move the response handling to a separate Web app—probably should have done that from the start.

Thanks,
Eliot
--
Eliot Kimber
http://contrext.com

On 11/9/17, 11:46 PM, "general-boun...@developer.marklogic.com on behalf of Geert Josten" wrote:

Hi Eliot,

I think you kicked off your watcher job with an HTTP request, and it keeps the port open until it finishes. Only one thread can use the port at the same time. Use a different port for task response traffic, or consider running your watcher as a scheduled task.

Not super robust, and probably not used in production, but I did write an alternative queue for MarkLogic. It might give you some ideas...
https://github.com/grtjn/ml-queue

Cheers,
Geert

On 11/10/17, 1:06 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

>I have a system where I have a “client” ML server that submits jobs to a
>set of remote ML servers, checking their task queues and keeping each
>server’s queue at a max of 100 queued items (the remote servers could go
>away without notice so the client needs to be able to restart tasks and
>not have too many things queued up that would just have to be resubmitted).
>
>The remote tasks then talk back to the client to report status and return
>their final results.
>
>My job submission code uses recursive functions to iterate over the set of
>tasks to be submitted, checking for free remote queue slots via the ML
>REST API and submitting jobs as the queues empty. This code is spawned
>into a separate task in the task server. It uses xdmp:sleep(1000) to
>pause between checking the job queues.
>
>This all works fine, in that my jobs are submitted correctly and the
>remote queues fill up.
>
>However, as long as the job-submission task in the task server is
>running, the HTTP app that handles the REST calls from the remote servers
>is blocked (which blocks the remote jobs, which are of course waiting for
>responses from the client).
>
>If I kill the task server task, then the remote responses are handled as
>I would expect.
>
>My question: Why would the task server task block the other app? There
>must be something I’m doing or not doing but I have no idea what it might
>be.
>
>Thanks,
>
>Eliot
>--
>Eliot Kimber
>http://contrext.com
>
>___
>General mailing list
>General@developer.marklogic.com
>Manage your subscription at:
>http://developer.marklogic.com/mailman/listinfo/general

___
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
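The fix described in this thread (committing each job-record update in its own transaction so the long-running task isn’t holding a pending commit that blocks other requests) is conventionally done with xdmp:eval in different-transaction isolation. A sketch, with a hypothetical document URI and element structure:

```xquery
xquery version "1.0-ml";

(: Run the job-record update in its own transaction so it commits
   immediately, instead of at the end of the long-running task. :)
xdmp:eval(
  'declare variable $uri external;
   declare variable $status external;
   xdmp:node-replace(fn:doc($uri)/job/status,
                     element status { $status })',
  (xs:QName("uri"), "/jobs/job-123.xml",   (: hypothetical URI :)
   xs:QName("status"), "submitted"),
  <options xmlns="xdmp:eval">
    <isolation>different-transaction</isolation>
    <prevent-deadlocks>true</prevent-deadlocks>
  </options>)
```

Note that prevent-deadlocks makes the eval fail fast if the calling transaction already holds an update lock on the same document, which is exactly the self-deadlock scenario described above.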
Re: [MarkLogic Dev General] Spawned Task Appears to Block Other Threads
Yes, the client process is started from a Web app, so I think your analysis is correct. I will move the response handling to a separate Web app—probably should have done that from the start.

Thanks,

Eliot
--
Eliot Kimber
http://contrext.com

On 11/9/17, 11:46 PM, "general-boun...@developer.marklogic.com on behalf of Geert Josten" wrote:

Hi Eliot,

I think you kicked off your watcher job with an HTTP request, and it keeps the port open until it finishes. Only one thread can use the port at the same time. Use a different port for task response traffic, or consider running your watcher as a scheduled task.

Not super robust, and probably not used in production, but I did write an alternative queue for MarkLogic. It might give you some ideas...

https://github.com/grtjn/ml-queue

Cheers,
Geert

On 11/10/17, 1:06 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

>I have a system where I have a “client” ML server that submits jobs to a
>set of remote ML servers, checking their task queues and keeping each
>server’s queue at a max of 100 queued items (the remote servers could go
>away without notice so the client needs to be able to restart tasks and
>not have too many things queued up that would just have to be resubmitted).
>
>The remote tasks then talk back to the client to report status and return
>their final results.
>
>My job submission code uses recursive functions to iterate over the set of
>tasks to be submitted, checking for free remote queue slots via the ML
>REST API and submitting jobs as the queues empty. This code is spawned
>into a separate task in the task server. It uses xdmp:sleep(1000) to
>pause between checking the job queues.
>
>This all works fine, in that my jobs are submitted correctly and the
>remote queues fill up.
>
>However, as long as the job-submission task in the task server is
>running, the HTTP app that handles the REST calls from the remote servers
>is blocked (which blocks the remote jobs, which are of course waiting for
>responses from the client).
>
>If I kill the task server task, then the remote responses are handled as
>I would expect.
>
>My question: Why would the task server task block the other app? There
>must be something I’m doing or not doing but I have no idea what it might
>be.
>
>Thanks,
>
>Eliot
>--
>Eliot Kimber
>http://contrext.com
>
>___
>General mailing list
>General@developer.marklogic.com
>Manage your subscription at:
>http://developer.marklogic.com/mailman/listinfo/general

___
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Spawned Task Appears to Block Other Threads
I have a system where I have a “client” ML server that submits jobs to a set of remote ML servers, checking their task queues and keeping each server’s queue at a max of 100 queued items (the remote servers could go away without notice, so the client needs to be able to restart tasks and not have too many things queued up that would just have to be resubmitted).

The remote tasks then talk back to the client to report status and return their final results.

My job submission code uses recursive functions to iterate over the set of tasks to be submitted, checking for free remote queue slots via the ML REST API and submitting jobs as the queues empty. This code is spawned into a separate task in the task server. It uses xdmp:sleep(1000) to pause between checking the job queues.

This all works fine, in that my jobs are submitted correctly and the remote queues fill up.

However, as long as the job-submission task in the task server is running, the HTTP app that handles the REST calls from the remote servers is blocked (which blocks the remote jobs, which are of course waiting for responses from the client).

If I kill the task server task, then the remote responses are handled as I would expect.

My question: Why would the task server task block the other app? There must be something I’m doing or not doing but I have no idea what it might be.

Thanks,

Eliot
--
Eliot Kimber
http://contrext.com

___
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] [resolved] What Might Cause Documents to Silently Not Be Created?
I’ve found my bug: a bad assumption about the data. While the URIs of all my input documents are unique, their filenames are not, and I was using just the filename as the basis for my task record URIs.

I was in the process of posting my code, and that led me to verify that my assumptions were correct (because I knew somebody would challenge them) and, what do you know, they weren’t. So the lesson for the day is: always double-check your assumptions about the data. But I also learned something about uncatchable exceptions, so that’s good too.

Thanks for everyone’s help.

Cheers,

Eliot
--
Eliot Kimber
http://contrext.com

On 11/9/17, 10:11 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I’m actually not doing anything with the HTTP response. I get the response but currently don’t examine it (in fact I have a FIXME in the code to add handling of non-200 response codes, but for now it’s basically fire and forget—the request to the remote server ultimately spawns a task on that server, so the only non-success response would be one where the xdmp:spawn() on the remote server failed, which is unlikely to happen under normal operating conditions). I’m also careful to always turn off auto mapping, which is as evil as evil can be.

There were no relevant errors in the ErrorLog.txt. The code is running on the task server and I do see all my expected (success) messages there.

Looking at the uncatchable exceptions article, the only possible issue would be failures during commit that result in uncatchable exceptions. Per the article, I’m now using eval() to do the document-insert—that should allow any commit-time failure exception to now be caught.
Otherwise, none of the conditions that you suggested could hold: document URIs should be unique (because they reflect the URIs of the source items, each of which is the root node of its own document), there are no permissions in effect, I’m only creating a few 100 tasks in the task queue (each task then processes a 1000 input items, so 500K items means 500 tasks), I’m not spawning in update mode (but if that was the problem then it should fail for all attempts, not just a few of them). Cheers, E. Eliot Kimber On 11/9/17, 10:00 AM, "general-boun...@developer.marklogic.com on behalf of Will Thompson" wrote: Eliot, When you make the remote HTTP call, are you using one of the xdmp:http-XYZ functions? Since those functions return a payload describing the response condition and don't throw exceptions for most errors, is it possible that an HTTP response error condition is not being handled, resulting in inserting an empty sequence instead of a document? In the default case where function mapping is turned on, inserting an empty sequence will result in not calling xdmp:document-insert at all. You could test to see if that's happening by disabling function mapping, which would cause an exception to be raised instead. -Will > On Nov 8, 2017, at 5:25 PM, Eliot Kimber wrote: > > Using ML 9: > > I have a process that quickly creates a large number of small documents, one for each item in a set of input items. > > My code is basically: > > 1. Log that I’m about to act on the input item > 2. Act on the input item (send the input item to a remote HTTP end point) > 3. Create a new doc reflecting the input item I just acted on > > This code is within a try/catch and I log the exception, so I should know if there are any exceptions during this process by examining the log. > > I’m processing about 500K input items, with the processing spread over the 16 threads of my task server. So there are 16 tasks quickly writing these docs concurrently. 
> I know the exact count of the input items and I get that count in the log, so I know that I’m actually processing all the items I should be.
>
> However, if I subsequently count the documents created in step 3 I’m short by about 1500, meaning that not all the docs got created, which should not be able to happen unless there was an exception between the log message and the document-insert() call, but I’m not finding any exceptions or other errors reported in the log.
>
> My question: is there anything that would cause docs to silently not get created under this kind of heavy load? I would hope not but just wanted to make sure.
>
> I’m assuming this issue is my bug somewhere, but the code is pretty simple and I’m not seeing any obvious way the documents could not get created without a corresponding exception report.
Re: [MarkLogic Dev General] What Might Cause Documents to Silently Not Be Created?
I’m actually not doing anything with the HTTP response. I get the response but currently don’t examine it (in fact I have a FIXME in the code to add handling of non-200 response codes, but for now it’s basically fire and forget—the request to the remote server ultimately spawns a task on that server, so the only non-success response would be one where the xdmp:spawn() on the remote server failed, which is unlikely to happen under normal operating conditions). I’m also careful to always turn off auto mapping, which is as evil as evil can be.

There were no relevant errors in the ErrorLog.txt. The code is running on the task server and I do see all my expected (success) messages there.

Looking at the uncatchable exceptions article, the only possible issue would be failures during commit that result in uncatchable exceptions. Per the article, I’m now using eval() to do the document-insert—that should allow any commit-time failure exception to now be caught.

Otherwise, none of the conditions that you suggested could hold: document URIs should be unique (because they reflect the URIs of the source items, each of which is the root node of its own document), there are no permissions in effect, I’m only creating a few 100 tasks in the task queue (each task then processes a 1000 input items, so 500K items means 500 tasks), and I’m not spawning in update mode (but if that was the problem then it should fail for all attempts, not just a few of them).

Cheers,

E.

Eliot Kimber

On 11/9/17, 10:00 AM, "general-boun...@developer.marklogic.com on behalf of Will Thompson" wrote:

Eliot,

When you make the remote HTTP call, are you using one of the xdmp:http-XYZ functions? Since those functions return a payload describing the response condition and don't throw exceptions for most errors, is it possible that an HTTP response error condition is not being handled, resulting in inserting an empty sequence instead of a document?
In the default case where function mapping is turned on, inserting an empty sequence will result in not calling xdmp:document-insert at all. You could test to see if that's happening by disabling function mapping, which would cause an exception to be raised instead. -Will > On Nov 8, 2017, at 5:25 PM, Eliot Kimber wrote: > > Using ML 9: > > I have a process that quickly creates a large number of small documents, one for each item in a set of input items. > > My code is basically: > > 1. Log that I’m about to act on the input item > 2. Act on the input item (send the input item to a remote HTTP end point) > 3. Create a new doc reflecting the input item I just acted on > > This code is within a try/catch and I log the exception, so I should know if there are any exceptions during this process by examining the log. > > I’m processing about 500K input items, with the processing spread over the 16 threads of my task server. So there are 16 tasks quickly writing these docs concurrently. > > I know the exact count of the input items and I get that count in the log, so I know that I’m actually processing all the items I should be. > > However, if I subsequently count the documents created in step 3 I’m short by about 1500, meaning that not all the docs got created, which should not be able to happen unless there was an exception between the log message and the document-insert() call, but I’m not finding any exceptions or other errors reported in the log. > > My question: is there anything that would cause docs to silently not get created under this kind of heavy-load? I would hope not but just wanted to make sure. > > I’m assuming this issue is my bug somewhere, but the code is pretty simple and I’m not seeing any obvious way the documents could not get created without a corresponding exception report. 
> Thanks,
>
> Eliot
> --
> Eliot Kimber
> http://contrext.com
>
> ___
> General mailing list
> General@developer.marklogic.com
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
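Will’s function-mapping point can be demonstrated in isolation. With function mapping on (the 1.0-ml default), passing an empty sequence to a parameter declared as a single item maps the call over zero items, so the function body, and any document-insert inside it, silently never runs; disabling mapping turns the same call into an XDMP-AS type error. A minimal sketch (the local:save function is illustrative, not from the original code):

```xquery
xquery version "1.0-ml";

(: Uncomment the next line to get an XDMP-AS type error instead of a
   silent no-op when $maybe-doc is empty: :)
(: declare option xdmp:mapping "false"; :)

declare function local:save($doc as element()) as empty-sequence()
{
  (: In real code this would be an xdmp:document-insert() call. :)
  xdmp:log(fn:concat("inserting ", xdmp:describe($doc)))
};

(: e.g., an unhandled HTTP error condition left us with nothing :)
let $maybe-doc := ()
return local:save($maybe-doc)
(: With mapping on, the call executes zero times: no insert, no error. :)
```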
[MarkLogic Dev General] What Might Cause Documents to Silently Not Be Created?
Using ML 9: I have a process that quickly creates a large number of small documents, one for each item in a set of input items. My code is basically: 1. Log that I’m about to act on the input item 2. Act on the input item (send the input item to a remote HTTP end point) 3. Create a new doc reflecting the input item I just acted on This code is within a try/catch and I log the exception, so I should know if there are any exceptions during this process by examining the log. I’m processing about 500K input items, with the processing spread over the 16 threads of my task server. So there are 16 tasks quickly writing these docs concurrently. I know the exact count of the input items and I get that count in the log, so I know that I’m actually processing all the items I should be. However, if I subsequently count the documents created in step 3 I’m short by about 1500, meaning that not all the docs got created, which should not be able to happen unless there was an exception between the log message and the document-insert() call, but I’m not finding any exceptions or other errors reported in the log. My question: is there anything that would cause docs to silently not get created under this kind of heavy-load? I would hope not but just wanted to make sure. I’m assuming this issue is my bug somewhere, but the code is pretty simple and I’m not seeing any obvious way the documents could not get created without a corresponding exception report. Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] exact match using regex
The anchors mean that the expression must match the entirety of the input string, so “^re$” can only match the input string “re”. If you want to match only the blank-delimited token “re” then you would want something like:

    "(^|\s)re(\s|$)"

That is, match “re” preceded by the start of the input or whitespace, and followed by whitespace or the end of the input.

Or you could tokenize the input and then use the equals operator:

    tokenize("I am learning regex", " ") = ("re")

Remember that the “=” operator is a sequence comparison, so if any member of the left-hand sequence equals any member of the right-hand sequence, it resolves to true().

Cheers,

Eliot

Eliot Kimber
http://contrext.com

On 10/13/17, 2:41 PM, "general-boun...@developer.marklogic.com on behalf of vikas.sin...@cognizant.com" wrote:

Thanks for your reply, but adding the metacharacters is not working as I expected. This statement returns false (expected): fn:matches("I am learning regex","^re$") But this statement also returns false (unexpected): fn:matches("I am learning re","^re$") I am expecting true for that one.

-Original Message-
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Christopher Hamlin
Sent: Friday, October 13, 2017 3:26 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] exact match using regex

You can use anchors, as fn:matches("I am learning regex","^re$")

Section 7.6.2 fn:matches of https://www.w3.org/TR/xpath-functions/#flags says:

Unless the metacharacters ^ and $ are used as anchors, the string is considered to match the pattern if any substring matches the pattern. But if anchors are used, the anchors must match the start/end of the string (in string mode), or the start/end of a line (in multiline mode).

Note: This is different from the behavior of patterns in [XML Schema Part 2: Datatypes Second Edition], where regular expressions are implicitly anchored.
On Fri, Oct 13, 2017 at 3:19 PM, wrote: > Hi All, > > > > How do I match an exact word using fn:matches and a regex? > > > > Example: fn:matches("I am learning regex", "re") > > The above statement returns true as it matches "regex", but I want it to match only when the string contains the exact keyword. > > > > Regards, > > Vikas Singh > > > > This e-mail and any files transmitted with it are for the sole use of > the intended recipient(s) and may contain confidential and privileged > information. If you are not the intended recipient(s), please reply to > the sender and destroy all copies of the original message. Any > unauthorized review, use, disclosure, dissemination, forwarding, > printing or copying of this email, and/or any action taken in reliance > on the contents of this e-mail is strictly prohibited and may be > unlawful. Where permitted by applicable law, this e-mail and other > e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
Re: [MarkLogic Dev General] How To Detect Task Time Limit Exceeded Failures?
In fact, now that I look for it, that’s already happening in my code; I just didn’t realize it. So problem solved. Cheers, E. -- Eliot Kimber http://contrext.com On 10/7/17, 8:18 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: I can certainly experiment with that. Cheers, E. -- Eliot Kimber http://contrext.com On 10/7/17, 7:41 AM, "general-boun...@developer.marklogic.com on behalf of Geert Josten" wrote: Hi Eliot, I heard the other day that it should be possible to capture such timeouts with a try/catch within the code itself. That gives an extra 10-second delay, which might be sufficient to send out an alert email or raise some other flag. After those few extra seconds, the timeout gets rethrown if you don’t finish in time. Might be worth investigating? Cheers, Geert On 10/7/17, 12:10 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: >Using current ML 9: > >I’ve set up a little client-server application where the client spawns a >large number of tasks on a remote cluster. Each remote task reports its >status back to the client via HTTP. > >However, if one of the tasks times out in the Task Server there’s no way >for it to report its own failure and there doesn’t seem to be anything >else other than the task server that can detect the failure and report it. > >Is there any built-in mechanism by which a task time limit exceeded >failure can be detected in a way that would allow me to report back >to the calling client? For example, something that gets the task’s >current call stack at the time of failure, which would give me the info I >need to report back to the calling client.
> >Unfortunately, the code I’m running in these tasks is pre-existing >processing that I’m building this remote processing around so I can’t >easily do something like provide a heartbeat signal for each running task >that a separate process could poll in order to detect terminated >processes, although I’m guessing that’s the most likely solution now that >I think about it. > >I do report to the client when each task starts so I guess I could >presume that if a task hasn’t finished some time after the configured max >time limit that it is presumed to have failed. > >Thanks, > >Eliot >-- >Eliot Kimber >http://contrext.com
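Geert's try/catch suggestion might be sketched like this; the error codes tested for, the callback endpoint, and the placement inside the task body are my assumptions, not verified behavior:

```xquery
xquery version "1.0-ml";
declare namespace error = "http://marklogic.com/xdmp/error";

try {
  (: ... the pre-existing long-running task body goes here ... :)
  xdmp:log("work finished")
} catch ($e) {
  if ($e/error:code = ("XDMP-EXTIME", "SVC-CANCELED"))
  then (
    (: Use the grace period to tell the client this task timed out.
       The endpoint URL is hypothetical. :)
    xdmp:http-post("http://client.example.com/task-status",
      (), text { "TIMEOUT" }),
    xdmp:rethrow()
  )
  else xdmp:rethrow()
}
```

Rethrowing after the notification preserves the normal timeout handling; the catch only buys a window in which to report.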
Re: [MarkLogic Dev General] How To Detect Task Time Limit Exceeded Failures?
I can certainly experiment with that. Cheers, E. -- Eliot Kimber http://contrext.com On 10/7/17, 7:41 AM, "general-boun...@developer.marklogic.com on behalf of Geert Josten" wrote: Hi Eliot, I heard the other day that it should be possible to capture such timeouts with a try/catch within the code itself. That gives an extra 10-second delay, which might be sufficient to send out an alert email or raise some other flag. After those few extra seconds, the timeout gets rethrown if you don’t finish in time. Might be worth investigating? Cheers, Geert On 10/7/17, 12:10 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: >Using current ML 9: > >I’ve set up a little client-server application where the client spawns a >large number of tasks on a remote cluster. Each remote task reports its >status back to the client via HTTP. > >However, if one of the tasks times out in the Task Server there’s no way >for it to report its own failure and there doesn’t seem to be anything >else other than the task server that can detect the failure and report it. > >Is there any built-in mechanism by which a task time limit exceeded >failure can be detected in a way that would allow me to report back >to the calling client? For example, something that gets the task’s >current call stack at the time of failure, which would give me the info I >need to report back to the calling client. > >Unfortunately, the code I’m running in these tasks is pre-existing >processing that I’m building this remote processing around so I can’t >easily do something like provide a heartbeat signal for each running task >that a separate process could poll in order to detect terminated >processes, although I’m guessing that’s the most likely solution now that >I think about it. > >I do report to the client when each task starts so I guess I could >presume that if a task hasn’t finished some time after the configured max >time limit that it is presumed to have failed.
> >Thanks, > >Eliot >-- >Eliot Kimber >http://contrext.com
[MarkLogic Dev General] How To Detect Task Time Limit Exceeded Failures?
Using current ML 9: I’ve set up a little client-server application where the client spawns a large number of tasks on a remote cluster. Each remote task reports its status back to the client via HTTP. However, if one of the tasks times out in the Task Server there’s no way for it to report its own failure, and there doesn’t seem to be anything else other than the task server that can detect the failure and report it. Is there any built-in mechanism by which a task time limit exceeded failure can be detected in a way that would allow me to report back to the calling client? For example, something that gets the task’s current call stack at the time of failure, which would give me the info I need to report back to the calling client. Unfortunately, the code I’m running in these tasks is pre-existing processing that I’m building this remote processing around, so I can’t easily do something like provide a heartbeat signal for each running task that a separate process could poll in order to detect terminated processes, although I’m guessing that’s the most likely solution now that I think about it. I do report to the client when each task starts, so I guess I could presume that a task has failed if it hasn’t finished some time after the configured max time limit. Thanks, Eliot -- Eliot Kimber http://contrext.com
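The heartbeat idea mentioned above could be sketched roughly as follows; the directory, element names, and time limit are all invented for illustration:

```xquery
xquery version "1.0-ml";

(: Monitor query. Assumes each spawned task periodically writes
   /heartbeats/<task-id>.xml containing, e.g.,
   <heartbeat task="..." at="2017-10-07T00:10:00Z"/>.
   This flags tasks whose last heartbeat is older than the limit. :)
let $limit := xs:dayTimeDuration("PT10M")  (: assumed task time limit :)
for $hb in cts:search(fn:collection(), cts:directory-query("/heartbeats/"))/heartbeat
where fn:current-dateTime() - xs:dateTime($hb/@at) gt $limit
return xdmp:log("Task " || $hb/@task || " presumed timed out", "warning")
```

A scheduled task could run this monitor every minute or so and notify the client for any stale entries.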
Re: [MarkLogic Dev General] Trouble with syntax... trying to return XML tree
Your code would be easier to read if you used literal result elements rather than computed element constructors, e.g.:

<policy_num>{$policy_num}</policy_num>
<number_of_records>{$num_recs}</number_of_records>
<records>{ local:recordsByPolicy($policy_num) }</records>

As a general rule, it’s only necessary (or useful) to use computed element constructors when the element type is dynamically determined. Cheers, Eliot -- Eliot Kimber http://contrext.com From: on behalf of Matt Moody Reply-To: MarkLogic Developer Discussion Date: Wednesday, October 4, 2017 at 7:08 PM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Trouble with syntax... trying to return XML tree I am getting the error XDMP-UNEXPECTED: (err:XPST0003) Unexpected token syntax error, unexpected $end from the below query, and cannot figure out why. Any ideas would be appreciated! The idea here is that an Email may show up in multiple Policy Records, that same Email may be linked to multiple Policy Numbers, and each Policy Number may show up in multiple Records. I want to use this query to return all Emails with multiple Policies, return each Policy number linked to the Email, and then show each Record (Document) where that Policy number is contained.
xquery version "1.0-ml";

declare variable $coll := "insurance-policies ";

declare function local:recordsByPolicy($policyNum as xs:string)
{(
  for $doc in fn:collection($coll)//policy[policy_num/text() = $policyNum]
  return element document {
    element doc_uri { xdmp:node-uri($doc) }
  }
)};

declare function local:policiesByEmail($sourceEmail as xs:string)
{(
  for $policy_num in fn:distinct-values(fn:collection($coll)//policy[insured_email/text() = $email]/policy_num/text())
  let $num_recs := fn:count(fn:collection($coll)//policy[policy_num/text() = $policy_num])
  order by $num_recs descending
  return element policy {
    element policy_num {$policy_num},
    element number_of_records {$num_recs},
    element records {( local:recordsByPolicy($policy_num) )}
  }
)};

let $emails_with_multiple_policies :=
  for $em at $i in fn:distinct-values(fn:collection($coll)//insured_email/text())
  let $policies := fn:count(fn:distinct-values(fn:collection($coll)//policy[insured_email/text() = $em]/policy_num/text()))
  where $policies > 1
return (
  element total_source_records {(fn:count(fn:collection($coll)))},
  element unique_source_emails {(fn:count($unique_source_emails))},
  element emails_w_mpolicies {(fn:count($emails_with_multiple_policies))},
  element results {(
    for $email in fn:distinct-values(fn:collection($coll)//insured_email/text())
    let $num_policies := fn:count(fn:distinct-values(fn:collection($coll)//policy[insured_email/text() = $email]/policy_num/text()))
    where $num_policies > 1
    order by $num_policies descending
    return element result {
      element email {$email},
      element number_of_policies_found {$num_policies},
      element policies {( local:policiesByEmail($email) )}
    }
  )}
)

Matt Moody Sales Engineer MarkLogic Corporation matt.mo...@marklogic.com Mobile: +61 (0)415 564 355 This e-mail and any accompanying attachments are confidential. The information is intended solely for the use of the individual to whom it is addressed.
Any review, disclosure, copying, distribution, or use of this e-mail communication by others is strictly prohibited. If you are not the intended recipient, please notify us immediately by returning this message to the sender and delete all copies. Thank you for your cooperation.
[MarkLogic Dev General] How To Reflect Specific Timezone in Formatted Date Time?
I’m trying to produce a formatted date that reflects a specific time zone name, rather than the "GMT-07:00" form:

format-dateTime($time, "[Y0001]-[M01]-[D01] at [H01]:[m01]:[s01] [ZN]")

where $time = 2017-09-29T08:01:54.216992-07:00 returns:

2017-09-29 at 08:01:54 GMT-07:00

(running on a server in the Pacific time zone). What I’d like is:

2017-09-29 at 08:01:54 PDT

I’ve tried setting the $place parameter to different values but nothing I’ve tried gives a different result, except adding a prefix before the date indicating the location. I also tried different values for the time zone pattern with no change (or simple failure due to a bad pattern). The W3C docs suggest that "[ZN]" should result in just the time zone name, but those specs are difficult enough to understand that I’m never sure I’m reading them correctly. Thanks, Eliot -- Eliot Kimber http://contrext.com
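For what it's worth, the five-argument form of fn:format-dateTime in the XPath 3.0 spec allows $place to be an IANA timezone name, which is what [ZN] is supposed to resolve against; whether MarkLogic's implementation actually maps this to a name like PDT is something I have not verified:

```xquery
fn:format-dateTime(
  xs:dateTime("2017-09-29T08:01:54.216992-07:00"),
  "[Y0001]-[M01]-[D01] at [H01]:[m01]:[s01] [ZN]",
  "en", (), "America/Los_Angeles")
```

If the implementation has no name data for the zone, the spec permits it to fall back to the numeric offset form, which would explain the GMT-07:00 output.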
Re: [MarkLogic Dev General] Apparent Memory Leak in Profiler
I can verify that ML 8.0-7 resolves the memory leak in the profiler. I can now profile hundreds of thousands of tasks, no problem. Cheers, E. -- Eliot Kimber http://contrext.com On 8/28/17, 1:41 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: Thanks—I should be able to test with latest ML 8 in a couple of days. Cheers, E. -- Eliot Kimber http://contrext.com On 8/28/17, 12:37 PM, "general-boun...@developer.marklogic.com on behalf of Christopher Hamlin" wrote: There was a bug where, under certain circumstances, the profiler will result in a query deadlock &/or a resource leak (#45569). It could be that this is what you are seeing. It was noticed in 8.0-2 and is fixed in the latest release (8.0-7). On Mon, Aug 28, 2017 at 1:11 PM, Eliot Kimber wrote: > I reported earlier that my profiling application was causing MarkLogic to restart after handling about 20,000 tasks. Turns out it was an out-of-memory issue on the server itself (currently configured with 256GB of RAM). We could see a distinct spike in memory usage, at which point the server restarted MarkLogic. I tried different input data sets so it doesn’t appear to be an issue with a particular input document (my data set has a few outliers that are much larger than typical but only a few). > > Subsequent testing determined that it was the use of the MarkLogic profiler that was causing the memory spike: if I turned off the profiler then memory usage was flat and all the tasks completed as expected. > > This is ML 8.03. I’m still working on getting my server upgraded to a newer version of MarkLogic so I can see if this is an issue that has already been fixed. > > So it looks like there’s some kind of memory leak related to the profiler and I’d like to understand what that issue is and either understand how to avoid it or report it formally. > > If it’s a general potential problem with large-scale processing I would like to understand how to avoid it or plan for it.
If it’s a problem specific to the profiler then I need to report it formally and provide appropriate diagnostics. > > So my questions: > > 1. Is this a known issue with profiling? I’m guessing not, in that I’m probably doing something out-of-the-ordinary vis-à-vis profiling, something that nobody would see in typical single-instance ad-hoc profiling. > 2. What types of MarkLogic processing would cause this kind of memory spike that lasts across the execution of multiple tasks? I would expect the memory required for a given task to be released as soon as the task is complete, so I’m guessing it must be an issue with caches or something? > > Thanks, > > Eliot > -- > Eliot Kimber > http://contrext.com
Re: [MarkLogic Dev General] Apparent Memory Leak in Profiler
Thanks—I should be able to test with latest ML 8 in a couple of days. Cheers, E. -- Eliot Kimber http://contrext.com On 8/28/17, 12:37 PM, "general-boun...@developer.marklogic.com on behalf of Christopher Hamlin" wrote: There was a bug where, under certain circumstances, the profiler will result in a query deadlock &/or a resource leak (#45569). It could be that this is what you are seeing. It was noticed in 8.0-2 and is fixed in the latest release (8.0-7). On Mon, Aug 28, 2017 at 1:11 PM, Eliot Kimber wrote: > I reported earlier that my profiling application was causing MarkLogic to restart after handling about 20,000 tasks. Turns out it was an out-of-memory issue on the server itself (currently configured with 256GB of RAM). We could see a distinct spike in memory usage, at which point the server restarted MarkLogic. I tried different input data sets so it doesn’t appear to be an issue with a particular input document (my data set has a few outliers that are much larger than typical but only a few). > > Subsequent testing determined that it was the use of the MarkLogic profiler that was causing the memory spike: if I turned off the profiler then memory usage was flat and all the tasks completed as expected. > > This is ML 8.03. I’m still working on getting my server upgraded to a newer version of MarkLogic so I can see if this is an issue that has already been fixed. > > So it looks like there’s some kind of memory leak related to the profiler and I’d like to understand what that issue is and either understand how to avoid it or report it formally. > > If it’s a general potential problem with large-scale processing I would like to understand how to avoid it or plan for it. If it’s a problem specific to the profiler then I need to report it formally and provide appropriate diagnostics. > > So my questions: > > 1. Is this a known issue with profiling?
I’m guessing not in that I’m probably doing something out-of-the-ordinary vis-à-vis profiling and is something that nobody would see in typical single-instance ad-hoc profiling. > 2. What types of MarkLogic processing would cause this kind of memory spike that lasts across the execution of multiple tasks? I would expect the memory required for a given task to be released as soon as the task is complete so I’m guessing it must be an issue with caches or something? > > Thanks, > > Eliot > -- > Eliot Kimber > http://contrext.com
[MarkLogic Dev General] Apparent Memory Leak in Profiler
I reported earlier that my profiling application was causing MarkLogic to restart after handling about 20,000 tasks. Turns out it was an out-of-memory issue on the server itself (currently configured with 256GB of RAM). We could see a distinct spike in memory usage, at which point the server restarted MarkLogic. I tried different input data sets, so it doesn’t appear to be an issue with a particular input document (my data set has a few outliers that are much larger than typical, but only a few). Subsequent testing determined that it was the use of the MarkLogic profiler that was causing the memory spike: if I turned off the profiler then memory usage was flat and all the tasks completed as expected. This is ML 8.03. I’m still working on getting my server upgraded to a newer version of MarkLogic so I can see if this is an issue that has already been fixed. So it looks like there’s some kind of memory leak related to the profiler, and I’d like to understand what that issue is and either understand how to avoid it or report it formally. If it’s a general potential problem with large-scale processing I would like to understand how to avoid it or plan for it. If it’s a problem specific to the profiler then I need to report it formally and provide appropriate diagnostics. So my questions: 1. Is this a known issue with profiling? I’m guessing not, in that I’m probably doing something out-of-the-ordinary vis-à-vis profiling, something that nobody would see in typical single-instance ad-hoc profiling. 2. What types of MarkLogic processing would cause this kind of memory spike that lasts across the execution of multiple tasks? I would expect the memory required for a given task to be released as soon as the task is complete, so I’m guessing it must be an issue with caches or something? Thanks, Eliot -- Eliot Kimber http://contrext.com
[MarkLogic Dev General] Why No cts:median-aggregate() Function?
I’m upgrading my profiling system to use the cts aggregate functions for doing math on large numbers of durations—this speeds things up tremendously, of course. However, there doesn’t appear to be a median-aggregate() function in ML 8 or ML 9, only cts:median(), which operates on a sequence of doubles. For example, for a range index over xs:dayTimeDuration values I can do:

let $average := cts:avg-aggregate(
  cts:element-reference(xs:QName("prof:overall-elapsed")),
  ("item-frequency"),
  cts:collection-query(epf:get-trial-collection($trial-number)))

But to get the equivalent median the only solution I’m seeing is to convert all the durations to doubles and then take the median, which is very slow. At least in my data set the median is a better measure of overall performance than the average, because I have a small number of very slow outliers, so I really need both median and average. This seems like an obvious oversight in the cts aggregate functions—am I missing a solution? Thanks, Eliot -- Eliot Kimber http://contrext.com
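The fallback described above (pull the duration values from the range index, convert to numbers, take the median) might look roughly like this sketch; the collection name stands in for epf:get-trial-collection(), and dividing two xs:dayTimeDurations yields a decimal, so dividing by PT1S gives seconds:

```xquery
xquery version "1.0-ml";
declare namespace prof = "http://marklogic.com/xdmp/profile";
declare variable $trial-collection := "trial-058";  (: assumed collection name :)

let $ref  := cts:element-reference(xs:QName("prof:overall-elapsed"))
let $durs := cts:values($ref, (), ("item-frequency"),
               cts:collection-query($trial-collection))
(: cts:values returns distinct values, so repeat each one by its
   frequency before taking the median :)
let $secs :=
  for $d in $durs
  for $i in 1 to cts:frequency($d)
  return $d div xs:dayTimeDuration("PT1S")
return math:median($secs)
```

This at least keeps the value retrieval in the lexicon rather than touching documents, though the median itself is still computed in memory.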
Re: [MarkLogic Dev General] Noob query question..
I just went through an exercise similar to this with my profiling application. In my case I’m capturing the output from the profiler for a large number of processing instances (potentially millions). I’m measuring both raw performance and also at-scale performance for processing of a large corpus, so it’s not sufficient to just profile a few cases. We know there is wide variation in performance for different input documents and we also want to see trends, both within the data and over time as the data, code, and servers evolve. So I’m measuring everything. I want to know which expressions take the most time across all the instances and get a count. For example, in the longest instances one particular expression is always the top one, but is it the top one in the faster instances? The information is in the profiler histogram output, but it is not ordered by shallow time (the value I’m interested in), so it’s not as easy as just getting the first expression for each histogram. The solution approach, developed for me by Evan Lenz, is to use a trick with co-occurrence queries where attributes on the same element have a proximity of zero. If you construct an index over two attributes you can then use cts:value-co-occurrences() to get all the pairs and then select the ones you want. This approach also requires that each set be in a separate document (so that you can limit each call to cts:value-co-occurrences() to a single document using cts:document-query()). If there were multiple profiling results in a single document there would be no index-based way to limit value-co-occurrences() to a single profiling instance. To enable this I had to post-process the output of the MarkLogic profiler to add attributes to the prof:expression elements with the shallow-time and expr-source values, which are otherwise within subelements, the result being elements like <prof:expression shallow-time="…" expr-source="…">…</prof:expression>. (I used a simple XSLT transform for this part of the processing, applied as I store my profiling results.)
I then defined attribute range indexes with word positions turned on for the @shallow-time and @expr-source attributes. With that I could then do this to find the longest for each profiling instance (where each profiling instance is stored as a separate document):

let $maps :=
  for $uri in cts:uris((), (), cts:collection-query($collection))
  let $expression-index := cts:element-attribute-reference(xs:QName("prof:expression"), xs:QName("expr-source"))
  let $shallow-time-index := cts:element-attribute-reference(xs:QName("prof:expression"), xs:QName("shallow-time"))
  let $max := cts:max($shallow-time-index, (), cts:document-query($uri))
  let $co-occurrences :=
    cts:value-co-occurrences(
      $expression-index,
      $shallow-time-index,
      ("proximity=0", "map"),
      cts:document-query($uri)
    )
  let $max-co-occurrence :=
    for $map in $co-occurrences
    let $keys := map:keys($map)
    for $key in $keys
    return
      if ($max eq xs:dayTimeDuration(map:get($map, $key)[1]))
      then map:entry($key, map:get($map, $key))
      else ()
  return $max-co-occurrence

-- Eliot Kimber http://contrext.com From: on behalf of "Ladner, Eric (Eric.Ladner)" Reply-To: MarkLogic Developer Discussion Date: Thursday, August 24, 2017 at 4:30 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Noob query question.. Thank you. I will play with this in my development environment tomorrow. I don’t quite see how it’s getting the counts per subject, though. For reference.. the structure is similar to this: Test Subject 2017-04-01T15:32:00 Blah, blah There would be many notes, obviously, and the output would ideally be something like (not married to that output, but some output showing the counts for each subject over that time range): Test Subject 2 Subject 2 4 ...
Eric Ladner Systems Analyst eric.lad...@chevron.com From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Sam Mefford Sent: August 24, 2017 15:59 To: MarkLogic Developer Discussion Subject: [**EXTERNAL**] Re: [MarkLogic Dev General] Noob query question.. I should point out that this is not the fastest way to do it. A faster way would be to index "date-taken" as a dateTime element range index and use cts:search with cts:element-range-query. Sam Mefford Senior Engineer MarkLogic Corporation sam.meff...@marklogic.com Cell: +1 801 706 9731 www.marklogic.com
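Sam's faster, index-backed approach might look like the following sketch; the element names (date-taken, subject) come from the thread, the date range is invented, and it assumes range indexes exist on both elements (a dateTime index on date-taken and a string index on subject):

```xquery
xquery version "1.0-ml";

let $in-range :=
  cts:and-query((
    cts:element-range-query(xs:QName("date-taken"), ">=",
      xs:dateTime("2017-04-01T00:00:00")),
    cts:element-range-query(xs:QName("date-taken"), "<",
      xs:dateTime("2017-05-01T00:00:00"))))
(: Counts per subject straight from the lexicons, no document retrieval :)
for $subject in cts:element-values(xs:QName("subject"), (), ("item-frequency"), $in-range)
order by cts:frequency($subject) descending
return <subject name="{$subject}" count="{cts:frequency($subject)}"/>
```

cts:frequency() gives the count of notes per subject within the constraining query, which is where the per-subject counts come from without touching the documents themselves.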
Re: [MarkLogic Dev General] Where is General Documentation for the Task Server App?
I’ll see if I can get MarkLogic server upgraded—it does make sense to be on the latest version of ML 8. Cheers, Eliot -- Eliot Kimber http://contrext.com On 8/23/17, 11:25 AM, "general-boun...@developer.marklogic.com on behalf of Geert Josten" wrote: Hi Eliot, You could be hitting a bug in MarkLogic. It might be worth upgrading to 8.0-7, and seeing if it still happens with that version. A lot of patches and performance improvements have been made since 8.0-3.2. Cheers, Geert On 8/23/17, 5:47 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: >Yes, I checked the log and the messages are: > >2017-08-22 21:43:42.287 Info: TaskServer: profiling-task.xqy >[4136613570697343302]: Starting, start: 32361, group size: 10, >outdir="/profiling/trial-058/group-3237/ >2017-08-22 21:43:42.287 Info: TaskServer: " >2017-08-22 21:43:42.405 Info: Saving /marklogic/Forests/Meters/0a02 >2017-08-22 21:44:34.572 Notice: Starting MarkLogic Server 8.0-3.2 x86_64 >in /opt/MarkLogic with data in /marklogic >2017-08-22 21:44:34.617 Info: Host running Linux >3.10.0-327.18.2.el7.x86_64 (Red Hat Enterprise Linux Server release 6.8 >(Santiago)) >2017-08-22 21:44:34.690 Info: SSL FIPS mode has been enabled > >The first message is from my task indicating that the 3237th (out of >50,000 in the queue) is starting. > >Then the MarkLogic start message for no obvious reason. It’s not a time >at which a scheduled server restart would have likely happened and nobody >was (or should have been) awake at that hour and I’m the only person who >should be doing anything with this server anyway. > >What’s interesting is that I’m getting this restart consistently at about >the 3200th task, so it feels like either a time out or a resource >exhaustion that then triggers a restart, but there are no messages about >any kind of failure, out of memory condition, etc.
> >I’m pretty sure it’s an issue with the configuration of the underlying >linux server but I wanted to know if there were any conditions under >which the Task Server or ML server itself would spontaneously restart. > >Thanks, > >Eliot > >-- >Eliot Kimber >http://contrext.com > > > >On 8/23/17, 10:25 AM, "general-boun...@developer.marklogic.com on behalf >of Dave Cassel" david.cas...@marklogic.com> wrote: > >I don't believe there's any reason why the Task Server would be >triggering >a restart (although some configuration changes affecting the Task >Server >would). I'd look elsewhere for an error. Specifically, I'd check >ErrorLog.txt, find the time when a restart happened, and look to see >if >anything interesting was logged just before. (Perhaps you've already >done >that.) > >-- >Dave Cassel, @dmcassel <https://twitter.com/dmcassel> >Technical Community Manager >MarkLogic Corporation <http://www.marklogic.com/> > >http://developer.marklogic.com/ > > > > >On 8/23/17, 11:18 AM, "general-boun...@developer.marklogic.com on >behalf >of Eliot Kimber" ekim...@contrext.com> wrote: > >>I’m trying to understand the Task Server (and in my case, why it is >>consistently restarting after satisfying a subset of its queue). >> >>Going through the ML 8 docs I’m not finding any general discussion >of the >>Task Server, only references to it from elsewhere (e.g., in the docs >for >>xdmp:spawn() and in discussion of scheduling tasks). >> >>But not finding anything that would appear to provide insight into >why >>the server would perform an uncommanded restart (or information >>indicating that it would never do that and thus the problem must be >>elsewhere). >> >>Have I missed it? Given that the Task Server is a built-in and >prominent >>part of MarkLogic it seems odd that there’s no general documentation >for >>it, which makes me think I must have missed it. But I both searched >the >>doc set and ToC and scanned the entire Guide ToC and didn’t find >anything.
>> >>Thanks, >> >>Eliot >>-- >>Eliot Kimber >
Re: [MarkLogic Dev General] Where is General Documentation for the Task Server App?
Yes, I checked the log and the messages are:

2017-08-22 21:43:42.287 Info: TaskServer: profiling-task.xqy [4136613570697343302]: Starting, start: 32361, group size: 10, outdir="/profiling/trial-058/group-3237/
2017-08-22 21:43:42.287 Info: TaskServer: "
2017-08-22 21:43:42.405 Info: Saving /marklogic/Forests/Meters/0a02
2017-08-22 21:44:34.572 Notice: Starting MarkLogic Server 8.0-3.2 x86_64 in /opt/MarkLogic with data in /marklogic
2017-08-22 21:44:34.617 Info: Host running Linux 3.10.0-327.18.2.el7.x86_64 (Red Hat Enterprise Linux Server release 6.8 (Santiago))
2017-08-22 21:44:34.690 Info: SSL FIPS mode has been enabled

The first message is from my task indicating that the 3237th task (out of 50,000 in the queue) is starting. Then the MarkLogic start message appears for no obvious reason. It’s not a time at which a scheduled server restart would likely have happened, nobody was (or should have been) awake at that hour, and I’m the only person who should be doing anything with this server anyway. What’s interesting is that I’m getting this restart consistently at about the 3200th task, so it feels like either a timeout or a resource exhaustion that then triggers a restart, but there are no messages about any kind of failure, out-of-memory condition, etc. I’m pretty sure it’s an issue with the configuration of the underlying Linux server, but I wanted to know if there were any conditions under which the Task Server or the ML server itself would spontaneously restart. Thanks, Eliot -- Eliot Kimber http://contrext.com On 8/23/17, 10:25 AM, "general-boun...@developer.marklogic.com on behalf of Dave Cassel" wrote: I don't believe there's any reason why the Task Server would be triggering a restart (although some configuration changes affecting the Task Server would). I'd look elsewhere for an error. Specifically, I'd check ErrorLog.txt, find the time when a restart happened, and look to see if anything interesting was logged just before. (Perhaps you've already done that.) 
-- Dave Cassel, @dmcassel <https://twitter.com/dmcassel> Technical Community Manager MarkLogic Corporation <http://www.marklogic.com/> http://developer.marklogic.com/ On 8/23/17, 11:18 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: >I’m trying to understand the Task Server (and in my case, why it is >consistently restarting after satisfying a subset of its queue). > >Going through the ML 8 docs I’m not finding any general discussion of the >Task Server, only references to it from elsewhere (e.g., in the docs for >xdmp:spawn() and in discussion of scheduling tasks). > >But not finding anything that would appear to provide insight into why >the server would perform an uncommanded restart (or information >indicating that it would never do that and thus the problem must be >elsewhere). > >Have I missed it? Given that the Task Server is a built-in and prominent >part of MarkLogic it seems odd that there’s no general documentation for >it, which makes me think I must have missed it. But I both searched the >doc set and ToC and scanned the entire Guide ToC and didn’t find anything. > >Thanks, > >Eliot >-- >Eliot Kimber >http://contrext.com > > > > >___ >General mailing list >General@developer.marklogic.com >Manage your subscription at: >http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Where is General Documentation for the Task Server App?
I’m trying to understand the Task Server (and in my case, why it is consistently restarting after satisfying a subset of its queue). Going through the ML 8 docs I’m not finding any general discussion of the Task Server, only references to it from elsewhere (e.g., in the docs for xdmp:spawn() and in discussion of scheduling tasks). But not finding anything that would appear to provide insight into why the server would perform an uncommanded restart (or information indicating that it would never do that and thus the problem must be elsewhere). Have I missed it? Given that the Task Server is a built-in and prominent part of MarkLogic it seems odd that there’s no general documentation for it, which makes me think I must have missed it. But I both searched the doc set and ToC and scanned the entire Guide ToC and didn’t find anything. Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Getting Impossible Value from count()--why?
Yes, I added “item-frequency” to my cts:element-values() call and now all the numbers appear to be correct. I haven’t circled back to my original issue with the ML-provided search buckets not being the right size but if I have time I’ll see if the issue was failing to specify item-frequency at some point. Cheers, E. -- On 8/23/17, 2:37 AM, "general-boun...@developer.marklogic.com on behalf of Geert Josten" wrote: Hi Eliot, Keep in mind that you pass in item-frequency in cts:element-values, but the default for range constraints is likely fragment-frequency. Did you pass in an item-frequency facet-option in there too? Kind regards, Geert On 8/22/17, 10:47 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: >If I sum the counts of each bucket calculated using cts:frequency() it >matches the total calculated using the initial result from the >element-values() query, so I guess the 10,000 count is a side effect of >some internal lexicon implementation magic. > >Cheers, > >E. > >-- >Eliot Kimber >http://contrext.com > > > >On 8/22/17, 3:25 PM, "general-boun...@developer.marklogic.com on behalf >of Eliot Kimber" ekim...@contrext.com> wrote: > >I think this is again my weak understanding of lexicons and frequency >counting. > >If I change my code to sum the frequencies of the durations in each >range then I get more sensible numbers, e.g.: > >let $count := sum(for $dur in $durations[. lt $upper-bound][. ge >$lower-bound] return cts:frequency($dur)) > >Having updated get-enrichment-durations() to: > >cts:element-values(xs:QName("prof:overall-elapsed"), (), >("descending", "item-frequency"), > cts:collection-query($collection)) > >It still seems odd that the pure lexicon check returns exactly 10,000 >*values*--that still seems suspect, but then using those 10,000 values to >calculate the total frequency does result in a more likely number. I >guess I can do some brute-force querying to see if it’s accurate. 
> >Cheers, > >Eliot > -- >Eliot Kimber >http://contrext.com > > > >On 8/22/17, 2:52 PM, "general-boun...@developer.marklogic.com on >behalf of Eliot Kimber" ekim...@contrext.com> wrote: > >Using ML 8.0-3.2 > >As part of my profiling application I run a large number of >profiles, storing the profiler results back to the database. I’m then >extracting the times from the profiling data to create histograms and do >other analysis. > >My first attempt to do this with buckets ran into the problem >that the index-based buckets were not returning accurate numbers, so I >reimplemented it to construct the buckets manually from a list of the >actual duration values. > >My code is: > >let $durations as xs:dayTimeDuration* := >epf:get-enrichment-durations($collection) >let $search-range := epf:construct-search-range() >let $facets := >for $bucket in $search-range/search:bucket >let $upper-bound := if ($bucket/@lt) then >xs:dayTimeDuration($bucket/@lt) else xs:dayTimeDuration("PT0S") >let $lower-bound := xs:dayTimeDuration($bucket/@ge) >let $count := count($durations[. lt $upper-bound][. ge >$lower-bound]) >return if ($count gt 0) > then <facet-value count="{$count}">{epf:format-day-time-duration($upper-bound)}</facet-value> > else () > >The get-enrichment-durations() function does this: > > cts:element-values(xs:QName("prof:overall-elapsed"), (), >"descending", > cts:collection-query($collection)) > >This works nicely and seems to provide correct numbers except >when the number of durations within a particular set of bounds exceeds >10,000, at which point count() returns 10,000, which is an impossible >number--the chance of there being exactly 10,000 instances within a given >range is basically zero. But I’m getting 10,000 twice, which is >absolutely impossible. > >Here’s the results I get from running
Re: [MarkLogic Dev General] Getting Impossible Value from count()--why?
If I sum the counts of each bucket calculated using cts:frequency() it matches the total calculated using the initial result from the element-values() query, so I guess the 10,000 count is a side effect of some internal lexicon implementation magic. Cheers, E. -- Eliot Kimber http://contrext.com On 8/22/17, 3:25 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: I think this is again my weak understanding of lexicons and frequency counting. If I change my code to sum the frequencies of the durations in each range then I get more sensible numbers, e.g.: let $count := sum(for $dur in $durations[. lt $upper-bound][. ge $lower-bound] return cts:frequency($dur)) Having updated get-enrichment-durations() to: cts:element-values(xs:QName("prof:overall-elapsed"), (), ("descending", "item-frequency"), cts:collection-query($collection)) It still seems odd that the pure lexicon check returns exactly 10,000 *values*--that still seems suspect, but then using those 10,000 values to calculate the total frequency does result in a more likely number. I guess I can do some brute-force querying to see if it’s accurate. Cheers, Eliot -- Eliot Kimber http://contrext.com On 8/22/17, 2:52 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: Using ML 8.0-3.2 As part of my profiling application I run a large number of profiles, storing the profiler results back to the database. I’m then extracting the times from the profiling data to create histograms and do other analysis. My first attempt to do this with buckets ran into the problem that the index-based buckets were not returning accurate numbers, so I reimplemented it to construct the buckets manually from a list of the actual duration values. 
My code is:

let $durations as xs:dayTimeDuration* := epf:get-enrichment-durations($collection)
let $search-range := epf:construct-search-range()
let $facets :=
  for $bucket in $search-range/search:bucket
  let $upper-bound := if ($bucket/@lt) then xs:dayTimeDuration($bucket/@lt) else xs:dayTimeDuration("PT0S")
  let $lower-bound := xs:dayTimeDuration($bucket/@ge)
  let $count := count($durations[. lt $upper-bound][. ge $lower-bound])
  return if ($count gt 0)
         then <facet-value count="{$count}">{epf:format-day-time-duration($upper-bound)}</facet-value>
         else ()

The get-enrichment-durations() function does this:

cts:element-values(xs:QName("prof:overall-elapsed"), (), "descending",
  cts:collection-query($collection))

This works nicely and seems to provide correct numbers except when the number of durations within a particular set of bounds exceeds 10,000, at which point count() returns 10,000, which is an impossible number—the chance of there being exactly 10,000 instances within a given range is basically zero. But I’m getting 10,000 twice, which is absolutely impossible. Here’s the results I get from running this in the query console:

75778
0.01 seconds
0.02 seconds
0.03 seconds
0.04 seconds
0.05 seconds
…

There are 75,778 actual duration values and the count values for the 3rd and 4th ranges are exactly 10,000. 
The search range I’m constructing is normal ML-defined markup for defining a search range (in the http://marklogic.com/appservices/search namespace), e.g.:

0.001 Second
0.002 Second
0.003 Second
0.004 Second
0.005 Second
…

Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Large job processing question.
The Task Manager will queue the jobs. It will only process as many at once as there are threads configured for the Task Manager. In my profiling application I’m queueing 10s of 1000s of 10-doc tasks. My Task Manager has a maximum queue of 100,000 tasks. If I do a small number of large tasks then I quickly exhaust RAM. Cheers, E. -- Eliot Kimber http://contrext.com From: on behalf of "Ladner, Eric (Eric.Ladner)" Reply-To: MarkLogic Developer Discussion Date: Tuesday, August 22, 2017 at 3:33 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Large job processing question. Is it smart enough not to spawn 100,000 jobs at once and swamp the system? Eric Ladner Systems Analyst eric.lad...@chevron.com From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Geert Josten Sent: August 22, 2017 13:59 To: MarkLogic Developer Discussion Subject: [**EXTERNAL**] Re: [MarkLogic Dev General] Large job processing question. Hi Eric, Personally, I would probably let go of the all-docs-at-once approach, and spawn processes for each input (sub)folder, and potentially for batches or individual files in any folder as well. Same for the existing documents, spawn a process for batches or individual docs that check if they still exist. If you make them append logs to the documents or their properties, you can gather reports about changes afterwards if needed. Cheers, Geert From: on behalf of "Ladner, Eric (Eric.Ladner)" Reply-To: MarkLogic Developer Discussion Date: Tuesday, August 22, 2017 at 4:36 PM To: "general@developer.marklogic.com" Subject: [MarkLogic Dev General] Large job processing question. We have some large jobs (ingestion and validation of unstructured documents) that have timeout issues. The way the jobs are structured is that the first job checks that all the existing documents are valid (still exist on the file system). 
It does this in two steps: 1) gather all documents to be validated from the DB, 2) check that list against the file system. The second job is: 1) the filesystem is traversed to find any new documents (or ones that have been modified in the last X days), 2) those new/modified documents are ingested. The problem in the second job is that there could be tens of thousands of documents in a hundred thousand folders (don’t ask). The job will typically time out after an hour during the “go find all the new documents” phase. I’m trying to find out if there’s a way to re-structure the job so that it runs faster and doesn’t time out, or maybe breaks up the task into different parts that run in parallel or something. Any thoughts welcome. Eric Ladner Systems Analyst eric.lad...@chevron.com
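Geert’s batching suggestion can be sketched in XQuery. This is only a sketch, assuming the URI lexicon is enabled; the task module path (/tasks/validate-batch.xqy), batch size, and URI-list encoding are illustrative, not part of the original thread:

```xquery
xquery version "1.0-ml";

(: Sketch: split the candidate URIs into batches and spawn one Task
   Server task per batch, so no single transaction has to traverse the
   whole document set. Assumes the URI lexicon is enabled; the module
   path and batch size are hypothetical. :)
let $uris := cts:uris((), ())
let $batch-size := 1000
for $batch in 1 to xs:integer(fn:ceiling(fn:count($uris) div $batch-size))
let $batch-uris :=
  fn:subsequence($uris, ($batch - 1) * $batch-size + 1, $batch-size)
return
  xdmp:spawn("/tasks/validate-batch.xqy",
             (xs:QName("uris"), fn:string-join($batch-uris, "|")))
```

Each spawned task then splits the "uris" external variable back apart and validates or ingests only its own batch, which keeps every transaction short enough to avoid the timeout.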
Re: [MarkLogic Dev General] Getting Impossible Value from count()--why?
I think this is again my weak understanding of lexicons and frequency counting. If I change my code to sum the frequencies of the durations in each range then I get more sensible numbers, e.g.: let $count := sum(for $dur in $durations[. lt $upper-bound][. ge $lower-bound] return cts:frequency($dur)) Having updated get-enrichment-durations() to: cts:element-values(xs:QName("prof:overall-elapsed"), (), ("descending", "item-frequency"), cts:collection-query($collection)) It still seems odd that the pure lexicon check returns exactly 10,000 *values*--that still seems suspect, but then using those 10,000 values to calculate the total frequency does result in a more likely number. I guess I can do some brute-force querying to see if it’s accurate. Cheers, Eliot -- Eliot Kimber http://contrext.com On 8/22/17, 2:52 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: Using ML 8.0-3.2 As part of my profiling application I run a large number of profiles, storing the profiler results back to the database. I’m then extracting the times from the profiling data to create histograms and do other analysis. My first attempt to do this with buckets ran into the problem that the index-based buckets were not returning accurate numbers, so I reimplemented it to construct the buckets manually from a list of the actual duration values. My code is: let $durations as xs:dayTimeDuration* := epf:get-enrichment-durations($collection) let $search-range := epf:construct-search-range() let $facets := for $bucket in $search-range/search:bucket let $upper-bound := if ($bucket/@lt) then xs:dayTimeDuration($bucket/@lt) else xs:dayTimeDuration("PT0S") let $lower-bound := xs:dayTimeDuration($bucket/@ge) let $count := count($durations[. lt $upper-bound][. 
ge $lower-bound])
  return if ($count gt 0)
         then <facet-value count="{$count}">{epf:format-day-time-duration($upper-bound)}</facet-value>
         else ()

The get-enrichment-durations() function does this:

cts:element-values(xs:QName("prof:overall-elapsed"), (), "descending",
  cts:collection-query($collection))

This works nicely and seems to provide correct numbers except when the number of durations within a particular set of bounds exceeds 10,000, at which point count() returns 10,000, which is an impossible number—the chance of there being exactly 10,000 instances within a given range is basically zero. But I’m getting 10,000 twice, which is absolutely impossible. Here’s the results I get from running this in the query console:

75778
0.01 seconds
0.02 seconds
0.03 seconds
0.04 seconds
0.05 seconds
…

There are 75,778 actual duration values and the count values for the 3rd and 4th ranges are exactly 10,000. If I change the let $count := expression to only test the upper or lower bound then I get numbers greater than 10,000. I also tried changing the order of the predicates and using a single predicate with “and”. The problem only seems to be related to using both predicates when the resulting sequence would have more than 10K items. Is there an explanation for why count() gives me exactly 10,000 in this case? Is there a workaround for this behavior? 
The search range I’m constructing is normal ML-defined markup for defining a search range (in the http://marklogic.com/appservices/search namespace), e.g.:

0.001 Second
0.002 Second
0.003 Second
0.004 Second
0.005 Second
…

Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Count of cts:element-values() not equal to number of element instances--what's going on?
A closer reading of the manual reveals my mistake: I needed to specify "item-frequency" in the element-values() query. Without it I was getting the count of *fragments* with the value, not the total number of occurrences. When I add the “item-frequency” option to element-values() then I get the correct count from the sum of cts:frequency(). Cheers, E. -- Eliot Kimber http://contrext.com On 8/14/17, 2:58 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: Using both cts:frequency and cts:count-aggregate I get numbers that are closer to the correct count but are short by about 200. What would account for the difference? Queries:

let $profiles := collection($collection)/enrprof:profiling-instance/enrprof:enrichment/enrprof:evalResult/prof:*
let $histograms := $profiles/prof:histogram
let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed
let $durations := cts:element-values(xs:QName("prof:overall-elapsed"), (), "descending",
  cts:collection-query($collection))
let $count-frequency := sum(for $dur in $durations return cts:frequency($dur))
let $overall-elapsed-ref := cts:element-reference(fn:QName("http://marklogic.com/xdmp/profile","overall-elapsed"),("type=dayTimeDuration"))
let $count-frequency := sum(for $dur in $durations return cts:frequency($dur))
let $count-aggregate := cts:count-aggregate($overall-elapsed-ref,(), cts:collection-query($collection))

Results:

47539
47539
47539
47371
47371
21219

Cheers, E. -- Eliot Kimber http://contrext.com On 8/14/17, 1:53 PM, "general-boun...@developer.marklogic.com on behalf of Mary Holstege" wrote: That is overkill. The results you get out of cts:element-values have a frequency (accessible via cts:frequency). The cts: aggregates (e.g. cts:count, cts:sum) take the frequency into account. 
//Mary On Mon, 14 Aug 2017 11:42:07 -0700, Oleksii Segeda wrote: > Eliot, > > You can do something like this: > cts:element-value-co-occurrences(xs:QName("prof:overall-elapsed"),xs:QName("xdmp:document")) > if you have only one element per document. > > Best, > > Oleksii Segeda > IT Analyst > Information and Technology Solutions > www.worldbank.org > > > -Original Message- > From: general-boun...@developer.marklogic.com > [mailto:general-boun...@developer.marklogic.com] On Behalf Of Eliot > Kimber > Sent: Monday, August 14, 2017 2:31 PM > To: MarkLogic Developer Discussion > Subject: [MarkLogic Dev General] Count of cts:element-values() not equal > to number of element instances--what's going on? > > I have this query: > > let $durations := cts:element-values(xs:QName("prof:overall-elapsed"), > (), "descending", > cts:collection-query($collection)) > > And this query: > > let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed > > Where there an element range index for prof:overall-elapsed. > > Comparing the two results I get very different numbers when I expected > them to be equal: > > 47539 > 21219 > > Doing this: > > count(distinct-values($overall-elapsed ! xs:dayTimeDuration(.)) > > Returns 21219, making it clear that the range index is returning > distinct values, not all values. It makes sense in terms of how I would > expect a range index to be structured (a one-to-many mapping for values > to elements) but doesn’t make sense as the return for a function named > “element-values” (and not element-distinct-values). > > I didn’t see this behavior mentioned in the docs (although the > introduction to the Lexicon reference section does describe lexicons as > sets of unique values). > > My requirement is to *quickly* get a list of the durations for all > prof:expression elements (which I use for both counting and for > bucketing, so I need all values, not just all distinct values). > > Is there a way to do what I want using only indexes? 
> > Thanks, > >
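The fragment-frequency vs. item-frequency distinction resolved in this thread can be sketched as follows. This is an illustrative sketch, not the poster's exact code; it assumes an element range index of type dayTimeDuration on prof:overall-elapsed:

```xquery
xquery version "1.0-ml";
declare namespace prof = "http://marklogic.com/xdmp/profile";

(: With "fragment-frequency" (the default), cts:frequency() reports how
   many fragments contain each value; with "item-frequency" it reports
   how many times each value occurs. Only the latter sums to the total
   number of element instances. :)
let $by-fragment :=
  sum(for $d in cts:element-values(xs:QName("prof:overall-elapsed"),
                                   (), "fragment-frequency")
      return cts:frequency($d))
let $by-item :=
  sum(for $d in cts:element-values(xs:QName("prof:overall-elapsed"),
                                   (), "item-frequency")
      return cts:frequency($d))
return ($by-fragment, $by-item)
```

In the numbers quoted above, the fragment-based sum came out about 200 short (47371 vs. 47539) because some fragments hold more than one element with the same value.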
[MarkLogic Dev General] Getting Impossible Value from count()--why?
Using ML 8.0-3.2 As part of my profiling application I run a large number of profiles, storing the profiler results back to the database. I’m then extracting the times from the profiling data to create histograms and do other analysis. My first attempt to do this with buckets ran into the problem that the index-based buckets were not returning accurate numbers, so I reimplemented it to construct the buckets manually from a list of the actual duration values. My code is:

let $durations as xs:dayTimeDuration* := epf:get-enrichment-durations($collection)
let $search-range := epf:construct-search-range()
let $facets :=
  for $bucket in $search-range/search:bucket
  let $upper-bound := if ($bucket/@lt) then xs:dayTimeDuration($bucket/@lt) else xs:dayTimeDuration("PT0S")
  let $lower-bound := xs:dayTimeDuration($bucket/@ge)
  let $count := count($durations[. lt $upper-bound][. ge $lower-bound])
  return if ($count gt 0)
         then <facet-value count="{$count}">{epf:format-day-time-duration($upper-bound)}</facet-value>
         else ()

The get-enrichment-durations() function does this:

cts:element-values(xs:QName("prof:overall-elapsed"), (), "descending",
  cts:collection-query($collection))

This works nicely and seems to provide correct numbers except when the number of durations within a particular set of bounds exceeds 10,000, at which point count() returns 10,000, which is an impossible number—the chance of there being exactly 10,000 instances within a given range is basically zero. But I’m getting 10,000 twice, which is absolutely impossible. Here’s the results I get from running this in the query console:

75778
0.01 seconds
0.02 seconds
0.03 seconds
0.04 seconds
0.05 seconds
…

There are 75,778 actual duration values and the count values for the 3rd and 4th ranges are exactly 10,000. 
If I change the let $count := expression to only test the upper or lower bound then I get numbers greater than 10,000. I also tried changing the order of the predicates and using a single predicate with “and”. The problem only seems to be related to using both predicates when the resulting sequence would have more than 10K items. Is there an explanation for why count() gives me exactly 10,000 in this case? Is there a workaround for this behavior? The search range I’m constructing is normal ML-defined markup for defining a search range (in the http://marklogic.com/appservices/search namespace), e.g.:

0.001 Second
0.002 Second
0.003 Second
0.004 Second
0.005 Second
…

Thanks, Eliot -- Eliot Kimber http://contrext.com
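The workaround that emerges later in this thread (summing cts:frequency() per bucket instead of counting the deduplicated lexicon values) can be sketched like this, with hypothetical bucket bounds standing in for the search:bucket values:

```xquery
xquery version "1.0-ml";
declare namespace prof = "http://marklogic.com/xdmp/profile";

let $lower := xs:dayTimeDuration("PT0.03S")  (: hypothetical bucket bounds :)
let $upper := xs:dayTimeDuration("PT0.04S")
let $durations :=
  cts:element-values(xs:QName("prof:overall-elapsed"), (),
                     ("descending", "item-frequency"))
(: The lexicon yields each distinct value once, so count() on the
   filtered sequence undercounts; weighting each value by its
   cts:frequency() gives the true number of instances in the bucket. :)
return
  sum(for $d in $durations[. ge $lower][. lt $upper]
      return cts:frequency($d))
```

cts:frequency() must be called on values obtained directly from the lexicon call in the same query, which is why the filtering is done on the lexicon sequence itself rather than on copies of the values.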
Re: [MarkLogic Dev General] Count of cts:element-values() not equal to number of element instances--what's going on?
That would make sense but since these elements are generated by the ML profiler I don’t think it’s possible for them to ever be empty. This query returns zero: let $overall-elapsed := collection($collection)/enrprof:profiling-instance/enrprof:enrichment/enrprof:evalResult/prof:report/prof:metadata/prof:overall-elapsed count($overall-elapsed[normalize-space(xs:string(.)) eq '']) Cheers, E. -- Eliot Kimber http://contrext.com On 8/15/17, 2:09 AM, "general-boun...@developer.marklogic.com on behalf of Geert Josten" wrote: Wild guess.. Empty prof:overall-elapsed elements, that are ignored/rejected by the range index? Cheers On 8/14/17, 9:58 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: >Using both cts:frequency and cts:count-aggregate I get numbers that are >closer to the correct count but are short by about 200. What would >account for the difference? > >Queries: > >let $profiles := >collection($collection)/enrprof:profiling-instance/enrprof:enrichment/enrprof:evalResult/prof:* >let $histograms := $profiles/prof:histogram >let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed >let $durations := cts:element-values(xs:QName("prof:overall-elapsed"), >(), "descending", > cts:collection-query($collection)) >let $count-frequency := sum(for $dur in $durations return >cts:frequency($dur)) >let $overall-elapsed-ref := >cts:element-reference(fn:QName("http://marklogic.com/xdmp/profile","overall-elapsed"),("type=dayTimeDuration")) > >let $count-frequency := sum(for $dur in $durations return >cts:frequency($dur)) >let $count-aggregate := cts:count-aggregate($overall-elapsed-ref,(), >cts:collection-query($collection)) > >Results: > >47539 >47539 >47539 >47371 >47371 >21219 > >Cheers, > >E. >-- >Eliot Kimber >http://contrext.com > > > > >On 8/14/17, 1:53 PM, "general-boun...@developer.marklogic.com on behalf >of Mary Holstege" mary.holst...@marklogic.com> wrote: > >That is overkill. 
The results you get out of cts:element-values have >a >frequency (accessible via cts:frequency). The cts: aggregates (e.g. >cts:count, cts:sum) take the frequency into account. > >//Mary > >On Mon, 14 Aug 2017 11:42:07 -0700, Oleksii Segeda > wrote: > >> Eliot, >> >> You can do something like this: >> > cts:element-value-co-occurrences(xs:QName("prof:overall-elapsed"),xs:QName("xdmp:document")) >> if you have only one element per document. >> >> Best, >> >> Oleksii Segeda >> IT Analyst >> Information and Technology Solutions >> www.worldbank.org >> >> >> -Original Message- >> From: general-boun...@developer.marklogic.com >> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Eliot > >> Kimber >> Sent: Monday, August 14, 2017 2:31 PM >> To: MarkLogic Developer Discussion >> Subject: [MarkLogic Dev General] Count of cts:element-values() not >equal >> to number of element instances--what's going on? >> >> I have this query: >> >> let $durations := >cts:element-values(xs:QName("prof:overall-elapsed"), >> (), "descending", >> cts:collection-query($collection)) >> >> And this query: >> >> let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed >> >> Where there is an element range index for prof:overall-elapsed. >> >> Comparing the two results I get very different numbers when I >expected >> them to be equal: >> >> 47539 >> 21219 >> >> Doing this: >> >> count(distinct-values($overall-elapsed ! xs:dayTimeDuration(.))) >> >> Returns 21219, making it clear that the range index is returning >> distinct values, not all values. It makes sense in terms of how I >would >> expect a range index
Re: [MarkLogic Dev General] Count of cts:element-values() not equal to number of element instances--what's going on?
Using both cts:frequency and cts:count-aggregate I get numbers that are closer to the correct count but are short by about 200. What would account for the difference? Queries:

let $profiles := collection($collection)/enrprof:profiling-instance/enrprof:enrichment/enrprof:evalResult/prof:*
let $histograms := $profiles/prof:histogram
let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed
let $durations := cts:element-values(xs:QName("prof:overall-elapsed"), (), "descending",
  cts:collection-query($collection))
let $count-frequency := sum(for $dur in $durations return cts:frequency($dur))
let $overall-elapsed-ref := cts:element-reference(fn:QName("http://marklogic.com/xdmp/profile","overall-elapsed"),("type=dayTimeDuration"))
let $count-frequency := sum(for $dur in $durations return cts:frequency($dur))
let $count-aggregate := cts:count-aggregate($overall-elapsed-ref,(), cts:collection-query($collection))

Results:

47539
47539
47539
47371
47371
21219

Cheers, E. -- Eliot Kimber http://contrext.com On 8/14/17, 1:53 PM, "general-boun...@developer.marklogic.com on behalf of Mary Holstege" wrote: That is overkill. The results you get out of cts:element-values have a frequency (accessible via cts:frequency). The cts: aggregates (e.g. cts:count, cts:sum) take the frequency into account. //Mary On Mon, 14 Aug 2017 11:42:07 -0700, Oleksii Segeda wrote: > Eliot, > > You can do something like this: > cts:element-value-co-occurrences(xs:QName("prof:overall-elapsed"),xs:QName("xdmp:document")) > if you have only one element per document. 
> > Best, > > Oleksii Segeda > IT Analyst > Information and Technology Solutions > www.worldbank.org > > > -Original Message----- > From: general-boun...@developer.marklogic.com > [mailto:general-boun...@developer.marklogic.com] On Behalf Of Eliot > Kimber > Sent: Monday, August 14, 2017 2:31 PM > To: MarkLogic Developer Discussion > Subject: [MarkLogic Dev General] Count of cts:element-values() not equal > to number of element instances--what's going on? > > I have this query: > > let $durations := cts:element-values(xs:QName("prof:overall-elapsed"), > (), "descending", > cts:collection-query($collection)) > > And this query: > > let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed > > Where there an element range index for prof:overall-elapsed. > > Comparing the two results I get very different numbers when I expected > them to be equal: > > 47539 > 21219 > > Doing this: > > count(distinct-values($overall-elapsed ! xs:dayTimeDuration(.)) > > Returns 21219, making it clear that the range index is returning > distinct values, not all values. It makes sense in terms of how I would > expect a range index to be structured (a one-to-many mapping for values > to elements) but doesn’t make sense as the return for a function named > “element-values” (and not element-distinct-values). > > I didn’t see this behavior mentioned in the docs (although the > introduction to the Lexicon reference section does describe lexicons as > sets of unique values). > > My requirement is to *quickly* get a list of the durations for all > prof:expression elements (which I use for both counting and for > bucketing, so I need all values, not just all distinct values). > > Is there a way to do what I want using only indexes? > > Thanks, > > E. 
> -- > Eliot Kimber > http://contrext.com -- Using Opera's revolutionary email client: http://www.opera.com/mail/ ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
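[Editor's note: a possible explanation for the remaining gap of about 200, sketched in XQuery. Lexicon frequencies default to fragment granularity, so repeated values inside one fragment count only once; the "item-frequency" option requests per-occurrence counts instead. The collection URI below is hypothetical, and this sketch is untested.]

```xquery
xquery version "1.0-ml";
declare namespace prof = "http://marklogic.com/xdmp/profile";

let $collection := "/profiles/"  (: hypothetical collection URI :)
let $values := cts:element-values(
                 xs:QName("prof:overall-elapsed"),
                 (),
                 ("item-frequency"),  (: count occurrences, not fragments :)
                 cts:collection-query($collection))
(: sum of per-value frequencies should approach the true instance count :)
return sum($values ! cts:frequency(.))
```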
[MarkLogic Dev General] Count of cts:element-values() not equal to number of element instances--what's going on?
I have this query: let $durations := cts:element-values(xs:QName("prof:overall-elapsed"), (), "descending", cts:collection-query($collection)) And this query: let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed Where there is an element range index for prof:overall-elapsed. Comparing the two results I get very different numbers when I expected them to be equal: 47539 21219 Doing this: count(distinct-values($overall-elapsed ! xs:dayTimeDuration(.))) Returns 21219, making it clear that the range index is returning distinct values, not all values. It makes sense in terms of how I would expect a range index to be structured (a one-to-many mapping for values to elements) but doesn’t make sense as the return for a function named “element-values” (and not element-distinct-values). I didn’t see this behavior mentioned in the docs (although the introduction to the Lexicon reference section does describe lexicons as sets of unique values). My requirement is to *quickly* get a list of the durations for all prof:expression elements (which I use for both counting and for bucketing, so I need all values, not just all distinct values). Is there a way to do what I want using only indexes? Thanks, E. -- Eliot Kimber http://contrext.com
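[Editor's note: since the lexicon stores distinct values, one hedged way to recover the full multiset for bucketing, purely from the index, is to repeat each distinct value by its frequency. This is a sketch, not tested; the collection URI is hypothetical, and "item-frequency" makes the counts per occurrence rather than per fragment.]

```xquery
xquery version "1.0-ml";
declare namespace prof = "http://marklogic.com/xdmp/profile";

let $collection := "/profiles/"  (: hypothetical collection URI :)
for $dur in cts:element-values(xs:QName("prof:overall-elapsed"), (),
              ("item-frequency"), cts:collection-query($collection))
(: emit each distinct value once per occurrence :)
for $i in 1 to cts:frequency($dur)
return $dur
```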
Re: [MarkLogic Dev General] Tracking Spawned Tasks?
That’s a good point about comments. I’ll try to add comments for things I found lacking and subsequently discover answers to. Cheers, E. -- Eliot Kimber http://contrext.com From: on behalf of Evan Lenz Reply-To: MarkLogic Developer Discussion Date: Monday, August 14, 2017 at 12:27 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Tracking Spawned Tasks? Hi Eliot, One nice thing about the MarkLogic documentation for functions is that you can add helpful comments yourself. I've seen others write "See also" comments and have done so myself from time to time. In general I've found the comments very helpful, even if it's a user getting their newbie question answered. Evan Evan Lenz President, Lenz Consulting Group, Inc. http://lenzconsulting.com On Mon, Aug 14, 2017 at 9:57 AM, Erik Hennum wrote: Hi, Eliot and Ron: The return option is explained with the rest of the options in the eval article: http://docs.marklogic.com/xdmp:eval The second example under spawn uses the promise: http://docs.marklogic.com/xdmp:spawn As Ron notes, the server field is only useful if the polling requests go back to the same host. To allow for restarts, the polling logic should check for the persisted final status document if the server field is empty. (That's the motivation for persisting a final status document even when using server fields.) Thanks for the feedback on the documentation -- I'll pass that along. Erik Hennum From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Ron Hitchens [r...@ronsoft.com] Sent: Monday, August 14, 2017 7:55 AM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Tracking Spawned Tasks? Proceed with caution when using server fields. They exist only on a single machine, they are not propagated across nodes in a cluster. 
If you have a cluster behind a load balancer (as most are) and you stash something in a server field to be checked later, the next request may be vectored to a different cluster node, where your stashed value will not be present. Likewise, if you put something in a field to be picked up by a spawned task, the spawned task may run on a different node. Ron Hitchens r...@overstory.co.uk, +44 7879 358212 On August 14, 2017 at 3:24:32 PM, Eliot Kimber (ekim...@contrext.com) wrote: I like using set-server-field: my requirement feels like just what server fields were intended for. Cheers, E. -- Eliot Kimber http://contrext.com On 8/14/17, 8:32 AM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote: Hi, Eliot: xdmp:spawn() doesn't return an identifier because, if it is used as a future via the result option, it is obligated to return the result. The approach you sketch below -- passing in an identifier and writing tickets to a status database -- is pretty much what InfoStudio did. One refinement would be to log status in a server field via xdmp:set-server-field() and, on completion, write final status to a database (for durability in the case of a restart). Hoping that helps, Erik Hennum From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Eliot Kimber [ekim...@contrext.com] Sent: Saturday, August 12, 2017 10:15 AM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Tracking Spawned Tasks? Using ML 8 I’m refining a profiling application that spawns a number of tasks and then, eventually, reports on the results once all the tasks have completed. Right now I just fire off the tasks and then refresh my app, which looks for results. 
It would be nice to be able to show the status of the spawned tasks but it looks like xdmp:spawn() doesn’t return anything (sort of expected to get some sort of task ID or something) and so there’s no obvious way to track spawned tasks from the spawning application. I could do something like generate private task IDs and pass those as parameters to the spawned tasks and then maintain a set of task status docs, but I was hoping there was something easier. It seems like it would be a common requirement but I couldn’t find anything useful in the ML 8 docs or searching the web. Thanks, Eliot -- Eliot Kimber http://contrext.com
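[Editor's note: Erik's server-field refinement might look something like this sketch. The field name, status URI, and external variable are hypothetical, and Ron's caveat still applies: server fields are host-local, so only the persisted document is reliable across a cluster or restart.]

```xquery
xquery version "1.0-ml";
(: Inside the spawned module: :)
declare variable $task-id as xs:string external;

(: cheap, host-local progress marker :)
xdmp:set-server-field(fn:concat("task-status-", $task-id), "running"),

(: ... do the real work here ... :)

(: durable final status, visible cluster-wide and after a restart :)
xdmp:document-insert(
  fn:concat("/task-status/", $task-id, ".xml"),
  <task-status id="{$task-id}" state="complete"/>)
```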
Re: [MarkLogic Dev General] Tracking Spawned Tasks?
I like using set-server-field: my requirement feels like just what server fields were intended for. Cheers, E. -- Eliot Kimber http://contrext.com On 8/14/17, 8:32 AM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote: Hi, Eliot: xdmp:spawn() doesn't return an identifier because, if it is used as a future via the result option, it is obligated to return the result. The approach you sketch below -- passing in an identifier and writing tickets to a status database -- is pretty much what InfoStudio did. One refinement would be to log status in a server field via xdmp:set-server-field() and, on completion, write final status to a database (for durability in the case of a restart). Hoping that helps, Erik Hennum From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Eliot Kimber [ekim...@contrext.com] Sent: Saturday, August 12, 2017 10:15 AM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Tracking Spawned Tasks? Using ML 8 I’m refining a profiling application that spawns a number of tasks and then, eventually, reports on the results once all the tasks have completed. Right now I just fire off the tasks and then refresh my app, which looks for results. It would be nice to be able to show the status of the spawned tasks but it looks like xdmp:spawn() doesn’t return anything (sort of expected to get some sort of task ID or something) and so there’s no obvious way to track spawned tasks from the spawning application. I could do something like generate private task IDs and pass those as parameters to the spawned tasks and then maintain a set of task status docs, but I was hoping there was something easier. It seems like it would be a common requirement but I couldn’t find anything useful in the ML 8 docs or searching the web. 
Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Tracking Spawned Tasks?
Can you expand on this statement: “if it is used as a future via the result option, it is obligated to return the result.” I didn’t see anything about this in the ML 8 docs for xdmp:spawn() and it seems pretty important. One general comment I’ll make about the ML docs is that it seems to assume/require a fairly encyclopedic knowledge of many subtle details. I’ve now read pretty much all the guides at least once and spent a lot of time in the reference docs, and while it’s all easy to access and very useful, it’s still a challenge to fully understand the implications of many things. For example, the documentation for various options for ranges only shows the syntax but has no guidance about semantics or what values are actually allowed, which would be very useful. This is documentation where a few well-placed see-alsos or a bit more usage guidance would go a long way. As a professional technical writer I know how challenging this aspect of docs is but it would help a lot. Cheers, Eliot On 8/14/17, 8:32 AM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote: Hi, Eliot: xdmp:spawn() doesn't return an identifier because, if it is used as a future via the result option, it is obligated to return the result. The approach you sketch below -- passing in an identifier and writing tickets to a status database -- is pretty much what InfoStudio did. One refinement would be to log status in a server field via xdmp:set-server-field() and, on completion, write final status to a database (for durability in the case of a restart). Hoping that helps, Erik Hennum From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Eliot Kimber [ekim...@contrext.com] Sent: Saturday, August 12, 2017 10:15 AM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Tracking Spawned Tasks? 
Using ML 8 I’m refining a profiling application that spawns a number of tasks and then, eventually, reports on the results once all the tasks have completed. Right now I just fire off the tasks and then refresh my app, which looks for results. It would be nice to be able to show the status of the spawned tasks but it looks like xdmp:spawn() doesn’t return anything (sort of expected to get some sort of task ID or something) and so there’s no obvious way to track spawned tasks from the spawning application. I could do something like generate private task IDs and pass those as parameters to the spawned tasks and then maintain a set of task status docs, but I was hoping there was something easier. It seems like it would be a common requirement but I couldn’t find anything useful in the ML 8 docs or searching the web. Thanks, Eliot -- Eliot Kimber http://contrext.com
[MarkLogic Dev General] Tracking Spawned Tasks?
Using ML 8 I’m refining a profiling application that spawns a number of tasks and then, eventually, reports on the results once all the tasks have completed. Right now I just fire off the tasks and then refresh my app, which looks for results. It would be nice to be able to show the status of the spawned tasks but it looks like xdmp:spawn() doesn’t return anything (sort of expected to get some sort of task ID or something) and so there’s no obvious way to track spawned tasks from the spawning application. I could do something like generate private task IDs and pass those as parameters to the spawned tasks and then maintain a set of task status docs, but I was hoping there was something easier. It seems like it would be a common requirement but I couldn’t find anything useful in the ML 8 docs or searching the web. Thanks, Eliot -- Eliot Kimber http://contrext.com
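[Editor's note: one way to sketch the private-task-ID idea from the message above. The module path and variable name are made up for illustration; this is not a built-in ML mechanism. The ID is passed to the spawned module as an external variable and handed back to the caller so it can poll a status document later.]

```xquery
xquery version "1.0-ml";
(: "/tasks/process-task.xqy" and the task-id variable are hypothetical :)
let $task-id := xs:string(xdmp:random())
return (
  xdmp:spawn("/tasks/process-task.xqy",
             (xs:QName("task-id"), $task-id)),
  (: hand the ID back so the caller can poll the task's status doc :)
  $task-id
)
```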
Re: [MarkLogic Dev General] Unexpected Failure Doing Math on Durations (ML 8.0-3.2
Hmm. Apparently I have to cast $div to xs:double. That still seems unnecessary and feels like a bug. Cheers, E. Eliot Kimber On 8/11/17, 10:45 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: (ML 8.0-3.2) In my xquery I’m doing this: let $total := xs:dayTimeDuration("PT6M38.33S") let $fastest-three := xs:dayTimeDuration("PT4M6.258784S") let $div := ($fastest-three div $total) return ($div) Which returns: 0.6182280621595159793 If I then try to multiply $div by 100 (to get a percent) I get decimal overflow: [1.0-ml] XDMP-DECOVRFLW: (err:FOAR0002) $div * 100 -- Decimal overflow Which is not at all expected. I also noticed that ML 8 does not appear to support the XQuery 3.x two-argument round() function, e.g., round($div, 2). Why am I getting a decimal overflow here? As a test I tried doing the same thing with Saxon 9.7 and XSLT 3: <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="3.1"> div: $div * 100: round($div, 2) = Which produces: div: 0.61822806215951597921 $div * 100: 61.822806215951597921 round($div, 2) = 0.62 Thanks, Eliot -- Eliot Kimber http://contrext.com
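[Editor's note: the workaround spelled out as a sketch. Dividing two durations yields a full-precision xs:decimal, and multiplying that by 100 can exceed MarkLogic's fixed-precision decimal arithmetic; casting to xs:double first avoids the overflow, and the missing two-argument round() can be emulated.]

```xquery
xquery version "1.0-ml";
let $total := xs:dayTimeDuration("PT6M38.33S")
let $fastest-three := xs:dayTimeDuration("PT4M6.258784S")
(: cast to double so the later arithmetic is floating-point :)
let $div := xs:double($fastest-three div $total)
(: emulate round($x, 2), which ML 8's XQuery lacks :)
let $pct := fn:round($div * 100 * 100) div 100
return $pct  (: roughly 61.82 :)
```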
[MarkLogic Dev General] Unexpected Failure Doing Math on Durations (ML 8.0-3.2
(ML 8.0-3.2) In my xquery I’m doing this: let $total := xs:dayTimeDuration("PT6M38.33S") let $fastest-three := xs:dayTimeDuration("PT4M6.258784S") let $div := ($fastest-three div $total) return ($div) Which returns: 0.6182280621595159793 If I then try to multiply $div by 100 (to get a percent) I get decimal overflow: [1.0-ml] XDMP-DECOVRFLW: (err:FOAR0002) $div * 100 -- Decimal overflow Which is not at all expected. I also noticed that ML 8 does not appear to support the XQuery 3.x two-argument round() function, e.g., round($div, 2). Why am I getting a decimal overflow here? As a test I tried doing the same thing with Saxon 9.7 and XSLT 3: <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="3.1"> div: $div * 100: round($div, 2) = Which produces: div: 0.61822806215951597921 $div * 100: 61.822806215951597921 round($div, 2) = 0.62 Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Making Collection Facet Work with search:search() (Resolved)
My element was not in the search namespace. This was a side effect of cutting and pasting from various examples where a default namespace had been set. Hmph. Cheers, E. -- Eliot Kimber http://contrext.com On 8/7/17, 9:55 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: This is ML 8.0-3.2 Cheers, E. -- Eliot Kimber http://contrext.com On 8/7/17, 9:45 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: I’m trying to do a search:search() with two constraints: one for collection and one for a bucketed facet. Here is my search definition: search:search(("install"), limit=5 100th of a second 200th of a Second … (bunch of buckets omitted) More than 2 seconds http://marklogic.com/xdmp/query-meters" name="elapsed-time"/> ) I can’t see any problem with this definition and if I run it as shown it works and my bucket constraint result is good (very nice, by the way). However, if I try to specify a value for the named constraint “trial:”, e.g.: search:search(("install and trial:trial-001"), …) Then I get this failure: [1.0-ml] XDMP-AS: (err:XPTY0004) $constraint-elem as element() -- Invalid coercion: () as element() Stack Trace In /MarkLogic/appservices/search/ast.xqy on line 305 In ast:joiner-constraint(map:map(http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:map="http://marklogic.com/xdmp/map">6<..XDMP-ATOMIZEFUNC: (err:FOTY0013) Functions cannot be atomized...), http://marklogic.com/appservices/search">trial) (and lots more stack trace items). What is causing this failure and what do I do to resolve it? 
Thanks, Eliot -- Eliot Kimber http://contrext.com
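[Editor's note: given the resolution above (the constraint element was not in the search namespace), a correctly namespaced options node would look roughly like this sketch; the collection prefix value is hypothetical.]

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <search:options>
    <!-- constraint and its children must be in the search: namespace;
         an un-namespaced constraint element is invisible to the options
         parser, which then fails with the element() coercion error -->
    <search:constraint name="trial">
      <search:collection prefix="trial-"/>
    </search:constraint>
  </search:options>
return search:search("install and trial:trial-001", $options)
```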
Re: [MarkLogic Dev General] Making Collection Facet Work with search:search()
This is ML 8.0-3.2 Cheers, E. -- Eliot Kimber http://contrext.com On 8/7/17, 9:45 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: I’m trying to do a search:search() with two constraints: one for collection and one for a bucketed facet. Here is my search definition: search:search(("install"), limit=5 100th of a second 200th of a Second … (bunch of buckets omitted) More than 2 seconds http://marklogic.com/xdmp/query-meters" name="elapsed-time"/> ) I can’t see any problem with this definition and if I run it as shown it works and my bucket constraint result is good (very nice, by the way). However, if I try to specify a value for the named constraint “trial:”, e.g.: search:search(("install and trial:trial-001"), …) Then I get this failure: [1.0-ml] XDMP-AS: (err:XPTY0004) $constraint-elem as element() -- Invalid coercion: () as element() Stack Trace In /MarkLogic/appservices/search/ast.xqy on line 305 In ast:joiner-constraint(map:map(http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:map="http://marklogic.com/xdmp/map">6<..XDMP-ATOMIZEFUNC: (err:FOTY0013) Functions cannot be atomized...), http://marklogic.com/appservices/search">trial) (and lots more stack trace items). What is causing this failure and what do I do to resolve it? Thanks, Eliot -- Eliot Kimber http://contrext.com
[MarkLogic Dev General] Making Collection Facet Work with search:search()
I’m trying to do a search:search() with two constraints: one for collection and one for a bucketed facet. Here is my search definition: search:search(("install"), limit=5 100th of a second 200th of a Second … (bunch of buckets omitted) More than 2 seconds http://marklogic.com/xdmp/query-meters" name="elapsed-time"/> ) I can’t see any problem with this definition and if I run it as shown it works and my bucket constraint result is good (very nice, by the way). However, if I try to specify a value for the named constraint “trial:”, e.g.: search:search(("install and trial:trial-001"), …) Then I get this failure: [1.0-ml] XDMP-AS: (err:XPTY0004) $constraint-elem as element() -- Invalid coercion: () as element() Stack Trace In /MarkLogic/appservices/search/ast.xqy on line 305 In ast:joiner-constraint(map:map(http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:map="http://marklogic.com/xdmp/map">6<..XDMP-ATOMIZEFUNC: (err:FOTY0013) Functions cannot be atomized...), http://marklogic.com/appservices/search">trial) (and lots more stack trace items). What is causing this failure and what do I do to resolve it? Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Trying to Get ML Data Visualization Widgets Working
I figured out my n00b error: The initialization script needs to be at the end of the HTML doc. Once I did that (and also set the constraintType configuration property on the chart configuration), I’m now getting a visualization widget on my page. Now I just need to fill it with data…. Cheers, E. Eliot Kimber Doer of Things Nobody Else Has Time For On 8/7/17, 3:19 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote: I was looking at the Google visualization stuff and then found Dave Lee’s paper on using it with MarkLogic, which then led me to the ML 8 docs. Note that I’m not really using the application builder, just using code that comes with it. I suspect my issue is just a basic JavaScript problem. I’ll take a look at vis.js—I need easy for this project… Cheers, E. Eliot Kimber Doer of Things Nobody Else Has Time For On 8/7/17, 3:08 PM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote: Hi, Eliot: The AppBuilder has been superseded by initiatives in the JavaScript ecosystem and is deprecated in MarkLogic 8 and removed in 9. I've heard good things about the D3 (versatile) and vis.js (easy) Open Source JavaScript visualization libraries. Hoping that's useful, Erik Hennum From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Eliot Kimber [ekim...@contrext.com] Sent: Monday, August 07, 2017 12:24 PM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Trying to Get ML Data Visualization Widgets Working Using ML 8, I’m setting up a little profiling web application and I need to do visualization on the recorded data, e.g., durations reported by query meters for a large number of operations. I’m following the guidance in the Search Developer's Guide — Chapter 31, Data Visualization Widgets, in the context of my own simple Web app (that is, I did not use the application builder to initially create my app, I just created a simple HTTP app from scratch). 
I’m generating an HTML page that includes all the Javascript for visualization:

var durationBarChartConfig = {
  title: "Duration Distributions",
  dataLabel: "Durations",
  dataType: "int"
}
ML.controller.init();
ML.chartWidget('duration-bar-chart-1', 'bar', durationBarChartConfig);
ML.chartWidget('duration-bar-chart-2', 'bar', durationBarChartConfig);
ML.chartWidget('duration-bar-chart-3', 'bar', durationBarChartConfig);
ML.controller.loadData();

And in the main HTML I’m generating the corresponding widget-containing divs: However, when I load the page I get this result in the console: chart.js:82 Uncaught Chart widget container ID "duration-bar-chart-1" does not exist The element exists and there are no other errors in the JS console. I assume I must be missing something basic here but as I’m not at all versed in JavaScript I’m hoping someone can point me in the right direction. I didn’t see anything in the ML guide or the underlying JavaScript code that suggested I’m missing some setup. Thanks, Eliot -- Eliot Kimber http://contrext.com
Re: [MarkLogic Dev General] Trying to Get ML Data Visualization Widgets Working
I was looking at the Google visualization stuff and then found Dave Lee’s paper on using it with MarkLogic, which then led me to the ML 8 docs. Note that I’m not really using the application builder, just using code that comes with it. I suspect my issue is just a basic JavaScript problem. I’ll take a look at vis.js—I need easy for this project… Cheers, E. Eliot Kimber Doer of Things Nobody Else Has Time For On 8/7/17, 3:08 PM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote: Hi, Eliot: The AppBuilder has been superseded by initiatives in the JavaScript ecosystem and is deprecated in MarkLogic 8 and removed in 9. I've heard good things about the D3 (versatile) and vis.js (easy) Open Source JavaScript visualization libraries. Hoping that's useful, Erik Hennum From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Eliot Kimber [ekim...@contrext.com] Sent: Monday, August 07, 2017 12:24 PM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Trying to Get ML Data Visualization Widgets Working Using ML 8, I’m setting up a little profiling web application and I need to do visualization on the recorded data, e.g., durations reported by query meters for a large number of operations. I’m following the guidance in the Search Developer's Guide — Chapter 31, Data Visualization Widgets, in the context of my own simple Web app (that is, I did not use the application builder to initially create my app, I just created a simple HTTP app from scratch). 
I’m generating an HTML page that includes all the Javascript for visualization:

var durationBarChartConfig = {
  title: "Duration Distributions",
  dataLabel: "Durations",
  dataType: "int"
}
ML.controller.init();
ML.chartWidget('duration-bar-chart-1', 'bar', durationBarChartConfig);
ML.chartWidget('duration-bar-chart-2', 'bar', durationBarChartConfig);
ML.chartWidget('duration-bar-chart-3', 'bar', durationBarChartConfig);
ML.controller.loadData();

And in the main HTML I’m generating the corresponding widget-containing divs: However, when I load the page I get this result in the console: chart.js:82 Uncaught Chart widget container ID "duration-bar-chart-1" does not exist The element exists and there are no other errors in the JS console. I assume I must be missing something basic here but as I’m not at all versed in JavaScript I’m hoping someone can point me in the right direction. I didn’t see anything in the ML guide or the underlying JavaScript code that suggested I’m missing some setup. Thanks, Eliot -- Eliot Kimber http://contrext.com
[MarkLogic Dev General] Trying to Get ML Data Visualization Widgets Working
Using ML 8, I’m setting up a little profiling web application and I need to do visualization on the recorded data, e.g., durations reported by query meters for a large number of operations. I’m following the guidance in the Search Developer's Guide — Chapter 31, Data Visualization Widgets, in the context of my own simple Web app (that is, I did not use the application builder to initially create my app, I just created a simple HTTP app from scratch). I’m generating an HTML page that includes all the Javascript for visualization:

var durationBarChartConfig = {
  title: "Duration Distributions",
  dataLabel: "Durations",
  dataType: "int"
}
ML.controller.init();
ML.chartWidget('duration-bar-chart-1', 'bar', durationBarChartConfig);
ML.chartWidget('duration-bar-chart-2', 'bar', durationBarChartConfig);
ML.chartWidget('duration-bar-chart-3', 'bar', durationBarChartConfig);
ML.controller.loadData();

And in the main HTML I’m generating the corresponding widget-containing divs: However, when I load the page I get this result in the console: chart.js:82 Uncaught Chart widget container ID "duration-bar-chart-1" does not exist The element exists and there are no other errors in the JS console. I assume I must be missing something basic here but as I’m not at all versed in JavaScript I’m hoping someone can point me in the right direction. I didn’t see anything in the ML guide or the underlying JavaScript code that suggested I’m missing some setup. Thanks, Eliot -- Eliot Kimber http://contrext.com
[MarkLogic Dev General] Possible to Create Multi-Tagname Range Indexes Using admin Functions?
Using ML 8 I’m setting up a script to configure a large number of range indexes. If I do it manually via the UI I can create a single range index configuration that lists multiple tag names. However, this doesn’t appear to be possible with the admin API using admin:database-add-range-element-index(). I’m passing multiple range index definitions to a single call of admin:database-add-range-element-index() but the result in the UI is still one range index configuration per element name:

declare function local:configure-range-element-indexes(
  $config as element(),
  $db-id as xs:integer,
  $datatype as xs:string,
  $namespace as xs:string?,
  $tagnames as xs:string*
) as element()
{
  let $indexes as element()* :=
    for $tagname in $tagnames
    return admin:database-range-element-index(
      $datatype, $namespace, $tagname,
      "http://marklogic.com/collation/", false(), "reject")
  let $new-config := admin:database-add-range-element-index($config, $db-id, $indexes)
  return $new-config
};

Some of these tag name lists are quite long, reflecting logical groupings of element types, so it would be nice to have them grouped under single definitions in the UI for the benefit of people inspecting the configuration. Is what I want to do possible? Thanks, Eliot -- Eliot Kimber http://contrext.com
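[Editor's note: one thing to try, as a sketch rather than a verified answer. The Admin UI stores a multi-name index as a single spec whose localname is a space-separated list, so passing the joined names as one localname to admin:database-range-element-index may reproduce the grouping; the database name and tag names below are illustrative, and whether the API accepts this form should be checked against your ML version.]

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db-id := xdmp:database("Documents")  (: substitute your database :)
let $tagnames := ("title", "subtitle", "shortdesc")
(: one index spec covering all names, if the API accepts the
   space-separated localname form the UI produces :)
let $index := admin:database-range-element-index(
    "string", "", fn:string-join($tagnames, " "),
    "http://marklogic.com/collation/", fn:false(), "reject")
return admin:save-configuration(
    admin:database-add-range-element-index($config, $db-id, $index))
```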
Re: [MarkLogic Dev General] Make string XML "safe" in xquery
If you have the string in one variable then the earlier answer should do what you want: let $str := "This is <not> xml" let $elem as element() := {$str} return $elem Result is: This is <not> xml Cheers, E. Eliot Kimber From: on behalf of Steven Anderson Reply-To: MarkLogic Developer Discussion Date: Saturday, July 29, 2017 at 2:59 AM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Make string XML "safe" in xquery I know that the text string between the start and end tags contains no XML, only text with no markup. My plan is to put the text string between the open and close title tags as part of a larger text string that I want to convert to an XML document via xdmp:unquote. Sounds like I'll be whipping up my own function based on fn:replace. Steve On Jul 29, 2017, at 12:39 AM, Jason Hunter wrote: Why do you have malformed xml like that? How do you reliably know what's tags and what's string? Your plan is to preprocess the text to make it well-formed xml so it can be unquoted? Sent from my iPhone On Jul 29, 2017, at 14:08, Steven Anderson wrote: Within the larger context, I have a string like this: A title for the product that I'm then converting into an xml document node using xdmp:unquote. That makes xdmp:unquote barf, but if I do a fn:replace on the specific characters it works. As I said, I can whip something up, but I assumed there was an obvious function for this. Steve On Jul 28, 2017, at 10:28 PM, Jason Hunter wrote: In normal XQuery you don't need to do this. Are you sure you do? Maybe you just need: { $value } The value will be properly escaped on output. Sent from my iPhone On Jul 29, 2017, at 13:01, Steven Anderson wrote: I could do that, but I just figured there'd be an xquery function to do it for all three special XML characters. It's easy enough to write one, but I just assumed that someone else would have needed it. On Jul 28, 2017, at 9:12 PM, Indrajeet Verma wrote: Steve - Did you try using fn:replace? e.g. 
fn:replace(fn:replace($title, "<", "&lt;"), ">", "&gt;")

On Sat, Jul 29, 2017 at 5:32 AM, Steve Anderson wrote:

I have a string like this:

A title for <placeholder> the product

and I'd like to replace it with

A title for &lt;placeholder&gt; the product

Basically, I want to make the string a valid XML text node, fixing greater-than, less-than, and ampersands. I thought I could make xdmp:quote do that but, perhaps because it's Friday afternoon, I can't find the right options to make it work. Is there any easy solution I can't find? Steve

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
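A note on the nested fn:replace approach: it escapes only < and >, so an ampersand in the title would still break xdmp:unquote, and & must be escaped first or already-escaped text gets double-escaped. A minimal sketch of the same escaping in Python (standing in for the XQuery; the stdlib xml.sax.saxutils.escape handles all three characters in the right order):

```python
from xml.sax.saxutils import escape

def make_text_safe(s):
    """Escape &, <, and > so the string is safe inside an XML text node."""
    return escape(s)  # escapes & first, then < and >

make_text_safe('A title for <placeholder> & the product')
# -> 'A title for &lt;placeholder&gt; &amp; the product'
```

The equivalent XQuery would chain three fn:replace calls, replacing "&" with "&amp;" before touching "<" and ">".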
Re: [MarkLogic Dev General] Failure Trying to Restore ML 4.2 Backup to ML 8
It looks like the issue was related to permissions on the file system that held the backup directories. We relocated them and ensured that they were owned by the ML server’s user, and now I’m able to restore into ML 8 (or at least the restore has started). I was also able to restore into ML 4, which of course should work without problem. Cheers, E. -- Eliot Kimber http://contrext.com

On 7/27/17, 1:41 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I have a backup from an ML 4.2 database that I’m trying to restore to an ML 8 server. The database configurations are (or should be) the same between both servers, and both are running on Linux servers. If I have the same set of forests defined on the ML 8 server as in the backup, then when I go to the restore screen it lists all the forests in the backup, but all their check boxes are unchecked and greyed out. If I delete some of the forests from the ML 8 server, then those forests are selected and not greyed out. But when I proceed with the restore I consistently get failures like this:

Operation failed with error message: XDMP-NOFOREST: xdmp:database-restore((xs:unsignedLong("5211046837612715608"), xs:unsignedLong("5138674030818805002")), "/marklogic/backup/rsuite/20170726-1", (), fn:false(), (), fn:false(), ()) -- No forest with identifier 5211046837612715608. Check server logs.

I’m not seeing any other errors in the ErrorLog.txt log, and I didn’t see anything in the ML 8 backup and restore docs that suggested what the issue might be. The 5211046837612715608 value comes from databases.xml, and I see a mapping for this ID in the assignments.xml file: rsuite02 true 14194071972761628339 /somedir/MarkLogic all false 5211046837612715608. And there is a directory Forests/rsuite02 in the backup. Any idea what would be causing this failure?
Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Failure Trying to Restore ML 4.2 Backup to ML 8
I have a backup from an ML 4.2 database that I’m trying to restore to an ML 8 server. The database configurations are (or should be) the same between both servers, and both are running on Linux servers. If I have the same set of forests defined on the ML 8 server as in the backup, then when I go to the restore screen it lists all the forests in the backup, but all their check boxes are unchecked and greyed out. If I delete some of the forests from the ML 8 server, then those forests are selected and not greyed out. But when I proceed with the restore I consistently get failures like this:

Operation failed with error message: XDMP-NOFOREST: xdmp:database-restore((xs:unsignedLong("5211046837612715608"), xs:unsignedLong("5138674030818805002")), "/marklogic/backup/rsuite/20170726-1", (), fn:false(), (), fn:false(), ()) -- No forest with identifier 5211046837612715608. Check server logs.

I’m not seeing any other errors in the ErrorLog.txt log, and I didn’t see anything in the ML 8 backup and restore docs that suggested what the issue might be. The 5211046837612715608 value comes from databases.xml, and I see a mapping for this ID in the assignments.xml file: rsuite02 true 14194071972761628339 /somedir/MarkLogic all false 5211046837612715608. And there is a directory Forests/rsuite02 in the backup. Any idea what would be causing this failure? Thanks, Eliot -- Eliot Kimber http://contrext.com

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Using CURL to Test ML HTTP Processing
OK, I think I got it sorted, although I’m not sure I understand why it needs to be this way. On my curl command I added:

-H "Content-Type: application/text"

Along with:

--data-binary "@testfile.txt"

And then in my XQuery I use:

xdmp:get-request-body("text")

And get the response I expected (and wanted). Cheers, E. -- Eliot Kimber http://contrext.com

On 6/23/17, 9:18 AM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote:

Hi, Eliot: Try specifying the content-type. I believe that, if a POST request doesn't specify the content-type, curl defaults the content-type to application/x-www-form-urlencoded. (This convenience may or may not be seen as a feature.) Regards, Erik Hennum

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Eliot Kimber [ekim...@contrext.com] Sent: Thursday, June 22, 2017 3:02 PM To: MarkLogic Developer Discussion Subject: [MarkLogic Dev General] Using CURL to Test ML HTTP Processing

I’m trying to understand the ML support for handling HTTP requests, and I’m trying to use curl to test things just to learn. I’m getting an odd behavior and I haven’t been able to figure out what I’m doing wrong from either the curl info I can find or from the relevant ML docs. Here’s my module:

xquery version "1.0-ml";
let $type := xdmp:get-request-header('Content-Type')
let $field-names := xdmp:get-request-field-names()
return
This is test remote access {$type}
{ for $name in $field-names return {$name} }
{ for $name in $field-names return xdmp:get-request-field($name) }

I’m trying to use POST to send the data in a file to this module using the --data-binary parameter:

curl -X POST --data-binary "file=@testfile.txt" --user ekimber:ekimber http://anglia.corp.mitchellrepair.com:11984/test-remote-access.xqy

However, the response I get is:

This is test remote accessapplication/x-www-form-urlencodedfile@testfile.txt

Note that the field value is the string “@testfile.txt”, not the content of the file.
This is the form of call that appears to be the correct way to associate a field name with the data from a file. If I leave off “file=” then the contents of testfile.txt become the field name:

curl -X POST --data-binary "@testfile.txt" --user ekimber:ekimber http://anglia.corp.mitchellrepair.com:11984/test-remote-access.xqy

This is test remote accessapplication/x-www-form-urlencodedThis is the test File. More text.

Which also seems wrong. I must be doing something wrong, either on the curl side or on the ML side, but I can’t figure out what it is. All the examples I could find in the ML docs use direct form submission rather than curl. Thanks, Eliot -- Eliot Kimber http://contrext.com

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
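Erik's diagnosis can be checked without a server: when no Content-Type is set, the body curl sends is parsed as application/x-www-form-urlencoded. A sketch using Python's stdlib form parser to show what ML sees in each case (the literal strings mirror the curl invocations above):

```python
from urllib.parse import parse_qs

# With --data-binary "file=@testfile.txt", curl sends this literal body
# (the @ is only special when the whole data argument starts with it):
body = "file=@testfile.txt"
fields = parse_qs(body)  # what a form-urlencoded parser sees
# fields == {'file': ['@testfile.txt']}

# With --data-binary "@testfile.txt", curl does read the file, and the
# file's text, parsed as a form body with no '=', becomes a field *name*:
file_text = "This is the test File. More text."
name_only = parse_qs(file_text, keep_blank_values=True)
# name_only == {'This is the test File. More text.': ['']}
```

Both "odd" responses in the thread follow directly from this parsing, which is why setting an explicit Content-Type and reading the raw body with xdmp:get-request-body fixed it.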
[MarkLogic Dev General] Using CURL to Test ML HTTP Processing
I’m trying to understand the ML support for handling HTTP requests, and I’m trying to use curl to test things just to learn. I’m getting an odd behavior and I haven’t been able to figure out what I’m doing wrong from either the curl info I can find or from the relevant ML docs. Here’s my module:

xquery version "1.0-ml";
let $type := xdmp:get-request-header('Content-Type')
let $field-names := xdmp:get-request-field-names()
return
This is test remote access {$type}
{ for $name in $field-names return {$name} }
{ for $name in $field-names return xdmp:get-request-field($name) }

I’m trying to use POST to send the data in a file to this module using the --data-binary parameter:

curl -X POST --data-binary "file=@testfile.txt" --user ekimber:ekimber http://anglia.corp.mitchellrepair.com:11984/test-remote-access.xqy

However, the response I get is:

This is test remote accessapplication/x-www-form-urlencodedfile@testfile.txt

Note that the field value is the string “@testfile.txt”, not the content of the file. This is the form of call that appears to be the correct way to associate a field name with the data from a file. If I leave off “file=” then the contents of testfile.txt become the field name:

curl -X POST --data-binary "@testfile.txt" --user ekimber:ekimber http://anglia.corp.mitchellrepair.com:11984/test-remote-access.xqy

This is test remote accessapplication/x-www-form-urlencodedThis is the test File. More text.

Which also seems wrong. I must be doing something wrong, either on the curl side or on the ML side, but I can’t figure out what it is. All the examples I could find in the ML docs use direct form submission rather than curl. Thanks, Eliot -- Eliot Kimber http://contrext.com

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics
Thanks, I’ll take a look. Cheers, E. -- Eliot Kimber http://contrext.com

From: on behalf of Gary Vidal Reply-To: MarkLogic Developer Discussion Date: Thursday, May 25, 2017 at 5:37 AM To: Subject: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics

Eliot, I will share some code I wrote using Apache Flink, which does exactly what you want to do for MarkLogic on a client machine. The problem is that with such an old version of ML you are forced to pull every document out and perform the analysis externally. In a previous life I wrote a version that runs on MarkLogic using spawn and parallel tasks; I'm not sure it would work on 4.2, but I'll share it for the sake of others. Feel free to contact me directly for any additional help. https://github.com/garyvidal/ml-libraries/tree/master/task-spawner

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics
I got what I needed by creating a simple groovy script that uses the XCC library to submit queries. The script is below. My main discovery was that I need to create a new session for every iteration to avoid connection timeouts. With this I was able to process several hundred thousand docs and accumulate the results on my local machine. My command line is:

groovy -cp lib/xcc.jar GetArticleMetadataDetails.groovy

I chose groovy because it supports Java libraries directly and makes it easy to script things. Groovy script:

#!/usr/bin/env groovy
/*
 * Use XCC jar to run enrichment jobs and collect the results.
 */
import com.marklogic.xcc.*;
import com.marklogic.xcc.types.*;

ContentSource source = ContentSourceFactory.newContentSource("myserver", 1984, "user", "pw");
RequestOptions options = new RequestOptions();
options.setRequestTimeLimit(3600)

moduleUrl = "rq-metadata-analysis.xqy"
println "Running module ${moduleUrl}..."
println new Date()

File outfile = new File("query-result.xml")
outfile.write "\n";

(36..56).each { index ->
    Session session = source.newSession();
    ModuleInvoke request = session.newModuleInvoke(moduleUrl)
    println "Group number: ${index}, ${new Date()}"
    request.setNewIntegerVariable("", "groupNum", index);
    request.setNewIntegerVariable("", "length", 1);
    request.setOptions(options);
    ResultSequence rs = session.submitRequest(request);
    ResultItem item = rs.next();
    XdmItem xdmItem = item.getItem();
    InputStream is = item.asInputStream();
    is.eachLine { line ->
        outfile.append line
        outfile.append "\n"
    }
    session.close();
}
outfile.append "";
println "Done."
// End of script.

-- Eliot Kimber http://contrext.com

On 5/22/17, 10:43 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I haven’t yet seen anything in the docs that directly address what I’m trying to do and suspect I’m simply missing some ML basics or just going about things the wrong way.
I have a corpus of several hundred thousand docs (but could be millions, of course), where each doc is an average of 200K and several thousand elements. I want to analyze the corpus to get details about the number of specific subelements within each document, e.g.:

for $article in cts:search(/Article, cts:directory-query("/Default/", "infinity"))[$start to $end] return

I’m running this as a query from Oxygen (so I can capture the results locally so I can do other stuff with them). On the server I’m using, I blow the expanded tree cache if I try to request more than about 20,000 docs. Is there a way to do this kind of processing over an arbitrarily large set *and* get the results back from a single query request? I think the only solution is to write the results back to the database and then fetch that as the last thing, but I was hoping there was something simpler. Have I missed an obvious solution? Thanks, Eliot -- Eliot Kimber http://contrext.com

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
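The [$start to $end] windowing in the query, like the (36..56) group loop in the Groovy script, is plain range batching to keep each request under the expanded-tree-cache limit. A generic sketch of the pattern (Python, with a hypothetical helper name):

```python
def batches(total, size):
    """Yield 1-based inclusive (start, end) ranges covering `total` items."""
    for start in range(1, total + 1, size):
        yield start, min(start + size - 1, total)

list(batches(10, 4))  # [(1, 4), (5, 8), (9, 10)]
```

Each (start, end) pair would drive one query request (and, as discovered above, one fresh session), so no single request has to hold more than `size` documents in memory.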
Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics
What is TDE? I’m not conversant with ML 9 features yet. Also, I’m currently working against an ML 4.2 server (don’t ask). TaskBot looks like just what I need, but the docs say it requires ML 7+, though it could possibly be made to work with earlier releases. If someone can point me in the right direction I can take a stab at making it work with ML 4. Thanks, Eliot -- Eliot Kimber http://contrext.com

On 5/23/17, 8:56 AM, "general-boun...@developer.marklogic.com on behalf of Erik Hennum" wrote:

Hi, Eliot: On reflection, let me retract the range index suggestion. I wasn't considering the domain implied by the element names -- it would never make sense to blow out a range index with the value of all of the paragraphs. The TDE suggestion for MarkLogic 9 would still work, however, because you could have an xs:short column with a value of 1 for every paragraph. Erik Hennum

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Erik Hennum [erik.hen...@marklogic.com] Sent: Tuesday, May 23, 2017 6:21 AM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics

Hi, Eliot: One alternative to Geert's good suggestion -- if and only if the number of element names is small and you can create range indexes on them:

* add an element attribute range index on Article/@id
* add an element range index on p
* execute a cts:value-tuples() call with the constraining element query and directory query
* iterate over the tuples, incrementing the value of the id in a map
* remove the range index on p

In MarkLogic 9, that approach gets simpler. You can just use TDE to project rows with columns for the id and element, group on the id column, and count the rows in the group.
Hoping that's useful (and salutations in passing), Erik Hennum

From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of Geert Josten [geert.jos...@marklogic.com] Sent: Tuesday, May 23, 2017 12:53 AM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Processing Large Number of Docs to Get Statistics

Hi Eliot,

I’d consider using taskbot (http://registry.demo.marklogic.com/package/taskbot), and using that in combination with either $tb:OPTIONS-SYNC or $tb:OPTIONS-SYNC-UPDATE. It will make optimal use of the TaskServer of the host on which you initiate the call. It doesn’t scale endlessly, but it batches up the work automatically for you, and will get you a lot further fairly easily.

Cheers, Geert

On 5/23/17, 5:43 AM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

>I haven’t yet seen anything in the docs that directly address what I’m trying to do and suspect I’m simply missing some ML basics or just going about things the wrong way.
>
>I have a corpus of several hundred thousand docs (but could be millions, of course), where each doc is an average of 200K and several thousand elements.
>
>I want to analyze the corpus to get details about the number of specific subelements within each document, e.g.:
>
>for $article in cts:search(/Article, cts:directory-query("/Default/", "infinity"))[$start to $end]
> return paras="{count($article//p)}"/>
>
>I’m running this as a query from Oxygen (so I can capture the results locally so I can do other stuff with them).
>
>On the server I’m using I blow the expanded tree cache if I try to request more than about 20,000 docs.
>
>Is there a way to do this kind of processing over an arbitrarily large set *and* get the results back from a single query request?
>
>I think the only solution is to write the results back to the database and then fetch that as the last thing but I was hoping there was something simpler.
>Have I missed an obvious solution?
>
>Thanks,
>
>Eliot
>
>--
>Eliot Kimber
>http://contrext.com

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
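Erik's tuple-iteration suggestion (iterate the cts:value-tuples() results, incrementing a per-id counter in a map) reduces to a single counting pass. A Python sketch, with hypothetical tuples standing in for what the two range indexes would yield:

```python
from collections import Counter

# Hypothetical (article-id, element-name) tuples, standing in for what a
# cts:value-tuples() call over the Article/@id and p range indexes might yield:
tuples = [("a1", "p"), ("a1", "p"), ("a2", "p"), ("a1", "p")]

# one pass over the tuples, incrementing a per-article counter in a map
para_counts = Counter(article_id for article_id, _elem in tuples)

para_counts["a1"]  # 3
```

The whole computation runs against index data, which is why it avoids the expanded-tree-cache problem of fetching each document.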
[MarkLogic Dev General] Processing Large Number of Docs to Get Statistics
I haven’t yet seen anything in the docs that directly address what I’m trying to do and suspect I’m simply missing some ML basics or just going about things the wrong way. I have a corpus of several hundred thousand docs (but could be millions, of course), where each doc is an average of 200K and several thousand elements. I want to analyze the corpus to get details about the number of specific subelements within each document, e.g.:

for $article in cts:search(/Article, cts:directory-query("/Default/", "infinity"))[$start to $end] return

I’m running this as a query from Oxygen (so I can capture the results locally so I can do other stuff with them). On the server I’m using, I blow the expanded tree cache if I try to request more than about 20,000 docs. Is there a way to do this kind of processing over an arbitrarily large set *and* get the results back from a single query request? I think the only solution is to write the results back to the database and then fetch that as the last thing, but I was hoping there was something simpler. Have I missed an obvious solution? Thanks, Eliot -- Eliot Kimber http://contrext.com

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Docker & Amazon Cloud Support / Marklogic 9
I run MarkLogic in a container in order to quickly manage development environments where ML is part of a larger environment, e.g., RSuite CMS setups where I have ML, RSuite, and MySQL each running in a separate container, all managed via docker-compose. Cheers, E. -- Eliot Kimber http://contrext.com

From: on behalf of Dave Cassel Reply-To: MarkLogic Developer Discussion Date: Wednesday, May 17, 2017 at 12:54 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Docker & Amazon Cloud Support / Marklogic 9

I asked about Docker — Docker is not currently a supported environment, though we're happy to collect feedback about use cases. -- Dave Cassel, @dmcassel Technical Community Manager MarkLogic Corporation http://developer.marklogic.com/

From: on behalf of Dave Cassel Reply-To: MarkLogic Developer Discussion Date: Tuesday, May 16, 2017 at 9:00 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Docker & Amazon Cloud Support / Marklogic 9

AMIs are in the approval process and should be available soon. -- Dave Cassel, @dmcassel Technical Community Manager MarkLogic Corporation http://developer.marklogic.com/

From: on behalf of Andreas Felix Reply-To: MarkLogic Developer Discussion Date: Tuesday, May 16, 2017 at 10:47 AM To: "general@developer.marklogic.com" Subject: [MarkLogic Dev General] Docker & Amazon Cloud Support / Marklogic 9

Hi, is there already support for Amazon Cloud and Docker in MarkLogic 9? I found nothing in the Amazon Marketplace and no information about the announced Docker support in the Release Notes. regards andreas -- Mit freundlichen Grüßen / Kind regards Ing. Andreas Felix Senior IT Consultant EBCONT enterprise technologies GmbH Millennium Tower Handelskai 94-96 1200 Wien Mobil: +43 664 606 51 747 Fax: +43 2772 812 69-9 Email: andreas.fe...@ebcont.com Web: http://www.ebcont-et.com/ OUR TEAM IS YOUR SUCCESS HG St.
Pölten - FN 293731h UID: ATU63444589 ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Understanding My Profiling Results
OK, I’ve done some more timing analysis and determined that, as expected, the reverse query takes the bulk of the time. So it looks like about 0.025 seconds is the time we can expect for this reverse query, and any performance improvement will come from improved and/or more hardware. Cheers, Eliot -- Eliot Kimber http://contrext.com

On 5/16/17, 10:14 AM, "Eliot Kimber" wrote:

Apparently I’m an idiot—it was pointed out that the time for the cts:and-query() is just the query constructor, which of course takes no time, so the reported time is the time for the search itself. I can do more testing to see how much time each part of the search takes. Cheers, E. -- Eliot Kimber http://contrext.com

On 5/16/17, 9:33 AM, "Eliot Kimber" wrote:

Some more background: there are about 2.5 million MatchingQuery documents in the database I’m testing with. The reverse query index is of course turned on. Cheers, E. -- Eliot Kimber http://contrext.com

On 5/15/17, 7:44 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I’m getting a raw profiling report outside the context of CQ and trying to do some analysis on it (I’m running the same operation on several hundred input objects and collecting the profiling for each instance in order to try to get better trend data). I’ve identified one expression that takes the bulk of the processing time, but the profiling details aren’t adding up, so I’m wondering what I’m missing. Here’s the expression that is reported in the histogram:

cts:search(fn:collection()/MatchingQuery, cts:and-query((func:func-returns-boolean($some-param), cts:collection-query("collection-name"), cts:reverse-query($node))), "unfiltered")

The intent of this search is to find MatchingQuery documents that match the node in $node. The deep time for this is PT0.023642S and the shallow time is PT0.023289S, which is what I would expect (shallow and deep almost the same). So the question is, which of these terms is contributing to this time?
If I search for histogram entries for the individual terms I get a deep time of “0.000342” for the cts:and-query, which is obviously a small fraction of the total time of 0.023 seconds. Does that mean that the “fn:collection()/MatchingQuery” term accounts for the remaining time (the bulk of the 0.023 seconds)? If not, what accounts for the remaining time?

I’m also capturing the query meters and the only cache misses I’m seeing are value cache misses (17 misses, 1 hit). I’m not sure what aspect of this query (if any) would hit the value cache.

So my question: what are these times telling me about this particular search expression?

Thanks, Eliot -- Eliot Kimber http://contrext.com

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Understanding My Profiling Results
Apparently I’m an idiot—it was pointed out that the time for the cts:and-query() is just the query constructor, which of course takes no time, so the reported time is the time for the search itself. I can do more testing to see how much time each part of the search takes. Cheers, E. -- Eliot Kimber http://contrext.com

On 5/16/17, 9:33 AM, "Eliot Kimber" wrote:

Some more background: there are about 2.5 million MatchingQuery documents in the database I’m testing with. The reverse query index is of course turned on. Cheers, E. -- Eliot Kimber http://contrext.com

On 5/15/17, 7:44 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I’m getting a raw profiling report outside the context of CQ and trying to do some analysis on it (I’m running the same operation on several hundred input objects and collecting the profiling for each instance in order to try to get better trend data). I’ve identified one expression that takes the bulk of the processing time, but the profiling details aren’t adding up, so I’m wondering what I’m missing. Here’s the expression that is reported in the histogram:

cts:search(fn:collection()/MatchingQuery, cts:and-query((func:func-returns-boolean($some-param), cts:collection-query("collection-name"), cts:reverse-query($node))), "unfiltered")

The intent of this search is to find MatchingQuery documents that match the node in $node. The deep time for this is PT0.023642S and the shallow time is PT0.023289S, which is what I would expect (shallow and deep almost the same). So the question is, which of these terms is contributing to this time? If I search for histogram entries for the individual terms I get a deep time of “0.000342” for the cts:and-query, which is obviously a small fraction of the total time of 0.023 seconds. Does that mean that the “fn:collection()/MatchingQuery” term accounts for the remaining time (the bulk of the 0.023 seconds)? If not, what accounts for the remaining time?
I’m also capturing the query meters and the only cache misses I’m seeing are value cache misses (17 misses, 1 hit). I’m not sure what aspect of this query (if any) would hit the value cache. So my question: what are these times telling me about this particular search expression? Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Understanding My Profiling Results
Some more background: there are about 2.5 million MatchingQuery documents in the database I’m testing with. The reverse query index is of course turned on. Cheers, E. -- Eliot Kimber http://contrext.com

On 5/15/17, 7:44 PM, "general-boun...@developer.marklogic.com on behalf of Eliot Kimber" wrote:

I’m getting a raw profiling report outside the context of CQ and trying to do some analysis on it (I’m running the same operation on several hundred input objects and collecting the profiling for each instance in order to try to get better trend data). I’ve identified one expression that takes the bulk of the processing time, but the profiling details aren’t adding up, so I’m wondering what I’m missing. Here’s the expression that is reported in the histogram:

cts:search(fn:collection()/MatchingQuery, cts:and-query((func:func-returns-boolean($some-param), cts:collection-query("collection-name"), cts:reverse-query($node))), "unfiltered")

The intent of this search is to find MatchingQuery documents that match the node in $node. The deep time for this is PT0.023642S and the shallow time is PT0.023289S, which is what I would expect (shallow and deep almost the same). So the question is, which of these terms is contributing to this time? If I search for histogram entries for the individual terms I get a deep time of “0.000342” for the cts:and-query, which is obviously a small fraction of the total time of 0.023 seconds. Does that mean that the “fn:collection()/MatchingQuery” term accounts for the remaining time (the bulk of the 0.023 seconds)? If not, what accounts for the remaining time? I’m also capturing the query meters and the only cache misses I’m seeing are value cache misses (17 misses, 1 hit). I’m not sure what aspect of this query (if any) would hit the value cache. So my question: what are these times telling me about this particular search expression?
Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Understanding My Profiling Results
I’m getting a raw profiling report outside the context of CQ and trying to do some analysis on it (I’m running the same operation on several hundred input objects and collecting the profiling for each instance in order to try to get better trend data). I’ve identified one expression that takes the bulk of the processing time, but the profiling details aren’t adding up, so I’m wondering what I’m missing. Here’s the expression that is reported in the histogram:

cts:search(fn:collection()/MatchingQuery, cts:and-query((func:func-returns-boolean($some-param), cts:collection-query("collection-name"), cts:reverse-query($node))), "unfiltered")

The intent of this search is to find MatchingQuery documents that match the node in $node. The deep time for this is PT0.023642S and the shallow time is PT0.023289S, which is what I would expect (shallow and deep almost the same). So the question is, which of these terms is contributing to this time? If I search for histogram entries for the individual terms I get a deep time of “0.000342” for the cts:and-query, which is obviously a small fraction of the total time of 0.023 seconds. Does that mean that the “fn:collection()/MatchingQuery” term accounts for the remaining time (the bulk of the 0.023 seconds)? If not, what accounts for the remaining time? I’m also capturing the query meters and the only cache misses I’m seeing are value cache misses (17 misses, 1 hit). I’m not sure what aspect of this query (if any) would hit the value cache. So my question: what are these times telling me about this particular search expression? Thanks, Eliot -- Eliot Kimber http://contrext.com

___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
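For reading reports like this: under the usual profiler convention (deep time includes nested expressions, shallow time excludes them), subtracting the children's deep times from a parent's deep time approximates its shallow time. A sketch using the numbers from this report, assuming the cts:and-query is the only profiled child of the cts:search:

```python
def shallow_time(deep, child_deep_times):
    """Shallow time: an expression's own time, excluding nested expressions."""
    return deep - sum(child_deep_times)

round(shallow_time(0.023642, [0.000342]), 6)  # 0.0233
```

The result, 0.0233, is close to the reported shallow time of 0.023289, consistent with the later conclusion in the thread: the search itself, not the query construction, dominates.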
Re: [MarkLogic Dev General] Optimizing Reverse Queries
This is a process that is performed almost constantly as new material is added to the corpus or the classification details are refined, both of which happen all the time. One of the required features of the system is to produce a report of what the new classification would be if the full classification process were applied to the current content, so that those responsible for the classification can evaluate its correctness. That process takes so long that it risks delaying publishing of updated content in the time required by the business process this system serves.

I think I have enough to go on now to explore a few possible avenues, as well as gather more precise profiling and performance info.

Cheers, E. -- Eliot Kimber http://contrext.com

On 5/2/17, 1:04 AM, "Jason Hunter" wrote:

> By “which query” I mean which of the 125,000 separate query docs actually matched for a given cts:reverse-query() call.

cts:search( doc(), cts:reverse-query(doc("newdoc.xml")) )

This will return all the docs containing any serialized queries which would match newdoc.xml.

> I guess my question is: in the case where the reverse query is applied to an element that is not a full document, does the “brute force” have to be applied for every candidate query or only for those that match the containing document of the input element?

In general I avoid putting any xpath in the first arg. In the JavaScript API it's not even possible, because it gives a false sense of optimization.

> If the brute force cost is applied to each query then doing a two-phase search would be faster: determine which reverse queries apply to the input document and then use those to find the elements within the input document that actually matched. But if the brute force cost only applies to those queries that match the containing doc then ML internally must produce the result faster than doing it in my own code.
> > But as you say, that calls into question the use of reverse queries at all: why not simply run the 125,000 forward queries and update each element matched as appropriate?

Yep. If it's a one-time batch job and you're trying to minimize the time then this would be faster, I bet.

> Or it may simply be that we need to do some horizontal scaling and invest in additional D-nodes.

You're going to do this often?

-jh- ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
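[Editor's note: the two-phase search discussed in this thread can be sketched as below. This is a sketch only; the collection names, element name, and query-document shape are assumptions, not the poster's actual code.]

```xquery
xquery version "1.0-ml";
(: Phase 1: find stored query docs whose reverse query matches the
   whole input document. Phase 2: rebuild each matching query with
   cts:query() and test the individual elements of interest.
   Collection and element names are illustrative. :)
let $doc := doc("/input/newdoc.xml")
let $matched-query-docs :=
  cts:search(fn:collection("reverse-queries"),
             cts:reverse-query($doc), "unfiltered")
for $qdoc in $matched-query-docs
let $query := cts:query($qdoc/*)  (: assumes one serialized query per doc, at the root :)
for $e in $doc//title             (: an element to classify; name illustrative :)
where cts:contains($e, $query)
return <match query-uri="{xdmp:node-uri($qdoc)}">{$e}</match>
```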
Re: [MarkLogic Dev General] Optimizing Reverse Queries
By “which query” I mean which of the 125,000 separate query docs actually matched for a given cts:reverse-query() call.

I guess my question is: in the case where the reverse query is applied to an element that is not a full document, does the “brute force” have to be applied for every candidate query or only for those that match the containing document of the input element?

If the brute force cost is applied to each query then doing a two-phase search would be faster: determine which reverse queries apply to the input document and then use those to find the elements within the input document that actually matched. But if the brute force cost only applies to those queries that match the containing doc then ML internally must produce the result faster than doing it in my own code.

But as you say, that calls into question the use of reverse queries at all: why not simply run the 125,000 forward queries and update each element matched as appropriate? Or it may simply be that we need to do some horizontal scaling and invest in additional D-nodes.

Cheers, E. -- Eliot Kimber http://contrext.com

On 5/1/17, 10:26 PM, "Jason Hunter" wrote:

> Another question: having gotten a result from a reverse search at the full document level, is there a way to know *which* queries matched? If so then it would be easy enough to apply those queries to the relevant elements to do additional filtering (although I suppose that might get us back to the same place).

I'm a little confused. You're putting multiple serialized queries into each document? If you have just one serialized query in a document it's going to be obvious which query was the reverse match -- it was that one.

> In particular, if I have 125,000 reverse queries applied to a single document (assuming that total database volume doesn’t affect query speed in this case) on a modern fast server with appropriate indexes in place, how fast should I expect that query to take? 1ms? 10ms? 100ms? 1 second?
If you have 125,000 documents each with a serialized query in it and you do a reverse query for one document against those serialized queries and there's no hits, it should be extremely fast. More hits will slow things a little bit because hits involve a little work. The IMLS paper explains what the algorithm has to do. I suspect (but haven't measured) that it's a lot like forward queries in that the timing depends a lot on number of matches.

> Our corpus has about 25 million elements that would be fragments per the advice above (about 1.5 million full documents).

If you have 25 million elements you want to run against 125,000 serialized queries, wouldn't forward queries be faster? You'd only have to do 125,000 search calls instead of 25,000,000. :)

> I’ve never done much with fragments in MarkLogic so I’m not sure what the full implication of making these subelements into fragments would be for other processing.

Yeah, fragmentation is not to be done lightly.

-jh- ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
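[Editor's note: the forward-query alternative Jason suggests — 125,000 search calls instead of 25 million reverse-query calls — might look roughly like this. A sketch; the collection names are assumptions:]

```xquery
xquery version "1.0-ml";
(: Forward-query alternative: iterate the stored queries once and run
   each against the corpus, rather than reverse-querying every element.
   Collection names are illustrative. :)
for $qdoc in fn:collection("reverse-queries")
let $query := cts:query($qdoc/*)
return
  for $hit in cts:search(fn:collection("corpus"), $query, "unfiltered")
  return <hit query="{xdmp:node-uri($qdoc)}" doc="{xdmp:node-uri($hit)}"/>
```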
Re: [MarkLogic Dev General] Optimizing Reverse Queries
I think the key bit is here: “MarkLogic indexes work at the fragment/document level. So doing a reverse query 20 times against different subparts of a document is going to involve brute force filtering to see if the match was in the needed part or not.”

That suggests that our general approach to using reverse queries is flawed for this reason and would explain the apparent poor performance. It’s not possible to break the current docs into smaller docs but it might be possible to configure fragmentation at a level where each fragment would only have one element we need to match on (e.g., titles).

Another question: having gotten a result from a reverse search at the full document level, is there a way to know *which* queries matched? If so then it would be easy enough to apply those queries to the relevant elements to do additional filtering (although I suppose that might get us back to the same place).

Unfortunately my current performance metrics are “it takes way too long now and needs to take at most ½ as long”. I need to do more work to get some useful measurements and do some calculations to determine what reasonable performance should be (e.g., we have X million cases to check at 100ms per case, so it should take about Y time, but it takes Y*n time—why?). Ultimately I need to try to determine how fast it *should* be for this type of operation. If I can determine that then I can determine whether the throughput requirements can be met by simply achieving that performance with the current server configuration, or determine that it cannot and that we need to scale up, e.g., add additional D-nodes or something.

I realize that nobody can offer me solid numbers based on what little I can share about the project details, other than to suggest some bounds.
In particular, if I have 125,000 reverse queries applied to a single document (assuming that total database volume doesn’t affect query speed in this case) on a modern fast server with appropriate indexes in place, how fast should I expect that query to take? 1ms? 10ms? 100ms? 1 second? Based on my experience with ML and the documentation I would expect something around 10ms.

Our corpus has about 25 million elements that would be fragments per the advice above (about 1.5 million full documents). If we assume 10ms per query per fragment then it would take about 3 days to process all of them. Currently it takes 9, so roughly a 3x slowdown over what I think we could expect, +/- 1 day (there’s other overhead in this 9-day number that may or may not be reducible).

I’ve never done much with fragments in MarkLogic so I’m not sure what the full implication of making these subelements into fragments would be for other processing.

Cheers, Eliot -- Eliot Kimber http://contrext.com

On 5/1/17, 9:43 PM, "Jason Hunter" wrote:

So what's the performance you're seeing? And what do you expect to be able to see? Something to consider: MarkLogic indexes work at the fragment/document level. So doing a reverse query 20 times against different subparts of a document is going to involve brute force filtering to see if the match was in the needed part or not. Might be better to have 20 documents instead of 1. -jh-

> On May 2, 2017, at 01:29, Eliot Kimber wrote: > > Actually, it's expected that every element will be matched by at least one query. This is a classification application and the intent of the application is that every element of interest will be classified. Many, if not most, of the queries depend on word-search features, e.g., stemmed matches, case insensitivity, etc. > > I’m new to this project so it may be that there is a better way to approach the problem in general. This is the system as currently implemented.
> > My overall charge is to improve the throughput performance so my first task is to first understand what the performance bottlenecks are then identify possible solutions. > > It seems unlikely that we’ve done something silly in our queries or ML configuration but I want to eliminate the easy-to-fix before exploring more complicated options. > > Cheers, > > Eliot > > -- > Eliot Kimber > http://contrext.com > > > > On 5/1/17, 12:10 PM, "Jason Hunter" wrote: > >> The processing is, for each document to be processed, examine on the order of 10-20 elements to see if they match the reverse query by getting the node to be looked up and then doing: > >Maybe you can reverse query on the document as a whole instead of running 20 reverse queries per document. Only bother with the enumeration of the 20 if there's a proven hit within the document. > >(I assume the vast majority of the time the
Re: [MarkLogic Dev General] Optimizing Reverse Queries
Actually, it's expected that every element will be matched by at least one query. This is a classification application and the intent of the application is that every element of interest will be classified. Many, if not most, of the queries depend on word-search features, e.g., stemmed matches, case insensitivity, etc.

I’m new to this project so it may be that there is a better way to approach the problem in general. This is the system as currently implemented. My overall charge is to improve the throughput performance, so my first task is to understand what the performance bottlenecks are and then identify possible solutions. It seems unlikely that we’ve done something silly in our queries or ML configuration but I want to eliminate the easy-to-fix before exploring more complicated options.

Cheers, Eliot -- Eliot Kimber http://contrext.com

On 5/1/17, 12:10 PM, "Jason Hunter" wrote:

> The processing is, for each document to be processed, examine on the order of 10-20 elements to see if they match the reverse query by getting the node to be looked up and then doing:

Maybe you can reverse query on the document as a whole instead of running 20 reverse queries per document. Only bother with the enumeration of the 20 if there's a proven hit within the document. (I assume the vast majority of the time there's not going to be hits. If that's true then why not prove that in one pop instead of 20 pops.)

-jh- ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
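[Editor's note: Jason's suggestion — prove a document-level hit "in one pop" before enumerating the 10-20 elements — can be sketched like this. The collection and element names are assumptions, not the actual application code:]

```xquery
xquery version "1.0-ml";
(: Check the whole document against the stored queries first; only on a
   document-level hit enumerate the individual elements of interest.
   Collection and element names are illustrative. :)
let $doc := doc("/input/some-doc.xml")
where xdmp:exists(
        cts:search(fn:collection("reverse-queries"),
                   cts:reverse-query($doc), "unfiltered"))
return
  for $e in $doc//(title | para)  (: the 10-20 elements per document :)
  return cts:search(fn:collection("reverse-queries"),
                    cts:reverse-query($e), "unfiltered")
```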
Re: [MarkLogic Dev General] Optimizing Reverse Queries
Just realized I didn’t show all the relevant query details (the actual query has more terms but I’m not at liberty to show those details). But the cts:search does specify "unfiltered". So a more complete representation is:

cts:search( collection()/MyReverseQueries, cts:and-query(( me:normal-query1(), me:normal-query2(), cts:reverse-query($node))), "unfiltered")

Cheers, E. -- Eliot Kimber http://contrext.com

On 5/1/17, 10:31 AM, "Eliot Kimber" wrote:

Here is a typical reverse query document. Others may be a bit more complex, for example, an OR query matching on text strings or doing a cts word search: [serialized cts query element (namespace http://marklogic.com/cts) with query text "c15fc" and options case-insensitive, diacritic-insensitive, punctuation-insensitive, whitespace-insensitive, unstemmed, and wildcarded; the element markup was lost in archiving]

The processing is, for each document to be processed, examine on the order of 10-20 elements to see if they match the reverse query by getting the node to be looked up and then doing: cts:search(cts:reverse-query($node))

The initial profiling we did was just taking one source document and applying the process that then uses these reverse queries (that is, we haven’t yet had a chance to profile a larger run of documents). I’m just starting my performance analysis here, but I don’t have any experience with reverse queries so I mostly just wanted to make sure that there wasn’t something fairly obvious that I might look for as a source of slowness before digging into things more deeply. I’m pretty sure I’ll have to do deeper profiling to see where the time is really being taken—strong possibility that it’s in our code and not really the reverse queries.

Cheers, Eliot -- Eliot Kimber http://contrext.com

On 5/1/17, 10:00 AM, "Jason Hunter" wrote:

On May 1, 2017, at 20:45, Eliot Kimber wrote: > > Using ML 8 we have an application that relies on reverse queries. The overall application is not performing as well as we need it to and our initial attempts at profiling show that the reverse queries are taking most of the time.
We have about 120,000 separate reverse query documents.

What kind of reverse queries are they? Text? Geo? Simple? Complex?

> The “Inside MarkLogic” document suggests that reverse queries, properly indexed, should be quite fast. I have verified that we have the “fast reverse queries” index turned on. > > My question: What should I look for that might be causing our reverse queries to not be optimized?

What are you doing with them? Looping against 1,000 documents? Sample code will help us all understand. How fast are they running exactly? How fast do you need them to run?

> Are there any other ML settings or server configurations that might affect reverse query performance? Are there particular query patterns that might be suboptimal? Is there a way that I can confirm that the reverse queries are performing as fast as possible?

The xdmp:plan function is your friend.

-jh- ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
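[Editor's note: for readers unfamiliar with xdmp:plan, it takes a searchable expression and returns the plan ML would use to resolve it, which helps confirm whether a search resolves purely from indexes. A sketch; the document URI is illustrative:]

```xquery
xquery version "1.0-ml";
(: Ask the optimizer how it would resolve the search; the returned
   plan shows the index terms consulted. The URI is illustrative. :)
xdmp:plan(
  cts:search(fn:collection(),
             cts:reverse-query(doc("/input/some-doc.xml"))))
```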
Re: [MarkLogic Dev General] Optimizing Reverse Queries
Here is a typical reverse query document. Others may be a bit more complex, for example, an OR query matching on text strings or doing a cts word search: [serialized cts query element (namespace http://marklogic.com/cts) with query text "c15fc" and options case-insensitive, diacritic-insensitive, punctuation-insensitive, whitespace-insensitive, unstemmed, and wildcarded; the element markup was lost in archiving]

The processing is, for each document to be processed, examine on the order of 10-20 elements to see if they match the reverse query by getting the node to be looked up and then doing: cts:search(cts:reverse-query($node))

The initial profiling we did was just taking one source document and applying the process that then uses these reverse queries (that is, we haven’t yet had a chance to profile a larger run of documents). I’m just starting my performance analysis here, but I don’t have any experience with reverse queries so I mostly just wanted to make sure that there wasn’t something fairly obvious that I might look for as a source of slowness before digging into things more deeply. I’m pretty sure I’ll have to do deeper profiling to see where the time is really being taken—strong possibility that it’s in our code and not really the reverse queries.

Cheers, Eliot -- Eliot Kimber http://contrext.com

On 5/1/17, 10:00 AM, "Jason Hunter" wrote:

On May 1, 2017, at 20:45, Eliot Kimber wrote: > > Using ML 8 we have an application that relies on reverse queries. The overall application is not performing as well as we need it to and our initial attempts at profiling show that the reverse queries are taking most of the time. We have about 120,000 separate reverse query documents.

What kind of reverse queries are they? Text? Geo? Simple? Complex?

> The “Inside MarkLogic” document suggests that reverse queries, properly indexed, should be quite fast. I have verified that we have the “fast reverse queries” index turned on. > > My question: What should I look for that might be causing our reverse queries to not be optimized?

What are you doing with them? Looping against 1,000 documents?
Sample code will help us all understand. How fast are they running exactly? How fast do you need them to run? > Are there any other ML settings or server configurations that might affect reverse query performance? Are there particular query patterns that might be suboptimal? Is there a way that I can confirm that the reverse queries are performing as fast as possible? The xdmp:plan function is your friend. -jh- ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Optimizing Reverse Queries
Using ML 8 we have an application that relies on reverse queries. The overall application is not performing as well as we need it to and our initial attempts at profiling show that the reverse queries are taking most of the time. We have about 120,000 separate reverse query documents. The “Inside MarkLogic” document suggests that reverse queries, properly indexed, should be quite fast. I have verified that we have the “fast reverse queries” index turned on. My question: What should I look for that might be causing our reverse queries to not be optimized? Are there any other ML settings or server configurations that might affect reverse query performance? Are there particular query patterns that might be suboptimal? Is there a way that I can confirm that the reverse queries are performing as fast as possible? This is an application that is applied to 100s of 1000s of documents, so even a small performance improvement will be significant. Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Attempting to Run ML 4 Under CentOs: Can't find libbteuclid.so.6.5.1
Thanks to a suggestion from Norm Walsh I got this working. The key was simply adding the /opt/MarkLogic/lib dir to the LD_LIBRARY_PATH environment variable. I now have ML 4 running under an Ubuntu 14-based Docker container, along with another container for RSuite 3.6.3 and a third for MySQL 5.1 (on which RSuite 3.6.3 also depends). I haven’t circled back to see if the same approach would work with CentOS—it probably would.

Part of the reason for insisting on ML4 is that I need to do performance comparisons between the current ML4-based environment and a potential ML7- or 8-based environment (where I presume I will see performance improvements, but you don’t know until you measure). So I need ML4 to establish a baseline.

Even if older versions of a product are not supported, the installers should be provided for situations like this—not everyone can upgrade, and technologies like Docker make it possible to maintain older environments. I understand the need to limit support exposure, but you can do that without severely inconveniencing customers by simply making support policies clear as regards older versions. I was fortunate that I found a packrat who never throws anything away…

Cheers, Eliot -- Eliot Kimber http://contrext.com

On 4/28/17, 6:01 PM, "Eliot Kimber" wrote:

Upgrading this particular RSuite installation is not an option at this time. I’m going to explore installing ML4 on Ubuntu. Cheers, Eliot -- Eliot Kimber http://contrext.com

On 4/28/17, 3:58 PM, "Ganesh Vaideeswaran" wrote:

Eliot, MarkLogic 4 does not support CentOS 6. So, I am not sure if I can offer any more guidance on this other than to say please upgrade RSuite to a version that runs on a supported MarkLogic version. And you are probably looking at that option.
Ganesh -Original Message- From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Eliot Kimber Sent: Friday, April 28, 2017 3:31 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Attempting to Run ML 4 Under CentOs: Can't find libbteuclid.so.6.5.1 I realize that ML4 is not supported. However, the application I’m using (RSuite 3.6.3) requires ML 4 so I need to be able to run it. Cheers, Eliot -- Eliot Kimber http://contrext.com On 4/28/17, 3:29 PM, "Ganesh Vaideeswaran" wrote: Eliot, MarkLogic 4 is not supported anymore. I would encourage you to upgrade to the latest version of MarkLogic 8. With respect to what is the best OS choice, since you are familiar with CentOS, I would suggest CentOS 7 though MarkLogic 8 supports CentOS 6 as well. Also at this time, we do not test MarkLogic running inside a docker. If you deploy MarkLogic inside a container and you need help from our support team, they _may_ request you reproduce the issue in a supported platform. Note that MarkLogic 9 only supports CentOS 7. Good luck with your upgrade. Ganesh -Original Message- From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Eliot Kimber Sent: Friday, April 28, 2017 3:18 PM To: general@developer.marklogic.com Subject: [MarkLogic Dev General] Attempting to Run ML 4 Under CentOs: Can't find libbteuclid.so.6.5.1 In order to support an ancient version of RSuite CMS I need to run MarkLogic 4.2. I have a 64-bit RPM that I’ve installed into Cento6. 
However, when I run MarkLogic I get this failure: [root@localhost bin]# ./MarkLogic ./MarkLogic: error while loading shared libraries: libbteuclid.so.6.5.1: cannot open shared object file: No such file or directory [root@localhost bin]# The library is present: [root@localhost bin]# ls ../lib libbteuclid.so.6.5.1 libbtrliprofile.so.6.5.1 libbtrlpc.so.6.5.1 libbtunicode.so libbtutils.so.6.5.0 libbtrlijni.solibbtrlpcore.so.6.5 libbtrlpjni.so libbtutiljni.so [root@localhost bin]# So I think it must be a configuration issue, possibly the wrong version of the library. My online research did not reveal any obvious solution and my linux fu is weak. I have the Redhat lsb-base package installed: [root@localhost /]# yum install redhat-lsb Loaded plugins: fastestmirror, refresh-packagekit, security Setting up Install Process Loading mirror speeds from cac
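[Editor's note: the fix Eliot reports at the top of this thread — adding /opt/MarkLogic/lib to LD_LIBRARY_PATH before starting the server — can be sketched as below; the path is the default MarkLogic install location, so adjust for your layout.]

```shell
# Make MarkLogic's bundled libraries (libbteuclid etc.) visible to the
# dynamic loader before starting the server. Prepends the lib dir,
# preserving any existing LD_LIBRARY_PATH value.
export LD_LIBRARY_PATH=/opt/MarkLogic/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
echo "$LD_LIBRARY_PATH"
```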
Re: [MarkLogic Dev General] Attempting to Run ML 4 Under CentOs: Can't find libbteuclid.so.6.5.1
Upgrading this particular RSuite installation is not an option at this time. I’m going to explore installing ML4 on Ubuntu. Cheers, Eliot -- Eliot Kimber http://contrext.com On 4/28/17, 3:58 PM, "Ganesh Vaideeswaran" wrote: Eliot, MarkLogic 4 does not support CentOS 6. So, I am not sure if I can offer any more guidance on this other than to say please upgrade RSuite to a version that runs on a supported MarkLogic version. And you are probably looking at that option. Ganesh -Original Message- From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Eliot Kimber Sent: Friday, April 28, 2017 3:31 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Attempting to Run ML 4 Under CentOs: Can't find libbteuclid.so.6.5.1 I realize that ML4 is not supported. However, the application I’m using (RSuite 3.6.3) requires ML 4 so I need to be able to run it. Cheers, Eliot -- Eliot Kimber http://contrext.com On 4/28/17, 3:29 PM, "Ganesh Vaideeswaran" wrote: Eliot, MarkLogic 4 is not supported anymore. I would encourage you to upgrade to the latest version of MarkLogic 8. With respect to what is the best OS choice, since you are familiar with CentOS, I would suggest CentOS 7 though MarkLogic 8 supports CentOS 6 as well. Also at this time, we do not test MarkLogic running inside a docker. If you deploy MarkLogic inside a container and you need help from our support team, they _may_ request you reproduce the issue in a supported platform. Note that MarkLogic 9 only supports CentOS 7. Good luck with your upgrade. Ganesh -Original Message- From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Eliot Kimber Sent: Friday, April 28, 2017 3:18 PM To: general@developer.marklogic.com Subject: [MarkLogic Dev General] Attempting to Run ML 4 Under CentOs: Can't find libbteuclid.so.6.5.1 In order to support an ancient version of RSuite CMS I need to run MarkLogic 4.2. 
I have a 64-bit RPM that I’ve installed into Cento6. However, when I run MarkLogic I get this failure: [root@localhost bin]# ./MarkLogic ./MarkLogic: error while loading shared libraries: libbteuclid.so.6.5.1: cannot open shared object file: No such file or directory [root@localhost bin]# The library is present: [root@localhost bin]# ls ../lib libbteuclid.so.6.5.1 libbtrliprofile.so.6.5.1 libbtrlpc.so.6.5.1 libbtunicode.so libbtutils.so.6.5.0 libbtrlijni.solibbtrlpcore.so.6.5 libbtrlpjni.so libbtutiljni.so [root@localhost bin]# So I think it must be a configuration issue, possibly the wrong version of the library. My online research did not reveal any obvious solution and my linux fu is weak. I have the Redhat lsb-base package installed: [root@localhost /]# yum install redhat-lsb Loaded plugins: fastestmirror, refresh-packagekit, security Setting up Install Process Loading mirror speeds from cached hostfile * base: mirror.5ninesolutions.com * extras: centos.eecs.wsu.edu * updates: mirror.5ninesolutions.com Package redhat-lsb-4.0-7.el6.centos.x86_64 already installed and latest version Nothing to do So my question: Is it possible to run ML 4.2 under CentOS and if so what do I need to do to resolve this library problem? If not, what is my best OS choice? My ultimate goal is to run ML in a Docker container (along with RSuite and MySQL, on which RSuite depends), so I was using the ML 7+ Dockerfile as a base (thus my use of CentOS). Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Attempting to Run ML 4 Under CentOs: Can't find libbteuclid.so.6.5.1
I realize that ML4 is not supported. However, the application I’m using (RSuite 3.6.3) requires ML 4 so I need to be able to run it. Cheers, Eliot -- Eliot Kimber http://contrext.com On 4/28/17, 3:29 PM, "Ganesh Vaideeswaran" wrote: Eliot, MarkLogic 4 is not supported anymore. I would encourage you to upgrade to the latest version of MarkLogic 8. With respect to what is the best OS choice, since you are familiar with CentOS, I would suggest CentOS 7 though MarkLogic 8 supports CentOS 6 as well. Also at this time, we do not test MarkLogic running inside a docker. If you deploy MarkLogic inside a container and you need help from our support team, they _may_ request you reproduce the issue in a supported platform. Note that MarkLogic 9 only supports CentOS 7. Good luck with your upgrade. Ganesh -Original Message- From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Eliot Kimber Sent: Friday, April 28, 2017 3:18 PM To: general@developer.marklogic.com Subject: [MarkLogic Dev General] Attempting to Run ML 4 Under CentOs: Can't find libbteuclid.so.6.5.1 In order to support an ancient version of RSuite CMS I need to run MarkLogic 4.2. I have a 64-bit RPM that I’ve installed into Cento6. However, when I run MarkLogic I get this failure: [root@localhost bin]# ./MarkLogic ./MarkLogic: error while loading shared libraries: libbteuclid.so.6.5.1: cannot open shared object file: No such file or directory [root@localhost bin]# The library is present: [root@localhost bin]# ls ../lib libbteuclid.so.6.5.1 libbtrliprofile.so.6.5.1 libbtrlpc.so.6.5.1 libbtunicode.so libbtutils.so.6.5.0 libbtrlijni.solibbtrlpcore.so.6.5 libbtrlpjni.so libbtutiljni.so [root@localhost bin]# So I think it must be a configuration issue, possibly the wrong version of the library. My online research did not reveal any obvious solution and my linux fu is weak. 
I have the Redhat lsb-base package installed: [root@localhost /]# yum install redhat-lsb Loaded plugins: fastestmirror, refresh-packagekit, security Setting up Install Process Loading mirror speeds from cached hostfile * base: mirror.5ninesolutions.com * extras: centos.eecs.wsu.edu * updates: mirror.5ninesolutions.com Package redhat-lsb-4.0-7.el6.centos.x86_64 already installed and latest version Nothing to do So my question: Is it possible to run ML 4.2 under CentOS and if so what do I need to do to resolve this library problem? If not, what is my best OS choice? My ultimate goal is to run ML in a Docker container (along with RSuite and MySQL, on which RSuite depends), so I was using the ML 7+ Dockerfile as a base (thus my use of CentOS). Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Attempting to Run ML 4 Under CentOs: Can't find libbteuclid.so.6.5.1
In order to support an ancient version of RSuite CMS I need to run MarkLogic 4.2. I have a 64-bit RPM that I’ve installed into CentOS 6. However, when I run MarkLogic I get this failure:

[root@localhost bin]# ./MarkLogic ./MarkLogic: error while loading shared libraries: libbteuclid.so.6.5.1: cannot open shared object file: No such file or directory [root@localhost bin]#

The library is present:

[root@localhost bin]# ls ../lib libbteuclid.so.6.5.1 libbtrliprofile.so.6.5.1 libbtrlpc.so.6.5.1 libbtunicode.so libbtutils.so.6.5.0 libbtrlijni.so libbtrlpcore.so.6.5 libbtrlpjni.so libbtutiljni.so [root@localhost bin]#

So I think it must be a configuration issue, possibly the wrong version of the library. My online research did not reveal any obvious solution and my linux fu is weak. I have the Redhat lsb-base package installed:

[root@localhost /]# yum install redhat-lsb Loaded plugins: fastestmirror, refresh-packagekit, security Setting up Install Process Loading mirror speeds from cached hostfile * base: mirror.5ninesolutions.com * extras: centos.eecs.wsu.edu * updates: mirror.5ninesolutions.com Package redhat-lsb-4.0-7.el6.centos.x86_64 already installed and latest version Nothing to do

So my question: Is it possible to run ML 4.2 under CentOS and if so what do I need to do to resolve this library problem? If not, what is my best OS choice? My ultimate goal is to run ML in a Docker container (along with RSuite and MySQL, on which RSuite depends), so I was using the ML 7+ Dockerfile as a base (thus my use of CentOS).

Thanks, Eliot -- Eliot Kimber http://contrext.com ___ General mailing list General@developer.marklogic.com Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Determining Whether Whitespace is In Data as Stored or A Result of Serialization?
I upgraded to 4.2-7 and verified that the serialization issue was resolved.

Cheers,

E.

On 11/30/11 6:56 AM, "general@developer.marklogic.com" wrote:

> Date: Tue, 29 Nov 2011 16:55:31 -0800
> From: Danny Sokolsky
> Subject: Re: [MarkLogic Dev General] Determining Whether Whitespace is
> In Data as Stored or A Result of Serialization?
> To: General MarkLogic Developer Discussion
>
> Hi Eliot,
>
> There were some changes made in later 4.2 releases to restore the behavior
> from earlier releases. The serialization is about how it is output, not how
> it is stored, so it should be stored correctly.
>
> I recommend trying it on the latest 4.2 release (4.2-7 now, I think). I think
> it will then, by default, behave the same as in 4.1. In 4.2, there are some
> serialization options you can set at the query level to control this. In
> MarkLogic 5, you can also control these options' default values at the App
> Server level.
>
> Here is the 4.2 release note item that describes some of these changes:
>
> http://docs.marklogic.com/4.2doc/docapp.xqy#display.xqy?fname=http://pubs/4.2doc/xml/relnotes/chap4.xml%2340996
>
> -Danny

--
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.reallysi.com
www.rsuitecms.com

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
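[Editorial note, not part of the original post.] The query-level serialization control Danny mentions is set in the XQuery prolog via the xdmp:output option. A minimal sketch, reusing the example document URI from the thread below (the exact set of supported option values should be checked against the release notes for your version):

```xquery
xquery version "1.0-ml";

(: ask the serializer not to add indentation whitespace on output :)
declare option xdmp:output "indent = no";

doc("/foo/bar/mynewdoc.xml")
```

Because this is a serialization option, it changes only how results are written out, not what is stored in the database.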
[MarkLogic Dev General] Determining Whether Whitespace is In Data as Stored or A Result of Serialization?
I have determined that content loaded through the XccRunner.load() method has unwanted whitespace, not present in the original XML, when subsequently accessed from MarkLogic. I've tested on 4.2-1. Earlier versions do not seem to have this behavior (although I need to do more testing to confirm--but we certainly would have noticed it if they had, as from our standpoint it constitutes a data corruption issue: the data being returned from ML is different from what was given to ML).

I traced the DOM being loaded right to the call of load() and verified by inspection that there were no whitespace nodes between two particular elements, e.g., the original source was:

texttext

Accessing the loaded document using, e.g.:

doc('/foo/bar/mynewdoc.xml')

results in:

text text

(where there is whitespace before the start tags and before the close tag). I tried various access routes, including CQ, access via our own product's calls to the XccRunner API, OxygenXML via WebDAV, and direct XQuery (via Xcc), and get the same result. Some accesses show more indentation than others, but they all have indentation.

From what I could find, it appears that this is the result of a change in the default serialization options.

My primary question is: how can I determine how the XML is stored in ML without interference from any serialization options? Assuming ML is not literally storing the bytes of the XML, I assume I can't just look inside the forest, but is there a reliable way to see what the original whitespace was? My first task is to prove that the XML is correct as provided to MarkLogic.

My secondary questions:

1. Is there any way that options on the load() method could affect whitespace as stored? I didn't see any, but I could have missed something.

2. If this is in fact a function of serialization options, where would we control that in our Java code that uses Xcc to run XQueries? Is it simply a matter of adding "declare option xdmp:output indent=no;" to our XQuery modules?

3. Is this default serialization behavior changed in ML 5?

Thanks,

Eliot
--
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.reallysi.com
www.rsuitecms.com

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
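[Editorial note, not part of the original post.] One way to answer the primary question--what whitespace is actually stored, independent of output serialization--is to inspect the stored text nodes directly rather than looking at serialized output. A sketch in XQuery, using the example URI from the message above; whitespace-only text nodes either exist in the stored tree or they don't, regardless of whether the serializer later indents:

```xquery
xquery version "1.0-ml";

(: Count text nodes that contain only whitespace. If the document was
   stored without inter-element whitespace, this should be 0 even when
   serialized output appears indented. :)
count(
  doc("/foo/bar/mynewdoc.xml")//text()[normalize-space(.) = ""]
)
```

If this returns 0 but the serialized document shows indentation, the whitespace is a serialization artifact rather than a storage change.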
Re: [MarkLogic Dev General] Trailing Spaces Removed from Attribute Values--Bug or Feature?
Michael Blakeley wrote:

> I can't speak to the product issue, but is it practical to work around this
> behavior by ending your class attributes with a dummy character? For example:
>
> /self::*[contains(@class, ' topic/topic ')] => true
>
> DITA attaches meaning to the leading '-' and '+' characters. Will the
> trailing '-' cause problems? If so, could another character be used?

I've been exploring that question with the DITA Technical Committee, and the answer appears to be that there are in fact tools that would be disrupted by having anything after the initial "-" or "+" that is not a module/type name pair. Adding a trailing character would be an easy fix, but it would still require scrubbing of the data for use by tools outside of MarkLogic.

Cheers,

Eliot
--
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 610.631.6770
www.reallysi.com
www.rsuitecms.com

___
General mailing list
General@developer.marklogic.com
http://xqzone.com/mailman/listinfo/general
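[Editorial note, not part of the original post.] A workaround that avoids depending on leading or trailing spaces surviving storage at all is to tokenize the @class value and test for the module/type pair as a whole token, rather than matching a space-delimited substring. A sketch; the sample element and its class value are illustrative, not taken from the thread:

```xquery
xquery version "1.0-ml";

(: Match a DITA class token without relying on surrounding whitespace.
   The sample element below is hypothetical. :)
let $el := <topic class="- topic/topic concept/concept "/>
return tokenize(normalize-space($el/@class), " ") = "topic/topic"
```

The general comparison against the token sequence is true if any token equals "topic/topic", so stripped or collapsed whitespace in the stored attribute no longer matters. It does not, of course, help tools outside MarkLogic that expect the trailing space to be present.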