Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

2016-05-24 Thread Kari Cowan
Jason, Geert – thanks for the suggestions.  I am going to test some changes: first we are trying to move the XQuery into the Java application itself, rather than calling the script via the app, for better processing.

I do appreciate the ideas – thanks again.


Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

2016-05-24 Thread pp11p...@yahoo.com



Sent from my T-Mobile 4G LTE device

-- Original message --
From: Kari Cowan <kco...@alm.com>
Date: Mon, 5/23/2016 1:40 PM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded


There must be a better way to do this.  My script works fine when it is loading a document that is not very large, but occasionally one of the docs is massive (13 MB in one of my error cases), and when that happens my application gets an error like:
com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

The script basically takes a URI, reads the document back, and compares the 'firmname' nodes (there can be many in the same document); if a name differs from what shortlist.xml says it should be, we change it to match.

The problem with my large file is that there are over 72,000 law firms it is trying to compare.

This is my script – does anyone have a suggestion for a better way to accomplish what I am attempting?
 
 
 
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

declare variable $uri as xs:string external;
let $uri := try { ($uri) } catch ($e) { "" }
(: let $uri := "/olympus/pacer-xml/9739715_3:15-cv-01221" :)

let $xml_doc := fn:doc("/olympus/data-utils/standard_firmnames_shortlist.xml")

for $this_uri in "$uri"
let $doc := fn:doc($uri)
let $pacer_doc := $doc

for $firms in $pacer_doc//(counsel|party)
  let $theOrigFirmname := $firms/originalFirmname
  let $theFirmname := $firms/firmname
  let $translation := $xml_doc//firmname[. = $theOrigFirmname]/../translation/text()

for $firm in $pacer_doc
return
  if ( fn:exists($translation) and fn:exists($theFirmname) and ($translation ne $theFirmname) ) then
  (
    fn:concat("CHANGING FIRMNAME: ", $theFirmname, " TO STANDARD FIRMNAME TRANSLATION: ", $translation, " IN URI: ", $uri),
    xdmp:log(fn:concat("Olympotomus Changed Firmname: ", $theFirmname, " in URI: ", $uri)),
    xdmp:node-replace($theFirmname, <firmname>{$translation}</firmname>)
  )
  else (
    fn:concat("...Evaluated and did not change Firmname: ", $theFirmname, " in URI: ", $uri),
    xdmp:log(fn:concat("Olympotomus Evaluated and did not change a Firmname: ", $theFirmname, " in URI: ", $uri))
  )
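[Editorial sketch] One way to cut the cost of the inner lookup above is to build the shortlist into an in-memory map once, so each firm becomes a hash lookup instead of a scan over 72,000 entries. This uses MarkLogic's map:map API and assumes, as the XPath above implies, that each firmname in the shortlist has a sibling translation element:

```xquery
(: Sketch: build a firmname -> translation lookup once, before the loop. :)
let $lookup :=
  let $m := map:map()
  let $build :=
    for $f in $xml_doc//firmname
    return map:put($m, fn:string($f), fn:string($f/../translation))
  return $m

(: Then, inside the (counsel|party) loop, replace the //firmname scan with: :)
let $translation := map:get($lookup, fn:string($theOrigFirmname))
```

The map is built in O(n) and each lookup is then constant-time, which matters when the loop body runs once per counsel or party element.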


 


___
General mailing list
General@developer.marklogic.com
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

2016-05-24 Thread Jason Hunter
One tip:

Any time you can express $node//child as $node/exact/path/to/child you'll get 
better performance, because it saves MarkLogic from having to scan the full 
tree looking for the child.

Then there are little things to try: if you're going to repeatedly compare a node to another node's value, you can take the data($val) value and compare using that instead, so the atomization of the node happens just once.  Internal optimizations like this change between server versions, so I tend to experiment.

And why get /text() if you want /string()?

The following line of code is presumably called a large number of times, so the ideas above could help.

> $xml_doc//firmname[.=$theOrigFirmname]/../translation/text()


Maybe:

$xml_doc/exact/path/translation[firmname = $theOrigFirmnameData]/string()

Also, have you tried using the profiler?
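[Editorial sketch] Putting those tips together, the lookup line might become something like the following. The /shortlist/entry path is a placeholder (the shortlist's real structure isn't shown in the thread), and the firmname value is atomized once with fn:data() before the comparison:

```xquery
(: Placeholder path: substitute the shortlist's real element structure. :)
let $theOrigFirmnameData := fn:data($firms/originalFirmname)
let $translation :=
  $xml_doc/shortlist/entry[firmname = $theOrigFirmnameData]
    /translation/string()
```

An exact path lets the server navigate directly instead of scanning the whole tree, and the pre-atomized value avoids re-atomizing the node on every comparison.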

-jh-


Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

2016-05-23 Thread Kari Cowan
The file is used in a different application that I don't have control over, so I am just adjusting the data that's in the file - fixing the firmname (correcting some typos and inconsistencies they had and continue to have - I can't really prevent that, because the service pulls the data from various public court records and every law clerk seems to have their own way of entering the data).

When my script is doing: for $firms in $pacer_doc//(counsel|party) ...
Is there a better way than loading the doc nodes in a for loop - maybe some other function I am not aware of, or another FLWOR?





Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

2016-05-23 Thread Geert Josten
Hi Kari,

13 MB isn’t really big, actually, but it is big enough to perform less than optimally and cause timeouts. You could just increase the timeout, but it is probably a better idea to revise your strategy and consider breaking your large file into record-like files (each containing just one firm, for instance). You can then make much more use of the search capabilities of MarkLogic.

Cheers,
Geert
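[Editorial sketch] A record-per-firm split could look roughly like the following, assuming the large file groups data under repeating elements; the firm element name and the output URI scheme here are made up for illustration:

```xquery
xquery version "1.0-ml";
(: Split one large document into one small document per firm.
   Element name and URI scheme are assumptions. :)
let $big := fn:doc("/olympus/data-utils/standard_firmnames_shortlist.xml")
for $firm at $i in $big//firm
return
  xdmp:document-insert(
    fn:concat("/olympus/firms/firm-", $i, ".xml"),
    document { $firm }
  )
```

Once each firm lives in its own document, a query can target just the documents it needs (for instance with cts:search) instead of walking one 13 MB tree on every request.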
