Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded
Jason, Geert – thanks for the suggestions. I am going to test some changes out – we're first trying to move the XQuery into the Java application itself rather than calling the script from the app, for better processing. Do appreciate the ideas – thanks again.

From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Jason Hunter
Sent: Tuesday, May 24, 2016 6:37 AM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

One tip: Any time you can express $node//child as $node/exact/path/to/child you'll get better performance, because it saves MarkLogic from having to scan the full tree looking for the child.

Then there are little things to try: if you're going to repeatedly compare a node to another node's value, you can take the data($val) value and compare using that instead, so the atomization of the node happens just once. Internal optimizations like this change between server versions, so I tend to experiment. And why get /text() if you want /string()?

The following line of code is presumably called a large number of times, so the above ideas could help:

$xml_doc//firmname[.=$theOrigFirmname]/../translation/text()

Maybe:

$xml_doc/exact/path/translation[firmname = $theOrigFirmnameData]/string()

Also, have you tried using the profiler?

-jh-

On May 24, 2016, at 2:53 AM, Kari Cowan <kco...@alm.com> wrote:

The file is used in a different application that I don't have control over, so I am just adjusting the data that's in the file – to fix the firmname (correcting some typos and inconsistencies they had and continue to have – can't really prevent that, because the service pulls the data from various public court records and every law clerk seems to have their own way of entering the data).
When my script is doing: for $firms in $pacer_doc//(counsel|party) …
Is there a better way than loading the doc nodes in a for loop – maybe some other function I am not aware of, or another FLWOR?

From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Geert Josten
Sent: Monday, May 23, 2016 11:44 AM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

Hi Kari,

13 MB isn't really big actually, but big enough to perform less optimally and cause timeouts. You could just increase the timeout, but it is probably a better idea to revise your strategy and consider breaking your large file into record-like files (each containing just one firm, for instance). You can then make much more use of the search capabilities of MarkLogic.

Cheers,
Geert

From: <general-boun...@developer.marklogic.com> on behalf of Kari Cowan <kco...@alm.com>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Monday, May 23, 2016 at 8:40 PM
To: "general@developer.marklogic.com" <general@developer.marklogic.com>
Subject: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

There must be a better way to do this.
My script works fine when it's loading a document that is not very large, but occasionally one of the docs is massive (13 MB on one of my error cases), and when that happens, in my application I get an error like:

com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

The script is basically getting a uri, reading it back and comparing the 'firmname' nodes (there can be many in the same document), and if a name differs from what's in the shortlist.xml, we change it to what that file says it should be.

The problem with my large file – there are over 72,000 law firms it's trying to compare.

This is my script – anyone have a suggestion of a better way to accomplish what I am attempting?

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

declare variable $uri as xs:string external;
let $uri := try { ($uri) } catch ($e) { "" }
(: let $uri := "/olympus/pacer-xml/9739715_3:15-cv-01221" :)

let $xml_doc := fn:doc("/olympus/data-utils/standard_firmnames_shortlist.xml")

for $this_uri in "$uri"
let $doc := fn:doc($uri)
let $pacer_doc := $doc

for $firms in $pacer_doc//(counsel|party)
let $theO
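Jason's two tips above – exact paths instead of //, and atomizing the comparison value once – can be sketched as follows. This is only a sketch: the /document path into the PACER file and the /firms/firm shape of the shortlist are assumptions, so substitute the real element names.

```xquery
xquery version "1.0-ml";

declare variable $uri as xs:string external;

(: Shortlist shape /firms/firm/(firmname|translation) is assumed here. :)
let $xml_doc := fn:doc("/olympus/data-utils/standard_firmnames_shortlist.xml")
(: Exact path instead of //, so MarkLogic need not scan the whole tree.
   "/document" is a placeholder for the real root element. :)
for $firms in fn:doc($uri)/document/(counsel|party)
(: Atomize the comparison value once, not on every predicate test. :)
let $theOrigFirmnameData := fn:data($firms/originalFirmname)
(: string() on the element, rather than /text(), also copes with
   comments or mixed content inside <translation>. :)
let $translation :=
  $xml_doc/firms/firm[firmname = $theOrigFirmnameData]/translation/string()
return $translation
```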
Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded
Sent from my T-Mobile 4G LTE device

-- Original message --
From: Kari Cowan <kco...@alm.com>
Date: Mon, 5/23/2016 1:40 PM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

There must be a better way to do this.

My script works fine when it's loading a document that is not very large, but occasionally one of the docs is massive (13 MB on one of my error cases), and when that happens, in my application I get an error like:

com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded

The script is basically getting a uri, reading it back and comparing the 'firmname' nodes (there can be many in the same document), and if a name differs from what's in the shortlist.xml, we change it to what that file says it should be.

The problem with my large file – there are over 72,000 law firms it's trying to compare.

This is my script – anyone have a suggestion of a better way to accomplish what I am attempting?
xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

declare variable $uri as xs:string external;
let $uri := try { ($uri) } catch ($e) { "" }
(: let $uri := "/olympus/pacer-xml/9739715_3:15-cv-01221" :)

let $xml_doc := fn:doc("/olympus/data-utils/standard_firmnames_shortlist.xml")

for $this_uri in "$uri"
let $doc := fn:doc($uri)
let $pacer_doc := $doc

for $firms in $pacer_doc//(counsel|party)
let $theOrigFirmname := $firms/originalFirmname
let $theFirmname := $firms/firmname
let $translation := $xml_doc//firmname[. = $theOrigFirmname]/../translation/text()

for $firm in $pacer_doc
return
  if ( fn:exists($translation) and fn:exists($theFirmname) and ($translation ne $theFirmname) )
  then (
    fn:concat("CHANGING FIRMNAME: ", $theFirmname, " TO STANDARD FIRMNAME TRANSLATION: ", $translation, " IN URI: ", $uri),
    xdmp:log(fn:concat("Olympotomus Changed Firmname: ", $theFirmname, " in URI: ", $uri)),
    xdmp:node-replace($theFirmname, <firmname>{$translation}</firmname>)
  )
  else (
    fn:concat("...Evaluated and did not change Firmname: ", $theFirmname, " in URI: ", $uri),
    xdmp:log(fn:concat("Olympotomus Evaluated and did not change a Firmname: ", $theFirmname, " in URI: ", $uri))
  )

ALM, an information and intelligence company, provides customers with critical news, data, analysis, marketing solutions and events to successfully manage the business of business. Customers use ALM solutions to discover new ideas and approaches for solving business challenges, connect to the right professionals and peers to move business forward, and compete to win through access to data, analytics and insight. ALM serves a community of over six million business professionals seeking to discover, connect and compete in highly complex industries. Learn more at www.alm.com.

___
General mailing list
General@developer.marklogic.com
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
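Pulling the thread's suggestions together, here is one way the script above could be restructured so that the 72,000-entry shortlist is read into an in-memory map once, making each firm lookup a hash probe instead of a scan of the 13 MB file. This is a hedged sketch: the shortlist layout (firmname and translation as siblings) is assumed from the original XPath, and the <firmname> element name used in the replacement node is likewise an assumption.

```xquery
xquery version "1.0-ml";

declare variable $uri as xs:string external;

(: Build the translation table once. Assumes shortlist entries look like
   <firm><firmname>...</firmname><translation>...</translation></firm>. :)
let $shortlist := fn:doc("/olympus/data-utils/standard_firmnames_shortlist.xml")
let $translations := map:map()
let $_build :=
  for $name in $shortlist//firmname
  return map:put($translations, fn:string($name), fn:string($name/../translation))

(: Single pass over the target document; the stray inner
   "for $firm in $pacer_doc" loop from the original is gone. :)
for $party in fn:doc($uri)//(counsel|party)
let $theFirmname := $party/firmname
let $translation := map:get($translations, fn:string($party/originalFirmname))
where fn:exists($theFirmname)
  and fn:exists($translation)
  and $translation ne fn:string($theFirmname)
return (
  xdmp:log(fn:concat("Changing firmname ", $theFirmname, " to ", $translation, " in URI ", $uri)),
  xdmp:node-replace($theFirmname, <firmname>{$translation}</firmname>)
)
```

The map build is still linear in the shortlist size, but it happens once per run rather than once per counsel/party element, which is where the quadratic cost in the original came from.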
Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded
One tip: Any time you can express $node//child as $node/exact/path/to/child you'll get better performance, because it saves MarkLogic from having to scan the full tree looking for the child.

Then there are little things to try: if you're going to repeatedly compare a node to another node's value, you can take the data($val) value and compare using that instead, so the atomization of the node happens just once. Internal optimizations like this change between server versions, so I tend to experiment. And why get /text() if you want /string()?

The following line of code is presumably called a large number of times, so the above ideas could help:

> $xml_doc//firmname[.=$theOrigFirmname]/../translation/text()

Maybe:

$xml_doc/exact/path/translation[firmname = $theOrigFirmnameData]/string()

Also, have you tried using the profiler?

-jh-

> On May 24, 2016, at 2:53 AM, Kari Cowan <kco...@alm.com> wrote:
>
> The file is used in a different application that I don't have control over,
> so I am just adjusting the data that's in the file – to fix the firmname
> (correcting some typos and inconsistencies they had and continue to have –
> can't really prevent that, because the service pulls the data from various
> public court records and every law clerk seems to have their own way of
> entering the data).
>
> When my script is doing: for $firms in $pacer_doc//(counsel|party) …
> Is there a better way than loading the doc nodes in a for loop – maybe some
> other function I am not aware of, or another FLWOR?
Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded
The file is used in a different application that I don't have control over, so I am just adjusting the data that's in the file - to fix the firmname (correcting some typos and inconsistencies they had and continue to have - can't really prevent that, because the service pulls the data from various public court records and every law clerk seems to have their own way of entering the data).

When my script is doing: for $firms in $pacer_doc//(counsel|party) ...
Is there a better way than loading the doc nodes in a for loop - maybe some other function I am not aware of, or another FLWOR?
Re: [MarkLogic Dev General] How to handle very large xml file to prevent com.marklogic.xcc.exceptions.XQueryException: Time limit exceeded
Hi Kari,

13 MB isn't really big actually, but big enough to perform less optimally and cause timeouts. You could just increase the timeout, but it is probably a better idea to revise your strategy and consider breaking your large file into record-like files (each containing just one firm, for instance). You can then make much more use of the search capabilities of MarkLogic.

Cheers,
Geert
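Geert's record-like-files idea could be sketched as a one-time split of the shortlist, after which each lookup becomes a direct fetch of a tiny document rather than a search through the monolithic file. The /firms/firm shortlist shape and the output URI scheme here are made up for illustration.

```xquery
xquery version "1.0-ml";

(: One-time split: write one small document per firm. Assumes shortlist
   entries pair a <firmname> with a sibling <translation>. :)
let $shortlist := fn:doc("/olympus/data-utils/standard_firmnames_shortlist.xml")
for $name in $shortlist//firmname
return xdmp:document-insert(
  fn:concat("/olympus/data-utils/firms/",
            fn:encode-for-uri(fn:string($name)), ".xml"),
  <firm>{$name, $name/../translation}</firm>
)

(: A lookup afterwards is a single document fetch, e.g.:
   fn:doc(fn:concat("/olympus/data-utils/firms/",
                    fn:encode-for-uri($theOrigFirmname), ".xml"))
     /firm/translation/string()                                      :)
```

With one document per firm you can also lean on MarkLogic's indexes (cts:search, element range indexes) instead of in-memory XPath scans.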