Hello,

Excellent, thank you very much. It does work, and it seems quite fast.
Now I'll go and read some documentation on XQuery...

Thanks again, and have a good weekend.

Simon

On 22 September 2017 at 14:58, Fabrice ETANCHAUD <[email protected]> wrote:
> Hello again, Simon,
>
> I think that tumbling windows could be of great help in your use case.
>
> Let's consider the following test db:
>
> 1. Creation
>
> db:create('test')
>
> 2. Document insertion (in descending @ts order, to check that the solution works regardless of the physical order of the documents)
>
> for $i in 1 to 100
> let $ts := current-dateTime() + xs:dayTimeDuration('PT' || (100 - $i + 1) || 'S')
> let $flag := random:integer(2)
> return
>   db:add(
>     'test',
>     <notif id="name1" ts="{$ts}">
>       <flag>{$flag}</flag>
>     </notif>,
>     'notif' || $i || '.xml')
>
> Then the following query should do the job:
>
> for tumbling window $i in sort(
>     db:open('test'),
>     (),
>     function($doc) {
>       $doc/notif/@ts/data()
>     })
>   start $s when fn:true()
>   end $e next $n when $e/notif/flag != $n/notif/flag
> return
>   $i[1]
>
> It iterates over the documents sorted by ascending @ts and outputs the first document of each run of consecutive documents sharing the same flag value.
>
> I hope I got it right.
>
> Best regards,
>
> Fabrice
> CERFrance Poitou-Charentes
>
> *From:* [email protected] [mailto:[email protected]] *On behalf of* Simon Chatelain
> *Sent:* Friday 22 September 2017 13:32
> *To:* BaseX
> *Subject:* Re: [basex-talk] OutOfMemoryError at Query#more()
>
> Hello Fabrice,
>
> Thanks for the suggestion. I did try that (sending a query for each document), and it does work… sort of. Performance-wise, it's really slow, even when the database is fully optimized.
>
> As for writing my process in XQuery, that's a good question. Honestly, I don't know: I am quite new to XQuery and lack the expertise.
>
> I'll try to give more detail about what I am trying to achieve.
>
> In my database I have a series of XML documents which, greatly simplified, look like this:
>
> <notif id="name1" ts="2016-01-01T08:01:05.000">
>   <flag>0</flag>
> </notif>
> <notif id="name1" ts="2016-01-01T08:01:10.000">
>   <flag>0</flag>
> </notif>
> <notif id="name1" ts="2016-01-01T08:01:15.000">
>   <flag>0</flag>
> </notif>
> ...
> <notif id="name1" ts="2016-01-01T08:01:20.000">
>   <flag>1</flag>
> </notif>
>
> <notif id="name1" ts="2016-01-01T08:01:25.000">
>   <flag>0</flag>
> </notif>
> <notif id="name1" ts="2016-01-01T08:01:30.000">
>   <flag>0</flag>
> </notif>
> <notif id="name1" ts="2016-01-01T08:01:35.000">
>   <flag>0</flag>
> </notif>
> ...
> <notif id="name1" ts="2016-01-01T08:01:40.000">
>   <flag>1</flag>
> </notif>
>
> What I need to get is:
> The first XML document (first as in smallest @ts value)
> Then the next document with <flag>1</flag> (again, next in @ts order)
> Then the next document with <flag>0</flag>
> And so on…
>
> Those would be the documents highlighted in red in the example above.
>
> Roughly only 1 out of 1000 documents has <flag>1</flag>.
>
> I tried several approaches, but the fastest one I found is to iterate through all documents with a very simple XQuery query and keep only the ones I need:
>
> for $d in collection('1234567')/* where $d/@name = 'name1' return $d
>
> Another approach was to first select all documents with <flag>1</flag>:
>
> for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 1 return $d
>
> and then, for each of those, get the next document:
>
> (for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 0 and $d/@ts > '[ts of previous document]' return $d)[1]
>
> Or select the first document,
>
> (for $d in collection('1234567')/* where $d/@name = 'name1' return $d)[1]
>
> then query the next one,
>
> (for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 1 and $d/@ts > '[ts of previous document]' return $d)[1]
>
> and the next…
>
> (for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 0 and $d/@ts > '[ts of previous document]' return $d)[1]
>
> and so on.
>
> But none of those is as fast as the first one, and with that first one I hit this OutOfMemoryError.
>
> So if there is a way to rewrite the whole process in XQuery, that could be an option worth trying. A more efficient way to write the query
>
> (for $d in collection('1234567')/* where $d/@name = 'name1' and $d/flag = 0 and $d/@ts > '[ts of previous document]' return $d)[1]
>
> could also solve my problem.
>
> Regards
>
> Simon
>
> On 22 September 2017 at 09:53, Fabrice ETANCHAUD <[email protected]> wrote:
>
> Hello Simon,
>
> I would send a query for each document, externalizing the loop in Java.
>
> A question: could your process be written in XQuery? That way you might not face memory overflow.
>
> Best regards,
>
> Fabrice Etanchaud
> CERFrance Poitou-Charentes
>
> *From:* [email protected] [mailto:[email protected]] *On behalf of* Simon Chatelain
> *Sent:* Friday 22 September 2017 09:34
> *To:* BaseX
> *Subject:* [basex-talk] OutOfMemoryError at Query#more()
>
> Hello,
>
> I am facing an issue while retrieving a large number of XML documents from a BaseX collection.
>
> Each document (as an XML file) is around 10 KB, and in the problematic case I must retrieve around 70000 of them.
>
> I am using Session#query(String query), then Query#more() and Query#next(), to iterate through the result of my query:
>
> try (final Query query = l_Session.query("query")) {
>   while (query.more()) {
>     String xml = query.next();
>   }
> }
>
> If there are more than a certain number of XML documents in the result of my query, I get an OutOfMemoryError (full stack trace in the attached file) when executing query.more().
>
> I did the test with BaseX 8.6.6 and 8.6.7, Java 8, VM argument -Xmx1024m.
>
> Increasing the Xmx value is not a solution, as I don't know the maximum amount of data I will have to retrieve in the future. So what I need is a reliable way of executing such queries and iterating through the result without exploding the heap size.
>
> I also tried to use QueryProcessor and QueryProcessor#iter() instead of Session#query(String query). But is it safe to use it, given that my application is multithreaded and that each thread has its own session to query or add elements from/to multiple collections?
>
> Moreover, for now all access to BaseX is done through a session, so my application can run with an embedded BaseX or with a BaseX server. If I start using QueryProcessor, then it will be embedded BaseX only, right?
>
> I also attached a simple example showing the problem.
>
> Any advice would be much appreciated.
>
> Thanks
>
> Simon
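For readers more at home in Java than in XQuery, the selection Simon describes reduces to one rule: keep the first document, then every document whose flag differs from the previous one, in ascending @ts order. Fabrice's tumbling-window query performs this selection on the server. The following is only a minimal in-memory sketch of that rule, not the approach adopted in the thread. The `FlagRuns` class and `Notif` record are hypothetical names. Loading every document into the client like this is precisely what caused the OutOfMemoryError, so the sketch illustrates the logic only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FlagRuns {
    private FlagRuns() {}

    // Hypothetical, simplified model of a <notif> document: only @ts and <flag> are kept.
    record Notif(String ts, int flag) {}

    // Returns the first notification of each run of consecutive, equal flag values,
    // in ascending @ts order: the first document overall, then the next one with
    // flag 1, then the next one with flag 0, and so on.
    static List<Notif> firstOfEachRun(List<Notif> notifs) {
        List<Notif> sorted = new ArrayList<>(notifs);
        // Assumes all timestamps share the same ISO-8601 layout, so string order = time order.
        sorted.sort(Comparator.comparing(Notif::ts));
        List<Notif> result = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            // Keep a document if it is the first one or its flag differs from its predecessor's.
            if (i == 0 || sorted.get(i).flag() != sorted.get(i - 1).flag()) {
                result.add(sorted.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamps from the example in the thread (elided "..." documents left out), shuffled.
        List<Notif> input = List.of(
                new Notif("2016-01-01T08:01:20.000", 1),
                new Notif("2016-01-01T08:01:05.000", 0),
                new Notif("2016-01-01T08:01:35.000", 0),
                new Notif("2016-01-01T08:01:10.000", 0),
                new Notif("2016-01-01T08:01:40.000", 1),
                new Notif("2016-01-01T08:01:15.000", 0),
                new Notif("2016-01-01T08:01:25.000", 0),
                new Notif("2016-01-01T08:01:30.000", 0));
        for (Notif n : firstOfEachRun(input)) {
            System.out.println(n.ts() + " flag=" + n.flag());
        }
    }
}
```

Run as is, `main` prints the 08:01:05, 08:01:20, 08:01:25 and 08:01:40 documents. This matches the "first document, then next flag 1, then next flag 0" rule from the thread.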
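Fabrice's first suggestion was to externalize the loop in Java and send one query per document. Simon's step-by-step queries (select the first document, then repeatedly ask for the next document with the opposite flag and a larger @ts) amount to the loop sketched below. This sketch rests on stated assumptions. `NextNotifLookup` is a hypothetical interface standing in for one query round trip; it is not part of the BaseX API. An in-memory implementation replaces the server so the example runs on its own. The client holds only one document at a time, but it issues a separate query for every flag change. Depending on indexes, each query may also rescan the collection, which may explain why Simon found this approach slow.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class StepwiseLookup {
    private StepwiseLookup() {}

    // Hypothetical, simplified model of a <notif> document.
    record Notif(String ts, int flag) {}

    // Hypothetical stand-in for one server round trip, i.e. one query of the form
    // (for $d in ... where $d/flag = FLAG and $d/@ts > 'TS' return $d)[1],
    // assuming its result is taken in ascending @ts order.
    // afterTs == null means "no lower bound"; flag == null means "any flag".
    interface NextNotifLookup {
        Optional<Notif> next(String afterTs, Integer flag);
    }

    // In-memory implementation, only so that the sketch runs without a server.
    static NextNotifLookup inMemory(List<Notif> data) {
        List<Notif> sorted = new ArrayList<>(data);
        sorted.sort(Comparator.comparing(Notif::ts));
        return (afterTs, flag) -> sorted.stream()
                .filter(n -> afterTs == null || n.ts().compareTo(afterTs) > 0)
                .filter(n -> flag == null || n.flag() == flag)
                .findFirst();
    }

    // Externalized loop: one lookup per flag change; the client holds one document at a time.
    static List<String> changeTimestamps(NextNotifLookup lookup) {
        List<String> timestamps = new ArrayList<>();
        Optional<Notif> current = lookup.next(null, null); // first document overall
        while (current.isPresent()) {
            Notif n = current.get();
            timestamps.add(n.ts()); // real code would process n here and then drop it
            current = lookup.next(n.ts(), 1 - n.flag()); // flags are assumed to be 0 or 1
        }
        return timestamps;
    }

    public static void main(String[] args) {
        NextNotifLookup lookup = inMemory(List.of(
                new Notif("2016-01-01T08:01:25.000", 0),
                new Notif("2016-01-01T08:01:05.000", 0),
                new Notif("2016-01-01T08:01:40.000", 1),
                new Notif("2016-01-01T08:01:10.000", 0),
                new Notif("2016-01-01T08:01:20.000", 1)));
        changeTimestamps(lookup).forEach(System.out::println);
    }
}
```

Compared with a single server-side query such as the tumbling window, this design trades client memory for round trips. It only pays off when individual lookups are cheap.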

