> I currently implemented this idea which closes and opens a new session
> every 1000 imports. We'll see how it goes. But my question remains: what
> information is kept in memory after a database connection is closed?

Usually nothing ;) I just created some thousand databases in a loop, and
memory consumption was constant. But I guess we're doing slightly
different things?
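
For illustration, a minimal sketch of such a loop, using the BaseX Perl
client (BaseXClient.pm, the same Session API as in the pseudocode further
down); host, port, credentials and the database count are placeholders,
not the actual test setup:

use BaseXClient;

# One session; create and immediately close a few thousand databases.
my $session = Session->new("localhost", 1984, "admin", "admin");

for my $i (1 .. 5000) {
    $session->execute("CREATE DB test$i");   # create an empty database
    $session->execute("CLOSE");              # close it again right away
}

$session->close();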

> Also, the memory limit that has been set for different servers only
> applies to *that* BaseX server, right, and not to all BaseX servers
> running on a single machine? If I am running 6 servers on different
> ports on a single machine, does a set memory limit of, say, 512MB mean
> that each instance is allocated 512MB, or that 512MB is distributed
> among all BaseX instances?
>
> Kind regards
>
> Bram
>
> -----Original Message-----
> From: basex-talk-boun...@mailman.uni-konstanz.de
> [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of
> Christian Grün
> Sent: Sunday, 16 October 2016 10:22
> To: Marco Lettere <m.lett...@gmail.com>
> CC: BaseX <basex-talk@mailman.uni-konstanz.de>
> Subject: Re: [basex-talk] Creating more than a million databases per
> session: Out Of Memory
>
> Hi Bram,
>
> I second Marco in his advice to find a good compromise between single
> databases and single documents.
>
> Regarding the OOM, the stack trace could possibly be helpful for
> judging what might go wrong in your setup.
>
> Cheers
> Christian
>
>
> On Sat, Oct 15, 2016 at 4:19 PM, Marco Lettere <m.lett...@gmail.com> wrote:
>> Hi Bram,
>> not being much into the issue of creating databases at this scale, I'm
>> not sure whether the OOM problems you are facing are related to BaseX
>> or to the JVM, actually.
>> Anyway, something rather simple you could try is to behave "in between":
>> instead of opening a single session for the create statements
>> altogether, or one session for each and every create, you could split
>> your create statements into chunks of 100/1000 or the like and
>> distribute them over subsequent (or maybe even parallel?) sessions...
>> I'm not sure whether this is applicable to your use case, though.
>> Regards,
>> Marco.
>>
>>
>> On 15/10/2016 10:48, Bram Vanroy | KU Leuven wrote:
>>
>> Hi all
>>
>> I've talked before about how we restructured our data to drastically
>> improve search times on a 500 million token corpus. [1] Now, after
>> some minor improvements, I am trying to import the generated XML files
>> into BaseX. The result would be 100,000s to millions of BaseX
>> databases, as we expect. When doing the import, though, I am running
>> into OOM errors. We put our memory limit at 512MB. The thing is that
>> this seems incredibly odd to me: because we are creating so many
>> different databases, which are all really small as a consequence, I
>> would not expect BaseX to need to store much in memory. After each
>> database is created, the garbage collector can come along and remove
>> everything that was needed for the previously generated database.
>>
>> A solution, I suppose, would be to close and open the BaseX session on
>> each creation, but I'm afraid that (on such a huge scale) the impact
>> on speed would be too large. How it is set up now, in pseudo code:
>>
>> --------------------------------------------------------------------------------
>>
>> $session = Session->new(host, port, user, pw);
>>
>> # @allFiles is at least 100,000 items
>> for my $file (@allFiles) {
>>     $database_name = $file . "name";
>>     $session->execute("CREATE DB $database_name $file");
>>     $session->execute("CLOSE");
>> }
>>
>> $session->close();
>>
>> --------------------------------------------------------------------------------
>>
>> So all databases are created on the same session, which I believe
>> causes the issue. But why? What is still required in memory after
>> ->execute("CLOSE")? Are the indices for the generated databases
>> stored in memory? If so, can we force them to be written to disk?
>>
>> ANY thoughts on this are appreciated. Enlightenment on what is stored
>> in a Session's memory is useful as well. Increasing the memory should
>> be a last resort.
>>
>> Thank you in advance!
>>
>> Bram
>>
>> [1]:
>> http://www.lrec-conf.org/proceedings/lrec2014/workshops/LREC2014Workshop-CMLC2%20Proceedings-rev2.pdf#page=20
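
For reference, the chunked variant mentioned at the top of the thread
(a fresh session every 1000 imports) could look roughly like this; a
sketch only, again against the Perl client, with the chunk size, the
file list and the credentials as illustrative assumptions:

use BaseXClient;

my $chunk_size = 1000;                  # illustrative; tune as needed
my @allFiles   = glob("import/*.xml");  # stand-in for the real file list
my $session;

my $count = 0;
for my $file (@allFiles) {
    # Recycle the session every $chunk_size creates, so that anything
    # held per session on the client or server side can be released.
    if ($count % $chunk_size == 0) {
        $session->close() if defined $session;
        $session = Session->new("localhost", 1984, "admin", "admin");
    }
    my $database_name = $file . "name";
    $session->execute("CREATE DB $database_name $file");
    $session->execute("CLOSE");
    $count++;
}

$session->close() if defined $session;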