> I have now implemented this idea, which closes and opens a new session
> every 1000 imports. We'll see how it goes. But my question remains:
> what information is kept in memory after a database connection is closed?

Usually nothing ;) I just created a few thousand databases in a loop,
and memory consumption stayed constant. But I guess we’re doing slightly
different things?
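
For what it's worth, the loop had roughly the following shape (a minimal
sketch using the Perl client, BaseXClient.pm, that ships with BaseX; the
database names and the tiny inline input document are illustrative):

  use BaseXClient;

  # a single session; each CREATE DB implicitly closes the previous database
  my $session = Session->new("localhost", 1984, "admin", "admin");
  for my $i (1 .. 5000) {
      $session->execute("CREATE DB test$i <root/>");  # inline XML as input
  }
  $session->close();

Memory consumption stayed constant over a run like this.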



>
> Also, the memory limit that is set for different servers only applies to 
> *that* BaseX server, right, and not to all BaseX servers running on a single 
> machine? If I am running 6 servers on different ports on a single machine, 
> does a memory limit of, say, 512MB mean that each instance is allocated 
> 512MB, or that 512MB is distributed among all BaseX instances?
>
>
> Kind regards
>
> Bram
>
> -----Original Message-----
> From: basex-talk-boun...@mailman.uni-konstanz.de 
> [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Christian Grün
> Sent: Sunday, 16 October 2016 10:22
> To: Marco Lettere <m.lett...@gmail.com>
> CC: BaseX <basex-talk@mailman.uni-konstanz.de>
> Subject: Re: [basex-talk] Creating more than a million databases per 
> session: Out Of Memory
>
> Hi Bram,
>
> I second Marco's advice to find a good compromise between single 
> databases and single documents.
>
> Regarding the OOM, the stack trace could be helpful for judging what 
> might be going wrong in your setup.
>
> Cheers
> Christian
>
>
> On Sat, Oct 15, 2016 at 4:19 PM, Marco Lettere <m.lett...@gmail.com> wrote:
>> Hi Bram,
>> not being much into the issue of creating databases at this scale, I'm
>> not sure whether the OOM problems you are facing actually relate to
>> BaseX or to the JVM.
>> Anyway, something rather simple you could try is to behave "in between":
>> instead of opening a single session for all the create statements
>> altogether, or one session for each and every create, you could split
>> your create statements into chunks of 100/1000 or the like and
>> distribute them over subsequent (or maybe even parallel?) sessions; see
>> the sketch below. I'm not sure whether this is applicable for your use
>> case though.
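>>
>> Something like this, perhaps (an untested sketch with the Perl client;
>> the input list, database names and chunk size are placeholders):
>>
>>   use BaseXClient;
>>
>>   my ($host, $port, $user, $pw) = ("localhost", 1984, "admin", "admin");
>>   my @allFiles   = glob("data/*.xml");  # placeholder for your file list
>>   my $chunk_size = 1000;
>>
>>   my $session = Session->new($host, $port, $user, $pw);
>>   my $count   = 0;
>>   for my $file (@allFiles) {
>>       $session->execute("CREATE DB db$count $file");
>>       # after each chunk, drop the session and start a fresh one
>>       if (++$count % $chunk_size == 0) {
>>           $session->close();
>>           $session = Session->new($host, $port, $user, $pw);
>>       }
>>   }
>>   $session->close();
>>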
>> Regards,
>> Marco.
>>
>>
>> On 15/10/2016 10:48, Bram Vanroy | KU Leuven wrote:
>>
>> Hi all
>>
>>
>>
>> I’ve talked before about how we restructured our data to drastically
>> improve search times on a 500-million-token corpus. [1] Now, after
>> some minor improvements, I am trying to import the generated XML files
>> into BaseX. The result would be 100,000s to millions of BaseX databases,
>> as we expect. When doing the import, though, I am running into OOM
>> errors. We put our memory limit at 512MB. This seems
>> incredibly odd to me: because we are creating so many different
>> databases, which are consequently all really small, I would not
>> expect BaseX to need to keep much in memory. After each database is
>> created, the garbage collector can come along and remove everything that was 
>> needed for the previously generated database.
>>
>>
>>
>> A solution, I suppose, would be to close and reopen the BaseX session on
>> each creation, but I’m afraid that (at such a huge scale) the impact on
>> speed would be too large. Here is how it is set up now, in pseudocode:
>>
>>
>>
>> --------------------------------------------------------------------------------
>>
>>
>>
>> use BaseXClient;  # Perl client shipped with BaseX
>>
>> # $host, $port, $user, $pw and @allFiles are defined elsewhere;
>> # @allFiles is at least 100,000 items
>> my $session = Session->new($host, $port, $user, $pw);
>>
>> for my $file (@allFiles) {
>>     my $database_name = $file . "name";
>>     $session->execute("CREATE DB $database_name $file");  # one small DB per file
>>     $session->execute("CLOSE");
>> }
>>
>> $session->close();
>>
>>
>>
>> --------------------------------------------------------------------------------
>>
>>
>>
>> So all databases are created on the same session, which I believe
>> causes the issue. But why? What is still kept in memory after
>> ->execute("CLOSE")?
>> Are the indices for the generated databases stored in memory? If so,
>> can we force them to be written to disk?
>>
>>
>>
>> Any thoughts on this are appreciated. Some enlightenment on what is
>> stored in a Session’s memory would be useful as well. Increasing the
>> memory limit should be a last resort.
>>
>>
>>
>>
>>
>> Thank you in advance!
>>
>>
>>
>> Bram
>>
>>
>>
>>
>>
>> [1]: http://www.lrec-conf.org/proceedings/lrec2014/workshops/LREC2014Workshop-CMLC2%20Proceedings-rev2.pdf#page=20
>>
>>
>>
>>
>
