Re: Fuseki - how to release memory

2017-01-06 Thread A. Soroka
Can you give us your actual Fuseki config (i.e. assembler file)? Or are you 
repeatedly creating new datasets via the admin API?

---
A. Soroka
The University of Virginia Library
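For reference, a Fuseki 2 assembler file for a purely in-memory dataset looks roughly like the sketch below. This is a minimal example, not the poster's actual config: the service name "ds" is an assumption, and the exact assembler vocabulary should be checked against the Fuseki documentation for your version.

```turtle
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:     <http://jena.apache.org/2005/11/Assembler#> .

# One service exposing SPARQL query and update over an in-memory dataset.
<#service> rdf:type fuseki:Service ;
    fuseki:name          "ds" ;        # endpoint base: /ds
    fuseki:serviceQuery  "query" ;     # /ds/query
    fuseki:serviceUpdate "update" ;    # /ds/update
    fuseki:dataset       <#dataset> .

# Transactional in-memory dataset; contents are lost on restart.
<#dataset> rdf:type ja:MemoryDataset .
```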




Re: Fuseki - how to release memory

2017-01-06 Thread Janda, Radim
Hello,
we use in-memory datasets.
The JVM is big enough, but as we process thousands of small data sets, memory
is allocated continuously.
Currently we restart Fuseki every hour to avoid out-of-memory errors.
However, performance also degrades over time (before the restart), which is
why we are looking for a way to clean up memory.

Radim



Re: Fuseki - how to release memory

2017-01-06 Thread Andy Seaborne

Are you using persistent or in-memory datasets for your working storage?

If you really mean memory (RAM), are you sure the JVM is big enough?

Fuseki tries to avoid holding on to cached transactions, but if the server 
is under heavy read load (Rob's point) then they can build up 
(solution: reduce the read load for a short while) - TDB also tries 
to switch to emergency measures after a while, but by then the 
RAM usage may already have grown too much.


Andy



Re: Fuseki - how to release memory

2017-01-06 Thread Rob Vesse
Deleting data does not reclaim all the memory; exactly what is and isn’t 
reclaimed depends somewhat on your exact usage pattern.

The B+Trees, which are the primary data structure for TDB (the default database 
used in Fuseki), do not reclaim space. They are also potentially subject to 
fragmentation, so memory used tends to grow over time. The node table 
portion of the database, the mapping from RDF terms to internal database 
identifiers, is a sequential data structure that will only ever grow over time. 
It is also worth noting that many of the data structures are backed by 
memory-mapped files, which are off-heap and subject to the vagaries of how your 
OS handles this.

Additionally, if you place Fuseki under continuous load, TDB may be blocked from 
writing the in-memory journal back to disk, which can cause the journal to grow 
unbounded over time and prevent memory from being reclaimed. Adding occasional 
pauses between operations can help to alleviate this.

As Lorenz notes, for this kind of use case you may not need Fuseki at all and 
could simply drive TDB programmatically instead.
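A minimal sketch of that embedded approach, assuming a recent Jena on the classpath (the class and method structure here are illustrative, not from the thread): each job builds its own transactional in-memory dataset, runs its INSERT and SELECT against it, and then simply drops the dataset, so the whole thing becomes garbage instead of needing an in-place DELETE.

```java
import org.apache.jena.query.*;
import org.apache.jena.update.UpdateAction;

public class PerDatasetJob {

    // Runs one load -> transform -> discard cycle on a fresh
    // in-memory dataset and returns the triple count found.
    static int runOnce() {
        Dataset ds = DatasetFactory.createTxnMem();

        // 1. Load data (the INSERT step of the workflow).
        ds.begin(ReadWrite.WRITE);
        UpdateAction.parseExecute(
            "INSERT DATA { <urn:ex:s> <urn:ex:p> \"value\" }", ds);
        ds.commit();
        ds.end();

        // 2. Query/transform (the SELECT step of the workflow).
        int n;
        ds.begin(ReadWrite.READ);
        QueryExecution qe = QueryExecutionFactory.create(
            "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }", ds);
        try {
            n = qe.execSelect().next().getLiteral("n").getInt();
        } finally {
            qe.close();
            ds.end();
        }

        // 3. Drop the dataset; no DELETE needed, the whole
        // structure is reclaimed by the garbage collector.
        ds.close();
        return n;
    }

    public static void main(String[] args) {
        System.out.println(runOnce());   // prints 1
    }
}
```

The key design point is that nothing is shared between jobs, so there is no long-lived server-side structure (B+Tree, node table, journal) to accumulate memory.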

As a general point, creating a fresh database rather than reusing an existing 
one will use memory much more efficiently. However, if you’re running on 
Windows, there is a known OS-specific JVM bug that can cause memory-mapped 
files not to be properly deleted until after the process exits.

Rob



Re: Fuseki - how to release memory

2017-01-06 Thread Janda, Radim
Hello Lorenz,
yes, I meant deleting the data from Fuseki using a SPARQL DELETE command.
We have version 2.4 installed.
We use two types of queries:
1. Insert new triples based on the existing RDF model (SPARQL INSERT)
2. Find results in the data (SPARQL SELECT)

Thanks

Radim



Re: Fuseki - how to release memory

2017-01-06 Thread Lorenz B.
Hello Radim,

just to avoid confusion, with "Delete whole Fuseki" you mean the data
loaded into Fuseki, right?

Which Fuseki version do you use?

What kind of transformation do you do? I'm asking because I'm wondering
if it's necessary to use Fuseki.



Cheers,
Lorenz

> Hello,
> We use Jena Fuseki to process a lot of small data sets.
>
> It works in the following way:
> 1. Delete whole Fuseki (using DELETE command)
> 2. Load data to Fuseki (using INSERT)
> > 3. Transform data and create output (SPARQL called from Python)
> > 4. Repeat steps 1-3: delete, load, and transform the next data set
>
> > We have found out that memory is not released after the delete in Fuseki.
> > That means we run out of memory after a number of data sets have been
> > transformed.
> > Currently we restart the Fuseki server after a certain number of data
> > sets, but we are looking for a better solution.
>
> Can you please help us with memory releasing?
>
> Many thanks
>
> Radim
>
-- 
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center