Re: [Fuseki] Configuring Fuseki2 to impose a maximum limit on the number of rows returned.

2017-10-27 Thread Dave Reynolds

On 26/10/17 12:27, Phil Gooch wrote:

Hi there

I am running Fuseki2 within Tomcat and I'm looking for a way to configure
Fuseki to limit the number of rows returned by a query. For example, to
prevent a rogue query such as

SELECT * WHERE {?s ?v ?o}

from being executed to completion.

I've imposed a maximum timeout via

ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "6" ] ;

in config.ttl and also in the individual .ttl files, but this does
not seem to prevent the above query from locking up the server.


Timeouts do generally work. There used to be problems with sort queries 
but those have been resolved and that's not a sort query.


Might be worth trying the two value version (time to first result and 
time for whole query):


ja:context [ja:cxtName "arq:queryTimeout";  ja:cxtValue "3,6" ];



I've looked through the documentation at

https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html
https://jena.apache.org/documentation/serving_data/#fuseki-configuration-file
https://github.com/apache/jena/tree/master/jena-fuseki2/examples

but I've not found the right config option.

Is this possible, or will I need to modify the source code to add a LIMIT n
if this is not specified in the original query?


There's no built-in machinery to limit the number of rows so far as I 
know. So if timeouts really don't work for you then indeed you would 
need to inject a LIMIT clause into the queries yourself.


Timeouts are generally better because some queries are really hard but 
return few results, whereas queries like the one above stream perfectly 
well and should impose low load; they just go on for a long time.


In our case the endpoints we expose are typically APIs where we can 
inject API-specific hard/soft row limits as part of the query generation 
phase. For full SPARQL endpoints we rely on timeouts.
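
As a rough illustration of injecting row limits during query generation, here is a minimal Python sketch. It is not part of Fuseki or Jena: the function name and the reading of "soft" (default when the query has no LIMIT) and "hard" (cap on any requested LIMIT) are assumptions made for this example. It works on the query text only, so it handles just the simple case where LIMIT, if present, is the last clause (optionally followed by OFFSET); a real deployment would more likely parse the query properly.

```python
import re

# Matches a trailing "LIMIT n", optionally followed by "OFFSET m",
# at the very end of the query text (case-insensitive).
_TRAILING_LIMIT = re.compile(
    r"\bLIMIT\s+(\d+)(\s+OFFSET\s+\d+)?\s*$", re.IGNORECASE
)


def apply_row_limits(query: str, soft_limit: int, hard_limit: int) -> str:
    """Return the query with a LIMIT that never exceeds hard_limit.

    Hypothetical helper: text-based only, so sub-queries, trailing VALUES
    blocks and trailing comments are NOT handled.
    """
    text = query.rstrip()
    match = _TRAILING_LIMIT.search(text)
    if match is None:
        # No LIMIT given: append the default (soft) limit on its own line.
        return f"{text}\nLIMIT {min(soft_limit, hard_limit)}"
    # LIMIT given: keep it, but cap it at the hard limit.
    capped = min(int(match.group(1)), hard_limit)
    offset = match.group(2) or ""
    return f"{text[:match.start()]}LIMIT {capped}{offset}"


if __name__ == "__main__":
    print(apply_row_limits("SELECT * WHERE {?s ?v ?o}", 1000, 10000))
    print(apply_row_limits("SELECT * WHERE {?s ?v ?o} LIMIT 50000", 1000, 10000))
```

A wrapper like this would sit in front of the endpoint, which is why it suits generated API queries better than a public SPARQL endpoint, where timeouts remain the main safeguard.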


Dave



Re: Removing persistent dataset in Fuseki 3.4.0 doesn't work

2017-10-27 Thread Andy Seaborne

David,

You sent the same message from the address dmolinaestr...@costaisa.com 
yesterday.  It got through to the list at 13:23 UTC.


A way to check is to look in the Apache archives:

https://lists.apache.org/list.html?users@jena.apache.org

==>

https://lists.apache.org/thread.html/6d5b703112d718582787becd4782692ae99e5c669bb97dbe71d2e003@%3Cusers.jena.apache.org%3E

Andy

On 27/10/17 07:49, David Molina wrote:

Hi,

When a persistent dataset is deleted from Fuseki WEB GUI, the dataset
folder in fuseki's databases folder and the dataset config ttl file in
fuseki's configuration folder remain.

I am running a Fuseki.war 3.4.0 in a Tomcat. Then, when i restart the
Tomcat server, the deleted dataset reappears.

Thank you,
David Molina



Re: Slow query when getting rdf:type

2017-10-27 Thread Mikael Pesonen


Hi,

Thanks! I'll try that when I get a chance to stop Jena. Yes, we are using TDB.



On 26.10.2017 16:15, Rob Vesse wrote:

Is TDB the underlying database?

If so is there a stats.opt  file in your database directory?

I remember there being issues in the past with the statistics for rdf:type 
triples being wrongly prioritised. If that file exists, you might want to look 
at it and try adjusting the values associated with rdf:type, based on the 
guidance in the documentation:

http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file

Also, if the database is being updated, the statistics can get out of date 
relative to it. You can use the command-line tdbstats tool to regenerate 
them:

http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file

Note that you will need to stop Fuseki in order to run this, as only a single 
process is permitted to access a TDB database at a time.

Rob

On 26/10/2017 13:47, "Mikael Pesonen"  wrote:

 
 Hi, I have trouble understanding why the first query is slow and the second
 one is fast. Using Jena Fuseki 3.4.0.

 So I want to get all resources that reference , and their types:

 SELECT * WHERE
 {
   GRAPH ?g
   {
     ?s ?p  .
     ?s a ?type
   }
 }

 SELECT * WHERE
 {
   GRAPH ?g
   {
     ?s ?p  .
     ?s ?p2 ?o2
   }
 }

 The first one takes 5 seconds, which is too slow for our application. Can it
 be rearranged somehow to make it fast? Sorry if this is not the correct forum
 for this.
 
 Thanks!
 
 --
 
 







--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's 
Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.peso...@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND



Re: Removing persistent dataset in Fuseki 3.4.0 doesn't work

2017-10-27 Thread Andy Seaborne

Confirm - that's what happens.

Could you raise a JIRA please?

The code in ActionDatasets.execDeleteItem appears to have code to delete 
the database and configuration ... it just doesn't seem to do it.



The code needs to be sensitive to whether the database is inside the 
runtime area or elsewhere.


If the DB is in the server configuration file, then it can't remove the 
configuration.
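
Until this is fixed, the leftover files can be removed by hand while the server is stopped. The following Python sketch shows one way to script that. It is a workaround sketch, not Fuseki code: the function name is hypothetical, and the layout (a `databases/<name>` folder and a `configuration/<name>.ttl` file under the Fuseki base directory) is assumed from the report above. It does not touch datasets declared in the main server configuration file.

```python
import shutil
from pathlib import Path
from typing import List


def remove_leftover_dataset(fuseki_base: Path, name: str) -> List[Path]:
    """Delete the database folder and config file left behind for a dataset.

    Hypothetical helper. Assumes <base>/databases/<name> and
    <base>/configuration/<name>.ttl; run it only while Fuseki is stopped.
    Returns the paths that were actually removed.
    """
    # Refuse names that would escape the two expected folders.
    if not name or Path(name).name != name:
        raise ValueError(f"not a plain dataset name: {name!r}")

    removed: List[Path] = []
    db_dir = fuseki_base / "databases" / name
    config_file = fuseki_base / "configuration" / f"{name}.ttl"

    if db_dir.is_dir():
        shutil.rmtree(db_dir)
        removed.append(db_dir)
    if config_file.is_file():
        config_file.unlink()
        removed.append(config_file)
    return removed
```

Calling it with the base directory and the dataset name returns the list of removed paths, so an empty list means there was nothing left to clean up.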


Andy

On 26/10/17 14:23, DAVID MOLINA ESTRADA wrote:

Hi,

When a persistent dataset is deleted from Fuseki WEB GUI, the dataset folder in 
fuseki's databases folder and the dataset config ttl file in fuseki's 
configuration folder remain.

I am running a Fuseki.war 3.4.0 in a Tomcat. Then, when i restart the Tomcat 
server, the deleted dataset reappears.

Thank you,
David Molina
Evite imprimir este mensaje si no es estrictamente necesario | Eviti imprimir 
aquest missatge si no és estrictament necessari | Avoid printing this message 
if it is not absolutely necessary



Re: Slow query when getting rdf:type

2017-10-27 Thread Andy Seaborne
In this case, stats won't help.  The  should be the 
starting point.


(quadpattern
  (quad ?g ?s ?p )
  (quad ?g ?s  ?type)
)

(quadpattern
  (quad ?g ?s ?p )
  (quad ?g ?s ?p2 ?o2)
)))

Are you using inference as well?

Is it the same ?

Is the timing for the rdf:type variant on a cold system?

Andy



On 27/10/17 10:22, Mikael Pesonen wrote:


Hi,

Thanks! I'll try that when I get a chance to stop Jena. Yes, we are using TDB.



Re: Slow query when getting rdf:type

2017-10-27 Thread Mikael Pesonen


Hi,

Yes, I was using the same resource for testing. Jena has been running for 
weeks, so it is not a cold system, if I understood correctly. Sorry, what 
does inference mean?


Br,
Mikael


On 27.10.2017 13:02, Andy Seaborne wrote:
In this case, stats won't help.  The  should be the 
starting point.


(quadpattern
  (quad ?g ?s ?p )
  (quad ?g ?s  ?type)
)

(quadpattern
  (quad ?g ?s ?p )
  (quad ?g ?s ?p2 ?o2)
)))

Are you using inference as well?

Is it the same ?

Is the timing for the rdf:type variant on a cold system?

    Andy



On 27/10/17 10:22, Mikael Pesonen wrote:


Hi,

Thanks! I'll try that when I get a chance to stop Jena. Yes, we are using 
TDB.







How to derive Change Statements

2017-10-27 Thread anuj kumar
Hi Jena Users,
 I have a query regarding the most effective way to capture changes in the
underlying Triple Store.
I have a requirement where:
1. Every time a property of a Node (represented as a Triple Statement)
changes, I also need to generate certain change statements to capture what
has changed, who changed it, when it was changed etc.
2. If I delete a Node (represented as a Set of Triples in the RDF Store), I
need to capture the action DELETE on this node, who deleted the node, when
it was deleted etc.

Basically, I need to have an audit trail developed so that I can recreate the
graph as it was at a given moment in time.

The question is:
1. What is the best way to implement such functionality? Does Jena support
such a thing either natively or through some standard mechanism?

Thanks,
-- 
*Anuj Kumar*


Re: Slow query when getting rdf:type

2017-10-27 Thread Mikael Pesonen


I also tried this with other properties such as dcterms:created, and the 
query didn't slow down with them.


-Mikael


On 27.10.2017 13:02, Andy Seaborne wrote:
In this case, stats won't help.  The  should be the 
starting point.


(quadpattern
  (quad ?g ?s ?p )
  (quad ?g ?s  ?type)
)

(quadpattern
  (quad ?g ?s ?p )
  (quad ?g ?s ?p2 ?o2)
)))

Are you using inference as well?

Is it the same ?

Is the timing for the rdf:type variant on a cold system?

    Andy



On 27/10/17 10:22, Mikael Pesonen wrote:


Hi,

Thanks! I'll try that when I get a chance to stop Jena. Yes, we are using 
TDB.







Re: How to derive Change Statements

2017-10-27 Thread Claude Warren
Since you need to detect who changed what, the only way I can see to do this
is to turn on authentication in Fuseki and track the changes made through it.

You could bastardise the permissions layer[1] to do what you want.  The
permissions layer lets you filter down to the actions on individual triples.
Rather than implementing a SecurityEvaluator to enforce restrictions, you
could implement one that records all changes (including who made them) in any
storage and format you wish.

1. https://jena.apache.org/documentation/permissions/index.html


On Fri, Oct 27, 2017 at 11:42 AM, anuj kumar 
wrote:

> Hi Jena Users,
>  I have a query regarding the most effective way to capture changes in the
> underlying Triple Store.
> I have a requirement where:
> 1. Every time a property of a Node (represented as a Triple Statement)
> changes, I also need to generate certain change statements to capture what
> has changed, who changed it, when it was changed etc.
> 2. If I delete a Node (represented as a Set of Triples in the RDF Store), I
> need to capture the action DELETE on this node, who deleted the node, when
> it was deleted etc.
>
> Basically, I need to have an audit trail developed so that I can recreate the
> graph as it was at a given moment in time.
>
> The question is:
> 1. What is the best way to implement such functionality? Does Jena support
> such a thing either natively or through some standard mechanism?
>
> Thanks,
> --
> *Anuj Kumar*
>



-- 
I like: Like Like - The likeliest place on the web

LinkedIn: http://www.linkedin.com/in/claudewarren


Re: How to derive Change Statements

2017-10-27 Thread Andy Seaborne

Hi Anuj,

Jena has some building blocks: GraphListener, DatasetChanges

GraphListener enables an app to watch every change on a graph.
DatasetGraphMonitor/DatasetChanges can be used if you want to look at a 
dataset.


Both give you triggers on individual triple/quad changes on which you 
can add application code.
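
To make the trigger idea concrete without depending on Jena's API, here is a small Python sketch of the pattern: every add or delete of a triple appends a change record (what, who, when), and replaying the log rebuilds the graph as it was at a given moment. All names here (`AuditedGraph`, `ChangeRecord`, `as_of`) are hypothetical and invented for this example. In Jena the same hook would be application code attached to the per-triple callbacks mentioned above, and the user identity would have to come from the application, since the callbacks themselves do not carry it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


@dataclass(frozen=True)
class ChangeRecord:
    action: str          # "ADD" or "DELETE"
    triple: Triple
    user: str            # who made the change
    timestamp: datetime  # when the change was made


class AuditedGraph:
    """In-memory triple set that logs every effective change."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()
        self.log: List[ChangeRecord] = []

    def add(self, triple: Triple, user: str, when: datetime) -> None:
        # Only record changes that actually alter the graph.
        if triple not in self._triples:
            self._triples.add(triple)
            self.log.append(ChangeRecord("ADD", triple, user, when))

    def delete(self, triple: Triple, user: str, when: datetime) -> None:
        if triple in self._triples:
            self._triples.remove(triple)
            self.log.append(ChangeRecord("DELETE", triple, user, when))

    def as_of(self, moment: datetime) -> FrozenSet[Triple]:
        """Rebuild the graph state at `moment` by replaying the log.

        Assumes records were appended with non-decreasing timestamps.
        """
        state: Set[Triple] = set()
        for record in self.log:
            if record.timestamp > moment:
                break
            if record.action == "ADD":
                state.add(record.triple)
            else:
                state.discard(record.triple)
        return frozenset(state)
```

Deleting a node then shows up as one DELETE record per triple; grouping those records into a single logical change is the part that per-triple callbacks do not give you directly.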


GraphListener includes some identification of groups of changes but only 
when the changes are made as a single operation which might be hard in 
the app (e.g. deletes and inserts). Even then, the bulk operations can 
get lost in hierarchies of graphs wrapping other graphs. 
DatasetGraphMonitor does not provide grouped operations anyway.


See also org.apache.jena.graph.GraphUtil



I have the need to work with logical collections of changes.

Caution:
  work-in-progress alert
  not part of Jena

https://afs.github.io/rdf-delta/rdf-patch.html

Here, transactions are captured so there are logical groups of changes 
such as your #2 requirement.


Annotating, or linking to RDF Patches gives an audit.

In my $job, this is used to replicate changes between databases (High 
Availability).


Andy

On 27/10/17 11:42, anuj kumar wrote:

Hi Jena Users,
  I have a query regarding the most effective way to capture changes in the
underlying Triple Store.
I have a requirement where:
1. Every time a property of a Node (represented as a Triple Statement)
changes, I also need to generate certain change statements to capture what
has changed, who changed it, when it was changed etc.
2. If I delete a Node (represented as a Set of Triples in the RDF Store), I
need to capture the action DELETE on this node, who deleted the node, when
it was deleted etc.

Basically, I need to have an audit trail developed so that I can recreate the
graph as it was at a given moment in time.

The question is:
1. What is the best way to implement such functionality? Does Jena support
such a thing either natively or through some standard mechanism?

Thanks,



Re: Removing persistent dataset in Fuseki 3.4.0 doesn't work

2017-10-27 Thread Andy Seaborne

David raised JENA-1410.


On 27/10/17 10:58, Andy Seaborne wrote:

Confirm - that's what happens.

Could you raise a JIRA please?

The code in ActionDatasets.execDeleteItem appears to have code to delete 
the database and configuration ... it just doesn't seem to do it.



The code needs to be sensitive to whether the database is inside the 
runtime area or elsewhere.


If the DB is in the server configuration file, then it can't remove the 
configuration.


 Andy





Re: [Fuseki] Configuring Fuseki2 to impose a maximum limit on the number of rows returned.

2017-10-27 Thread Andy Seaborne

Phil -

Which version are you running?

Can you show the configuration file?

 Andy

On 27/10/17 08:30, Dave Reynolds wrote:

On 26/10/17 12:27, Phil Gooch wrote:

Hi there

I am running Fuseki2 within Tomcat and I'm looking for a way to configure
Fuseki to limit the number of rows returned by a query. For example, to
prevent a rogue query such as

SELECT * WHERE {?s ?v ?o}

from being executed to completion.

I've imposed a maximum timeout via

ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "6" ] ;

in config.ttl and also in the individual .ttl files, but this does
not seem to prevent the above query from locking up the server.


Timeouts do generally work. There used to be problems with sort queries 
but those have been resolved and that's not a sort query.


Might be worth trying the two value version (time to first result and 
time for whole query):


ja:context [ja:cxtName "arq:queryTimeout";  ja:cxtValue "3,6" ];



I've looked through the documentation at

https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html
https://jena.apache.org/documentation/serving_data/#fuseki-configuration-file 


https://github.com/apache/jena/tree/master/jena-fuseki2/examples

but I've not found the right config option.

Is this possible, or will I need to modify the source code to add a LIMIT n
if this is not specified in the original query?


There's no built-in machinery to limit the number of rows so far as I 
know. So if timeouts really don't work for you then indeed you would 
need to inject a LIMIT clause into the queries yourself.


Timeouts are generally better because some queries are really really 
hard but return few results whereas queries like the above stream 
perfectly well and should impose low load, they just go on for a long time.


In our case the endpoints we expose are typically APIs where we can 
inject API-specific hard/soft row limits as part of the query generation 
phase. For full sparql endpoints then we rely on timeouts.


Dave



Re: [Fuseki] Configuring Fuseki2 to impose a maximum limit on the number of rows returned.

2017-10-27 Thread Phil Gooch
@Dave - thanks for the info about the two value timeout, I'll try that.

@Andy - according to the META-INF in the fuseki.war file, I'm running 2.6.0:

#Generated by Maven
#Tue May 02 13:43:43 EDT 2017
version=2.6.0
groupId=org.apache.jena
artifactId=jena-fuseki-war

The config file for demo.ttl in the configuration directory looks like this

@prefix :   .
@prefix tdb:    .
@prefix rdf:    .
@prefix ja: .
@prefix rdfs:   .
@prefix fuseki:  .

:service_tdb_all  a   fuseki:Service ;
    rdfs:label                        "TDB demo" ;
    fuseki:dataset                    :tdb_dataset_readwrite ;
    fuseki:name                       "demo" ;
    fuseki:serviceQuery               "query" , "sparql" ;
    fuseki:serviceReadGraphStore      "get", "post" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceUpdate              "update" ;
    fuseki:serviceUpload              "upload" .

:tdb_dataset_readwrite
    a             tdb:DatasetTDB ;
    ja:context    [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "3,6" ] ;
    tdb:location  "/etc/fuseki/databases/demo" .


Cheers

Phil



On Fri, Oct 27, 2017 at 3:27 PM, Andy Seaborne  wrote:

> Phil -
>
> Which version are you running?
>
> Can you show the configuration file?
>
>  Andy
>