Re: [Owlim-discussion] Questions about strange triple insertion rate changes

2011-07-26 Thread Damyan Ognyanov

Hi Jerven,

That is a lot of data to look at. I have not examined it very closely at 
this point, but I would like to offer a possible explanation for the 
behavior you are observing.


First, the statement storage for each of the main indices (po, pso) is 
organized as a B+ tree in which each component of a statement (subject, 
predicate, etc.) is a 40-bit integer representing the ID of the 
RDF node.


Those IDs are assigned when a new node first appears in the data as a 
component of some RDF statement. The new nodes introduced by your import 
process therefore get greater integer values, and because they are most 
probably either subjects or objects of the new statements, those 
statements are always put at the end of a particular subsection 
(depending on the ordering used) of the B+ trees.


Because of the caching, the pages that are about to be altered are more 
likely to be found in the cache (the new data is at the end of some 
sub-sequence because of the natural ordering by ID we are using), so the 
whole process runs smoothly.


When the data you are loading starts to reuse already existing nodes, 
the statements you are storing are no longer at the end of such a 
subsection of the tree. They tend to be randomly distributed across the 
whole tree, and the cache becomes inefficient, mainly because the page 
where a new statement is to be placed is most probably not in the cache 
and has to be read from disk. Yes, it is cached at that point, but you 
will probably not need it again after that single operation, so the 
seek/read cost becomes an issue and the data import slows down greatly.
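
The effect described above can be illustrated with a toy model. The sketch 
below is not OWLIM code: it assumes a fixed number of keys per page and a 
plain LRU page cache (the class and method names are made up for this 
example), and it only counts how often a page would have to be fetched. 
Ascending keys touch each page once, while keys spread randomly over the 
same range miss the cache on almost every access.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Toy model of an LRU cache of index pages; an illustration, not OWLIM internals. */
final class PageCacheSketch {

    /** Counts cache misses when the given keys are inserted in order. */
    static int countMisses(long[] keys, int keysPerPage, final int cachePages) {
        // Access-ordered LinkedHashMap that evicts the least recently used page.
        Map<Long, Boolean> cache = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > cachePages;
            }
        };
        int misses = 0;
        for (long key : keys) {
            Long page = key / keysPerPage;   // assumed: consecutive keys share a page
            if (cache.get(page) == null) {   // get() also refreshes the page on a hit
                misses++;                    // the page would have to be read from disk
                cache.put(page, Boolean.TRUE);
            }
        }
        return misses;
    }

    /** Keys 0..n-1, as produced when every statement introduces new, larger IDs. */
    static long[] ascendingKeys(int n) {
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i;
        }
        return keys;
    }

    /** Keys drawn uniformly from [0, keySpace), as when existing IDs are reused. */
    static long[] randomKeys(int n, int keySpace, long seed) {
        Random rnd = new Random(seed);
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = rnd.nextInt(keySpace);
        }
        return keys;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("ascending misses: " + countMisses(ascendingKeys(n), 100, 10));
        System.out.println("random misses:    " + countMisses(randomKeys(n, n, 42L), 100, 10));
    }
}
```

The model ignores page splits and the real tree layout; it is only meant 
to show why locality of the inserted keys matters for the cache.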


One way to reduce that cost is to change the order of the statements 
you are importing (this needs some pre-processing to rearrange them so 
that the caching starts working efficiently).


For instance, you may add the statements in batches organized by a 
particular predicate - that way most of the tree subsection that holds 
all the statements with that predicate will be cached, so even if the 
statements are a bit random they will always fall within that section.
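
A minimal sketch of that pre-processing step, assuming the statements fit 
in memory. The Triple class is a stand-in invented for this example, not 
an OWLIM or Sesame type, and the actual import call is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Reorders statements so that each predicate is imported as one batch. */
final class PredicateBatchSketch {

    /** Hypothetical stand-in for an RDF statement. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Returns a copy ordered by predicate; List.sort is stable, so the
     *  original order is kept within each predicate. */
    static List<Triple> orderByPredicate(List<Triple> input) {
        List<Triple> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Triple t) -> t.predicate));
        return sorted;
    }

    /** Splits an ordered list into one batch per run of equal predicates. */
    static List<List<Triple>> batches(List<Triple> ordered) {
        List<List<Triple>> result = new ArrayList<>();
        List<Triple> current = null;
        String currentPredicate = null;
        for (Triple t : ordered) {
            if (current == null || !t.predicate.equals(currentPredicate)) {
                current = new ArrayList<>();
                result.add(current);
                currentPredicate = t.predicate;
            }
            current.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> input = Arrays.asList(
                new Triple("ex:s1", "ex:name", "ex:o1"),
                new Triple("ex:s2", "ex:type", "ex:o2"),
                new Triple("ex:s3", "ex:name", "ex:o3"));
        for (List<Triple> batch : batches(orderByPredicate(input))) {
            // Each batch would be added and committed before the next one starts.
            System.out.println(batch.get(0).predicate + ": " + batch.size() + " statements");
        }
    }
}
```

For a dataset of this size an in-memory sort is unlikely to be practical; 
an external sort of the serialized statements by predicate would be the 
equivalent step.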


In any case, I will look closely at the data you sent to see whether 
anything else comes to mind regarding that sudden drop in the throughput 
of the storage.


Many thanks for the detailed info you sent,

Regards,
Damyan Ognyanov
Ontotext AD.

On 26.7.2011 г. 09:49 ч., Jerven Bolleman wrote:

Dear Owlim developers,

I am trying to load all the UniProt data on a machine with 64 GB of RAM. 
I have a case where I am very pleased with the loading speed for a 
billion triples, but then it just flatlines. I have included a set of 
graphs that show the relevant behavior and statistics on this machine. 
Maybe you could have a look at them. You might think that this is due to 
performance dropping off after loading a billion triples, but I have the 
same problem the other way round (see attachment For discussion.png), 
where performance first flatlines, taking 32 hours to insert 300 
million triples, before recovering and loading a billion triples in 5 
hours. This is with an empty ruleset.


What could cause this behaviour?

Regards,
Jerven



Some data from owlim.properties as written after a sync (insertion of 
http://www.ontotext.com/owlim/system#flush)


Uniform owlim image
NumberOfStatements=1039890206
NumberOfExplicitStatements=1039890206
NumberOfEntities=403958694
VersionId=40
BNodeCounter=238


And all the relevant statistics.

-------- Original Message --------
Subject: JMX values for ptx-serv01.vital-it.ch at 2011-07-26
Date: Tue, 26 Jul 2011 06:38:24 + (GMT)
From: nore...@uniprot.org
To: uuw_...@isb-sib.ch



Started on 2011-07-25 04:02
JVM Java HotSpot(TM) 64-Bit Server VM Sun Microsystems Inc.(20.1-b02)
Runtime name 6...@ptx-serv01.vital-it.ch
JVM Arguments
-Dwar=/data/sparql_uuw/expasy4j-sparql/dist/expasy4j-sparql.war
-Djava.util.logging.config.file=tomcat/conf/logging.properties
-Duser.timezone=GMT
-Xms45G
-Xmx55G
-XX:+HeapDumpOnOutOfMemoryError
-Djava.io.tmpdir=/data/tmp
-Djava.awt.headless=true
-Dexpasy_sparql_entityIndexsize=1147483647
-Dexpasy_sparql_cacheMemory=7G
-Dexpasy_sparql_tupleIndexMemory=5G
-Dshutdown.port=8081
-Dhttp.port=8080
-Dsecure.port=8082
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=6969
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dexpasy_sparql_commitSize=100
-Duniprot.singlethreaded=true
-Dexpasy_sparql_path=/data
-Dexpasy_sparql_ruleset=empty
-Dexpasy_sparql_journaling=false
-Dexpasy_sparql_repositoryFragments=1
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Dcatalina.base=tomcat/
-Dcatalina.home=tomcat/
Running

Re: [Owlim-discussion] Constraint rule

2011-07-05 Thread Damyan Ognyanov

Hi Christos,

You cannot do that with the rule engine of OWLIM - rules describe 
patterns in the RDF graph that, if found, lead to the assertion of a 
particular statement. They are not a way to restrict what kind of 
statements one may assert in the storage.


What could be done is to state that if such statements are asserted with 
both X and Z as subjects of the same object, then X and Z should be 
considered one and the same, e.g. X owl:sameAs Z ... if that makes sense 
to you.
So you may add such a rule to the ruleset, or even just state that 
property:p from your rule is of type owl:InverseFunctionalProperty. 
That would trigger the same kind of inference.
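
To make the effect of owl:InverseFunctionalProperty concrete, here is a 
small sketch of the inference it stands for: whenever two different 
subjects share the same object for such a property, they are inferred to 
be the same. This is plain Java written for illustration, not the OWLIM 
rule engine; the class name and the pair representation are assumptions 
of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrates the inference behind an inverse functional property. */
final class InverseFunctionalSketch {

    /**
     * Input: {subject, object} pairs of a single inverse functional property.
     * Output: {x, z} pairs for which "x owl:sameAs z" would be inferred.
     */
    static List<String[]> sameAsPairs(List<String[]> subjectObjectPairs) {
        // Group the subjects by the object they point to.
        Map<String, List<String>> subjectsByObject = new LinkedHashMap<>();
        for (String[] so : subjectObjectPairs) {
            subjectsByObject.computeIfAbsent(so[1], k -> new ArrayList<>()).add(so[0]);
        }
        List<String[]> result = new ArrayList<>();
        for (List<String> subjects : subjectsByObject.values()) {
            for (int i = 0; i < subjects.size(); i++) {
                for (int j = i + 1; j < subjects.size(); j++) {
                    if (!subjects.get(i).equals(subjects.get(j))) {
                        result.add(new String[] {subjects.get(i), subjects.get(j)});
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> statements = Arrays.asList(
                new String[] {"ex:x", "ex:y"},
                new String[] {"ex:z", "ex:y"},
                new String[] {"ex:w", "ex:v"});
        for (String[] pair : sameAsPairs(statements)) {
            System.out.println(pair[0] + " owl:sameAs " + pair[1]);
        }
    }
}
```

Note that this does not reject the second statement, which is what the 
original question asked for; it only records that both subjects denote 
the same resource.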


HTH,
Damyan Ognyanov
Ontotext



On 5.7.2011 г. 14:47 ч., Christos Strubulis wrote:

Hello to all,
I am trying to make a constraint rule in SwiftOwlim 3.5 but I cannot. 
I would like to tell the reasoner that when there is an already 
inferred statement in the KB: x property:p y  I do not want another 
one with same y.


One example of my rule is the following:

Id: r1
x property:p y [Constraint x != z]
z property:p y [Constraint z != x]
---
x property:p  y

I had read in the thread rule format  Cut, again that I can have 
the above behavior using [Cut]:


Id: r1
x property:p y [Constraint x != z] [Cut]
z property:p y [Constraint z != x]
---
x property:p  y

Any help on this plz...



___
OWLIM-discussion mailing list
OWLIM-discussion@ontotext.com
http://ontotext.com/mailman/listinfo/owlim-discussion



Re: [Owlim-discussion] Adding rules to an existing repository.

2011-03-25 Thread Damyan Ognyanov

Hi Danny,

in short - you can use only a single ruleset file - so you need to find 
a way to combine both into a single .pie - the easiest way is to start 
with one of those we provide with the distribution and alter it accordingly.


HTH,
Damyan Ognyanov
Ontotext AD

On 25.3.2011 г. 09:25 ч., Danny Tran wrote:

Thanks Ivan!

I've read the user guide section 5 and I have a question that isn't
answered there:

Is it possible to have multiple ruleset files for a repo?  When I
modified the swiftowlim.ttl file to assert multiple owlim:ruleset, I
got an error.  Does this mean I have to combine my custom_rules.pie
with the builtin_owl2-rl-conf.pie into one ruleset file?

Thanks again for your help,
Danny

On Thu, Mar 24, 2011 at 4:45 AM, Ivan Peikov <ivan.pei...@ontotext.com> wrote:

Hi Danny,

The SwiftOWLIM user guide has it all described under section 5 (syntax,
semantics, examples, etc). After you've gone through it we'll answer any
questions about particular rules you might want to implement. Just post them
here on the mailing list.

Good luck!


Cheers,
Ivan

On Tuesday 22 March 2011 18:51:04 Danny Tran wrote:

Can someone point me in the right direction for
documentation/discussion forums about adding rules (via .pie file?)
to an existing repository?

I'm using SwiftOwlim 3.4

Thanks!
Danny


Re: [Owlim-discussion] OWLIM-discussion Digest, Vol 26, Issue 19

2011-03-21 Thread Damyan Ognyanov

Hi Roberto,

The exception does not mean that the query was not processed - it was 
thrown during query pre-processing, when we convert the Jena/ARQ query 
model to our own for further evaluation. Our internal model does not 
support all the features of the Jena/ARQ one, so we convert only those 
parts of the query that we are able to process - everything else is 
handled by the ARQ engine.


The stack trace you see is a leftover debug print that slipped into our 
official release of BO 3.4. It is generated when we encounter a query 
construct we do not know how to process with our model and let ARQ 
proceed with it.


I've tried that exact query from your post and it gives me some 
meaningful results even if only the axioms of the owl-horst-optimized 
ruleset are present, e.g. classes like rdf:List, rdf:Property, 
rdfs:Class, etc.


HTH,
Damyan Ognyanov
Ontotext AD

On 21.3.2011 г. 11:43 ч., Roberto García wrote:

Roberto

can you be a bit more specific, please?

for the sake of background, I am not aware of a perfect SPARQL engine

Do you mean the current actual recommendation SPARQL 1.0 or the newer version 
1.1?

Cheers
Naso

OK, I'm reattaching the problem details below:

We are trying to develop a OWLIM connector for our Linked Data
publishing platform Rhizomer (http://rhizomik.net/rhizomer/).

First of all, we init the OWLIM repository as shown in the documentation:

public void init(ServletConfig config) throws Exception
{
    if (config.getInitParameter("dir_name") != null)
    {
        String basePath =
            config.getServletContext().getRealPath(config.getInitParameter("dir_name"));
        OwlimSchemaRepository schema = new OwlimSchemaRepository();
        // set the data folder where BigOWLIM will persist its data
        schema.setDataDir(new File(basePath + "/owlim"));
        // configure BigOWLIM with some parameters
        schema.setParameter("storage-folder", "./");
        schema.setParameter("repository-type", "file-repository");
        schema.setParameter("ruleset", "rdfs");
        // wrap it into a Sesame SailRepository
        SailRepository repository = new SailRepository(schema);
        // initialize
        repository.initialize();
        RepositoryConnection connection = repository.getConnection();
        // finally, create the DatasetGraph instance
        dataset = new SesameDataset(connection);
        model = ModelFactory.createModelForGraph(dataset.getDefaultGraph());
    }
}

It works fine:

INFO [main] (JCLLoggerAdapter.java:265) - ConnectorServlet
successfully initialized!
  INFO [main] (JCLLoggerAdapter.java:265) - OwlimSchemaRepository:
version: 3.4, revision: 3012
  INFO [main] (JCLLoggerAdapter.java:265) - Build date: Fri Nov 26
16:11:15 CET 2010
  INFO [main] (JCLLoggerAdapter.java:265) - Configured parameter
'ruleset' to 'rdfs'
  INFO [main] (JCLLoggerAdapter.java:265) - Cache pages for tuples: 4193
  INFO [main] (JCLLoggerAdapter.java:265) - Cache pages for predicates: 0
  INFO [main] (JCLLoggerAdapter.java:265) - Configured parameter
'storage-folder' to './'
  INFO [main] (JCLLoggerAdapter.java:265) - Detected unclean shutdown
  INFO [main] (JCLLoggerAdapter.java:265) - Starting automatic database
recovery...
  INFO [main] (JCLLoggerAdapter.java:265) - Restoring entities from
persistence...
  INFO [main] (JCLLoggerAdapter.java:265) - Done in 65 ms.
  INFO [main] (JCLLoggerAdapter.java:265) - Repository must be rebuilt.
  INFO [main] (JCLLoggerAdapter.java:265) - Restoring statements from
/Users/roberto/Documents/Proyectos/Rhizomer/src/main/webapp/metadata/owlim/./backup...
  INFO [main] (JCLLoggerAdapter.java:265) - ruleSet=rdfs,
partialRdfs=false, multithread=false
  INFO [main] (JCLLoggerAdapter.java:265) - NumberOfStatements = 851
  INFO [main] (JCLLoggerAdapter.java:265) - NumberOfExplicitStatements = 846
  INFO [main] (JCLLoggerAdapter.java:265) - NumberOfEntities = 136
  INFO [main] (JCLLoggerAdapter.java:265) - 0 statements overall.
ERROR [main] (JCLLoggerAdapter.java:457) - Done in 796 ms.
  INFO [main] (JCLLoggerAdapter.java:265) - Finished automatic database recovery
  INFO [main] (JCLLoggerAdapter.java:265) - Restoring entity hash table...
  INFO [main] (JCLLoggerAdapter.java:265) - Done in 73 ms.
  INFO [main] (JCLLoggerAdapter.java:265) - Using Hash Entity Pool
  INFO [main] (JCLLoggerAdapter.java:265) - Configured parameter
'repository-type' to 'file-repository'
  INFO [main] (JCLLoggerAdapter.java:265) - ruleSet=rdfs,
partialRdfs=false, multithread=false
  INFO [main] (JCLLoggerAdapter.java:265) - Searching for plugins
available in the classpath...
  INFO [main] (JCLLoggerAdapter.java:265) - Registering plugin fts
  INFO [main] (JCLLoggerAdapter.java:265) - Registering plugin direct
  INFO [main] (JCLLoggerAdapter.java:265) - Registering plugin rdfrank
  INFO [main] (JCLLoggerAdapter.java:265) - Registering plugin geospatial
  INFO [main] (JCLLoggerAdapter.java:265