Re: [Owlim-discussion] Questions about strange triple insertion rate changes
Hi Jerven,

That is a lot of data to look at. I have not examined it closely yet, but I would like to offer a possible explanation for the behaviour you are observing.

First, the statement storage for each of the main indices (pos, pso) is organized as a B+ tree in which each component of a statement (subject, predicate, etc.) is a 40-bit integer representing the ID of the RDF node. These IDs are assigned when a new node first appears in the data as a component of some RDF statement. So the new nodes introduced by your import process get greater integer values, and because they are most probably either subjects or objects of the new statements, those statements are always put at the end of a particular subsection (depending on the ordering used) of the B+ trees. The pages that are about to be altered are therefore likely to be found in the cache (the new data is at the end of some sub-sequence because of the natural ordering by ID we are using), so the whole process runs smoothly.

When the data you are loading starts to reuse already existing nodes, the statements you are storing are no longer at the end of such a subsection and tend to be randomly distributed across the whole tree. The cache becomes inefficient, mainly because the page where the new statement must be placed is most probably not in it and has to be read from disk. Yes, the page is cached at that point, but you probably will not need it again after that single operation, so the seek/read cost becomes an issue and the import slows down greatly.

One way to reduce that cost is to change the order of the statements you are importing (this needs some pre-processing to rearrange them so that the caching starts working efficiently).
For instance, you may add the statements in batches organized by a particular predicate - that way most of the tree subsection that holds all the statements with that predicate will be cached, so even if the statements are somewhat random they will always fall within that section.

In any case, I will look closely at the data you sent to see whether anything else comes to mind regarding that sudden drop in the throughput of the storage.

Many thanks for the detailed info you sent,

Regards,
Damyan Ognyanov
Ontotext AD.

On 26.7.2011 г. 09:49 ч., Jerven Bolleman wrote:

Dear Owlim developers,

I am trying to load all the UniProt data on a 64GB RAM machine. I have a case where I am very pleased with the loading speed of a billion triples, but then it just flatlines. I have included a set of graphs which show the relevant behavior and statistics on this machine. Maybe you could have a look at it.

You might think that this is due to performance dropping off after loading a billion triples, but I have the same problem the other way round (see attachment For discussion.png), where performance first flatlines, taking 32 hours to insert 300 million triples, before recovering and loading a billion triples in 5 hours. This is with an empty ruleset.

What could cause this behaviour?

Regards,
Jerven

Some data from owlim.properties as written after a sync (insertion of http://www.ontotext.com/owlim/system#flush):

Uniform owlim image
NumberOfStatements=1039890206
NumberOfExplicitStatements=1039890206
NumberOfEntities=403958694
VersionId=40
BNodeCounter=238

And all the relevant statistics.
Original Message
Subject: JMX values for ptx-serv01.vital-it.ch at 2011-07-26
Date: Tue, 26 Jul 2011 06:38:24 + (GMT)
From: nore...@uniprot.org
To: uuw_...@isb-sib.ch

Started on 2011-07-25 04:02
JVM Java HotSpot(TM) 64-Bit Server VM Sun Microsystems Inc.(20.1-b02)
Runtime name 6...@ptx-serv01.vital-it.ch
JVM Arguments
-Dwar=/data/sparql_uuw/expasy4j-sparql/dist/expasy4j-sparql.war
-Djava.util.logging.config.file=tomcat/conf/logging.properties
-Duser.timezone=GMT
-Xms45G
-Xmx55G
-XX:+HeapDumpOnOutOfMemoryError
-Djava.io.tmpdir=/data/tmp
-Djava.awt.headless=true
-Dexpasy_sparql_entityIndexsize=1147483647
-Dexpasy_sparql_cacheMemory=7G
-Dexpasy_sparql_tupleIndexMemory=5G
-Dshutdown.port=8081
-Dhttp.port=8080
-Dsecure.port=8082
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=6969
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dexpasy_sparql_commitSize=100
-Duniprot.singlethreaded=true
-Dexpasy_sparql_path=/data
-Dexpasy_sparql_ruleset=empty
-Dexpasy_sparql_journaling=false
-Dexpasy_sparql_repositoryFragments=1
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Dcatalina.base=tomcat/
-Dcatalina.home=tomcat/
Running
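
The pre-processing suggested in the reply in this thread (importing statements in batches organized by predicate) could look roughly like the sketch below. It is only an illustration of the reordering idea: the Triple and PredicateBatcher classes are hypothetical stand-ins and not part of OWLIM or Sesame, and the grouping is done in memory, which would not scale to a billion triples (an on-disk sort by predicate would be needed there).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PredicateBatcher {

    /** Minimal stand-in for an RDF statement (hypothetical, not an OWLIM or Sesame type). */
    public static final class Triple {
        public final String subject;
        public final String predicate;
        public final String object;

        public Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /**
     * Groups triples by predicate. Predicates keep their first-seen order,
     * and triples keep their input order within each group.
     */
    public static Map<String, List<Triple>> groupByPredicate(List<Triple> input) {
        Map<String, List<Triple>> groups = new LinkedHashMap<>();
        for (Triple t : input) {
            groups.computeIfAbsent(t.predicate, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    /** Flattens the groups into a single import order, one predicate batch after another. */
    public static List<Triple> importOrder(List<Triple> input) {
        List<Triple> ordered = new ArrayList<>(input.size());
        for (List<Triple> batch : groupByPredicate(input).values()) {
            ordered.addAll(batch);
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Triple> input = new ArrayList<>();
        input.add(new Triple("ex:a", "ex:name", "A"));
        input.add(new Triple("ex:a", "ex:seeAlso", "ex:b"));
        input.add(new Triple("ex:b", "ex:name", "B"));

        // Each batch would be handed to the store before moving on to the next predicate.
        for (Map.Entry<String, List<Triple>> batch : groupByPredicate(input).entrySet()) {
            System.out.println(batch.getKey() + ": " + batch.getValue().size() + " statement(s)");
        }
    }
}
```

Whether this helps in practice depends on how much of each predicate's subsection fits in the configured cache, so it is worth measuring on a sample before reordering the full dataset.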
Re: [Owlim-discussion] Constraint rule
Hi Christos,

You cannot do that with the rule engine of Owlim - rules describe patterns in the RDF graph that, if found, lead to the assertion of a particular statement; they are not a way to restrict what kind of statements one may assert in the storage.

What could be done is to state that if such statements are asserted with both X and Z as subjects, then X and Z should be considered one and the same, e.g. X owl:sameAs Z ... if that makes sense to you.

So you may add such a rule to the ruleset, or even just state that the property:p from your rule is of type owl:InverseFunctionalProperty. That would trigger the same kind of inference.

HTH,
Damyan Ognyanov
Ontotext

On 5.7.2011 г. 14:47 ч., Christos Strubulis wrote:

Hello to all,

I am trying to make a constraint rule in SwiftOwlim 3.5 but I cannot. I would like to tell the reasoner that when there is an already inferred statement in the KB:

x property:p y

I do not want another one with the same y. One example of my rule is the following:

Id: r1
x property:p y [Constraint x != z]
z property:p y [Constraint z != x]
---
x property:p y

I had read in the thread "rule format Cut, again" that I can have the above behavior using [Cut]:

Id: r1
x property:p y [Constraint x != z] [Cut]
z property:p y [Constraint z != x]
---
x property:p y

Any help on this, please...

___
OWLIM-discussion mailing list
OWLIM-discussion@ontotext.com
http://ontotext.com/mailman/listinfo/owlim-discussion
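
To make the suggested inference concrete, here is a small sketch of what treating property:p as an owl:InverseFunctionalProperty amounts to: whenever two different subjects share the same object for that property, they are inferred to be owl:sameAs each other. This is plain Java over string arrays and does not use the Owlim rule engine or its .pie syntax; the class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InverseFunctionalSketch {

    /**
     * Returns the pairs of subjects that would be inferred owl:sameAs if the
     * given property were inverse functional. Each triple is {subject, predicate, object}.
     */
    public static List<String[]> sameAsPairs(List<String[]> triples, String property) {
        // Collect the distinct subjects seen for each object of the property.
        Map<String, List<String>> subjectsByObject = new LinkedHashMap<>();
        for (String[] t : triples) {
            if (!t[1].equals(property)) {
                continue;
            }
            List<String> subjects = subjectsByObject.computeIfAbsent(t[2], k -> new ArrayList<>());
            if (!subjects.contains(t[0])) {
                subjects.add(t[0]);
            }
        }
        // Any two distinct subjects sharing an object form a sameAs pair.
        List<String[]> pairs = new ArrayList<>();
        for (List<String> subjects : subjectsByObject.values()) {
            for (int i = 0; i < subjects.size(); i++) {
                for (int j = i + 1; j < subjects.size(); j++) {
                    pairs.add(new String[] {subjects.get(i), subjects.get(j)});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"x", "property:p", "y"});
        triples.add(new String[] {"z", "property:p", "y"});
        for (String[] pair : sameAsPairs(triples, "property:p")) {
            System.out.println(pair[0] + " owl:sameAs " + pair[1]);
        }
    }
}
```

Note that this does not reject the second statement, which is what the original question asked for; it only merges the identities of x and z, as described in the reply.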
Re: [Owlim-discussion] Adding rules to an existing repository.
Hi Danny,

In short - you can use only a single ruleset file, so you need to find a way to combine both into a single .pie. The easiest way is to start with one of those we provide with the distribution and alter it accordingly.

HTH,
Damyan Ognyanov
Ontotext AD

On 25.3.2011 г. 09:25 ч., Danny Tran wrote:

Thanks Ivan!

I've read the user guide section 5 and I have a question that isn't answered there: is it possible to have multiple ruleset files for a repo? When I modified the swiftowlim.ttl file to assert multiple owlim:ruleset values, I got an error. Does this mean I have to combine my custom_rules.pie with the builtin_owl2-rl-conf.pie into one ruleset file?

Thanks again for your help,
Danny

On Thu, Mar 24, 2011 at 4:45 AM, Ivan Peikov <ivan.pei...@ontotext.com> wrote:

Hi Danny,

The SwiftOWLIM user guide has it all described under section 5 (syntax, semantics, examples, etc.). After you've gone through it we'll answer any questions about particular rules you might want to implement. Just post them here on the mailing list.

Good luck!

Cheers,
Ivan

On Tuesday 22 March 2011 18:51:04 Danny Tran wrote:

Can someone point me in the right direction for documentation/discussion forums about adding rules (via .pie file?) to an existing repository? I'm using SwiftOwlim 3.4.

Thanks!
Danny
Re: [Owlim-discussion] OWLIM-discussion Digest, Vol 26, Issue 19
Hi Roberto,

The exception is not thrown because the query was not processed - it was thrown during query pre-processing, when we convert the Jena/ARQ query model to our own for further evaluation. Our internal model does not support all the features of the Jena/ARQ one, so we convert only those parts of the query that we are able to process; everything else is handled by the ARQ engine.

The stack trace you see is a leftover debug print that slipped into our official release of BO 3.4. It is generated when we encounter a query construct we do not know how to process with our model and let ARQ proceed with it.

I've tried the exact query from your post and it gives me some meaningful results even if only the axioms of the owl-horst-optimized ruleset are present: e.g. classes like rdf:List, rdf:Property, rdfs:Class ... etc.

HTH,
Damyan Ognyanov
Ontotext AD

On 21.3.2011 г. 11:43 ч., Roberto García wrote:

Roberto, can you be a bit more specific, please? For the sake of background, I am not aware of a perfect SPARQL engine. Do you mean the current recommendation, SPARQL 1.0, or the newer version 1.1?

Cheers,
Naso

OK, I'm reattaching the problem details below:

We are trying to develop an OWLIM connector for our Linked Data publishing platform Rhizomer (http://rhizomik.net/rhizomer/).
First of all, we init the OWLIM repository as shown in the documentation:

    public void init(ServletConfig config) throws Exception {
        if (config.getInitParameter(dir_name) != null) {
            String basePath = config.getServletContext().getRealPath(config.getInitParameter(dir_name));
            OwlimSchemaRepository schema = new OwlimSchemaRepository();
            // set the data folder where BigOWLIM will persist its data
            schema.setDataDir(new File(basePath + "/owlim"));
            // configure BigOWLIM with some parameters
            schema.setParameter("storage-folder", "./");
            schema.setParameter("repository-type", "file-repository");
            schema.setParameter("ruleset", "rdfs");
            // wrap it into a Sesame SailRepository
            SailRepository repository = new SailRepository(schema);
            // initialize
            repository.initialize();
            RepositoryConnection connection = repository.getConnection();
            // finally, create the DatasetGraph instance
            dataset = new SesameDataset(connection);
            model = ModelFactory.createModelForGraph(dataset.getDefaultGraph());
        }
    }

It works fine:

INFO [main] (JCLLoggerAdapter.java:265) - ConnectorServlet successfully initialized!
INFO [main] (JCLLoggerAdapter.java:265) - OwlimSchemaRepository: version: 3.4, revision: 3012
INFO [main] (JCLLoggerAdapter.java:265) - Build date: Fri Nov 26 16:11:15 CET 2010
INFO [main] (JCLLoggerAdapter.java:265) - Configured parameter 'ruleset' to 'rdfs'
INFO [main] (JCLLoggerAdapter.java:265) - Cache pages for tuples: 4193
INFO [main] (JCLLoggerAdapter.java:265) - Cache pages for predicates: 0
INFO [main] (JCLLoggerAdapter.java:265) - Configured parameter 'storage-folder' to './'
INFO [main] (JCLLoggerAdapter.java:265) - Detected unclean shutdown
INFO [main] (JCLLoggerAdapter.java:265) - Starting automatic database recovery...
INFO [main] (JCLLoggerAdapter.java:265) - Restoring entities from persistence...
INFO [main] (JCLLoggerAdapter.java:265) - Done in 65 ms.
INFO [main] (JCLLoggerAdapter.java:265) - Repository must be rebuilt.
INFO [main] (JCLLoggerAdapter.java:265) - Restoring statements from /Users/roberto/Documents/Proyectos/Rhizomer/src/main/webapp/metadata/owlim/./backup...
INFO [main] (JCLLoggerAdapter.java:265) - ruleSet=rdfs, partialRdfs=false, multithread=false
INFO [main] (JCLLoggerAdapter.java:265) - NumberOfStatements = 851
INFO [main] (JCLLoggerAdapter.java:265) - NumberOfExplicitStatements = 846
INFO [main] (JCLLoggerAdapter.java:265) - NumberOfEntities = 136
INFO [main] (JCLLoggerAdapter.java:265) - 0 statements overall.
ERROR [main] (JCLLoggerAdapter.java:457) - Done in 796 ms.
INFO [main] (JCLLoggerAdapter.java:265) - Finished automatic database recovery
INFO [main] (JCLLoggerAdapter.java:265) - Restoring entity hash table...
INFO [main] (JCLLoggerAdapter.java:265) - Done in 73 ms.
INFO [main] (JCLLoggerAdapter.java:265) - Using Hash Entity Pool
INFO [main] (JCLLoggerAdapter.java:265) - Configured parameter 'repository-type' to 'file-repository'
INFO [main] (JCLLoggerAdapter.java:265) - ruleSet=rdfs, partialRdfs=false, multithread=false
INFO [main] (JCLLoggerAdapter.java:265) - Searching for plugins available in the classpath...
INFO [main] (JCLLoggerAdapter.java:265) - Registering plugin fts
INFO [main] (JCLLoggerAdapter.java:265) - Registering plugin direct
INFO [main] (JCLLoggerAdapter.java:265) - Registering plugin rdfrank
INFO [main] (JCLLoggerAdapter.java:265) - Registering plugin geospatial
INFO [main] (JCLLoggerAdapter.java:265