Re: TDB: records not strictly increasing
B/ A different, better approach is to build a special version of TDB. The changes needed are small, but you need to build Jena. These instructions apply to the code in SVN as it is now, today - not the last release, not last week. It's just easier to set up and explain from the current code base, because a small recent change centralised the point you need to change and also introduced an easy-to-use testing feature.

1/ svn co the Jena code from trunk.

Done.

2/ Build Jena:

    mvn clean install

Done.

It is easier to build and install than just package. You must use the development releases of the other modules. I don't think you need to set up Maven to use the snapshot builds on Apache, but if you do, set the repository as described at http://jena.apache.org/download/maven.html

3/ mvn eclipse:eclipse if you plan to use Eclipse to edit the code.

Didn't set up Maven or use Eclipse.

4/ Set up to use this build for tdbdump, e.g. via apache-jena or Fuseki. For added ease, use the Fuseki server jar, which has everything in it:

    java -cp fuseki-server.jar tdb.tdbdump --version

    java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --version
    Jena:       VERSION: 2.10.0-SNAPSHOT
    Jena:       BUILD_DATE: 2013-01-28T21:00:30+
    ARQ:        VERSION: 2.10.0-SNAPSHOT
    ARQ:        BUILD_DATE: 2013-01-28T21:00:30+
    TDB:        VERSION: 0.10.0-SNAPSHOT
    TDB:        BUILD_DATE: 2013-01-28T21:00:30+

Check the timestamps/version numbers.

5/ Test: create a small text file of a few triples.

--- D.ttl
@prefix : <http://example/> .
:s1 :p 1 .
:s2 :p 2 .
:s3 :q 3 .
:s2 :q 4 .
:s1 :q 5 .
---

tdbdump --data D.ttl should dump the file with triples clustered by subject. (No, you do not need to load a database; --data is a recent feature for testing.)

    java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl
    <http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
    <http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
    <http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
    <http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
    <http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .

6/ Edit com.hp.hpl.jena.tdb.index.TupleTable, static method chooseScanAllIndex.

Change:

    if ( tupleLen != 4 )
        return indexes[0] ;

to:

    if ( tupleLen != 4 ) {
        if ( indexes.length == 3 )
            return indexes[1] ;
        else
            return indexes[0] ;
    }

7/ Rebuild. Yes, the tests for TDB should still pass!

8/ Check the new version:

    tdbdump --version

Check the change:

    tdbdump --data D.ttl

and the output should be N-Triples clustered by property, different from earlier:

    java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbdump --data D.ttl
    <http://example/s1> <http://example/p> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
    <http://example/s2> <http://example/p> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
    <http://example/s3> <http://example/q> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .
    <http://example/s2> <http://example/q> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
    <http://example/s1> <http://example/q> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .

Is it what you expect? Yes.

9/ Dump your database. Hope there is a good index.

> It works and no errors were reported; however, the size of the dump file is just 84MB, which is considerably smaller than the actual TDB (~1GB).

Quite possible, especially if you have also been deleting stuff in the database as well as adding. You can also try indexes[2] instead of indexes[1] to use the OSP index. Each dumps the entire database, but in a different triple order.

> I did also try this change of indexes, and it gave me the same error:
>
> Exception in thread "main" com.hp.hpl.jena.tdb.base.StorageException: RecordRangeIterator: records not strictly increasing: 021aa0a206cffe6b0005233d // 021a2c0a06b85f9f0005233d

The OSP index is also broken.

10/ Clean up Maven to get rid of the temporary build:
    rm -r REPO/org/apache/jena/

11/ Rebuild the database with tdbloader/tdbloader2.

    java -cp jena-fuseki/target/jena-fuseki-0.2.6-SNAPSHOT-server.jar tdb.tdbloader --loc=tdb tdb.dump

> but the size of the tdb is smaller than the original tdb

The loader produces more compact indexes than if the data has been loaded incrementally. This is even more the case for tdbloader2. Also, if you have been deleting and adding, with 0.8, the database can grow. This is addressed, but not totally fixed, in 0.9.x. (The load is slower than if the data had been dumped in SPO order.)

I tested the change here on that test file - I don't have a large corrupt database to try it on.

> Any ideas of how to get it fixed are more than welcome.

Personally, I would adopt a two-stream approach. Do the approach above.
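For orientation, the decision the step-6 patch changes can be sketched in isolation: a TDB triple table keeps three indexes (conventionally SPO, POS, OSP), a full scan must pick one of them, and the patch makes the scan fall back to the second index rather than the (possibly damaged) first. The following is a standalone toy model of the patched logic, not Jena's actual TupleTable code; the index names are illustrative:

```java
/** Toy sketch (not Jena's TupleTable) of the patched chooseScanAllIndex logic. */
public class ChooseScanIndexDemo {

    // Patched version: for triples (tupleLen != 4) with the usual three
    // indexes, scan the second index (POS) instead of the first (SPO).
    static String chooseScanAllIndex(int tupleLen, String[] indexes) {
        if (tupleLen != 4) {
            if (indexes.length == 3)
                return indexes[1];   // e.g. POS, avoiding a damaged SPO index
            else
                return indexes[0];
        }
        return indexes[0];           // quads: keep the default first index
    }

    public static void main(String[] args) {
        String[] tripleIndexes = { "SPO", "POS", "OSP" };
        System.out.println(chooseScanAllIndex(3, tripleIndexes)); // POS
    }
}
```

Swapping `indexes[1]` for `indexes[2]` in the same spot is the "try the OSP index" variant mentioned in step 9.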
Re: Combined query over different datasets and interlinking between datasets
From: Andy Seaborne <a...@apache.org>
To: Vishal Sinha <vishal.sinha...@yahoo.com>
Cc: users@jena.apache.org
Sent: Monday, January 28, 2013 5:16 PM
Subject: Re: Combined query over different datasets and interlinking between datasets

On 28/01/13 06:24, Vishal Sinha wrote:

> From: Andy Seaborne <a...@apache.org>
> To: users@jena.apache.org
> Sent: Monday, January 28, 2013 3:25 AM
> Subject: Re: Combined query over different datasets and interlinking between datasets
>
> On 25/01/13 06:43, ankur padia wrote:
>
>> Hello Vishal,
>> Based on my experience, a list of statements would be helpful for question 1, and for question 2 some if-condition would need to be specified. I think the Filter class would be helpful, but as I haven't come across any tutorial on Filter, it would be trial and error.
>> Regards, Ankur Padia
>>
>> On Fri, Jan 25, 2013 at 11:00 AM, Vishal Sinha <vishal.sinha...@yahoo.com> wrote:
>>
>>> Hi,
>>> I have created two Datasets using Jena, each Dataset having two or three models.
>>>
>>> Triples in Dataset1:
>>> x1 y1 z1. x2 y2 z2. x3 y3 z3. x4 y4 z4. x5 y5 z5. x6 y6 z6.
>>>
>>> Triples in Dataset2:
>>> x11 y11 z11. x21 y22 z22. x33 y33 z33. x44 y44 z44. x55 y55 z55. x66 y66 z66.
>>>
>>> My questions:
>>> - How can I make a combined query on these two datasets, or multiple datasets, using Jena?
>>> - How can I state that 'y22' in Dataset2 is actually the same as 'y5' in Dataset1? Where should I keep this information?

Andy wrote:
> Do the datasets have named graphs in common? If not, then making a single dataset with all the data in it is one possibility.

Vishal wrote:
> Both datasets have only default graph models, not any named graphs.

Then you can put both in one dataset, each as a named graph. You can query a specific graph with GRAPH, or the combined graphs using unionDefaultGraph.

++ Thanks, it works now.

Otherwise, you can create a union graph and put each graph in it.
Less efficient, but whether that matters depends on whether you have a lot of data or not (= a TDB database and several million triples).

	Andy

Thanks,
Vishal
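The two query modes Andy describes can be pictured without any Jena machinery: a query with GRAPH looks only inside one named graph, while unionDefaultGraph makes the default graph behave as the set union of all named graphs. A toy, Jena-free sketch (graph names and triples are illustrative; triples are plain strings here, not RDF nodes):

```java
import java.util.*;

/** Toy sketch (not the Jena API) of GRAPH vs. unionDefaultGraph. */
public class UnionGraphDemo {

    // dataset: graph name -> set of triples
    static final Map<String, Set<String>> dataset = Map.of(
        "urn:g1", Set.of("x1 y1 z1", "x5 y5 z5"),
        "urn:g2", Set.of("x11 y11 z11", "x21 y22 z22"));

    // GRAPH <name> { ... } : match against one named graph only
    static Set<String> graph(String name) {
        return dataset.getOrDefault(name, Set.of());
    }

    // unionDefaultGraph: the default graph is the union of all named graphs
    static Set<String> unionDefaultGraph() {
        Set<String> all = new HashSet<>();
        for (Set<String> g : dataset.values())
            all.addAll(g);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(graph("urn:g1").size());      // 2
        System.out.println(unionDefaultGraph().size());  // 4
    }
}
```

In real Jena, the union view is enabled per-dataset (for example via TDB's unionDefaultGraph setting) rather than computed by hand as above.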
Re: listInstances OntClass problem
Thanks for the reply Dave. It seems that jena.apache.org/documentation/ontology/index.html Fig. 5 explains what you are explaining in your email. I still believe, though, that the javadocs should be clearer about this.

Thanks again
Regards
Papadakos Panagiotis

On Tue, Jan 29, 2013 at 3:24 PM, Dave Reynolds <dave.e.reyno...@gmail.com> wrote:

On 24/01/13 13:08, Panagiotis Papadakos wrote:

> Ian and Dave, thank you both for your help. I didn't post the correct code and I am sorry for this.
>
> Regarding the ontology, I know it is not correct. Maybe changing Europe to European, Germany to German, etc. would be better.
>
> Now regarding the listInstances method, I still believe something is wrong, either in the code, in the API, or in my way of thinking. listInstances is supposed to return the instances, either direct or instances of its subclasses. Unfortunately, if I use a simple RDF_MEM model with no inference, listInstances(false) for the Manufacturer class returns no results. Somehow I feel this is wrong. I was thinking that internally, since there is no inference, Jena should visit each subclass, the subclasses of those, etc., getting the direct instances of each one and returning all the instances of the class and its subclasses. Is this correct?

No. The notion is that reasoning is the job of the reasoner, and the Ont API provides convenient access to that but doesn't duplicate it. There are a few special cases, but in general if you want reasoning then configure a reasoner.

> Now regarding listInstances(true), I am supposing that it should return all direct instances of the class, even if these instances are also instances of a subclass (which, for example, can happen if I load the TestInference.rdf file).

No. That's the point of "direct". As it says in the javadoc, setting direct=true means excluding sub-classes of this class. If something is also an instance of a subclass of C, then it is not a direct instance of C and should not be returned by listInstances(true).
Dave

--
http://www.flickr.com/photos/papadako
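Dave's two points can be made concrete with a small model that is emphatically not Jena's implementation, just an illustration of the semantics: without a reasoner, only explicitly asserted rdf:type statements count, and direct=true additionally excludes anything whose membership comes via a proper subclass. The class and instance names below are made up:

```java
import java.util.*;

/** Toy illustration (not Jena code) of listInstances semantics. */
public class ListInstancesDemo {

    // subclass -> direct superclass (single inheritance, for simplicity)
    static final Map<String, String> superOf = Map.of("CarMaker", "Manufacturer");

    // instance -> explicitly asserted rdf:type classes
    static final Map<String, Set<String>> types = Map.of("vw", Set.of("CarMaker"));

    // all subclasses of cls, including cls itself (transitive closure)
    static Set<String> subClassesOf(String cls) {
        Set<String> out = new HashSet<>();
        out.add(cls);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, String> e : superOf.entrySet())
                if (out.contains(e.getValue()) && out.add(e.getKey()))
                    changed = true;
        }
        return out;
    }

    static Set<String> listInstances(String cls, boolean inference, boolean direct) {
        // without a reasoner, only the class itself counts as a match
        Set<String> accepted = inference ? subClassesOf(cls) : Set.of(cls);
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : types.entrySet()) {
            Set<String> ts = e.getValue();
            if (ts.stream().noneMatch(accepted::contains))
                continue;
            // direct=true: exclude instances reached via a proper subclass
            if (direct && ts.stream()
                    .anyMatch(t -> !t.equals(cls) && subClassesOf(cls).contains(t)))
                continue;
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // No reasoner: Manufacturer has no explicitly typed instances
        System.out.println(listInstances("Manufacturer", false, false)); // []
        // With inference: vw is a Manufacturer via CarMaker
        System.out.println(listInstances("Manufacturer", true, false));  // [vw]
        // direct=true: vw's most specific class is CarMaker, so it is excluded
        System.out.println(listInstances("Manufacturer", true, true));   // []
    }
}
```

The first and third cases are exactly the two behaviours Panagiotis found surprising.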
OWL Deprecation in Schemagen-generated classes
A colleague and I have been using Jena's schemagen to get lots of generated constants from a vocabulary we've developed. We're at the point that we're marking some of the vocabulary deprecated. It would be convenient for our application code that uses the vocabulary if the vocabulary constants that are deprecated also had a Java deprecation annotation. Our application would then generate compiler warnings where it used deprecated vocabulary.

This raises two questions:

* We didn't find anything in the Jena schemagen doc describing this. Are we correct that schemagen can't presently do this?
* This probably isn't too hard to implement; we might go and do it if we get some free time. Is there any interest in this? (I.e., if we submitted it as a patch, would it be added to Jena, and would it be useful to anyone?)

Thanks,
//JT
--
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/
Re: OWL Deprecation in Schemagen-generated classes
On Tue, Jan 29, 2013 at 12:55 PM, Joshua TAYLOR <joshuaaa...@gmail.com> wrote:

> A colleague and I have been using Jena's schemagen to get lots of generated constants from a vocabulary we've developed. We're at the point that we're marking some of the vocabulary deprecated. It would be convenient for our application code that uses the vocabulary if the vocabulary constants that are deprecated also had a Java deprecation annotation. Our application would then generate compiler warnings where it used deprecated vocabulary.
>
> This raises two questions:
> * We didn't find anything in the Jena schemagen doc describing this. Are we correct that schemagen can't presently do this?
> * This probably isn't too hard to implement; we might go and do it if we get some free time. Is there any interest in this? (I.e., if we submitted it as a patch, would it be added to Jena, and would it be useful to anyone?)

On Tue, Jan 29, 2013 at 1:03 PM, Stephen Allen <sal...@apache.org> wrote:

> Sounds very useful to me; I use schemagen a fair amount. Looking forward to a patch. The best way to submit it would be to create a new issue on our JIRA site [1], and submit it there as an attachment.
>
> -Stephen
>
> [1] https://issues.apache.org/jira/browse/JENA

Sounds like a plan. It's not a particularly high-priority thing for us at the moment, so I don't have any particular ETA, but it's on the long-term to-do-if-we-get-the-time list. :)

//JT
--
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/
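For a concrete picture of what the proposed feature might emit, here is a hypothetical schemagen-style vocabulary class in which one term carries a Java @Deprecated annotation. The class name, namespace, and constants are all made up for illustration (schemagen does not currently generate this, and real schemagen output uses Jena Resource/Property constants rather than plain Strings):

```java
import java.lang.reflect.Field;

/** Hypothetical schemagen-style output with a deprecated vocabulary term. */
public class MyVocab {

    /** The namespace of the vocabulary (illustrative). */
    public static final String NS = "http://example.org/vocab#";

    public static final String currentTerm = NS + "currentTerm";

    /** @deprecated superseded by {@link #currentTerm} in the vocabulary. */
    @Deprecated
    public static final String oldTerm = NS + "oldTerm";

    public static void main(String[] args) throws Exception {
        // Code referencing MyVocab.oldTerm gets a compiler deprecation warning;
        // since @Deprecated has runtime retention, the annotation is also
        // visible via reflection:
        Field f = MyVocab.class.getField("oldTerm");
        System.out.println(f.isAnnotationPresent(Deprecated.class)); // true
    }
}
```

Presumably the generator would trigger on something like owl:deprecated (or owl:DeprecatedProperty/owl:DeprecatedClass) annotations in the input vocabulary; which trigger to use would be part of the patch design.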
Re: Binding causes hang in Fuseki
Cool, thanks guys, will give this a try tomorrow :-)

Rob

On Tue, Jan 29, 2013 at 7:36 PM, Andy Seaborne <a...@apache.org> wrote:

On 29/01/13 18:21, Alexander Dutton wrote:

> Hi Rob,
>
> On 29/01/13 18:11, Rob Walpole wrote:
>> Am I doing something wrong here?
>
> The short answer is that the inner SELECT is evaluated first, leading to the results being calculated in the second case in a rather inefficient way.
>
> In the first, the inner SELECT's ?deselected is bound, so it's quite quick to find all its ancestors. In the second, all possible ?deselected and ?ancestor pairs are returned by the inner query, which are then (effectively) filtered to remove all the pairs where ?deselected isn't whatever it was BINDed to.
>
> Here's more from the spec: http://www.w3.org/TR/sparql11-query/#subqueries
>
> I /think/ ARQ is able to perform some optimisations along these lines, but obviously not for your query.

Spot on. If you remove the inner SELECT it should do better:

    {
      BIND(... AS ?readyStatus)
      BIND(... AS ?deselected)
      ?export rdfs:member ?member .
      ?export dri:username "rwalpole"^^xsd:string .
      ?export dri:exportStatus ?readyStatus
      OPTIONAL {
        ?deselected (dri:parent)+ ?ancestor
        FILTER EXISTS { ?export rdfs:member ?ancestor }
      }
    }

but technically this is a different query, so it'll depend on your data as to whether it is right.

http://www.sparql.org/query-validator.html

	Andy

> Best regards,
>
> Alex
>
> PS. You don't need to do URI(...); you can write a straight IRI in <...> instead.
> --
> Alexander Dutton
> Developer, Office of the CIO; data.ox.ac.uk, OxPoints
> IT Services, University of Oxford

--
Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole
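Alexander's point about evaluation order, that a subquery is evaluated bottom-up so the outer BIND cannot restrict it, can be demonstrated without SPARQL at all. The toy sketch below (a made-up hierarchy, not the dri: data) computes the same "ancestors of one node" answer two ways: walking from the bound node directly, versus enumerating every (node, ancestor) pair first and filtering afterwards, which mirrors the slow subquery form:

```java
import java.util.*;

/** Toy sketch of why the subquery form is slow: the inner query is
 *  evaluated first, so the outer binding cannot restrict its search. */
public class SubqueryCostDemo {

    // parent edges: child -> parent (a small made-up hierarchy)
    static final Map<String, String> parent =
        Map.of("a", "b", "b", "c", "d", "c", "e", "d");

    static int pairsExamined = 0;   // crude work counter

    // ancestors of one node: cheap, walks a single parent chain
    static Set<String> ancestorsOf(String node) {
        Set<String> out = new LinkedHashSet<>();
        for (String p = parent.get(node); p != null; p = parent.get(p)) {
            pairsExamined++;
            out.add(p);
        }
        return out;
    }

    // subquery style: enumerate ALL (node, ancestor) pairs, filter afterwards
    static Set<String> ancestorsViaAllPairs(String node) {
        Set<String> out = new LinkedHashSet<>();
        for (String n : parent.keySet())
            for (String a : ancestorsOf(n))
                if (n.equals(node))       // the "filter" applied late
                    out.add(a);
        return out;
    }

    public static void main(String[] args) {
        pairsExamined = 0;
        Set<String> cheap = ancestorsOf("a");
        int cheapWork = pairsExamined;

        pairsExamined = 0;
        Set<String> costly = ancestorsViaAllPairs("a");
        int costlyWork = pairsExamined;

        System.out.println(cheap.equals(costly));    // true: same answers
        System.out.println(cheapWork < costlyWork);  // true: far less work
    }
}
```

Same results, very different cost, which is exactly the effect Rob saw when the BIND sat outside the inner SELECT.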