Re: OntModel.read parsing non-suffixed TTL as RDF/XML
Sorry for the OT question, but this is the first time I've learned about SWEET. Is there any list of datasets/graphs using this vocabulary?

Sent: Wednesday, March 07, 2018 at 5:28 AM
From: "lewis john mcgibbney"
To: users@jena.apache.org
Subject: OntModel.read parsing non-suffixed TTL as RDF/XML

Hi Folks,

Over on the SWEET ontology suite [0] we recently changed our canonical serialization to TTL. Additionally, however, we removed all file suffixes from the resources themselves, so although the following resource [1] is serialized as TTL, you would never know unless you looked at it or peeked into the server response.

I am experiencing issues when I attempt to load the SWEET 'master' file [2], which essentially produces a graph by importing every file within the ontology suite. The code I use to do this is as follows:

    ((OntModel) m).read(url, null, lang); // where lang is the string "TTL" or "TURTLE" depending on previous logic

The stack trace I get is as follows [3]. As you can see, it loads the sweetAll file correctly but chokes on the imported resource http://sweetontology.net/realmSoil, which is also a TTL serialization but with no file suffix. The stack trace indicates that an attempt is being made to parse the resource as RDF/XML, which in this case is incorrect. Any hints on how to override/define this behaviour?

Another related question: the large stack trace I receive from the above parsing activity seems to also indicate that the OntModel.read logic randomly processes imports when processing a resource such as [2]. Is this correct?

Thanks for any hints folks, I appreciate it.
Lewis

[0] http://sweetontology.net
[1] http://sweetontology.net/reprDataProduct
[2] http://sweetontology.net/sweetAll
[3] https://paste.apache.org/vLt8

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
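[Editor's note] One possible workaround, sketched below under assumptions and not taken from a reply in the thread: disable OntModel's automatic owl:imports handling (which re-fetches each import with content negotiation) and read each document with the parser language forced to Turtle. `RDFParser.forceLang` overrides whatever media type the server reports. The sweetAll and realmSoil URLs come from the message above; everything else is illustrative.

```java
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.StreamRDFLib;

public class ForceTurtleRead {
    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        // Stop OntModel from fetching owl:imports itself; those fetches
        // would content-negotiate again and may be parsed as RDF/XML.
        m.getDocumentManager().setProcessImports(false);

        // Read the top-level document, forcing Turtle regardless of Content-Type.
        readTurtle(m, "http://sweetontology.net/sweetAll");

        // Each import would then be read the same way, e.g.:
        // readTurtle(m, "http://sweetontology.net/realmSoil");
    }

    static void readTurtle(OntModel m, String url) {
        RDFParser.create()
                 .source(url)
                 .forceLang(Lang.TURTLE)   // ignore the server's reported media type
                 .parse(StreamRDFLib.graph(m.getGraph()));
    }
}
```

Running the sketch requires network access to sweetontology.net; the `forceLang` mechanism itself can be exercised locally on any string of Turtle.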
Re: Streaming CONSTRUCT/INSERTs in TDB
On 03.03.18 17:11, Andy Seaborne wrote:

Hi Andy,

> Do you have an example of such an update?

Yes, I can deliver two use-cases, with data and query.

First one is this dataset: http://ktk.netlabs.org/misc/rdf/fuseki-lock.nq.gz
Query: https://pastebin.com/7TbsiAii

This returns reliably in Stardog, in less than one minute. The UNION is most probably necessary due to blank-node issues, so I don't think I can split them. In Fuseki it fails with:

    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

and I once allocated almost all I had on my system (> 8GB).

> Some cases can't stream, but it is possible some cases aren't streaming
> when they could.

ok

> Or the whole transaction is quite large which is where TDB2 comes in.

I did try that on TDB2 recently as well, same issue. Will post the other sample later.

regards

Adrian
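[Editor's note] For reference, a minimal sketch of the TDB2 pattern Andy alludes to: running a large SPARQL update inside a single write transaction against a TDB2 dataset, which handles large transactions better than TDB1. The `DB2` location, graph name, and update string are placeholders, not taken from the thread.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;
import org.apache.jena.update.UpdateAction;

public class Tdb2LargeUpdate {
    public static void main(String[] args) {
        // Placeholder on-disk location for the TDB2 database.
        Dataset dataset = TDB2Factory.connectDataset("DB2");

        // Run the whole update inside one write transaction.
        Txn.executeWrite(dataset, () ->
            UpdateAction.parseExecute(
                "INSERT { GRAPH <urn:example:copy> { ?s ?p ?o } } " +
                "WHERE  { ?s ?p ?o }",
                dataset));
    }
}
```

Note that TDB2's transaction design avoids TDB1's in-heap journal growth for big write transactions, though query-side memory use (e.g. a non-streaming UNION) is a separate concern.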
Re: NullPointerExceptions in v3.6.0
Hi Andy,

It does seem weird. Just for completeness, I changed my first example to:

    Model model = dataset.getDefaultModel();
    model.add(new StatementImpl(
        ModelFactory.createDefaultModel().createResource("http://example.org/thing"),
        FOAF.name,
        ModelFactory.createDefaultModel().createLiteral("Chris")));

but (on Windows 10, at least) I still get the error.

Chris

On 5 March 2018 at 16:31, Andy Seaborne wrote:
>
> On 02/03/18 18:02, Chris Wood wrote:
>
>> Hi Andy,
>>
>> I've answered one question, but got another...
>>
>> Your comment "What is more, I can't line the stacktrace line numbers up
>> with the code. Jena 3.0.1 lines up better" made me sheepishly realise that
>> my tdbquery path was pointing to an old version of Jena I had locally
>> (although this was actually v3.3.0). I changed tdbquery to use v3.6.0 and
>> it worked! But, it struck me that I'd worked on some other code recently
>> where I'd compiled with 3.6.0 but tested with tdbquery 3.3.0 and hadn't
>> seen errors - so I did some basic tests.
>
> Even with a mix of versions, I could not reproduce the NPE. The DB format
> has not changed; only at RDF 1.1 / Jena 3.0 was a reload needed.
>
>> I've found that reading a URL directly into a model doesn't result in
>> this error; i.e.
>> using my previous workflow but using this version of jena_test.java:
>>
>>     import org.apache.jena.query.Dataset;
>>     import org.apache.jena.query.ReadWrite;
>>     import org.apache.jena.rdf.model.Model;
>>     import org.apache.jena.rdf.model.ModelFactory;
>>     import org.apache.jena.tdb.TDBFactory;
>>
>>     public class jena_test {
>>         public static void main(String args[])
>>         {
>>             Dataset dataset = TDBFactory.createDataset("my_dataset");
>>             Model model = dataset.getDefaultModel();
>>             dataset.begin(ReadWrite.WRITE);
>>
>>             try {
>>                 model.add(ModelFactory.createDefaultModel().read(
>>                     "https://www.w3.org/TR/REC-rdf-syntax/example14.nt", "N-TRIPLE"));
>>                 dataset.commit();
>>             } finally {
>>                 dataset.close();
>>             }
>>         }
>>     }
>
> In your first example there are
>
>     ModelFactory.createDefaultModel().createResource("person:1")
>     ModelFactory.createDefaultModel().createResource("Chris")
>
> which are dubious URIs (the first is the "person:" URI scheme, not a
> namespace of person; the second is a relative URI).
>
>     https://www.w3.org/TR/REC-rdf-syntax/example14.nt
>
> is all good URIs.
>
> Andy

>> with the same pom version but tdbquery v3.3.0:
>>
>>     C:\Users\chris\jena_test>tdbquery --version
>>     Jena: VERSION: 3.3.0
>>     Jena: BUILD_DATE: 2017-05-02T17:38:25+
>>     ARQ:  VERSION: 3.3.0
>>     ARQ:  BUILD_DATE: 2017-05-02T17:38:25+
>>     RIOT: VERSION: 3.3.0
>>     RIOT: BUILD_DATE: 2017-05-02T17:38:25+
>>     TDB:  VERSION: 3.3.0
>>     TDB:  BUILD_DATE: 2017-05-02T17:38:25+
>>
>> results in
>>
>>     C:\Users\chris\jena_test>tdbquery --loc="my_dataset" "SELECT (count(*) as ?t) where {?a ?b ?c . }"
>>     -----
>>     | t |
>>     =====
>>     | 2 |
>>     -----
>>
>> As expected.
>>
>> I recognise that using different versions of jena for compiling and for
>> tdbquery is almost certainly not supported (even if not implicitly), but
>> perhaps raising awareness from the troubles I've had this week might help
>> someone else!
>> Cheers
>> Chris
>>
>> On 2 March 2018 at 13:38, Andy Seaborne wrote:
>>
>>> Hi Chris,
>>>
>>> I am on Linux, with Apache Maven 3.5.2, java openjdk version "1.8.0_151".
>>>
>>> It works for me.
>>>
>>> What is more, I can't line the stacktrace line numbers up with the code.
>>> Jena 3.0.1 lines up better on
>>>
>>>     JournalControl.recoverSegment(JournalControl.java:185)
>>>
>>> because that is a call to "replay".
>>>
>>> I ran the maven compiled version with
>>>
>>>     java -cp target/classes:target/tdb_generator_resources/\* jena_test
>>>
>>> and then
>>>
>>>     java -cp /home/afs/jlib/apache-jena-3.6.0/lib/\* tdb.tdbquery --loc=my_dataset 'SELECT (count(*) as ?t) where {?a ?b ?c . }'
>>>
>>> "tdbquery --version" ==>
>>>
>>>     Jena: VERSION: 3.7.0-SNAPSHOT
>>>     Jena: BUILD_DATE: 2018-02-27T22:54:52+
>>>     ARQ:  VERSION: 3.7.0-SNAPSHOT
>>>     ARQ:  BUILD_DATE: 2018-02-27T22:54:52+
>>>     RIOT: VERSION: 3.7.0-SNAPSHOT
>>>     RIOT: BUILD_DATE: 2018-02-27T22:54:52+
>>>     TDB:  VERSION: ${project.version}
>>>     TDB:  BUILD_DATE: ${build.time.xsd}
>>>
>>> (the TDB bit is old junk in 3.6.0 - ignore it)
>>>
>>> On 01/03/18 18:05, Chris Wood wrote:
>>>
>>>>     java -jar .\target\jena_test.jar
>>>
>>> The shade plugin wasn't configured to run, nor was it set to call the
>>> right main class. When I changed the configuration, I also ran:
>>>
>>>     java -jar target/jena_test.jar
>>>
>>> In all cases I got:
>>>
>>>     -----
>>>     | t |
>>>     =====
>>>     | 1 |
>>>     -----
>>>
>>> So it's still a mystery to me, I'm afraid.
>>>
>>> Andy
>>>
>>>     org.apache.maven.plugins
>>>     maven-shade-plugin
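[Editor's note] For readers hitting the same NPE: a sketch of the first example rewritten along the lines Andy suggests, with absolute URIs and with the write transaction begun before the default model is touched. The URI http://example.org/person/1 is an illustrative placeholder, not from the thread.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.sparql.vocabulary.FOAF;
import org.apache.jena.tdb.TDBFactory;

public class JenaTestFixed {
    public static void main(String[] args) {
        Dataset dataset = TDBFactory.createDataset("my_dataset");
        dataset.begin(ReadWrite.WRITE);   // begin before touching the default model
        try {
            Model model = dataset.getDefaultModel();
            // Absolute URI, not "person:1" (a URI scheme) or "Chris" (relative).
            model.add(model.createResource("http://example.org/person/1"),
                      FOAF.name, "Chris");
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}
```

Getting the model from inside the transaction also avoids any ambiguity about which transactional view the model is bound to.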
Re: Fuseki errors with concurrent requests
Thanks for the tip. Our test has a lot of logic (testing of search results, comparing stuff, etc.), so we decided to build it with PHP. But JMeter gave me an idea: is it possible to save all calls to Jena (with content and headers) so that the test can be rerun without our environment?

On 6.3.2018 12:32, Martynas Jusevičius wrote:

Maybe you can make a reproducible using JMeter or such.

On Tue, Mar 6, 2018 at 11:24 AM, Mikael Pesonen wrote:

Yes, clean install of Ubuntu, Jena etc.

On 5.3.2018 17:40, Andy Seaborne wrote:

On 05/03/18 15:04, Mikael Pesonen wrote:

We are using GSP and our test script is doing ~20 json-ld inserts and sparql updates in a row ASAP, and we are running 10 test scripts concurrently. This test is failing now.

Starting with an empty database?

On 5.3.2018 16:51, ajs6f wrote:

"fairly high load and concurrent usage"

This is not a very precise or reproducible measure. Many sites use Jena in production at all kinds of scales for all kinds of dimensions, including HA setups. If you can explain more about your specific situation, you will get more useful advice.

ajs6f

On Mar 5, 2018, at 9:45 AM, Mikael Pesonen wrote:

To be clear: can Jena be recommended as a production database in our customer cases, for fairly high load and concurrent usage? Or is it mainly for scientific purposes?

Br

On 5.3.2018 16:41, ajs6f wrote:

To my knowledge (Andy of course is the TDB expert) you can't really rebuild a TDB instance from a corrupted TDB instance. You should start with a known-good backup or original RDF files.

ajs6f

On Mar 5, 2018, at 9:32 AM, Mikael Pesonen <mikael.peso...@lingsoft.fi> wrote:

Still having these issues on all of our installations.

I'm going to rule out corrupted database on our oldest server. What would be the preferred way to rebuild the data?
Data folder:

5226102784 Mar  5 12:48 GOSP.dat
 260046848 Mar  5 12:48 GOSP.idn
5377097728 Mar  5 12:48 GPOS.dat
 268435456 Mar  5 12:48 GPOS.idn
5486149632 Mar  5 12:48 GSPO.dat
 285212672 Mar  5 12:48 GSPO.idn
         0 Mar  5 12:48 journal.jrnl
 545259520 Mar  5 12:38 node2id.dat
 150994944 Feb 20 16:32 node2id.idn
 497658012 Mar  5 12:38 nodes.dat
         1 Nov 14 15:27 none.opt
  33554432 Jan 24 17:06 OSP.dat
4848615424 Mar  5 12:48 OSPG.dat
 293601280 Mar  1 12:46 OSPG.idn
   8388608 Jan 24 16:59 OSP.idn
  25165824 Jan 24 17:06 POS.dat
4966055936 Mar  5 12:48 POSG.dat
 276824064 Mar  5 12:38 POSG.idn
   8388608 Jan 24 16:55 POS.idn
   8388608 Jan 31 12:06 prefix2id.dat
   8388608 Mar 15  2016 prefix2id.idn
      6771 Jan 31 12:06 prefixes.dat
  25165824 Jan 31 12:06 prefixIdx.dat
   8388608 Jan  8 13:19 prefixIdx.idn
  33554432 Jan 24 17:06 SPO.dat
5075107840 Mar  5 12:48 SPOG.dat
 369098752 Mar  5 12:48 SPOG.idn
   8388608 Jan 24 17:04 SPO.idn
      4069 Nov  7 16:38 _stats.opt
         4 Feb  6 12:01 tdb.lock

On 30.1.2018 15:04, Andy Seaborne wrote:

These seem to be different errors.

"In the middle of an alloc-write" is possibly a concurrency issue.
"Failed to read" is possibly a previously corrupted database.

This is a text dataset? That should be using an MRSW lock to get some level of isolation. What's the Fuseki config in this case?
Andy

On 24/01/18 23:40, Chris Tomlinson wrote:

On the latest 3.7.0-SNAPSHOT (master branch) I also saw repeated occurrences of this the other day while running some queries from the fuseki browser app and with a database load going on with our own app using:

    DatasetAccessorFactory.createHTTP(baseUrl + "/data");

with, for the first model to transfer:

    DatasetAccessor putModel(graphName, m);

and for following models:

    static void addToTransferBulk(final String graphName, final Model m) {
        if (currentDataset == null)
            currentDataset = DatasetFactory.createGeneral();
        currentDataset.addNamedModel(graphName, m);
        triplesInDataset += m.size();
        if (triplesInDataset > initialLoadBulkSize) {
            try {
                loadDatasetMutex(currentDataset);
                currentDataset = null;
                triplesInDataset = 0;
            } catch (TimeoutException e) {
                e.printStackTrace();
                return;
            }
        }
    }

As I say, the exceptions appeared while I was running some queries from the fuseki browser app:

    [2018-01-22 16:25:02] Fuseki INFO [475] 200 OK (17.050 s)
    [2018-01-22 16:25:03] Fuseki INFO [477] POST http://localhost:13180/fuseki/bdrcrw
    [2018-01-22 16:25:03] BindingTDB ERROR get1(?lit) org.apache.jena.tdb.base.file.FileException: In the middle of an alloc-write
        at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
        at org.apache.jena.tdb.base.objec
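[Editor's note] On the question of replaying the workload without the full PHP environment, one option (a sketch under assumptions, not from the thread: the endpoint URL, worker count, round count, and triple pattern are all made up) is a small Java harness that fires concurrent SPARQL updates at a Fuseki dataset through RDFConnection, one connection per worker thread:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

public class ConcurrentUpdateHarness {
    // Fire `rounds` small INSERT DATA updates over the given connection.
    static void hammer(RDFConnection conn, int rounds, int worker) {
        for (int i = 0; i < rounds; i++) {
            conn.update("INSERT DATA { <urn:ex:s" + worker + "-" + i
                        + "> <urn:ex:p> " + i + " }");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical endpoint; replace with your Fuseki dataset URL.
        final String url = "http://localhost:3030/ds";
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < 10; w++) {          // 10 concurrent "test scripts"
            final int id = w;
            Thread t = new Thread(() -> {
                // RDFConnection is not thread-safe: one connection per worker.
                try (RDFConnection conn = RDFConnectionFactory.connect(url)) {
                    hammer(conn, 20, id);       // ~20 updates in a row, ASAP
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();
    }
}
```

The same `hammer` routine can be pointed at an in-process dataset (`RDFConnectionFactory.connect(Dataset)`) to separate server-side concurrency problems from HTTP-level ones.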
Re: Fuseki errors with concurrent requests
Yes, clean install of Ubuntu, Jena etc.

On 5.3.2018 17:40, Andy Seaborne wrote:

On 05/03/18 15:04, Mikael Pesonen wrote:

We are using GSP and our test script is doing ~20 json-ld inserts and sparql updates in a row ASAP, and we are running 10 test scripts concurrently. This test is failing now.

Starting with an empty database?

On 5.3.2018 16:51, ajs6f wrote:

"fairly high load and concurrent usage"

This is not a very precise or reproducible measure. Many sites use Jena in production at all kinds of scales for all kinds of dimensions, including HA setups. If you can explain more about your specific situation, you will get more useful advice.

ajs6f

On Mar 5, 2018, at 9:45 AM, Mikael Pesonen wrote:

To be clear: can Jena be recommended as a production database in our customer cases, for fairly high load and concurrent usage? Or is it mainly for scientific purposes?

Br

On 5.3.2018 16:41, ajs6f wrote:

To my knowledge (Andy of course is the TDB expert) you can't really rebuild a TDB instance from a corrupted TDB instance. You should start with a known-good backup or original RDF files.

ajs6f

On Mar 5, 2018, at 9:32 AM, Mikael Pesonen wrote:

Still having these issues on all of our installations.

I'm going to rule out corrupted database on our oldest server. What would be the preferred way to rebuild the data?
Data folder:

5226102784 Mar  5 12:48 GOSP.dat
 260046848 Mar  5 12:48 GOSP.idn
5377097728 Mar  5 12:48 GPOS.dat
 268435456 Mar  5 12:48 GPOS.idn
5486149632 Mar  5 12:48 GSPO.dat
 285212672 Mar  5 12:48 GSPO.idn
         0 Mar  5 12:48 journal.jrnl
 545259520 Mar  5 12:38 node2id.dat
 150994944 Feb 20 16:32 node2id.idn
 497658012 Mar  5 12:38 nodes.dat
         1 Nov 14 15:27 none.opt
  33554432 Jan 24 17:06 OSP.dat
4848615424 Mar  5 12:48 OSPG.dat
 293601280 Mar  1 12:46 OSPG.idn
   8388608 Jan 24 16:59 OSP.idn
  25165824 Jan 24 17:06 POS.dat
4966055936 Mar  5 12:48 POSG.dat
 276824064 Mar  5 12:38 POSG.idn
   8388608 Jan 24 16:55 POS.idn
   8388608 Jan 31 12:06 prefix2id.dat
   8388608 Mar 15  2016 prefix2id.idn
      6771 Jan 31 12:06 prefixes.dat
  25165824 Jan 31 12:06 prefixIdx.dat
   8388608 Jan  8 13:19 prefixIdx.idn
  33554432 Jan 24 17:06 SPO.dat
5075107840 Mar  5 12:48 SPOG.dat
 369098752 Mar  5 12:48 SPOG.idn
   8388608 Jan 24 17:04 SPO.idn
      4069 Nov  7 16:38 _stats.opt
         4 Feb  6 12:01 tdb.lock

On 30.1.2018 15:04, Andy Seaborne wrote:

These seem to be different errors.

"In the middle of an alloc-write" is possibly a concurrency issue.
"Failed to read" is possibly a previously corrupted database.

This is a text dataset? That should be using an MRSW lock to get some level of isolation. What's the Fuseki config in this case?
Andy

On 24/01/18 23:40, Chris Tomlinson wrote:

On the latest 3.7.0-SNAPSHOT (master branch) I also saw repeated occurrences of this the other day while running some queries from the fuseki browser app and with a database load going on with our own app using:

    DatasetAccessorFactory.createHTTP(baseUrl + "/data");

with, for the first model to transfer:

    DatasetAccessor putModel(graphName, m);

and for following models:

    static void addToTransferBulk(final String graphName, final Model m) {
        if (currentDataset == null)
            currentDataset = DatasetFactory.createGeneral();
        currentDataset.addNamedModel(graphName, m);
        triplesInDataset += m.size();
        if (triplesInDataset > initialLoadBulkSize) {
            try {
                loadDatasetMutex(currentDataset);
                currentDataset = null;
                triplesInDataset = 0;
            } catch (TimeoutException e) {
                e.printStackTrace();
                return;
            }
        }
    }

As I say, the exceptions appeared while I was running some queries from the fuseki browser app:

    [2018-01-22 16:25:02] Fuseki INFO [475] 200 OK (17.050 s)
    [2018-01-22 16:25:03] Fuseki INFO [477] POST http://localhost:13180/fuseki/bdrcrw
    [2018-01-22 16:25:03] BindingTDB ERROR get1(?lit) org.apache.jena.tdb.base.file.FileException: In the middle of an alloc-write
        at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
        at org.apache.jena.tdb.base.objectfile.ObjectFileWrapper.read(ObjectFileWrapper.java:57)
        at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
        at org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186)
        at org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111)
        at org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70)
        at org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)