[jira] [Commented] (JENA-997) Remove .json from registration of RDF/JSON.
[ https://issues.apache.org/jira/browse/JENA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640129#comment-14640129 ] ASF subversion and git services commented on JENA-997: -- Commit cb6fef61a2b23c3805dc413424ae1568064aa63e in jena's branch refs/heads/master from [~andy.seaborne] [ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=cb6fef6 ] JENA-997: Add companion test to general testing. Remove .json from registration of RDF/JSON. - Key: JENA-997 URL: https://issues.apache.org/jira/browse/JENA-997 Project: Apache Jena Issue Type: Improvement Reporter: Andy Seaborne Assignee: Andy Seaborne Priority: Minor Fix For: Jena 3.0.0 To avoid confusion with JSON-LD, remove .json from registration of RDF/JSON. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
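Why does a shared `.json` extension cause confusion? An extension-to-language lookup can only return one language per extension. The minimal sketch below shows this. The class is hypothetical and is not Jena's `RDFLanguages` API. The extensions used with it (`rj`, `jsonld`, and JSON-LD keeping `json`) are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical extension registry (not Jena's RDFLanguages API): maps a file
// extension, without the leading dot, to a single language name.
class ExtensionRegistry {
    private final Map<String, String> byExtension = new HashMap<>();

    // Refuses to give one extension to two languages, because a filename lookup
    // could then only ever return one of them.
    void register(String language, String extension) {
        String key = extension.toLowerCase(Locale.ROOT);
        String existing = byExtension.get(key);
        if (existing != null && !existing.equals(language)) {
            throw new IllegalStateException(
                    "." + key + " is already registered for " + existing + ", not " + language);
        }
        byExtension.put(key, language);
    }

    // Returns the language registered for the filename's extension, or null.
    String languageFor(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return null;
        }
        return byExtension.get(filename.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

Suppose JSON-LD is assumed to own `json`. Then a later `register("RDF/JSON", "json")` throws. The issue avoids that ambiguity by simply not registering `.json` for RDF/JSON.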
[jira] [Updated] (JENA-998) Exception in jena-text when executing query with subject already bound
[ https://issues.apache.org/jira/browse/JENA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Seaborne updated JENA-998: --- Description: An exception results when querying with jena-text where the subject is already bound to a concrete value. Example: {code} select * where { ?s rdf:type <http://example.org/Entity> . ?s text:query ( rdfs:label "test" ) . ?s rdfs:label ?o . } {code} This is caused by the fact that when the subject is concrete, the code is not properly checking to see if the score variable exists before trying to bind the score to it. Results: {code} java.lang.NullPointerException at org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112) at org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109) at org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at
org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74) at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59) at org.apache.jena.atlas.iterator.Iter.reduce(Iter.java:165) at org.apache.jena.atlas.iterator.Iter.toList(Iter.java:111) at org.apache.jena.query.text.TestTextTDB.itShouldWorkWithConcreteSubject(TestTextTDB.java:199) {code} was: An exception results when querying with jena-text where the subject is already bound to a concrete value. Example: {code} select * where { ?s rdf:type <http://example.org/Entity> . ?s text:query ( rdfs:label "test" ) . } {code} This is caused by the fact that when the subject is concrete, the code is not properly checking to see if the score variable exists before trying to bind the score to it.
Results: {code} java.lang.NullPointerException at org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112) at org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109) at org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63) at
[jira] [Commented] (JENA-997) Remove .json from registration of RDF/JSON.
[ https://issues.apache.org/jira/browse/JENA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640139#comment-14640139 ] Andy Seaborne commented on JENA-997: (Ignore previous - wrong JIRA number in commit message)
[jira] [Commented] (JENA-998) Exception in jena-text when executing query with subject already bound
[ https://issues.apache.org/jira/browse/JENA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640132#comment-14640132 ] Andy Seaborne commented on JENA-998: Added a companion test to the general test framework - it does not illustrate the problem because the stacktrace is TDB-specific. (Also adjusted the test added to match the description.) Exception in jena-text when executing query with subject already bound -- Key: JENA-998 URL: https://issues.apache.org/jira/browse/JENA-998 Project: Apache Jena Issue Type: Bug Components: Text Reporter: Stephen Allen Assignee: Stephen Allen An exception results when querying with jena-text where the subject is already bound to a concrete value. Example: {code} select * where { ?s rdf:type <http://example.org/Entity> . ?s text:query ( rdfs:label "test" ) . } {code} This is caused by the fact that when the subject is concrete, the code is not properly checking to see if the score variable exists before trying to bind the score to it.
Results: {code} java.lang.NullPointerException at org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112) at org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109) at org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at 
org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74) at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59) at org.apache.jena.atlas.iterator.Iter.reduce(Iter.java:165) at org.apache.jena.atlas.iterator.Iter.toList(Iter.java:111) at org.apache.jena.query.text.TestTextTDB.itShouldWorkWithConcreteSubject(TestTextTDB.java:199) {code}
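The description says the failure comes from binding the score without first checking that a score variable exists. Below is a minimal sketch of that guard in plain Java. The class names are hypothetical. Jena's real `Binding` and property-function classes look different, and this is not the committed fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a query solution (variable name -> value). Like the
// binding code in the stack trace, it rejects a null variable.
class SimpleBinding {
    private final Map<String, Object> values = new HashMap<>();

    void add(String var, Object value) {
        Objects.requireNonNull(var, "variable must not be null");
        values.put(var, value);
    }

    Object get(String var) {
        return values.get(var);
    }

    int size() {
        return values.size();
    }
}

// Sketch of the missing check: scoreVar is null when the text:query pattern
// did not ask for a score, so there is nothing to bind.
class ScoreBinder {
    static void bindScore(SimpleBinding binding, String scoreVar, float score) {
        if (scoreVar == null) {
            return; // without this guard, add(null, ...) throws NullPointerException
        }
        binding.add(scoreVar, score);
    }
}
```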
[jira] [Updated] (JENA-998) Exception in jena-text when executing query with subject already bound
[ https://issues.apache.org/jira/browse/JENA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Seaborne updated JENA-998: --- Description: An exception results when querying with jena-text where the subject is already bound to a concrete value. Example: {code} select * where { ?s rdf:type <http://example.org/Entity> . ?s text:query ( rdfs:label "test" ) . } {code} This is caused by the fact that when the subject is concrete, the code is not properly checking to see if the score variable exists before trying to bind the score to it. Results: {code} java.lang.NullPointerException at org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112) at org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109) at org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at
org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74) at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59) at org.apache.jena.atlas.iterator.Iter.reduce(Iter.java:165) at org.apache.jena.atlas.iterator.Iter.toList(Iter.java:111) at org.apache.jena.query.text.TestTextTDB.itShouldWorkWithConcreteSubject(TestTextTDB.java:199) {code} was: An exception results when querying with jena-text where the subject is already bound to a concrete value. Example: {code} select * where { ?s rdf:type <http://example.org/Entity> . ?s text:query ( rdfs:label "test" ) . ?s rdfs:label ?o . } {code} This is caused by the fact that when the subject is concrete, the code is not properly checking to see if the score variable exists before trying to bind the score to it.
Results: {code} java.lang.NullPointerException at org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108) at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112) at org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109) at org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104) at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74) at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111) at org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63) at
[jira] [Commented] (JENA-997) Remove .json from registration of RDF/JSON.
[ https://issues.apache.org/jira/browse/JENA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640136#comment-14640136 ] ASF subversion and git services commented on JENA-997: -- Commit 968f7e0a39989220263259b9e267fd718502931e in jena's branch refs/heads/master from [~andy.seaborne] [ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=968f7e0 ] JENA-997: Slightly simplified test case
Re: Jena3 release status
Rob,

Still some problems :-(

1/ FREE_MEM

For production use, are you expecting warnings on every run when the file size is larger than free memory? I have typically used tdbloader2 on files much bigger than memory, so a WARN each time the program runs is a bit excessive.

It isn't being calculated correctly - on non-darwin it calls FREE_MEM=$(free -b) and does no further processing, but that is a long, messy string. Which figure from the multi-line output of free -b is it looking for? (What's {6} in the OSX top output?)

echo $OSTYPE == linux-gnu

2/ The use of pv is causing terminal problems (sic).

In a gnome terminal, character echo is turned off when tdbloader2 exits. In an emacs shell buffer, the load hangs. I haven't found out why pv is doing this. I wonder if two pv invocations in the same line are mixing up characters on output, leading to broken terminal control sequences, especially with the small file I was using.

It's a really nice feature to be able to see the progress. Would it be safer for this release to switch off pv with a simple HAS_PV=0, to give time for testing in different environments (other *nixes, cygwin)?

Andy

On 23/07/15 15:40, Rob Vesse wrote:
> Comments inline:
>
> On 23/07/2015 14:41, Andy Seaborne a...@apache.org wrote:
>> Trying to do a release, I came across some issues.
>>
>> JENA-992: (Refactor graph/permissions interface layer)
>> Not sure of the status of this, but I'm assuming that the code already in 'master' is releasable.
>>
>> JENA-977: (tdbloader2 script refactoring)
>> The new scripts misbehave on Linux - there isn't one (obvious) issue. To unblock the release, if there is a small fix, then great. Another possibility is to revert to the older scripts for 3.0.0 and fix them afterwards. This gives more time and space for testing.
>
> Looks to be relatively simple; I think I have resolved the bugs you identified.
>
> For Case 1, I needed to look up the drive info based on the directory where the work files will be created, not the work file itself, because that doesn't exist yet. As part of fixing this I also made the script resistant to errors where the drive information is unavailable.
>
> For Case 2, I was checking the directory before I had ensured it existed and was a directory, so that just required changing the order of checks.
>
> Rob
>
>> It looks to me like bash on OSX is derived from bash 3.2 (3.2 originally dates from 2006), whereas on Ubuntu it is currently 4.3. Other issues with bash or other commands might arise once the current ones are resolved.
>>
>> Andy
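The sketch below mirrors Rob's two fixes in Java to make the order of operations explicit. It is a sketch only: the real tdbloader2 scripts are bash, the class and method names are hypothetical, and using -1 as the "not available" value is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Java sketch of the two tdbloader2 fixes described above (the real scripts are bash).
class WorkDirCheck {
    static final long NOT_AVAILABLE = -1L;

    // Case 1: the work file does not exist yet, so find the closest existing
    // ancestor directory and ask its filesystem for the usable space.
    static long usableSpaceFor(Path workFile) {
        Path dir = workFile.toAbsolutePath().getParent();
        while (dir != null && !Files.exists(dir)) {
            dir = dir.getParent();
        }
        if (dir == null) {
            return NOT_AVAILABLE;
        }
        try {
            return Files.getFileStore(dir).getUsableSpace();
        } catch (IOException | SecurityException e) {
            // Tolerate missing drive information instead of failing the load.
            return NOT_AVAILABLE;
        }
    }

    // Case 2: make sure the directory exists first, and only then check that it is a directory.
    static boolean prepareWorkDir(Path dir) {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            return false; // e.g. a regular file is in the way
        }
        return Files.isDirectory(dir);
    }
}
```

Walking up to the nearest existing ancestor is one way to "look up the drive info based on the directory". It assumes the work directory will be created on the same filesystem as that ancestor, which does not hold if a mount point sits in between.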
[jira] [Commented] (JENA-999) Poor jena-text query performance when a bound subject is used
[ https://issues.apache.org/jira/browse/JENA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640235#comment-14640235 ] Andy Seaborne commented on JENA-999: I think Option 1 should at least be tried first - if it's not happening when it could, that looks more like a bug than anything else. It makes a lot of sense for the case of {{text:query}} being an additional filter on the earlier pattern. Score is not the most common usage, and it is fairly meaningless in this case anyway. Indeed, maybe the code should only deal with score when text:query is called without a bound subject. I think it would be easier to get an option-2-like effect by implementing {{exec(QueryIterator)}} rather than the current {{exec(Binding)}}: then the whole input is visible and a strategy can be chosen, which can include one call to the text index with the subject unknown. {{exec(Binding)}} is there for most property functions because it's easier to work with, but it's a convenience form of the general case. The caching approach looks complicated in the way that memory can be used up and hang around (your update example, and also where there is a {{text:query}} early and not again). Thought: maybe we should add an explicit algebra operator for text query. Poor jena-text query performance when a bound subject is used - Key: JENA-999 URL: https://issues.apache.org/jira/browse/JENA-999 Project: Apache Jena Issue Type: Improvement Reporter: Stephen Allen Assignee: Stephen Allen Priority: Minor When executing a jena-text query, the performance is terrible if the subject variable is already bound. This is because, on every iteration, the current code executes a new Lucene query that does not have the subject/entity bound, and then iterates through the Lucene results to join against the subject. This is quite inefficient. Example query: {code} select * where { ?s rdf:type <http://example.org/Entity> . ?s text:query ( rdfs:label "test" ) .
} {code} This would be quite slow if there were a lot of entities in the system. Two potential solutions present themselves: # Craft a more explicit Lucene query that specifies the entity URI, so that the results coming back from Lucene are much smaller. However, this would cause problems with the score not being correct across multiple iterations. Additionally, we would still potentially be running a lot of Lucene queries, each of which probably has a non-negligible constant cost (parsing the query string, etc.). # Execute the more general Lucene query the first time it is encountered, then cache the results somewhere. From there, we can perform a hash table lookup against those cached results. I would like to pursue option 2, but there is a problem. Because jena-text is implemented as a property function instead of a query op in and of itself (like QueryIterMinus, for example), we have to find a place to stash the Lucene results. I believe this can be done by placing them in the ExecutionContext object, using the Lucene query as a cache key. Updates provide a slightly troubling case because you could have an update request like: {code} insert data { <urn:test1> rdf:type <http://example.org/Entity> ; rdfs:label "test" } ; delete { ?s ?p ?o } where { ?s rdf:type <http://example.org/Entity> ; text:query ( rdfs:label "test" ) ; ?p ?o . } ; insert data { <urn:test2> rdf:type <http://example.org/Entity> ; rdfs:label "test" } ; delete { ?s ?p ?o } where { ?s rdf:type <http://example.org/Entity> ; text:query ( rdfs:label "test" ) ; ?p ?o . } {code} The end result should be an empty database. But if the ExecutionContext were the same for both delete queries, the second delete query would use the cached results from the first, and {{urn:test2}} would not be deleted.
If the ExecutionContext is indeed shared between the two update queries in the situation above, I think this can be solved by making the cache key for the Lucene result set a combination of the Lucene query and the QueryIterRoot or BindingRoot. I need to investigate this. Alternatively, if there were a way to be notified when a query has finished executing, we could clear the cache in the ExecutionContext.
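Option 2 from the description could look roughly like the sketch below, with the cache key widened to include the execution as the last paragraph suggests. All names are hypothetical and this is not jena-text code. The text index is abstracted as a `Function` that runs the query with the subject unbound. Cache eviction is not handled, and Andy Seaborne's comment flags that as the weak point of caching.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of "option 2": run the general text query once per
// (text query, execution) pair, then answer each bound-subject probe by hash lookup.
class TextResultCache {
    // Key = text query + an object identifying one query execution. The execution
    // object stands in for the QueryIterRoot/BindingRoot idea and is compared by identity.
    private static final class Key {
        final String textQuery;
        final Object execution;

        Key(String textQuery, Object execution) {
            this.textQuery = textQuery;
            this.execution = execution;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return textQuery.equals(other.textQuery) && execution == other.execution;
        }

        @Override
        public int hashCode() {
            return 31 * textQuery.hashCode() + System.identityHashCode(execution);
        }
    }

    private final Map<Key, Map<String, Float>> cache = new HashMap<>();
    // Assumed callback: runs the text query with the subject unbound and
    // returns subject -> score for every hit.
    private final Function<String, Map<String, Float>> runTextQuery;
    private int queriesRun = 0;

    TextResultCache(Function<String, Map<String, Float>> runTextQuery) {
        this.runTextQuery = runTextQuery;
    }

    // Score for an already-bound subject, or null if it did not match.
    Float scoreFor(String textQuery, Object execution, String subject) {
        Map<String, Float> hits = cache.computeIfAbsent(new Key(textQuery, execution), k -> {
            queriesRun++;
            return runTextQuery.apply(k.textQuery);
        });
        return hits.get(subject);
    }

    int queriesRun() {
        return queriesRun;
    }
}
```

Each of the two DELETE operations in the update example gets its own execution object, so the second one runs a fresh text query instead of reusing hits from the first.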
[jira] [Commented] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640311#comment-14640311 ] Andy Seaborne commented on JENA-977: Loading BSBM 5m, which is less than 1G on disk. This works on Jena 2.13.0 with tdbloader2. With the new scripts: {noformat} 11:51:26 ERROR Insufficient free space on database drive /dev/sdb4, there are 170276476 bytes free but 255030549 bytes are required {noformat} However, {{df}} and {{df -h}} afterwards (without deleting intermediates) show that the reported {{170276476 bytes}} is really a count of 1K blocks - the output of {{df}} on Ubuntu is in 1K blocks: {noformat} Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb4 219947820 38579656 170172416 19% /home {noformat} {noformat} Filesystem Size Used Avail Use% Mounted on /dev/sdb4 210G 37G 163G 19% /home {noformat} tdbloader2 script refactoring - Key: JENA-977 URL: https://issues.apache.org/jira/browse/JENA-977 Project: Apache Jena Issue Type: Improvement Components: TDB Affects Versions: Jena 2.13.0 Reporter: Rob Vesse Assignee: Rob Vesse Priority: Blocker Fix For: Jena 2.13.1, Jena 3.0.0 As noted on the dev list the current scripts are a little rough around the edges, work items include: - Splitting data and index phase into separate scripts - Being able to restart a build from a later phase - Progress monitoring for the sort portion of indexing - Warning if sort is using a disk where you may have insufficient space - Better usage summaries - Better argument handling (avoid relying on magic environment variables wherever possible)
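The numbers suggest the script compared df's Available column directly against a byte count. If so, the fix is a unit conversion. Here is a sketch using the figures from this comment; the class and method names are hypothetical. 170172416 1K blocks is 174256553984 bytes, about 162 GiB, which `df -h` rounds up to 163G. That is far more than the 255030549 bytes required. Read as bytes, the same number falls below the requirement.

```java
// Sketch: plain `df` on Ubuntu reports sizes in 1K blocks, so the Available
// figure must be multiplied by 1024 before it is compared with a byte count.
class DiskSpace {
    static final long BYTES_PER_DF_BLOCK = 1024L;

    static long availableBytes(long availableKBlocks) {
        return Math.multiplyExact(availableKBlocks, BYTES_PER_DF_BLOCK); // fails loudly on overflow
    }

    static boolean hasEnoughSpace(long availableKBlocks, long requiredBytes) {
        return availableBytes(availableKBlocks) >= requiredBytes;
    }
}
```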
[jira] [Commented] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640270#comment-14640270 ] ASF subversion and git services commented on JENA-977: -- Commit 8b6a36dd2703c1d6e9ed893ef9ecd1f06358d712 in jena's branch refs/heads/master from [~rvesse] [ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=8b6a36d ] More fixes for JENA-977 - Disable pipe viewer usage for the time being since it hangs some terminals - Tone down some warnings to just debug level messages - Clean up output of free for Linux 3.x kernels - Only try to use free if running a Linux kernel
[jira] [Commented] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640284#comment-14640284 ] ASF subversion and git services commented on JENA-977: -- Commit 62558f47b6aa01b332f2f2fba5f829ab9673ea28 in jena's branch refs/heads/master from [~rvesse] [ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=62558f4 ] Fix typo in extracting value from free (JENA-977) head was missing -n 1
[jira] [Commented] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640286#comment-14640286 ] Rob Vesse commented on JENA-977: Yep, should copy paste rather than re-typing from memory
[jira] [Commented] (JENA-999) Poor jena-text query performance when a bound subject is used
[ https://issues.apache.org/jira/browse/JENA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640237#comment-14640237 ] Andy Seaborne commented on JENA-999: Various join algorithms: [join library|https://github.com/afs/mantis/tree/master/dboe-quack/src/main/java/org/seaborne/dboe/engine/join], part of TDB2 work.
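Andy Seaborne's earlier suggestion on JENA-999 amounts to choosing a join strategy. He proposed implementing `exec(QueryIterator)`, so that the whole input is visible and the text index is called once with the subject unknown. A hash join is one such strategy. Below is a minimal sketch with hypothetical names. It is not the API of the linked join library. It also ignores the score question, which that comment calls fairly meaningless when the subject is bound.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: with the whole input visible, call the text index once
// with the subject unbound, then hash-join its hits against the input subjects.
class TextHashJoin {
    static final class Hit {
        final String subject;
        final float score;

        Hit(String subject, float score) {
            this.subject = subject;
            this.score = score;
        }
    }

    // inputSubjects: ?s values from the earlier pattern (e.g. ?s rdf:type ...).
    // textHits: all hits of one text query run with ?s unbound.
    static List<Hit> join(List<String> inputSubjects, List<Hit> textHits) {
        // Build phase: index the text hits by subject (a subject may match more than once).
        Map<String, List<Hit>> bySubject = new HashMap<>();
        for (Hit h : textHits) {
            bySubject.computeIfAbsent(h.subject, k -> new ArrayList<>()).add(h);
        }
        // Probe phase: one output row per matching (input row, hit) pair, in input order.
        List<Hit> out = new ArrayList<>();
        for (String s : inputSubjects) {
            List<Hit> matches = bySubject.get(s);
            if (matches != null) {
                out.addAll(matches);
            }
        }
        return out;
    }
}
```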
[jira] [Commented] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640283#comment-14640283 ] Andy Seaborne commented on JENA-977: Now get: {noformat} /home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 350: [: 22501806080 28475846656 34359734272: integer expression expected {noformat} {{echo $FREE_MEM | tail -n +2 | head | awk '{print $4}')}} Should that be {{head -1}}?
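Why does the error list three numbers? Plain `head` passes up to ten lines, so awk would print the fourth field of every remaining line of `free -b`, not just the first. Below is a Java emulation of the pipeline. It assumes the older procps `free` layout, which still had a `-/+ buffers/cache` line, giving Mem, -/+ buffers/cache and Swap lines after the header. It also assumes the script keeps the output's line breaks; an unquoted `echo $FREE_MEM` would join them onto one line. The sample puts the three numbers from the error in the free column and uses zero placeholders everywhere else.

```java
import java.util.ArrayList;
import java.util.List;

// Emulates `tail -n +2 | head -n N | awk '{print $4}'` applied to `free -b`
// output, to show why plain `head` (N = 10) yields several numbers.
class FreePipeline {
    static List<String> fourthFields(String freeOutput, int headLines) {
        String[] lines = freeOutput.split("\n");
        List<String> fields4 = new ArrayList<>();
        int end = Math.min(lines.length, 1 + headLines); // tail -n +2 skips line 1
        for (int i = 1; i < end; i++) {
            String[] fields = lines[i].trim().split("\\s+");
            // awk prints an empty line when $4 does not exist
            fields4.add(fields.length >= 4 ? fields[3] : "");
        }
        return fields4;
    }
}
```

With `headLines = 10` the sample yields all three free figures, matching the error. With `headLines = 1`, which is the `head -n 1` fix, only the `Mem:` line's free figure remains.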
Re: Jena3 release status
Comments inline

On 24/07/2015 10:12, Andy Seaborne a...@apache.org wrote:
Rob,
Still some problems :-(

1/ FREE_MEM
For production use, are you expecting warnings on every run when the file size is larger than free memory? I have typically used tdbloader2 on large files way bigger than memory, so a WARN each time the program runs is a bit excessive.

Agreed, a couple of warnings have been reduced to debug.

It isn't being calculated correctly - on non-darwin it calls FREE_MEM=$(free -b) and does no further processing, but that is a long messy string. Which figure from the multi-line output of free -b is it looking for? (What's {6} on OSX top output?)
echo $OSTYPE == linux-gnu

This appears to be a 2.x vs 3.x kernel issue. When I tested this on some of our internal Linux servers (which are 2.x kernels) free -b just returns an integer; as you note, on newer (3.x I assume) kernels it instead prints a more complex output. I have changed the logic to check for complex output and extract the desired value, to do an extra check for numerics with errors suppressed, and to return the not-available value if the result is not numeric. I have also restricted the functionality to just OSTYPE linux*.

2/ The use of pv is causing terminal problems (sic). In a gnome terminal, character echo is turned off when tdbloader2 exits. In an emacs shell buffer, the load hangs. Haven't found out why pv is doing this. I wonder if two in the same line are causing character mix-up on output, leading to broken terminal control sequences, especially with the small file I was using.
It's a really nice feature to have to see the progress. Would it be safer for this release to switch off pv, to give time for testing in different environments (other *nixes, cygwin), with a simple HAS_PV=0?

Agreed, actually it is HAS_PV=1 because I just use the return of `which pv` to detect it, so 0 is enabled and non-zero is disabled. I have removed the auto-detection and defaulted it to off, but users can set HAS_PV=0 in their environment if they know pv works reliably in their environment.

Rob

Andy

On 23/07/15 15:40, Rob Vesse wrote:
Comments inline:
On 23/07/2015 14:41, Andy Seaborne a...@apache.org wrote:
Trying to do a release, I came across some issues.
JENA-992: (Refactor graph/permissions interface layer) Not sure of the status of this but I'm assuming that the code already in 'master' is releasable.
JENA-977: (tdbloader2 script refactoring) The new scripts misbehave on Linux - there isn't one (obvious) issue. To unblock the release, if there is a small fix, then great. Another possibility is to revert to the older scripts for 3.0.0, so as to fix afterwards. This gives more time and space for testing.

Looks to be relatively simple; I think I have the bugs you identified resolved. For Case 1 I needed to look up the drive info based on the directory where the work files will be created and not the work file itself, because that doesn't exist yet. As part of fixing this I also made the script resistant to errors where the drive information was unavailable. For Case 2 I was checking the directory before I had ensured it existed and was a directory, so that just required changing the order of checks.
Rob

It looks to me like bash on OSX is bash 3.2 derived (3.2 was originally 2006) whereas on Ubuntu currently it is 4.3. There might be other issues that arise if the current ones are resolved with bash or other commands.
Andy
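A minimal sketch of the opt-in convention described above, assuming HAS_PV keeps the exit-status meaning of `which pv` (0 means use pv, anything else means don't) and defaults to off. The function name `show_progress` is hypothetical; it is not the function used in tdbloader2.

```shell
#!/usr/bin/env bash
# HAS_PV follows exit-status semantics, as with `which pv`:
# 0 = use pv, any other value = do not use pv.
# Default to "off" (1) unless the user has exported HAS_PV=0.
HAS_PV="${HAS_PV:-1}"

# Hypothetical helper: stream a file to stdout, through pv only when enabled.
show_progress() {
  if [ "$HAS_PV" = "0" ]; then
    pv "$1"
  else
    cat "$1"
  fi
}
```

A user who knows pv behaves well in their terminal would opt in with `export HAS_PV=0` before running the loader; everyone else gets plain `cat` and no terminal control sequences.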
[jira] [Commented] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640281#comment-14640281 ] ASF subversion and git services commented on JENA-977: -- Commit d734e8abec82cfdf4c8ca9fdb415ba095077f093 in jena's branch refs/heads/master from [~rvesse] [ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=d734e8a ] Extra checking for free memory return (JENA-977) Really verify that the extracted free value is numeric and return unavailable if it is not
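A minimal sketch of "verify that the extracted free value is numeric and return unavailable if it is not", assuming the check uses {{test}}'s integer comparison with its error output discarded. The sentinel value {{-1}} and the function name are hypothetical; the script's actual "unavailable" value may differ. A multi-number string like the one in the earlier {{integer expression expected}} error is rejected rather than reaching a later comparison.

```shell
#!/usr/bin/env bash
# Hypothetical sentinel for "value not available".
NOT_AVAILABLE=-1

# Print $1 if it is an integer, otherwise print $NOT_AVAILABLE.
# `[ x -eq x ]` fails with "integer expression expected" for non-integers;
# stderr is discarded so the failure is silent.
numeric_or_unavailable() {
  if [ -n "$1" ] && [ "$1" -eq "$1" ] 2>/dev/null; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$NOT_AVAILABLE"
  fi
}
```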
[jira] [Comment Edited] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640283#comment-14640283 ] Andy Seaborne edited comment on JENA-977 at 7/24/15 10:43 AM: -- Now get: {noformat} /home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 350: [: 22501806080 28475846656 34359734272: integer expression expected {noformat} {noformat} echo $FREE_MEM | tail -n +2 | head | awk '{print $4}') {noformat} Should that be {{head -1}} if it's the 22501806080 the code is after?
[jira] [Comment Edited] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640311#comment-14640311 ] Andy Seaborne edited comment on JENA-977 at 7/24/15 11:08 AM: -- Loading BSBM 5m which is less than 1G on disk. It works with Jena 2.13.0 with tdbloader2. {noformat} 11:51:26 ERROR Insufficient free space on database drive /dev/sdb4, there are 170276476 bytes free but 255030549 bytes are required {noformat} {{170276476 bytes}} - the output of {{df}} on Ubuntu is in 1K blocks. {noformat} Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb4 219947820 38579656 170172416 19% /home {noformat} {noformat} Filesystem Size Used Avail Use% Mounted on /dev/sdb4 210G 37G 163G 19% /home {noformat}
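A minimal sketch of reading free space in bytes rather than treating {{df}}'s 1K-block figure as a byte count, assuming the POSIX {{df -P -k}} output format: a header line, then one line per filesystem with available 1024-byte blocks in the fourth column ({{-P}} keeps each filesystem on a single line even when the device name is long). The helper name and {{WORK_DIR}} are hypothetical. Fed the Ubuntu {{df}} figures above, it gives 174256553984 bytes for the 170172416 available blocks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -P -k` output on stdin and print the
# available space of the first filesystem listed, converted to bytes.
# Column 4 is "Available", in 1024-byte blocks because of -k.
df_available_bytes() {
  awk 'NR == 2 { printf "%.0f\n", $4 * 1024; exit }'
}

# Typical use (WORK_DIR is a hypothetical variable for the work directory):
# df -P -k "$WORK_DIR" | df_available_bytes

# Worked example using the 1K-block figures from the Ubuntu output above:
printf '%s\n' \
  'Filesystem 1024-blocks Used Available Capacity Mounted on' \
  '/dev/sdb4 219947820 38579656 170172416 19% /home' | df_available_bytes
# prints 174256553984
```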
[jira] [Commented] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640648#comment-14640648 ] Andy Seaborne commented on JENA-977: Where else is this going to matter? The file size is compared to the free memory, isn't it? Is FREE_MEM in bytes, and is it compared to SIZE? On the FREE_MEM check - sort is set to use {{--buffer-size=50%}} -- does that matter?
Re: Jena3 release status
I've already done one trial build and, if the shell script issue gets sorted out and I have time (it's already taken quite a lot of time), then I hope to do the vote-able process as soon as possible, like tomorrow. No promises but.

Andy

On 24/07/15 15:54, Claude Warren wrote:
I have one more piece to add to JENA-992. I have it at home working and will add it this evening. Following that a clean build will close the issue.
Re: Jena3 release status
I have one more piece to add to JENA-992. I have it at home working and will add it this evening. Following that a clean build will close the issue. -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
[jira] [Commented] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640470#comment-14640470 ] Rob Vesse commented on JENA-977: Yes OS X has {{ls -k}}
[jira] [Created] (JENA-1000) tdbdump / tdbloader sequence corrupts rdf:type predicates
Donald Pellegrino created JENA-1000:
---
Summary: tdbdump / tdbloader sequence corrupts rdf:type predicates
Key: JENA-1000
URL: https://issues.apache.org/jira/browse/JENA-1000
Project: Apache Jena
Issue Type: Bug
Components: TDB
Affects Versions: Jena 2.13.0, Jena 2.12.1
Environment: Tested with tdbloader and tdbloader2 versions 2.12.1 and 2.13.0 on Windows/Sun Java 1.7.0_60 and CentOS 6.3/OpenJDK 1.8.0_25. tdbdump was 2.12.1 on CentOS 6.3/OpenJDK 1.8.0_25.
Reporter: Donald Pellegrino
Priority: Critical

Steps to reproduce:
1. A TDB database was exported to N-Quads with tdbdump.
2. The dump file was then imported into a new TDB database with tdbloader2.
3. Observe that all rdf:type predicates were replaced with the same randomly selected predicate.

Work-around:
A work-around was to run a DELETE/INSERT SPARQL command to reassign rdf:type predicates after the load:
DELETE { ?s custom:200501898-4-1 ?o } INSERT { ?s rdf:type ?o } WHERE { ?s custom:200501898-4-1 ?o }

Testing:
The behavior was consistent across multiple reloads of the same dump file. tdbloader and tdbloader2 were both used for loads and they were run on both Windows and Linux with the same results.

Note that this is a Critical issue as it leads to silent corruption of user data.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JENA-1000) tdbdump / tdbloader sequence corrupts rdf:type predicates
[ https://issues.apache.org/jira/browse/JENA-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640961#comment-14640961 ] Andy Seaborne commented on JENA-1000: - Please could you provide a short data file that shows the problem for you.
[jira] [Comment Edited] (JENA-1000) tdbdump / tdbloader sequence corrupts rdf:type predicates
[ https://issues.apache.org/jira/browse/JENA-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640961#comment-14640961 ] Andy Seaborne edited comment on JENA-1000 at 7/24/15 7:31 PM: -- Please could you provide a short data file that shows the problem for you. Does the n-quads dump show the problem?
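One way to check whether the N-Quads dump already shows the problem, before any reload, is to count predicates in the dump file directly. A minimal sketch, assuming line-based N-Quads in which the predicate is the second whitespace-separated term on each line (subjects are IRIs or blank nodes, neither of which can contain whitespace). The function names are hypothetical.

```shell
#!/usr/bin/env bash
RDF_TYPE='<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>'

# Hypothetical helper: count statements in N-Quads file $1 whose
# predicate (second term) is rdf:type.
count_rdf_type() {
  awk -v p="$RDF_TYPE" '$2 == p { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: list the $2 (default 10) most frequent predicates,
# skipping blank and comment lines, so a predicate that has taken the
# place of rdf:type would stand out.
top_predicates() {
  awk 'NF > 0 && $1 !~ /^#/ { print $2 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

If {{count_rdf_type}} on the dump reports a plausible number of rdf:type statements, the corruption is more likely introduced on load than by tdbdump.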
[jira] [Commented] (JENA-977) tdbloader2 script refactoring
[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640432#comment-14640432 ] Andy Seaborne commented on JENA-977: getSize and getDriveInfo need to be in step as to units. On OSX, df output is in 512-byte blocks, so the getDriveInfo calculation may be wrong on Macs as well. The manual implies ls -l output is in bytes. df -k (1K block size) seems to exist on Mac, Ubuntu and OpenBSD, but I can't check other Linux distributions or *nixes. ls does have a -k flag where I am. Is -k the right way to go?
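A minimal sketch of keeping both sides of the disk-space check in one unit, assuming bytes throughout. Here {{wc -c}} is swapped in for {{ls -l}} to get a file size, since POSIX defines its output as a byte count and it avoids parsing {{ls}} columns. The function names are hypothetical and are not the script's {{getSize}} / {{getDriveInfo}}. The worked example reuses the figures from the ERROR line reported earlier in this issue, reading 170276476 as 1K blocks and assuming the required figure is already in bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: size of file $1 in bytes.
# Some wc implementations (e.g. BSD) pad the count with blanks, so strip them.
file_size_bytes() {
  wc -c < "$1" | tr -d ' \t'
}

# Hypothetical helper: succeed if REQUIRED ($1) fits in FREE ($2).
# Both arguments must already be in bytes; a free figure taken from
# `df -k` is in 1024-byte blocks and has to be multiplied by 1024 first.
enough_space() {
  [ "$1" -le "$2" ]
}

# Worked example: 170276476 1K blocks = 174363111424 bytes free.
if enough_space 255030549 $((170276476 * 1024)); then
  echo "enough space"
fi
```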