[jira] [Commented] (JENA-997) Remove .json from registration of RDF/JSON.

2015-07-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640129#comment-14640129
 ] 

ASF subversion and git services commented on JENA-997:
--

Commit cb6fef61a2b23c3805dc413424ae1568064aa63e in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=cb6fef6 ]

JENA-997: Add companion test to general testing.

 Remove .json from registration of RDF/JSON.
 -------------------------------------------

             Key: JENA-997
             URL: https://issues.apache.org/jira/browse/JENA-997
         Project: Apache Jena
      Issue Type: Improvement
        Reporter: Andy Seaborne
        Assignee: Andy Seaborne
        Priority: Minor
         Fix For: Jena 3.0.0


 To avoid confusion with JSON-LD,  remove .json from registration of 
 RDF/JSON.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (JENA-998) Exception in jena-text when executing query with subject already bound

2015-07-24 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-998:
---
Description: 
An exception results when querying with jena-text where the subject is already 
bound to a concrete value.

Example:
{code}
select *
where {
  ?s rdf:type <http://example.org/Entity> .
  ?s text:query ( rdfs:label "test" ) .
  ?s rdfs:label ?o .
}
{code}

This is caused by the fact that when the subject is concrete, the code is not 
properly checking to see if the score variable exists before trying to bind the 
score to it.

Results:
{code}
java.lang.NullPointerException
at org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60)
at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108)
at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112)
at org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109)
at org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74)
at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59)
at org.apache.jena.atlas.iterator.Iter.reduce(Iter.java:165)
at org.apache.jena.atlas.iterator.Iter.toList(Iter.java:111)
at org.apache.jena.query.text.TestTextTDB.itShouldWorkWithConcreteSubject(TestTextTDB.java:199)
{code}

  was:
An exception results when querying with jena-text where the subject is already 
bound to a concrete value.

Example:
{code}
select *
where {
  ?s rdf:type <http://example.org/Entity> .
  ?s text:query ( rdfs:label "test" ) .
}
{code}

This is caused by the fact that when the subject is concrete, the code is not 
properly checking to see if the score variable exists before trying to bind the 
score to it.

Results:
{code}
java.lang.NullPointerException
at org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60)
at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108)
at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112)
at org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109)
at org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63)
at 

[jira] [Commented] (JENA-997) Remove .json from registration of RDF/JSON.

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640139#comment-14640139
 ] 

Andy Seaborne commented on JENA-997:


(Ignore previous - wrong JIRA number in commit message)



[jira] [Commented] (JENA-998) Exception in jena-text when executing query with subject already bound

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640132#comment-14640132
 ] 

Andy Seaborne commented on JENA-998:


Added a companion test to the general test framework - it does not illustrate 
the problem because the stacktrace is TDB-specific.

(Also adjusted the test added to match the description.)

 Exception in jena-text when executing query with subject already bound
 ----------------------------------------------------------------------

             Key: JENA-998
             URL: https://issues.apache.org/jira/browse/JENA-998
         Project: Apache Jena
      Issue Type: Bug
      Components: Text
        Reporter: Stephen Allen
        Assignee: Stephen Allen



[jira] [Updated] (JENA-998) Exception in jena-text when executing query with subject already bound

2015-07-24 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-998:
---
Description: 
An exception results when querying with jena-text where the subject is already 
bound to a concrete value.

Example:
{code}
select *
where {
  ?s rdf:type <http://example.org/Entity> .
  ?s text:query ( rdfs:label "test" ) .
}
{code}

This is caused by the fact that when the subject is concrete, the code is not 
properly checking to see if the score variable exists before trying to bind the 
score to it.

Results:
{code}
java.lang.NullPointerException
at org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60)
at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108)
at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112)
at org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109)
at org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74)
at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59)
at org.apache.jena.atlas.iterator.Iter.reduce(Iter.java:165)
at org.apache.jena.atlas.iterator.Iter.toList(Iter.java:111)
at org.apache.jena.query.text.TestTextTDB.itShouldWorkWithConcreteSubject(TestTextTDB.java:199)
{code}

  was:
An exception results when querying with jena-text where the subject is already 
bound to a concrete value.

Example:
{code}
select *
where {
  ?s rdf:type <http://example.org/Entity> .
  ?s text:query ( rdfs:label "test" ) .
  ?s rdfs:label ?o .
}
{code}

This is caused by the fact that when the subject is concrete, the code is not 
properly checking to see if the score variable exists before trying to bind the 
score to it.

Results:
{code}
java.lang.NullPointerException
at org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60)
at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108)
at org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112)
at org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109)
at org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104)
at org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74)
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63)
at 

[jira] [Commented] (JENA-997) Remove .json from registration of RDF/JSON.

2015-07-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640136#comment-14640136
 ] 

ASF subversion and git services commented on JENA-997:
--

Commit 968f7e0a39989220263259b9e267fd718502931e in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=968f7e0 ]

JENA-997: Slightly simplified test case




Re: Jena3 release status

2015-07-24 Thread Andy Seaborne

Rob,

Still some problems :-(


1/ FREE_MEM

For production use, are you expecting warnings on every run when the 
file size is larger than free memory?  I have typically used tdbloader2 
on large files way bigger than memory, so a WARN each time the program 
runs is a bit excessive.


It isn't being calculated correctly - on non-darwin it calls

FREE_MEM=$(free -b)

and does no further processing but that is a long messy string.

Which figure from multi-line output from free -b is it looking for?
(what's {6} on OSX top output?)


echo $OSTYPE
==
linux-gnu



2/ The use of pv is causing terminal problems (sic).

In a gnome terminal, character echo is turned off when tdbloader2 exits.
In an emacs shell buffer, the load hangs.

Haven't found out why pv is doing this.  I wonder if two in the same 
line are causing character mix up on output leading to broken terminal 
control sequences, especially with the small file I was using.


It's a really nice feature to have to see the progress. Would it be 
safer for this release to switch off pv to give time for testing in 
different environments (other *nixes, cygwin) with a simple HAS_PV=0 ?
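
A minimal sketch of what such a switch could look like, assuming the script sends its progress display through a single helper. HAS_PV is the flag named above; the helper name progress_filter is hypothetical and not taken from the tdbloader2 scripts.

```shell
# HAS_PV=0 disables pipe viewer entirely; any other value allows it.
# Default to 0 here so the progress display is opt-in (an assumption).
HAS_PV=${HAS_PV:-0}

# progress_filter (hypothetical name): copy stdin to stdout, showing progress
# with pv only when it is enabled, installed, and stderr is a terminal.
progress_filter() {
  if [ "$HAS_PV" != "0" ] && command -v pv >/dev/null 2>&1 && [ -t 2 ]; then
    pv
  else
    cat
  fi
}
```

Used as `progress_filter < input.txt | sort > sorted.txt`, the data path is the same whether or not pv is available, so turning it off only loses the progress display.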


Andy

On 23/07/15 15:40, Rob Vesse wrote:

Comments inline:

On 23/07/2015 14:41, Andy Seaborne a...@apache.org wrote:


Trying to do a release, I came across some issues.

JENA-992: (Refactor graph/permissions interface layer)

Not sure of the status of this but I'm assuming that the code already in
'master' is releasable.

JENA-977: (tdbloader2 script refactoring)

The new scripts misbehave on Linux - there isn't one (obvious) issue.

To unblock the release, if there is a small fix, then great.  Another
possibility is to revert to the older scripts for 3.0.0, so as to fix
afterwards. This gives more time and space for testing.


Looks to be relatively simple, think I have the bugs you identified
resolved

For Case 1 I needed to look up the drive info based on the directory where
the work files will be created and not the work file itself because that
doesn't exist yet.  As part of fixing this I also made the script
resistant to errors where the drive information was unavailable

For Case 2 I was checking the directory before I had ensured it existed
and was a directory so that just required changing the order of checks
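
A rough sketch of the reordered checks described for the two cases, not the actual tdbloader2 code. The function name check_work_dir and the message wording are hypothetical; the only commands assumed are `df -Pk` (POSIX portable output in 1K blocks), tail, head and awk.

```shell
# check_work_dir (hypothetical name): validate a directory before asking df
# about the drive it lives on. Fails only when the path itself is unusable.
check_work_dir() {
  local dir info
  dir="$1"
  # 1. Existence and type come first; df on a missing path just fails.
  if [ ! -e "$dir" ]; then
    echo "ERROR: $dir does not exist" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "ERROR: $dir is not a directory" >&2
    return 1
  fi
  # 2. Query the directory, not a work file that has yet to be created.
  if ! info=$(df -Pk "$dir" 2>/dev/null); then
    # Drive information unavailable: warn and carry on rather than abort.
    echo "WARN: no drive information for $dir" >&2
    return 0
  fi
  # Print the filesystem name from the first data row.
  echo "$info" | tail -n +2 | head -n 1 | awk '{print $1}'
}
```

Calling `check_work_dir /some/work/dir` prints the filesystem holding that directory, or reports why the path cannot be used.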

Rob



It looks to me like bash on OSX is bash 3.2 derived (3.2 was
originally 2006) whereas on Ubuntu currently it is 4.3.  There might be
other issues that arise if the current ones are resolved with bash or
other commands.
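
One way a script could guard against the version gap, sketched under the assumption that it wants to warn before using a feature missing from bash 3.2 (associative arrays, for example, arrived in bash 4.0). The helper name version_at_least is hypothetical; BASH_VERSION is the variable bash itself sets.

```shell
# version_at_least (hypothetical helper): succeed when a version string such
# as "4.3.11(1)-release" is at least MAJOR.MINOR.
version_at_least() {
  local version major rest minor
  version="$1"
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"        # text after the first dot
  minor="${rest%%[!0-9]*}"    # leading digits of the remainder
  [ "$major" -gt "$2" ] ||
    { [ "$major" -eq "$2" ] && [ "${minor:-0}" -ge "$3" ]; }
}
```

For example, `version_at_least "$BASH_VERSION" 4 0 || echo "WARN: bash older than 4.0" >&2` would warn on the OSX bash but stay quiet on the Ubuntu one.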

Andy









[jira] [Commented] (JENA-999) Poor jena-text query performance when a bound subject is used

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640235#comment-14640235
 ] 

Andy Seaborne commented on JENA-999:


I think Option 1 should at least be tried first - looks more like a bug than 
anything else if it's not happening and can. It makes a lot of sense for the 
case of {{text:query}} being an additional filter on the earlier pattern.  
Score is not the most common usage and also it is fairly meaningless in this 
case anyway. Indeed, maybe the code should only deal with score when the 
text:query is called without a bound subject.

I think it would be easier to get an option 2 like effect by implementing 
{{exec(QueryIterator)}}, and not the current implementation of 
{{exec(Binding)}}. {{exec(Binding)}} is there for most property functions 
because it's easier to work with but it's a convenience form of the general 
case.

Then the whole input is visible and a choice of strategy made, which can 
include one call to the text index with unknown subject.

The caching approach looks complicated in the way that memory can be used up 
and hang around (your update example, also where there is a {{text:query}} 
early and not again).

Thought: maybe we should add an explicit algebra operator for text query.


 Poor jena-text query performance when a bound subject is used
 -------------------------------------------------------------

             Key: JENA-999
             URL: https://issues.apache.org/jira/browse/JENA-999
         Project: Apache Jena
      Issue Type: Improvement
        Reporter: Stephen Allen
        Assignee: Stephen Allen
        Priority: Minor

 When executing a jena-text query, the performance is terrible if the subject 
 is already bound to a variable.  This is because the current code will 
 execute a new lucene query that does not have the subject/entity bound on 
 every iteration and then iterate through the lucene results to join against 
 the subject.  This is quite inefficient.
 Example query:
 {code}
 select *
 where {
   ?s rdf:type <http://example.org/Entity> .
   ?s text:query ( rdfs:label "test" ) .
 }
 {code}
 This would be quite slow if there were a lot of entities in the system.
 Two potential solutions present themselves:
 # Craft a more explicit lucene query that specifies the entity URI, so that 
 the results coming back from lucene are much smaller.  However, this would 
 cause problems with the score not being correct across multiple iterations.  
 Additionally we are still potentially running a lot of lucene queries, each 
 of which has a probably non-negligible constant cost (parsing the query 
 string, etc).
 # Execute the more general lucene query the first time it is encountered, 
 then caching the results somewhere.  From there, we can then perform a hash 
 table lookup against those cached results.
 I would like to pursue option 2, but there is a problem.  Because jena-text 
 is implemented as a property function instead of a query op in and of itself 
 (like QueryIterMinus is for example), we have to find a place to stash the 
 lucene results.  I believe this can be done by placing it in the 
 ExecutionContext object, using the lucene query as a cache key.  Updates 
 provide a slightly troubling case because you could have an update request 
 like:
 {code}
 insert data { <urn:test1> rdf:type <http://example.org/Entity> ; rdfs:label "test" } ;
 delete { ?s ?p ?o }
 where { ?s rdf:type <http://example.org/Entity> ; text:query ( rdfs:label "test" ) ; ?p ?o . } ;
 insert data { <urn:test2> rdf:type <http://example.org/Entity> ; rdfs:label "test" } ;
 delete { ?s ?p ?o }
 where { ?s rdf:type <http://example.org/Entity> ; text:query ( rdfs:label "test" ) ; ?p ?o . }
 {code}
 And then the end result should be an empty database.  But if the 
 ExecutionContext was the same for both delete queries, you would be using the 
 cached results from the first delete query in the second delete query, which 
 would result in {{urn:test2}} not being deleted properly.
 If the ExecutionContext is indeed shared between the two update queries in 
 the situation above, I think this can be solved by making the cache key for 
 the lucene resultset be a combination of both the lucene query and the 
 QueryIterRoot or BindingRoot.  I need to investigate this.  An alternative, 
 if there was a way to be notified when a query has finished executing, we 
 could clear the cache in the ExecutionContext.





[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640311#comment-14640311
 ] 

Andy Seaborne commented on JENA-977:


Loading BSBM 5m, which is less than 1G on disk.
This works on Jena 2.13.0 with tdbloader2.

{noformat}
 11:51:26 ERROR Insufficient free space on database drive /dev/sdb4, there are 170276476 bytes free but 255030549 bytes are required
{noformat}
but {{df}} and {{df -h}} afterwards, without deleting intermediates, show far more space than that.

{{170276476 bytes}} - the output of {{df}} on Ubuntu is in 1K blocks, not bytes.

{noformat}
Filesystem 1K-blocks     Used Available Use% Mounted on
/dev/sdb4  219947820 38579656 170172416  19% /home
{noformat}
{noformat}
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdb4   210G   37G  163G  19% /home
{noformat}
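
A small sketch of one way to turn the {{df}} figure into bytes before comparing it with a byte count. It assumes the script can call {{df -Pk}} (POSIX portable format, 1024-byte blocks); the helper name df_available_bytes is hypothetical and not part of tdbloader2.

```shell
# df_available_bytes (hypothetical name): read `df -Pk <dir>` style output on
# stdin and print the available space in bytes.
df_available_bytes() {
  local avail_kb
  # Skip the header, keep the first data row, take the "Available" column.
  avail_kb=$(tail -n +2 | head -n 1 | awk '{print $4}')
  # Refuse anything that is not a plain non-negative integer.
  case "$avail_kb" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # df -k reports 1K blocks, so scale to bytes with shell arithmetic.
  echo $(( avail_kb * 1024 ))
}
```

Used as `df -Pk "$dir" | df_available_bytes`. With the figures above, 170172416 1K blocks come to 174256553984 bytes, comfortably more than the 255030549 bytes required.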



 tdbloader2 script refactoring
 -----------------------------

              Key: JENA-977
              URL: https://issues.apache.org/jira/browse/JENA-977
          Project: Apache Jena
       Issue Type: Improvement
       Components: TDB
 Affects Versions: Jena 2.13.0
         Reporter: Rob Vesse
         Assignee: Rob Vesse
         Priority: Blocker
          Fix For: Jena 2.13.1, Jena 3.0.0


 As noted on the dev list the current scripts are a little rough around the 
 edges, work items include:
 - Splitting data and index phase into separate scripts
 - Being able to restart a build from a later phase
 - Progress monitoring for the sort portion of indexing
 - Warning if sort is using a disk where you may have insufficient space
 - Better usage summaries
 - Better argument handling (avoid relying on magic environment variables 
 wherever possible)





[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640270#comment-14640270
 ] 

ASF subversion and git services commented on JENA-977:
--

Commit 8b6a36dd2703c1d6e9ed893ef9ecd1f06358d712 in jena's branch 
refs/heads/master from [~rvesse]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=8b6a36d ]

More fixes for JENA-977

- Disable pipe viewer usage for the time being since it hangs some
  terminals
- Tone down some warnings to just debug level messages
- Clean up output of free for Linux 3.x kernels
- Only try to use free if running a Linux kernel




[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640284#comment-14640284
 ] 

ASF subversion and git services commented on JENA-977:
--

Commit 62558f47b6aa01b332f2f2fba5f829ab9673ea28 in jena's branch 
refs/heads/master from [~rvesse]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=62558f4 ]

Fix typo in extracting value from free (JENA-977)

head was missing -n 1




[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640286#comment-14640286
 ] 

Rob Vesse commented on JENA-977:


Yep, should copy paste rather than re-typing from memory



[jira] [Commented] (JENA-999) Poor jena-text query performance when a bound subject is used

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640237#comment-14640237
 ] 

Andy Seaborne commented on JENA-999:


Various join algorithms: [join library|https://github.com/afs/mantis/tree/master/dboe-quack/src/main/java/org/seaborne/dboe/engine/join], part of TDB2 work.

 Poor jena-text query performance when a bound subject is used
 -

 Key: JENA-999
 URL: https://issues.apache.org/jira/browse/JENA-999
 Project: Apache Jena
  Issue Type: Improvement
Reporter: Stephen Allen
Assignee: Stephen Allen
Priority: Minor

 When executing a jena-text query, the performance is terrible if the subject 
 is already bound to a variable.  This is because the current code will 
 execute a new lucene query that does not have the subject/entity bound on 
 every iteration and then iterate through the lucene results to join against 
 the subject.  This is quite inefficient.
 Example query:
 {code}
 select *
 where {
   ?s rdf:type http://example.org/Entity .
   ?s text:query ( rdfs:label test ) .
 }
 {code}
 This would be quite slow if there were a lot of entities in the system.
 Two potential solutions present themselves:
 # Craft a more explicit lucene query that specifies the entity URI, so that 
 the results coming back from lucene are much smaller.  However, this would 
 cause problems with the score not being correct across multiple iterations.  
 Additionally we are still potentially running a lot of lucene queries, each 
 of which has a probably non-negligble constant cost (parsing the query 
 string, etc).
 # Execute the more general lucene query the first time it is encountered, 
 then caching the results somewhere.  From there, we can then perform a hash 
 table lookup against those cached results.
 I would like to pursue option 2, but there is a problem.  Because jena-text 
 is implemented as a property function instead of a query op in and of itself 
 (like QueryIterMinus is for example), we have to find a place to stash the 
 lucene results.  I believe this can be done by placing it in the 
 ExecutionContext object, using the lucene query as a cache key.  Updates 
 provide a slightly troubling case because you could have an update request 
 like:
 {code}
 insert data { urn:test1 rdf:type http://example.org/Entity ; rdfs:label 
 test } ;
 delete { ?s ?p ?o }
 where { ?s rdf:type http://example.org/Entity ; text:query ( rdfs:label 
 test ) . ?p ?o . } ;
 insert data { urn:test2 rdf:type http://example.org/Entity ; rdfs:label 
 test } ;
 delete { ?s ?p ?o }
 where { ?s rdf:type http://example.org/Entity ; text:query ( rdfs:label 
 test ) ; ?p ?o . }
 {code}
 And then the end result should be an empty database.  But if the 
 ExecutionContext was the same for both delete queries, you would be using the 
 cached results from the first delete query in the second delete query, which 
 would result in {{urn:test2}} not being deleted properly.
 If the ExecutionContext is indeed shared between the two update queries in 
 the situation above, I think this can be solved by making the cache key for 
 the lucene resultset be a combination of both the lucene query and the 
 QueryIterRoot or BindingRoot.  I need to investigate this.  Alternatively, 
 if there were a way to be notified when a query has finished executing, we 
 could clear the cache in the ExecutionContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640283#comment-14640283
 ] 

Andy Seaborne commented on JENA-977:


Now get:
{noformat}
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 350: [: 
22501806080
28475846656
34359734272: integer expression expected
{noformat}

{{echo $FREE_MEM | tail -n +2 | head | awk '{print $4}')}}

Should that be {{head -1}}?
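
A minimal sketch of the single-value extraction being asked about, assuming the older procps layout of free -b in which the Mem: row is the second line and its fourth field is the free figure. The function name extract_free_mem is hypothetical, and in the sample text only the three "free" figures come from the error above; the rest is made up so the example can run without calling free.

```shell
# Hypothetical helper (not from the tdbloader2 scripts): print only the
# "free" column of the Mem: row from the multi-line output of `free -b`.
extract_free_mem() {
  # Quote "$1" so the line breaks survive; NR == 2 selects the Mem: row,
  # and `exit` stops after that row so only one number is printed.
  printf '%s\n' "$1" | awk 'NR == 2 { print $4; exit }'
}

# Sample text in the older procps layout. Only the three "free" figures are
# taken from the error message; the other numbers are illustrative.
SAMPLE_FREE='             total       used       free     shared    buffers     cached
Mem:   33000000000 10498193920 22501806080          0  100000000 5874040576
-/+ buffers/cache:  4524153344 28475846656
Swap:  34359734272           0 34359734272'

extract_free_mem "$SAMPLE_FREE"   # prints 22501806080
```

Selecting the row inside awk gives the same effect as piping through head -1, with one fewer process in the pipeline.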


 tdbloader2 script refactoring
 -

 Key: JENA-977
 URL: https://issues.apache.org/jira/browse/JENA-977
 Project: Apache Jena
  Issue Type: Improvement
  Components: TDB
Affects Versions: Jena 2.13.0
Reporter: Rob Vesse
Assignee: Rob Vesse
Priority: Blocker
 Fix For: Jena 2.13.1, Jena 3.0.0


 As noted on the dev list the current scripts are a little rough around the 
 edges, work items include:
 - Splitting data and index phase into separate scripts
 - Being able to restart a build from a later phase
 - Progress monitoring for the sort portion of indexing
 - Warning if sort is using a disk where you may have insufficient space
 - Better usage summaries
 - Better argument handling (avoid relying on magic environment variables 
 wherever possible)





Re: Jena3 release status

2015-07-24 Thread Rob Vesse
Comments inline

On 24/07/2015 10:12, Andy Seaborne a...@apache.org wrote:

Rob,

Still some problems :-(


1/ FREE_MEM

For production use, are you expecting warnings on every run when the
file size is larger than free memory?  I have typically used tdbloader2
on large files way bigger than memory so a WARN each time the program
runs is a bit excessive.

Agreed, a couple of warnings have been reduced to debug


It isn't being calculated correctly - on non-darwin it calls

FREE_MEM=$(free -b)

and does no further processing but that is a long messy string.

Which figure from multi-line output from free -b is it looking for?
(what's {6} on OSX top output?)


echo $OSTYPE
==
linux-gnu

This appears to be a 2.x vs 3.x kernel issue

When I tested this on some of our internal Linux servers (which are 2.x
kernels), free -b just returned an integer. As you note, on newer (3.x, I
assume) kernels it instead prints more complex output.

I have changed the logic to check for complex output and extract the
desired value, then to do an extra numeric check with errors suppressed
and return the not-available value if the result is not numeric.

I have also restricted the functionality to just OSTYPE linux*




2/ The use of pv is causing terminal problems (sic).

In a gnome terminal, character echo is turned off when tdbloader2 exits.
In an emacs shell buffer, the load hangs.

Haven't found out why pv is doing this.  I wonder if two in the same
line are causing character mix up on output leading to broken terminal
control sequences, especially with the small file I was using.

It's a really nice feature to have to see the progress. Would it be
safer for this release to switch off pv to give time for testing in
different environments (other *nixes, cygwin) with a simple HAS_PV=0 ?

Agreed. Actually it is HAS_PV=1, because I just use the return code of `which
pv` to detect it, so 0 is enabled and non-zero is disabled.

I have removed the auto-detection and defaulted it to off, but users can set
HAS_PV=0 in their environment if they know pv works reliably in their
environment.
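
A rough sketch of how that opt-in could look, assuming the 0-means-enabled convention described above. The helper name progress_filter is hypothetical and is not taken from the tdbloader2 scripts.

```shell
# Follows the convention described above: 0 means enabled (a shell success
# code), anything else means disabled. Default to off unless the user opts in.
HAS_PV=${HAS_PV:-1}

# Hypothetical helper: pass stdin through pv when enabled and installed,
# otherwise through cat, so the surrounding pipeline is the same either way.
progress_filter() {
  if [ "$HAS_PV" = "0" ] && command -v pv >/dev/null 2>&1; then
    pv
  else
    cat
  fi
}

# Example pipeline: the data reaching wc is identical with or without pv.
printf 'line1\nline2\n' | progress_filter | wc -l
```

Falling back to cat keeps the pipeline shape fixed, so switching pv off does not change what the downstream commands receive.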

Rob


   Andy

On 23/07/15 15:40, Rob Vesse wrote:
 Comments inline:

 On 23/07/2015 14:41, Andy Seaborne a...@apache.org wrote:

 Trying to do a release, I came across some issues.

 JENA-992: (Refactor graph/permissions interface layer)

 Not sure of the status of this but I'm assuming that the code already in
 'master' is releasable.

 JENA-977: (tdbloader2 script refactoring)

 The new scripts misbehave on Linux - there isn't one (obvious) issue.

 To unblock the release, if there is a small fix, then great.  Another
 possibility is to revert to the older scripts for 3.0.0, so as to fix
 afterwards. This gives more time and space for testing.

 Looks to be relatively simple, think I have the bugs you identified
 resolved

 For Case 1 I needed to look up the drive info based on the directory where
 the work files will be created and not the work file itself because that
 doesn't exist yet.  As part of fixing this I also made the script
 resistant to errors where the drive information was unavailable

 For Case 2 I was checking the directory before I had ensured it existed
 and was a directory so that just required changing the order of checks

 Rob


 It looks to me like bash on OSX is bash 3.2 derived (3.2 was
 originally 2006) whereas on Ubuntu currently it is 4.3.  There might be
 other issues that arise if the current ones are resolved with bash or
 other commands.

 Andy











[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640281#comment-14640281
 ] 

ASF subversion and git services commented on JENA-977:
--

Commit d734e8abec82cfdf4c8ca9fdb415ba095077f093 in jena's branch 
refs/heads/master from [~rvesse]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=d734e8a ]

Extra checking for free memory return (JENA-977)

Really verify that the extracted free value is numeric and return
unavailable if it is not
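
For illustration only, a numeric check of this kind might look like the sketch below. The sentinel value -1 and the names FREE_MEM_UNAVAILABLE and checked_free_mem are assumptions; the value and names used by the actual commit are not shown in this message.

```shell
# Hypothetical sentinel for "free memory unavailable"; the value the real
# scripts use is not shown in this thread.
FREE_MEM_UNAVAILABLE=-1

# Echo $1 if it is a non-empty string of decimal digits, otherwise the
# sentinel. A multi-line value fails because a newline is not a digit.
checked_free_mem() {
  case "$1" in
    '' | *[!0-9]*) printf '%s\n' "$FREE_MEM_UNAVAILABLE" ;;
    *)             printf '%s\n' "$1" ;;
  esac
}

checked_free_mem 22501806080      # prints 22501806080
checked_free_mem "Mem: 12 34"     # prints -1
```

Using a case pattern avoids the "integer expression expected" error that [ raises when it is handed a non-numeric string.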




[jira] [Comment Edited] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640283#comment-14640283
 ] 

Andy Seaborne edited comment on JENA-977 at 7/24/15 10:43 AM:
--

Now get:
{noformat}
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 350: [: 
22501806080
28475846656
34359734272: integer expression expected
{noformat}

{noformat}
echo $FREE_MEM | tail -n +2 | head | awk '{print $4}')
{noformat}

Should that be {{head -1}} if it's the 22501806080 the code is after?



was (Author: andy.seaborne):
Now get:
{noformat}
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 350: [: 
22501806080
28475846656
34359734272: integer expression expected
{noformat}

{{echo $FREE_MEM | tail -n +2 | head | awk '{print $4}')}}

Should that be {{head -1}}?




[jira] [Comment Edited] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640311#comment-14640311
 ] 

Andy Seaborne edited comment on JENA-977 at 7/24/15 11:08 AM:
--

Loading BSBM 5m which is less than 1G on disk. It works with  Jena 2.13.0 with 
tdbloader2.

{noformat}
 11:51:26 ERROR Insufficient free space on database drive /dev/sdb4, there are 
170276476 bytes free but 255030549 bytes are required
{noformat}

{{170276476 bytes}} - the output of {{df}} on Ubuntu is in 1K blocks.

{noformat}
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb4  219947820 38579656 170172416  19% /home
{noformat}
{noformat}
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdb4   210G   37G  163G  19% /home
{noformat}
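
A small sketch of the unit conversion at issue, assuming the Available column of df -k is the fourth field of the filesystem row. The helper name avail_bytes_from_df_k is hypothetical; the sample text is the df output quoted above.

```shell
# Hypothetical helper: take the text of `df -k` (1K blocks) for one filesystem
# and print the available space in bytes.
avail_bytes_from_df_k() {
  # Second line is the filesystem row; field 4 is the Available column.
  kblocks=$(printf '%s\n' "$1" | awk 'NR == 2 { print $4 }')
  # Shell arithmetic avoids awk's floating-point formatting of large values.
  echo $(( kblocks * 1024 ))
}

DF_SAMPLE='Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb4  219947820 38579656 170172416  19% /home'

avail_bytes_from_df_k "$DF_SAMPLE"   # prints 174256553984
```

Read as 1K blocks, the available figure is about 162 GiB, in line with the 163G shown by the human-readable output and far more than the 255030549 bytes reported as required.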




was (Author: andy.seaborne):
Loading BSBM 5m which is less than 1G on disk.
Works on Jena 2.13.0 with tdbloader2.

{noformat}
 11:51:26 ERROR Insufficient free space on database drive /dev/sdb4, there are 
170276476 bytes free but 255030549 bytes are required
{noformat}
but {{df -h}} afterwards, without deleting intermediates shows:

{{170276476 bytes}} - the output of {{df}} on Ubuntu is in 1K blocks.

{noformat}
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb4  219947820 38579656 170172416  19% /home
{noformat}
{noformat}
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdb4   210G   37G  163G  19% /home
{noformat}





[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640648#comment-14640648
 ] 

Andy Seaborne commented on JENA-977:


Where else is this going to matter?

The file size is compared to the free memory, isn't it?  Is FREE_MEM in bytes when it is 
compared to SIZE?

On the FREE_MEM check: sort is set to use {{--buffer-size=50%}}. Does that 
matter?



Re: Jena3 release status

2015-07-24 Thread Andy Seaborne
I've already done one trial build. If the shell script issue gets 
sorted out, and I have time (it's already taken quite a lot of time), 
then I hope to do the vote-able process as soon as possible, like 
tomorrow.  No promises, though.


Andy

On 24/07/15 15:54, Claude Warren wrote:

I have one more piece to add to JENA-992.  I have it at home working and
will add it this evening.  Following that a clean build will close the
issue.


Re: Jena3 release status

2015-07-24 Thread Claude Warren
I have one more piece to add to JENA-992.  I have it at home working and
will add it this evening.  Following that a clean build will close the
issue.


-- 
I like: Like Like - The likeliest place on the web
http://like-like.xenei.com
LinkedIn: http://www.linkedin.com/in/claudewarren


[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640470#comment-14640470
 ] 

Rob Vesse commented on JENA-977:


Yes OS X has {{ls -k}}



[jira] [Created] (JENA-1000) tdbdump / tdbloader sequence corrupts rdf:type predicates

2015-07-24 Thread Donald Pellegrino (JIRA)
Donald Pellegrino created JENA-1000:
---

 Summary: tdbdump / tdbloader sequence corrupts rdf:type predicates
 Key: JENA-1000
 URL: https://issues.apache.org/jira/browse/JENA-1000
 Project: Apache Jena
  Issue Type: Bug
  Components: TDB
Affects Versions: Jena 2.13.0, Jena 2.12.1
 Environment: Tested with tdbloader and tdbloader2 versions 2.12.1 and 
2.13.0 on Windows/Sun Java 1.7.0_60 and CentOS 6.3/OpenJDK 1.8.0_25. tdbdump 
was 2.12.1 on CentOS 6.3/OpenJDK 1.8.0_25.
Reporter: Donald Pellegrino
Priority: Critical


Steps to reproduce:

1. A TDB database was exported to N-Quads with tdbdump.
2. The dump file was then imported into a new TDB database with tdbloader2.
3. Observe that all rdf:type predicates were replaced with the same randomly 
selected predicate.

Work-around:

A work-around was to run a DELETE/INSERT SPARQL command to reassign rdf:type 
predicates after the load:

DELETE { ?s custom:200501898-4-1 ?o }
INSERT { ?s rdf:type ?o }
WHERE {
  ?s custom:200501898-4-1 ?o
}

Testing:

The behavior was consistent across multiple reloads of the same dump file. 
tdbloader and tdbloader2 were both used for loads and they were run on both 
Windows and Linux with the same results.

Note that this is a Critical issue as it leads to silent corruption of user 
data.





[jira] [Commented] (JENA-1000) tdbdump / tdbloader sequence corrupts rdf:type predicates

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640961#comment-14640961
 ] 

Andy Seaborne commented on JENA-1000:
-

Please could you provide a short data file that shows the problem for you.



[jira] [Comment Edited] (JENA-1000) tdbdump / tdbloader sequence corrupts rdf:type predicates

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640961#comment-14640961
 ] 

Andy Seaborne edited comment on JENA-1000 at 7/24/15 7:31 PM:
--

Please could you provide a short data file that shows the problem for you.

Does the n-quads dump show the problem?


was (Author: andy.seaborne):
Please could you provide a short data file that shows the problem for you.



[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-24 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640432#comment-14640432
 ] 

Andy Seaborne commented on JENA-977:


getSize and getDriveInfo need to be in step as to units.

On OS X, df output is in 512-byte blocks, so the getDriveInfo calculation may be wrong on 
Macs as well.  The manual implies ls -l output is in bytes.

df -k (1K block size) seems to exist on Mac, Ubuntu and OpenBSD, but I can't 
check other Linux or *nix systems.

ls does have a -k flag where I am.

Is -k the right way to go?
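
One way to keep the two in step, sketched under the assumption that both sides are normalised to bytes before comparing. The helpers size_bytes, free_bytes and has_room are hypothetical stand-ins, not the getSize and getDriveInfo functions from the scripts; wc -c and df -Pk are POSIX.

```shell
# Hypothetical stand-ins for getSize and getDriveInfo that both report bytes.
size_bytes() {
  # wc -c counts bytes; tr strips the padding some platforms add.
  wc -c < "$1" | tr -d ' '
}

free_bytes() {
  # -P forces the one-line-per-filesystem POSIX layout, -k forces 1K blocks.
  kblocks=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  echo $(( kblocks * 1024 ))
}

# Succeeds when the filesystem holding directory $2 can fit file $1.
has_room() {
  [ "$(free_bytes "$2")" -ge "$(size_bytes "$1")" ]
}
```

A call such as has_room data.nq /path/to/workdir (both paths hypothetical) then compares like with like, whatever default block size the local df uses.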

