[jira] [Resolved] (JENA-990) rename the UpdateDeniedException

2015-07-23 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-990.

   Resolution: Fixed
Fix Version/s: Jena 3.0.0

>  rename the UpdateDeniedException
> -
>
> Key: JENA-990
> URL: https://issues.apache.org/jira/browse/JENA-990
> Project: Apache Jena
> Issue Type: Improvement
> Components: Core
> Affects Versions: Jena 3.0.0
> Reporter: Claude Warren
> Assignee: Claude Warren
> Priority: Minor
> Fix For: Jena 3.0.0
>
>
> As noted in a discussion on the dev list between Andy and me, this update 
> renames the current UpdateDeniedException to AccessDeniedException and 
> makes it extend a newly created OperationDeniedException.
> AddDeniedException and DeleteDeniedException will extend 
> AccessDeniedException.
> jena-permissions will extend AccessDeniedException to create:
> ReadDeniedException -- for read restrictions
> UpdateDeniedException -- for update restrictions (modifying triples that 
> already exist as opposed to adding new triples)
> This will allow Fuseki to respond properly when jena-permissions is in 
> place and update restrictions apply.  Currently Fuseki returns this as a 
> 500 error.  Once we have a common permission denied exception we can 
> return either authentication required or access denied as appropriate.
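
The hierarchy described in the issue can be sketched in plain Java as follows. This is only an illustration: the classes are stand-ins named after the ones in the issue text, not the Jena sources, and the base class, constructors, the httpStatusFor helper and the 401/403 mapping are all assumptions made for the example.

```java
public class DeniedExceptionSketch {

    // Stand-ins named after the classes in the issue text. These are NOT the
    // actual Jena sources; constructors and the base class are assumptions.
    static class OperationDeniedException extends RuntimeException {
        OperationDeniedException(String message) { super(message); }
    }

    static class AccessDeniedException extends OperationDeniedException {
        AccessDeniedException(String message) { super(message); }
    }

    static class AddDeniedException extends AccessDeniedException {
        AddDeniedException(String message) { super(message); }
    }

    static class DeleteDeniedException extends AccessDeniedException {
        DeleteDeniedException(String message) { super(message); }
    }

    // The two subclasses the issue says jena-permissions would add.
    static class ReadDeniedException extends AccessDeniedException {
        ReadDeniedException(String message) { super(message); }
    }

    static class UpdateDeniedException extends AccessDeniedException {
        UpdateDeniedException(String message) { super(message); }
    }

    /**
     * Hypothetical status mapping: any AccessDeniedException becomes 401 when
     * the caller is not authenticated and 403 when it is; anything else stays 500.
     */
    static int httpStatusFor(RuntimeException e, boolean authenticated) {
        if (e instanceof AccessDeniedException) {
            return authenticated ? 403 : 401;
        }
        return 500;
    }

    public static void main(String[] args) {
        System.out.println(httpStatusFor(new UpdateDeniedException("update denied"), false)); // 401
        System.out.println(httpStatusFor(new ReadDeniedException("read denied"), true));      // 403
        System.out.println(httpStatusFor(new IllegalStateException("other failure"), true));  // 500
    }
}
```

With a single common supertype, a server only needs one instanceof check (or one catch clause) to tell permission failures apart from genuine internal errors.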



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (JENA-990) rename the UpdateDeniedException

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635850#comment-14635850
 ] 

Andy Seaborne edited comment on JENA-990 at 7/23/15 8:34 AM:
-

This seems done. Can we resolve it and unblock the jena3 release?


was (Author: andy.seaborne):
This seems done. Can we resolve it an unblock the jena3 release?



[jira] [Commented] (JENA-996) riot should recognize invalid URIs in large jsonld files

2015-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638528#comment-14638528
 ] 

ASF subversion and git services commented on JENA-996:
--

Commit 9d2b415ff1aab3936610d187e254e35f2a6b48a4 in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=9d2b415 ]

JENA-996: Errors and warnings from commands to stderr


> riot should recognize invalid URIs in large jsonld files
> 
>
> Key: JENA-996
> URL: https://issues.apache.org/jira/browse/JENA-996
> Project: Apache Jena
> Issue Type: Bug
> Components: RIOT
> Affects Versions: Jena 2.13.0
> Reporter: Joachim Neubert
> Attachments: defect_doi.jsonld
>
>
> With riot --validate, in large jsonld files URIs including whitespace are not 
> flagged; the file seems to be valid.
> However, when loading this with Fuseki's tdbloader, loading aborts with
> ERROR [line: 31572453, col: 1 ] Broken token (newline): Public consulting 
> services for smaller comme
> 03:06:01 WARN  riot :: Bad IRI:  Code: 17/WHITESPACE in PATH: A single 
> whitespace character. These match no grammar rules of URIs/IRIs. These 
> characters are permitted in RDF URI References, XML system identifiers, and 
> XML Schema anyURIs.
> Unfortunately, it seems that the error cannot be reproduced on small files. 
> The isolated sequence:
> {code}
> {
>   "@context":
>   {
>     "eb": "http://zbw.eu/beta/resource/title/",
>     "doi": "http://dx.doi.org/",
>     "identifier_doi": { "@id": "umbel:isLike", "@type": "@id" }
>   },
>   "@graph":
>   [
>     {
>       "@id" : "eb:10003656538",
>       "identifier_doi" : [
>         "doi:DOI 10.2767/59617"
>       ]
>     }
>   ]
> }
> {code}
> creates a message:
> 14:13:52 WARN  riot :: Bad IRI: <http://dx.doi.org/DOI 10.2767/59617> Code: 17/WHITESPACE in PATH: A single whitespace character. 
> These match no grammar rules of URIs/IRIs. These characters are permitted in 
> RDF URI References, XML system identifiers, and XML Schema anyURIs.
> I can make the large file (econis_0037.json, 59Mb) available for download.
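
To make the reported condition concrete, here is a small stand-alone sketch of why the expanded identifier is rejected. It uses java.net.URI from the JDK as a stand-in checker, not the IRI checker that riot uses, and the expand helper is a simplified, hypothetical version of JSON-LD prefix expansion (plain string concatenation).

```java
import java.net.URI;
import java.net.URISyntaxException;

public class WhitespaceIriCheck {

    /** Simplified prefix expansion: replaces "prefix:" with the prefix IRI. */
    static String expand(String prefixIri, String compact) {
        int colon = compact.indexOf(':');
        return prefixIri + compact.substring(colon + 1);
    }

    /** True if java.net.URI can parse the string without a syntax error. */
    static boolean isSyntacticallyValid(String iri) {
        try {
            new URI(iri);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Values taken from the isolated JSON-LD sequence in the report.
        String expanded = expand("http://dx.doi.org/", "doi:DOI 10.2767/59617");
        System.out.println(expanded);                                                  // http://dx.doi.org/DOI 10.2767/59617
        System.out.println(isSyntacticallyValid(expanded));                            // false: space in the path
        System.out.println(isSyntacticallyValid("http://dx.doi.org/10.2767/59617"));   // true
    }
}
```

The whitespace only appears after the compact form is expanded, so a check has to run on the expanded identifier rather than on the raw JSON string.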





[jira] [Commented] (JENA-996) riot should recognize invalid URIs in large jsonld files

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638532#comment-14638532
 ] 

Andy Seaborne commented on JENA-996:


Problem found. Warnings and errors are supposed to go to stderr.

The scripts use the Log4J setup in {{$JENA_HOME/jena-log4j.properties}} where 
{{$JENA_HOME}} is the root of the installation. A one-line fix to that file 
solves the problem for an installed copy.  Fixed in the codebase now as well. 

{noformat}
log4j.appender.stdlog.target=System.err
{noformat}
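
As a rough illustration of why the appender target matters for a command-line tool, the sketch below keeps data and diagnostics on separate streams. It is plain JDK code, not Jena or Log4J; the class name, the emit helper and the sample lines are all made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class StreamSeparationSketch {

    /** Writes one data line to 'out' and one diagnostic line to 'err'. */
    static void emit(PrintStream out, PrintStream err) {
        out.println("<http://example/s> <http://example/p> <http://example/o> .");
        err.println("WARN  riot :: Bad IRI (hypothetical diagnostic)");
    }

    public static void main(String[] args) {
        // Capture both streams in memory to show they stay independent.
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        ByteArrayOutputStream diagnostics = new ByteArrayOutputStream();
        emit(new PrintStream(data, true), new PrintStream(diagnostics, true));

        System.out.print(data);         // only the data line
        System.err.print(diagnostics);  // only the warning
    }
}
```

When a command's stdout is piped or redirected as data, diagnostics sent to stderr stay visible on the console and never end up inside the data.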




[jira] [Commented] (JENA-996) riot should recognize invalid URIs in large jsonld files

2015-07-23 Thread Joachim Neubert (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638560#comment-14638560
 ] 

Joachim Neubert commented on JENA-996:
--

The fix works great, thanks!

In $FUSEKI_HOME/log4j.properties, I've found a line
{code}
## log4j.appender.stdlog.target=System.err
{code}
Should it be active too?



[jira] [Commented] (JENA-996) riot should recognize invalid URIs in large jsonld files

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638598#comment-14638598
 ] 

Andy Seaborne commented on JENA-996:


Which file exactly?

Fuseki server log output does go to stdout.  It is only log output so it does 
not get mixed with anything else. As log output is sometimes redirected, e.g. 
when Fuseki runs as a service, stdout makes that a little easier.



[GitHub] jena pull request: Bump versions of JUnit, Guava for Elephas

2015-07-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/87


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (JENA-994) Update dependencies for Jena3

2015-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638601#comment-14638601
 ] 

ASF GitHub Bot commented on JENA-994:
-

Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/87


> Update dependencies for Jena3
> -
>
> Key: JENA-994
> URL: https://issues.apache.org/jira/browse/JENA-994
> Project: Apache Jena
> Issue Type: Task
> Affects Versions: Jena 3.0.0
> Reporter: Andy Seaborne
> Assignee: Andy Seaborne
> Fix For: Jena 3.0.0
>
>
> Update versions of dependencies for incremental updates.





[jira] [Commented] (JENA-996) riot should recognize invalid URIs in large jsonld files

2015-07-23 Thread Joachim Neubert (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638626#comment-14638626
 ] 

Joachim Neubert commented on JENA-996:
--

OK - thanks for the explanation!



[jira] [Closed] (JENA-996) riot should recognize invalid URIs in large jsonld files

2015-07-23 Thread Joachim Neubert (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joachim Neubert closed JENA-996.

Resolution: Fixed



[jira] [Reopened] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne reopened JENA-977:


There are problems with the regenerated scripts.

# They use bash syntax for functions but exec as /bin/sh.
# A leftover echo line produces output in normal use.

> tdbloader2 script refactoring
> -
>
> Key: JENA-977
> URL: https://issues.apache.org/jira/browse/JENA-977
> Project: Apache Jena
> Issue Type: Improvement
> Components: TDB
> Affects Versions: Jena 2.13.0
> Reporter: Rob Vesse
> Assignee: Rob Vesse
> Fix For: Jena 2.13.1, Jena 3.0.0
>
>
> As noted on the dev list the current scripts are a little rough around the 
> edges, work items include:
> - Splitting data and index phase into separate scripts
> - Being able to restart a build from a later phase
> - Progress monitoring for the sort portion of indexing
> - Warning if sort is using a disk where you may have insufficient space
> - Better usage summaries
> - Better argument handling (avoid relying on magic environment variables 
> wherever possible)





[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638725#comment-14638725
 ] 

ASF subversion and git services commented on JENA-977:
--

Commit e03018f7155a9417bb1fe57bf6d546e61aaa5529 in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=e03018f ]

JENA-977: Make sh-legal; remove dev echo line




[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638726#comment-14638726
 ] 

ASF subversion and git services commented on JENA-977:
--

Commit 6074418ab8347efa9a9377ef46044dcac8b8a88a in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=6074418 ]

JENA-977: Regenerated scripts




[jira] [Commented] (JENA-985) Iterate using Apache Jena ExtendedIterator on Graph with big amount of triples

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638744#comment-14638744
 ] 

Andy Seaborne commented on JENA-985:


You are creating many TDB databases, not one, on each call to 
{{TDBFactory.createDatasetGraph}}. Each has its own cache.

Why not load all the data into one TDB database, using one named graph per 
datasetPath, and then use the default union graph in TDB? 

https://jena.apache.org/documentation/tdb/datasets.html


> Iterate using Apache Jena ExtendedIterator on Graph with big amount of triples
> --
>
> Key: JENA-985
> URL: https://issues.apache.org/jira/browse/JENA-985
> Project: Apache Jena
> Issue Type: Bug
> Components: Core
> Affects Versions: Jena 2.13.0
> Environment: *Hardware*
> Windows 7 64-bit
> Intel Core i7 4785T @ 2.20GHz
> RAM 16,0GB DDR3
> 465GB Samsung SSD 850 EVO 500G SCSI Disk Device (SSD)
> *Software environment*
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> *Running options*
> VM options: -Xmx14g
> Reporter: Eugene Tenkaev
> Priority: Minor
>
> I'm generating an Apache Jena Graph from DBpedia dumps and now I want to 
> iterate through all "dbpedia-owl:abstract" values.
> So I do something like this:
> {code:java}
> ExtendedIterator<Triple> iterator = graph.find(Node.ANY, 
> NodeFactory.createURI("dbpedia-owl:abstract"), Node.ANY);
> {code}
> But when I iterate, memory consumption increases, so it looks like 
> "ExtendedIterator" stores the nodes it finds.
> I used the VisualVM profiler and found that while I iterate, the count of 
> "com.hp.hpl.jena.graph.Node_URI" keeps increasing.
> I tried calling "iterator.reset()" but this has no effect.
> Is this a bug or a feature? :D
> Can I iterate through all DBpedia abstracts without storing nodes and without 
> increasing memory consumption that the gc can't free?
> Sorry for my bad English.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (JENA-985) Iterate using Apache Jena ExtendedIterator on Graph with big amount of triples

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638751#comment-14638751
 ] 

Andy Seaborne edited comment on JENA-985 at 7/23/15 12:50 PM:
--

{{TDBFactory.reset()}} - you shouldn't need to call this (as the javadoc says, 
it's mainly for the tests) and it has been removed in the next release.  On MS 
Windows, it cannot properly release all resources (a Java+Windows issue).


was (Author: andy.seaborne):
{{TDBFactory.reset()}} - you shouldn't need to call this (as the javadoc says, 
it's mainly for the tests) and it has been removed in the next release.



[jira] [Commented] (JENA-985) Iterate using Apache Jena ExtendedIterator on Graph with big amount of triples

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638751#comment-14638751
 ] 

Andy Seaborne commented on JENA-985:


{{TDBFactory.reset()}} - you shouldn't need to call this (as the javadoc says, 
it's mainly for the tests) and it has been removed in the next release.



[jira] [Comment Edited] (JENA-985) Iterate using Apache Jena ExtendedIterator on Graph with big amount of triples

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638744#comment-14638744
 ] 

Andy Seaborne edited comment on JENA-985 at 7/23/15 12:52 PM:
--

You are creating many TDB databases, not one, on each call to 
{{TDBFactory.createDatasetGraph}}. How many are there? Each has its own cache.

{{MultiUnion}} may well cause problems.

Why not load all the data into one TDB database, using one named graph per 
datasetPath, and then use the default union graph in TDB? 

https://jena.apache.org/documentation/tdb/datasets.html



was (Author: andy.seaborne):
You are creating many TDB databases, not one, on each call to 
{{TDBFactory.createDatasetGraph}}. Each has it's own cache.

Why not load all the data into one TDB database, using one named graph per 
datasetPath and then use default union graph in TDB? 

https://jena.apache.org/documentation/tdb/datasets.html




[jira] [Resolved] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-977.

Resolution: Fixed



[jira] [Reopened] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne reopened JENA-977:


tdbloader2 script fails on Linux.



[jira] [Updated] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne updated JENA-977:
---
Priority: Blocker  (was: Major)



[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638765#comment-14638765
 ] 

Andy Seaborne commented on JENA-977:


Case 1: empty database directory DB exists

{noformat}
$ bin/tdbloader2 --loc DB ~/tmp/D.ttl 
 13:56:43 INFO -- TDB Bulk Loader Start
 13:56:43 INFO Data Load Phase
 13:56:43 INFO Got 1 data files to load
 13:56:43 INFO Data file 1: /home/afs/tmp/D.ttl
INFO  Load: /home/afs/tmp/D.ttl -- 2015/07/23 13:56:44 BST
INFO  Total: 1 tuples : 0.10 seconds : 10.00 tuples/sec [2015/07/23 13:56:44 
BST]
 13:56:44 INFO Data Load Phase Completed
 13:56:44 INFO Index Building Phase
 13:56:44 INFO Creating Index SPO
df: '/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/DB//SPO-txt': No such file or 
directory
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2common: line 71: 100 
- : syntax error: operand expected (error token is "- ")
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 331: [: 
: integer expression expected
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 342: [:  
total   used   free sharedbuffers cached
Mem:33687789568 8906674176 24781115392  11176  454799360 4349825024
-/+ buffers/cache: 4102049792 29585739776
Swap:   34359734272  0 34359734272: integer expression expected
 13:56:44 WARN Unable to determine free memory on your OS, can't check whether 
sort will be in-memory or external sort using Temp Directory /tmp/
 13:56:44 INFO Sort SPO
{noformat}
and it hangs at that point.

Case 2: database directory DB does not exist
{noformat}
$ bin/tdbloader2 --loc DB ~/tmp/D.ttl 
 13:59:44 INFO -- TDB Bulk Loader Start
find: ‘/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/DB’: No such file or 
directory
 13:59:44 INFO Data Load Phase
... as before ...
{noformat}

Also: There are different formats for the logging.




[jira] [Comment Edited] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638765#comment-14638765
 ] 

Andy Seaborne edited comment on JENA-977 at 7/23/15 1:07 PM:
-

D.ttl contains 1 triple.

Case 1: empty database directory DB exists

{noformat}
$ bin/tdbloader2 --loc DB ~/tmp/D.ttl 
 13:56:43 INFO -- TDB Bulk Loader Start
 13:56:43 INFO Data Load Phase
 13:56:43 INFO Got 1 data files to load
 13:56:43 INFO Data file 1: /home/afs/tmp/D.ttl
INFO  Load: /home/afs/tmp/D.ttl -- 2015/07/23 13:56:44 BST
INFO  Total: 1 tuples : 0.10 seconds : 10.00 tuples/sec [2015/07/23 13:56:44 
BST]
 13:56:44 INFO Data Load Phase Completed
 13:56:44 INFO Index Building Phase
 13:56:44 INFO Creating Index SPO
df: '/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/DB//SPO-txt': No such file or 
directory
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2common: line 71: 100 
- : syntax error: operand expected (error token is "- ")
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 331: [: 
: integer expression expected
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 342: [:  
total   used   free sharedbuffers cached
Mem:33687789568 8906674176 24781115392  11176  454799360 4349825024
-/+ buffers/cache: 4102049792 29585739776
Swap:   34359734272  0 34359734272: integer expression expected
 13:56:44 WARN Unable to determine free memory on your OS, can't check whether 
sort will be in-memory or external sort using Temp Directory /tmp/
 13:56:44 INFO Sort SPO
{noformat}
and it hangs at that point.

Case 2: database directory DB does not exist
{noformat}
$ bin/tdbloader2 --loc DB ~/tmp/D.ttl 
 13:59:44 INFO -- TDB Bulk Loader Start
find: ‘/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/DB’: No such file or 
directory
 13:59:44 INFO Data Load Phase
... as before ...
{noformat}

Also: There are different formats for the logging.





Journaling DatasetGraph

2015-07-23 Thread aj...@virginia.edu
After a longish conversation with Andy Seaborne, I've worked up a simple 
journaling DatasetGraph wrapping implementation. The idea is to use journaling 
to support proper aborting behavior (which I believe this code does) and to add 
to that a semantic for DatasetGraph::addGraph that copies tuples instead of 
leaving a reference to the added Graph (which I believe this code also does). 
Between these two behaviors, the idea is to be able to support transactionality 
(MRSW only) reasonably well.

The idea is (if this code looks like a reasonable direction) to move onwards to 
an implementation that uses persistent data structures for covering indexes in 
order to get at least to MR+SW and eventually to attack JENA-624: "Develop a 
new in-memory RDF Dataset implementation".

Feedback / advice / criticism greedily desired and welcome!

https://github.com/ajs6f/jena/tree/JournalingDatasetgraph

https://github.com/apache/jena/compare/master...ajs6f:JournalingDatasetgraph

---
A. Soroka
The University of Virginia Library

Jena3 release status

2015-07-23 Thread Andy Seaborne

Trying to do a release, I came across some issues.

JENA-992: (Refactor graph/permissions interface layer)

Not sure of the status of this but I'm assuming that the code already in 
'master' is releasable.


JENA-977: (tdbloader2 script refactoring)

The new scripts misbehave on Linux - there isn't one (obvious) issue.

To unblock the release, if there is a small fix, then great.  Another 
possibility is to revert to the older scripts for 3.0.0, so as to fix 
afterwards. This gives more time and space for testing.


It looks to me like "bash" on OSX is derived from bash 3.2 (3.2 was 
originally released in 2006) whereas on Ubuntu it is currently 4.3.  Once 
the current issues are resolved, other issues with bash or other commands 
might arise.
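
A minimal sketch of how a script could detect which major shell version it is running under, assuming only the major number matters. The function name `version_major` is hypothetical and not part of the Jena scripts; `BASH_VERSION` is the variable bash itself sets, and the sample version strings in the comment are illustrative.

```shell
#!/bin/sh
# Hypothetical helper (not from the Jena scripts): print the major component
# of a version string such as "3.2.57(1)-release" or "4.3.11(1)-release".
version_major() {
  # Strip everything from the first "." onwards.
  echo "${1%%.*}"
}

# BASH_VERSION is set by bash; default to "0" when run by another shell.
major=$(version_major "${BASH_VERSION:-0}")
if [ "$major" -lt 4 ]; then
  echo "note: shell major version is $major; bash 4 features are unavailable" >&2
fi
```

A check like this only reports the difference; avoiding bash 4 only constructs in the scripts would still be the more robust route.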


Andy


[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638924#comment-14638924
 ] 

ASF subversion and git services commented on JENA-977:
--

Commit 23feb82a337bd88c89a60ca5ec9711bb53098a05 in jena's branch 
refs/heads/master from [~rvesse]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=23feb82 ]

Fix some bugs in tdbloader2 script refactoring (JENA-977)

- Don't check database directory until after we will have created it and
  validated that it is a directory
- If getDriveInfo fails return empty information
- Don't try and calculate anything or print anything about drive
  information if it cannot be retrieved correctly




Re: Jena3 release status

2015-07-23 Thread Rob Vesse
Comments inline:

On 23/07/2015 14:41, "Andy Seaborne"  wrote:

>Trying to do a release, I came across some issues.
>
>JENA-992: (Refactor graph/permissions interface layer)
>
>Not sure of the status of this but I'm assuming that the code already in
>'master' is releasable.
>
>JENA-977: (tdbloader2 script refactoring)
>
>The new scripts misbehave on Linux - there isn't one (obvious) issue.
>
>To unblock the release, if there is a small fix, then great.  Another
>possibility is to revert to the older scripts for 3.0.0, so as to fix
>afterwards. This gives more time and space for testing.

This looks to be relatively simple; I think I have resolved the bugs you
identified.

For Case 1, I needed to look up the drive info based on the directory where
the work files will be created, not the work file itself, because that
file doesn't exist yet.  As part of fixing this I also made the script
resistant to errors where the drive information is unavailable.

For Case 2, I was checking the directory before I had ensured it existed
and was a directory, so that just required changing the order of the checks.
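
The two fixes could be sketched roughly as follows. This is an illustration only, not the code in the tdbloader2 scripts: the function names `ensure_directory` and `free_space_percent` are hypothetical, and the sketch assumes a POSIX `df -P` whose fifth column is the capacity percentage.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not those in tdbloader2.

# Create the directory if needed, then confirm it really is a directory.
ensure_directory() {
  [ -e "$1" ] || mkdir -p "$1" || return 1
  [ -d "$1" ]
}

# Print the percentage of free space on the file system holding the path.
# Prints nothing if the information cannot be determined.
free_space_percent() {
  target=$1
  # A work file may not exist yet, so ask about its parent directory instead.
  [ -e "$target" ] || target=$(dirname "$target")
  used=$(df -P "$target" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  case "$used" in
    ''|*[!0-9]*) return 0 ;;  # unavailable: report nothing, compute nothing
  esac
  echo $((100 - used))
}
```

Callers would then skip any drive-space warning when `free_space_percent` prints nothing, rather than feeding an empty value into arithmetic.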

Rob

>
>It looks to me like "bash" on OSX is bash 3.2 derived (3.2 was
>originally 2006) whereas on Ubuntu currently it is 4.3.  There might be
>other issues that arise if the current ones are resolved with bash or
>other commands.
>
>   Andy






[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638929#comment-14638929
 ] 

Rob Vesse commented on JENA-977:


As for the different logging formats, can you clarify what you mean?

The scripts should use the same logging format throughout; however, they do not 
have any control over how the invoked Java tools do their logging, which I 
assume is the issue?

I should have fixed the issues identified here, so please retry.



[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638938#comment-14638938
 ] 

Andy Seaborne commented on JENA-977:


I see 
{noformat}
 13:56:43 INFO Data file 1: /home/afs/tmp/D.ttl
INFO  Load: /home/afs/tmp/D.ttl -- 2015/07/23 13:56:44 BST
{noformat}
There are two kinds of INFO line.  This is from invoking the scripts as they 
are in bin/ with no settings.  jena-log4j.properties is picked up by 
{noformat}
LOGGING="-Dlog4j.configuration=file:$JENA_HOME/jena-log4j.properties"
{noformat}
Can the shell script lines be aligned to jena-log4j.properties?
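
One way the shell side could be brought closer to the Java side, sketched under the assumption that the Java tools print the level padded to five characters followed by the message (as the `INFO  Load: ...` line above suggests). The function name `log_line` is hypothetical, and the actual pattern in jena-log4j.properties may differ.

```shell
#!/bin/sh
# Hypothetical logging helper for the shell scripts.
# Assumed target format: level left-justified in 5 characters, a space, the message.
log_line() {
  level=$1
  shift
  printf '%-5s %s\n' "$level" "$*"
}

log_line INFO "Data Load Phase"
log_line WARN "Unable to determine free memory"
```

Keeping the format in a single function means it only has to change in one place if the log4j pattern changes.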



[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638954#comment-14638954
 ] 

Andy Seaborne commented on JENA-977:


Still failing and also earlier fixes undone.

{noformat}
JENA_HOME not set, attempting to locate JENA_HOME automatically
Located JENA_HOME at /home/afs/Release/apache-jena-3.0.0-SNAPSHOT
 15:50:11 INFO -- TDB Bulk Loader Start
 15:50:11 INFO Data Load Phase
 15:50:11 INFO Got 1 data files to load
 15:50:11 INFO Data file 1: /home/afs/tmp/D.ttl
INFO  Load: /home/afs/tmp/D.ttl -- 2015/07/23 15:50:12 BST
INFO  Total: 1 tuples : 0.08 seconds : 12.35 tuples/sec [2015/07/23 15:50:12 
BST]
 15:50:12 INFO Data Load Phase Completed
 15:50:12 INFO Index Building Phase
 15:50:12 INFO Creating Index SPO
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 346: [:  
total   used   free sharedbuffers cached
Mem:33687789568 9095151616 24592637952  116035584  461705216 4433141760
-/+ buffers/cache: 4200304640 29487484928
Swap:   34359734272  0 34359734272: integer expression expected
 15:50:12 WARN Unable to determine free memory on your OS, can't check whether 
sort will be in-memory or external sort using Temp Directory /tmp/
 15:50:12 INFO Sort SPO
{noformat}

The 
{noformat}
JENA_HOME not set, attempting to locate JENA_HOME automatically
Located JENA_HOME at /home/afs/Release/apache-jena-3.0.0-SNAPSHOT
{noformat}
had been fixed by removing those echos (the same was done to all other scripts).  
In normal use the scripts find {{JENA_HOME}} themselves.  I guess you have JENA_HOME set.

FYI: free -b =>
{noformat}
             total       used       free     shared    buffers     cached
Mem:   33687789568 9065099264 24622690304  116862976  462147584 4433760256
-/+ buffers/cache: 4169191424 29518598144
Swap:  34359734272          0 34359734272
{noformat}



[jira] [Comment Edited] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638954#comment-14638954
 ] 

Andy Seaborne edited comment on JENA-977 at 7/23/15 3:04 PM:
-

Still failing and also earlier fixes undone.

{noformat}
JENA_HOME not set, attempting to locate JENA_HOME automatically
Located JENA_HOME at /home/afs/Release/apache-jena-3.0.0-SNAPSHOT
 15:50:11 INFO -- TDB Bulk Loader Start
 15:50:11 INFO Data Load Phase
 15:50:11 INFO Got 1 data files to load
 15:50:11 INFO Data file 1: /home/afs/tmp/D.ttl
INFO  Load: /home/afs/tmp/D.ttl -- 2015/07/23 15:50:12 BST
INFO  Total: 1 tuples : 0.08 seconds : 12.35 tuples/sec [2015/07/23 15:50:12 
BST]
 15:50:12 INFO Data Load Phase Completed
 15:50:12 INFO Index Building Phase
 15:50:12 INFO Creating Index SPO
/home/afs/Release/apache-jena-3.0.0-SNAPSHOT/bin/tdbloader2index: line 346: [:  
total   used   free sharedbuffers cached
Mem:33687789568 9095151616 24592637952  116035584  461705216 4433141760
-/+ buffers/cache: 4200304640 29487484928
Swap:   34359734272  0 34359734272: integer expression expected
 15:50:12 WARN Unable to determine free memory on your OS, can't check whether 
sort will be in-memory or external sort using Temp Directory /tmp/
 15:50:12 INFO Sort SPO
{noformat}

The 
{noformat}
JENA_HOME not set, attempting to locate JENA_HOME automatically
Located JENA_HOME at /home/afs/Release/apache-jena-3.0.0-SNAPSHOT
{noformat}
had been fixed by removing those echos (the same was done to all other scripts).  
In normal use the scripts find {{JENA_HOME}} themselves.  I guess you have JENA_HOME set.

FYI: free -b =>
{noformat}
             total       used       free     shared    buffers     cached
Mem:   33687789568 9065099264 24622690304  116862976  462147584 4433760256
-/+ buffers/cache: 4169191424 29518598144
Swap:  34359734272          0 34359734272
{noformat}
and this becomes the value of {{FREE_MEM}}.
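
A sketch of how a single integer could be extracted instead, assuming the `free -b` layout shown above where the "free" column is the fourth field of the "Mem:" row. The function name `parse_free_mem` is hypothetical and is not what the scripts currently do.

```shell
#!/bin/sh
# Hypothetical helper: read `free -b` style output on stdin and print only the
# "free" column of the "Mem:" row. Prints nothing if that is not a plain integer.
parse_free_mem() {
  value=$(awk '$1 == "Mem:" { print $4 }')
  case "$value" in
    ''|*[!0-9]*) return 0 ;;
  esac
  echo "$value"
}

# Usage sketch: FREE_MEM stays empty when `free` is missing or unparsable.
FREE_MEM=$(free -b 2>/dev/null | parse_free_mem) || true
```

With `FREE_MEM` either empty or a plain integer, a later `[ ... -gt ... ]` test can be guarded by a simple emptiness check.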




[jira] [Comment Edited] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638962#comment-14638962
 ] 

Andy Seaborne edited comment on JENA-977 at 7/23/15 3:08 PM:
-

Even fudging the FREE_MEM issue, the script still hangs when run in an emacs 
shell.  Better in a terminal shell.


was (Author: andy.seaborne):
Even fudging the FREE_MEM issue, the script still hangs.



[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638962#comment-14638962
 ] 

Andy Seaborne commented on JENA-977:


Even fudging the FREE_MEM issue, the script still hangs.



[jira] [Comment Edited] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638962#comment-14638962
 ] 

Andy Seaborne edited comment on JENA-977 at 7/23/15 3:10 PM:
-

Even fudging the FREE_MEM issue, the script still hangs when run in an emacs 
shell.  It behaves better in a terminal shell, but it does not reset the 
terminal correctly: I now have no typing echo.



was (Author: andy.seaborne):
Even fudging the FREE_MEM issue, the script still hangs when run in an emacs 
shell.  Better in a terminal shell.



[jira] [Created] (JENA-997) Remove ".json" from registration of RDF/JSON.

2015-07-23 Thread Andy Seaborne (JIRA)
Andy Seaborne created JENA-997:
--

 Summary: Remove ".json" from registration of RDF/JSON.
 Key: JENA-997
 URL: https://issues.apache.org/jira/browse/JENA-997
 Project: Apache Jena
  Issue Type: Improvement
Reporter: Andy Seaborne
Priority: Minor


To avoid confusion with JSON-LD,  remove ".json" from registration of RDF/JSON.





[jira] [Commented] (JENA-977) tdbloader2 script refactoring

2015-07-23 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639356#comment-14639356
 ] 

Andy Seaborne commented on JENA-977:


I am at a loss as to how to fix the pv issue.

Maybe set "HAS_PV=0" for the release; with that, both GNOME Terminal and 
the emacs shell buffer work. 
The terminal I'm using is GNOME Terminal (v3.14.2).
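
A sketch of what such a switch could look like. `HAS_PV` is the name used above, but its exact meaning in the real scripts is assumed here (1 = use pv, 0 = do not), and the function name `progress_filter` is hypothetical.

```shell
#!/bin/sh
# Sketch only: HAS_PV comes from the discussion above; its exact meaning in the
# real scripts is assumed here (1 = use pv, 0 = do not).
progress_filter() {
  if [ "${HAS_PV:-1}" = "1" ] && command -v pv >/dev/null 2>&1; then
    pv          # progress goes to stderr, data passes through unchanged
  else
    cat         # plain pass-through, no progress display
  fi
}
```

With `HAS_PV=0` the data path is identical, so the sort stage would see the same input with or without progress monitoring.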





[jira] [Commented] (JENA-997) Remove ".json" from registration of RDF/JSON.

2015-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639388#comment-14639388
 ] 

ASF subversion and git services commented on JENA-997:
--

Commit 1042f4cce48988db89359a48eb4c9836856773be in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=1042f4c ]

JENA-997 : ".json" is no longer an extension for RDF/JSON. 

JSON-LD unaffected.


> Remove ".json" from registration of RDF/JSON.
> -
>
> Key: JENA-997
> URL: https://issues.apache.org/jira/browse/JENA-997
> Project: Apache Jena
>  Issue Type: Improvement
>Reporter: Andy Seaborne
>Priority: Minor
>
> To avoid confusion with JSON-LD,  remove ".json" from registration of 
> RDF/JSON.





[jira] [Resolved] (JENA-997) Remove ".json" from registration of RDF/JSON.

2015-07-23 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-997.

   Resolution: Fixed
 Assignee: Andy Seaborne
Fix Version/s: Jena 3.0.0

> Remove ".json" from registration of RDF/JSON.
> -
>
> Key: JENA-997
> URL: https://issues.apache.org/jira/browse/JENA-997
> Project: Apache Jena
>  Issue Type: Improvement
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Minor
> Fix For: Jena 3.0.0
>
>
> To avoid confusion with JSON-LD,  remove ".json" from registration of 
> RDF/JSON.





[jira] [Created] (JENA-998) Exception in jena-text when executing query with subject already bound

2015-07-23 Thread Stephen Allen (JIRA)
Stephen Allen created JENA-998:
--

 Summary: Exception in jena-text when executing query with subject 
already bound
 Key: JENA-998
 URL: https://issues.apache.org/jira/browse/JENA-998
 Project: Apache Jena
  Issue Type: Bug
  Components: Text
Reporter: Stephen Allen
Assignee: Stephen Allen


An exception results when querying with jena-text where the subject is already 
bound to a concrete value.

Example:
{code}
select *
where {
  ?s rdf:type  .
  ?s text:query ( rdfs:label "test" ) .
}
{code}

This happens because, when the subject is concrete, the code does not check 
whether the score variable exists before trying to bind the score to it.
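
A minimal sketch of the kind of guard described above, written in Python rather than Jena's Java and not taken from the jena-text code. The names `bind_hit`, `subject_var` and `score_var` are hypothetical; an absent variable is modelled as `None` and a binding as a plain dict.

```python
from typing import Dict, Optional


def bind_hit(
    parent: Dict[str, object],
    subject_var: Optional[str],
    score_var: Optional[str],
    subject: str,
    score: float,
) -> Dict[str, object]:
    """Extend a binding with one text hit, skipping variables that are absent.

    subject_var is None when the subject is concrete; score_var is None when
    the query did not ask for a score. Both names are illustrative only.
    """
    extended = dict(parent)  # never mutate the parent binding
    if subject_var is not None:
        extended[subject_var] = subject
    if score_var is not None:  # the check the report says was missing
        extended[score_var] = score
    return extended
```

With a concrete subject and no score variable, the call simply returns a copy of the parent binding instead of failing on a missing variable.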

Results:
{code}
java.lang.NullPointerException
at 
org.apache.jena.sparql.engine.binding.Binding1.contains1(Binding1.java:60)
at 
org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:108)
at 
org.apache.jena.sparql.engine.binding.BindingBase.contains(BindingBase.java:112)
at 
org.apache.jena.sparql.engine.binding.BindingHashMap.checkAdd(BindingHashMap.java:109)
at 
org.apache.jena.sparql.engine.binding.BindingHashMap.add(BindingHashMap.java:91)
at 
org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.insert(QueryIterTriplePattern.java:119)
at 
org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.mapper(QueryIterTriplePattern.java:104)
at 
org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:138)
at 
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at 
org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74)
at 
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at 
org.apache.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:63)
at 
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at 
org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at 
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at 
org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at 
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at 
org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
at 
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:111)
at 
org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74)
at 
org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59)
at org.apache.jena.atlas.iterator.Iter.reduce(Iter.java:165)
at org.apache.jena.atlas.iterator.Iter.toList(Iter.java:111)
at 
org.apache.jena.query.text.TestTextTDB.itShouldWorkWithConcreteSubject(TestTextTDB.java:199)
{code}





[jira] [Commented] (JENA-998) Exception in jena-text when executing query with subject already bound

2015-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639661#comment-14639661
 ] 

ASF subversion and git services commented on JENA-998:
--

Commit 9164a0cd51cea4446c8488fbb65848fdaa9f8e8c in jena's branch 
refs/heads/master from [~sallen]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=9164a0c ]

JENA-998 Exception in jena-text when executing query with subject already bound


> Exception in jena-text when executing query with subject already bound
> --
>
> Key: JENA-998
> URL: https://issues.apache.org/jira/browse/JENA-998
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Text
>Reporter: Stephen Allen
>Assignee: Stephen Allen
>





[jira] [Resolved] (JENA-998) Exception in jena-text when executing query with subject already bound

2015-07-23 Thread Stephen Allen (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Allen resolved JENA-998.

Resolution: Fixed

> Exception in jena-text when executing query with subject already bound
> --
>
> Key: JENA-998
> URL: https://issues.apache.org/jira/browse/JENA-998
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Text
>Reporter: Stephen Allen
>Assignee: Stephen Allen
>





[jira] [Created] (JENA-999) Poor jena-text query performance when a bound subject is used

2015-07-23 Thread Stephen Allen (JIRA)
Stephen Allen created JENA-999:
--

 Summary: Poor jena-text query performance when a bound subject is 
used
 Key: JENA-999
 URL: https://issues.apache.org/jira/browse/JENA-999
 Project: Apache Jena
  Issue Type: Improvement
Reporter: Stephen Allen
Assignee: Stephen Allen
Priority: Minor


When executing a jena-text query, the performance is terrible if the subject is 
already bound to a variable.  This is because, on every iteration, the current 
code executes a new lucene query that does not have the subject/entity bound 
and then iterates through the lucene results to join against the subject.  This 
is quite inefficient.

Example query:
{code}
select *
where {
  ?s rdf:type  .
  ?s text:query ( rdfs:label "test" ) .
}
{code}
This would be quite slow if there were a lot of entities in the system.

Two potential solutions present themselves:
# Craft a more explicit lucene query that specifies the entity URI, so that the 
results coming back from lucene are much smaller.  However, this would cause 
problems with the score not being correct across multiple iterations.  
Additionally, we would still potentially be running a lot of lucene queries, 
each of which probably has a non-negligible constant cost (parsing the query 
string, etc).
# Execute the more general lucene query the first time it is encountered, then 
cache the results somewhere.  From there, we can perform a hash table lookup 
against those cached results.
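
Option 2 could be sketched roughly as follows. This is an illustration in Python, not jena-text code: `TextResultCache`, `score_for` and the `search` callable (a stand-in for the real lucene search) are assumed names, and hits are modelled as `(entity URI, score)` pairs.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Hit = Tuple[str, float]  # (entity URI, score)


class TextResultCache:
    """Runs each distinct text query once and answers later lookups from memory."""

    def __init__(self, search: Callable[[str], Iterable[Hit]]) -> None:
        self._search = search  # stand-in for the real index search
        self._by_query: Dict[str, Dict[str, float]] = {}
        self.searches_run = 0

    def score_for(self, query: str, subject: str) -> Optional[float]:
        """Return the score of `subject` for `query`, or None if it did not match."""
        hits = self._by_query.get(query)
        if hits is None:
            # First encounter: run the general query and index hits by entity.
            self.searches_run += 1
            hits = dict(self._search(query))
            self._by_query[query] = hits
        return hits.get(subject)  # hash lookup instead of scanning the hits
```

Because the general query runs once, every subject is scored against the same result set, which avoids the per-iteration score inconsistency noted for option 1.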

I would like to pursue option 2, but there is a problem.  Because jena-text is 
implemented as a property function instead of a query op in and of itself (as 
QueryIterMinus is, for example), we have to find a place to stash the lucene 
results.  I believe this can be done by placing them in the ExecutionContext 
object, using the lucene query as a cache key.  Updates provide a slightly 
troubling case because you could have an update request like:
{code}
insert data {  rdf:type  ; rdfs:label 
"test" } ;

delete { ?s ?p ?o }
where { ?s rdf:type  ; text:query ( rdfs:label 
"test" ) . ?p ?o . } ;

insert data {  rdf:type  ; rdfs:label 
"test" } ;

delete { ?s ?p ?o }
where { ?s rdf:type  ; text:query ( rdfs:label 
"test" ) ; ?p ?o . }
{code}
And then the end result should be an empty database.  But if the 
ExecutionContext was the same for both delete queries, you would be using the 
cached results from the first delete query in the second delete query, which 
would result in {{}} not being deleted properly.

If the ExecutionContext is indeed shared between the two update queries in the 
situation above, I think this can be solved by making the cache key for the 
lucene resultset be a combination of both the lucene query and the 
QueryIterRoot or BindingRoot.  I need to investigate this.
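
To illustrate the composite-key idea, here is a small Python sketch under stated assumptions: `root_id` is a hypothetical stand-in for whatever identifies the QueryIterRoot or BindingRoot of one operation, and `search` stands in for the lucene search. It is not Jena API.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Hit = Tuple[str, float]  # (entity URI, score)


class ScopedTextResultCache:
    """Caches text hits per (query string, execution root) pair."""

    def __init__(self, search: Callable[[str], Iterable[Hit]]) -> None:
        self._search = search
        self._cache: Dict[Tuple[str, Hashable], Dict[str, float]] = {}

    def hits(self, query: str, root_id: Hashable) -> Dict[str, float]:
        # Same query text under a different root is a different cache entry,
        # so a later operation re-runs the search and sees fresh data.
        key = (query, root_id)
        if key not in self._cache:
            self._cache[key] = dict(self._search(query))
        return self._cache[key]
```

With this keying, the second delete in the request above would use a different root and therefore would not reuse the hits cached for the first delete.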





[jira] [Updated] (JENA-999) Poor jena-text query performance when a bound subject is used

2015-07-23 Thread Stephen Allen (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Allen updated JENA-999:
---
Description: 
When executing a jena-text query, the performance is terrible if the subject is 
already bound to a variable.  This is because, on every iteration, the current 
code executes a new lucene query that does not have the subject/entity bound 
and then iterates through the lucene results to join against the subject.  This 
is quite inefficient.

Example query:
{code}
select *
where {
  ?s rdf:type  .
  ?s text:query ( rdfs:label "test" ) .
}
{code}
This would be quite slow if there were a lot of entities in the system.

Two potential solutions present themselves:
# Craft a more explicit lucene query that specifies the entity URI, so that the 
results coming back from lucene are much smaller.  However, this would cause 
problems with the score not being correct across multiple iterations.  
Additionally, we would still potentially be running a lot of lucene queries, 
each of which probably has a non-negligible constant cost (parsing the query 
string, etc).
# Execute the more general lucene query the first time it is encountered, then 
cache the results somewhere.  From there, we can perform a hash table lookup 
against those cached results.

I would like to pursue option 2, but there is a problem.  Because jena-text is 
implemented as a property function instead of a query op in and of itself (as 
QueryIterMinus is, for example), we have to find a place to stash the lucene 
results.  I believe this can be done by placing them in the ExecutionContext 
object, using the lucene query as a cache key.  Updates provide a slightly 
troubling case because you could have an update request like:
{code}
insert data {  rdf:type  ; rdfs:label 
"test" } ;

delete { ?s ?p ?o }
where { ?s rdf:type  ; text:query ( rdfs:label 
"test" ) . ?p ?o . } ;

insert data {  rdf:type  ; rdfs:label 
"test" } ;

delete { ?s ?p ?o }
where { ?s rdf:type  ; text:query ( rdfs:label 
"test" ) ; ?p ?o . }
{code}
And then the end result should be an empty database.  But if the 
ExecutionContext was the same for both delete queries, you would be using the 
cached results from the first delete query in the second delete query, which 
would result in {{}} not being deleted properly.

If the ExecutionContext is indeed shared between the two update queries in the 
situation above, I think this can be solved by making the cache key for the 
lucene resultset be a combination of both the lucene query and the 
QueryIterRoot or BindingRoot.  I need to investigate this.  Alternatively, if 
there were a way to be notified when a query has finished executing, we could 
clear the cache in the ExecutionContext.
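
The clear-on-completion alternative might look like the following Python sketch. It assumes a completion hook exists, which the description leaves open; `query_scope` and `TEXT_CACHE_KEY` are hypothetical names and the context is modelled as a plain dict.

```python
from contextlib import contextmanager
from typing import Dict, Iterator

TEXT_CACHE_KEY = "text.resultCache"  # hypothetical context key


@contextmanager
def query_scope(context: Dict[str, object]) -> Iterator[Dict[str, object]]:
    """Yield the context for one query and drop the text cache when it ends."""
    try:
        yield context
    finally:
        # Runs on normal completion and on error, so no stale hits survive.
        context.pop(TEXT_CACHE_KEY, None)
```

Other entries in the context are left untouched; only the cached text results are discarded when the query ends.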

  was:
When executing a jena-text query, the performance is terrible if the subject is 
already bound to a variable.  This is because the current code will execute a 
new lucene query that does not have the subject/entity bound on every iteration 
and then iterate through the lucene results to join against the subject.  This 
is quite inefficient.

Example query:
{code}
select *
where {
  ?s rdf:type  .
  ?s text:query ( rdfs:label "test" ) .
}
{code}
This would be quite slow if there were a lot of entities in the system.

Two potential solutions present themselves:
# Craft a more explicit lucene query that specifies the entity URI, so that the 
results coming back from lucene are much smaller.  However, this would cause 
problems with the score not being correct across multiple iterations.  
Additionally we are still potentially running a lot of lucene queries, each of 
which has a probably non-negligible constant cost (parsing the query string, 
etc).
# Execute the more general lucene query the first time it is encountered, then 
caching the results somewhere.  From there, we can then perform a hash table 
lookup against those cached results.

I would like to pursue option 2, but there is a problem.  Because jena-text is 
implemented as a property function instead of a query op in and of itself (like 
QueryIterMinus is for example), we have to find a place to stash the lucene 
results.  I believe this can be done by placing it in the ExecutionContext 
object, using the lucene query as a cache key.  Updates provide a slightly 
troubling case because you could have an update request like:
{code}
insert data {  rdf:type  ; rdfs:label 
"test" } ;

delete { ?s ?p ?o }
where { ?s rdf:type  ; text:query ( rdfs:label 
"test" ) . ?p ?o . } ;

insert data {  rdf:type  ; rdfs:label 
"test" } ;

delete { ?s ?p ?o }
where { ?s rdf:type  ; text:query ( rdfs:label 
"test" ) ; ?p ?o . }
{code}
And th