[jira] [Commented] (JENA-1093) jena-text query doesn't return all matching literals

2016-01-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083242#comment-15083242
 ] 

ASF subversion and git services commented on JENA-1093:
---

Commit 859fa47b66c6ce0701d0b080c835955e96f30c73 in jena's branch 
refs/heads/master from [~osma]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=859fa47 ]

JENA-1093: return multiple literals from text query with bound subject


> jena-text query doesn't return all matching literals
> 
>
> Key: JENA-1093
> URL: https://issues.apache.org/jira/browse/JENA-1093
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Text
>Affects Versions: Jena 3.0.1
>Reporter: Osma Suominen
>Assignee: Osma Suominen
>
> After the optimizations in JENA-999, the text:query property function, when 
> asked for stored literal values, no longer returns all matching literals. 
> Instead, each subject is returned with a random TextHit (i.e. score+literal 
> pair). This is a problem for me because I want to show to the user the most 
> relevant reason why the search matched a particular SKOS concept (there may 
> be many matching labels in various languages), or in some cases all the 
> reasons. 
> Also the returned match may not have the highest score, which could be a 
> problem if one is interested in the score (I'm not).
> For example, with storeLiterals enabled and this data:
> {noformat}
> ex:subject rdfs:label "one reason", "another reason" .
> {noformat}
> this query
> {noformat}
> (?s ?score ?literal) text:query "reason" .
> {noformat}
> will return a single binding where ?literal is bound to either "one reason" 
> or "another reason".
> Before JENA-999 it returned two bindings, one per literal.
> The culprit is the post-JENA-999 code in the TextQueryPF.exec method, 
> particularly around this line that suppresses subsequent hits with the same 
> subject URI:
> https://github.com/apache/jena/blob/master/jena-text/src/main/java/org/apache/jena/query/text/TextQueryPF.java#L188
> I already have a failing unit test that shows what I'd like to accomplish. I 
> will try to make a PR at some point.
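
The difference between the reported behaviour and the expected one can be sketched in plain Java. This is only an illustration under assumptions: `Hit`, `firstHitPerSubject`, `allHitsPerSubject` and `sampleHits` are hypothetical names made up for this sketch, not the jena-text classes or the actual TextQueryPF.exec code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TextHitGrouping {

    /** Hypothetical stand-in for a (subject, score, literal) hit; not the jena-text TextHit class. */
    static final class Hit {
        final String subject;
        final float score;
        final String literal;

        Hit(String subject, float score, String literal) {
            this.subject = subject;
            this.score = score;
            this.literal = literal;
        }
    }

    /** Mimics the reported behaviour: later hits for an already-seen subject are dropped. */
    static List<Hit> firstHitPerSubject(List<Hit> hits) {
        Set<String> seen = new HashSet<>();
        List<Hit> out = new ArrayList<>();
        for (Hit h : hits) {
            if (seen.add(h.subject)) {   // add() returns false for a repeated subject
                out.add(h);
            }
        }
        return out;
    }

    /** Expected behaviour: every hit is kept, grouped by subject. */
    static Map<String, List<Hit>> allHitsPerSubject(List<Hit> hits) {
        Map<String, List<Hit>> bySubject = new LinkedHashMap<>();
        for (Hit h : hits) {
            bySubject.computeIfAbsent(h.subject, k -> new ArrayList<>()).add(h);
        }
        return bySubject;
    }

    /** The two labels from the example data; the scores are invented. */
    static List<Hit> sampleHits() {
        return Arrays.asList(
            new Hit("ex:subject", 0.7f, "one reason"),
            new Hit("ex:subject", 0.5f, "another reason"));
    }

    public static void main(String[] args) {
        List<Hit> hits = sampleHits();
        System.out.println(firstHitPerSubject(hits).size());                   // one binding
        System.out.println(allHitsPerSubject(hits).get("ex:subject").size());  // two bindings
    }
}
```

With the sample data the first method keeps one hit for ex:subject and the second keeps both, which matches the one-binding versus two-bindings difference described in the issue.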



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (JENA-1093) jena-text query doesn't return all matching literals

2016-01-05 Thread Osma Suominen (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Osma Suominen updated JENA-1093:

Fix Version/s: Jena 3.1.0



Re: jena git commit: JENA-1108: Fix apache-jena as a pom to get jena-cmds

2016-01-05 Thread Andy Seaborne

On 05/01/16 14:15, Rob Vesse wrote:

Why are the dependencies marked optional?


The dependencies for sources and javadoc are explicitly named so the 
assembly pulls them into the distribution.  There may be a better way to 
do that.  If there is, let's do that.


They are optional because otherwise they get pulled in as jar files and 
become maven dependencies when using the apache-jena pom.


If they get on the classpath, the jars have the same package structure 
as the binary class jars.


If one is found before the binary class jar, it gets used instead, which 
does not work.


It does not happen when you have the projects open in Eclipse 
development, only when using the maven artifacts.


mvn dependency:tree shows

[INFO] +- org.apache.jena:jena-tdb:jar:sources:3.1.0-SNAPSHOT:compile

If there is a better way to get the javadoc and sources, then let's do 
it.  This way is a bit of a hack. Is there a correct way to do all this?


Andy

Alternatives, neither of which is very nice, include (1) separating the 
artifacts [naming?] or (2) demoting the assembly step to a special build 
step, which in the past has led to confusion about what the uploaded 
artifact actually is.




Rob

On 05/01/2016 13:41, "a...@apache.org"  wrote:


Repository: jena
Updated Branches:
  refs/heads/master 620b6f278 -> 245d5cad8


JENA-1108: Fix apache-jena as a pom to get jena-cmds


Project: http://git-wip-us.apache.org/repos/asf/jena/repo
Commit: http://git-wip-us.apache.org/repos/asf/jena/commit/245d5cad
Tree: http://git-wip-us.apache.org/repos/asf/jena/tree/245d5cad
Diff: http://git-wip-us.apache.org/repos/asf/jena/diff/245d5cad

Branch: refs/heads/master
Commit: 245d5cad8755ab3da561f9e4b02e1d35bae8fedf
Parents: 620b6f2
Author: Andy Seaborne 
Authored: Tue Jan 5 13:40:59 2016 +
Committer: Andy Seaborne 
Committed: Tue Jan 5 13:40:59 2016 +

--
apache-jena/pom.xml  | 8 
.../src/main/java/org/apache/jena/fuseki/build/Template.java | 5 +
.../org/apache/jena/fuseki/server/templates/config-mem   | 2 +-
.../org/apache/jena/fuseki/server/templates/config-tdb-mem   | 5 ++---
4 files changed, 16 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/jena/blob/245d5cad/apache-jena/pom.
xml
--
diff --git a/apache-jena/pom.xml b/apache-jena/pom.xml
index 4bbc6e8..ec7f94b 100644
--- a/apache-jena/pom.xml
+++ b/apache-jena/pom.xml
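@@ -68,6 +68,7 @@
       <artifactId>jena-arq</artifactId>
       <version>3.1.0-SNAPSHOT</version>
       <classifier>sources</classifier>
+      <optional>true</optional>
     </dependency>
 
     <dependency>
@@ -75,6 +76,7 @@
       <artifactId>jena-arq</artifactId>
       <version>3.1.0-SNAPSHOT</version>
       <classifier>javadoc</classifier>
+      <optional>true</optional>
     </dependency>
 
     <dependency>
@@ -88,6 +90,7 @@
       <artifactId>jena-core</artifactId>
       <version>3.1.0-SNAPSHOT</version>
       <classifier>sources</classifier>
+      <optional>true</optional>
     </dependency>
 
     <dependency>
@@ -95,6 +98,7 @@
       <artifactId>jena-core</artifactId>
       <version>3.1.0-SNAPSHOT</version>
       <classifier>javadoc</classifier>
+      <optional>true</optional>
     </dependency>
 
     <dependency>
@@ -108,6 +112,7 @@
       <artifactId>jena-tdb</artifactId>
       <version>3.1.0-SNAPSHOT</version>
       <classifier>sources</classifier>
+      <optional>true</optional>
     </dependency>
 
     <dependency>
@@ -115,6 +120,7 @@
       <artifactId>jena-tdb</artifactId>
       <version>3.1.0-SNAPSHOT</version>
       <classifier>javadoc</classifier>
+      <optional>true</optional>
     </dependency>
 
     <dependency>
@@ -128,6 +134,7 @@
       <artifactId>jena-cmds</artifactId>
       <version>3.1.0-SNAPSHOT</version>
       <classifier>sources</classifier>
+      <optional>true</optional>
     </dependency>
 
     <dependency>
@@ -135,6 +142,7 @@
       <artifactId>jena-cmds</artifactId>
       <version>3.1.0-SNAPSHOT</version>
       <classifier>javadoc</classifier>
+      <optional>true</optional>
     </dependency>
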
   

http://git-wip-us.apache.org/repos/asf/jena/blob/245d5cad/jena-fuseki2/jen
a-fuseki-core/src/main/java/org/apache/jena/fuseki/build/Template.java
--
diff --git
a/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/build
/Template.java
b/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/build
/Template.java
index 55a449e..b6cfccf 100644
---
a/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/build
/Template.java
+++
b/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/build
/Template.java
@@ -35,6 +35,11 @@ public class Template
 public static final String templateTDBDirFN =
templateDir+"/config-tdb-dir" ;
 public static final String templateServiceFN=
templateDir+"/config-service" ;   // Dummy used by dataset-less
service.

+public static final String templateMemFN_1  =
templateDir+"/config-mem-txn" ;
+
+
+
+
 // Template may be in a resources area of a jar file so you can't do
a directory listing.
 public static final String[] templateNames = {
 templateMemFN ,

http://git-wip-us.apache.org/repos/asf/jena/blob/245d5cad/jena-fuseki2/jen
a-fuseki-core/src/main/resources/org/apache/jena/fuseki/server/templates/c
onfig-mem
--
diff --git
a/jena-fuseki2/jena-fuseki-core/src/main/resources/org/apache/jena/fuseki/
server/templates/config-mem

[jira] [Resolved] (JENA-1093) jena-text query doesn't return all matching literals

2016-01-05 Thread Osma Suominen (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Osma Suominen resolved JENA-1093.
-
Resolution: Fixed

Merged PR #112, closing the issue.



Re: jena git commit: JENA-1108: Fix apache-jena as a pom to get jena-cmds

2016-01-05 Thread Rob Vesse
Why are the dependencies marked optional?

Rob

On 05/01/2016 13:41, "a...@apache.org"  wrote:

>Repository: jena
>Updated Branches:
>  refs/heads/master 620b6f278 -> 245d5cad8
>
>
>JENA-1108: Fix apache-jena as a pom to get jena-cmds
>
>
>Project: http://git-wip-us.apache.org/repos/asf/jena/repo
>Commit: http://git-wip-us.apache.org/repos/asf/jena/commit/245d5cad
>Tree: http://git-wip-us.apache.org/repos/asf/jena/tree/245d5cad
>Diff: http://git-wip-us.apache.org/repos/asf/jena/diff/245d5cad
>
>Branch: refs/heads/master
>Commit: 245d5cad8755ab3da561f9e4b02e1d35bae8fedf
>Parents: 620b6f2
>Author: Andy Seaborne 
>Authored: Tue Jan 5 13:40:59 2016 +
>Committer: Andy Seaborne 
>Committed: Tue Jan 5 13:40:59 2016 +
>
>--
> apache-jena/pom.xml  | 8 
> .../src/main/java/org/apache/jena/fuseki/build/Template.java | 5 +
> .../org/apache/jena/fuseki/server/templates/config-mem   | 2 +-
> .../org/apache/jena/fuseki/server/templates/config-tdb-mem   | 5 ++---
> 4 files changed, 16 insertions(+), 4 deletions(-)
>--
>
>
>http://git-wip-us.apache.org/repos/asf/jena/blob/245d5cad/apache-jena/pom.
>xml
>--
>diff --git a/apache-jena/pom.xml b/apache-jena/pom.xml
>index 4bbc6e8..ec7f94b 100644
>--- a/apache-jena/pom.xml
>+++ b/apache-jena/pom.xml
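>@@ -68,6 +68,7 @@
>       <artifactId>jena-arq</artifactId>
>       <version>3.1.0-SNAPSHOT</version>
>       <classifier>sources</classifier>
>+      <optional>true</optional>
>     </dependency>
> 
>     <dependency>
>@@ -75,6 +76,7 @@
>       <artifactId>jena-arq</artifactId>
>       <version>3.1.0-SNAPSHOT</version>
>       <classifier>javadoc</classifier>
>+      <optional>true</optional>
>     </dependency>
> 
>     <dependency>
>@@ -88,6 +90,7 @@
>       <artifactId>jena-core</artifactId>
>       <version>3.1.0-SNAPSHOT</version>
>       <classifier>sources</classifier>
>+      <optional>true</optional>
>     </dependency>
> 
>     <dependency>
>@@ -95,6 +98,7 @@
>       <artifactId>jena-core</artifactId>
>       <version>3.1.0-SNAPSHOT</version>
>       <classifier>javadoc</classifier>
>+      <optional>true</optional>
>     </dependency>
> 
>     <dependency>
>@@ -108,6 +112,7 @@
>       <artifactId>jena-tdb</artifactId>
>       <version>3.1.0-SNAPSHOT</version>
>       <classifier>sources</classifier>
>+      <optional>true</optional>
>     </dependency>
> 
>     <dependency>
>@@ -115,6 +120,7 @@
>       <artifactId>jena-tdb</artifactId>
>       <version>3.1.0-SNAPSHOT</version>
>       <classifier>javadoc</classifier>
>+      <optional>true</optional>
>     </dependency>
> 
>     <dependency>
>@@ -128,6 +134,7 @@
>       <artifactId>jena-cmds</artifactId>
>       <version>3.1.0-SNAPSHOT</version>
>       <classifier>sources</classifier>
>+      <optional>true</optional>
>     </dependency>
> 
>     <dependency>
>@@ -135,6 +142,7 @@
>       <artifactId>jena-cmds</artifactId>
>       <version>3.1.0-SNAPSHOT</version>
>       <classifier>javadoc</classifier>
>+      <optional>true</optional>
>     </dependency>
> 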
>   
>
>http://git-wip-us.apache.org/repos/asf/jena/blob/245d5cad/jena-fuseki2/jen
>a-fuseki-core/src/main/java/org/apache/jena/fuseki/build/Template.java
>--
>diff --git 
>a/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/build
>/Template.java 
>b/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/build
>/Template.java
>index 55a449e..b6cfccf 100644
>--- 
>a/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/build
>/Template.java
>+++ 
>b/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/build
>/Template.java
>@@ -35,6 +35,11 @@ public class Template
> public static final String templateTDBDirFN =
>templateDir+"/config-tdb-dir" ;
> public static final String templateServiceFN=
>templateDir+"/config-service" ;   // Dummy used by dataset-less
>service.
> 
>+public static final String templateMemFN_1  =
>templateDir+"/config-mem-txn" ;
>+
>+
>+
>+
> // Template may be in a resources area of a jar file so you can't do
>a directory listing.
> public static final String[] templateNames = {
> templateMemFN ,
>
>http://git-wip-us.apache.org/repos/asf/jena/blob/245d5cad/jena-fuseki2/jen
>a-fuseki-core/src/main/resources/org/apache/jena/fuseki/server/templates/c
>onfig-mem
>--
>diff --git 
>a/jena-fuseki2/jena-fuseki-core/src/main/resources/org/apache/jena/fuseki/
>server/templates/config-mem
>b/jena-fuseki2/jena-fuseki-core/src/main/resources/org/apache/jena/fuseki/
>server/templates/config-mem
>index 06dcf1e..e455cca 100644
>--- 
>a/jena-fuseki2/jena-fuseki-core/src/main/resources/org/apache/jena/fuseki/
>server/templates/config-mem
>+++ 
>b/jena-fuseki2/jena-fuseki-core/src/main/resources/org/apache/jena/fuseki/
>server/templates/config-mem
>@@ -12,7 +12,7 @@
> ## Updatable in-memory dataset.
> 
> <#service1> rdf:type fuseki:Service ;
>-# URI of the dataset -- http://host:port/ds
>+# URI of the dataset -- http://host:port/{NAME}
> fuseki:name"{NAME}" ;
> fuseki:serviceQuery"sparql" ;
> fuseki:serviceQuery"query" ;
>
>http://git-wip-us.apache.org/repos/asf/jena/blob/245d5cad/jena-fuseki2/jen
>a-fuseki-core/src/main/resources/org/apache/jena/fuseki/server/templates/c
>onfig-tdb-mem
>--
>diff --git 

[jira] [Commented] (JENA-1093) jena-text query doesn't return all matching literals

2016-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083243#comment-15083243
 ] 

ASF GitHub Bot commented on JENA-1093:
--

Github user osma closed the pull request at:

https://github.com/apache/jena/pull/112




[GitHub] jena pull request: JENA-1093: return multiple literals from text q...

2016-01-05 Thread osma
Github user osma closed the pull request at:

https://github.com/apache/jena/pull/112


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request: JENA-999: jena-text Lucene cache using multimap...

2016-01-05 Thread osma
GitHub user osma opened a pull request:

https://github.com/apache/jena/pull/119

JENA-999: jena-text Lucene cache using multimaps

This set of commits implements a caching layer for Lucene queries. The 
cache is stored in the Context so that it is persisted even when new 
TextQueryPF's are created. Cache entries for query results are Guava Multimaps, 
which allow efficient lookups of known subject URIs in the case where the 
subject is already bound.
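
The caching idea described above can be sketched with standard-library collections only. This is a rough illustration under assumptions, not the code in the pull request: `QueryCacheSketch`, `Hit`, `runIndexQuery` and `hitsFor` are hypothetical names, a plain `Map<String, List<Hit>>` stands in for the Guava multimap, and an instance field stands in for the Context that outlives individual TextQueryPF instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QueryCacheSketch {

    /** Hypothetical (subject, score, literal) hit. */
    static final class Hit {
        final String subject;
        final float score;
        final String literal;

        Hit(String subject, float score, String literal) {
            this.subject = subject;
            this.score = score;
            this.literal = literal;
        }
    }

    // Stand-in for the cache kept in the Context: query string -> (subject -> hits).
    private final Map<String, Map<String, List<Hit>>> cache = new HashMap<>();
    private int indexQueries = 0;

    /** Hypothetical index lookup returning fixed, invented data; counts how often it runs. */
    private List<Hit> runIndexQuery(String queryString) {
        indexQueries++;
        return Arrays.asList(
            new Hit("ex:a", 1.0f, "test a"),
            new Hit("ex:b", 0.5f, "test b"),
            new Hit("ex:a", 0.4f, "another test"));
    }

    /** Returns the hits for one bound subject, running the index query only on a cache miss. */
    List<Hit> hitsFor(String queryString, String subject) {
        Map<String, List<Hit>> bySubject = cache.get(queryString);
        if (bySubject == null) {                       // cache miss: query once, group by subject
            bySubject = new HashMap<>();
            for (Hit h : runIndexQuery(queryString)) {
                bySubject.computeIfAbsent(h.subject, k -> new ArrayList<>()).add(h);
            }
            cache.put(queryString, bySubject);
        }
        return bySubject.getOrDefault(subject, Collections.emptyList());
    }

    int indexQueryCount() {
        return indexQueries;
    }
}
```

Grouping by subject at insertion time is what makes the bound-subject case cheap: each later lookup is a hash access instead of a scan over all hits.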

@afs I hope I did the Context storage right. You said it will have the 
right lifetime and I hope that's true since otherwise memory leaks may occur. I 
looked at Stephen Allen's example from the jena-text-cache experimental branch: 
https://github.com/apache/jena/commit/45081fabe012c56b3fc7ae6a92b4518245779eb2

I have verified that this gives good performance with Stephen's example 
queries, even in the UNION case where TextQueryPF is recreated over and over. 
For example, a query with 11,111 results is answered in less than 300 ms.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/osma/jena jena-text-lucene-cache

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #119


commit a7bb1094a1750492c290d03ad3957d8fe42d4e2c
Author: Osma Suominen 
Date:   2015-12-22T16:45:50Z

very simple caching of Lucene query results in a hash map

commit af302e2b5cfa3ff2db9e1901dc36df547b1c4bad
Author: Osma Suominen 
Date:   2016-01-05T20:05:31Z

move Lucene query cache to Context for some persistence

commit b54e38bc00cfa3ddbb3969c4d8fb1efe658af9ea
Author: Osma Suominen 
Date:   2016-01-05T20:07:24Z

remove unused import

commit 718d275a7c5f160a0050ba392fdc1affadea093a
Author: Osma Suominen 
Date:   2016-01-05T20:34:02Z

store Multimaps in the cache for more efficient retrieval of known subject 
URIs






[jira] [Commented] (JENA-999) Poor jena-text query performance when a bound subject is used

2016-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083737#comment-15083737
 ] 

ASF GitHub Bot commented on JENA-999:
-

GitHub user osma opened a pull request:

https://github.com/apache/jena/pull/119

JENA-999: jena-text Lucene cache using multimaps





> Poor jena-text query performance when a bound subject is used
> -
>
> Key: JENA-999
> URL: https://issues.apache.org/jira/browse/JENA-999
> Project: Apache Jena
>  Issue Type: Improvement
>Reporter: Stephen Allen
>Assignee: Stephen Allen
>Priority: Minor
> Attachments: PerformanceTester.java, jena-text benchmarks.png
>
>
> When executing a jena-text query, the performance is terrible if the subject 
> is already bound to a variable.  This is because the current code will 
> execute a new lucene query that does not have the subject/entity bound on 
> every iteration and then iterate through the lucene results to join against 
> the subject.  This is quite inefficient.
> Example query:
> {code}
> select *
> where {
>   ?s rdf:type  .
>   ?s text:query ( rdfs:label "test" ) .
> }
> {code}
> This would be quite slow if there were a lot of entities in the system.
> Two potential solutions present themselves:
> # Craft a more explicit lucene query that specifies the entity URI, so that 
> the results coming back from lucene are much smaller.  However, this would 
> cause problems with the score not being correct across multiple iterations.  
> Additionally we are still potentially running a lot of lucene queries, each 
> of which has a probably non-negligible constant cost (parsing the query 
> string, etc).
> # Execute the more general lucene query the first time it is encountered, 
> then caching the results somewhere.  From there, we can then perform a hash 
> table lookup against those cached results.
> I would like to pursue option 2, but there is a problem.  Because jena-text 
> is implemented as a property function instead of a query op in and of itself 
> (like QueryIterMinus is for example), we have to find a place to stash the 
> lucene results.  I believe this can be done by placing it in the 
> ExecutionContext object, using the lucene query as a cache key.  Updates 
> provide a slightly troubling case because you could have an update request 
> like:
> {code}
> insert data {  rdf:type  ; rdfs:label 
> "test" } ;
> delete { ?s ?p ?o }
> where { ?s rdf:type  ; text:query ( rdfs:label 
> "test" ) . ?p ?o . } ;
> insert data {  rdf:type  ; rdfs:label 
> "test" } ;
> delete { ?s ?p ?o }
> where { ?s rdf:type  ; text:query ( rdfs:label 
> "test" ) ; 

[jira] [Commented] (JENA-999) Poor jena-text query performance when a bound subject is used

2016-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083918#comment-15083918
 ] 

ASF GitHub Bot commented on JENA-999:
-

Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/119#discussion_r48902208
  
--- Diff: 
jena-text/src/main/java/org/apache/jena/query/text/TextQueryPF.java ---
@@ -268,7 +276,25 @@ private QueryIterator concreteSubject(Binding binding, 
Node s, Node score, Node
 Explain.explain(execCxt.getContext(), "Text query: "+queryString) ;
 if ( log.isDebugEnabled())
 log.debug("Text query: {} ({})", queryString,limit) ;
-return textIndex.query(property, queryString, limit) ;
+
+String cacheKey = limit + " " + property + " " + queryString ;
+Map> queryCache = 
+(Map>) 
execCxt.getContext().get(cacheSymbol);
+if (queryCache == null) { /* doesn't yet exist, need to create it 
*/
+queryCache = new LinkedHashMap();
+execCxt.getContext().put(cacheSymbol, queryCache);
+}
+
+ListMultimap results = queryCache.get(cacheKey) ;
+if (results == null) { /* cache miss */
--- End diff --

Because you want to cache the result if it isn't already cached, maybe 
`queryCache.asMap()::computeIfAbsent` could be useful here? Just a thought, 
maybe it's not more clear.
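
A minimal sketch of that suggestion, assuming the cache is a plain `java.util.Map` (as `queryCache` appears to be in the diff above, in which case `computeIfAbsent` can be called on it directly). The class name, `runIndexQuery` and the `String` result type are hypothetical; only the shape of the cache key is taken from the diff.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ComputeIfAbsentSketch {

    private final Map<String, List<String>> queryCache = new LinkedHashMap<>();
    private int indexQueries = 0;

    /** Hypothetical stand-in for the expensive index query; counts how often it runs. */
    private List<String> runIndexQuery(String cacheKey) {
        indexQueries++;
        return Arrays.asList(cacheKey + " #1", cacheKey + " #2");
    }

    /** Looks up the cached result, computing and storing it on a miss in one call. */
    List<String> results(int limit, String property, String queryString) {
        String cacheKey = limit + " " + property + " " + queryString;
        return queryCache.computeIfAbsent(cacheKey, this::runIndexQuery);
    }

    int indexQueryCount() {
        return indexQueries;
    }
}
```

This replaces the explicit null check and `put` with a single call; whether it reads more clearly than the explicit form is a matter of taste.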


> Poor jena-text query performance when a bound subject is used
> -
>
> Key: JENA-999
> URL: https://issues.apache.org/jira/browse/JENA-999
> Project: Apache Jena
>  Issue Type: Improvement
>Reporter: Stephen Allen
>Assignee: Stephen Allen
>Priority: Minor
> Attachments: PerformanceTester.java, jena-text benchmarks.png
>
>
> When executing a jena-text query, the performance is terrible if the subject 
> is already bound to a variable.  This is because the current code will 
> execute a new lucene query that does not have the subject/entity bound on 
> every iteration and then iterate through the lucene results to join against 
> the subject.  This is quite inefficient.
> Example query:
> {code}
> select *
> where {
>   ?s rdf:type  .
>   ?s text:query ( rdfs:label "test" ) .
> }
> {code}
> This would be quite slow if there were a lot of entities in the system.
> Two potential solutions present themselves:
> # Craft a more explicit lucene query that specifies the entity URI, so that 
> the results coming back from lucene are much smaller.  However, this would 
> cause problems with the score not being correct across multiple iterations.  
> Additionally we are still potentially running a lot of lucene queries, each 
> of which has a probably non-negligible constant cost (parsing the query 
> string, etc).
> # Execute the more general lucene query the first time it is encountered, 
> then caching the results somewhere.  From there, we can then perform a hash 
> table lookup against those cached results.
> I would like to pursue option 2, but there is a problem.  Because jena-text 
> is implemented as a property function instead of a query op in and of itself 
> (like QueryIterMinus is for example), we have to find a place to stash the 
> lucene results.  I believe this can be done by placing it in the 
> ExecutionContext object, using the lucene query as a cache key.  Updates 
> provide a slightly troubling case because you could have an update request 
> like:
> {code}
> insert data {  rdf:type  ; rdfs:label 
> "test" } ;
> delete { ?s ?p ?o }
> where { ?s rdf:type  ; text:query ( rdfs:label 
> "test" ) . ?p ?o . } ;
> insert data {  rdf:type  ; rdfs:label 
> "test" } ;
> delete { ?s ?p ?o }
> where { ?s rdf:type  ; text:query ( rdfs:label 
> "test" ) ; ?p ?o . }
> {code}
> And then the end result should be an empty database.  But if the 
> ExecutionContext was the same for both delete queries, you would be using the 
> cached results from the first delete query in the second delete query, which 
> would result in {{}} not being deleted properly.
> If the ExecutionContext is indeed shared between the two update queries in 
> the situation above, I think this can be solved by making the cache key for 
> the lucene resultset be a combination of both the lucene query and the 
> QueryIterRoot or BindingRoot.  I need to investigate this.  An alternative, 
> if there was a way to be notified when a query has finished executing, we 
> could clear the cache in the ExecutionContext.
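
The stale-cache concern and the proposed combined key can be illustrated with a small sketch. Everything here is hypothetical: `ScopedCacheKeySketch` and its methods are invented for illustration, a list of strings stands in for the index, and an arbitrary `Object` stands in for the QueryIterRoot or BindingRoot of the running operation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class ScopedCacheKeySketch {

    private final List<String> index = new ArrayList<>();   // hypothetical index contents
    private final Map<String, List<String>> byQuery = new HashMap<>();
    private final Map<Object, Map<String, List<String>>> byOperation = new IdentityHashMap<>();

    void add(String subject) {
        index.add(subject);
    }

    void removeAll() {
        index.clear();
    }

    /** Key is the query alone: a later operation sees results cached by an earlier one. */
    List<String> queryKeyedByQueryOnly(String query) {
        return byQuery.computeIfAbsent(query, q -> new ArrayList<>(index));
    }

    /** Key combines the running operation and the query, so each operation queries afresh. */
    List<String> queryKeyedByOperation(Object operationRoot, String query) {
        return byOperation
            .computeIfAbsent(operationRoot, op -> new HashMap<>())
            .computeIfAbsent(query, q -> new ArrayList<>(index));
    }
}
```

A usage sequence mirroring the update request above: add ex:1, query, remove everything, add ex:2, query again. The query-only key still returns ex:1 on the second query, while the operation-scoped key returns ex:2 when the second query runs under a different root object. The sketch never evicts entries, so a real implementation would also need the per-operation entries to be released when the operation ends.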




[GitHub] jena pull request: JENA-999: jena-text Lucene cache using multimap...

2016-01-05 Thread ajs6f
Github user ajs6f commented on the pull request:

https://github.com/apache/jena/pull/119#issuecomment-169147952
  
I wonder if it is better to rely directly on the Guava classes here or to 
try to use a `org.apache.jena.atlas.lib.Cache<>`?




[jira] [Commented] (JENA-999) Poor jena-text query performance when a bound subject is used

2016-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083925#comment-15083925
 ] 

ASF GitHub Bot commented on JENA-999:
-

Github user ajs6f commented on the pull request:

https://github.com/apache/jena/pull/119#issuecomment-169147952
  
I wonder if it is better to rely directly on the Guava classes here or to 
try to use a `org.apache.jena.atlas.lib.Cache<>`?


> Poor jena-text query performance when a bound subject is used
> -
>
> Key: JENA-999
> URL: https://issues.apache.org/jira/browse/JENA-999
> Project: Apache Jena
>  Issue Type: Improvement
>Reporter: Stephen Allen
>Assignee: Stephen Allen
>Priority: Minor
> Attachments: PerformanceTester.java, jena-text benchmarks.png
>
>
> When executing a jena-text query, the performance is terrible if the subject 
> is already bound to a variable.  This is because the current code will 
> execute a new lucene query that does not have the subject/entity bound on 
> every iteration and then iterate through the lucene results to join against 
> the subject.  This is quite inefficient.
> Example query:
> {code}
> select *
> where {
>   ?s rdf:type  .
>   ?s text:query ( rdfs:label "test" ) .
> }
> {code}
> This would be quite slow if there were a lot of entities in the system.
> Two potential solutions present themselves:
> # Craft a more explicit lucene query that specifies the entity URI, so that 
> the results coming back from lucene are much smaller.  However, this would 
> cause problems with the score not being correct across multiple iterations.  
> Additionally we are still potentially running a lot of lucene queries, each 
> of which has a probably non-negligble constant cost (parsing the query 
> string, etc).
> # Execute the more general lucene query the first time it is encountered, 
> then cache the results somewhere.  From there, we can then perform a hash 
> table lookup against those cached results.
> I would like to pursue option 2, but there is a problem.  Because jena-text 
> is implemented as a property function instead of a query op in and of itself 
> (like QueryIterMinus is for example), we have to find a place to stash the 
> lucene results.  I believe this can be done by placing it in the 
> ExecutionContext object, using the lucene query as a cache key.  Updates 
> provide a slightly troubling case because you could have an update request 
> like:
> {code}
> insert data {  rdf:type  ; rdfs:label 
> "test" } ;
> delete { ?s ?p ?o }
> where { ?s rdf:type  ; text:query ( rdfs:label 
> "test" ) ; ?p ?o . } ;
> insert data {  rdf:type  ; rdfs:label 
> "test" } ;
> delete { ?s ?p ?o }
> where { ?s rdf:type  ; text:query ( rdfs:label 
> "test" ) ; ?p ?o . }
> {code}
> And then the end result should be an empty database.  But if the 
> ExecutionContext was the same for both delete queries, you would be using the 
> cached results from the first delete query in the second delete query, which 
> would result in {{}} not being deleted properly.
> If the ExecutionContext is indeed shared between the two update queries in 
> the situation above, I think this can be solved by making the cache key for 
> the lucene resultset be a combination of both the lucene query and the 
> QueryIterRoot or BindingRoot.  I need to investigate this.  Alternatively, 
> if there were a way to be notified when a query has finished executing, we 
> could clear the cache in the ExecutionContext.
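
Option 2 in the description above can be illustrated with a minimal, self-contained sketch. It is not the jena-text implementation: `Hit`, `BoundSubjectCache`, and the `Function` standing in for the Lucene index are hypothetical names invented for illustration. The general query runs once per distinct query string, its hits are grouped by subject, and each bound subject is then resolved with a hash table lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for one index hit: entity URI plus score. */
final class Hit {
    final String subject;
    final float score;

    Hit(String subject, float score) {
        this.subject = subject;
        this.score = score;
    }
}

/** Runs each distinct query once, then answers bound-subject lookups from memory. */
final class BoundSubjectCache {
    // query string -> (subject URI -> all hits for that subject)
    private final Map<String, Map<String, List<Hit>>> cache = new HashMap<>();
    // Assumption: the real index is replaced by a plain function for this sketch.
    private final Function<String, List<Hit>> index;
    int indexCalls = 0;

    BoundSubjectCache(Function<String, List<Hit>> index) {
        this.index = index;
    }

    List<Hit> hitsFor(String query, String subject) {
        Map<String, List<Hit>> bySubject = cache.get(query);
        if (bySubject == null) {
            // Cache miss: run the general (unbound) query once and group by subject.
            indexCalls++;
            bySubject = new HashMap<>();
            for (Hit h : index.apply(query)) {
                bySubject.computeIfAbsent(h.subject, k -> new ArrayList<>()).add(h);
            }
            cache.put(query, bySubject);
        }
        // Hash lookup for the bound subject; no further index query is needed.
        return bySubject.getOrDefault(subject, Collections.emptyList());
    }

    /** Drops all cached results, e.g. between two update operations. */
    void clear() {
        cache.clear();
    }
}
```

Calling `clear()` between update operations corresponds to the alternative mentioned at the end of the description. Folding an identifier for the execution root into the cache key would be the other way to avoid reusing stale results.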



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] jena pull request: JENA-999: jena-text Lucene cache using multimap...

2016-01-05 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/119#discussion_r48902208
  
--- Diff: 
jena-text/src/main/java/org/apache/jena/query/text/TextQueryPF.java ---
@@ -268,7 +276,25 @@ private QueryIterator concreteSubject(Binding binding, 
Node s, Node score, Node
 Explain.explain(execCxt.getContext(), "Text query: "+queryString) ;
 if ( log.isDebugEnabled())
 log.debug("Text query: {} ({})", queryString,limit) ;
-return textIndex.query(property, queryString, limit) ;
+
+String cacheKey = limit + " " + property + " " + queryString ;
+Map> queryCache = 
+(Map>) 
execCxt.getContext().get(cacheSymbol);
+if (queryCache == null) { /* doesn't yet exist, need to create it 
*/
+queryCache = new LinkedHashMap();
+execCxt.getContext().put(cacheSymbol, queryCache);
+}
+
+ListMultimap results = queryCache.get(cacheKey) ;
+if (results == null) { /* cache miss */
--- End diff --

Because you want to cache the result if it isn't already cached, maybe 
`queryCache.asMap()::computeIfAbsent` could be useful here? Just a thought, 
though it may not be any clearer.
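
For reference, the get-or-create pattern that `computeIfAbsent` replaces can be sketched with a plain `java.util.Map`. This is only an illustration under stated assumptions: `QueryCacheSketch`, `results`, and the `Supplier` standing in for the index query are hypothetical names, and results are simplified to `List<String>` rather than the multimap used in the diff. The cache key follows the `limit + " " + property + " " + queryString` form shown in the diff.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: cache query results with Map.computeIfAbsent. */
final class QueryCacheSketch {
    private final Map<String, List<String>> queryCache = new HashMap<>();
    int misses = 0;

    /** Same key shape as in the diff: limit, property, and query string. */
    static String cacheKey(int limit, String property, String queryString) {
        return limit + " " + property + " " + queryString;
    }

    /**
     * Returns the cached results for the key, running the query only on a miss.
     * Assumption: runQuery stands in for the real index call.
     */
    List<String> results(int limit, String property, String queryString,
                         Supplier<List<String>> runQuery) {
        return queryCache.computeIfAbsent(cacheKey(limit, property, queryString), k -> {
            misses++;               // executed only when the key is absent
            return runQuery.get();  // the returned value is stored under the key
        });
    }
}
```

One detail to keep in mind: `computeIfAbsent` stores nothing when the mapping function returns `null`, so a query that yields `null` would be re-run on every call.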


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request: A maven module for Jena cmds

2016-01-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/118




[jira] [Resolved] (JENA-1108) New module : jena-cmds

2016-01-05 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-1108.
-
   Resolution: Fixed
Fix Version/s: Jena 3.1.0

> New module : jena-cmds
> --
>
> Key: JENA-1108
> URL: https://issues.apache.org/jira/browse/JENA-1108
> Project: Apache Jena
>  Issue Type: Task
>  Components: cmd
>Affects Versions: Jena 3.0.1
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
> Fix For: Jena 3.1.0
>
>
> Put the command line tools (core, arq, tdb) in their own module.
> This includes the command line support code.





[jira] [Created] (JENA-1110) Check and update documentation to reflect new module jena-cmds

2016-01-05 Thread Andy Seaborne (JIRA)
Andy Seaborne created JENA-1110:
---

 Summary: Check and update documentation to reflect new module 
jena-cmds
 Key: JENA-1110
 URL: https://issues.apache.org/jira/browse/JENA-1110
 Project: Apache Jena
  Issue Type: Task
Reporter: Andy Seaborne


See JENA-1108 for details of the jena-cmds module.





New module : jena-cmds

2016-01-05 Thread Andy Seaborne
I have just merged in changes for a new Maven module to hold the Java code 
for the distribution's command line tools.  This new module puts all the 
commands in one place.  They retain their original Java package names to 
limit changes for users.


Commands covered are those from jena-core, jena-arq, and jena-tdb - the 
ones in the binary distribution.


Fuseki2 also puts these commands in its server jar - people have found 
executing from just the server jar a convenient feature.  This can be 
reconsidered; this change is not an opinion on that aspect.


The binary distribution (zip, tar.gz) now depends on module jena-cmds. 
The tdbloader2 scripts are affected and have been updated (this is the one 
internal package change, made to separate the command from the machinery, 
which remains in jena-tdb).


Maven artifact apache-jena-libs does not depend on module jena-cmds, so 
using that artifact will now miss the commands. Maven artifact 
apache-jena (as a pom) should get them.  That's the theory: a 
touch of caution here because my machine is now potentially contaminated 
by development artifacts.


Andy

https://issues.apache.org/jira/browse/JENA-1108


[jira] [Commented] (JENA-1108) New module : jena-cmds

2016-01-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082993#comment-15082993
 ] 

ASF subversion and git services commented on JENA-1108:
---

Commit 216209190640eb646a05f95f2f78afd50b42dbb2 in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=2162091 ]

JENA-1108: Merge pull request https://github.com/apache/jena/pull/118

This closes #118.







[jira] [Commented] (JENA-1108) New module : jena-cmds

2016-01-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082992#comment-15082992
 ] 

ASF subversion and git services commented on JENA-1108:
---

Commit cda5c7cd5804033379f09a3732a55791acd14a60 in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=cda5c7c ]

JENA-1108: Command line tools for TDB bulkloader2







[jira] [Commented] (JENA-1108) New module : jena-cmds

2016-01-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082991#comment-15082991
 ] 

ASF subversion and git services commented on JENA-1108:
---

Commit 498b2264143f67024085a32c354f7920a74802ae in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=498b226 ]

JENA-1108 : jena-cmds module



