[jira] [Commented] (JENA-1645) Poor performance with full text search (Lucene)

2018-12-05 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710649#comment-16710649 ]

ASF GitHub Bot commented on JENA-1645:
--

Github user osma commented on the issue:

https://github.com/apache/jena/pull/503
  
Looks good to me, based on a quick look at the diff. It took me a while to 
figure out how you use UnaryOperator, but in the end it made sense.

I'm currently travelling and haven't found time to actually run the code, 
but if it passes the existing unit tests and you're sure that they trigger the 
`concreteSubject` case, I'm fine with that.


> Poor performance with full text search (Lucene)
> ---
>
> Key: JENA-1645
> URL: https://issues.apache.org/jira/browse/JENA-1645
> Project: Apache Jena
>  Issue Type: Question
>  Components: Jena
>Affects Versions: Jena 3.9.0
>Reporter: Vasyl Danyliuk
>Priority: Major
>
> Situation: about half a million documents (emails, actually) indexed by 
> Lucene; searching for emails by sender/receiver and some text.
> If the text filter is placed at the start of the SPARQL query, it executes 
> once, but for very common words there are a lot of results (100 000+), which 
> leads to poor performance, and limiting the result count may end up with 
> missed results.
> If the text search is placed as the last condition, it executes once for each 
> already-found subject. That's completely OK, but the text search completely 
> ignores the subject URI.
> I found two methods in the TextQueryPF class: variableSubject(...) for the 
> first case, and concreteSubject(...) for the second one.
> The question is: why can't the subject URI be used as a constraint in the 
> text search?
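The trade-off described above can be illustrated with a toy, in-memory model. This is a minimal sketch under stated assumptions: `SubjectFilterSketch`, `Doc`, `searchThenFilter` and `searchWithSubject` are hypothetical names, not jena-text classes; plain substring matching stands in for Lucene scoring; and `limit` plays the role of the optional result limit in `text:query`. It shows why searching first and filtering by subject afterwards can lose a match once the hit list is truncated, whereas making the subject part of the search cannot.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of the two evaluation orders discussed in JENA-1645.
// All names are hypothetical; this is not the jena-text API.
final class SubjectFilterSketch {

    // One indexed document: the subject URI it describes and its text.
    static final class Doc {
        final String uri;
        final String text;
        Doc(String uri, String text) { this.uri = uri; this.text = text; }
    }

    // Search first, truncate to 'limit' hits, then keep hits for the given subject.
    // A subject whose document falls outside the first 'limit' hits is lost.
    static List<String> searchThenFilter(List<Doc> index, String term, int limit, String subject) {
        return index.stream()
                .filter(d -> d.text.contains(term))
                .limit(limit)
                .map(d -> d.uri)
                .filter(subject::equals)
                .collect(Collectors.toList());
    }

    // Subject as part of the search: only that subject's documents are considered,
    // so the limit cannot push the wanted hit out of the result set.
    static List<String> searchWithSubject(List<Doc> index, String term, int limit, String subject) {
        return index.stream()
                .filter(d -> d.uri.equals(subject))
                .filter(d -> d.text.contains(term))
                .limit(limit)
                .map(d -> d.uri)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> index = List.of(
                new Doc("ex:email1", "meeting notes"),
                new Doc("ex:email2", "meeting agenda"),
                new Doc("ex:email3", "meeting report"));
        System.out.println(searchThenFilter(index, "meeting", 1, "ex:email3"));  // []
        System.out.println(searchWithSubject(index, "meeting", 1, "ex:email3")); // [ex:email3]
    }
}
```

In jena-text itself the subject restriction would presumably become an extra clause on the Lucene field that stores the subject URI; that mapping is an assumption here, not something stated in the issue.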



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] jena issue #503: JENA-1645: Use uri predicate in concrete subject query.

2018-12-05 Thread osma
Github user osma commented on the issue:

https://github.com/apache/jena/pull/503
  
Looks good to me, based on a quick look at the diff. It took me a while to 
figure out how you use UnaryOperator, but in the end it made sense.

I'm currently travelling and haven't found time to actually run the code, 
but if it passes the existing unit tests and you're sure that they trigger the 
`concreteSubject` case, I'm fine with that.


---


[jira] [Commented] (JENA-1646) allow optional non-indexing of text:field

2018-12-05 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710587#comment-16710587 ]

ASF subversion and git services commented on JENA-1646:
---

Commit 31995c78a06b2f9bb1e1760866806e9eaae61307 in jena's branch 
refs/heads/master from [~code-ferret]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=31995c7 ]

JENA-1646 Merge commit 'refs/pull/504/head' of https://github.com/apache/jena. 
This close #504


> allow optional non-indexing of text:field
> -
>
> Key: JENA-1646
> URL: https://issues.apache.org/jira/browse/JENA-1646
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Jena
>Affects Versions: Jena 3.9.0
>Reporter: Code Ferret
>Assignee: Code Ferret
>Priority: Minor
>  Labels: pull-request-available
> Fix For: Jena 3.10.0
>
>
> When using multilingual support, the field to search is generally the 
> {{text:field}} with the {{text:lang}} value appended:
> {code:java}
> altLabel_fr
> {code}
> In this usage, if queries are never performed against the {{text:field}} 
> without a _language tag_, then some space and time can be saved by not 
> indexing the {{text:field}}. This improvement adds a boolean option, 
> {{text:noIndex}}, to be used in those {{text:map}} entries whose 
> {{text:field}} should not be indexed. This only makes sense in 
> the context of {{text:multilingualSupport true}} in the {{TextIndex}}.
> A pull request is available.
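One reading of the `text:noIndex` option can be sketched as a small function that decides which field names an entry is indexed under. This is a hypothetical illustration, not the jena-text implementation: `NoIndexSketch` and `fieldsToIndex` are invented names, and the `<field>_<lang>` naming is taken from the `altLabel_fr` example in the issue. Literals without a language tag get no language-suffixed field in this model; how jena-text actually treats them with `text:noIndex true` is not covered by the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the field selection described in JENA-1646.
// Not the jena-text implementation; names are illustrative only.
final class NoIndexSketch {

    static List<String> fieldsToIndex(String field, String lang,
                                      boolean multilingualSupport, boolean noIndex) {
        List<String> fields = new ArrayList<>();
        // text:noIndex only has an effect together with text:multilingualSupport true;
        // otherwise the plain text:field is indexed as before.
        if (!(multilingualSupport && noIndex)) {
            fields.add(field);
        }
        // With multilingual support, a language-tagged literal is also indexed
        // under the language-suffixed field, e.g. altLabel_fr.
        if (multilingualSupport && lang != null && !lang.isEmpty()) {
            fields.add(field + "_" + lang);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(fieldsToIndex("altLabel", "fr", true, false)); // [altLabel, altLabel_fr]
        System.out.println(fieldsToIndex("altLabel", "fr", true, true));  // [altLabel_fr]
        System.out.println(fieldsToIndex("altLabel", "fr", false, true)); // [altLabel]
    }
}
```

The third call shows the point made in the issue: without multilingual support the option has nothing to save, so the plain field is still indexed.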





[jira] [Commented] (JENA-1646) allow optional non-indexing of text:field

2018-12-05 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710588#comment-16710588 ]

ASF GitHub Bot commented on JENA-1646:
--

Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/504







[GitHub] jena pull request #504: JENA-1646 allow optional non-indexing

2018-12-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/504


---


[jira] [Commented] (JENA-1646) allow optional non-indexing of text:field

2018-12-05 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710329#comment-16710329 ]

ASF GitHub Bot commented on JENA-1646:
--

GitHub user xristy opened a pull request:

https://github.com/apache/jena/pull/504

JENA-1646 allow optional non-indexing

implements JENA-1646

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BuddhistDigitalResourceCenter/jena JENA-1646-NoIndex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/504.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #504


commit b757303a205386730888e00d0622ee7b80bb1d3b
Author: Code Ferret 
Date:   2018-12-05T16:42:17Z

Merge dev NoIndex









[GitHub] jena pull request #504: JENA-1646 allow optional non-indexing

2018-12-05 Thread xristy
GitHub user xristy opened a pull request:

https://github.com/apache/jena/pull/504

JENA-1646 allow optional non-indexing

implements JENA-1646

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BuddhistDigitalResourceCenter/jena JENA-1646-NoIndex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/504.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #504


commit b757303a205386730888e00d0622ee7b80bb1d3b
Author: Code Ferret 
Date:   2018-12-05T16:42:17Z

Merge dev NoIndex




---


[GitHub] jena issue #503: JENA-1645: Use uri predicate in concrete subject query.

2018-12-05 Thread xristy
Github user xristy commented on the issue:

https://github.com/apache/jena/pull/503
  
I'm looking at the PR and so far it looks good. I want to complete the 
PR for JENA-1646.


---


[jira] [Commented] (JENA-1645) Poor performance with full text search (Lucene)

2018-12-05 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710288#comment-16710288 ]

ASF GitHub Bot commented on JENA-1645:
--

Github user xristy commented on the issue:

https://github.com/apache/jena/pull/503
  
I'm looking at the PR and so far it looks good. I want to complete the 
PR for JENA-1646.







[jira] [Commented] (JENA-1645) Poor performance with full text search (Lucene)

2018-12-05 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710281#comment-16710281 ]

ASF GitHub Bot commented on JENA-1645:
--

Github user rvesse commented on the issue:

https://github.com/apache/jena/pull/503
  
cc @osma @xristy for review as main devs in this area







[GitHub] jena issue #503: JENA-1645: Use uri predicate in concrete subject query.

2018-12-05 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/jena/pull/503
  
cc @osma @xristy for review as main devs in this area


---


[jira] [Commented] (JENA-1645) Poor performance with full text search (Lucene)

2018-12-05 Thread Vasyl Danyliuk (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709812#comment-16709812 ]

Vasyl Danyliuk commented on JENA-1645:
--

The query is pretty straightforward:
{code:java}
PREFIX person: 
PREFIX email: 
PREFIX text: 

SELECT DISTINCT ?emailId ?content
WHERE {
  ?person1Id person:name "Person One" .
  ?person2Id person:name "Second Person" .
  { ?person1Id email:sent ?emailId . ?person2Id email:received ?emailId . }
  UNION
  { ?person2Id email:sent ?emailId . ?person1Id email:received ?emailId . }
  (?emailId ?score ?content) text:query (email:indexedContent "text to search" 1 "highlight:s: | e:") .
}
{code}
Such cases are already covered by tests in the jena-text module.

I have created a pull request with the code added to the Lucene index implementation.






[jira] [Commented] (JENA-1645) Poor performance with full text search (Lucene)

2018-12-05 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/JENA-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709803#comment-16709803 ]

ASF GitHub Bot commented on JENA-1645:
--

GitHub user DrBAXA opened a pull request:

https://github.com/apache/jena/pull/503

JENA-1645: Use uri predicate in concrete subject query.

Adds a URI predicate to the Lucene search in the concrete-subject case.
The method added to the TextIndex interface is a default method that falls 
back to the previous implementation.
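The compatibility approach described in the PR (a new `TextIndex` method added as a default method that falls back to the previous behaviour) follows a common Java pattern, sketched here. `SimpleTextIndex` and `MapTextIndex` are invented for illustration and do not reproduce the actual `TextIndex` signatures; the fallback assumes the previous behaviour was "search, then keep hits for the given subject", as the issue suggests for `concreteSubject(...)`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the "default method with fallback" pattern.
// Not the real org.apache.jena.query.text.TextIndex interface.
interface SimpleTextIndex {

    // Existing method: subject URIs whose text matches, at most 'limit' of them.
    List<String> query(String queryString, int limit);

    // New subject-aware method. The default keeps the old behaviour (search, then
    // filter by subject), so existing implementations compile and run unchanged.
    default List<String> query(String subjectUri, String queryString, int limit) {
        return query(queryString, limit).stream()
                .filter(subjectUri::equals)
                .collect(Collectors.toList());
    }
}

// An implementation that can restrict by subject overrides the new method.
final class MapTextIndex implements SimpleTextIndex {
    private final Map<String, String> textBySubject; // subject URI -> indexed text

    MapTextIndex(Map<String, String> textBySubject) { this.textBySubject = textBySubject; }

    @Override
    public List<String> query(String queryString, int limit) {
        return textBySubject.entrySet().stream()
                .filter(e -> e.getValue().contains(queryString))
                .map(Map.Entry::getKey)
                .sorted()              // deterministic order for the sketch
                .limit(limit)
                .collect(Collectors.toList());
    }

    @Override
    public List<String> query(String subjectUri, String queryString, int limit) {
        // Look up the one subject directly instead of searching everything.
        String text = textBySubject.get(subjectUri);
        return (text != null && text.contains(queryString) && limit > 0)
                ? List.of(subjectUri)
                : List.of();
    }
}
```

Implementations that never override the new method keep their old results, while an index that can push the subject into the search (as the PR does for Lucene) gets the benefit by overriding it.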

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DrBAXA/jena master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/503.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #503


commit 52d959c7a654b03e525fad214b027b6ac6aba2b2
Author: vdanyliuk 
Date:   2018-12-05T09:10:49Z

JENA-1645: Use uri predicate in concrete subject query.









[GitHub] jena pull request #503: JENA-1645: Use uri predicate in concrete subject que...

2018-12-05 Thread DrBAXA
GitHub user DrBAXA opened a pull request:

https://github.com/apache/jena/pull/503

JENA-1645: Use uri predicate in concrete subject query.

Adds a URI predicate to the Lucene search in the concrete-subject case.
The method added to the TextIndex interface is a default method that falls 
back to the previous implementation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DrBAXA/jena master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/503.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #503


commit 52d959c7a654b03e525fad214b027b6ac6aba2b2
Author: vdanyliuk 
Date:   2018-12-05T09:10:49Z

JENA-1645: Use uri predicate in concrete subject query.




---