[jira] [Commented] (LUCENE-8327) Add a multiplexing TokenFilter

2018-05-22 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484065#comment-16484065
 ] 

Alan Woodward commented on LUCENE-8327:
---

Here's an initial patch sketching out the idea.  It still needs a filter 
factory and support in CustomAnalyzer, and it would be nice to add it to 
TestRandomChains somehow.  One caveat: I don't think this will work with 
TokenFilters that need to read ahead, like SynonymFilter.

> Add a multiplexing TokenFilter
> --
>
> Key: LUCENE-8327
> URL: https://issues.apache.org/jira/browse/LUCENE-8327
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8327.patch
>
>
> Following on from LUCENE-8273, and as a prerequisite of LUCENE-8308, it would 
> be useful to have a TokenFilter that takes a number of child filters, and 
> repeats its incoming stream, applying each filter in turn.  So for example, 
> you could keep the original term, output ngrams, and apply stemming, all in 
> the same token stream.
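
The multiplexing idea in the description above can be sketched in plain Java. 
This is only an illustration of the concept, not the attached patch and not 
the Lucene TokenFilter API: the TermFilter interface, the three toy child 
filters and the multiplex() helper are all hypothetical names, and terms are 
modelled as plain strings with no position or offset attributes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration only; this is NOT the Lucene TokenFilter API.
class MultiplexSketch {

  // Hypothetical child "filter": maps one term to zero or more output terms.
  interface TermFilter extends Function<String, List<String>> {}

  // Passes the original term through unchanged.
  static final TermFilter KEEP_ORIGINAL = term -> Collections.singletonList(term);

  // Emits every character bigram of the term.
  static final TermFilter BIGRAMS = term -> {
    List<String> grams = new ArrayList<>();
    for (int i = 0; i + 2 <= term.length(); i++) {
      grams.add(term.substring(i, i + 2));
    }
    return grams;
  };

  // Toy stemmer that strips one trailing "s"; a stand-in for a real stemmer.
  static final TermFilter TOY_STEM = term ->
      term.length() > 1 && term.endsWith("s")
          ? Collections.singletonList(term.substring(0, term.length() - 1))
          : Collections.singletonList(term);

  // For each incoming term, replay it through every child filter in turn
  // and concatenate the outputs into a single stream.
  static List<String> multiplex(List<String> terms, List<TermFilter> children) {
    List<String> out = new ArrayList<>();
    for (String term : terms) {
      for (TermFilter child : children) {
        out.addAll(child.apply(term));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(multiplex(
        Arrays.asList("cats"),
        Arrays.asList(KEEP_ORIGINAL, BIGRAMS, TOY_STEM)));
  }
}
```

A real implementation would also have to decide how position increments and 
offsets are assigned to the repeated tokens, which this sketch ignores.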



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8327) Add a multiplexing TokenFilter

2018-05-22 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8327:
--
Attachment: LUCENE-8327.patch




[jira] [Created] (LUCENE-8327) Add a multiplexing TokenFilter

2018-05-22 Thread Alan Woodward (JIRA)
Alan Woodward created LUCENE-8327:
-

 Summary: Add a multiplexing TokenFilter
 Key: LUCENE-8327
 URL: https://issues.apache.org/jira/browse/LUCENE-8327
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Alan Woodward
Assignee: Alan Woodward


Following on from LUCENE-8273, and as a prerequisite of LUCENE-8308, it would 
be useful to have a TokenFilter that takes a number of child filters, and 
repeats its incoming stream, applying each filter in turn.  So for example, you 
could keep the original term, output ngrams, and apply stemming, all in the 
same token stream.






Re: BadApple candidates

2018-05-21 Thread Alan Woodward
Looks like it was an OOM, can you leave that one be for now?

> On 21 May 2018, at 19:11, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Alan:
> 
> http://fucit.org/solr-jenkins-reports/job-data/sarowe/Lucene-Solr-Nightly-7.x/256/
> 
> You can get there from Hoss's rollup reports here:
> http://fucit.org/solr-jenkins-reports/failure-report.html
> 
> To be included in any potential BadApple, two things must be true:
> 1> it must have failed since last Monday
> 2> it must have failed in the report collected two weeks ago Monday
> 
> Erick
> 
> 
> On Mon, May 21, 2018 at 12:40 PM, Alan Woodward <romseyg...@gmail.com> wrote:
>> When did TestLRUQueryCache fail?  I haven’t seen that one.
>> 
>>> On 21 May 2018, at 16:00, Erick Erickson <erickerick...@gmail.com> wrote:
>>> 
>>> I'm going to change how I collect the badapple candidates. After getting a
>>> little overwhelmed by the number of failure e-mails (even ignoring the ones
>>> with BadApple enabled), "It come to me in a vision! In a flash!" (points if
>>> you know where that comes from, hint: Old music involving a pickle).
>>> 
>>> Since I collect failures for a week then filter them by what's also in
>>> Hoss's results from two weeks ago, that's really equivalent to creating the
>>> candidate list from the intersection of the most recent week of Hoss's
>>> results and the results from _three_ weeks ago. Much faster too. Thanks Hoss!
>>> 
>>> So that's what I'll do going forward.
>>> 
>>> Meanwhile, here's the list for this Thursday.
>>> 
>>> BadApple candidates: I'll BadApple these on Thursday unless there are 
>>> objections
>>> org.apache.lucene.search.TestLRUQueryCache.testBulkScorerLocking
>>> org.apache.solr.TestDistributedSearch.test
>>> org.apache.solr.cloud.AddReplicaTest.test
>>> org.apache.solr.cloud.AssignBackwardCompatibilityTest.test
>>> org.apache.solr.cloud.CreateRoutedAliasTest.testCollectionNamesMustBeAbsent
>>> org.apache.solr.cloud.CreateRoutedAliasTest.testTimezoneAbsoluteDate
>>> org.apache.solr.cloud.CreateRoutedAliasTest.testV1
>>> org.apache.solr.cloud.CreateRoutedAliasTest.testV2
>>> org.apache.solr.cloud.DeleteReplicaTest.raceConditionOnDeleteAndRegisterReplica
>>> org.apache.solr.cloud.LIRRollingUpdatesTest.testNewReplicaOldLeader
>>> org.apache.solr.cloud.LeaderVoteWaitTimeoutTest.basicTest
>>> org.apache.solr.cloud.LeaderVoteWaitTimeoutTest.testMostInSyncReplicasCanWinElection
>>> org.apache.solr.cloud.MoveReplicaHDFSTest.testFailedMove
>>> org.apache.solr.cloud.RestartWhileUpdatingTest.test
>>> org.apache.solr.cloud.TestCloudConsistency.testOutOfSyncReplicasCannotBecomeLeader
>>> org.apache.solr.cloud.TestPullReplica.testCreateDelete
>>> org.apache.solr.cloud.TestPullReplica.testKillLeader
>>> org.apache.solr.cloud.TestSolrCloudWithKerberosAlt.testBasics
>>> org.apache.solr.cloud.UnloadDistributedZkTest.test
>>> org.apache.solr.cloud.api.collections.CollectionsAPIAsyncDistributedZkTest.testAsyncRequests
>>> org.apache.solr.cloud.api.collections.CustomCollectionTest.testCustomCollectionsAPI
>>> org.apache.solr.cloud.autoscaling.ComputePlanActionTest.testNodeLost
>>> org.apache.solr.cloud.autoscaling.IndexSizeTriggerTest.testMergeIntegration
>>> org.apache.solr.cloud.autoscaling.IndexSizeTriggerTest.testSplitIntegration
>>> org.apache.solr.cloud.autoscaling.IndexSizeTriggerTest.testTrigger
>>> org.apache.solr.cloud.autoscaling.NodeAddedTriggerTest.testRestoreState
>>> org.apache.solr.cloud.autoscaling.SearchRateTriggerIntegrationTest.testBelowSearchRate
>>> org.apache.solr.cloud.autoscaling.SearchRateTriggerIntegrationTest.testDeleteNode
>>> org.apache.solr.cloud.autoscaling.SearchRateTriggerTest.testTrigger
>>> org.apache.solr.cloud.hdfs.HdfsUnloadDistributedZkTest.test
>>> org.apache.solr.cloud.hdfs.StressHdfsTest.test
>>> org.apache.solr.handler.TestSQLHandler.doTest
>>> org.apache.solr.security.BasicAuthIntegrationTest.testBasicAuth
>>> org.apache.solr.uninverting.TestDocTermOrds.testTriggerUnInvertLimit
>>> org.apache.solr.update.TestHdfsUpdateLog.testFSThreadSafety
>>> org.apache.solr.update.TestInPlaceUpdatesDistrib.test
>>> 
>>> 
>>> Number of AwaitsFix: 21 Number of BadApples: 99
>>> 
>>> *AwaitsFix Annotations:
>>> 
>>> 
>>> Lucene AwaitsFix
>>> GeoPolygonTest.java
>>>

Re: BadApple candidates

2018-05-21 Thread Alan Woodward
When did TestLRUQueryCache fail?  I haven’t seen that one.

> On 21 May 2018, at 16:00, Erick Erickson wrote:
> 
> I'm going to change how I collect the badapple candidates. After getting a
> little overwhelmed by the number of failure e-mails (even ignoring the ones
> with BadApple enabled), "It come to me in a vision! In a flash!" (points if
> you know where that comes from, hint: Old music involving a pickle).
> 
> Since I collect failures for a week then filter them by what's also in
> Hoss's results from two weeks ago, that's really equivalent to creating the
> candidate list from the intersection of the most recent week of Hoss's
> results and the results from _three_ weeks ago. Much faster too. Thanks Hoss!
> 
> So that's what I'll do going forward.
> 
> Meanwhile, here's the list for this Thursday.
> 
> BadApple candidates: I'll BadApple these on Thursday unless there are 
> objections
>  org.apache.lucene.search.TestLRUQueryCache.testBulkScorerLocking
>  org.apache.solr.TestDistributedSearch.test
>  org.apache.solr.cloud.AddReplicaTest.test
>  org.apache.solr.cloud.AssignBackwardCompatibilityTest.test
>  org.apache.solr.cloud.CreateRoutedAliasTest.testCollectionNamesMustBeAbsent
>  org.apache.solr.cloud.CreateRoutedAliasTest.testTimezoneAbsoluteDate
>  org.apache.solr.cloud.CreateRoutedAliasTest.testV1
>  org.apache.solr.cloud.CreateRoutedAliasTest.testV2
>  org.apache.solr.cloud.DeleteReplicaTest.raceConditionOnDeleteAndRegisterReplica
>  org.apache.solr.cloud.LIRRollingUpdatesTest.testNewReplicaOldLeader
>  org.apache.solr.cloud.LeaderVoteWaitTimeoutTest.basicTest
>  org.apache.solr.cloud.LeaderVoteWaitTimeoutTest.testMostInSyncReplicasCanWinElection
>  org.apache.solr.cloud.MoveReplicaHDFSTest.testFailedMove
>  org.apache.solr.cloud.RestartWhileUpdatingTest.test
>  org.apache.solr.cloud.TestCloudConsistency.testOutOfSyncReplicasCannotBecomeLeader
>  org.apache.solr.cloud.TestPullReplica.testCreateDelete
>  org.apache.solr.cloud.TestPullReplica.testKillLeader
>  org.apache.solr.cloud.TestSolrCloudWithKerberosAlt.testBasics
>  org.apache.solr.cloud.UnloadDistributedZkTest.test
>  org.apache.solr.cloud.api.collections.CollectionsAPIAsyncDistributedZkTest.testAsyncRequests
>  org.apache.solr.cloud.api.collections.CustomCollectionTest.testCustomCollectionsAPI
>  org.apache.solr.cloud.autoscaling.ComputePlanActionTest.testNodeLost
>  org.apache.solr.cloud.autoscaling.IndexSizeTriggerTest.testMergeIntegration
>  org.apache.solr.cloud.autoscaling.IndexSizeTriggerTest.testSplitIntegration
>  org.apache.solr.cloud.autoscaling.IndexSizeTriggerTest.testTrigger
>  org.apache.solr.cloud.autoscaling.NodeAddedTriggerTest.testRestoreState
>  org.apache.solr.cloud.autoscaling.SearchRateTriggerIntegrationTest.testBelowSearchRate
>  org.apache.solr.cloud.autoscaling.SearchRateTriggerIntegrationTest.testDeleteNode
>  org.apache.solr.cloud.autoscaling.SearchRateTriggerTest.testTrigger
>  org.apache.solr.cloud.hdfs.HdfsUnloadDistributedZkTest.test
>  org.apache.solr.cloud.hdfs.StressHdfsTest.test
>  org.apache.solr.handler.TestSQLHandler.doTest
>  org.apache.solr.security.BasicAuthIntegrationTest.testBasicAuth
>  org.apache.solr.uninverting.TestDocTermOrds.testTriggerUnInvertLimit
>  org.apache.solr.update.TestHdfsUpdateLog.testFSThreadSafety
>  org.apache.solr.update.TestInPlaceUpdatesDistrib.test
> 
> 
> Number of AwaitsFix: 21 Number of BadApples: 99
> 
> *AwaitsFix Annotations:
> 
> 
> Lucene AwaitsFix
> GeoPolygonTest.java
>  testLUCENE8276_case3()
>  //@AwaitsFix(bugUrl="https://issues.apache.org/jira/browse/LUCENE-8276")
> 
> GeoPolygonTest.java
>  testLUCENE8280()
>  //@AwaitsFix(bugUrl="https://issues.apache.org/jira/browse/LUCENE-8280")
> 
> GeoPolygonTest.java
>  testLUCENE8281()
>  //@AwaitsFix(bugUrl="https://issues.apache.org/jira/browse/LUCENE-8281")
> 
> RandomGeoPolygonTest.java
>  testCompareBigPolygons()
>  //@AwaitsFix(bugUrl="https://issues.apache.org/jira/browse/LUCENE-8281")
> 
> RandomGeoPolygonTest.java
>  testCompareSmallPolygons()
>  //@AwaitsFix(bugUrl="https://issues.apache.org/jira/browse/LUCENE-8281")
> 
> TestControlledRealTimeReopenThread.java
>  testCRTReopen()
>  @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-5737")
> 
> TestICUNormalizer2CharFilter.java
>  testRandomStrings()
>  @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-5595")
> 
> TestICUTokenizerCJK.java
>  TestICUTokenizerCJK suite
>  @AwaitsFix(bugUrl="https://issues.apache.org/jira/browse/LUCENE-8222")
> 
> TestMoreLikeThis.java
>  testMultiFieldShouldReturnPerFieldBooleanQuery()
>  @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-7161")
> 
> UIMABaseAnalyzerTest.java
>  testRandomStrings()
>  @Test @AwaitsFix(bugUrl =
> "https://issues.apache.org/jira/browse/LUCENE-3869")
> 
> UIMABaseAnalyzerTest.java
>  testRandomStringsWithConfigurationParameters()
>  @Test @AwaitsFix(bugUrl =

[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-18 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480712#comment-16480712
 ] 

Alan Woodward commented on LUCENE-8273:
---

bq. Would you like me to make a new patch from the other stuff in my patch?

Yes please!  Feel free to commit it if you think it's ready, I wasn't sure if 
you still wanted to add more testing or docs.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273-2.patch, LUCENE-8273-2.patch, 
> LUCENE-8273-part2-rebased.patch, LUCENE-8273-part2.patch, 
> LUCENE-8273-part2.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.
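
The bypass behaviour described above can be sketched in plain Java. This is a 
rough illustration under simplifying assumptions, not the committed filter: 
the conditional() helper and the toy SPLIT_AND_LOWERCASE filter are 
hypothetical, and the decision is made per term string, whereas the issue 
talks about the current state of the TokenStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Plain-Java illustration only; this is NOT the Lucene ConditionalTokenFilter API.
class ConditionalSketch {

  // Toy stand-in for a word-delimiter style filter: lower-cases the term
  // and splits it on hyphens.
  static final Function<String, List<String>> SPLIT_AND_LOWERCASE =
      term -> Arrays.asList(term.toLowerCase().split("-"));

  // Wraps a filter so that it only runs when the predicate accepts the term;
  // otherwise the term passes through untouched.
  static Function<String, List<String>> conditional(
      Predicate<String> shouldFilter, Function<String, List<String>> wrapped) {
    return term -> shouldFilter.test(term)
        ? wrapped.apply(term)
        : Collections.singletonList(term);
  }

  // Runs every term through the given filter and concatenates the results.
  static List<String> run(List<String> terms, Function<String, List<String>> filter) {
    List<String> out = new ArrayList<>();
    for (String term : terms) {
      out.addAll(filter.apply(term));
    }
    return out;
  }

  public static void main(String[] args) {
    Function<String, List<String>> onlyHyphenated =
        conditional(term -> term.contains("-"), SPLIT_AND_LOWERCASE);
    // "Router" has no hyphen, so the wrapped filter never sees it.
    System.out.println(run(Arrays.asList("Wi-Fi", "Router"), onlyHyphenated));
  }
}
```

Because the wrapped filter is skipped entirely for unhyphenated terms, 
"Router" keeps its original case while "Wi-Fi" is split and lower-cased.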






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-18 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480574#comment-16480574
 ] 

Alan Woodward commented on LUCENE-8273:
---

Nope, that's a relic from a failed attempt to fix something, and precommit has 
just pulled me up on it not having any javadocs :) Will nuke it before I commit.




[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-18 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480549#comment-16480549
 ] 

Alan Woodward commented on LUCENE-8273:
---

I think I have now chased down the last failures, all around end() propagation 
and how to deal with position increments when FilteredTermFilter is skipped.  
Attached is a patch that includes [~steve_rowe]'s test improvements.  I'll 
commit this now, and un-awaitsfix TestRandomChains and see if that throws out 
any new bugs in the next while.




[jira] [Reopened] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-18 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward reopened LUCENE-8273:
---




[jira] [Updated] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-18 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273-2.patch




[jira] [Commented] (LUCENE-8306) Allow iteration over the term positions of a Match

2018-05-18 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480465#comment-16480465
 ] 

Alan Woodward commented on LUCENE-8306:
---

I think of it more as adding flexibility.  With this API you can highlight 
whole matches, individual parts of matches, or any combination in between.  
And it's a pretty minimal addition: one extra method and a 
FunctionalInterface.  And seeing as the whole point of the Matches API is to 
help highlighting, I think it's worth it?

> Allow iteration over the term positions of a Match
> --
>
> Key: LUCENE-8306
> URL: https://issues.apache.org/jira/browse/LUCENE-8306
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8306.patch, LUCENE-8306.patch
>
>
> For multi-term queries such as phrase queries, the matches API currently just 
> returns information about the span of the whole match.  It would be useful to 
> also expose information about the matching terms within the phrase.  The same 
> would apply to Spans and Interval queries.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-16 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477910#comment-16477910
 ] 

Alan Woodward commented on LUCENE-8273:
---

Every time I think I have this fixed, TestRandomChains finds another failure... 
 I'm attaching my latest patch, which includes the failing seed.  
[~steve_rowe], can you rebase on top of this?

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273-2.patch, LUCENE-8273-part2.patch, 
> LUCENE-8273-part2.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-16 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273-2.patch




[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-16 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477116#comment-16477116
 ] 

Alan Woodward commented on LUCENE-8273:
---

There's a bug in the way that tokens are buffered if the wrapped TokenFilter 
needs to read ahead.  I'm working on a fix for that now.

TestRandomChains has found quite a few problems with this.  I'm tempted to 
back it out and work on a branch for a while, as it's clearly not ready for 
release yet.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273-part2.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8306) Allow iteration over the term positions of a Match

2018-05-14 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474058#comment-16474058
 ] 

Alan Woodward commented on LUCENE-8306:
---

bq. Can we do without this new API?

I think it's important, particularly if we're talking about highlighting terms 
in very large intervals.  Here's an updated patch.  I've changed the API to use 
a collector interface rather than returning a list, which will make things much 
easier to implement on Spans and Intervals.  I've also implemented it on exact 
and sloppy phrases, including a test against a sloppy phrase with repeats.  
It's ended up simplifying the SloppyPhraseMatcher slightly, as I was trying to 
do too much to report the intervals (and getting inaccurate results in certain 
circumstances, which this API revealed, so it's already been useful!)
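
To illustrate the difference between a collector interface and returning a 
list, here is a rough plain-Java sketch.  It is not the code in the patch: 
TermPositionCollector, PhraseMatch and collectTerms() are hypothetical names, 
and the match is simplified to consecutive term positions.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; names and shapes are NOT taken from the patch.
class TermCollectorSketch {

  // Hypothetical callback that receives each matching term and its position.
  @FunctionalInterface
  interface TermPositionCollector {
    void collect(String term, int position);
  }

  // Hypothetical match: terms occupying consecutive positions from a start.
  static final class PhraseMatch {
    final String[] terms;
    final int startPosition;

    PhraseMatch(String[] terms, int startPosition) {
      this.terms = terms;
      this.startPosition = startPosition;
    }

    // Pushes each term position to the caller instead of building a list,
    // so the caller decides what, if anything, to keep.
    void collectTerms(TermPositionCollector collector) {
      for (int i = 0; i < terms.length; i++) {
        collector.collect(terms[i], startPosition + i);
      }
    }
  }

  // Example consumer: a highlighter-style caller that records "term@position".
  static List<String> describe(PhraseMatch match) {
    List<String> out = new ArrayList<>();
    match.collectTerms((term, position) -> out.add(term + "@" + position));
    return out;
  }

  public static void main(String[] args) {
    PhraseMatch match = new PhraseMatch(new String[] {"quick", "brown", "fox"}, 4);
    System.out.println(describe(match));
  }
}
```

With a callback, a composite match could forward the same collector to its 
sub-matches without allocating and merging intermediate lists.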




[jira] [Updated] (LUCENE-8306) Allow iteration over the term positions of a Match

2018-05-14 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8306:
--
Attachment: LUCENE-8306.patch




[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-14 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473985#comment-16473985
 ] 

Alan Woodward commented on LUCENE-8273:
---

Thanks Steve, your patch looks great.




[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-11 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472709#comment-16472709
 ] 

Alan Woodward commented on LUCENE-8273:
---

Some more failures in testRandomChains:
https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/1885/
https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/1857/

I'm going to AwaitsFix it for the weekend, and look at it again on Monday.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-11 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472206#comment-16472206
 ] 

Alan Woodward commented on LUCENE-8273:
---

bq. Thanks for debugging the failure

There will be more, I'm sure!

bq. Should we open a followup issue to clean up the analyzers and stuff?

+1

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Resolved] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-11 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-8273.
---
Resolution: Fixed

Turns out it was an error in end() propagation, now fixed.  I'll keep an eye 
out for further failures.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-11 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471900#comment-16471900
 ] 

Alan Woodward commented on LUCENE-8273:
---

The elasticsearch CI had some failures due to this:
{code}
ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains \
  -Dtests.seed=EF8BCF910EB1138C -Dtests.slow=true -Dtests.badapples=true \
  -Dtests.locale=es-CR -Dtests.timezone=Asia/Ashgabat -Dtests.asserts=true \
  -Dtests.file.encoding=UTF8
{code}

They look to be caused by FingerprintFilter being wrapped in a 
ConditionalTokenFilter, which I don't think makes any sense.  So the simplest 
solution is probably to blacklist it.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Reopened] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-11 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward reopened LUCENE-8273:
---

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Resolved] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-11 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-8273.
---
   Resolution: Fixed
Fix Version/s: 7.4

Thanks all!  I'll start to look at applying this to the various language 
analyzers soon.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Assigned] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-11 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward reassigned LUCENE-8273:
-

Assignee: Alan Woodward

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8306) Allow iteration over the term positions of a Match

2018-05-10 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8306:
--
Environment: (was: For multi-term queries such as phrase queries, the 
matches API currently just returns information about the span of the whole 
match.  It would be useful to also expose information about the matching terms 
within the phrase.  The same would apply to Spans and Interval queries.)
Description: For multi-term queries such as phrase queries, the matches API 
currently just returns information about the span of the whole match.  It would 
be useful to also expose information about the matching terms within the 
phrase.  The same would apply to Spans and Interval queries.

> Allow iteration over the term positions of a Match
> --
>
> Key: LUCENE-8306
> URL: https://issues.apache.org/jira/browse/LUCENE-8306
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8306.patch
>
>
> For multi-term queries such as phrase queries, the matches API currently just 
> returns information about the span of the whole match.  It would be useful to 
> also expose information about the matching terms within the phrase.  The same 
> would apply to Spans and Interval queries.






[jira] [Commented] (LUCENE-8306) Allow iteration over the term positions of a Match

2018-05-10 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470358#comment-16470358
 ] 

Alan Woodward commented on LUCENE-8306:
---

Here's a patch with an outline for an API.  MatchesIterator gets a 
termMatches() method, returning a list of TermMatch objects.  TermMatch 
contains the term, payload, position and offsets of each term within the 
current match.
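
A rough, self-contained sketch of what such a per-term view of a phrase match could carry. The class and method names below (TermMatchSketch, TermMatch, tokenize, firstPhraseMatch) are hypothetical and are not taken from the attached patch; payloads are left out for brevity, and tokenization is plain splitting on spaces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not the API from the patch.
class TermMatchSketch {

  // One matching term: its text, token position, and character offsets.
  static final class TermMatch {
    final String term;
    final int position;
    final int startOffset;
    final int endOffset;

    TermMatch(String term, int position, int startOffset, int endOffset) {
      this.term = term;
      this.position = position;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }
  }

  // Splits on spaces, recording position and offsets for every token.
  static List<TermMatch> tokenize(String text) {
    List<TermMatch> tokens = new ArrayList<>();
    int position = 0;
    int i = 0;
    while (i < text.length()) {
      while (i < text.length() && text.charAt(i) == ' ') {
        i++;
      }
      int start = i;
      while (i < text.length() && text.charAt(i) != ' ') {
        i++;
      }
      if (i > start) {
        tokens.add(new TermMatch(text.substring(start, i), position++, start, i));
      }
    }
    return tokens;
  }

  // Returns the individual term matches of the first exact occurrence of the
  // phrase, or an empty list if the phrase does not occur.
  static List<TermMatch> firstPhraseMatch(String text, String... phrase) {
    List<TermMatch> tokens = tokenize(text);
    for (int s = 0; s + phrase.length <= tokens.size(); s++) {
      boolean matches = true;
      for (int j = 0; j < phrase.length; j++) {
        if (!tokens.get(s + j).term.equals(phrase[j])) {
          matches = false;
          break;
        }
      }
      if (matches) {
        return new ArrayList<>(tokens.subList(s, s + phrase.length));
      }
    }
    return Collections.emptyList();
  }
}
```

The span of the whole match is then simply the first term's start offset to the last term's end offset, while callers that need per-term detail can walk the list.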

> Allow iteration over the term positions of a Match
> --
>
> Key: LUCENE-8306
> URL: https://issues.apache.org/jira/browse/LUCENE-8306
> Project: Lucene - Core
>  Issue Type: New Feature
> Environment: For multi-term queries such as phrase queries, the 
> matches API currently just returns information about the span of the whole 
> match.  It would be useful to also expose information about the matching 
> terms within the phrase.  The same would apply to Spans and Interval queries.
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8306.patch
>
>







[jira] [Updated] (LUCENE-8306) Allow iteration over the term positions of a Match

2018-05-10 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8306:
--
Attachment: LUCENE-8306.patch

> Allow iteration over the term positions of a Match
> --
>
> Key: LUCENE-8306
> URL: https://issues.apache.org/jira/browse/LUCENE-8306
> Project: Lucene - Core
>  Issue Type: New Feature
> Environment: For multi-term queries such as phrase queries, the 
> matches API currently just returns information about the span of the whole 
> match.  It would be useful to also expose information about the matching 
> terms within the phrase.  The same would apply to Spans and Interval queries.
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8306.patch
>
>







[jira] [Created] (LUCENE-8306) Allow iteration over the term positions of a Match

2018-05-10 Thread Alan Woodward (JIRA)
Alan Woodward created LUCENE-8306:
-

 Summary: Allow iteration over the term positions of a Match
 Key: LUCENE-8306
 URL: https://issues.apache.org/jira/browse/LUCENE-8306
 Project: Lucene - Core
  Issue Type: New Feature
 Environment: For multi-term queries such as phrase queries, the 
matches API currently just returns information about the span of the whole 
match.  It would be useful to also expose information about the matching terms 
within the phrase.  The same would apply to Spans and Interval queries.
Reporter: Alan Woodward
Assignee: Alan Woodward









[jira] [Commented] (LUCENE-8304) Add TermFrequencyQuery

2018-05-10 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470332#comment-16470332
 ] 

Alan Woodward commented on LUCENE-8304:
---

It's no more inefficient than a plain TermQuery though?  We can make it more 
efficient using impacts, but I think this is still useful in its current state.

> Add TermFrequencyQuery
> --
>
> Key: LUCENE-8304
> URL: https://issues.apache.org/jira/browse/LUCENE-8304
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8304.patch
>
>
> This has come up a few times when writing query parsers.  It would be useful 
> to have a query that returned documents that match a term with a particular 
> frequency - eg, all docs where "patent" appears at least five times.






[jira] [Resolved] (LUCENE-8249) Add matches to exact PhraseQuery and MultiPhraseQuery

2018-05-10 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-8249.
---
   Resolution: Fixed
Fix Version/s: 7.4

> Add matches to exact PhraseQuery and MultiPhraseQuery
> -
>
> Key: LUCENE-8249
> URL: https://issues.apache.org/jira/browse/LUCENE-8249
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8249.patch, LUCENE-8249.patch, LUCENE-8249.patch, 
> LUCENE-8249.patch, LUCENE-8249.patch
>
>
> ExactPhraseScorer can be rejigged fairly easily to expose a MatchesIterator






[jira] [Commented] (LUCENE-8304) Add TermFrequencyQuery

2018-05-10 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470218#comment-16470218
 ] 

Alan Woodward commented on LUCENE-8304:
---

I'll stick this in the sandbox, and then we can work on Impacts in a follow-up.
They'd be master-only anyway.

> Add TermFrequencyQuery
> --
>
> Key: LUCENE-8304
> URL: https://issues.apache.org/jira/browse/LUCENE-8304
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8304.patch
>
>
> This has come up a few times when writing query parsers.  It would be useful 
> to have a query that returned documents that match a term with a particular 
> frequency - eg, all docs where "patent" appears at least five times.






Re: lucene-solr:master: Squashed commit of the following:

2018-05-09 Thread Alan Woodward
I think this has broken precommit?


build-nav-data-files:
 [java] Building up tree of all known pages
 [java] ERROR: Orphan page: /Users/romseygeek/projects/lucene-solr/solr/build/solr-ref-guide/content/dsp.adoc
 [java] Exception in thread "main" java.lang.RuntimeException: Found 1 orphan pages (which are not in the 'page-children' attribute of any other pages)
 [java] at BuildNavAndPDFBody.main(BuildNavAndPDFBody.java:82)

> On 9 May 2018, at 18:24, jbern...@apache.org wrote:
> 
> Repository: lucene-solr
> Updated Branches:
>  refs/heads/master 4177252a1 -> 144f00a1e
> 
> 
> Squashed commit of the following:
> 
> commit e5074c3223e394af17f686294a67a1dd3ecdd147
> Author: Joel Bernstein 
> Date:   Wed May 9 13:16:34 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 69cdeccf161177d10f4d2407542392aaee3fcfe8
> Author: Joel Bernstein 
> Date:   Wed May 9 13:08:02 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit c94f0c87c3e57c023d622ad2411e522c4aac491c
> Author: Joel Bernstein 
> Date:   Wed May 9 11:54:58 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 68dd1e73355cb84410f2d0ff3a51797ed6194a10
> Author: Joel Bernstein 
> Date:   Wed May 9 10:54:32 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 04a010543418a469100fa299c606a7b1eed452e1
> Author: Joel Bernstein 
> Date:   Wed May 9 10:47:27 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit a6bbfbadaafe33fcdf93d5c72755e30dadadf017
> Author: Joel Bernstein 
> Date:   Wed May 9 09:40:08 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 5d27961aa291bcd71527337632981bcdf62369b4
> Author: Joel Bernstein 
> Date:   Tue May 8 20:43:33 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit e982cf939f429c05b736f6292c68dd96d7ebc027
> Author: Joel Bernstein 
> Date:   Tue May 8 13:27:29 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit aae78ab6f387c28a080021bc919ef51864540be2
> Author: Joel Bernstein 
> Date:   Tue May 8 12:23:52 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 0787ad76f0f4c62c860784b15490d8a988939997
> Author: Joel Bernstein 
> Date:   Tue May 8 12:20:38 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 4df098376ba05188702cca8582959c3fe18066f5
> Author: Joel Bernstein 
> Date:   Tue May 8 12:12:11 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 5c0be5136bbab7e0c33b3b8a7b0395b1b330e96d
> Author: Joel Bernstein 
> Date:   Tue May 8 12:04:57 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 6c6feac4c2e5a49a5eab87a228713d1b93c8fc70
> Author: Joel Bernstein 
> Date:   Tue May 8 11:57:49 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 7d46d11c9dd3a51b68600c2c889f586147545294
> Author: Joel Bernstein 
> Date:   Tue May 8 11:50:51 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 8b6bf19d0091203ed63b39d070dd02a9bece6a61
> Author: Joel Bernstein 
> Date:   Mon May 7 10:53:14 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit 5466591999816eaacde6ce18d824d7688e5f4fe8
> Author: Joel Bernstein 
> Date:   Fri May 4 15:12:43 2018 -0400
> 
>SOLR-12280: WIP
> 
> commit d7fff7d557a7fd26011c21445b7969b2cd81036f
> Author: Joel Bernstein 
> Date:   Fri Apr 27 12:50:27 2018 -0400
> 
>SOLR-12280: Initial commit
> 
> 
> Project: http://git-wip-us.apache.org/repos/asf/lucene-solr/repo
> Commit: http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/144f00a1
> Tree: http://git-wip-us.apache.org/repos/asf/lucene-solr/tree/144f00a1
> Diff: http://git-wip-us.apache.org/repos/asf/lucene-solr/diff/144f00a1
> 
> Branch: refs/heads/master
> Commit: 144f00a1e315541a28f526f4cdf1e55eb60c862b
> Parents: 4177252
> Author: Joel Bernstein 
> Authored: Wed May 9 13:24:08 2018 -0400
> Committer: Joel Bernstein 
> Committed: Wed May 9 13:24:08 2018 -0400
> 
> --
> solr/solr-ref-guide/src/dsp.adoc| 719 +++
> .../hidden-signal-autocorrelation.png   | Bin 0 -> 258831 bytes
> .../math-expressions/hidden-signal-fft.png  | Bin 0 -> 215981 bytes
> .../images/math-expressions/hidden-signal.png   | Bin 0 -> 319100 bytes
> .../math-expressions/noise-autocorrelation.png  | Bin 0 -> 204511 bytes
> .../src/images/math-expressions/noise-fft.png   | Bin 0 -> 319551 bytes
> .../src/images/math-expressions/noise.png   | Bin 0 -> 375565 bytes
> .../math-expressions/signal-autocorrelation.png | Bin 0 -> 322164 bytes
> .../src/images/math-expressions/signal-fft.png  | Bin 0 -> 140111 bytes
> .../src/images/math-expressions/signal.png  | Bin 0 -> 365018 bytes
> solr/solr-ref-guide/src/math-expressions.adoc   |   2 +
> 11 files changed, 721 insertions(+)
> 

[jira] [Commented] (LUCENE-8304) Add TermFrequencyQuery

2018-05-09 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469080#comment-16469080
 ] 

Alan Woodward commented on LUCENE-8304:
---

I haven't properly looked at Impacts yet - is the idea that we can skip blocks 
by looking at their min and max freqs and seeing if they fall out of the 
requested range?
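
If the answer to that question is yes, the skipping could look roughly like the sketch below. It assumes, purely for illustration, that each block of postings records the minimum and maximum term frequency it contains; the Block class and docsWithFreqInRange method are hypothetical and say nothing about how Impacts are actually stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of block-level skipping; not Lucene's Impacts API.
class BlockSkipSketch {

  // A block of postings plus the smallest and largest frequency inside it.
  static final class Block {
    final int[] docIds;
    final int[] freqs;
    final int minFreq;
    final int maxFreq;

    Block(int[] docIds, int[] freqs) {
      this.docIds = docIds;
      this.freqs = freqs;
      int min = Integer.MAX_VALUE;
      int max = Integer.MIN_VALUE;
      for (int f : freqs) {
        min = Math.min(min, f);
        max = Math.max(max, f);
      }
      this.minFreq = min;
      this.maxFreq = max;
    }
  }

  // Collects doc ids whose frequency lies in [lo, hi], skipping any block
  // whose recorded frequency bounds fall entirely outside that range.
  static List<Integer> docsWithFreqInRange(List<Block> blocks, int lo, int hi) {
    List<Integer> hits = new ArrayList<>();
    for (Block block : blocks) {
      if (block.maxFreq < lo || block.minFreq > hi) {
        continue; // no document in this block can match
      }
      for (int i = 0; i < block.docIds.length; i++) {
        if (block.freqs[i] >= lo && block.freqs[i] <= hi) {
          hits.add(block.docIds[i]);
        }
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<Block> blocks = Arrays.asList(
        new Block(new int[] {1, 2, 3}, new int[] {1, 2, 1}),
        new Block(new int[] {4, 5}, new int[] {7, 3}));
    System.out.println(docsWithFreqInRange(blocks, 5, Integer.MAX_VALUE));
  }
}
```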

> Add TermFrequencyQuery
> --
>
> Key: LUCENE-8304
> URL: https://issues.apache.org/jira/browse/LUCENE-8304
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8304.patch
>
>
> This has come up a few times when writing query parsers.  It would be useful 
> to have a query that returned documents that match a term with a particular 
> frequency - eg, all docs where "patent" appears at least five times.






[jira] [Commented] (LUCENE-8249) Add matches to exact PhraseQuery and MultiPhraseQuery

2018-05-09 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469069#comment-16469069
 ] 

Alan Woodward commented on LUCENE-8249:
---

I spoke to Adrien elsewhere and have added some comments to 
SloppyPhraseMatcher.  I've also removed the exposeOffsets boolean as it wasn't 
protecting anything expensive, and was confusingly named.

> Add matches to exact PhraseQuery and MultiPhraseQuery
> -
>
> Key: LUCENE-8249
> URL: https://issues.apache.org/jira/browse/LUCENE-8249
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8249.patch, LUCENE-8249.patch, LUCENE-8249.patch, 
> LUCENE-8249.patch, LUCENE-8249.patch
>
>
> ExactPhraseScorer can be rejigged fairly easily to expose a MatchesIterator






[jira] [Updated] (LUCENE-8249) Add matches to exact PhraseQuery and MultiPhraseQuery

2018-05-09 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8249:
--
Attachment: LUCENE-8249.patch

> Add matches to exact PhraseQuery and MultiPhraseQuery
> -
>
> Key: LUCENE-8249
> URL: https://issues.apache.org/jira/browse/LUCENE-8249
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8249.patch, LUCENE-8249.patch, LUCENE-8249.patch, 
> LUCENE-8249.patch, LUCENE-8249.patch
>
>
> ExactPhraseScorer can be rejigged fairly easily to expose a MatchesIterator



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8304) Add TermFrequencyQuery

2018-05-09 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468942#comment-16468942
 ] 

Alan Woodward commented on LUCENE-8304:
---

Here's a patch, adding TermFrequencyQuery to the {{queries}} module.

> Add TermFrequencyQuery
> --
>
> Key: LUCENE-8304
> URL: https://issues.apache.org/jira/browse/LUCENE-8304
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8304.patch
>
>
> This has come up a few times when writing query parsers.  It would be useful 
> to have a query that returned documents that match a term with a particular 
> frequency - eg, all docs where "patent" appears at least five times.






[jira] [Updated] (LUCENE-8304) Add TermFrequencyQuery

2018-05-09 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8304:
--
Attachment: LUCENE-8304.patch

> Add TermFrequencyQuery
> --
>
> Key: LUCENE-8304
> URL: https://issues.apache.org/jira/browse/LUCENE-8304
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8304.patch
>
>
> This has come up a few times when writing query parsers.  It would be useful 
> to have a query that returned documents that match a term with a particular 
> frequency - eg, all docs where "patent" appears at least five times.






[jira] [Created] (LUCENE-8304) Add TermFrequencyQuery

2018-05-09 Thread Alan Woodward (JIRA)
Alan Woodward created LUCENE-8304:
-

 Summary: Add TermFrequencyQuery
 Key: LUCENE-8304
 URL: https://issues.apache.org/jira/browse/LUCENE-8304
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Alan Woodward
Assignee: Alan Woodward


This has come up a few times when writing query parsers.  It would be useful to 
have a query that returned documents that match a term with a particular 
frequency - eg, all docs where "patent" appears at least five times.
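
To make the intended semantics concrete, here is a brute-force sketch in plain Java. It does not use an index at all: it simply counts occurrences of a term in each document and keeps the documents that reach a threshold. The names TermFrequencySketch, termFrequency and docsWithTermFreqAtLeast are hypothetical, and splitting on non-word characters is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Brute-force illustration of "term appears at least N times"; hypothetical names.
class TermFrequencySketch {

  // Counts how often the (lowercase) term occurs in the document text.
  static int termFrequency(String doc, String term) {
    int count = 0;
    for (String token : doc.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (token.equals(term)) {
        count++;
      }
    }
    return count;
  }

  // Returns the indexes of documents in which the term occurs at least minFreq times.
  static List<Integer> docsWithTermFreqAtLeast(List<String> docs, String term, int minFreq) {
    List<Integer> hits = new ArrayList<>();
    for (int docId = 0; docId < docs.size(); docId++) {
      if (termFrequency(docs.get(docId), term) >= minFreq) {
        hits.add(docId);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<String> docs = Arrays.asList(
        "Patent, patent, patent: a patent about patent law",
        "One patent only");
    System.out.println(docsWithTermFreqAtLeast(docs, "patent", 5));
  }
}
```

An index-backed query would read the frequency from the postings rather than rescanning text, but the matching rule is the same.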






Re: [VOTE] Release Lucene/Solr 7.3.1 RC2

2018-05-09 Thread Alan Woodward
+1
SUCCESS! [3:10:43.862442]

My internet has been really very slow today...

On Wed, May 9, 2018 at 5:50 AM, Đạt Cao Mạnh 
wrote:

> Please vote for release candidate 2 for Lucene/Solr 7.3.1
>
> The artifact can be downloaded from:
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.3.1-RC2-revae0705edb59eaa567fe13ed3a222fdadc7153680/
>
> You can run the smoke tester directly with this command:
> python3 -u dev-tools/scripts/smokeTestRelease.py \
>   https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-7.3.1-RC2-revae0705edb59eaa567fe13ed3a222fdadc7153680
>
> Here’s my +1
> SUCCESS! [0:53:47.443795]
>


[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-09 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468547#comment-16468547
 ] 

Alan Woodward commented on LUCENE-8273:
---

New patch: {{whenTerm()}} now takes {{Predicate}}, and 
{{TermExclusionFilterFactory}} uses {{protected}} instead of {{excludeFile}}.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-09 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273.patch

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Comment Edited] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-05-09 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468537#comment-16468537
 ] 

Alan Woodward edited comment on LUCENE-8196 at 5/9/18 8:18 AM:
---

I opened LUCENE-8300 to deal with unordered overlaps.


was (Author: romseygeek):
I opened LUCENE-8300 do deal with unordered overlaps.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>    Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.






[jira] [Updated] (LUCENE-8300) Add unordered-distinct IntervalsSource

2018-05-09 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8300:
--
Attachment: LUCENE-8300.patch

> Add unordered-distinct IntervalsSource
> --
>
> Key: LUCENE-8300
> URL: https://issues.apache.org/jira/browse/LUCENE-8300
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8300.patch
>
>
> [~mattweber] pointed out on LUCENE-8196 that {{Intervals.unordered()}} 
> doesn't check to see if its subintervals overlap, which means that for 
> example {{Intervals.unordered(Intervals.term("a"), Intervals.term("a"))}} 
> would match a document with {{a}} appearing only once.  This ticket will 
> introduce a new function, {{Intervals.unordered_distinct()}}, that ensures 
> that all subintervals within an unordered interval do not overlap.
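
The difference between the two behaviours can be illustrated without any Lucene classes. The sketch below restricts itself to single-term subintervals, where "do not overlap" reduces to "each requested term needs its own occurrence". The class and method names are hypothetical, and this is not how the patch implements it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy comparison for single-term subintervals only; hypothetical names.
class UnorderedDistinctSketch {

  // Overlap-tolerant check: every requested term occurs somewhere, so the
  // same occurrence may satisfy several requested terms.
  static boolean matchesUnordered(List<String> docTokens, String... terms) {
    for (String term : terms) {
      if (!docTokens.contains(term)) {
        return false;
      }
    }
    return true;
  }

  // Distinct check: every requested term must consume its own occurrence.
  static boolean matchesUnorderedDistinct(List<String> docTokens, String... terms) {
    Map<String, Integer> available = new HashMap<>();
    for (String token : docTokens) {
      available.merge(token, 1, Integer::sum);
    }
    for (String term : terms) {
      int left = available.getOrDefault(term, 0);
      if (left == 0) {
        return false;
      }
      available.put(term, left - 1);
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> doc = Arrays.asList("a", "b");
    System.out.println(matchesUnordered(doc, "a", "a"));
    System.out.println(matchesUnorderedDistinct(doc, "a", "a"));
  }
}
```

For the document "a b", the first check accepts the pair ("a", "a") while the second rejects it, which is the case raised on LUCENE-8196.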






[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-05-09 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468537#comment-16468537
 ] 

Alan Woodward commented on LUCENE-8196:
---

I opened LUCENE-8300 to deal with unordered overlaps.

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>    Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.






[jira] [Created] (LUCENE-8300) Add unordered-distinct IntervalsSource

2018-05-09 Thread Alan Woodward (JIRA)

Alan Woodward created LUCENE-8300:
-

 Summary: Add unordered-distinct IntervalsSource
 Key: LUCENE-8300
 URL: https://issues.apache.org/jira/browse/LUCENE-8300
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Alan Woodward
Assignee: Alan Woodward


[~mattweber] pointed out on LUCENE-8196 that {{Intervals.unordered()}} doesn't 
check to see if its subintervals overlap, which means that for example 
{{Intervals.unordered(Intervals.term("a"), Intervals.term("a"))}} would match a 
document with {{a}} appearing only once.  This ticket will introduce a new 
function, {{Intervals.unordered_distinct()}}, that ensures that all 
subintervals within an unordered interval do not overlap.
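
The difference between the two behaviours described above can be sketched in plain Java. This is a toy model, not the Lucene implementation: it uses no Lucene classes, reduces each term interval to a single token position, and the names {{UnorderedSketch}}, {{positions}}, {{matchesUnordered}} and {{matchesUnorderedDistinct}} are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: plain Java, no Lucene classes. A term "interval" is
// reduced to the single token position at which the term occurs.
final class UnorderedSketch {

    // Hypothetical helper: positions at which `term` occurs in `tokens`.
    static List<Integer> positions(String[] tokens, String term) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    // Models the behaviour described for unordered(): both terms must occur,
    // but nothing stops them from matching at the same position.
    static boolean matchesUnordered(String[] tokens, String t1, String t2) {
        return !positions(tokens, t1).isEmpty() && !positions(tokens, t2).isEmpty();
    }

    // Models the proposed distinct variant: the two terms must match at
    // different positions, so their intervals cannot overlap.
    static boolean matchesUnorderedDistinct(String[] tokens, String t1, String t2) {
        for (int p1 : positions(tokens, t1)) {
            for (int p2 : positions(tokens, t2)) {
                if (p1 != p2) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

In this model a document containing {{a}} once satisfies {{matchesUnordered}} for the pair ({{a}}, {{a}}) but not {{matchesUnorderedDistinct}}, which needs a second occurrence.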






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-08 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467353#comment-16467353
 ] 

Alan Woodward commented on LUCENE-8273:
---

Thanks David, I'll do that.  Will commit later on if nobody else has any 
comments.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-07 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466012#comment-16466012
 ] 

Alan Woodward commented on LUCENE-8273:
---

Patch, up to date with master and passing all tests and precommit.  I think 
this is ready?

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-07 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273.patch

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8286) UnifiedHighlighter should support the new Weight.matches API for better match accuracy

2018-05-02 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461226#comment-16461226
 ] 

Alan Woodward commented on LUCENE-8286:
---

There's an API mismatch in how offsets are retrieved: per-field in the 
UnifiedHighlighter, but per-leafreader in the Matches API.  This means that (for 
example) we can't easily use term vectors for a single field with Matches, so 
that will need to be resolved somehow.

> UnifiedHighlighter should support the new Weight.matches API for better match 
> accuracy
> --
>
> Key: LUCENE-8286
> URL: https://issues.apache.org/jira/browse/LUCENE-8286
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Priority: Major
>
> The new Weight.matches() API should allow the UnifiedHighlighter to more 
> accurately highlight some BooleanQuery patterns correctly -- see LUCENE-7903.
> In addition, this API should make the job of highlighting easier, reducing 
> the LOC and related complexities, especially the UH's PhraseHelper.  Note: 
> reducing/removing PhraseHelper is not a near-term goal since Weight.matches 
> is experimental and incomplete, and perhaps we'll discover some gaps in 
> flexibility/functionality.
> This issue should introduce a new UnifiedHighlighter.HighlightFlag enum 
> option for this method of highlighting.   Perhaps call it {{WEIGHT_MATCHES}}? 
>  Longer term it could go away and it'll be implied if you specify enum values 
> for PHRASES & MULTI_TERM_QUERY?






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-01 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459643#comment-16459643
 ] 

Alan Woodward commented on LUCENE-8273:
---

bq. it seems you need to make the ConditionalTokenFilterFactory implement the 
resourceloaderaware stuff always

Just spotted Robert's comment here; I've added a new patch which fixes this.  
Thanks!

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-01 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273.patch

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-01 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459564#comment-16459564
 ] 

Alan Woodward commented on LUCENE-8273:
---

Updated patch.  {{ConditionalTokenFilterFactory}} is now a top-level class, 
distinct from {{ConditionBuilder}}.  I've added a TermExclusionFilter that 
accepts a list of terms and only runs its child filters if the current token is 
not in its list, and demonstrated how to use it in TestCustomAnalyzer.  At the 
moment it just reads a word file, but we can expand it to accept patterns or a 
directly passed-in list of terms in follow-ups.  I've also changed the 
CustomAnalyzerBuilder to use {{when}} rather than {{ifXXX}} - thanks for the 
suggestion Steve!
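
The exclusion behaviour described above (run the child filter only when the current token is not in the list) can be sketched without any Lucene classes. This is a minimal illustration under stated assumptions, not the code in the patch: {{TermExclusionSketch}} and its {{apply}} method are hypothetical, tokens are plain strings, and the child filter is modelled as a simple string-to-string function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Toy model only: tokens are plain strings and the "child filter" is a
// function from one token to one token. Real token filters can also emit
// several tokens or none, which this sketch ignores.
final class TermExclusionSketch {

    // Applies `child` to every token that is NOT in `excluded`;
    // excluded tokens pass through unchanged.
    static List<String> apply(List<String> tokens,
                              Set<String> excluded,
                              UnaryOperator<String> child) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(excluded.contains(token) ? token : child.apply(token));
        }
        return out;
    }
}
```

For example, with an exclusion set containing only "Wi-Fi" and a child function that strips hyphens, the token "Wi-Fi" would be left alone while "e-mail" would become "email".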

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a ConditionalTokenFilter

2018-05-01 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273.patch

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-04-30 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458550#comment-16458550
 ] 

Alan Woodward commented on LUCENE-8273:
---

{{if}} is a keyword, unfortunately, so that won't work.  Maybe {{ifCondition}} 
instead?

Access to resources will be a bit trickier.  I think maybe the best way to do 
that would be to have a specialised method {{ifInList}} or something similar 
that takes a path to a word list.  I'll see what I can come up with.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-04-30 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458472#comment-16458472
 ] 

Alan Woodward commented on LUCENE-8273:
---

[~msoko...@gmail.com] in that case, the filter would still get applied, because 
we only check the token that is passed to the synonym filter from outside.  
Anything that's pulled by the synonym filter itself doesn't get checked.  
Although thinking about it, it would be possible to run the check in the 
OneTimeWrapper as well and return 'false' from things that don't pass the 
check.  I'm not sure how that would work with the graph itself though, it might 
end up corrupting things.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-04-30 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458469#comment-16458469
 ] 

Alan Woodward commented on LUCENE-8273:
---

And here's a patch that integrates it into CustomAnalyzer.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a ConditionalTokenFilter

2018-04-30 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273.patch

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-04-27 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456717#comment-16456717
 ] 

Alan Woodward commented on LUCENE-8273:
---

I think I fixed it - attached is a patch including a test where we wrap 
SynonymGraphFilter, and everything seems to pass.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a ConditionalTokenFilter

2018-04-27 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273.patch

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-04-26 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453807#comment-16453807
 ] 

Alan Woodward commented on LUCENE-8273:
---

Yes, exactly that.

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-26 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453767#comment-16453767
 ] 

Alan Woodward commented on LUCENE-8196:
---

I think minwidth() would run into problems with documents that have two 
instances of 'b', because unordered will always find the minimal intervals, so 
it would always end up with intervals of width 0, which would then be rejected 
by the filter, and you'd end up with missing matches.

What we really need here I think is a new source, something like 
'unordered-non-overlapping', which checks that all of the internal intervals 
are separated.  With a better name, of course :) . And we should rename 
'unordered' to 'and' to make the semantics a bit clearer.
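
The width-0 problem from the first paragraph can be reproduced with a brute-force model of minimal intervals. This is only a sketch under assumptions: it uses no Lucene classes, takes the width of an interval to be end minus start, and the names {{MinimalIntervalSketch}}, {{minimalUnordered}} and {{countWithMinWidth}} are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: each argument is the list of positions of one term.
final class MinimalIntervalSketch {

    // Builds every [start, end] interval covering one position from each
    // list, then keeps only those that contain no strictly narrower candidate.
    static List<int[]> minimalUnordered(int[] first, int[] second) {
        List<int[]> candidates = new ArrayList<>();
        for (int x : first) {
            for (int y : second) {
                candidates.add(new int[] {Math.min(x, y), Math.max(x, y)});
            }
        }
        List<int[]> minimal = new ArrayList<>();
        for (int[] c : candidates) {
            boolean containsNarrower = false;
            for (int[] o : candidates) {
                boolean inside = o[0] >= c[0] && o[1] <= c[1];
                boolean narrower = (o[1] - o[0]) < (c[1] - c[0]);
                if (inside && narrower) {
                    containsNarrower = true;
                    break;
                }
            }
            if (!containsNarrower) {
                minimal.add(c);
            }
        }
        return minimal;
    }

    // Hypothetical width filter: counts intervals whose width (end - start)
    // is at least `minWidth`.
    static long countWithMinWidth(List<int[]> intervals, int minWidth) {
        return intervals.stream().filter(i -> i[1] - i[0] >= minWidth).count();
    }
}
```

With 'b' at positions 1 and 5 and both sub-sources set to 'b', the only minimal intervals in this model are [1,1] and [5,5], so a minimum-width filter of 1 rejects everything and the document is missed even though it has two distinct occurrences.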

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>    Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.






[jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter

2018-04-25 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452676#comment-16452676
 ] 

Alan Woodward commented on LUCENE-8273:
---

bq. You mean any filter that uses captureState?

capture/restoreState works fine; the problem comes when you get a filter that 
needs to look ahead in the tokenstream.  For example, if SynonymGraphFilter 
has a multiword synonym "a b c -> d" and you hit token "a", then the filter 
pulls in two more tokens to see if it matches the whole synonym.  But 
ConditionalTokenFilter only allows you to pull in one token at a time, because 
it needs to distinguish between incrementToken() as called by the next filter 
down the line and incrementToken() as called by its delegate.
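
The read-ahead limitation can be illustrated with a small stand-alone model. This is a sketch under assumptions, not how ConditionalTokenFilter is written: the {{Tokens}} interface, {{oneTokenWindow}}, {{all}} and {{matchesPhrase}} are hypothetical names, and a token stream is reduced to a source of strings that returns null when exhausted.

```java
import java.util.Deque;
import java.util.List;

// Toy model only: no Lucene classes are used.
final class LookaheadSketch {

    // Hypothetical stand-in for a token stream; next() returns null at the end.
    interface Tokens {
        String next();
    }

    // Exposes the whole underlying stream to the consumer.
    static Tokens all(Deque<String> source) {
        return source::poll;
    }

    // Exposes exactly one token, then reports exhaustion, loosely mirroring
    // a wrapper that hands its delegate a single token at a time.
    static Tokens oneTokenWindow(Deque<String> source) {
        boolean[] used = {false};
        return () -> {
            if (used[0]) {
                return null;
            }
            used[0] = true;
            return source.poll();
        };
    }

    // A consumer that needs to read ahead: it pulls one token per word of
    // the phrase and only matches if all of them line up.
    static boolean matchesPhrase(Tokens in, List<String> phrase) {
        for (String expected : phrase) {
            String token = in.next();
            if (token == null || !token.equals(expected)) {
                return false;
            }
        }
        return true;
    }
}
```

In this model the phrase "a b c" matches when the consumer sees the full stream, but not through the one-token window, because the second pull already reports the end of input.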

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a ConditionalTokenFilter

2018-04-25 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Summary: Add a ConditionalTokenFilter  (was: Add a BypassingTokenFilter)

> Add a ConditionalTokenFilter
> 
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a BypassingTokenFilter

2018-04-25 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452407#comment-16452407
 ] 

Alan Woodward commented on LUCENE-8273:
---

bq. Perhaps this can be extended to handle the case of ShingleFilter

I'm not sure that this makes sense in those cases though?  For example, what if 
the first token in the tokenstream matches the condition and is passed to the 
ShingleFilter, but the second one doesn't?

> Add a BypassingTokenFilter
> --
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a BypassingTokenFilter

2018-04-25 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452274#comment-16452274
 ] 

Alan Woodward commented on LUCENE-8273:
---

Here's an updated patch:
* now works with wrapped filters that emit more than one token (thanks David!)
* renamed to ConditionalTokenFilter and the logic reversed (thanks Robert!)
* cleaned up all the logic around reset(), close() and end()
* integrated into testRandomChains.

This last one is a bit clunky, as this TokenFilter won't work with filters 
that consume more than one token at a time, e.g. ShingleFilter or 
SynonymGraphFilter.  At the moment I have a blacklist, but there may be a 
better way of isolating that, preferably one that throws errors when you build 
the TokenStream.  Speak up if you have any suggestions.

I do like the idea of integrating things into CustomAnalyzer, will look at that 
next.

> Add a BypassingTokenFilter
> --
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a BypassingTokenFilter

2018-04-25 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273.patch

> Add a BypassingTokenFilter
> --
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8273) Add a BypassingTokenFilter

2018-04-24 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449897#comment-16449897
 ] 

Alan Woodward commented on LUCENE-8273:
---

I added this to core rather than to the analysis module as it seems to me to be 
a utility class like FilteringTokenFilter, which is also in core.  But I'm 
perfectly happy to move it to analysis-common if that makes more sense to 
others.

> Add a BypassingTokenFilter
> --
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Commented] (LUCENE-8265) WordDelimiterFilter should pass through terms marked as keywords

2018-04-24 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449896#comment-16449896
 ] 

Alan Woodward commented on LUCENE-8265:
---

I created LUCENE-8273 for the potential spinoff - [~sokolov] would this work 
for your situation?

> WordDelimiterFilter should pass through terms marked as keywords
> 
>
> Key: LUCENE-8265
> URL: https://issues.apache.org/jira/browse/LUCENE-8265
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will help in cases where some terms containing separator characters 
> should be split, but others should not.  For example, this will enable a 
> filter that identifies things that look like fractions and identifies them as 
> keywords so that 1/2 does not become 12, while doing splitting and joining on 
> terms that look like part numbers containing slashes, eg something like 
> "sn-999123/1" might sometimes be written "sn-999123-1".






[jira] [Commented] (LUCENE-8273) Add a BypassingTokenFilter

2018-04-24 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449893#comment-16449893
 ] 

Alan Woodward commented on LUCENE-8273:
---

Here's a patch.

> Add a BypassingTokenFilter
> --
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Updated] (LUCENE-8273) Add a BypassingTokenFilter

2018-04-24 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8273:
--
Attachment: LUCENE-8273.patch

> Add a BypassingTokenFilter
> --
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
>  Issue Type: New Feature
>    Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.






[jira] [Created] (LUCENE-8273) Add a BypassingTokenFilter

2018-04-24 Thread Alan Woodward (JIRA)
Alan Woodward created LUCENE-8273:
-

 Summary: Add a BypassingTokenFilter
 Key: LUCENE-8273
 URL: https://issues.apache.org/jira/browse/LUCENE-8273
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Alan Woodward


Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter in 
such a way that it could optionally be bypassed based on the current state of 
the TokenStream.  This could be used to, for example, only apply 
WordDelimiterFilter to terms that contain hyphens.
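
To make the idea concrete, here is a minimal sketch that does not use any Lucene classes and is not taken from the attached patch.  TokenStage and BypassingStage are hypothetical stand-ins for a TokenFilter and the proposed wrapper, terms are plain strings rather than attribute state, and the splitter is only a crude imitation of WordDelimiterFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a TokenFilter: maps one incoming term to zero or more outgoing terms.
interface TokenStage {
    List<String> apply(String term);
}

// Hypothetical wrapper: runs the delegate only when the predicate accepts the term,
// otherwise passes the term through untouched.
final class BypassingStage implements TokenStage {
    private final TokenStage delegate;
    private final Predicate<String> shouldFilter;

    BypassingStage(TokenStage delegate, Predicate<String> shouldFilter) {
        this.delegate = delegate;
        this.shouldFilter = shouldFilter;
    }

    @Override
    public List<String> apply(String term) {
        if (shouldFilter.test(term)) {
            return delegate.apply(term);
        }
        List<String> out = new ArrayList<>();
        out.add(term);
        return out;
    }
}

final class BypassSketch {
    // Crude splitter (an assumption, not WordDelimiterFilter itself):
    // splits on any run of non-alphanumeric characters.
    static final TokenStage SPLIT_ON_DELIMITERS = term -> {
        List<String> parts = new ArrayList<>();
        for (String p : term.split("[^\\p{Alnum}]+")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    };

    // Applies the splitter only to terms that contain a hyphen.
    static List<String> run(List<String> terms) {
        TokenStage stage = new BypassingStage(SPLIT_ON_DELIMITERS, t -> t.indexOf('-') >= 0);
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            out.addAll(stage.apply(t));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("wi-fi", "1/2", "lucene")));
    }
}
```

In this sketch "wi-fi" is split because it contains a hyphen, while "1/2" bypasses the splitter and comes out unchanged.  A real implementation would have to decide per token from the TokenStream's attributes, which this sketch does not attempt.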






[jira] [Commented] (LUCENE-8249) Add matches to exact PhraseQuery and MultiPhraseQuery

2018-04-24 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449494#comment-16449494
 ] 

Alan Woodward commented on LUCENE-8249:
---

Updated patch, now that MatchesIterator#term() is gone.  I also changed 
maxFreq() to return a float, and added a comment to SloppyPhraseMatcher to 
explain how the max freq is calculated.

> Add matches to exact PhraseQuery and MultiPhraseQuery
> -
>
> Key: LUCENE-8249
> URL: https://issues.apache.org/jira/browse/LUCENE-8249
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8249.patch, LUCENE-8249.patch, LUCENE-8249.patch, 
> LUCENE-8249.patch
>
>
> ExactPhraseScorer can be rejigged fairly easily to expose a MatchesIterator






[jira] [Updated] (LUCENE-8249) Add matches to exact PhraseQuery and MultiPhraseQuery

2018-04-24 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8249:
--
Attachment: LUCENE-8249.patch

> Add matches to exact PhraseQuery and MultiPhraseQuery
> -
>
> Key: LUCENE-8249
> URL: https://issues.apache.org/jira/browse/LUCENE-8249
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8249.patch, LUCENE-8249.patch, LUCENE-8249.patch, 
> LUCENE-8249.patch
>
>
> ExactPhraseScorer can be rejigged fairly easily to expose a MatchesIterator






[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-24 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449461#comment-16449461
 ] 

Alan Woodward commented on LUCENE-8196:
---

Good catch [~jim.ferenczi], I'll commit that change.  I like the idea of 
changing *unordered* to *and* as well - I think that makes sense [~mattweber]?

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>    Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.






[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

2018-04-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448387#comment-16448387
 ] 

Alan Woodward commented on LUCENE-8196:
---

bq. How would we prevent matching at the same interval?

The original paper doesn't appear to address this.  I'll try to work out the 
best way of dealing with it; I guess we'll need to keep track of the positions 
of the internal intervals in the priority queue and, when we advance, make sure 
that they don't collide.
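
As a rough illustration of the collision problem only, here is a brute-force sketch.  It is not the priority-queue algorithm from the paper or from the patch.  The class and method names are hypothetical, and each sub-interval is assumed to be a single term at a single position.

```java
import java.util.ArrayList;
import java.util.List;

final class UnorderedIntervalsSketch {
    // Returns the minimal [start,end] intervals that contain one position from each
    // list, never using the same position for both sub-intervals.
    static List<String> minimalIntervals(int[] a, int[] b) {
        List<int[]> candidates = new ArrayList<>();
        for (int p : a) {
            for (int q : b) {
                if (p == q) {
                    continue; // the two sub-intervals must not collide
                }
                candidates.add(new int[] {Math.min(p, q), Math.max(p, q)});
            }
        }
        List<String> minimal = new ArrayList<>();
        for (int[] c : candidates) {
            boolean containsSmaller = false;
            for (int[] d : candidates) {
                boolean inside = d[0] >= c[0] && d[1] <= c[1];
                if (inside && (d[0] > c[0] || d[1] < c[1])) {
                    containsSmaller = true; // c strictly contains another candidate
                }
            }
            String text = "[" + c[0] + "," + c[1] + "]";
            if (!containsSmaller && !minimal.contains(text)) {
                minimal.add(text);
            }
        }
        return minimal;
    }

    public static void main(String[] args) {
        // The same term on both sides, e.g. unordered("a", "a") with "a" at 1, 4 and 9.
        int[] positions = {1, 4, 9};
        System.out.println(minimalIntervals(positions, positions));
    }
}
```

Without the p == q check, the same term on both sides would match itself at every position (intervals [1,1], [4,4] and [9,9]); with it, the sketch yields [1,4] and [4,9].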

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics 
> across term fields
> -
>
> Key: LUCENE-8196
> URL: https://issues.apache.org/jira/browse/LUCENE-8196
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>    Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, 
> LUCENE-8196.patch, LUCENE-8196.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket proposes an alternative implementation of the SpanQuery family 
> that uses minimum-interval semantics from 
> [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf]
>  to implement positional queries across term-based fields.  Rather than using 
> TermQueries to construct the interval operators, as in LUCENE-2878 or the 
> current Spans implementation, we instead use a new IntervalsSource object, 
> which will produce IntervalIterators over a particular segment and field.  
> These are constructed using various static helper methods, and can then be 
> passed to a new IntervalQuery which will return documents that contain one or 
> more intervals so defined.






[jira] [Updated] (LUCENE-8270) Remove MatchesIterator.term()

2018-04-23 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8270:
--
Attachment: LUCENE-8270.patch

> Remove MatchesIterator.term()
> -
>
> Key: LUCENE-8270
> URL: https://issues.apache.org/jira/browse/LUCENE-8270
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8270.patch
>
>
> As discussed on LUCENE-8268, we don't have a clear use-case for this yet, and 
> it's complicating adding Matches to phrase queries, so let's just remove it 
> for now.






[jira] [Created] (LUCENE-8270) Remove MatchesIterator.term()

2018-04-23 Thread Alan Woodward (JIRA)
Alan Woodward created LUCENE-8270:
-

 Summary: Remove MatchesIterator.term()
 Key: LUCENE-8270
 URL: https://issues.apache.org/jira/browse/LUCENE-8270
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward


As discussed on LUCENE-8268, we don't have a clear use-case for this yet, and 
it's complicating adding Matches to phrase queries, so let's just remove it for 
now.






[jira] [Commented] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448211#comment-16448211
 ] 

Alan Woodward commented on LUCENE-8268:
---

bq. since inner nodes are not queries? 

Sorry, I'm not following here - inner nodes are always generated by a Weight, 
which in turn has a parent query.

bq. Maybe we should just remove this method for now

I think that may be the most sensible option; I'll close this as Won't Fix and 
open a new issue to just remove it entirely.

> MatchesIterator.term() should return an array
> -
>
> Key: LUCENE-8268
> URL: https://issues.apache.org/jira/browse/LUCENE-8268
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8268.patch, LUCENE-8268.patch
>
>
> At the moment, we return a single BytesRef from MatchesIterator.term(), which 
> works well for the queries that currently implement this.  This won't be 
> enough for queries that operate on more than one term, however, such as 
> phrase or Span queries.
> In preparation for LUCENE-8249, this issue will change the method to return 
> an array of BytesRef






[jira] [Resolved] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-8268.
---
Resolution: Won't Fix

> MatchesIterator.term() should return an array
> -
>
> Key: LUCENE-8268
> URL: https://issues.apache.org/jira/browse/LUCENE-8268
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8268.patch, LUCENE-8268.patch
>
>
> At the moment, we return a single BytesRef from MatchesIterator.term(), which 
> works well for the queries that currently implement this.  This won't be 
> enough for queries that operate on more than one term, however, such as 
> phrase or Span queries.
> In preparation for LUCENE-8249, this issue will change the method to return 
> an array of BytesRef






[jira] [Commented] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448159#comment-16448159
 ] 

Alan Woodward commented on LUCENE-8268:
---

Here's a patch removing term() and adding getLeafQuery()

> MatchesIterator.term() should return an array
> -
>
> Key: LUCENE-8268
> URL: https://issues.apache.org/jira/browse/LUCENE-8268
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8268.patch, LUCENE-8268.patch
>
>
> At the moment, we return a single BytesRef from MatchesIterator.term(), which 
> works well for the queries that currently implement this.  This won't be 
> enough for queries that operate on more than one term, however, such as 
> phrase or Span queries.
> In preparation for LUCENE-8249, this issue will change the method to return 
> an array of BytesRef






[jira] [Updated] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8268:
--
Attachment: LUCENE-8268.patch

> MatchesIterator.term() should return an array
> -
>
> Key: LUCENE-8268
> URL: https://issues.apache.org/jira/browse/LUCENE-8268
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8268.patch, LUCENE-8268.patch
>
>
> At the moment, we return a single BytesRef from MatchesIterator.term(), which 
> works well for the queries that currently implement this.  This won't be 
> enough for queries that operate on more than one term, however, such as 
> phrase or Span queries.
> In preparation for LUCENE-8249, this issue will change the method to return 
> an array of BytesRef






[jira] [Commented] (LUCENE-8265) WordDelimiterFilter should pass through terms marked as keywords

2018-04-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448145#comment-16448145
 ] 

Alan Woodward commented on LUCENE-8265:
---

I wonder if there's a better way of handling this than using the 
KeywordAttribute, which as Nikolay says is heavily overloaded.  Would it be 
possible to somehow code up a TokenFilter that wraps another TokenFilter, and 
bypasses the wrapped filter if a certain condition is met?

> WordDelimiterFilter should pass through terms marked as keywords
> 
>
> Key: LUCENE-8265
> URL: https://issues.apache.org/jira/browse/LUCENE-8265
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mike Sokolov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This will help in cases where some terms containing separator characters 
> should be split, but others should not.  For example, this will enable a 
> filter that identifies things that look like fractions and identifies them as 
> keywords so that 1/2 does not become 12, while doing splitting and joining on 
> terms that look like part numbers containing slashes, eg something like 
> "sn-999123/1" might sometimes be written "sn-999123-1".






[jira] [Reopened] (LUCENE-8254) LRUQueryCache can leak locks

2018-04-23 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward reopened LUCENE-8254:
---

> LRUQueryCache can leak locks
> 
>
> Key: LUCENE-8254
> URL: https://issues.apache.org/jira/browse/LUCENE-8254
> Project: Lucene - Core
>  Issue Type: Bug
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.3.1
>
> Attachments: LUCENE-8254.patch, LUCENE-8254.patch
>
>
> If a QueryCache is shared between two searchers, one of which has an 
> IndexReader with no CacheHelper, then CachingWrapperWeight can leak locks in 
> scorerSupplier() and bulkScorer().  This can cause the IndexReader that does 
> have a CacheHelper to hang on close.
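
For context, the general shape of this kind of leak can be sketched as follows.  This is an assumption about the pattern only, not the actual LRUQueryCache or CachingWrapperWeight code; LockLeakSketch and its methods are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantLock;

final class LockLeakSketch {
    final ReentrantLock lock = new ReentrantLock();

    // Leaky shape: the early return for a missing cache helper skips unlock(),
    // so the lock stays held after the call returns.
    String leakyLookup(Object cacheHelper) {
        if (!lock.tryLock()) {
            return "uncached";
        }
        if (cacheHelper == null) {
            return "uncached"; // lock is never released on this path
        }
        try {
            return "cached";
        } finally {
            lock.unlock();
        }
    }

    // Safer shape: everything after a successful tryLock() sits inside try/finally,
    // so every exit path releases the lock.
    String safeLookup(Object cacheHelper) {
        if (!lock.tryLock()) {
            return "uncached";
        }
        try {
            if (cacheHelper == null) {
                return "uncached";
            }
            return "cached";
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockLeakSketch leaky = new LockLeakSketch();
        leaky.leakyLookup(null);
        System.out.println("leaky still locked: " + leaky.lock.isLocked());

        LockLeakSketch safe = new LockLeakSketch();
        safe.safeLookup(null);
        System.out.println("safe still locked: " + safe.lock.isLocked());
    }
}
```

Any other caller that later needs the same lock, such as a close path, would block once the leaky variant has run.  That is the kind of hang described in the issue.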






[jira] [Updated] (LUCENE-8254) LRUQueryCache can leak locks

2018-04-23 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8254:
--
Fix Version/s: (was: 7.4)
   7.3.1

> LRUQueryCache can leak locks
> 
>
> Key: LUCENE-8254
> URL: https://issues.apache.org/jira/browse/LUCENE-8254
> Project: Lucene - Core
>  Issue Type: Bug
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.3.1
>
> Attachments: LUCENE-8254.patch, LUCENE-8254.patch
>
>
> If a QueryCache is shared between two searchers, one of which has an 
> IndexReader with no CacheHelper, then CachingWrapperWeight can leak locks in 
> scorerSupplier() and bulkScorer().  This can cause the IndexReader that does 
> have a CacheHelper to hang on close.






[jira] [Resolved] (LUCENE-8254) LRUQueryCache can leak locks

2018-04-23 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-8254.
---
Resolution: Fixed

Re-opened to change the fix version to 7.3.1

> LRUQueryCache can leak locks
> 
>
> Key: LUCENE-8254
> URL: https://issues.apache.org/jira/browse/LUCENE-8254
> Project: Lucene - Core
>  Issue Type: Bug
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.3.1
>
> Attachments: LUCENE-8254.patch, LUCENE-8254.patch
>
>
> If a QueryCache is shared between two searchers, one of which has an 
> IndexReader with no CacheHelper, then CachingWrapperWeight can leak locks in 
> scorerSupplier() and bulkScorer().  This can cause the IndexReader that does 
> have a CacheHelper to hang on close.






[jira] [Commented] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447802#comment-16447802
 ] 

Alan Woodward commented on LUCENE-8268:
---

bq. Can we add a user of multiple terms?

So at the moment there isn't anything that actually uses this.  My reason for 
adding it was to make it possible to identify the leaf query that returned each 
position, but maybe it would be a better idea to remove terms() entirely, and 
add a getLeafQuery() method instead?


> MatchesIterator.term() should return an array
> -
>
> Key: LUCENE-8268
> URL: https://issues.apache.org/jira/browse/LUCENE-8268
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8268.patch
>
>
> At the moment, we return a single BytesRef from MatchesIterator.term(), which 
> works well for the queries that currently implement this.  This won't be 
> enough for queries that operate on more than one term, however, such as 
> phrase or Span queries.
> In preparation for LUCENE-8249, this issue will change the method to return 
> an array of BytesRef






[jira] [Commented] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447768#comment-16447768
 ] 

Alan Woodward commented on LUCENE-8268:
---

Patch attached.  It changes the signature from {code}BytesRef term(){code} to 
{code}BytesRef[] terms(){code}, and adds a test to ensure that matches at the 
same position are iterated over in term order.
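
The ordering that the new test checks can be sketched roughly as below.  TermMatch and MatchOrderSketch are hypothetical names, not classes from the patch, and the unsigned byte-wise comparison is an assumption about what "term order" means here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical value type: one match position plus the term that produced it.
final class TermMatch {
    final int position;
    final byte[] term;

    TermMatch(int position, String term) {
        this.position = position;
        this.term = term.getBytes(StandardCharsets.UTF_8);
    }

    String termText() {
        return new String(term, StandardCharsets.UTF_8);
    }
}

final class MatchOrderSketch {
    // Unsigned lexicographic byte comparison; a shorter prefix sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Order by position first; ties at the same position are broken by term order.
    static final Comparator<TermMatch> ORDER = (x, y) -> {
        int byPosition = Integer.compare(x.position, y.position);
        return byPosition != 0 ? byPosition : compareBytes(x.term, y.term);
    };

    // Returns "position:term" strings in iteration order.
    static List<String> ordered(List<TermMatch> matches) {
        List<TermMatch> sorted = new ArrayList<>(matches);
        sorted.sort(ORDER);
        List<String> out = new ArrayList<>();
        for (TermMatch m : sorted) {
            out.add(m.position + ":" + m.termText());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ordered(Arrays.asList(
            new TermMatch(2, "w3"), new TermMatch(0, "w2"), new TermMatch(0, "w1"))));
    }
}
```

Here the two matches at position 0 come out as w1 and then w2, followed by the match at position 2.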

> MatchesIterator.term() should return an array
> -
>
> Key: LUCENE-8268
> URL: https://issues.apache.org/jira/browse/LUCENE-8268
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8268.patch
>
>
> At the moment, we return a single BytesRef from MatchesIterator.term(), which 
> works well for the queries that currently implement this.  This won't be 
> enough for queries that operate on more than one term, however, such as 
> phrase or Span queries.
> In preparation for LUCENE-8249, this issue will change the method to return 
> an array of BytesRef






[jira] [Created] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Alan Woodward (JIRA)
Alan Woodward created LUCENE-8268:
-

 Summary: MatchesIterator.term() should return an array
 Key: LUCENE-8268
 URL: https://issues.apache.org/jira/browse/LUCENE-8268
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
 Attachments: LUCENE-8268.patch

At the moment, we return a single BytesRef from MatchesIterator.term(), which 
works well for the queries that currently implement this.  This won't be enough 
for queries that operate on more than one term, however, such as phrase or Span 
queries.

In preparation for LUCENE-8249, this issue will change the method to return an 
array of BytesRef






[jira] [Updated] (LUCENE-8268) MatchesIterator.term() should return an array

2018-04-23 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8268:
--
Attachment: LUCENE-8268.patch

> MatchesIterator.term() should return an array
> -
>
> Key: LUCENE-8268
> URL: https://issues.apache.org/jira/browse/LUCENE-8268
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8268.patch
>
>
> At the moment, we return a single BytesRef from MatchesIterator.term(), which 
> works well for the queries that currently implement this.  This won't be 
> enough for queries that operate on more than one term, however, such as 
> phrase or Span queries.
> In preparation for LUCENE-8249, this issue will change the method to return 
> an array of BytesRef






Re: BugFix release 7.3.1

2018-04-23 Thread Alan Woodward
Done

> On 23 Apr 2018, at 04:12, Đạt Cao Mạnh  wrote:
> 
> Hi Alan, 
> 
> Can you backport LUCENE-8254 to branch_7_3?





Re: [JENKINS] Lucene-Solr-NightlyTests-master - Build # 1534 - Failure

2018-04-20 Thread Alan Woodward
+1.  It’s a shame that @SuppressCodecs doesn’t work on test methods, only on 
classes, which makes things a little trickier.

> On 20 Apr 2018, at 10:13, Dawid Weiss  wrote:
> 
> This is due to an out of memory exception in
> 
>  [junit4]   1> at
> org.apache.lucene.search.TestInetAddressRangeQueries.testRandomBig(TestInetAddressRangeQueries.java:81)
> 
> Seems like mem codec has been picked -- should we add suppression to this 
> test?
> 
> @SuppressCodecs({"Direct", "Memory"})
> 
> Dawid



[jira] [Commented] (LUCENE-8249) Add matches to exact PhraseQuery and MultiPhraseQuery

2018-04-20 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445499#comment-16445499
 ] 

Alan Woodward commented on LUCENE-8249:
---

Here's an updated patch:
* minFreq() is now maxFreq()
* I fixed SloppyPhraseMatcher#maxFreq to be the sum of the freqs of all the 
child postings, on the grounds that each term position can be the first term 
in at most one match.  +1 on changing it to a float; I'd missed that comment 
before uploading the new patch, and will change it now.
* ExactPhraseMatcher now advances its lead in sync with everything else, which 
simplifies things a lot.
* I removed freq(), and replaced it with sloppyWeight() which returns the 
contribution of the current match to the total sloppy freq - suggestions for a 
better name are welcomed...
* I originally put in MatchesIterator#term so that highlighters could do things 
like keep track of the number of specific terms in a fragment, for scoring.  I 
like the idea of changing it to return a BytesRef[] though, let's do that in a 
followup.
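
The maxFreq bound described above amounts to the following arithmetic.  This is a sketch with hypothetical names, not the code in SloppyPhraseMatcher.

```java
final class MaxFreqSketch {
    // Upper bound on sloppy phrase matches in one document: if each term position can
    // be the first term of at most one match, there cannot be more matches than there
    // are term positions in total, i.e. the sum of the per-term frequencies.
    static float maxFreq(int[] termFreqs) {
        float sum = 0f;
        for (int freq : termFreqs) {
            sum += freq;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three phrase terms occurring 3, 1 and 2 times in the current document.
        System.out.println(maxFreq(new int[] {3, 1, 2}));
    }
}
```

For per-term frequencies of 3, 1 and 2 the bound is 6.  The true sloppy frequency can be much lower, since this is only an upper limit.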

> Add matches to exact PhraseQuery and MultiPhraseQuery
> -
>
> Key: LUCENE-8249
> URL: https://issues.apache.org/jira/browse/LUCENE-8249
> Project: Lucene - Core
>  Issue Type: Improvement
>    Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8249.patch, LUCENE-8249.patch, LUCENE-8249.patch
>
>
> ExactPhraseScorer can be rejigged fairly easily to expose a MatchesIterator






Re: BadApple report

2018-04-17 Thread Alan Woodward
TestDocTermOrds should be fixed now, as should TestIndexSorting (I un-badappled 
the latter yesterday)

> On 16 Apr 2018, at 21:59, Erick Erickson  wrote:
> 
> We have a much smaller list of _consistently_ failing tests this week, i.e.
> tests that are in Hoss' rollups from two weeks ago and also failed this
> past week.
> 
> In order to reduce some of the make-work, I collect failed tests Fri->Mon
> so the BadApple'd tests on Thursday don't clutter things up.
> 
> 
> ***Tests I'll BadApple on Thursday.
> 
> These are tests that failed in the last week and _also_ are failures
> in Hoss' report from two weeks ago, so nobody has addressed them in
> that time-frame.
> 
> PLEASE LET ME KNOW BEFORE THURSDAY WHICH OF THESE SHOULD NOT BE BADAPPLEd
>   org.apache.solr.cloud.autoscaling.NodeLostTriggerTest.testListenerAcceptance
>   org.apache.solr.cloud.autoscaling.sim.TestTriggerIntegration.testEventQueue
>   org.apache.solr.uninverting.TestDocTermOrds.testNumericEncoded64
> 
> 
> ***All collected test failures:
> *Timeout (or time related)/session expired/thread
> leak/zombie threads/Object tracker
> junit.framework.TestSuite.org.apache.solr.cloud.ChaosMonkeyNothingIsSafeWithPullReplicasTest
> junit.framework.TestSuite.org.apache.solr.cloud.ZkControllerTest
> junit.framework.TestSuite.org.apache.solr.ltr.feature.TestExternalFeatures
> junit.framework.TestSuite.org.apache.solr.request.TestUnInvertedFieldException
> junit.framework.TestSuite.org.apache.solr.schema.TestCloudSchemaless
> org.apache.solr.cloud.cdcr.CdcrBootstrapTest.testBootstrapWithSourceCluster
> org.apache.solr.cloud.TestPullReplicaErrorHandling.testCantConnectToLeader
> org.apache.solr.common.cloud.TestCollectionStateWatchers.testWaitForStateWatcherIsRetainedOnPredicateFailure
> junit.framework.TestSuite.org.apache.solr.ltr.feature.TestExternalFeatures
> 
> 
> ***OutOfMemory/GC overhead exceeded.
> junit.framework.TestSuite.org.apache.solr.uninverting.TestDocTermOrds
> org.apache.solr.uninverting.TestDocTermOrds.testActuallySingleValued
> org.apache.solr.uninverting.TestDocTermOrds.testEmptyIndex
> org.apache.solr.uninverting.TestDocTermOrds.testNumericEncoded64
> org.apache.solr.uninverting.TestDocTermOrds.testRandom
> org.apache.solr.uninverting.TestDocTermOrds.testSortedTermsEnum
> org.apache.solr.uninverting.TestDocTermOrds.testTriggerUnInvertLimit
> 
> *Other (typically asserts.)
> org.apache.solr.client.solrj.impl.CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale
> org.apache.solr.cloud.autoscaling.IndexSizeTriggerTest.testSplitIntegration
> org.apache.solr.cloud.autoscaling.IndexSizeTriggerTest.testTrigger
> org.apache.solr.cloud.autoscaling.MetricTriggerIntegrationTest.testMetricTrigger
> org.apache.solr.cloud.autoscaling.NodeAddedTriggerTest.testRestoreState
> org.apache.solr.cloud.autoscaling.NodeLostTriggerTest.testListenerAcceptance
> org.apache.solr.cloud.autoscaling.sim.TestTriggerIntegration.testEventQueue
> org.apache.solr.cloud.ForceLeaderTest.testZombieLeader
> org.apache.solr.common.cloud.TestCollectionStateWatchers.testWaitForStateWatcherIsRetainedOnPredicateFailure
> org.apache.solr.handler.admin.SegmentsInfoRequestHandlerTest.testSegmentInfos
> org.apache.solr.handler.admin.SegmentsInfoRequestHandlerTest.testSegmentInfosData
> org.apache.solr.handler.admin.SegmentsInfoRequestHandlerTest.testSegmentInfosVersion
> org.apache.solr.handler.dataimport.TestContentStreamDataSource.testCommitWithin
> org.apache.solr.schema.TestCloudSchemaless.test
> org.apache.solr.uninverting.TestDocTermOrds.testNumericEncoded64
> org.apache.solr.update.processor.TemplateUpdateProcessorTest.testSimple
> 
> Annotated tests.
> 
> 
> *AwaitsFix Annotations:
> 
> Lucene AwaitsFix
> RandomGeoPolygonTest.java
>   testComparePolygons()
>   //@AwaitsFix(bugUrl="https://issues.apache.org/jira/browse/LUCENE-8245")
> 
> TestControlledRealTimeReopenThread.java
>   testCRTReopen()
>   @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-5737")
> 
> TestICUNormalizer2CharFilter.java
>   testRandomStrings()
>   @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-5595")
> 
> TestICUTokenizerCJK.java
>   TestICUTokenizerCJK suite
>   @AwaitsFix(bugUrl="https://issues.apache.org/jira/browse/LUCENE-8222")
> 
> TestMoreLikeThis.java
>   testMultiFieldShouldReturnPerFieldBooleanQuery()
>   @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-7161")
> 
> UIMABaseAnalyzerTest.java
>   testRandomStrings()
>   @Test @AwaitsFix(bugUrl =
> "https://issues.apache.org/jira/browse/LUCENE-3869")
> 
> UIMABaseAnalyzerTest.java
>   testRandomStringsWithConfigurationParameters()
>   @Test @AwaitsFix(bugUrl =
> "https://issues.apache.org/jira/browse/LUCENE-3869")
> 
> UIMATypeAwareAnalyzerTest.java
>   testRandomStrings()
>   @Test @AwaitsFix(bugUrl =
> 

[jira] [Updated] (SOLR-12147) TestDocTermOrds.testTriggerUnInvertLimit should not use MemoryCodec

2018-04-16 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-12147:
-
Attachment: SOLR-12147.patch

> TestDocTermOrds.testTriggerUnInvertLimit should not use MemoryCodec
> ---
>
> Key: SOLR-12147
> URL: https://issues.apache.org/jira/browse/SOLR-12147
> Project: Solr
> Issue Type: Task
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
>
> Attachments: SOLR-12147.patch
>
>
> This can lead to OOM (for example, in 
> https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.3/10/).
> It's already a nightly-only test, and it's always going to require a large 
> index.






[jira] [Reopened] (SOLR-12147) TestDocTermOrds.testTriggerUnInvertLimit should not use MemoryCodec

2018-04-16 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward reopened SOLR-12147:
--

> TestDocTermOrds.testTriggerUnInvertLimit should not use MemoryCodec
> ---
>
> Key: SOLR-12147
> URL: https://issues.apache.org/jira/browse/SOLR-12147
> Project: Solr
> Issue Type: Task
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
>
>
> This can lead to OOM (for example, in 
> https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.3/10/).
> It's already a nightly-only test, and it's always going to require a large 
> index.






[jira] [Commented] (LUCENE-8249) Add matches to exact PhraseQuery and MultiPhraseQuery

2018-04-16 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439293#comment-16439293
 ] 

Alan Woodward commented on LUCENE-8249:
---

Looks like a small penalty on sloppy phrases, and a slightly larger boost 
on exact phrases.  Or possibly just noise.
{code}
                 Task  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff
     HighSloppyPhrase        589.78   (5.4%)                   573.54   (7.9%)   -2.8% ( -15% -   11%)
           OrHighHigh       1174.38   (8.6%)                  1146.22   (7.5%)   -2.4% ( -17% -   15%)
      MedSloppyPhrase       1328.47   (4.3%)                  1302.80   (5.2%)   -1.9% ( -10% -    7%)
           AndHighLow       3138.65   (8.5%)                  3087.05   (7.2%)   -1.6% ( -15% -   15%)
          LowSpanNear       1962.66   (5.3%)                  1931.60   (5.8%)   -1.6% ( -12% -   10%)
              Prefix3       1027.12   (7.8%)                  1011.50   (8.0%)   -1.5% ( -16% -   15%)
             Wildcard       1842.34   (5.8%)                  1821.58   (4.2%)   -1.1% ( -10% -    9%)
             PKLookup        392.44   (4.6%)                   388.12   (4.6%)   -1.1% (  -9% -    8%)
HighTermDayOfYearSort       1122.38   (6.2%)                      .20   (7.3%)   -1.0% ( -13% -   13%)
             HighTerm       4343.88   (8.5%)                  4316.70   (5.9%)   -0.6% ( -13% -   14%)
               IntNRQ       1319.13   (2.5%)                  1313.00   (2.4%)   -0.5% (  -5% -    4%)
            OrHighLow       2157.05   (4.2%)                  2148.60   (4.9%)   -0.4% (  -9% -    9%)
    HighTermMonthSort       3568.59   (5.9%)                  3563.38   (5.7%)   -0.1% ( -11% -   12%)
            OrHighMed       1276.34  (11.4%)                  1274.61  (11.2%)   -0.1% ( -20% -   25%)
            LowPhrase       1567.69   (4.7%)                  1567.03   (5.5%)   -0.0% (  -9% -   10%)
              MedTerm       5682.98   (8.2%)                  5685.03   (9.3%)    0.0% ( -16% -   19%)
          AndHighHigh       1020.12   (4.6%)                  1023.48   (4.7%)    0.3% (  -8% -   10%)
      LowSloppyPhrase        885.26   (4.4%)                   889.20   (5.2%)    0.4% (  -8% -   10%)
           AndHighMed       1287.27   (6.0%)                  1296.46   (5.0%)    0.7% (  -9% -   12%)
               Fuzzy1        493.78   (4.4%)                   497.65   (2.9%)    0.8% (  -6% -    8%)
               Fuzzy2         83.87  (20.0%)                    85.02  (18.4%)    1.4% ( -30% -   49%)
              Respell        391.63   (4.6%)                   397.30   (4.1%)    1.4% (  -6% -   10%)
              LowTerm       6098.16   (6.0%)                  6202.87   (5.4%)    1.7% (  -9% -   13%)
         HighSpanNear        773.18  (10.9%)                   786.87   (8.4%)    1.8% ( -15% -   23%)
          MedSpanNear        937.52   (6.1%)                   960.49   (4.2%)    2.4% (  -7% -   13%)
           HighPhrase       1035.86   (3.8%)                  1101.79   (4.9%)    6.4% (  -2% -   15%)
            MedPhrase        997.89   (7.2%)                  1068.68   (5.0%)    7.1% (  -4% -   20%)
{code}
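
For anyone reading the numbers above, the Pct diff figure is just the relative change between the two QPS columns. Here is a minimal Java sketch of that arithmetic. It is not part of the benchmark tooling: the class name PctDiffSketch and the withinNoise rule (comparing the change against the larger of the two reported StdDev percentages) are assumptions made purely for illustration.

```java
import java.util.Locale;

public class PctDiffSketch {

    /** Relative change of the modified QPS against the baseline, in percent. */
    public static double pctDiff(double baselineQps, double modifiedQps) {
        return 100.0 * (modifiedQps - baselineQps) / baselineQps;
    }

    /**
     * Crude noise check (an assumption of this sketch, not the benchmark tool's rule):
     * a change no larger than the bigger of the two StdDev percentages may be noise.
     */
    public static boolean withinNoise(double pctDiff, double baseStdDevPct, double modStdDevPct) {
        return Math.abs(pctDiff) <= Math.max(baseStdDevPct, modStdDevPct);
    }

    public static void main(String[] args) {
        // Three rows copied from the benchmark table in this comment.
        String[] tasks = {"HighSloppyPhrase", "HighPhrase", "MedPhrase"};
        double[] baseQps = {589.78, 1035.86, 997.89};
        double[] baseSd = {5.4, 3.8, 7.2};
        double[] modQps = {573.54, 1101.79, 1068.68};
        double[] modSd = {7.9, 4.9, 5.0};
        for (int i = 0; i < tasks.length; i++) {
            double diff = pctDiff(baseQps[i], modQps[i]);
            System.out.println(String.format(Locale.ROOT, "%-17s %5.1f%%  possibly noise: %b",
                tasks[i], diff, withinNoise(diff, baseSd[i], modSd[i])));
        }
    }
}
```

Under this rough rule, of the three rows shown only HighPhrase changes by more than either reported deviation, which fits the "possibly just noise" caveat for the rest.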

> Add matches to exact PhraseQuery and MultiPhraseQuery
> -
>
> Key: LUCENE-8249
> URL: https://issues.apache.org/jira/browse/LUCENE-8249
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8249.patch, LUCENE-8249.patch
>
>
> ExactPhraseScorer can be rejigged fairly easily to expose a MatchesIterator






[jira] [Commented] (LUCENE-8249) Add matches to exact PhraseQuery and MultiPhraseQuery

2018-04-16 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439285#comment-16439285
 ] 

Alan Woodward commented on LUCENE-8249:
---

Here's an updated patch:
* matching is moved out to a new abstract class PhraseMatcher, with Exact and 
Sloppy implementations
* PhraseWeight is abstracted from PhraseQuery and MultiPhraseQuery, and handles 
explanations, matches and scorers (this has the nice side-effect of adding 
min-score handling to sloppy phrases)

I'm running benchmarks now.
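
To illustrate the shape of the refactoring described above (one abstract matcher, with exact and sloppy variants differing only in how much positional spread they accept), here is a minimal self-contained Java sketch. It is not the code in the patch: the names PhraseMatcherDemo, Matcher, Exact, Sloppy and maxSpread are hypothetical, and the notion of slop used here (spread of offset-adjusted positions) is a simplification that ignores repeated terms.

```java
public class PhraseMatcherDemo {

    /** Hypothetical base class: subclasses only decide how much spread is allowed. */
    public abstract static class Matcher {
        /** Largest spread of offset-adjusted positions this matcher accepts. */
        protected abstract int maxSpread();

        /**
         * positions[i] holds the positions of the i-th phrase term in one document.
         * Returns true if one position per term can be chosen so that the adjusted
         * positions (position minus term index) differ by at most maxSpread().
         */
        public final boolean matches(int[][] positions) {
            return search(positions, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
        }

        // Brute-force search over one position per term; fine for a sketch,
        // not tuned the way a real scorer would be.
        private boolean search(int[][] positions, int term, int min, int max) {
            if (term == positions.length) {
                return true; // every term placed within the allowed spread
            }
            for (int pos : positions[term]) {
                int adjusted = pos - term;
                int newMin = Math.min(min, adjusted);
                int newMax = Math.max(max, adjusted);
                // The spread can only grow as terms are added, so pruning here is safe.
                if (newMax - newMin <= maxSpread() && search(positions, term + 1, newMin, newMax)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Terms must be exactly adjacent and in order. */
    public static final class Exact extends Matcher {
        @Override
        protected int maxSpread() {
            return 0;
        }
    }

    /** Terms may drift apart by up to slop positions. */
    public static final class Sloppy extends Matcher {
        private final int slop;

        public Sloppy(int slop) {
            this.slop = slop;
        }

        @Override
        protected int maxSpread() {
            return slop;
        }
    }

    public static void main(String[] args) {
        // Phrase "quick brown fox" against "quick red brown fox": quick@0, brown@2, fox@3.
        int[][] gapped = {{0}, {2}, {3}};
        System.out.println("exact:  " + new Exact().matches(gapped));
        System.out.println("sloppy: " + new Sloppy(1).matches(gapped));
    }
}
```

The point of the sketch is only that the matching loop lives in one place and the two variants stay small; how the real PhraseMatcher divides the work may differ.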

> Add matches to exact PhraseQuery and MultiPhraseQuery
> -
>
> Key: LUCENE-8249
> URL: https://issues.apache.org/jira/browse/LUCENE-8249
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8249.patch, LUCENE-8249.patch
>
>
> ExactPhraseScorer can be rejigged fairly easily to expose a MatchesIterator






[jira] [Updated] (LUCENE-8249) Add matches to exact PhraseQuery and MultiPhraseQuery

2018-04-16 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8249:
--
Attachment: LUCENE-8249.patch

> Add matches to exact PhraseQuery and MultiPhraseQuery
> -
>
> Key: LUCENE-8249
> URL: https://issues.apache.org/jira/browse/LUCENE-8249
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8249.patch, LUCENE-8249.patch
>
>
> ExactPhraseScorer can be rejigged fairly easily to expose a MatchesIterator






[jira] [Resolved] (LUCENE-8254) LRUQueryCache can leak locks

2018-04-16 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-8254.
---
   Resolution: Fixed
 Assignee: Alan Woodward
Fix Version/s: 7.4

> LRUQueryCache can leak locks
> 
>
> Key: LUCENE-8254
> URL: https://issues.apache.org/jira/browse/LUCENE-8254
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8254.patch, LUCENE-8254.patch
>
>
> If a QueryCache is shared between two searchers, one of which has an 
> IndexReader with no CacheHelper, then CachingWrapperWeight can leak locks in 
> scorerSupplier() and bulkScorer().  This can cause the IndexReader that does 
> have a CacheHelper to hang on close.
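
The general failure mode described in the issue (a lock taken with tryLock and then skipped on an early-return path) can be sketched in plain Java. This is an illustration under stated assumptions, not the LRUQueryCache code: the class LockLeakSketch, its method names and the null cacheKey standing in for a missing CacheHelper are all hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockLeakSketch {
    private final ReentrantLock lock = new ReentrantLock();

    /** Leaky variant: the early return for a missing cache key skips unlock(). */
    public String leakyLookup(Object cacheKey) {
        if (!lock.tryLock()) {
            return "uncached";
        }
        if (cacheKey == null) {
            return "uncached"; // BUG: the lock acquired above is never released
        }
        try {
            return "cached";
        } finally {
            lock.unlock();
        }
    }

    /** Safe variant: every path after a successful tryLock() runs through finally. */
    public String safeLookup(Object cacheKey) {
        if (!lock.tryLock()) {
            return "uncached";
        }
        try {
            if (cacheKey == null) {
                return "uncached";
            }
            return "cached";
        } finally {
            lock.unlock();
        }
    }

    /** How many holds the calling thread still has on the lock. */
    public int heldByCurrentThread() {
        return lock.getHoldCount();
    }

    /** Stand-in for a close() on another thread that needs the same lock. */
    public boolean canLockFromOtherThread() {
        boolean[] acquired = new boolean[1];
        Thread other = new Thread(() -> {
            if (lock.tryLock()) {
                acquired[0] = true;
                lock.unlock();
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        LockLeakSketch leaky = new LockLeakSketch();
        leaky.leakyLookup(null);
        System.out.println("leaky: other thread can lock = " + leaky.canLockFromOtherThread());

        LockLeakSketch safe = new LockLeakSketch();
        safe.safeLookup(null);
        System.out.println("safe:  other thread can lock = " + safe.canLockFromOtherThread());
    }
}
```

In the sketch the second thread uses tryLock() so the demonstration terminates; a blocking lock() in its place would wait indefinitely, which is the kind of hang on close the description mentions.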






[jira] [Commented] (LUCENE-8254) LRUQueryCache can leak locks

2018-04-16 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439221#comment-16439221
 ] 

Alan Woodward commented on LUCENE-8254:
---

This only happens under very unusual circumstances (query cache shared between 
two different IndexReader types, one of which has a null CacheHelper), so I 
don't think this warrants a bug fix release?

> LRUQueryCache can leak locks
> 
>
> Key: LUCENE-8254
> URL: https://issues.apache.org/jira/browse/LUCENE-8254
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8254.patch, LUCENE-8254.patch
>
>
> If a QueryCache is shared between two searchers, one of which has an 
> IndexReader with no CacheHelper, then CachingWrapperWeight can leak locks in 
> scorerSupplier() and bulkScorer().  This can cause the IndexReader that does 
> have a CacheHelper to hang on close.





