[jira] [Updated] (LUCENE-5479) Make default dimension config in FacetConfig adjustable

2014-02-28 Thread Rob Audenaerde (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Audenaerde updated LUCENE-5479:
---

Attachment: LUCENE-5479.patch

With javadoc.

> Make default dimension config in FacetConfig adjustable 
> 
>
> Key: LUCENE-5479
> URL: https://issues.apache.org/jira/browse/LUCENE-5479
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Rob Audenaerde
>Priority: Minor
> Attachments: LUCENE-5479.patch, LUCENE-5479.patch
>
>
> Now it is hardcoded to DEFAULT_DIM_CONFIG. This may be fine for most 
> standard approaches. 
> However, I use lots of facets. These facets can be multivalued, and I do not 
> know that beforehand. So what I would like to do is change the default 
> config to {{multiValued = true}}. 
> Currently I have a working, but rather ugly, workaround that subclasses 
> FacetsConfig, like this:
> {code:title=CustomFacetsConfig.java|borderStyle=solid}
> public class CustomFacetsConfig extends FacetsConfig
> {
>   public final static DimConfig DEFAULT_D2A_DIM_CONFIG = new DimConfig();
>
>   static
>   {
>     DEFAULT_D2A_DIM_CONFIG.multiValued = true;
>   }
>
>   @Override
>   public synchronized DimConfig getDimConfig( String dimName )
>   {
>     DimConfig ft = super.getDimConfig( dimName );
>     if ( DEFAULT_DIM_CONFIG.equals( ft ) )
>     {
>       return DEFAULT_D2A_DIM_CONFIG;
>     }
>     return ft;
>   }
> }
> {code}
> I created a patch to illustrate what I would like to change. By introducing a 
> protected method it becomes easier to create a custom subclass of FacetsConfig. 
> Also, maybe there are better ways to accomplish my goal (an easy default to 
> multivalued?)
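
For reference, the workaround above shrinks to a trivial override once a protected default-config hook exists. A minimal sketch, assuming a protected getDefaultDimConfig() method (the method name is an assumption about the proposal, not a confirmed API):

{code}
import org.apache.lucene.facet.FacetsConfig;

// Hypothetical subclass relying on the proposed protected hook.
public class MultiValuedFacetsConfig extends FacetsConfig
{
  private final static DimConfig MULTI_VALUED_DEFAULT = new DimConfig();
  static
  {
    MULTI_VALUED_DEFAULT.multiValued = true;
  }

  @Override
  protected DimConfig getDefaultDimConfig()
  {
    return MULTI_VALUED_DEFAULT;
  }
}
{code}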






Elevation and core create

2014-02-28 Thread David Stuart
Hi,

I am using Solr 3.6 and am trying to automate the deployment of cores with a 
custom elevate file. It is proving to be difficult: while most of the files (schema, 
stop words, etc.) support absolute paths, the elevate file seems to need to be in 
either a conf directory that is a sibling of data or in the data directory itself. 
I am able to achieve my goal by having a secondary process that places the file, 
but thought I would ask the group in case I have missed the obvious. Should I 
move to Solr 4? Is it fixed there? I could also go down the route of extending the 
SolrCore create function to accept additional params and move the file into the 
defined data directory.

Ideas?

Thanks for your help

Regards

David Stuart
M  +44(0) 778 854 2157
T   +44(0) 845 519 5465
www.axistwelve.com
Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK

AXIS12 - Enterprise Web Solutions

Reg Company No. 7215135
VAT No. 997 4801 60






[jira] [Commented] (LUCENE-5370) Sorting Facets on CategoryPath (Label)

2014-02-28 Thread Rob Audenaerde (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915602#comment-13915602
 ] 

Rob Audenaerde commented on LUCENE-5370:


I have implemented sorting on label, based on the 4.7 branch. The cost in terms 
of speed is not really that high compared to retrieving values. 

I have an index with 999,999 random documents. Retrieving a top-ten facet list 
based on value takes about 510 ms; retrieving based on labels takes about 580 ms. 

So it looks like there is only an increase in time of about 15%.

> Sorting Facets on CategoryPath (Label)
> --
>
> Key: LUCENE-5370
> URL: https://issues.apache.org/jira/browse/LUCENE-5370
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Affects Versions: 4.6
>Reporter: Rob Audenaerde
>  Labels: features
>
> Facets support sorting through {{FacetRequest.SortOrder}}, which is used in 
> {{ResultSortUtils}}. For my application it would be very nice if the facets 
> could also be sorted on their label. 
> I think this could be accomplished by altering {{FacetRequest}} with an extra 
> enum {{SortType}}, and two extra {{Heap}} implementations in {{ResultSortUtils}} 
> that compare the CategoryPath instead of the double value.
> What do you think of this idea? Or could the same behaviour be accomplished 
> in a different way already?
> (btw: I tried building this patch on the trunk of Lucene 5.0, but I couldn't 
> get the maven build to build correctly. I will try again later on on the 4.6 
> branch.)
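
As a rough illustration of the idea, independent of the facet module's actual classes, a heap can simply be ordered by the label instead of the numeric value; {{FacetEntry}} below is a hypothetical stand-in for whatever node type {{ResultSortUtils}} really orders:

{code}
import java.util.Comparator;
import java.util.PriorityQueue;

final class LabelSortSketch {
  // Hypothetical facet entry: a label (category path as a string) plus an aggregated value.
  static final class FacetEntry {
    final String label;
    final double value;
    FacetEntry(String label, double value) { this.label = label; this.value = value; }
  }

  public static void main(String[] args) {
    // Heap ordered by label instead of by the double value.
    PriorityQueue<FacetEntry> byLabel = new PriorityQueue<FacetEntry>(16,
        new Comparator<FacetEntry>() {
          public int compare(FacetEntry a, FacetEntry b) {
            return a.label.compareTo(b.label);
          }
        });
    byLabel.add(new FacetEntry("Author/Lin", 42.0));
    byLabel.add(new FacetEntry("Author/Audenaerde", 12.0));
    System.out.println(byLabel.peek().label); // prints Author/Audenaerde
  }
}
{code}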






[jira] [Commented] (LUCENE-5476) Facet sampling

2014-02-28 Thread Rob Audenaerde (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915650#comment-13915650
 ] 

Rob Audenaerde commented on LUCENE-5476:


I'm currently experimenting with this. To increase the speed it seems logical to 
me the {{FacetsCollector}} needs to return less hits. I have a slightly modified 
version that I will attach. 
It uses a sampling technique that divides the total hits into 'bins' of a 
given size, and takes one sample from each bin. I have implemented it by keeping 
that one sample as a 'hit' of the search if it was a hit, and clearing all other 
bits.  See the attached file. 

By using this technique the distribution of the results should not be altered 
too much, while the performance gains can be significant. 

A quick test revealed that for 1M results and a binsize of 500, the sampled 
version is twice as fast.

The problem is that the resulting {{FacetResult}}s are not correct, as the 
number of hits is reduced. This can be fixed afterwards for counting facets by 
multiplying with the binsize; but for other facets it will be more difficult or 
will require other approaches. 

What do you think?
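
For readers following along, here is a minimal, self-contained sketch of the bin idea described above, using a plain {{java.util.BitSet}} rather than Lucene's own bitset classes; the class and method names are illustrative, not those of the attached collector:

{code}
import java.util.BitSet;
import java.util.Random;

final class BinSamplingSketch {
  /**
   * Splits the doc id space into consecutive bins of binSize docs, picks one
   * random doc per bin, keeps it only if it was a hit, and clears the rest.
   */
  static BitSet sample(BitSet hits, int maxDoc, int binSize, long seed) {
    Random random = new Random(seed);      // fixed seed => repeatable samples
    BitSet sampled = new BitSet(maxDoc);
    for (int binStart = 0; binStart < maxDoc; binStart += binSize) {
      int binEnd = Math.min(binStart + binSize, maxDoc);
      int candidate = binStart + random.nextInt(binEnd - binStart);
      if (hits.get(candidate)) {
        sampled.set(candidate);
      }
    }
    return sampled;
  }

  public static void main(String[] args) {
    BitSet hits = new BitSet(1000000);
    hits.set(0, 1000000);                   // pretend every doc matched
    BitSet sampled = sample(hits, 1000000, 500, 42L);
    // Counts computed on the sample can be scaled back up by the bin size.
    System.out.println(sampled.cardinality() * 500L);
  }
}
{code}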

> Facet sampling
> --
>
> Key: LUCENE-5476
> URL: https://issues.apache.org/jira/browse/LUCENE-5476
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Rob Audenaerde
>
> With LUCENE-5339 facet sampling disappeared. 
> When trying to display facet counts on large datasets (>10M documents) 
> counting facets is rather expensive, as all the hits are collected and 
> processed. 
> Sampling greatly reduced this and thus provided a nice speedup. Could it be 
> brought back?






[jira] [Updated] (LUCENE-5476) Facet sampling

2014-02-28 Thread Rob Audenaerde (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Audenaerde updated LUCENE-5476:
---

Attachment: SamplingFacetsCollector.java

> Facet sampling
> --
>
> Key: LUCENE-5476
> URL: https://issues.apache.org/jira/browse/LUCENE-5476
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Rob Audenaerde
> Attachments: SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared. 
> When trying to display facet counts on large datasets (>10M documents) 
> counting facets is rather expensive, as all the hits are collected and 
> processed. 
> Sampling greatly reduced this and thus provided a nice speedup. Could it be 
> brought back?






Re: GSoC 2014 on LUCENE-466: Need QueryParser support for BooleanQuery.minNrShouldMatch

2014-02-28 Thread Michael McCandless
I think a good place to start is on the issue itself.

E.g. add a comment expressing that you're interested in this issue,
maybe summarize roughly what's entailed.  E.g., that issue is quite
old, and the first part of it (supporting minShouldMatch in BQ) has
already been done, so all that remains is fixing QueryParsers to
accept it, if they don't already?  I'm not sure, but just this part
may be too little for a whole summer?
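
For context, the part that is already done can be used programmatically roughly like this (a minimal sketch against the 4.x API; the remaining GSoC work would be exposing an equivalent through the query parser syntax):

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

final class MinShouldMatchExample {
  // Require at least two of the three optional terms to match.
  static Query buildQuery() {
    BooleanQuery bq = new BooleanQuery();
    bq.add(new TermQuery(new Term("body", "lucene")), Occur.SHOULD);
    bq.add(new TermQuery(new Term("body", "facet")), Occur.SHOULD);
    bq.add(new TermQuery(new Term("body", "suggest")), Occur.SHOULD);
    bq.setMinimumNumberShouldMatch(2);
    return bq;
  }
}
{code}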


Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 27, 2014 at 10:16 PM, Tao Lin  wrote:
> Hello,
>
> My name is Tao Lin, a Chinese student from Beijing Normal University Zhuhai
> Campus. It's great to see that Han Jiang (also a Chinese student) has
> already contributed to Lucene in GSoC 2012 and 2013. Likewise, I'd like to
> participate in GSoC 2014, on the LUCENE-466 project [1] (Need QueryParser
> support for BooleanQuery.minNrShouldMatch). Is this lucene dev mailing list
> the place to discuss gsoc projects? Who will be the mentor(s) for this
> project? I see the Assignee of LUCENE-466 is Yonik Seeley. How can I get in
> touch with him? Is LUCENE-466 still available as a GSoC 2014 student
> project?
>
> For a brief self-introduction, I've successfully completed 2 open source
> GSoC projects:
> - In GSoC 2011, I worked for Languagetool [2] to develop a Lucene-based
> indexing tool that makes it possible to run proof-reading rules against a
> large amount of text.
> - In GSoC 2012, I added the RDFa metadata support for Apache ODF Toolkit
> [3].
>
> Yours,
> Tao Lin
>
> [1] https://issues.apache.org/jira/browse/LUCENE-466
>
> [2] http://www.languagetool.org/gsoc2011/
>
> [3] https://issues.apache.org/jira/browse/ODFTOOLKIT-50
>
>
>




[jira] [Comment Edited] (LUCENE-5469) Add small rounding to FuzzyQuery.floatToEdits

2014-02-28 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915700#comment-13915700
 ] 

Tim Allison edited comment on LUCENE-5469 at 2/28/14 11:56 AM:
---

[~rcmuir], would the next step to removing FuzzyQuery.floatToEdits from trunk 
be to deprecate fuzzyMinSim(s) in the queryparsers and add fuzzyMaxEdits?


was (Author: talli...@mitre.org):
[~rcmuir], would the next step to removing FuzzyQuery.floatToEdits from trunk 
be to deprecate fuzzyMinSim(s) in the queryparsers?

> Add small rounding to FuzzyQuery.floatToEdits
> -
>
> Key: LUCENE-5469
> URL: https://issues.apache.org/jira/browse/LUCENE-5469
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.0
>Reporter: Tim Allison
>Priority: Trivial
>  Labels: easyfix
> Attachments: LUCENE-5469.patch
>
>
> I realize that FuzzyQuery.floatToEdits is deprecated, but I'd like to make a 
> small fix for posterity.  Because of floating point issues, if a percentage 
> leads to a number that is very close to a whole number of edits, our cast to 
> int can improperly cause misses.
> d~0.8  will not match "X"
> e~0.8 will not match "" or "ee"
> This is a trivial part of the plan to reduce code duplication with 
> LUCENE-5205.
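
To make the floating point issue concrete, a tiny standalone illustration in plain Java (not the FuzzyQuery code itself): a similarity of 0.8 on a 5-character term should allow one edit, but the truncating cast yields zero; adding a small rounding fudge, in the spirit of the issue title, restores the expected value.

{code}
public final class FloatToEditsIllustration {
  public static void main(String[] args) {
    float minSimilarity = 0.8f;
    int termLen = 5;
    double raw = (1d - minSimilarity) * termLen;  // 0.9999999403953552, not 1.0
    int truncated = (int) raw;                    // 0 edits -> spurious misses
    int rounded = (int) (raw + 0.0001);           // one possible small rounding -> 1 edit
    System.out.println(raw + " " + truncated + " " + rounded);
  }
}
{code}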






[jira] [Commented] (LUCENE-5469) Add small rounding to FuzzyQuery.floatToEdits

2014-02-28 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915700#comment-13915700
 ] 

Tim Allison commented on LUCENE-5469:
-

[~rcmuir], would the next step to removing FuzzyQuery.floatToEdits from trunk 
be to deprecate fuzzyMinSim(s) in the queryparsers?

> Add small rounding to FuzzyQuery.floatToEdits
> -
>
> Key: LUCENE-5469
> URL: https://issues.apache.org/jira/browse/LUCENE-5469
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.0
>Reporter: Tim Allison
>Priority: Trivial
>  Labels: easyfix
> Attachments: LUCENE-5469.patch
>
>
> I realize that FuzzyQuery.floatToEdits is deprecated, but I'd like to make a 
> small fix for posterity.  Because of floating point issues, if a percentage 
> leads to a number that is very close to a whole number of edits, our cast to 
> int can improperly cause misses.
> d~0.8  will not match "X"
> e~0.8 will not match "" or "ee"
> This is a trivial part of the plan to reduce code duplication with 
> LUCENE-5205.






[jira] [Updated] (LUCENE-5477) add near-real-time suggest building to AnalyzingInfixSuggester

2014-02-28 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5477:
---

Attachment: LUCENE-5477.patch

New patch, adding random test, which seems to be passing ... I think it's 
ready; I'll commit soon.

> add near-real-time suggest building to AnalyzingInfixSuggester
> --
>
> Key: LUCENE-5477
> URL: https://issues.apache.org/jira/browse/LUCENE-5477
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Reporter: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5477.patch, LUCENE-5477.patch
>
>
> Because this suggester impl. is just a Lucene index under-the-hood, it should 
> be straightforward to enable near-real-time additions/removals of suggestions.






[jira] [Commented] (LUCENE-5476) Facet sampling

2014-02-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915729#comment-13915729
 ] 

Michael McCandless commented on LUCENE-5476:


This looks great!  

bq. To increase the speed it seems logical to me the FacetsCollector needs to 
return less hits.

*fewer* hits :)  (Sorry, pet peeve).

Maybe, you could add a utility method on SamplingFacetsCollector to "fixup" the 
FacetResult assuming it's a simple count (e.g., multiply the aggregate by the 
bin size)?  The old sampling code 
(https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_6/lucene/facet/src/java/org/apache/lucene/facet/sampling
 ) has something like this.

It might be good to allow passing the random seed, for repeatable results?

Another option, which would save the 2nd pass, would be to do the sampling 
during Docs.addDoc.

Also, instead of the bin-checking, you could just pull the next random double 
and check if it's < 1.0/binSize?
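
A sketch of that single-pass alternative (sampling as hits arrive, keeping each hit with probability 1/binSize, with one seeded Random for repeatability); the collect() shape below is illustrative, not the actual Collector API:

{code}
import java.util.BitSet;
import java.util.Random;

final class CollectTimeSamplingSketch {
  private final double keepProbability;  // 1.0 / binSize
  private final Random random;           // seeded once => repeatable results
  private final BitSet sampledHits;

  CollectTimeSamplingSketch(int binSize, long seed, int maxDoc) {
    this.keepProbability = 1.0 / binSize;
    this.random = new Random(seed);
    this.sampledHits = new BitSet(maxDoc);
  }

  // Called once per matching doc; keeps roughly one hit per binSize hits.
  void collect(int doc) {
    if (random.nextDouble() < keepProbability) {
      sampledHits.set(doc);
    }
  }

  BitSet getSampledHits() {
    return sampledHits;
  }
}
{code}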

> Facet sampling
> --
>
> Key: LUCENE-5476
> URL: https://issues.apache.org/jira/browse/LUCENE-5476
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Rob Audenaerde
> Attachments: SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared. 
> When trying to display facet counts on large datasets (>10M documents) 
> counting facets is rather expensive, as all the hits are collected and 
> processed. 
> Sampling greatly reduced this and thus provided a nice speedup. Could it be 
> brought back?






[jira] [Created] (LUCENE-5481) IndexWriter.forceMerge may run unneeded merges

2014-02-28 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-5481:


 Summary: IndexWriter.forceMerge may run unneeded merges
 Key: LUCENE-5481
 URL: https://issues.apache.org/jira/browse/LUCENE-5481
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.8


I was running some tests and was surprised that {{IndexWriter.forceMerge}} 
caused the index to be merged even when the index contains a single segment 
with no deletions.

This is due to {{MergePolicy.isMerged}} which always returns false with the 
default configuration although merge policies rely on it to know whether a 
single-segment index should be merged.






[jira] [Commented] (LUCENE-5481) IndexWriter.forceMerge may run unneeded merges

2014-02-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915740#comment-13915740
 ] 

Michael McCandless commented on LUCENE-5481:


That's just awful.

So if you .forceMerge(1) an existing index, and then turn around and call that 
again, the 2nd time still does the merge?

> IndexWriter.forceMerge may run unneeded merges
> --
>
> Key: LUCENE-5481
> URL: https://issues.apache.org/jira/browse/LUCENE-5481
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.8
>
> Attachments: LUCENE-5481.patch
>
>
> I was running some tests and was surprised that {{IndexWriter.forceMerge}} 
> caused the index to be merged even when the index contains a single segment 
> with no deletions.
> This is due to {{MergePolicy.isMerged}} which always returns false with the 
> default configuration although merge policies rely on it to know whether a 
> single-segment index should be merged.






[jira] [Commented] (SOLR-5793) SignatureUpdateProcessorFactoryTest routinely fails on J9

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915757#comment-13915757
 ] 

Robert Muir commented on SOLR-5793:
---

Hoss, are you sure the assume is working? this test tripped last night on J9:
{noformat}
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9515/
Java: 64bit/ibm-j9-jdk7 
-Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}

2 tests failed.
REGRESSION:  
org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testMultiThreaded

Error Message:
expected:<1> but was:<3>
{noformat}

What I do know works, and have used before is this:
{code}
Constants.JAVA_VENDOR.startsWith("IBM")
{code}
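
A minimal sketch of wiring that check into a test assume (JUnit's Assume plus Lucene's Constants; the class and method names around it are illustrative):

{code}
import org.apache.lucene.util.Constants;
import org.junit.Assume;
import org.junit.Before;

public class SomeJ9SensitiveTest {
  @Before
  public void skipOnJ9() {
    // Skips the test entirely when running on an IBM J9 JVM.
    Assume.assumeTrue(!Constants.JAVA_VENDOR.startsWith("IBM"));
  }
  // ... test methods ...
}
{code}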

> SignatureUpdateProcessorFactoryTest routinely fails on J9
> -
>
> Key: SOLR-5793
> URL: https://issues.apache.org/jira/browse/SOLR-5793
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> Two very similar looking failures pop up frequently, but not always 
> together...
> {noformat}
> REGRESSION:  
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testMultiThreaded
> Error Message:
> expected:<1> but was:<2>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
>   at 
> __randomizedtesting.SeedInfo.seed([791041A112471F1D:18859B41FA9615EB]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.checkNumDocs(SignatureUpdateProcessorFactoryTest.java:71)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testMultiThreaded(SignatureUpdateProcessorFactoryTest.java:222)
> {noformat}
> {noformat}
> REGRESSION:  
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testDupeDetection
> Error Message:
> expected:<1> but was:<2>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
>   at 
> __randomizedtesting.SeedInfo.seed([16A8922439B48E61:4D9869EC3AF32D1D]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.checkNumDocs(SignatureUpdateProcessorFactoryTest.java:71)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testDupeDetection(SignatureUpdateProcessorFactoryTest.java:119)
> {noformat}






[jira] [Updated] (LUCENE-5481) IndexWriter.forceMerge may run unneeded merges

2014-02-28 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5481:
-

Attachment: LUCENE-5481.patch

Here is an attempt to fix it. The interesting change is in 
{{MergePolicy.isMerged}}.

> IndexWriter.forceMerge may run unneeded merges
> --
>
> Key: LUCENE-5481
> URL: https://issues.apache.org/jira/browse/LUCENE-5481
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.8
>
> Attachments: LUCENE-5481.patch
>
>
> I was running some tests and was surprised that {{IndexWriter.forceMerge}} 
> caused the index to be merged even when the index contains a single segment 
> with no deletions.
> This is due to {{MergePolicy.isMerged}} which always returns false with the 
> default configuration although merge policies rely on it to know whether a 
> single-segment index should be merged.






[jira] [Commented] (LUCENE-5476) Facet sampling

2014-02-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915762#comment-13915762
 ] 

Shai Erera commented on LUCENE-5476:


This looks like a great start! I have a few comments/suggestions, based on the 
previous sampling impl:

* I think SamplingFC.createDocs should return a declared SampledDocs (see 
later) instead of an anonymous class

* Currently that SampledDocs.getDocIdSet() creates a new sample on every call. 
This is bad not only from a performance point of view, but more because if 
there are a few Facets* implementations that call it, they will compute weights 
on different samples
** Instead, I think that SampledDocs.getDocIdSet() should return the same 
sampled DIS, i.e. by caching it.
** Also, I think it would be good if it exposed a getSampledDocIdSet which 
takes some parameters, e.g. the sample size
** I think the original FBS should not be modified. E.g. in the previous 
sampling impl, you could compute the sampled top-facets but then return their 
exact counts by traversing the original matching docs and counting only them.
*** But maybe we should leave that for later, it could be a different SFC impl. 
But still, please don't compute the sample on every getDocIdSet() call.

* The old implementation let you specify different parameters such as sample 
size, minimum number of documents to evaluate, maximum number of documents to 
evaluate, etc. I think those are important since defining e.g. a 10% sample 
size could still cause you to process 10M docs, which is slow.

I like that this impl samples per-segment as it allows tuning the sample on a 
per-segment basis. E.g. small segments (as in NRT) probably don't need to be 
sampled at all. If we allow passing different parameters such as sampleRatio, 
min/maxSampleSize, we could tune sampling per-segment.

I agree with Mike that we need to allow passing a seed for repeatability (I 
assume it will be important when testing). Maybe wrap all the parameters in a 
SamplingConfig? We can keep the sugar ctors on SFC to take only e.g. 
sampleRatio.

Also, one thing that we did in the old impl is not use Math.random() or Random 
at all. Rather, some crazy formula was used to create faster samples. I'm not 
saying we need to do that now, but maybe drop a TODO. At any rate, I don't 
think we should use Math.random(), but rather init a Random once and call 
nextDouble(). This will also allow reusing a seed.
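
A rough sketch of what such a parameter holder could look like; the class name and fields are guesses at the shape being discussed, not an actual Lucene class:

{code}
import java.util.Random;

/** Hypothetical bundle of the sampling knobs discussed above. */
final class SamplingConfig {
  final double sampleRatio;   // e.g. 0.01 for a 1% sample
  final int minSampleSize;    // don't sample segments below this many docs
  final int maxSampleSize;    // cap the work per segment
  final long seed;            // fixed seed => repeatable samples

  SamplingConfig(double sampleRatio, int minSampleSize, int maxSampleSize, long seed) {
    this.sampleRatio = sampleRatio;
    this.minSampleSize = minSampleSize;
    this.maxSampleSize = maxSampleSize;
    this.seed = seed;
  }

  /** One Random per collector, seeded once, instead of Math.random(). */
  Random newRandom() {
    return new Random(seed);
  }
}
{code}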

> Facet sampling
> --
>
> Key: LUCENE-5476
> URL: https://issues.apache.org/jira/browse/LUCENE-5476
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Rob Audenaerde
> Attachments: SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared. 
> When trying to display facet counts on large datasets (>10M documents) 
> counting facets is rather expensive, as all the hits are collected and 
> processed. 
> Sampling greatly reduced this and thus provided a nice speedup. Could it be 
> brought back?






[jira] [Commented] (LUCENE-5481) IndexWriter.forceMerge may run unneeded merges

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915772#comment-13915772
 ] 

ASF subversion and git services commented on LUCENE-5481:
-

Commit 1572942 from [~jpountz] in branch 'dev/trunk'
[ https://svn.apache.org/r1572942 ]

LUCENE-5481: Don't run unnecessary merges in IndexWriter.forceMerge.

> IndexWriter.forceMerge may run unneeded merges
> --
>
> Key: LUCENE-5481
> URL: https://issues.apache.org/jira/browse/LUCENE-5481
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.8
>
> Attachments: LUCENE-5481.patch
>
>
> I was running some tests and was surprised that {{IndexWriter.forceMerge}} 
> caused the index to be merged even when the index contains a single segment 
> with no deletions.
> This is due to {{MergePolicy.isMerged}} which always returns false with the 
> default configuration although merge policies rely on it to know whether a 
> single-segment index should be merged.






[jira] [Commented] (LUCENE-5469) Add small rounding to FuzzyQuery.floatToEdits

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915765#comment-13915765
 ] 

Robert Muir commented on LUCENE-5469:
-

We deprecated this stuff already in 4.x, and issued warnings (e.g. in the 
queryparser syntax) that such syntax was deprecated in 4.x and will be removed 
in 5. We should nuke this stuff!

> Add small rounding to FuzzyQuery.floatToEdits
> -
>
> Key: LUCENE-5469
> URL: https://issues.apache.org/jira/browse/LUCENE-5469
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.0
>Reporter: Tim Allison
>Priority: Trivial
>  Labels: easyfix
> Attachments: LUCENE-5469.patch
>
>
> I realize that FuzzyQuery.floatToEdits is deprecated, but I'd like to make a 
> small fix for posterity.  Because of floating point issues, if a percentage 
> leads to a number that is very close to a whole number of edits, our cast to 
> int can improperly cause misses.
> d~0.8  will not match "X"
> e~0.8 will not match "" or "ee"
> This is a trivial part of the plan to reduce code duplication with 
> LUCENE-5205.






[jira] [Resolved] (LUCENE-5481) IndexWriter.forceMerge may run unneeded merges

2014-02-28 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-5481.
--

Resolution: Fixed

Committed. Thank you for the review, Mike!

> IndexWriter.forceMerge may run unneeded merges
> --
>
> Key: LUCENE-5481
> URL: https://issues.apache.org/jira/browse/LUCENE-5481
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.8
>
> Attachments: LUCENE-5481.patch
>
>
> I was running some tests and was surprised that {{IndexWriter.forceMerge}} 
> caused the index to be merged even when the index contains a single segment 
> with no deletions.
> This is due to {{MergePolicy.isMerged}} which always returns false with the 
> default configuration although merge policies rely on it to know whether a 
> single-segment index should be merged.






[jira] [Commented] (LUCENE-5481) IndexWriter.forceMerge may run unneeded merges

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915774#comment-13915774
 ] 

ASF subversion and git services commented on LUCENE-5481:
-

Commit 1572943 from [~jpountz] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1572943 ]

LUCENE-5481: Don't run unnecessary merges in IndexWriter.forceMerge.

> IndexWriter.forceMerge may run unneeded merges
> --
>
> Key: LUCENE-5481
> URL: https://issues.apache.org/jira/browse/LUCENE-5481
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.8
>
> Attachments: LUCENE-5481.patch
>
>
> I was running some tests and was surprised that {{IndexWriter.forceMerge}} 
> caused the index to be merged even when the index contains a single segment 
> with no deletions.
> This is due to {{MergePolicy.isMerged}} which always returns false with the 
> default configuration although merge policies rely on it to know whether a 
> single-segment index should be merged.






[jira] [Commented] (LUCENE-5476) Facet sampling

2014-02-28 Thread Rob Audenaerde (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915796#comment-13915796
 ] 

Rob Audenaerde commented on LUCENE-5476:


Thanks guys for the feedback (also on my language skills, I need to improve my 
English ;))

{quote}
It might be good to allow passing the random seed, for repeatable results?
{quote}
Yes! This is very sensible for testing and for more 'stable' screen results, and 
I will add this.

{quote}
Another option, which would save the 2nd pass, would be to do the sampling 
during Docs.addDoc.
{quote}
I considered sampling on the 'addDocument' but I figured it would be more 
expensive as then for each hit we need to do a random() calculation.

{quote}
I think SamplingFC.createDocs should return a declared SampledDocs (see later) 
instead of anonymous class
{quote}
I also considered this. It is far better for clarity's sake, but it also costs a 
copy of the original. I will try some approaches and will make sure the 
sampling is only done once. 

{quote}
I like that this impl samples per-segment as it allows to tune the sample on a 
per-segment basis. E.g. small segments (as in NRT) probably don't need to be 
sampled at all. If we allow passing different parameters such as sampleRatio, 
min/maxSampleSize, we could tune sampling per-segment.
{quote}
This was more or less by accident, but indeed seems useful. All segments need 
the same ratio of sampling though, else it would be really hard to correct the 
counts afterwards. (Or am I missing something here?)

{quote}
Maybe wrap all the parameters in a SamplingConfig?
{quote}
Yes. Very useful and makes it more stable.

{quote}
The old implementation let you specify different parameters such as sample 
size, minimum number of documents to evaluate, maximum number of documents to 
evaluate etc
{quote}

The old-style sampling indeed had a fixed sample size, which I found very 
useful. However, I have not yet found a way to implement this, as I do not know 
the total number of results when I start faceting, so I cannot determine the 
samplingRatio.  I could of course first count all the results, but that also 
impacts performance as I would need two passes. I will give it some more 
thought, but maybe you have an idea on how to accomplish this in a better way?
 

> Facet sampling
> --
>
> Key: LUCENE-5476
> URL: https://issues.apache.org/jira/browse/LUCENE-5476
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Rob Audenaerde
> Attachments: SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared. 
> When trying to display facet counts on large datasets (>10M documents) 
> counting facets is rather expensive, as all the hits are collected and 
> processed. 
> Sampling greatly reduced this and thus provided a nice speedup. Could it be 
> brought back?






[jira] [Commented] (SOLR-5183) Add block support for JSONLoader

2014-02-28 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915752#comment-13915752
 ] 

Varun Thacker commented on SOLR-5183:
-

Thanks Hoss for reviewing.
I have added a comment on the ref guide containing an example of adding nested 
documents in JSON - 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers?focusedCommentId=39621617#comment-39621617

> Add block support for JSONLoader
> 
>
> Key: SOLR-5183
> URL: https://issues.apache.org/jira/browse/SOLR-5183
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Varun Thacker
>Assignee: Hoss Man
> Fix For: 4.8, 5.0
>
> Attachments: SOLR-5183.patch, SOLR-5183.patch, SOLR-5183.patch, 
> SOLR-5183.patch, SOLR-5183.patch
>
>
> We should be able to index block documents in JSON format






[jira] [Commented] (LUCENE-5481) IndexWriter.forceMerge may run unneeded merges

2014-02-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915750#comment-13915750
 ] 

Michael McCandless commented on LUCENE-5481:


+1, patch looks good.  Thanks Adrien!

> IndexWriter.forceMerge may run unneeded merges
> --
>
> Key: LUCENE-5481
> URL: https://issues.apache.org/jira/browse/LUCENE-5481
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.8
>
> Attachments: LUCENE-5481.patch
>
>
> I was running some tests and was surprised that {{IndexWriter.forceMerge}} 
> caused the index to be merged even when the index contains a single segment 
> with no deletions.
> This is due to {{MergePolicy.isMerged}} which always returns false with the 
> default configuration although merge policies rely on it to know whether a 
> single-segment index should be merged.






[jira] [Commented] (SOLR-4787) Join Contrib

2014-02-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915768#comment-13915768
 ] 

Alexander S. commented on SOLR-4787:


Just tried 4.7.0 and it does not work either.

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.7
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
> <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib lib jars must be registered in the solrconfig.xml.
>  
> After issuing the "ant dist" command from inside the solr directory the joins 
> contrib jar will appear in the solr/dist directory. Place the 
> solr-joins-4.*-.jar  in the WEB-INF/lib directory of the solr we

[jira] [Commented] (LUCENE-5481) IndexWriter.forceMerge may run unneeded merges

2014-02-28 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915741#comment-13915741
 ] 

Adrien Grand commented on LUCENE-5481:
--

Yes, exactly.

> IndexWriter.forceMerge may run unneeded merges
> --
>
> Key: LUCENE-5481
> URL: https://issues.apache.org/jira/browse/LUCENE-5481
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.8
>
> Attachments: LUCENE-5481.patch
>
>
> I was running some tests and was surprised that {{IndexWriter.forceMerge}} 
> caused the index to be merged even when the index contains a single segment 
> with no deletions.
> This is due to {{MergePolicy.isMerged}} which always returns false with the 
> default configuration although merge policies rely on it to know whether a 
> single-segment index should be merged.






[jira] [Updated] (LUCENE-5476) Facet sampling

2014-02-28 Thread Rob Audenaerde (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Audenaerde updated LUCENE-5476:
---

Attachment: LUCENE-5476.patch

Here is a patch (against 4.7) that covers some of the feedback. 

> Facet sampling
> --
>
> Key: LUCENE-5476
> URL: https://issues.apache.org/jira/browse/LUCENE-5476
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Rob Audenaerde
> Attachments: LUCENE-5476.patch, SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared. 
> When trying to display facet counts on large datasets (>10M documents) 
> counting facets is rather expensive, as all the hits are collected and 
> processed. 
> Sampling greatly reduced this and thus provided a nice speedup. Could it be 
> brought back?






[jira] [Commented] (SOLR-5770) All attempts to match a SolrCore with its state in clusterstate.json should be done with the CoreNodeName.

2014-02-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915874#comment-13915874
 ] 

Mark Miller commented on SOLR-5770:
---

Thanks Steve - I'll look closer at this soon.

> All attempts to match a SolrCore with its state in clusterstate.json should 
> be done with the CoreNodeName.
> ---
>
> Key: SOLR-5770
> URL: https://issues.apache.org/jira/browse/SOLR-5770
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.7, 5.0
>
> Attachments: SOLR-5770.patch
>
>







[jira] [Updated] (SOLR-5770) All attempts to match a SolrCore with its state in clusterstate.json should be done with the NodeName rather than the baseUrl.

2014-02-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5770:
--

Summary: All attempts to match a SolrCore with its state in 
clusterstate.json should be done with the NodeName rather than the baseUrl.  
(was: All attempts to match a SolrCore with its state in clusterstate.json 
should be done with the CoreNodeName.)

> All attempts to match a SolrCore with its state in clusterstate.json should 
> be done with the NodeName rather than the baseUrl.
> ---
>
> Key: SOLR-5770
> URL: https://issues.apache.org/jira/browse/SOLR-5770
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.7, 5.0
>
> Attachments: SOLR-5770.patch
>
>







[jira] [Commented] (SOLR-5776) Look at speeding up using SSL with tests.

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915890#comment-13915890
 ] 

ASF subversion and git services commented on SOLR-5776:
---

Commit 1572974 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1572974 ]

SOLR-5776: Suppress SSL

> Look at speeding up using SSL with tests.
> -
>
> Key: SOLR-5776
> URL: https://issues.apache.org/jira/browse/SOLR-5776
> Project: Solr
>  Issue Type: Test
>Reporter: Mark Miller
>
> We have to disable SSL on a bunch of tests now because it appears to sometimes 
> be ridiculously slow - especially in slow envs (I never see timeouts on my 
> machine).
> I was talking to Robert about this, and he mentioned that there might be some 
> settings we could change to speed it up.






[jira] [Commented] (SOLR-5776) Look at speeding up using SSL with tests.

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915892#comment-13915892
 ] 

ASF subversion and git services commented on SOLR-5776:
---

Commit 1572976 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1572976 ]

SOLR-5776: Suppress SSL

> Look at speeding up using SSL with tests.
> -
>
> Key: SOLR-5776
> URL: https://issues.apache.org/jira/browse/SOLR-5776
> Project: Solr
>  Issue Type: Test
>Reporter: Mark Miller
>
> We have to disable SSL on a bunch of tests now because it appears to sometimes 
> be ridiculously slow - especially in slow envs (I never see timeouts on my 
> machine).
> I was talking to Robert about this, and he mentioned that there might be some 
> settings we could change to speed it up.






[jira] [Updated] (SOLR-5536) Add ValueSource collapse criteria to CollapsingQParsingPlugin

2014-02-28 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated SOLR-5536:
---

Attachment: SOLR-5536-edited.patch

I wasn't able to apply the latest patch because of some extraneous lines. 
I've uploaded a version that works for me. Also, it looks like the tests got 
dropped.

Thanks,
Peter

> Add ValueSource collapse criteria to CollapsingQParsingPlugin
> -
>
> Key: SOLR-5536
> URL: https://issues.apache.org/jira/browse/SOLR-5536
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.6
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.7, 5.0
>
> Attachments: SOLR-5536-edited.patch, SOLR-5536.patch, 
> SOLR-5536.patch, SOLR-5536.patch, SOLR-5536.patch
>
>
> It would be useful for the CollapsingQParserPlugin to support ValueSource 
> collapse criteria.
> Proposed syntax:
> {code}
> fq={!collapse field=collapse_field max=value_source}
> {code}
> This ticket will also introduce a function query called "cscore",  which will 
> return the score of the current document being collapsed. This will allow 
> score to be incorporated into collapse criteria functions.
> A simple example of the cscore usage:
> {code}
> fq={!collapse field=collapse_field max=sum(cscore(), field(x))}
> {code}
>  






[jira] [Commented] (SOLR-5536) Add ValueSource collapse criteria to CollapsingQParsingPlugin

2014-02-28 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915934#comment-13915934
 ] 

Peter Keegan commented on SOLR-5536:


Correction: there's a test in the patch, but I see these extra lines at the end 
of the patch file:

 //Test collapse by score with elevation
 
 params = new ModifiableSolrParams();

> Add ValueSource collapse criteria to CollapsingQParsingPlugin
> -
>
> Key: SOLR-5536
> URL: https://issues.apache.org/jira/browse/SOLR-5536
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.6
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.7, 5.0
>
> Attachments: SOLR-5536-edited.patch, SOLR-5536.patch, 
> SOLR-5536.patch, SOLR-5536.patch, SOLR-5536.patch
>
>
> It would be useful for the CollapsingQParserPlugin to support ValueSource 
> collapse criteria.
> Proposed syntax:
> {code}
> fq={!collapse field=collapse_field max=value_source}
> {code}
> This ticket will also introduce a function query called "cscore",  which will 
> return the score of the current document being collapsed. This will allow 
> score to be incorporated into collapse criteria functions.
> A simple example of the cscore usage:
> {code}
> fq={!collapse field=collapse_field max=sum(cscore(), field(x))}
> {code}
>  






[jira] [Commented] (LUCENE-5477) add near-real-time suggest building to AnalyzingInfixSuggester

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915961#comment-13915961
 ] 

ASF subversion and git services commented on LUCENE-5477:
-

Commit 1572992 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1572992 ]

LUCENE-5477: add near-real-time add/update/refresh to AnalyzingInfixSuggester

> add near-real-time suggest building to AnalyzingInfixSuggester
> --
>
> Key: LUCENE-5477
> URL: https://issues.apache.org/jira/browse/LUCENE-5477
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Reporter: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5477.patch, LUCENE-5477.patch
>
>
> Because this suggester impl. is just a Lucene index under-the-hood, it should 
> be straightforward to enable near-real-time additions/removals of suggestions.






[jira] [Resolved] (LUCENE-5477) add near-real-time suggest building to AnalyzingInfixSuggester

2014-02-28 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-5477.


Resolution: Fixed
  Assignee: Michael McCandless

> add near-real-time suggest building to AnalyzingInfixSuggester
> --
>
> Key: LUCENE-5477
> URL: https://issues.apache.org/jira/browse/LUCENE-5477
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5477.patch, LUCENE-5477.patch
>
>
> Because this suggester impl. is just a Lucene index under-the-hood, it should 
> be straightforward to enable near-real-time additions/removals of suggestions.






[jira] [Commented] (LUCENE-5477) add near-real-time suggest building to AnalyzingInfixSuggester

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915988#comment-13915988
 ] 

ASF subversion and git services commented on LUCENE-5477:
-

Commit 1572997 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1572997 ]

LUCENE-5477: add near-real-time add/update/refresh to AnalyzingInfixSuggester

> add near-real-time suggest building to AnalyzingInfixSuggester
> --
>
> Key: LUCENE-5477
> URL: https://issues.apache.org/jira/browse/LUCENE-5477
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Reporter: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5477.patch, LUCENE-5477.patch
>
>
> Because this suggester impl. is just a Lucene index under-the-hood, it should 
> be straightforward to enable near-real-time additions/removals of suggestions.






[jira] [Updated] (LUCENE-5472) Long terms should generate a RuntimeException, not just infoStream

2014-02-28 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-5472:
--

Attachment: LUCENE-5472.patch

Attached new patch -
1. Removed unused variable in DWPT
2. Added Solr Tests 

> Long terms should generate a RuntimeException, not just infoStream
> --
>
> Key: LUCENE-5472
> URL: https://issues.apache.org/jira/browse/LUCENE-5472
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Hoss Man
> Attachments: LUCENE-5472.patch, LUCENE-5472.patch, LUCENE-5472.patch
>
>
> As reported on the solr-user list, when a term is greater than 2^15 bytes it 
> is silently ignored at indexing time -- a message is logged to infoStream 
> if enabled, but no error is thrown.
> Seems like we should change this behavior (if nothing else starting in 5.0) 
> to throw an exception.
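
A small sketch of what hitting the limit looks like from application code; the writer setup is assumed, and with the proposed change addDocument would throw rather than drop the term silently:

{code}
import java.util.Arrays;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;

final class LongTermExample {
  // 'writer' is an already-configured IndexWriter.
  static void addHugeTerm(IndexWriter writer) throws Exception {
    char[] huge = new char[IndexWriter.MAX_TERM_LENGTH + 10];  // just over the limit
    Arrays.fill(huge, 'a');

    Document doc = new Document();
    doc.add(new StringField("id", new String(huge), Field.Store.NO));

    // Today the term is silently dropped (only infoStream mentions it);
    // after this change, addDocument would throw instead.
    writer.addDocument(doc);
  }
}
{code}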






[jira] [Commented] (LUCENE-5376) Add a demo search server

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916021#comment-13916021
 ] 

ASF subversion and git services commented on LUCENE-5376:
-

Commit 1573003 from [~mikemccand] in branch 'dev/branches/lucene5376'
[ https://svn.apache.org/r1573003 ]

LUCENE-5376: merge trunk

> Add a demo search server
> 
>
> Key: LUCENE-5376
> URL: https://issues.apache.org/jira/browse/LUCENE-5376
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: lucene-demo-server.tgz
>
>
> I think it'd be useful to have a "demo" search server for Lucene.
> Rather than being fully featured, like Solr, it would be minimal, just 
> wrapping the existing Lucene modules to show how you can make use of these 
> features in a server setting.
> The purpose is to demonstrate how one can build a minimal search server on 
> top of APIs like SearcherManager, SearcherLifetimeManager, etc.
> This is also useful for finding rough edges / issues in Lucene's APIs that 
> make building a server unnecessarily hard.
> I don't think it should have back compatibility promises (except Lucene's 
> index back compatibility), so it's free to improve as Lucene's APIs change.
> As a starting point, I'll post what I built for the "eating your own dog 
> food" search app for Lucene's & Solr's jira issues 
> http://jirasearch.mikemccandless.com (blog: 
> http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It 
> uses Netty to expose basic indexing & searching APIs via JSON, but it's very 
> rough (lots of nocommits).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5793) SignatureUpdateProcessorFactoryTest routinely fails on J9

2014-02-28 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916058#comment-13916058
 ] 

Hoss Man commented on SOLR-5793:


bq. Hoss, are you sure the assume is working? this test tripped last night on 
J9:

Clearly it is not working.

I got that assume pattern from what I saw in a bunch of other tests when I 
grepped for "J9" because I knew there were others out there avoiding J9 -- I'll 
switch now to using JAVA_VENDOR.startsWith("IBM"), roughly the pattern sketched below.
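
A minimal sketch of that pattern (the test class name is hypothetical; 
{{Constants.JAVA_VENDOR}} and {{LuceneTestCase.assumeFalse}} are the existing 
utilities):

{code}
import org.apache.lucene.util.Constants;
import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;

public class SomeJ9SensitiveTest extends SolrTestCaseJ4 {
  @BeforeClass
  public static void skipOnJ9() {
    // Skip the whole class on IBM J9, which is known to trip these tests.
    assumeFalse("test fails on J9", Constants.JAVA_VENDOR.startsWith("IBM"));
  }
}
{code}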



> SignatureUpdateProcessorFactoryTest routinely fails on J9
> -
>
> Key: SOLR-5793
> URL: https://issues.apache.org/jira/browse/SOLR-5793
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> Two very similar looking failures pop up frequently, but not always 
> together...
> {noformat}
> REGRESSION:  
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testMultiThreaded
> Error Message:
> expected:<1> but was:<2>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
>   at 
> __randomizedtesting.SeedInfo.seed([791041A112471F1D:18859B41FA9615EB]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.checkNumDocs(SignatureUpdateProcessorFactoryTest.java:71)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testMultiThreaded(SignatureUpdateProcessorFactoryTest.java:222)
> {noformat}
> {noformat}
> REGRESSION:  
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testDupeDetection
> Error Message:
> expected:<1> but was:<2>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
>   at 
> __randomizedtesting.SeedInfo.seed([16A8922439B48E61:4D9869EC3AF32D1D]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.checkNumDocs(SignatureUpdateProcessorFactoryTest.java:71)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testDupeDetection(SignatureUpdateProcessorFactoryTest.java:119)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1116: POMs out of sync

2014-02-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1116/

1 tests failed.
REGRESSION:  
org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.testDistribSearch

Error Message:
There were too many update fails - we expect it can happen, but shouldn't easily

Stack Trace:
java.lang.AssertionError: There were too many update fails - we expect it can 
happen, but shouldn't easily
at 
__randomizedtesting.SeedInfo.seed([9739675D9B83AA7F:16DFE945ECDCCA43]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertFalse(Assert.java:68)
at 
org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.doTest(ChaosMonkeyNothingIsSafeTest.java:210)




Build Log:
[...truncated 53174 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:488: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:176: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/extra-targets.xml:77:
 Java returned: 1

Total time: 143 minutes 41 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5793) SignatureUpdateProcessorFactoryTest routinely fails on J9

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916068#comment-13916068
 ] 

ASF subversion and git services commented on SOLR-5793:
---

Commit 1573019 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1573019 ]

SOLR-5793, SOLR-5792, SOLR-5791: switch to using Constants.JAVA_VENDOR in assume

> SignatureUpdateProcessorFactoryTest routinely fails on J9
> -
>
> Key: SOLR-5793
> URL: https://issues.apache.org/jira/browse/SOLR-5793
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> Two very similar looking failures pop up frequently, but not always 
> together...
> {noformat}
> REGRESSION:  
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testMultiThreaded
> Error Message:
> expected:<1> but was:<2>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
>   at 
> __randomizedtesting.SeedInfo.seed([791041A112471F1D:18859B41FA9615EB]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.checkNumDocs(SignatureUpdateProcessorFactoryTest.java:71)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testMultiThreaded(SignatureUpdateProcessorFactoryTest.java:222)
> {noformat}
> {noformat}
> REGRESSION:  
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testDupeDetection
> Error Message:
> expected:<1> but was:<2>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
>   at 
> __randomizedtesting.SeedInfo.seed([16A8922439B48E61:4D9869EC3AF32D1D]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.checkNumDocs(SignatureUpdateProcessorFactoryTest.java:71)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testDupeDetection(SignatureUpdateProcessorFactoryTest.java:119)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5792) TermVectorComponentDistributedTest routinely fails on J9

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916069#comment-13916069
 ] 

ASF subversion and git services commented on SOLR-5792:
---

Commit 1573019 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1573019 ]

SOLR-5793, SOLR-5792, SOLR-5791: switch to using Constants.JAVA_VENDOR in assume

> TermVectorComponentDistributedTest routinely fails on J9
> 
>
> Key: SOLR-5792
> URL: https://issues.apache.org/jira/browse/SOLR-5792
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> Perhaps the code is using a Map when it should be using a NamedList? Or 
> perhaps the test should be configured not to care about the order... Is the 
> order meaningful in this part of the output?
> {noformat}
> REGRESSION:  
> org.apache.solr.handler.component.TermVectorComponentDistributedTest.testDistribSearch
> Error Message:
> .termVectors.0.test_basictv!=test_postv (unordered or missing)
> Stack Trace:
> junit.framework.AssertionFailedError: .termVectors.0.test_basictv!=test_postv 
> (unordered or missing)
> at 
> __randomizedtesting.SeedInfo.seed([C6763A182C2489BA:4790B4005B7BE986]:0)
> at junit.framework.Assert.fail(Assert.java:50)
> at
> org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:843)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:862)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:565)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:545)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:524)
> at
> org.apache.solr.handler.component.TermVectorComponentDistributedTest.doTest(TermVectorComponentDistributedTest.java:
> 164)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:876)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5791) DistributedQueryElevationComponentTest routinely fails on J9

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916070#comment-13916070
 ] 

ASF subversion and git services commented on SOLR-5791:
---

Commit 1573019 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1573019 ]

SOLR-5793, SOLR-5792, SOLR-5791: switch to using Constants.JAVA_VENDOR in assume

> DistributedQueryElevationComponentTest routinely fails on J9
> 
>
> Key: SOLR-5791
> URL: https://issues.apache.org/jira/browse/SOLR-5791
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> Either there is a bug in how the params are handled that only manifests 
> itself in J9, or the test needs to be fixed not to expect the params in a 
> certain order.
> {noformat}
> REGRESSION:  
> org.apache.solr.handler.component.DistributedQueryElevationComponentTest.testDistribSearch
> Error Message:
> .responseHeader.params.fl!=version (unordered or missing)
> Stack Trace:
> junit.framework.AssertionFailedError: .responseHeader.params.fl!=version 
> (unordered or missing)
> at 
> __randomizedtesting.SeedInfo.seed([C6763A182C2489BA:4790B4005B7BE986]:0)
> at junit.framework.Assert.fail(Assert.java:50)
> at
> org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:843)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:862)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:565)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:545)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:524)
> at
> org.apache.solr.handler.component.DistributedQueryElevationComponentTest.doTest(DistributedQueryElevationComponentTe
> st.java:81)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:870)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5792) TermVectorComponentDistributedTest routinely fails on J9

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916077#comment-13916077
 ] 

ASF subversion and git services commented on SOLR-5792:
---

Commit 1573020 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1573020 ]

SOLR-5793, SOLR-5792, SOLR-5791: switch to using Constants.JAVA_VENDOR in 
assume (merge r1573019)

> TermVectorComponentDistributedTest routinely fails on J9
> 
>
> Key: SOLR-5792
> URL: https://issues.apache.org/jira/browse/SOLR-5792
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> Perhaps the code is using a Map when it should be using a NamedList? Or 
> perhaps the test should be configured not to care about the order... Is the 
> order meaningful in this part of the output?
> {noformat}
> REGRESSION:  
> org.apache.solr.handler.component.TermVectorComponentDistributedTest.testDistribSearch
> Error Message:
> .termVectors.0.test_basictv!=test_postv (unordered or missing)
> Stack Trace:
> junit.framework.AssertionFailedError: .termVectors.0.test_basictv!=test_postv 
> (unordered or missing)
> at 
> __randomizedtesting.SeedInfo.seed([C6763A182C2489BA:4790B4005B7BE986]:0)
> at junit.framework.Assert.fail(Assert.java:50)
> at
> org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:843)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:862)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:565)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:545)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:524)
> at
> org.apache.solr.handler.component.TermVectorComponentDistributedTest.doTest(TermVectorComponentDistributedTest.java:
> 164)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:876)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5793) SignatureUpdateProcessorFactoryTest routinely fails on J9

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916076#comment-13916076
 ] 

ASF subversion and git services commented on SOLR-5793:
---

Commit 1573020 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1573020 ]

SOLR-5793, SOLR-5792, SOLR-5791: switch to using Constants.JAVA_VENDOR in 
assume (merge r1573019)

> SignatureUpdateProcessorFactoryTest routinely fails on J9
> -
>
> Key: SOLR-5793
> URL: https://issues.apache.org/jira/browse/SOLR-5793
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> Two very similar looking failures pop up frequently, but not always 
> together...
> {noformat}
> REGRESSION:  
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testMultiThreaded
> Error Message:
> expected:<1> but was:<2>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
>   at 
> __randomizedtesting.SeedInfo.seed([791041A112471F1D:18859B41FA9615EB]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.checkNumDocs(SignatureUpdateProcessorFactoryTest.java:71)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testMultiThreaded(SignatureUpdateProcessorFactoryTest.java:222)
> {noformat}
> {noformat}
> REGRESSION:  
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testDupeDetection
> Error Message:
> expected:<1> but was:<2>
> Stack Trace:
> java.lang.AssertionError: expected:<1> but was:<2>
>   at 
> __randomizedtesting.SeedInfo.seed([16A8922439B48E61:4D9869EC3AF32D1D]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.failNotEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:128)
>   at org.junit.Assert.assertEquals(Assert.java:472)
>   at org.junit.Assert.assertEquals(Assert.java:456)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.checkNumDocs(SignatureUpdateProcessorFactoryTest.java:71)
>   at 
> org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest.testDupeDetection(SignatureUpdateProcessorFactoryTest.java:119)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5791) DistributedQueryElevationComponentTest routinely fails on J9

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916078#comment-13916078
 ] 

ASF subversion and git services commented on SOLR-5791:
---

Commit 1573020 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1573020 ]

SOLR-5793, SOLR-5792, SOLR-5791: switch to using Constants.JAVA_VENDOR in 
assume (merge r1573019)

> DistributedQueryElevationComponentTest routinely fails on J9
> 
>
> Key: SOLR-5791
> URL: https://issues.apache.org/jira/browse/SOLR-5791
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> Either there is a bug in how the params are handled that only manifests 
> itself in J9, or the test needs to be fixed not to expect the params in a 
> certain order.
> {noformat}
> REGRESSION:  
> org.apache.solr.handler.component.DistributedQueryElevationComponentTest.testDistribSearch
> Error Message:
> .responseHeader.params.fl!=version (unordered or missing)
> Stack Trace:
> junit.framework.AssertionFailedError: .responseHeader.params.fl!=version 
> (unordered or missing)
> at 
> __randomizedtesting.SeedInfo.seed([C6763A182C2489BA:4790B4005B7BE986]:0)
> at junit.framework.Assert.fail(Assert.java:50)
> at
> org.apache.solr.BaseDistributedSearchTestCase.compareSolrResponses(BaseDistributedSearchTestCase.java:843)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.compareResponses(BaseDistributedSearchTestCase.java:862)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:565)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:545)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.query(BaseDistributedSearchTestCase.java:524)
> at
> org.apache.solr.handler.component.DistributedQueryElevationComponentTest.doTest(DistributedQueryElevationComponentTe
> st.java:81)
> at 
> org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:870)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5536) Add ValueSource collapse criteria to CollapsingQParsingPlugin

2014-02-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916081#comment-13916081
 ] 

Joel Bernstein commented on SOLR-5536:
--

Hi Peter,

There were a couple of jiras for the CollapsingQParserPlugin being worked on 
while this ticket was being developed, so the patch here likely built upon 
changes committed on other tickets. It's best to look at the commits that were 
done in svn for the CollapsingQParserPlugin and merge them, in order, into 
your branch.

Joel

> Add ValueSource collapse criteria to CollapsingQParsingPlugin
> -
>
> Key: SOLR-5536
> URL: https://issues.apache.org/jira/browse/SOLR-5536
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.6
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.7, 5.0
>
> Attachments: SOLR-5536-edited.patch, SOLR-5536.patch, 
> SOLR-5536.patch, SOLR-5536.patch, SOLR-5536.patch
>
>
> It would be useful for the CollapsingQParserPlugin to support ValueSource 
> collapse criteria.
> Proposed syntax:
> {code}
> fq={!collapse field=collapse_field max=value_source}
> {code}
> This ticket will also introduce a function query called "cscore",  which will 
> return the score of the current document being collapsed. This will allow 
> score to be incorporated into collapse criteria functions.
> A simple example of the cscore usage:
> {code}
> fq={!collapse field=collapse_field max=sum(cscore(), field(x))}
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2014-02-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916095#comment-13916095
 ] 

Joel Bernstein commented on SOLR-4787:
--

Hi Alexander,

This ticket has not been committed. There are two joins described on the list 
of QParserPlugins here:

https://cwiki.apache.org/confluence/display/solr/Other+Parser

Joel

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.7
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will set up a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
> <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib lib jars must be registered in the solrconfig.xml.
>  
> After issuing the "ant dist" command from inside the solr directory t

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2014-02-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916095#comment-13916095
 ] 

Joel Bernstein edited comment on SOLR-4787 at 2/28/14 5:56 PM:
---

Hi Alexander,

This ticket has not been committed. There are two joins described on the list 
of QParserPlugins here:

https://cwiki.apache.org/confluence/display/solr/Other+Parsers

Joel


was (Author: joel.bernstein):
Hi Alexander,

This ticket has not been committed. There are two joins described on the list 
of QParserPlugins here:

https://cwiki.apache.org/confluence/display/solr/Other+Parser

Joel

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.7
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will set up a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in th

[jira] [Commented] (SOLR-5536) Add ValueSource collapse criteria to CollapsingQParsingPlugin

2014-02-28 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916103#comment-13916103
 ] 

Peter Keegan commented on SOLR-5536:


Hi Joel,

The patch worked fine with 4.6.1 after I removed the 'bad' lines (the Eclipse 
patch tool is rather fussy):

$ diff SOLR-5536.patch  SOLR-5536-edited.patch
324,331d323
< \ No newline at end of file
<
< Property changes on: 
solr/core/src/java/org/apache/solr/search/function/CollapseScoreFunction.java
< ___
< Added: svn:eol-style
< ## -0,0 +1 ##
< +native
< \ No newline at end of property
391,393c383
<  //Test collapse by score with elevation
<
<  params = new ModifiableSolrParams();

Thanks,
Peter


> Add ValueSource collapse criteria to CollapsingQParsingPlugin
> -
>
> Key: SOLR-5536
> URL: https://issues.apache.org/jira/browse/SOLR-5536
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.6
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.7, 5.0
>
> Attachments: SOLR-5536-edited.patch, SOLR-5536.patch, 
> SOLR-5536.patch, SOLR-5536.patch, SOLR-5536.patch
>
>
> It would be useful for the CollapsingQParserPlugin to support ValueSource 
> collapse criteria.
> Proposed syntax:
> {code}
> fq={!collapse field=collapse_field max=value_source}
> {code}
> This ticket will also introduce a function query called "cscore",  which will 
> return the score of the current document being collapsed. This will allow 
> score to be incorporated into collapse criteria functions.
> A simple example of the cscore usage:
> {code}
> fq={!collapse field=collapse_field max=sum(cscore(), field(x))}
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (LUCENE-5469) Add small rounding to FuzzyQuery.floatToEdits

2014-02-28 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison closed LUCENE-5469.
---

Resolution: Not A Problem

Will open another issue on nuking fuzzyminsim.

> Add small rounding to FuzzyQuery.floatToEdits
> -
>
> Key: LUCENE-5469
> URL: https://issues.apache.org/jira/browse/LUCENE-5469
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.0
>Reporter: Tim Allison
>Priority: Trivial
>  Labels: easyfix
> Attachments: LUCENE-5469.patch
>
>
> I realize that FuzzyQuery.floatToEdits is deprecated, but I'd like to make a 
> small fix for posterity.  Because of floating point issues, if a percentage 
> leads to a number that is very close to a whole number of edits, our cast to 
> int can improperly cause misses.
> d~0.8  will not match "X"
> e~0.8 will not match "" or "ee"
> This is a trivial part of the plan to reduce code duplication with 
> LUCENE-5205.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5795) Option to periodically delete docs based on an expiration field -- or ttl specified when indexed.

2014-02-28 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5795:
--

 Summary: Option to periodically delete docs based on an expiration 
field -- or ttl specified when indexed.
 Key: SOLR-5795
 URL: https://issues.apache.org/jira/browse/SOLR-5795
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man


A question I get periodically from people is how to automatically remove 
documents from a collection at a certain time (or after a certain amount of 
time).  

Excluding from search results using a filter query on a date field is trivial, 
but you still have to periodically send a deleteByQuery to clean up those older 
"expired" documents.  And in the case where you want all documents to 
auto-expire some fixed amount of time after they were indexed, you still have to 
set up a simple UpdateProcessor to set that expiration date.  So I've been 
thinking it would be nice if there was a simple way to configure Solr to do it 
all for you.





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5795) Option to periodically delete docs based on an expiration field -- or ttl specified when indexed.

2014-02-28 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916126#comment-13916126
 ] 

Hoss Man commented on SOLR-5795:


The one hitch with this idea -- which is already a problem if you do the same 
logic from an external client -- is that as things stand today, if you do a 
lot of periodic {{deleteByQuery}} commands with auto-commit, every one will 
cause a new searcher to be opened, even if nothing was actually deleted -- but 
it looks like we can fix that independently in SOLR-5783.

I'm going to tackle the design I laid out here once I get SOLR-5783 in shape 
with enough tests that I'm comfortable committing.

> Option to periodically delete docs based on an expiration field -- or ttl 
> specified when indexed.
> -
>
> Key: SOLR-5795
> URL: https://issues.apache.org/jira/browse/SOLR-5795
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
>Assignee: Hoss Man
>
> A question I get periodically from people is how to automatically remove 
> documents from a collection at a certain time (or after a certain amount of 
> time).  
> Excluding from search results using a filter query on a date field is 
> trivial, but you still have to periodically send a deleteByQuery to clean up 
> those older "expired" documents.  And in the case where you want all 
> documents to auto-expire some fixed amount of time after they were indexed, 
> you still have to set up a simple UpdateProcessor to set that expiration date.  
> So I've been thinking it would be nice if there was a simple way to configure 
> Solr to do it all for you.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5795) Option to periodically delete docs based on an expiration field -- or ttl specified when indexed.

2014-02-28 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916125#comment-13916125
 ] 

Hoss Man commented on SOLR-5795:


Here's the basic design I've been fleshing out in my head...

* A new "{{ExpireDocsUpdateProcessorFactory}}"
** can compute an {{expiration}} field to add to indexed docs based on a 
"{{ttl}}" field in the input doc
*** perhaps it could also fall back to a {{ttl}} update request param when bulk 
adding, similar to {{\_version\_}}?
*** {{IgnoreFieldUpdateProcessorFactory}} could be used to remove the {{ttl}} 
if they don't want a record in the index of when/why {{expiration_date}} was 
computed
** Can trigger periodic {{deleteByQuery}} on {{expiration}} time field
* rough idea for configuration...{code}
<processor class="solr.ExpireDocsUpdateProcessorFactory">
  <str name="expiration.fieldName">expire_at</str>
  <int name="deleteIntervalInSeconds">300</int>
  <str name="ttl.fieldName">ttl</str>
</processor>
{code}
* {{ExpireDocsUpdateProcessorFactory.init()}} logic:
** if {{ttl.fieldName}} is specified make a note of it
** validate {{expiration.fieldName}} is set & exists in schema
*** perhaps in managed schema mode create automatically if it doesn't?
** if {{deleteIntervalInSeconds}} is set:
*** spin up a {{ScheduledThreadPoolExecutor}} with a recurring 
{{AutoExpireDocsCallable}}
*** add a core shutdown hook to shut down the executor when the core shuts down
* {{ExpireDocsUpdateProcessor.processAdd()}} logic:
** if {{ttl.fieldName}} is configured & doc contains that field name:
*** treat the value as date math from NOW and put the computed value in 
{{expiration.fieldName}} (a rough sketch of this step follows the use-case list 
below)
** else: No-Op
* {{AutoExpireDocsCallable}} logic:
** if cloud mode, return No-Op unless we are running on the overseer
** Create a {{DeleteUpdateCommand}} using {{deleteByQuery}} of {{\[* TO NOW\]}} 
using the {{expiration.fieldName}}
*** this can be fired directly against the {{UpdateRequestProcessor}} returned 
by the {{ExpireDocsUpdateProcessorFactory}} itself using a 
{{LocalSolrQueryRequest}}
**** Or perhaps we make an optional configuration so you can specify any chain 
name and we fetch it from the SolrCore?
*** the existing distributed delete logic should ensure it gets distributed 
cleanly in cloud mode
*** NOTE: the executor should run on every node, and only do the overseer check 
when the executor fires, so even when the overseer changes over time, whoever 
the current overseer is when the interval elapses will fire the delete.

This, combined with things like {{DefaultValueUpdateProcessorFactory}}, 
{{IgnoreFieldUpdateProcessorFactory}} and 
{{FirstFieldValueUpdateProcessorFactory}} on the {{ttl.fieldName}} and/or 
{{expiration.fieldName}}, should allow all sorts of use cases:

* every doc expires after X amount of time no matter what the client says
* every doc defaults to a ttl of X unless it has an explicit per-doc ttl
* every doc defaults to a ttl of X unless it has an explicit per-doc expire date
* docs can optionally expire after a ttl specified when they were indexed
* docs can optionally expire at an explicit time specified when they were indexed
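
To make the {{processAdd()}} step above a bit more concrete, here is a rough, 
hypothetical sketch only: the field names are hard-coded from the example 
config, and the ttl is treated as a plain number of seconds rather than the full 
date math described in the design.

{code}
import java.io.IOException;
import java.util.Date;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Hypothetical sketch only: not the eventual implementation.
public class ExpireDocsSketchProcessor extends UpdateRequestProcessor {

  public ExpireDocsSketchProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object ttl = doc.getFieldValue("ttl");
    if (ttl != null) {
      long seconds = Long.parseLong(ttl.toString());
      // Compute NOW + ttl and store it in the expiration field.
      doc.setField("expire_at", new Date(System.currentTimeMillis() + seconds * 1000L));
    }
    super.processAdd(cmd);
  }
}
{code}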


> Option to periodically delete docs based on an expiration field -- or ttl 
> specified when indexed.
> -
>
> Key: SOLR-5795
> URL: https://issues.apache.org/jira/browse/SOLR-5795
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
>Assignee: Hoss Man
>
> A question I get periodically from people is how to automatically remove 
> documents from a collection at a certain time (or after a certain amount of 
> time).  
> Excluding from search results using a filter query on a date field is 
> trivial, but you still have to periodically send a deleteByQuery to clean up 
> those older "expired" documents.  And in the case where you want all 
> documents to auto-expire some fixed amount of time after they were indexed, 
> you still have to set up a simple UpdateProcessor to set that expiration date.  
> So I've been thinking it would be nice if there was a simple way to configure 
> Solr to do it all for you.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5783) Can we stop opening a new searcher when the index hasn't changed?

2014-02-28 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916133#comment-13916133
 ] 

Hoss Man commented on SOLR-5783:


bq. LGTM.

Thanks, Mark.

bq.  looks like you meant to use newName

Nice catch.

bq. I had to fix a test...

Your fix jibes with what I was referring to -- although I believe we can make 
that test more assertive: the reader refCounts should not only be the same, the 
searchers themselves should be identical.

I'll work on adding some more tests to increase my confidence.
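
Roughly, the stronger assertion would look like this (a hypothetical test 
fragment, assuming the usual {{SolrTestCaseJ4}} harness):

{code}
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

// Hypothetical fragment, inside a SolrTestCaseJ4 subclass.
RefCounted<SolrIndexSearcher> before = h.getCore().getSearcher();
try {
  assertU(commit());   // index unchanged, so no new searcher should be opened
  RefCounted<SolrIndexSearcher> after = h.getCore().getSearcher();
  try {
    // Not just equal refCounts: the very same searcher instance should come back.
    assertSame(before.get(), after.get());
  } finally {
    after.decref();
  }
} finally {
  before.decref();
}
{code}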

> Can we stop opening a new searcher when the index hasn't changed?
> -
>
> Key: SOLR-5783
> URL: https://issues.apache.org/jira/browse/SOLR-5783
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
> Attachments: SOLR-5783.patch, SOLR-5783.patch
>
>
> I've been thinking recently about how/when we re-open searchers -- and what 
> the overhead of that is in terms of caches and what not -- even if the 
> underlying index hasn't changed.  
> The particular real world case that got me thinking about this recently is 
> when a deleteByQuery gets forwarded to all shards in a collection, and then 
> the subsequent (soft)Commit (either auto or explicit) opens a new searcher -- 
> even if that shard was completely unaffected by the delete.
> It got me wondering: why don't we re-use the same searcher when the index is 
> unchanged?
> From what I can tell, we're basically 99% of the way there (in 
> {{}})...
> * IndexWriter.commit is already smart enough to short circut if there's 
> nothing to commit
> * SolrCore.openNewSearcher already uses DirectoryReader.openIfChanged to see 
> if the reader can be re-used.
> * for "realtime" purposes, SolrCore.openNewSearcher will return the existing 
> searcher if it exists and the DirectoryReader hasn't changed
> ...The only reason I could think of for not _always_ re-using the same 
> searcher when the underlying DirectoryReader is identical (ie: that last 
> bullet above) is in the situation where the "live" schema has changed -- but 
> that seems pretty trivial to account for.
> Is there any other reason why this wouldn't be a good idea for improving 
> performance?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5480) Hunspell shouldnt merge dictionary entries

2014-02-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5480:


Attachment: LUCENE-5480.patch

Here is my current state. I've unraveled a few bugs with these cool little 
tests (the examples from the man page). I'll see how far I can get, but I wanted 
to snapshot here since it's progress...

> Hunspell shouldnt merge dictionary entries
> --
>
> Key: LUCENE-5480
> URL: https://issues.apache.org/jira/browse/LUCENE-5480
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Attachments: LUCENE-5480.patch
>
>
> I've been writing lots of little unit tests for this thing, and I'm pretty 
> positive I screwed this up in LUCENE-5468... sorry
> Otherwise the whole "prefix-suffix dependencies" described in the manpage 
> won't work.
> Either 'words' should be changed from FST to FST, or when 
> there are duplicates we should add 'padding' that we just consume 
> (suggester-style). The latter is a little tricky, but I think this is 
> generally uncommon so it would keep the FST smaller.
> Shouldn't be hard to fix.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916137#comment-13916137
 ] 

Ahmet Arslan commented on LUCENE-5482:
--

This is similar to ClassicFilter, which removes 's from the end of words. But 
ClassicFilter is useful for the English language only and has nothing to do with 
Turkish, because it only removes 's and 'S. In Turkish, different character 
sequences may come after an apostrophe, e.g. 'nin, 'a, 'nin, 'ü etc.

In Turkish, an apostrophe is used to separate suffixes from proper names 
(continent, sea, river, lake, mountain, upland, and proper names related to 
religion and mythology). For example, Van Gölü’ne (meaning: to Lake Van).

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created LUCENE-5482:


 Summary: improve default TurkishAnalyzer
 Key: LUCENE-5482
 URL: https://issues.apache.org/jira/browse/LUCENE-5482
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.7
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 4.8


Add a TokenFilter that strips characters after an apostrophe (including the 
apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5422) Postings lists deduplication

2014-02-28 Thread Vishmi Money (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916140#comment-13916140
 ] 

Vishmi Money commented on LUCENE-5422:
--

Hi,
I am Vishmi Money and I am a third-year undergraduate at the Department of 
Computer Science and Engineering, University of Moratuwa, Sri Lanka.

I am familiar with Lucene, as I have read and learnt about it for a project in 
which I tried to implement Global Search for Moodle. But then I found out that 
Lucene was a dead end for that, as Moodle is a PHP implementation.

After going through the discussion you provided, I am very interested in working 
on this project for GSoC 2014, because I am also very interested in the Data 
Structures and Algorithms area.

Can you explain further the relationship of LUCENE-2082 to LUCENE-5422, so that 
I can start work on this project?



> Postings lists deduplication
> 
>
> Key: LUCENE-5422
> URL: https://issues.apache.org/jira/browse/LUCENE-5422
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/index
>Reporter: Dmitry Kan
>  Labels: gsoc2014
>
> The context:
> http://markmail.org/thread/tywtrjjcfdbzww6f
> Robert Muir and I have discussed what Robert eventually named "postings
> lists deduplication" at Berlin Buzzwords 2013 conference.
> The idea is to allow multiple terms to point to the same postings list to
> save space. This can be achieved by new index codec implementation, but this 
> jira is open to other ideas as well.
> The application / impact of this is positive for synonyms, exact / inexact
> terms, leading wildcard support via storing reversed term etc.
> For example, at the moment, when supporting exact (unstemmed) and inexact 
> (stemmed)
> searches, we store both unstemmed and stemmed variant of a word form and
> that leads to index bloating. That is why we had to remove the leading
> wildcard support via reversing a token on index and query time because of
> the same index size considerations.
> Comment from Mike McCandless:
> Neat idea!
> Would this idea allow a single term to point to (the union of) N other
> posting lists?  It seems like that's necessary e.g. to handle the
> exact/inexact case.
> And then, to produce the Docs/AndPositionsEnum you'd need to do the
> merge sort across those N posting lists?
> Such a thing might also be do-able as runtime only wrapper around the
> postings API (FieldsProducer), if you could at runtime do the reverse
> expansion (e.g. stem -> all of its surface forms).
> Comment from Robert Muir:
> I think the exact/inexact is trickier (detecting it would be the hard
> part), and you are right, another solution might work better.
> but for the reverse wildcard and synonyms situation, it seems we could even
> detect it on write if we created some hash of the previous term's postings.
> if the hash matches for the current term, we know it might be a "duplicate"
> and would have to actually do the costly check they are the same.
> maybe there are better ways to do it, but it might be a fun postingformat
> experiment to try.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916149#comment-13916149
 ] 

Robert Muir commented on LUCENE-5482:
-

+1, I saw your paper (very nice) on this and think it would be a great addition 
to Lucene!

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5480) Hunspell shouldnt merge dictionary entries

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916175#comment-13916175
 ] 

Robert Muir commented on LUCENE-5480:
-

I think the current bug is a longstanding one, because prefix and suffix 
stripping are not intertwined (so continuation classes from prefixes don't apply 
to suffixes, and so on).

This causes overstemming today.

I'd like to fix the current bug(s) here with the uploaded patch and open a 
follow-up issue for that... it's progress.

> Hunspell shouldnt merge dictionary entries
> --
>
> Key: LUCENE-5480
> URL: https://issues.apache.org/jira/browse/LUCENE-5480
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Attachments: LUCENE-5480.patch
>
>
> I've been writing lots of little unit tests for this thing, and I'm pretty 
> positive I screwed this up in LUCENE-5468... sorry
> Otherwise the whole "prefix-suffix dependencies" described in the manpage 
> won't work.
> Either 'words' should be changed from FST to FST, or when 
> there are duplicates we should add 'padding' that we just consume 
> (suggester-style). The latter is a little tricky, but I think this is 
> generally uncommon so it would keep the FST smaller.
> Shouldn't be hard to fix.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5795) Option to periodically delete docs based on an expiration field -- or ttl specified when indexed.

2014-02-28 Thread Steven Bower (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916185#comment-13916185
 ] 

Steven Bower commented on SOLR-5795:


The idea makes sense, but this seems to me like a horrible feature to add... 
Content should never be removed without explicit external interaction, and this 
will lead to so many "where did my content go" type problems. Especially since, 
once it's gone from the index, debugging what went wrong is not going to be 
easy. Writing a script to send a delete-by-query periodically is really not that 
complex, and then it becomes the responsibility of the content owner/developer 
to delete content (a minimal sketch of that approach is below).

I would suggest that if this does go in, some sort of "audit" output be produced 
(e.g. "X docs deleted automatically", or a list of ids).

Also, per this design, both the exp and ttl fields must be required if specified 
in the config, else mayhem.
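
For illustration, a minimal SolrJ sketch of such a periodic delete-by-query job 
(the field name "expire_at_dt", the core URL, and how it is scheduled are all 
assumptions, not part of this proposal):

{code:java}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ExpireOldDocs {
  public static void main(String[] args) throws Exception {
    // assumed core URL and expiration field; run this from cron (or similar) at whatever interval fits
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    // drop everything whose expiration timestamp is already in the past
    server.deleteByQuery("expire_at_dt:[* TO NOW]");
    server.commit();
    server.shutdown();
  }
}
{code}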

> Option to periodically delete docs based on an expiration field -- or ttl 
> specified when indexed.
> -
>
> Key: SOLR-5795
> URL: https://issues.apache.org/jira/browse/SOLR-5795
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
>Assignee: Hoss Man
>
> A question I get periodically from people is how to automatically remove 
> documents from a collection at a certain time (or after a certain amount of 
> time).  
> Excluding from search results using a filter query on a date field is 
> trivial, but you still have to periodically send a deleteByQuery to clean up 
> those older "expired" documents.  And in the case where you want all 
> documents to auto-expire some fixed amount of time when they were indexed, 
> you still have to setup a simple UpdateProcessorto set that expiration date.  
> So i've been thinking it would be nice if there was a simple way to configure 
> solr to do it all for you.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-5482:
-

Attachment: LUCENE-5482.patch

This patch adds a new TokenFilter named ApostropheFilter. 
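
For anyone skimming, roughly what such a filter boils down to (a sketch of the 
idea only, not the attached patch itself; treating U+2019 as an apostrophe is an 
assumption):

{code:java}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Truncates each token at the first apostrophe, dropping the apostrophe and everything after it. */
public final class ApostropheFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public ApostropheFilter(TokenStream in) {
    super(in);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    final char[] buffer = termAtt.buffer();
    final int length = termAtt.length();
    for (int i = 0; i < length; i++) {
      if (buffer[i] == '\'' || buffer[i] == '\u2019') { // ASCII apostrophe or right single quote
        termAtt.setLength(i); // keep only the part before the apostrophe
        break;
      }
    }
    return true;
  }
}
{code}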

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916193#comment-13916193
 ] 

Ahmet Arslan commented on LUCENE-5482:
--

Thank you for your interest [~rcmuir]! Here is the 
[paper|http://www.ipcsit.com/vol57/015-ICNI2012-M021.pdf] in case anyone is 
interested. It's more like a Solr writeup though. 

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916200#comment-13916200
 ] 

Ahmet Arslan commented on LUCENE-5482:
--

It is possible to achieve the described behavior with the following existing 
filters (without a custom filter). Any thoughts on which way is preferred?

{code:xml}
 
{code}


{code:xml}
 
{code}

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916199#comment-13916199
 ] 

Uwe Schindler commented on LUCENE-5482:
---

Hi,
your patch contains unrelated changes in the analysis module's root folder (the 
addition of a useless classpath file). Can you fix this?
Also, because you add new functionality, TurkishAnalyzer should only add the 
new TokenFilter if matchVersion is at least LUCENE_48.

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916205#comment-13916205
 ] 

Robert Muir commented on LUCENE-5482:
-

I prefer the explicit filter you have now! 

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-5482:
---

Assignee: Robert Muir

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916207#comment-13916207
 ] 

Uwe Schindler commented on LUCENE-5482:
---

This should also work:
{code:xml}

{code}

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916207#comment-13916207
 ] 

Uwe Schindler edited comment on LUCENE-5482 at 2/28/14 7:18 PM:


This should also work:
{code:xml}

{code}


was (Author: thetaphi):
This should also work:
{code:xml}

{code}

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916211#comment-13916211
 ] 

Ahmet Arslan commented on LUCENE-5482:
--

Thanks for looking into this [~thetaphi]. I wanted to use QueryParser in 
TestTurkishAnalyzer.java but I am not familiar with ant. I want to include a 
checkMatch(String text, String qString) method that checks this: "this query 
string" should retrieve "this document text".

I added this but am not sure it is correct.
{code:xml}   



  
{code} 

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916215#comment-13916215
 ] 

Robert Muir commented on LUCENE-5482:
-

Generally speaking it's enough to just do assertAnalyzesTo/tokenStreamContents 
in unit tests. It keeps everything simple and easier to debug than 
integration-like tests.

That's why we don't depend on the query parser in any of the tests today.

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916218#comment-13916218
 ] 

Uwe Schindler commented on LUCENE-5482:
---

We should not add an additional dependency to the query parser module! I would 
remove this test, we generally don't add such type of tests. Use 
BaseTokenStreamTestCase as base class for your test and use the various assert 
methods to check if the token stream is what you expect. Feeding IndexWriter 
with your tokens and executing a search is not really a "unit test" anymore. We 
have enough tests for the indexing.
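
For example, a test along these lines would be enough (a sketch only, assuming 
the ApostropheFilter from the patch wired behind a whitespace MockTokenizer; the 
input and expected tokens are illustrative):

{code:java}
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.BaseTokenStreamTestCase;
import org.apache.lucene.analysis.MockTokenizer;
import org.apache.lucene.analysis.Tokenizer;

public class TestApostropheFilter extends BaseTokenStreamTestCase {

  public void testStripSuffixAfterApostrophe() throws Exception {
    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new MockTokenizer(reader, MockTokenizer.WHITESPACE, false);
        // ApostropheFilter is the new filter from the attached patch
        return new TokenStreamComponents(source, new ApostropheFilter(source));
      }
    };
    // everything after (and including) the apostrophe should be gone
    assertAnalyzesTo(analyzer, "Türkiye'de Ankara'nın", new String[] {"Türkiye", "Ankara"});
  }
}
{code}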

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-5482:
-

Attachment: LUCENE-5482.patch

Useless classpath change and test case removed. 

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916238#comment-13916238
 ] 

Robert Muir commented on LUCENE-5482:
-

This looks great Ahmet. As Uwe mentioned, I think the only change we need is 
the condition in TurkishAnalyzer:
{code}
if (matchVersion.onOrAfter(Version.LUCENE_48)) {
  // do new stuff, include the new filter
} else {
  // do old stuff
}
{code}

Otherwise, this change looks ready to me.

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916245#comment-13916245
 ] 

Robert Muir commented on LUCENE-5482:
-

Oh, one other thing that would be nice: could you add some javadocs to the 
public classes?

The factories typically have an example of their use (see some of the others). 
For the filter itself, maybe just a simple description of what it does, plus a 
reference to your paper, would be good (since you have done experiments and so 
on).

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5480) Hunspell shouldnt merge dictionary entries

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916252#comment-13916252
 ] 

ASF subversion and git services commented on LUCENE-5480:
-

Commit 1573048 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1573048 ]

LUCENE-5480: Hunspell shouldn't merge dictionary entries

> Hunspell shouldnt merge dictionary entries
> --
>
> Key: LUCENE-5480
> URL: https://issues.apache.org/jira/browse/LUCENE-5480
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Attachments: LUCENE-5480.patch
>
>
> Ive been writing lots of little unit tests for this thing, and I'm pretty 
> positive i screwed this up in LUCENE-5468... sorry
> Otherwise the whole "prefix-suffix dependencies" described in the manpage 
> won't work.
> Either 'words' should be changed from FST to FST, or when 
> there are duplicates we should add 'padding' that we just consume 
> (suggester-style). The latter is a little tricky, but I think this is 
> generally uncommon so it would keep the FST smaller.
> shouldnt be hard to fix.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916274#comment-13916274
 ] 

Ahmet Arslan commented on LUCENE-5482:
--

bq. if matchVersion.onOrAfter(Version.LUCENE_48)
I tried this but there is no LUCENE_48 in trunk.

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5480) Hunspell shouldnt merge dictionary entries

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916275#comment-13916275
 ] 

ASF subversion and git services commented on LUCENE-5480:
-

Commit 1573057 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1573057 ]

LUCENE-5480: Hunspell shouldn't merge dictionary entries

> Hunspell shouldnt merge dictionary entries
> --
>
> Key: LUCENE-5480
> URL: https://issues.apache.org/jira/browse/LUCENE-5480
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Attachments: LUCENE-5480.patch
>
>
> Ive been writing lots of little unit tests for this thing, and I'm pretty 
> positive i screwed this up in LUCENE-5468... sorry
> Otherwise the whole "prefix-suffix dependencies" described in the manpage 
> won't work.
> Either 'words' should be changed from FST to FST, or when 
> there are duplicates we should add 'padding' that we just consume 
> (suggester-style). The latter is a little tricky, but I think this is 
> generally uncommon so it would keep the FST smaller.
> shouldnt be hard to fix.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5480) Hunspell shouldnt merge dictionary entries

2014-02-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5480.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.8

> Hunspell shouldnt merge dictionary entries
> --
>
> Key: LUCENE-5480
> URL: https://issues.apache.org/jira/browse/LUCENE-5480
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5480.patch
>
>
> Ive been writing lots of little unit tests for this thing, and I'm pretty 
> positive i screwed this up in LUCENE-5468... sorry
> Otherwise the whole "prefix-suffix dependencies" described in the manpage 
> won't work.
> Either 'words' should be changed from FST to FST, or when 
> there are duplicates we should add 'padding' that we just consume 
> (suggester-style). The latter is a little tricky, but I think this is 
> generally uncommon so it would keep the FST smaller.
> shouldnt be hard to fix.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916281#comment-13916281
 ] 

Robert Muir commented on LUCENE-5482:
-

That's a bug. I will take care of it right now!

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916286#comment-13916286
 ] 

ASF subversion and git services commented on LUCENE-5482:
-

Commit 1573059 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1573059 ]

LUCENE-5482: add missing constant

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916288#comment-13916288
 ] 

Robert Muir commented on LUCENE-5482:
-

Thanks for pointing that out, you should see the constant now.

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916287#comment-13916287
 ] 

ASF subversion and git services commented on LUCENE-5482:
-

Commit 1573061 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1573061 ]

LUCENE-5482: remove wrong text from this, its not the latest

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5796) With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a "conflicting" state about who is the leader

2014-02-28 Thread Timothy Potter (JIRA)
Timothy Potter created SOLR-5796:


 Summary: With many collections, leader re-election takes too long 
when a node dies or is rebooted, leading to some shards getting into a 
"conflicting" state about who is the leader.
 Key: SOLR-5796
 URL: https://issues.apache.org/jira/browse/SOLR-5796
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
 Environment: Found on branch_4x
Reporter: Timothy Potter


I'm doing some testing with a 4-node SolrCloud cluster against the latest rev 
in branch_4x having many collections, 150 to be exact, each having 4 shards 
with rf=3, so 450 cores per node. Nodes are decent in terms of resources: 
-Xmx6g with 4 CPU - m3.xlarge's in EC2.

The problem occurs when rebooting one of the nodes, say as part of a rolling 
restart of the cluster. If I kill one node and then wait for an extended period 
of time, such as 3 minutes, then all of the leaders on the downed node (roughly 
150) have time to failover to another node in the cluster. When I restart the 
downed node, since leaders have all failed over successfully, the new node 
starts up and all cores assume the replica role in their respective shards. 
This is goodness and expected.

However, if I don't wait long enough for the leader failover process to 
complete on the other nodes before restarting the downed node, 
then some bad things happen. Specifically, when the dust settles, many of the 
previous leaders on the node I restarted get stuck in the "conflicting" state 
seen in the ZkController, starting around line 852 in branch_4x:

{quote}
852   while (!leaderUrl.equals(clusterStateLeaderUrl)) {
853 if (tries == 60) {
854   throw new SolrException(ErrorCode.SERVER_ERROR,
855   "There is conflicting information about the leader of shard: "
856   + cloudDesc.getShardId() + " our state says:"
857   + clusterStateLeaderUrl + " but zookeeper says:" + 
leaderUrl);
858 }
859 Thread.sleep(1000);
860 tries++;
861 clusterStateLeaderUrl = zkStateReader.getLeaderUrl(collection, 
shardId,
862 timeoutms);
863 leaderUrl = getLeaderProps(collection, cloudDesc.getShardId(), 
timeoutms)
864 .getCoreUrl();
865   }
{quote}

As you can see, the code is trying to give a little time for this problem to 
work itself out, 1 minute to be exact. Unfortunately, that doesn't seem to be 
long enough for a busy cluster that has many collections. Now, one might argue 
that 450 cores per node is asking too much of Solr, however I think this points 
to a bigger issue of the fact that a node coming up isn't aware that it went 
down and leader election is running on other nodes and is just being slow. 
Moreover, once this problem occurs, it's not clear how to fix it besides 
shutting the node down again and waiting for leader failover to complete.

It's also interesting to me that /clusterstate.json was updated by the healthy 
node taking over the leader role but the /collections/leaders/shard# was 
not updated? I added some debugging and it seems like the overseer queue is 
extremely backed up with work.

Maybe the solution here is to just wait longer but I also want to get some 
feedback from the community on other options? I know there are some plans to 
help scale the Overseer (i.e. SOLR-5476) so maybe that helps and I'm trying to 
add more debug to see if this is really due to overseer backlog (which I 
suspect it is).

In general, I'm a little confused by the keeping of leader state in multiple 
places in ZK. Is there any background information on why we have leader state 
in /clusterstate.json and in the leader path znode?

Also, here are some interesting side observations:

a. If I use rf=2, then this problem doesn't occur as leader failover happens 
more quickly and there's less overseer work? 
May be a red herring here, but I can consistently reproduce it with RF=3, but 
not with RF=2 ... suppose that is because there are only 300 cores per node 
versus 450 and that's just enough less work to make this issue work itself out.

b. To support that many cores, I had to set -Xss256k to reduce the stack size 
as Solr uses a lot of threads during startup (high point was 800'ish)   
   
Might be something we should recommend on the mailing list / wiki somewhere.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2014-02-28 Thread Brett Lucey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916293#comment-13916293
 ] 

Brett Lucey commented on SOLR-2894:
---

Hi Elran,

Regarding the string/object issue:  We have not been able to revert back to 
toObject for all data types because doing so results in the following exception 
being thrown by the testDistribSearch case of DistributedFacetPivotTest.
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Invalid 
Date String:'Sat Sep 01 08:30:00 BOT 2012

This is part of the test case which demonstrates the issue:
//datetime
this.query( "q", "*:*",
            "rows", "0",
            "facet", "true",
            "facet.pivot", "hiredate_dt,place_s,company_t",
            "f.hiredate_dt.facet.limit", "2",
            "f.hiredate_dt.facet.offset", "1",
            FacetParams.FACET_LIMIT, "4"); //test default sort (count)

I am producing that error by running:
ant -Dtests.class="org.apache.solr.handler.component.DistributedFacetPivotTest" 
clean test

> Implement distributed pivot faceting
> 
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erik Hatcher
> Fix For: 4.7
>
> Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-02-28 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated LUCENE-5205:


Attachment: LUCENE-5205-cleanup-tests.patch

This patch focuses on reducing code duplication in the test cases.  Reducing 
duplication in the main part of the code will be a separate patch.

1) TestSpanQPBasedOnQPTestBase (renamed to TestQPTestBaseSpanQuery)...now 
subclasses QueryParserTestBase.
  There were a few handfuls of tests that I couldn't easily modify; mostly 
these were string equality tests for complex queries. The CJK examples where 
complex truth queries were built programmatically also had to be rewritten for 
SpanQueries. Those now exist in testParserSpecificQuery()...solution is not 
elegant, and I don't like the name.  It would be slightly cleaner to move those 
handfuls of tests up into TestQueryParser, but I want them to be available to 
the other subclasses of QueryParserTestBase.

2) TestComplexPhraseSpanQuery now subclasses TestComplexPhraseQuery.  Again, I 
had to add a testParserSpecificSyntax.  However, in this case the syntax between 
the two parsers is slightly different.
 
3) TestMultiAnalyzer (renamed to TestMultiAnalyzerSpanQuery) now subclasses 
TestMultiAnalyzer.  These tests were mostly string equality tests on complex 
queries, and I couldn't easily keep any of the tests. 

For the above, I'm sure there are more elegant solutions, but this is where the 
code is for now.



Small clean up in SpanQueryParserBase:
1) got rid of special option to lowercase regex...treat like any other 
multiterm, and eventually get rid of lowercasing multiterms altogether!

2) got rid of special rounding correction for fuzzy minSims.  Behavior is now 
the same as classic.


Bugs found:
1) failed to set boost on MatchAllDocsQuery
2) fixed analysis of range terms (the irony :( )

Capabilities not currently covered by SQP
1) Known: && ! syntax
2) Unknown: "term phrase term" ->
"+term +(+phrase1 +phrase2) +term"

> [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
> classic QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Fix For: 4.7
>
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_smallTestMods.patch, 
> LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Fin

[jira] [Updated] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-5482:
-

Attachment: LUCENE-5482.patch

Javadoc for public classes added.
Version.LUCENE_48 check added to TurkishAnalyzer.

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916328#comment-13916328
 ] 

Ahmet Arslan commented on LUCENE-5482:
--

Should we add this if check to TestTurkishAnalyzer too?

{code}
 if(matchVersion.onOrAfter(Version.LUCENE_48))   
 // check apostrophes 
{code}

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916336#comment-13916336
 ] 

Robert Muir commented on LUCENE-5482:
-

No, it's OK, because we only instantiate analyzers with the latest version.

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916342#comment-13916342
 ] 

Ahmet Arslan commented on LUCENE-5482:
--

Great, thanks for the guidance and comments!

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916348#comment-13916348
 ] 

ASF subversion and git services commented on LUCENE-5482:
-

Commit 1573066 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1573066 ]

LUCENE-5482: Improve default TurkishAnalyzer

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916355#comment-13916355
 ] 

Uwe Schindler commented on LUCENE-5482:
---

Cool, thanks!
+1 to commit

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916362#comment-13916362
 ] 

ASF subversion and git services commented on LUCENE-5482:
-

Commit 1573074 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1573074 ]

LUCENE-5482: Improve default TurkishAnalyzer

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5482) improve default TurkishAnalyzer

2014-02-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5482.
-

   Resolution: Fixed
Fix Version/s: 5.0

Thanks Ahmet!

I made one addition: I also inserted this filter into the text_tr chain in the 
solr example.

> improve default TurkishAnalyzer
> ---
>
> Key: LUCENE-5482
> URL: https://issues.apache.org/jira/browse/LUCENE-5482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.7
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: Turkish, analysis
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5482.patch, LUCENE-5482.patch, LUCENE-5482.patch
>
>
> Add a TokenFilter that strips characters after an apostrophe (including the 
> apostrophe itself). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5477) add near-real-time suggest building to AnalyzingInfixSuggester

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916388#comment-13916388
 ] 

ASF subversion and git services commented on LUCENE-5477:
-

Commit 1573080 from [~mikemccand] in branch 'dev/branches/lucene5376'
[ https://svn.apache.org/r1573080 ]

LUCENE-5376, LUCENE-5477: add near-real-time suggest updates when using 
AnalyzingInfixSuggester to Lucene demo server

> add near-real-time suggest building to AnalyzingInfixSuggester
> --
>
> Key: LUCENE-5477
> URL: https://issues.apache.org/jira/browse/LUCENE-5477
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5477.patch, LUCENE-5477.patch
>
>
> Because this suggester impl. is just a Lucene index under-the-hood, it should 
> be straightforward to enable near-real-time additions/removals of suggestions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5376) Add a demo search server

2014-02-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916387#comment-13916387
 ] 

ASF subversion and git services commented on LUCENE-5376:
-

Commit 1573080 from [~mikemccand] in branch 'dev/branches/lucene5376'
[ https://svn.apache.org/r1573080 ]

LUCENE-5376, LUCENE-5477: add near-real-time suggest updates when using 
AnalyzingInfixSuggester to Lucene demo server

> Add a demo search server
> 
>
> Key: LUCENE-5376
> URL: https://issues.apache.org/jira/browse/LUCENE-5376
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: lucene-demo-server.tgz
>
>
> I think it'd be useful to have a "demo" search server for Lucene.
> Rather than being fully featured, like Solr, it would be minimal, just 
> wrapping the existing Lucene modules to show how you can make use of these 
> features in a server setting.
> The purpose is to demonstrate how one can build a minimal search server on 
> top of APIs like SearchManager, SearcherLifetimeManager, etc.
> This is also useful for finding rough edges / issues in Lucene's APIs that 
> make building a server unnecessarily hard.
> I don't think it should have back compatibility promises (except Lucene's 
> index back compatibility), so it's free to improve as Lucene's APIs change.
> As a starting point, I'll post what I built for the "eating your own dog 
> food" search app for Lucene's & Solr's jira issues 
> http://jirasearch.mikemccandless.com (blog: 
> http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It 
> uses Netty to expose basic indexing & searching APIs via JSON, but it's very 
> rough (lots nocommits).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5797) Explain plan transform does not work in Solr cloud

2014-02-28 Thread Divya Mehta (JIRA)
Divya Mehta created SOLR-5797:
-

 Summary: Explain plan transform does not work in Solr cloud
 Key: SOLR-5797
 URL: https://issues.apache.org/jira/browse/SOLR-5797
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Divya Mehta


The explain plan works as expected on a single Solr node. After moving to 
SolrCloud, it does not show any explanation field in the returned documents.

This is how we ask for explain output in our SolrQuery:

SolrQuery sq = new SolrQuery();

if (args.getExplain()) {
    sq.setParam(CommonParams.DEBUG_QUERY, true);
    sq.addField("explanation:[explain style=text]");
}

I checked the logs on both the single node and the cloud, and the request and 
its parameters are exactly the same.

Is this a known issue, or does it need some other configuration to make it work 
on SolrCloud? We have one main node and one shard, and we use a standalone 
ZooKeeper to manage SolrCloud.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5783) Can we stop opening a new searcher when the index hasn't changed?

2014-02-28 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5783:
---

Attachment: SOLR-5783.patch

Updated patch:

* hardened the modified assertion in TestIndexSearcher.testReopen to verify it 
truly is the exact same searcher
* added a new TestSearcherReuse to verify that we get the same searcher after 
doing various things that are no-ops, and we get a new searcher after doing 
things that modify the index (a rough sketch of that assertion is below)

still one nocommit: I want to make this use managed-schema and include a check 
that modifying the schema w/o any data changes results in a newSearcher (need to 
poke around the managed schema tests more to figure out how to do that)
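
Roughly, the reuse assertion amounts to this (a fragment only, assuming a 
SolrTestCaseJ4-style test where "h" is the usual test harness and the core is 
already loaded):

{code:java}
// get the current searcher, do a commit that changes nothing, and expect the very same instance back
RefCounted<SolrIndexSearcher> before = h.getCore().getSearcher();
assertU(commit());  // no-op commit: no documents were added or deleted
RefCounted<SolrIndexSearcher> after = h.getCore().getSearcher();
try {
  assertSame("expected searcher reuse after a no-op commit", before.get(), after.get());
} finally {
  before.decref();
  after.decref();
}
{code}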

> Can we stop opening a new searcher when the index hasn't changed?
> -
>
> Key: SOLR-5783
> URL: https://issues.apache.org/jira/browse/SOLR-5783
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
> Attachments: SOLR-5783.patch, SOLR-5783.patch, SOLR-5783.patch
>
>
> I've been thinking recently about how/when we re-open searchers -- and what 
> the overhead of that is in terms of caches and what not -- even if the 
> underlying index hasn't changed.  
> The particular real world case that got me thinking about this recently is 
> when a deleteByQuery gets forwarded to all shards in a collection, and then 
> the subsequent (soft)Commit (either auto or explicit) opens a new searcher -- 
> even if that shard was completley uneffected by the delete.
> It got me wondering: why don't re-use the same searcher when the index is 
> unchanged?
> From what I can tell, we're basically 99% of the way there (in 
> {{}})...
> * IndexWriter.commit is already smart enough to short circut if there's 
> nothing to commit
> * SolrCore.openNewSearcher already uses DirectoryReader.openIfChanged to see 
> if the reader can be re-used.
> * for "realtime" purposes, SolrCore.openNewSearcher will return the existing 
> searcher if it exists and the DirectoryReader hasn't changed
> ...The only reason I could think of for not _always_ re-using the same 
> searcher when the underlying DirectoryReader is identical (ie: that last 
> bullet above) is in the situation where the "live" schema has changed -- but 
> that seems pretty trivial to account for.
> Is there any other reason why this wouldn't be a good idea for improving 
> performance?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5796) With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a "conflicting" state about who is the lead

2014-02-28 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916521#comment-13916521
 ] 

Timothy Potter commented on SOLR-5796:
--

Little more to this for today ...

I tried this on more powerful nodes (m3.2xlarge) and changed the wait to tries 
== 180 (instead of 60), and voila, the restarted node came back as expected. 
This raises the question of whether we should make that wait period configurable 
for installations that have many collections in a cluster. To be clear, I'm 
referring to the wait period in ZkController's while loop starting around line 
852 (see above). I'd prefer something more deterministic over an upper limit on 
waiting, as that seems like a ticking time bomb in a busy cluster. I'm going to 
try a few more ideas out over the weekend.
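
One simple direction, shown only as a sketch mirroring the loop quoted above 
(the system property name here is hypothetical, not an existing Solr setting), 
would be to read the bound from a property instead of hard-coding 60:

{code:java}
// in ZkController, replacing the hard-coded 60
final int maxTries = Integer.getInteger("solr.leaderConflictResolveWait", 60);
while (!leaderUrl.equals(clusterStateLeaderUrl)) {
  if (tries == maxTries) {
    throw new SolrException(ErrorCode.SERVER_ERROR,
        "There is conflicting information about the leader of shard: "
        + cloudDesc.getShardId() + " our state says:" + clusterStateLeaderUrl
        + " but zookeeper says:" + leaderUrl);
  }
  Thread.sleep(1000);
  tries++;
  clusterStateLeaderUrl = zkStateReader.getLeaderUrl(collection, shardId, timeoutms);
  leaderUrl = getLeaderProps(collection, cloudDesc.getShardId(), timeoutms).getCoreUrl();
}
{code}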

> With many collections, leader re-election takes too long when a node dies or 
> is rebooted, leading to some shards getting into a "conflicting" state about 
> who is the leader.
> 
>
> Key: SOLR-5796
> URL: https://issues.apache.org/jira/browse/SOLR-5796
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Environment: Found on branch_4x
>Reporter: Timothy Potter
>
> I'm doing some testing with a 4-node SolrCloud cluster against the latest rev 
> in branch_4x having many collections, 150 to be exact, each having 4 shards 
> with rf=3, so 450 cores per node. Nodes are decent in terms of resources: 
> -Xmx6g with 4 CPU - m3.xlarge's in EC2.
> The problem occurs when rebooting one of the nodes, say as part of a rolling 
> restart of the cluster. If I kill one node and then wait for an extended 
> period of time, such as 3 minutes, then all of the leaders on the downed node 
> (roughly 150) have time to failover to another node in the cluster. When I 
> restart the downed node, since leaders have all failed over successfully, the 
> new node starts up and all cores assume the replica role in their respective 
> shards. This is goodness and expected.
> However, if I don't wait long enough for the leader failover process to 
> complete on the other nodes before restarting the downed node, 
> then some bad things happen. Specifically, when the dust settles, many of the 
> previous leaders on the node I restarted get stuck in the "conflicting" state 
> seen in the ZkController, starting around line 852 in branch_4x:
> {quote}
> 852   while (!leaderUrl.equals(clusterStateLeaderUrl)) {
> 853 if (tries == 60) {
> 854   throw new SolrException(ErrorCode.SERVER_ERROR,
> 855   "There is conflicting information about the leader of 
> shard: "
> 856   + cloudDesc.getShardId() + " our state says:"
> 857   + clusterStateLeaderUrl + " but zookeeper says:" + 
> leaderUrl);
> 858 }
> 859 Thread.sleep(1000);
> 860 tries++;
> 861 clusterStateLeaderUrl = zkStateReader.getLeaderUrl(collection, 
> shardId,
> 862 timeoutms);
> 863 leaderUrl = getLeaderProps(collection, cloudDesc.getShardId(), 
> timeoutms)
> 864 .getCoreUrl();
> 865   }
> {quote}
> As you can see, the code is trying to give a little time for this problem to 
> work itself out, 1 minute to be exact. Unfortunately, that doesn't seem to be 
> long enough for a busy cluster that has many collections. Now, one might 
> argue that 450 cores per node is asking too much of Solr, however I think 
> this points to a bigger issue of the fact that a node coming up isn't aware 
> that it went down and leader election is running on other nodes and is just 
> being slow. Moreover, once this problem occurs, it's not clear how to fix it 
> besides shutting the node down again and waiting for leader failover to 
> complete.
> It's also interesting to me that /clusterstate.json was updated by the 
> healthy node taking over the leader role but the 
> /collections/leaders/shard# was not updated? I added some debugging and 
> it seems like the overseer queue is extremely backed up with work.
> Maybe the solution here is to just wait longer but I also want to get some 
> feedback from the community on other options? I know there are some plans to 
> help scale the Overseer (i.e. SOLR-5476) so maybe that helps and I'm trying 
> to add more debug to see if this is really due to overseer backlog (which I 
> suspect it is).
> In general, I'm a little confused by the keeping of leader state in multiple 
> places in ZK. Is there any background information on why we have leader state 
> in /clusterstate.json and in the leader path znode?
> Also, here are some interesting side observations:
> a. If I use rf=2, then this problem 

[jira] [Commented] (SOLR-5796) With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a "conflicting" state about who is the leader.

2014-02-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916575#comment-13916575
 ] 

Mark Miller commented on SOLR-5796:
---

Cool - recently saw a user post a problem with this conflicting state - glad to 
see you already have a jump on it :)

> With many collections, leader re-election takes too long when a node dies or 
> is rebooted, leading to some shards getting into a "conflicting" state about 
> who is the leader.
> 
>
> Key: SOLR-5796
> URL: https://issues.apache.org/jira/browse/SOLR-5796
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Environment: Found on branch_4x
>Reporter: Timothy Potter
>
> I'm doing some testing with a 4-node SolrCloud cluster against the latest rev 
> in branch_4x having many collections, 150 to be exact, each having 4 shards 
> with rf=3, so 450 cores per node. Nodes are decent in terms of resources: 
> -Xmx6g with 4 CPU - m3.xlarge's in EC2.
> The problem occurs when rebooting one of the nodes, say as part of a rolling 
> restart of the cluster. If I kill one node and then wait for an extended 
> period of time, such as 3 minutes, then all of the leaders on the downed node 
> (roughly 150) have time to failover to another node in the cluster. When I 
> restart the downed node, since leaders have all failed over successfully, the 
> new node starts up and all cores assume the replica role in their respective 
> shards. This is goodness and expected.
> However, if I don't wait long enough for the leader failover process to 
> complete on the other nodes before restarting the downed node, 
> then some bad things happen. Specifically, when the dust settles, many of the 
> previous leaders on the node I restarted get stuck in the "conflicting" state 
> seen in the ZkController, starting around line 852 in branch_4x:
> {quote}
> 852   while (!leaderUrl.equals(clusterStateLeaderUrl)) {
> 853 if (tries == 60) {
> 854   throw new SolrException(ErrorCode.SERVER_ERROR,
> 855   "There is conflicting information about the leader of 
> shard: "
> 856   + cloudDesc.getShardId() + " our state says:"
> 857   + clusterStateLeaderUrl + " but zookeeper says:" + 
> leaderUrl);
> 858 }
> 859 Thread.sleep(1000);
> 860 tries++;
> 861 clusterStateLeaderUrl = zkStateReader.getLeaderUrl(collection, 
> shardId,
> 862 timeoutms);
> 863 leaderUrl = getLeaderProps(collection, cloudDesc.getShardId(), 
> timeoutms)
> 864 .getCoreUrl();
> 865   }
> {quote}
> As you can see, the code is trying to give a little time for this problem to 
> work itself out, 1 minute to be exact. Unfortunately, that doesn't seem to be 
> long enough for a busy cluster that has many collections. Now, one might 
> argue that 450 cores per node is asking too much of Solr, however I think 
> this points to a bigger issue of the fact that a node coming up isn't aware 
> that it went down and leader election is running on other nodes and is just 
> being slow. Moreover, once this problem occurs, it's not clear how to fix it 
> besides shutting the node down again and waiting for leader failover to 
> complete.
> It's also interesting to me that /clusterstate.json was updated by the 
> healthy node taking over the leader role but the 
> /collections/leaders/shard# was not updated? I added some debugging and 
> it seems like the overseer queue is extremely backed up with work.
> Maybe the solution here is to just wait longer but I also want to get some 
> feedback from the community on other options? I know there are some plans to 
> help scale the Overseer (i.e. SOLR-5476) so maybe that helps and I'm trying 
> to add more debug to see if this is really due to overseer backlog (which I 
> suspect it is).
> In general, I'm a little confused by the keeping of leader state in multiple 
> places in ZK. Is there any background information on why we have leader state 
> in /clusterstate.json and in the leader path znode?
> Also, here are some interesting side observations:
> a. If I use rf=2, then this problem doesn't occur as leader failover happens 
> more quickly and there's less overseer work? 
> May be a red herring here, but I can consistently reproduce it with RF=3, but 
> not with RF=2 ... suppose that is because there are only 300 cores per node 
> versus 450 and that's just enough less work to make this issue work itself 
> out.
> b. To support that many cores, I had to set -Xss256k to reduce the stack size 
> as Solr uses a lot of threads during startup (high point was 800'ish) 
>  
> Might be something we s

[jira] [Commented] (SOLR-5796) With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a "conflicting" state about who is the leader.

2014-02-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916579#comment-13916579
 ] 

Mark Miller commented on SOLR-5796:
---

{noformat}
This raises the question: should we make that wait period configurable for 
installations that have many collections in a cluster? To be clear, I'm 
referring to the wait period in ZkController, in the while loop starting around 
line 852 (see above).
{noformat}

I have no qualms about making any timeouts configurable, but it seems the 
defaults should be fairly high as well - we want to work out of the box with 
most reasonable setups if possible.

+1 to making it configurable, but let's also crank it up.
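
A minimal sketch of what that could look like (the property name is an assumption, not an existing Solr setting; the loop is the one quoted in the issue description with the hard-coded 60 swapped out for a configurable, cranked-up default):

{code:java}
// Hypothetical: read the maximum number of 1-second tries from a system
// property, with a higher built-in default (180 tries ~= 3 minutes) than
// today's 60. "solr.leaderConflictResolveWait" is an illustrative name only.
int maxTries = Integer.getInteger("solr.leaderConflictResolveWait", 180);

int tries = 0;
while (!leaderUrl.equals(clusterStateLeaderUrl)) {   // same loop as quoted above
  if (tries == maxTries) {
    throw new SolrException(ErrorCode.SERVER_ERROR,
        "There is conflicting information about the leader of shard: "
            + cloudDesc.getShardId() + " our state says:" + clusterStateLeaderUrl
            + " but zookeeper says:" + leaderUrl);
  }
  Thread.sleep(1000);
  tries++;
  clusterStateLeaderUrl = zkStateReader.getLeaderUrl(collection, shardId, timeoutms);
  leaderUrl = getLeaderProps(collection, cloudDesc.getShardId(), timeoutms).getCoreUrl();
}
{code}

Very dense clusters could then start nodes with, say, -Dsolr.leaderConflictResolveWait=600 without patching, while the higher default covers most setups out of the box.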

> With many collections, leader re-election takes too long when a node dies or 
> is rebooted, leading to some shards getting into a "conflicting" state about 
> who is the leader.
> 
>
> Key: SOLR-5796
> URL: https://issues.apache.org/jira/browse/SOLR-5796
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
> Environment: Found on branch_4x
>Reporter: Timothy Potter
>
> I'm doing some testing with a 4-node SolrCloud cluster against the latest rev 
> in branch_4x having many collections, 150 to be exact, each having 4 shards 
> with rf=3, so 450 cores per node. Nodes are decent in terms of resources: 
> -Xmx6g with 4 CPU - m3.xlarge's in EC2.
> The problem occurs when rebooting one of the nodes, say as part of a rolling 
> restart of the cluster. If I kill one node and then wait for an extended 
> period of time, such as 3 minutes, then all of the leaders on the downed node 
> (roughly 150) have time to failover to another node in the cluster. When I 
> restart the downed node, since leaders have all failed over successfully, the 
> new node starts up and all cores assume the replica role in their respective 
> shards. This is goodness and expected.
> However, if I don't wait long enough for the leader failover process to 
> complete on the other nodes before restarting the downed node, 
> then some bad things happen. Specifically, when the dust settles, many of the 
> previous leaders on the node I restarted get stuck in the "conflicting" state 
> seen in the ZkController, starting around line 852 in branch_4x:
> {quote}
> 852   while (!leaderUrl.equals(clusterStateLeaderUrl)) {
> 853 if (tries == 60) {
> 854   throw new SolrException(ErrorCode.SERVER_ERROR,
> 855   "There is conflicting information about the leader of 
> shard: "
> 856   + cloudDesc.getShardId() + " our state says:"
> 857   + clusterStateLeaderUrl + " but zookeeper says:" + 
> leaderUrl);
> 858 }
> 859 Thread.sleep(1000);
> 860 tries++;
> 861 clusterStateLeaderUrl = zkStateReader.getLeaderUrl(collection, 
> shardId,
> 862 timeoutms);
> 863 leaderUrl = getLeaderProps(collection, cloudDesc.getShardId(), 
> timeoutms)
> 864 .getCoreUrl();
> 865   }
> {quote}
> As you can see, the code is trying to give a little time for this problem to 
> work itself out, 1 minute to be exact. Unfortunately, that doesn't seem to be 
> long enough for a busy cluster that has many collections. Now, one might 
> argue that 450 cores per node is asking too much of Solr, however I think 
> this points to a bigger issue of the fact that a node coming up isn't aware 
> that it went down and leader election is running on other nodes and is just 
> being slow. Moreover, once this problem occurs, it's not clear how to fix it 
> besides shutting the node down again and waiting for leader failover to 
> complete.
> It's also interesting to me that /clusterstate.json was updated by the 
> healthy node taking over the leader role but the 
> /collections/leaders/shard# was not updated? I added some debugging and 
> it seems like the overseer queue is extremely backed up with work.
> Maybe the solution here is to just wait longer but I also want to get some 
> feedback from the community on other options? I know there are some plans to 
> help scale the Overseer (i.e. SOLR-5476) so maybe that helps and I'm trying 
> to add more debug to see if this is really due to overseer backlog (which I 
> suspect it is).
> In general, I'm a little confused by the keeping of leader state in multiple 
> places in ZK. Is there any background information on why we have leader state 
> in /clusterstate.json and in the leader path znode?
> Also, here are some interesting side observations:
> a. If I use rf=2, then this problem doesn't occur as leader failover happens 
> more quickly and there's less overseer work? 
> May be a red herring here, but
