[jira] [Commented] (LUCENE-5302) Make StemmerOverrideMap methods public

2013-10-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806001#comment-13806001
 ] 

Robert Muir commented on LUCENE-5302:
-

The @link was broken before; javadocs were just never generated because it 
only had package visibility.

I think in this case the @link just has to be qualified as 
FST.Arc/FST.BytesReader, or fully qualified, or whatever.
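
To make the fix concrete, here is a self-contained sketch using hypothetical stand-in classes (not Lucene's real FST) that shows the qualified @link form the javadoc tool can resolve:

```java
/** Hypothetical stand-in for a class with nested types, like Lucene's FST. */
class FST {
  static class Arc {}
  static class BytesReader {}
}

class StemmerOverrideMapSketch {
  /**
   * Writing {@link FST.Arc} and {@link FST.BytesReader} (qualified with the
   * enclosing class) lets the javadoc tool resolve the nested types; a bare
   * {@code Arc} only resolves where FST's members are in scope.
   */
  boolean get(char[] buffer, int length, FST.Arc arc, FST.BytesReader reader) {
    // real lookup logic elided; just validate the request for this sketch
    return buffer != null && length <= buffer.length;
  }
}
```

Fully qualifying the reference (e.g. the complete package path to FST.Arc) works the same way.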

> Make StemmerOverrideMap methods public
> --
>
> Key: LUCENE-5302
> URL: https://issues.apache.org/jira/browse/LUCENE-5302
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Priority: Minor
> Attachments: LUCENE-5302.patch
>
>
> StemmerOverrideFilter is configured with an FST-based map that you can build 
> at construction time from a list of entries.  Building this FST offline and 
> loading it directly as a bytestream makes construction a lot quicker, but you 
> can't do that conveniently at the moment as all the methods of 
> StemmerOverrideMap are package-private.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene40TermVectorsReader TVTermsEnum totalTermFreq() is not a total

2013-10-25 Thread Tom Burton-West
Hi all,

I was reading some code that calls Lucene40TermVectorsReader's TVTermsEnum.

The method totalTermFreq() actually returns the term's frequency within the
single document, and the method docFreq() returns 1.
Once you think about the context (a term vector is an inverted index over just
one document) this sort of makes sense, but I found it confusing.

I'm guessing there is a good reason for the method to be called
totalTermFreq(), but I would like to know what that is.  Also, is there
documentation somewhere in the javadocs that explains this?

Better yet, is there a good example of how to use the Lucene 4.x
TermVectors API?
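
For reference, a minimal sketch of walking a document's term vector with the Lucene 4.x API might look like the following (class and field names are illustrative, and it assumes term vectors were stored at index time):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class TermVectorDump {
  public static void dump(IndexReader reader, int docID, String field) throws IOException {
    // Returns null if no term vector was stored for this doc/field
    Terms vector = reader.getTermVector(docID, field);
    if (vector == null) {
      return;
    }
    TermsEnum termsEnum = vector.iterator(null);
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
      // Within a term vector, totalTermFreq() is the term's frequency
      // inside this one document, and docFreq() is always 1
      System.out.println(term.utf8ToString() + " freq=" + termsEnum.totalTermFreq());
    }
  }
}
```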


Tom


Re: What is recommended version of jdk 1.7?

2013-10-25 Thread Israel Ekpo
Also,

It is sometimes difficult to find this specific version.

You can download it here

http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jre-7u25-oth-JPR




On Wed, Oct 23, 2013 at 8:46 AM, Uwe Schindler  wrote:

> Use u25, this is the latest stable version and works fine.
>
>
> -
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
> From: Danil ŢORIN [mailto:torin...@gmail.com]
> Sent: Wednesday, October 23, 2013 1:40 PM
> To: lucene-...@apache.org
> Subject: What is recommended version of jdk 1.7?
>
>
> We had some problems with u45.
>
> I know there are several jiras, and a bug report for oracle.
>
>
> But my question is more pragmatic: when running tests for a release like the
> latest 4.5.1, what JVM (preferably 1.7) did you use?
>
> What is the latest but safe version to use with Lucene?
>



-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


[jira] [Commented] (SOLR-5392) extend solrj apis to cover collection management

2013-10-25 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805743#comment-13805743
 ] 

Mark Miller commented on SOLR-5392:
---

bq.  I completely aped CoreAdminRequest and CoreAdminResponse keeping up with 
all the stylistic idiosyncrasies of the two

+1 - until someone is willing to clean up the whole shebang.

> extend solrj apis to cover collection management
> 
>
> Key: SOLR-5392
> URL: https://issues.apache.org/jira/browse/SOLR-5392
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 4.5
>Reporter: Roman Shaposhnik
> Attachments: 
> 0001-SOLR-5392.-extend-solrj-apis-to-cover-collection-man.patch
>
>
> It would be useful to extend solrj APIs to cover collection management calls: 
> https://cwiki.apache.org/confluence/display/solr/Collections+API 






[jira] [Updated] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5307:
--

Attachment: LUCENE-5307.patch

Simplified patch. API of ConstantScorer does not change!

Thanks Adrien for review, will commit tomorrow!

> Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
> top of a Filter
> ---
>
> Key: LUCENE-5307
> URL: https://issues.apache.org/jira/browse/LUCENE-5307
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Reporter: Adrien Grand
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5307.patch, LUCENE-5307.patch, 
> LUCENE-5307-test.patch
>
>
> {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
> will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
> This is a problem when {{ConstantScoreQuery}} is used on top of a 
> {{QueryWrapperFilter}}:
>  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
> which documents to collect.
>  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
> {{topScorer == false}} so that {{nextDoc/advance}} are supported.
>  3. But then {{ConstantScorer.score(Collector)}} has the following 
> optimization:
> {code}
> // this optimization allows out of order scoring as top scorer!
> @Override
> public void score(Collector collector) throws IOException {
>   if (docIdSetIterator instanceof Scorer) {
> ((Scorer) docIdSetIterator).score(wrapCollector(collector));
>   } else {
> super.score(collector);
>   }
> }
> {code}
> So the filter iterator is a scorer which was created with {{topScorer = 
> false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
> which is illegal. (I found this out because AssertingSearcher has some checks 
> to make sure Scorers are used accordingly to the value of topScorer.)
> I can imagine several fixes, including:
>  - removing this optimization when working on top of a filter
>  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
> {{topScorer == false}}
> but I'm not sure which one is the best one. What do you think?






[jira] [Updated] (SOLR-5392) extend solrj apis to cover collection management

2013-10-25 Thread Roman Shaposhnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated SOLR-5392:
---

Attachment: 0001-SOLR-5392.-extend-solrj-apis-to-cover-collection-man.patch

Please consider the following patch against trunk.

What I did here is: I completely aped CoreAdminRequest and CoreAdminResponse, 
keeping up with all the stylistic idiosyncrasies of the two. Hope this was the 
right thing to do.

Either way, please let me know what you guys think.







[jira] [Commented] (SOLR-5239) Create and edit cwiki page for clustering search results

2013-10-25 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805708#comment-13805708
 ] 

Dawid Weiss commented on SOLR-5239:
---

Thanks Cassandra!

> Create and edit cwiki page for clustering search results
> 
>
> Key: SOLR-5239
> URL: https://issues.apache.org/jira/browse/SOLR-5239
> Project: Solr
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>
> Essentially pull out the information from:
> https://wiki.apache.org/solr/ClusteringComponent
> skipping any information about ancient versions?






[jira] [Commented] (SOLR-5239) Create and edit cwiki page for clustering search results

2013-10-25 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805702#comment-13805702
 ] 

Cassandra Targett commented on SOLR-5239:
-

[~dweiss] I finally got a chance to read this over, and I think it's really 
good the way you wrote it. The only thing I changed is the borders around your 
code examples - just to standardize them within the page and across all the 
pages. 







[jira] [Commented] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805673#comment-13805673
 ] 

Adrien Grand commented on LUCENE-5307:
--

+1!







[jira] [Updated] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5307:
--

  Component/s: core/search
Fix Version/s: 5.0, 4.6







[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-10-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805658#comment-13805658
 ] 

Michael McCandless commented on LUCENE-5189:


+1 to backport.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
> LUCENE-5189-no-lost-updates.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
> LUCENE-5189_process_events.patch, LUCENE-5189-segdv.patch, 
> LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update if, e.g., we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.






[jira] [Commented] (LUCENE-5302) Make StemmerOverrideMap methods public

2013-10-25 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805609#comment-13805609
 ] 

Alan Woodward commented on LUCENE-5302:
---

Hm, this patch fails ant precommit with a javadocs warning:

lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.java:111:
 warning - Tag @link: can't find get(char[], int, Arc, BytesReader) in 
org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter.StemmerOverrideMap

...even though that method's javadoc is definitely there.  Maybe because it's 
not defining the generic parameter on Arc?  Anybody have any ideas, apart from 
changing the javadoc from an @link tag to an @code tag?







[jira] [Updated] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5307:
--

Attachment: LUCENE-5307.patch

Here is the patch. Thanks for the test, I was about to write a similar one!







[jira] [Updated] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5307:
-

Attachment: LUCENE-5307-test.patch

Here is a test that fails (feel free to not reuse it, this is just to 
demonstrate the problem).







[jira] [Commented] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805547#comment-13805547
 ] 

Adrien Grand commented on LUCENE-5307:
--

Thanks Uwe!







Re: New JVM bug on MacOSX ?!?

2013-10-25 Thread Rory O'Donnell Oracle, Dublin Ireland

Hi Uwe,

I am adding Sean Coffey to this thread; I am out next week until Friday, 
1st of November. Could you log a webbug (http://bugreport.sun.com/bugreport)? 
Sean will try to help move it along.

Rgds,Rory


On 25/10/2013 17:42, Uwe Schindler wrote:

Hi,

 From the hs_err file it looks like this one happens only if an I/O error occurs 
while reading from a socket (the HTTP socket). The native socket read code tries 
to throw an IOException and creates the message string from the "errno" macro of 
the libc, and it segfaults while doing this. So it could be a simple network 
stack problem on OSX where some errno/LastError has no valid message.

This is why we might not see this bug on other MacOSX machines: this machine is 
not the fastest one, and we know that read errors occur much more often there 
(this is why Solr tests fail on this OSX machine). So it might affect other OSX 
machines too, if you make them very busy with a parallel SETI@home or whatever :-)

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, October 25, 2013 6:15 PM
To: dev@lucene.apache.org
Cc: rory.odonn...@oracle.com
Subject: New JVM bug on MacOSX ?!?

Hi Rory, hi Lucene/Solr committers,

this is a JVM crash with 7u45 on MacOSX, but also happened with u40. I set
the broken build to be sticky, the hs_err file is also available (was archived 
as
build artifact):

Build:
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/

hs_err:
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-
MacOSX/946/artifact/solr/build/solr-core/test/J0/hs_err_pid185.log

SIGSEGV happens here:
[libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3

We have seen this bug on Oct 10, too. It's also mentioned in this issue:
https://issues.apache.org/jira/browse/SOLR-4593
It only reproduces on MacOSX computers (sometimes). It happens in line with
other malloc/free bugs: MacOSX crashes the JVM quite often, complaining about
double free() on pointers malloc'ed before. MacOSX seems to be very picky
about double-freeing pointers in its libc. If I see a new failure from that
one, I will post it, too. The double free() one reproduces from time to time
on all MacOSX machines. This one above was only seen on this virtual machine
(VirtualBox 4.2.18 and 4.3.0, stock OSX 10.8.5 EFI64 guest) up to now.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-Original Message-
From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
Sent: Friday, October 25, 2013 2:31 PM
To: dev@lucene.apache.org
Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build #
946 - Still Failing!

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -

XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9765 lines...]
[junit4] JVM J0: stdout was not empty, see:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-
core/test/temp/junit4-J0-20131025_122106_052.sysout
[junit4] >>> JVM J0: stdout (verbatim) 
[junit4] #
[junit4] # A fatal error has been detected by the Java Runtime
Environment:
[junit4] #
[junit4] #  SIGSEGV (0xb) at pc=0x00010feeba2b, pid=185, tid=134147
[junit4] #
[junit4] # JRE version: Java(TM) SE Runtime Environment
(7.0_45-b18) (build 1.7.0_45-b18)
[junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08
mixed mode bsd-amd64 )
[junit4] # Problematic frame:
[junit4] # C  [libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3
[junit4] #
[junit4] # Failed to write core dump. Core dumps have been
disabled. To enable core dumping, try "ulimit -c unlimited" before starting

Java again

[junit4] #
[junit4] # An error report file with more information is saved as:
[junit4] # /Users/jenkins/workspace/Lucene-Solr-trunk-
MacOSX/solr/build/solr-core/test/J0/hs_err_pid185.log
[junit4] [thread 139779 also had an error]
[junit4] #
[junit4] # If you would like to submit a bug report, please visit:
[junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
[junit4] # The crash happened outside the Java Virtual Machine in
native code.
[junit4] # See problematic frame for where to report the bug.
[junit4] #
[junit4] <<< JVM J0: EOF 

[...truncated 1 lines...]
[junit4] ERROR: JVM J0 ended with an exception, command line:
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bi
n/ java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -
XX:+HeapDumpOnOutOfMemoryError -
XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-
MacOSX/heapdumps -Dtests.prefix=tests -

Dtests.seed=F31293EB573940CD -

Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false
- Dtests.codec=random -Dtests.postingsformat=random -
Dtests.docvaluesformat=random -Dte

[jira] [Commented] (LUCENE-5294) Suggester Dictionary implementation that takes expressions as term weights

2013-10-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805519#comment-13805519
 ] 

ASF subversion and git services commented on LUCENE-5294:
-

Commit 1535798 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1535798 ]

LUCENE-5294: IntelliJ config (merged trunk r1535797)

> Suggester Dictionary implementation that takes expressions as term weights
> --
>
> Key: LUCENE-5294
> URL: https://issues.apache.org/jira/browse/LUCENE-5294
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Areek Zillur
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5294.patch
>
>
> It would be nice to have a Suggester Dictionary implementation that could 
> compute the weights of the terms consumed by the suggester based on a 
> user-defined expression (using Lucene's expression module).
> It could be an extension of the existing DocumentDictionary (which takes 
> terms, weights and (optionally) payloads from the stored documents in the 
> index). The only exception being that instead of taking the weights for the 
> terms from the specified weight fields, it could compute the weights using a 
> user-defined expression that uses one or more NumericDocValuesField from the 
> document.
> Example:
>   let the document have
>  - product_id
>  - product_name
>  - product_popularity
>  - product_profit
>   Then this implementation could be used with an expression of 
> "0.2*product_popularity + 0.8*product_profit" to determine the weights of the 
> terms for the corresponding documents (optionally along with a payload 
> (product_id))
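The weight computation the description proposes can be illustrated in plain Java. The sketch below hard-codes the example expression "0.2*product_popularity + 0.8*product_profit" as a method; the real feature would instead compile the expression with Lucene's expressions module and read the values from NumericDocValuesField per document. The class and method names here are hypothetical stand-ins for illustration only:

```java
// Sketch: compute suggester term weights from per-document numeric fields,
// mimicking the example expression "0.2*product_popularity + 0.8*product_profit".
// Plain-Java stand-in; the proposed feature would evaluate a compiled
// expression over NumericDocValuesField values instead.
public class ExpressionWeightSketch {
    // Hypothetical weight function mirroring the example expression.
    static long weight(long popularity, long profit) {
        return (long) (0.2 * popularity + 0.8 * profit);
    }

    public static void main(String[] args) {
        // Two example "documents" with (product_popularity, product_profit) values.
        long[][] docs = { {100, 50}, {10, 200} };
        for (long[] d : docs) {
            System.out.println(weight(d[0], d[1]));
        }
    }
}
```

Each document's suggester weight is then a single long derived from its doc values, exactly what DocumentDictionary expects in place of a stored weight field.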



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5294) Suggester Dictionary implementation that takes expressions as term weights

2013-10-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805513#comment-13805513
 ] 

ASF subversion and git services commented on LUCENE-5294:
-

Commit 1535797 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1535797 ]

LUCENE-5294: IntelliJ config

> Suggester Dictionary implementation that takes expressions as term weights
> --
>
> Key: LUCENE-5294
> URL: https://issues.apache.org/jira/browse/LUCENE-5294
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Areek Zillur
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5294.patch
>
>
> It would be nice to have a Suggester Dictionary implementation that could 
> compute the weights of the terms consumed by the suggester based on a 
> user-defined expression (using Lucene's expression module).
> It could be an extension of the existing DocumentDictionary (which takes 
> terms, weights and (optionally) payloads from the stored documents in the 
> index). The only exception being that instead of taking the weights for the 
> terms from the specified weight fields, it could compute the weights using a 
> user-defined expression that uses one or more NumericDocValuesField from the 
> document.
> Example:
>   let the document have
>  - product_id
>  - product_name
>  - product_popularity
>  - product_profit
>   Then this implementation could be used with an expression of 
> "0.2*product_popularity + 0.8*product_profit" to determine the weights of the 
> terms for the corresponding documents (optionally along with a payload 
> (product_id))






[jira] [Commented] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805506#comment-13805506
 ] 

Uwe Schindler commented on LUCENE-5307:
---

Hi Adrien,
This is actually my fault. The following fix would be most correct and keeps 
the optimization working for the not really useful combination: 
ConstantScoreQuery(QueryWrapperFilter(Query)).

Lots of old code uses this pattern; instead it should wrap the Query directly 
rather than creating a filter wrapper.

I would fix:
- change the instanceof check to a query != null check and assert that it is a Scorer
- add another special case in rewrite to prevent the old-style stupidity: 
rewrite the above combination to a simple ConstantScoreQuery with the Query 
that was wrapped by the filter, ignoring the inner boost.

I'll upload a patch later.
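The proposed rewrite, ConstantScoreQuery(QueryWrapperFilter(q)) to ConstantScoreQuery(q), can be sketched with simplified stand-in classes. These stubs only mirror the shape of Lucene's classes (boosts and the real Query API are omitted); they are not the actual implementation:

```java
// Simplified stand-ins illustrating the proposed rewrite:
// ConstantScoreQuery(QueryWrapperFilter(q)) -> ConstantScoreQuery(q),
// unwrapping the filter so the fragile Scorer/DocIdSetIterator special
// case is no longer needed. Stub classes for illustration, not Lucene's.
public class RewriteSketch {
    static class Query {}
    static class QueryWrapperFilter {
        final Query q;
        QueryWrapperFilter(Query q) { this.q = q; }
    }
    static class ConstantScoreQuery extends Query {
        final Query query;               // non-null when wrapping a query directly
        final QueryWrapperFilter filter; // non-null when wrapping a filter
        ConstantScoreQuery(Query q) { this.query = q; this.filter = null; }
        ConstantScoreQuery(QueryWrapperFilter f) { this.query = null; this.filter = f; }
    }

    // Rewrite the legacy combination to wrap the inner query directly,
    // ignoring any inner boost (as proposed above).
    static ConstantScoreQuery rewrite(ConstantScoreQuery csq) {
        if (csq.filter != null) {
            return new ConstantScoreQuery(csq.filter.q); // unwrap the filter
        }
        return csq; // already in the preferred form
    }

    public static void main(String[] args) {
        Query inner = new Query();
        ConstantScoreQuery legacy = new ConstantScoreQuery(new QueryWrapperFilter(inner));
        System.out.println(rewrite(legacy).query == inner);
    }
}
```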

> Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
> top of a Filter
> ---
>
> Key: LUCENE-5307
> URL: https://issues.apache.org/jira/browse/LUCENE-5307
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Uwe Schindler
>Priority: Minor
>
> {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
> will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
> This is a problem when {{ConstantScoreQuery}} is used on top of a 
> {{QueryWrapperFilter}}:
>  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
> which documents to collect.
>  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
> {{topScorer == false}} so that {{nextDoc/advance}} are supported.
>  3. But then {{ConstantScorer.score(Collector)}} has the following 
> optimization:
> {code}
> // this optimization allows out of order scoring as top scorer!
> @Override
> public void score(Collector collector) throws IOException {
>   if (docIdSetIterator instanceof Scorer) {
> ((Scorer) docIdSetIterator).score(wrapCollector(collector));
>   } else {
> super.score(collector);
>   }
> }
> {code}
> So the filter iterator is a scorer which was created with {{topScorer = 
> false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
> which is illegal. (I found this out because AssertingSearcher has some checks 
> to make sure Scorers are used according to the value of topScorer.)
> I can imagine several fixes, including:
>  - removing this optimization when working on top of a filter
>  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
> {{topScorer == false}}
> but I'm not sure which one is the best one. What do you think?






[jira] [Assigned] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-5307:
-

Assignee: Uwe Schindler  (was: Adrien Grand)

> Inconsistency between Weight.scorer documentation and ConstantScoreQuery on 
> top of a Filter
> ---
>
> Key: LUCENE-5307
> URL: https://issues.apache.org/jira/browse/LUCENE-5307
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Uwe Schindler
>Priority: Minor
>
> {{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} 
> will be called and that otherwise {{Scorer.nextDoc/advance}} will be called.
> This is a problem when {{ConstantScoreQuery}} is used on top of a 
> {{QueryWrapperFilter}}:
>  1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know 
> which documents to collect.
>  2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
> {{topScorer == false}} so that {{nextDoc/advance}} are supported.
>  3. But then {{ConstantScorer.score(Collector)}} has the following 
> optimization:
> {code}
> // this optimization allows out of order scoring as top scorer!
> @Override
> public void score(Collector collector) throws IOException {
>   if (docIdSetIterator instanceof Scorer) {
> ((Scorer) docIdSetIterator).score(wrapCollector(collector));
>   } else {
> super.score(collector);
>   }
> }
> {code}
> So the filter iterator is a scorer which was created with {{topScorer = 
> false}} but {{ParentScorer}} ends up using its {{score(Collector)}} method, 
> which is illegal. (I found this out because AssertingSearcher has some checks 
> to make sure Scorers are used according to the value of topScorer.)
> I can imagine several fixes, including:
>  - removing this optimization when working on top of a filter
>  - relaxing Weight documentation to allow for using {{score(Collector)}} when 
> {{topScorer == false}}
> but I'm not sure which one is the best one. What do you think?






[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-10-25 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805485#comment-13805485
 ] 

Kranti Parisa edited comment on SOLR-4787 at 10/25/13 5:29 PM:
---

I have recently extended the hjoin further to support multiple FQs separated by 
comma (,)

{code}
/masterCore/select?q=*:*&fq=({!hjoin fromIndex=ACore from=parentid to=id v=$aQ 
fq=$BJoinQ,$AlocalFQ})&aQ=(f1:false)&BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc)&AlocalFQ=(fieldName:value)
{code}

This will allow using the filter caches for multiple nested queries while using 
the hjoin, similar to how Solr supports multiple FQ params within the same request.

Any feedback on the syntax? Does comma-separated FQs (e.g. 
*fq=$BJoinQ,$AlocalFQ*) sound OK?


was (Author: krantiparisa):
I have recently extended the hjoin further to support multiple FQs separated by 
comma (,)

{code}
/masterCore/select?q=*:*&fq=({!hjoin fromIndex=ACore from=parentid to=id v=$aQ 
*fq=$BJoinQ,$AlocalFQ*})&aQ=(f1:false)&BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc)&AlocalFQ=(fieldName:value)
{code}

This will allow using the filter caches for multiple nested queries while using 
the hjoin, similar to how Solr supports multiple FQ params within the same request.

Any feedback on the syntax? Does comma-separated FQs sound OK?

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.6
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex, then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the

[jira] [Commented] (SOLR-4787) Join Contrib

2013-10-25 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805485#comment-13805485
 ] 

Kranti Parisa commented on SOLR-4787:
-

I have recently extended the hjoin further to support multiple FQs separated by 
comma (,)

{code}
/masterCore/select?q=*:*&fq=({!hjoin fromIndex=ACore from=parentid to=id v=$aQ 
*fq=$BJoinQ,$AlocalFQ*})&aQ=(f1:false)&BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc)&AlocalFQ=(fieldName:value)
{code}

This will allow using the filter caches for multiple nested queries while using 
the hjoin, similar to how Solr supports multiple FQ params within the same request.

Any feedback on the syntax? Does comma-separated FQs sound OK?
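The hash-set join at the core of hjoin, as described in the issue below (collect the long join keys of the fromIndex results into a set, then keep only main-query docs whose "to" key is in that set), can be sketched with plain Java collections. This is an illustration only, not the plugin code; lists of longs stand in for the two Solr cores:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the hash-set join idea behind hjoin: gather the "from" keys
// of the matching docs in one core into a set, then keep only the docs
// in the main core whose "to" key is in that set.
public class HashSetJoinSketch {
    static List<Long> join(List<Long> fromKeys, List<Long> mainToKeys) {
        // Sizing the set up front avoids rehashing, which is what the
        // "size" local parameter is for in the description below.
        Set<Long> keys = new HashSet<>(fromKeys.size() * 2);
        keys.addAll(fromKeys);
        List<Long> result = new ArrayList<>();
        for (Long to : mainToKeys) {
            if (keys.contains(to)) {
                result.add(to); // main-query doc joins to a fromIndex doc
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> from = Arrays.asList(1L, 3L, 5L); // keys from the fromIndex results
        List<Long> main = Arrays.asList(1L, 2L, 3L, 4L); // "to" keys of main-query docs
        System.out.println(join(from, main));
    }
}
```

The memory cost the description mentions comes from holding all fromIndex keys in the set at once, which is also why the join scales with the fromIndex result size.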

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.6
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex, then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will

[jira] [Created] (LUCENE-5307) Inconsistency between Weight.scorer documentation and ConstantScoreQuery on top of a Filter

2013-10-25 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-5307:


 Summary: Inconsistency between Weight.scorer documentation and 
ConstantScoreQuery on top of a Filter
 Key: LUCENE-5307
 URL: https://issues.apache.org/jira/browse/LUCENE-5307
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor


{{Weight.scorer}} states that if {{topScorer == true}}, {{Scorer.collect}} will 
be called and that otherwise {{Scorer.nextDoc/advance}} will be called.

This is a problem when {{ConstantScoreQuery}} is used on top of a 
{{QueryWrapperFilter}}:
 1. {{ConstantScoreWeight}}  calls {{getDocIdSet}} on the filter to know which 
documents to collect.
 2. {{QueryWrapperFilter.getDocIdSet}} returns a {{Scorer}} created with 
{{topScorer == false}} so that {{nextDoc/advance}} are supported.
 3. But then {{ConstantScorer.score(Collector)}} has the following optimization:
{code}
// this optimization allows out of order scoring as top scorer!
@Override
public void score(Collector collector) throws IOException {
  if (docIdSetIterator instanceof Scorer) {
((Scorer) docIdSetIterator).score(wrapCollector(collector));
  } else {
super.score(collector);
  }
}
{code}

So the filter iterator is a scorer which was created with {{topScorer = false}} 
but {{ParentScorer}} ends up using its {{score(Collector)}} method, which is 
illegal. (I found this out because AssertingSearcher has some checks to make 
sure Scorers are used according to the value of topScorer.)

I can imagine several fixes, including:
 - removing this optimization when working on top of a filter
 - relaxing Weight documentation to allow for using {{score(Collector)}} when 
{{topScorer == false}}

but I'm not sure which one is the best one. What do you think?
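The first option, removing the optimization, would mean always taking the safe fallback path: drive the iterator with nextDoc() and hand every hit to the collector with the constant score, regardless of whether the iterator happens to also be a Scorer. A minimal stand-in sketch (these are not Lucene's classes; -1 stands in for NO_MORE_DOCS):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the safe fallback path: iterate with nextDoc() and collect
// each doc with a constant score, instead of delegating to
// score(Collector) on an iterator that was created with topScorer == false.
public class ConstantScoreSketch {
    interface DocIdSetIterator { int nextDoc(); } // returns -1 when exhausted
    interface Collector { void collect(int doc, float score); }

    // Advance the iterator and give every hit the same constant score.
    static void scoreAll(DocIdSetIterator it, float constScore, Collector c) {
        for (int doc = it.nextDoc(); doc != -1; doc = it.nextDoc()) {
            c.collect(doc, constScore);
        }
    }

    // Helper: collect "doc:score" strings from a fixed list of doc ids.
    static List<String> collectAll(int[] docs, float constScore) {
        DocIdSetIterator it = new DocIdSetIterator() {
            int i = 0;
            public int nextDoc() { return i < docs.length ? docs[i++] : -1; }
        };
        List<String> out = new ArrayList<>();
        scoreAll(it, constScore, (doc, score) -> out.add(doc + ":" + score));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectAll(new int[]{2, 5, 9}, 1.0f));
    }
}
```

This path never calls score(Collector) on the wrapped iterator, so the topScorer contract cannot be violated; the trade-off is losing the out-of-order delegation the optimization provided.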






RE: New JVM bug on MacOSX ?!?

2013-10-25 Thread Uwe Schindler
Hi,

From the hs_err file it looks like this one happens only if an I/O error 
occurs while reading from a socket (the HTTP socket). The native socket read 
code tries to throw an IOException, creates the message string from the 
"errno" macro of the libc, and segfaults while doing this. So it could be a 
simple network stack problem on OSX where some errno/LastError has no valid 
message.

This is why we might not see this bug on other MacOSX machines, because this 
machine is not the fastest one and we know that read errors occur much more 
often (this is why Solr tests fail on this OSX machine). So it might affect 
other OSX machines, if you make them very busy by a parallel SETI@home or 
whatever :-)

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Friday, October 25, 2013 6:15 PM
> To: dev@lucene.apache.org
> Cc: rory.odonn...@oracle.com
> Subject: New JVM bug on MacOSX ?!?
> 
> Hi Rory, hi Lucene/Solr committers,
> 
> this is a JVM crash with 7u45 on MacOSX, but also happened with u40. I set
> the broken build to be sticky, the hs_err file is also available (was 
> archived as
> build artifact):
> 
> Build:
> http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/
> 
> hs_err:
> http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-
> MacOSX/946/artifact/solr/build/solr-core/test/J0/hs_err_pid185.log
> 
> SIGSEGV happens here:
> [libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3
> 
> We have seen this bug on Oct 10, too. It's also mentioned in this issue:
> https://issues.apache.org/jira/browse/SOLR-4593
> It only reproduces on MacOSX computers (sometimes). It happens in line with
> other malloc/free bugs: MacOSX crashes the JVM quite often with
> complaining about double free() on pointers malloc'ed before. MacOSX
> seems to be very picky about double freeing pointers in their libc. If I have 
> a
> new failure about that one, I will post it, too. The double free() one
> reproduces from time to time on all MacOSX machines. This one above was
> only seen on this virtual machine (VirtualBOX 4.2.18 and 4.3.0, stock OSX
> 10.8.5 EFI64 Guest) up to now.
> 
> Uwe
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> 
> > -Original Message-
> > From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
> > Sent: Friday, October 25, 2013 2:31 PM
> > To: dev@lucene.apache.org
> > Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build #
> > 946 - Still Failing!
> >
> > Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/
> > Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -
> XX:+UseConcMarkSweepGC
> >
> > All tests passed
> >
> > Build Log:
> > [...truncated 9765 lines...]
> >[junit4] JVM J0: stdout was not empty, see:
> > /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-
> > core/test/temp/junit4-J0-20131025_122106_052.sysout
> >[junit4] >>> JVM J0: stdout (verbatim) 
> >[junit4] #
> >[junit4] # A fatal error has been detected by the Java Runtime
> > Environment:
> >[junit4] #
> >[junit4] #  SIGSEGV (0xb) at pc=0x00010feeba2b, pid=185, tid=134147
> >[junit4] #
> >[junit4] # JRE version: Java(TM) SE Runtime Environment
> > (7.0_45-b18) (build 1.7.0_45-b18)
> >[junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08
> > mixed mode bsd-amd64 )
> >[junit4] # Problematic frame:
> >[junit4] # C  [libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3
> >[junit4] #
> >[junit4] # Failed to write core dump. Core dumps have been
> > disabled. To enable core dumping, try "ulimit -c unlimited" before starting
> Java again
> >[junit4] #
> >[junit4] # An error report file with more information is saved as:
> >[junit4] # /Users/jenkins/workspace/Lucene-Solr-trunk-
> > MacOSX/solr/build/solr-core/test/J0/hs_err_pid185.log
> >[junit4] [thread 139779 also had an error]
> >[junit4] #
> >[junit4] # If you would like to submit a bug report, please visit:
> >[junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
> >[junit4] # The crash happened outside the Java Virtual Machine in
> > native code.
> >[junit4] # See problematic frame for where to report the bug.
> >[junit4] #
> >[junit4] <<< JVM J0: EOF 
> >
> > [...truncated 1 lines...]
> >[junit4] ERROR: JVM J0 ended with an exception, command line:
> > /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bi
> > n/ java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -
> > XX:+HeapDumpOnOutOfMemoryError -
> > XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-
> > MacOSX/heapdumps -Dtests.prefix=tests -
> Dtests.seed=F31293EB573940CD -
> > Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false
> > - Dtests.codec=random -Dtests.postingsformat=random -
>

New JVM bug on MacOSX ?!?

2013-10-25 Thread Uwe Schindler
Hi Rory, hi Lucene/Solr committers,

this is a JVM crash with 7u45 on MacOSX, but also happened with u40. I set the 
broken build to be sticky, the hs_err file is also available (was archived as 
build artifact):

Build:
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/

hs_err:
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/artifact/solr/build/solr-core/test/J0/hs_err_pid185.log

SIGSEGV happens here:
[libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3

We have seen this bug on Oct 10, too. It's also mentioned in this issue: 
https://issues.apache.org/jira/browse/SOLR-4593
It only reproduces on MacOSX computers (sometimes). It happens in line with 
other malloc/free bugs: MacOSX crashes the JVM quite often with complaining 
about double free() on pointers malloc'ed before. MacOSX seems to be very picky 
about double freeing pointers in their libc. If I have a new failure about that 
one, I will post it, too. The double free() one reproduces from time to time on 
all MacOSX machines. This one above was only seen on this virtual machine 
(VirtualBOX 4.2.18 and 4.3.0, stock OSX 10.8.5 EFI64 Guest) up to now.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
> Sent: Friday, October 25, 2013 2:31 PM
> To: dev@lucene.apache.org
> Subject: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 946 -
> Still Failing!
> 
> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/
> Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC
> 
> All tests passed
> 
> Build Log:
> [...truncated 9765 lines...]
>[junit4] JVM J0: stdout was not empty, see:
> /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-
> core/test/temp/junit4-J0-20131025_122106_052.sysout
>[junit4] >>> JVM J0: stdout (verbatim) 
>[junit4] #
>[junit4] # A fatal error has been detected by the Java Runtime
> Environment:
>[junit4] #
>[junit4] #  SIGSEGV (0xb) at pc=0x00010feeba2b, pid=185, tid=134147
>[junit4] #
>[junit4] # JRE version: Java(TM) SE Runtime Environment (7.0_45-b18)
> (build 1.7.0_45-b18)
>[junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed
> mode bsd-amd64 )
>[junit4] # Problematic frame:
>[junit4] # C  [libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3
>[junit4] #
>[junit4] # Failed to write core dump. Core dumps have been disabled. To
> enable core dumping, try "ulimit -c unlimited" before starting Java again
>[junit4] #
>[junit4] # An error report file with more information is saved as:
>[junit4] # /Users/jenkins/workspace/Lucene-Solr-trunk-
> MacOSX/solr/build/solr-core/test/J0/hs_err_pid185.log
>[junit4] [thread 139779 also had an error]
>[junit4] #
>[junit4] # If you would like to submit a bug report, please visit:
>[junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
>[junit4] # The crash happened outside the Java Virtual Machine in native
> code.
>[junit4] # See problematic frame for where to report the bug.
>[junit4] #
>[junit4] <<< JVM J0: EOF 
> 
> [...truncated 1 lines...]
>[junit4] ERROR: JVM J0 ended with an exception, command line:
> /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/
> java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -
> XX:+HeapDumpOnOutOfMemoryError -
> XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-
> MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=F31293EB573940CD -
> Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -
> Dtests.codec=random -Dtests.postingsformat=random -
> Dtests.docvaluesformat=random -Dtests.locale=random -
> Dtests.timezone=random -Dtests.directory=random -
> Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -
> Dtests.cleanthreads=perClass -
> Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-
> MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -
> Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -
> Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -
> Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-
> MacOSX/solr/build/solr-core/test/temp -
> Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-
> MacOSX/lucene/build/clover/db -
> Djava.security.manager=org.apache.lucene.util.TestSecurityManager -
> Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-
> MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -
> Djetty.testMode=1 -Djetty.insecurerandom=1 -
> Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -
> Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 -
> classpath /Users/jenkins/workspace/Lucene-Solr-trunk-
> MacOSX/solr/build/solr-
> core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-

[jira] [Updated] (LUCENE-5285) FastVectorHighlighter copies segments scores when splitting segments across multi-valued fields

2013-10-25 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5285:


Attachment: LUCENE-5285.patch

New patch fixes my broken WeightedFragList change and expands  
WeightedFragListBuilderTest to catch the broken implementation.

> FastVectorHighlighter copies segments scores when splitting segments across 
> multi-valued fields
> ---
>
> Key: LUCENE-5285
> URL: https://issues.apache.org/jira/browse/LUCENE-5285
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Priority: Minor
> Attachments: LUCENE-5285.patch, LUCENE-5285.patch
>
>
> FastVectorHighlighter copies segments scores when splitting segments across 
> multi-valued fields.  This is only a problem when you want to sort the 
> fragments by score. Technically BaseFragmentsBuilder (line 261 in my copy of 
> the source) does the copying.
> Rather than copying the score I _think_ it'd be more right to pull that 
> copying logic into a protected method that child classes (such as 
> ScoreOrderFragmentsBuilder) can override to do more intelligent things.  
> Exactly what that means isn't clear to me at the moment.
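The refactoring suggested above, pulling the score-copying into a protected method that subclasses such as ScoreOrderFragmentsBuilder can override, is the classic template-method hook. A minimal sketch with hypothetical stand-in names (not the actual FastVectorHighlighter classes):

```java
// Template-method sketch of the suggested refactoring: the base builder
// delegates the score decision for a split fragment to a protected hook,
// and a score-ordering subclass overrides it. Names are hypothetical
// stand-ins, not the real BaseFragmentsBuilder API.
public class FragmentScoreHook {
    static class BaseFragmentsBuilder {
        // Hook: default keeps the current behavior (copy the score as-is).
        protected float scoreForSplitFragment(float originalScore) {
            return originalScore;
        }
    }

    static class ScoreOrderFragmentsBuilder extends BaseFragmentsBuilder {
        // A subclass could, e.g., discount split-off fragments so copies
        // don't outrank the original when fragments are sorted by score.
        @Override
        protected float scoreForSplitFragment(float originalScore) {
            return originalScore / 2;
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseFragmentsBuilder().scoreForSplitFragment(4f));
        System.out.println(new ScoreOrderFragmentsBuilder().scoreForSplitFragment(4f));
    }
}
```

The point of the hook is that the base class keeps backward-compatible behavior while the "more intelligent things" live entirely in the override.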






[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b109) - Build # 8036 - Failure!

2013-10-25 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/8036/
Java: 32bit/jdk1.8.0-ea-b109 -server -XX:+UseSerialGC

1 tests failed.
REGRESSION:  org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT

Error Message:
expected:<3> but was:<2>

Stack Trace:
java.lang.AssertionError: expected:<3> but was:<2>
at 
__randomizedtesting.SeedInfo.seed([8EB9DFDAC3456D7E:3B3FBE5D7C84DF8A]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.core.TestNonNRTOpen.assertNotNRT(TestNonNRTOpen.java:133)
at 
org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT(TestNonNRTOpen.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:491)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$Statemen

[jira] [Resolved] (SOLR-5387) Multi-Term analyser not working

2013-10-25 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5387.
--

Resolution: Not A Problem

Please raise questions like this on the users' list before opening a JIRA, so we 
can see whether it's a real bug or just a misunderstanding.

In this case, the code is functioning as expected. Multiterm analysis chains 
may NOT break an incoming token into more than one token. Here, anything with 
an @ symbol is broken into more than one term by the StandardTokenizer, which 
is an illegal condition for multiterm queries.
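The constraint described above can be sketched as follows. This is a simplified, hypothetical stand-in for the check performed by org.apache.solr.schema.TextField.analyzeMultiTerm; the toy analyze method splits on '@' and whitespace only to mimic the behavior that trips the check in this report, and is not Lucene's StandardTokenizer.

```java
public class MultiTermCheck {
    // Stand-in "analyzer" that mimics the relevant behavior here:
    // it splits on '@' and whitespace. This is a simplification for
    // illustration, not Lucene's actual tokenizer.
    static String[] analyze(String text) {
        return text.split("[@\\s]+");
    }

    // Simplified version of the invariant enforced by Solr's
    // TextField.analyzeMultiTerm: the multiterm analysis chain must yield
    // exactly one token, because a wildcard/prefix query over several
    // tokens is ill-defined.
    static String analyzeMultiTerm(String text) {
        String[] tokens = analyze(text);
        if (tokens.length != 1) {
            throw new IllegalArgumentException(
                "analyzer returned too many terms for multiTerm term: " + text);
        }
        return tokens[0];
    }

    public static void main(String[] args) {
        // A plain prefix term passes the check...
        if (!analyzeMultiTerm("european").equals("european"))
            throw new AssertionError("single token should pass");
        // ...but a term containing '@' splits into two tokens and is
        // rejected, matching the exception reported in SOLR-5387.
        boolean rejected = false;
        try {
            analyzeMultiTerm("european@unio");
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        if (!rejected)
            throw new AssertionError("multi-token term should be rejected");
        System.out.println("ok");
    }
}
```

In practice this means a field used for prefix/wildcard queries needs an analysis chain (or multiterm chain) that keeps such values as a single token, e.g. a keyword-style tokenizer.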

> Multi-Term analyser not working
> ---
>
> Key: SOLR-5387
> URL: https://issues.apache.org/jira/browse/SOLR-5387
> Project: Solr
>  Issue Type: Bug
>Reporter: Claire Chan
> Fix For: 4.5
>
>
> I tried the solr 4.5 example schema, modified by changing a field, say 'manu', 
> to the following fieldType:
>  positionIncrementGap="100">
> ...
>   
> 
> 
>   
>   
> 
> 
>   
> 
> After indexing a document with manu value europ...@union.de, 
> the following search throws an exception:
> manu:(european@unio*)
> The exception:
> analyzer returned too many terms for multiTerm term: european@unio
> org.apache.solr.common.SolrException: analyzer returned too many terms for 
> multiTerm term: european@unio
> at 
> org.apache.solr.schema.TextField.analyzeMultiTerm(TextField.java:157)
> at 
> org.apache.solr.parser.SolrQueryParserBase.analyzeIfMultitermTermText(SolrQueryParserBase.java:936)
> at 
> org.apache.solr.parser.SolrQueryParserBase.getPrefixQuery(SolrQueryParserBase.java:981)
> at 
> org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:746)
> at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
> at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
> at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
> I thought I did exactly as instructed by various multiterm blogs and wiki 
> pages. So please take a look to see if this is a bug.






[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 946 - Still Failing!

2013-10-25 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/946/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9765 lines...]
   [junit4] JVM J0: stdout was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20131025_122106_052.sysout
   [junit4] >>> JVM J0: stdout (verbatim) 
   [junit4] #
   [junit4] # A fatal error has been detected by the Java Runtime Environment:
   [junit4] #
   [junit4] #  SIGSEGV (0xb) at pc=0x00010feeba2b, pid=185, tid=134147
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (7.0_45-b18) (build 
1.7.0_45-b18)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode 
bsd-amd64 )
   [junit4] # Problematic frame:
   [junit4] # C  [libjava.dylib+0x9a2b]  JNU_NewStringPlatform+0x1d3
   [junit4] #
   [junit4] # Failed to write core dump. Core dumps have been disabled. To 
enable core dumping, try "ulimit -c unlimited" before starting Java again
   [junit4] #
   [junit4] # An error report file with more information is saved as:
   [junit4] # 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/J0/hs_err_pid185.log
   [junit4] [thread 139779 also had an error]
   [junit4] #
   [junit4] # If you would like to submit a bug report, please visit:
   [junit4] #   http://bugreport.sun.com/bugreport/crash.jsp
   [junit4] # The crash happened outside the Java Virtual Machine in native 
code.
   [junit4] # See problematic frame for where to report the bug.
   [junit4] #
   [junit4] <<< JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=F31293EB573940CD -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 
-classpath 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.0.10.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.ja

[jira] [Created] (SOLR-5393) Solr Cloud Polygon insert over non leader shard produces error

2013-10-25 Thread Mahledivic Stronza (JIRA)
Mahledivic Stronza created SOLR-5393:


 Summary: Solr Cloud Polygon insert over non leader shard produces 
error
 Key: SOLR-5393
 URL: https://issues.apache.org/jira/browse/SOLR-5393
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5
Reporter: Mahledivic Stronza


We get the following error when trying to load polygon data of type 
solr.SpatialRecursivePrefixTreeFieldType into the Solr cloud.

When we insert the polygon data directly into the leader shard, everything 
(including replication) works fine, but we get the following error whenever we 
try to insert the data via a non-leader shard:

ERROR SolrCmdDistributor forwarding update to 
http://192.168.14.68:7460/geodata/ failed - retrying 

Without polygons we can insert into any shard (leader or replica) without 
problems.

Code to reproduce:

SolrInputDocument input = new SolrInputDocument();
input.setField("id", "test");
String poly = "POLYGON((13.994591177713277 51.002523790166705, 
13.999693765863235 51.002226426778684, 14.001913786758868 51.003267620336686, 
14.009460032984512 51.00507616830811, 14.014352741862453 51.005397015805684, 
14.015753050939129 51.006165440252694, 14.01713711596325 51.006907316716564, 
14.018964454037809 51.006716998902526, 14.021343903143388 51.006464920150016, 
14.02502734579799 51.00552477067982, 14.024643222840908 51.00516708013014, 
14.023995094883304 51.004529130427706, 14.02693059874708 51.00426911242828, 
14.029893872882688 51.00399916991763, 14.03384714224316 51.00248305935758, 
14.03569145591514 51.00176950378165, 14.037017581967358 51.00193640923721, 
14.039078347499816 51.00218906817236, 14.041385544697533 51.001182106955575, 
14.042961142804325 51.00050362709656, 14.045466411981872 51.00171549577248, 
14.047436055944628 51.00170955402223, 14.047501462719008 51.000698678987334, 
14.0512755199 51.00113313322361, 14.05199253668852 50.99931867381475, 
14.049908557072259 50.99764370517098, 14.048323570941262 50.99633182825351, 
14.048756259187261 50.995084559520684, 14.049083460379496 50.9941377646161, 
14.048559547400519 50.99251429812778, 14.048031545244314 50.990836907379865, 
14.049200640514442 50.988396160957755, 14.048752386129843 50.98645511274733, 
14.049899403692196 50.985032920490475, 14.050433093991956 50.982683593518296, 
14.049663453585083 50.98008576559324, 14.053447525957514 50.979934262898006, 
14.054101699616101 50.979905280314895, 14.054541876032507 50.979504496321006, 
14.056883382560684 50.97728906687162, 14.061461001521177 50.977626495499514, 
14.065739069155171 50.977036091326006, 14.070237468930674 50.976528847851064, 
14.072737969004711 50.9763890539482, 14.073435256940234 50.97580914575129, 
14.073656252079921 50.97497361816587, 14.073507842972424 50.97415844208817, 
14.073477076544268 50.97357386295706, 14.073497624562199 50.97328497403231, 
14.07353309846212 50.973004634793355, 14.073629355769953 50.97258730796079, 
14.074031133526418 50.97206150901968, 14.074173161574096 50.9718679773403, 
14.074953685738512 50.97125847625354, 14.076318122528281 50.970450862964476, 
14.077419495985641 50.968822576119045, 14.07938471092451 50.96785232072157, 
14.079935466625953 50.965376163972905, 14.075928740820487 50.965769620044185, 
14.076477656611141 50.96493301242208, 14.07715569856788 50.96392127776921, 
14.074518652583494 50.96468694014318, 14.070947374058479 50.965733442336116, 
14.070067128285528 50.964490320180744, 14.073638108045227 50.962885351295846, 
14.074800325552822 50.96204792730229, 14.07578512690655 50.96131503249083, 
14.072648032067532 50.96077183886253, 14.069990748256624 50.96108762107119, 
14.068779840845313 50.962034585317255, 14.067957389649735 50.96265434173164, 
14.067324872520413 50.96203416740502, 14.066893256314694 50.96161501555103, 
14.063746581910717 50.96094576617525, 14.060359780703521 50.96104946306251, 
14.05747465371602 50.96025493987205, 14.05373788209221 50.960071883898436, 
14.048310513394146 50.95997631668494, 14.048472867689442 50.96098025221684, 
14.048526758248931 50.96131190166739, 14.047891683599207 50.961214167562254, 
14.043110704367722 50.96043205625402, 14.04200268876 50.96254661748835, 
14.039611197301197 50.962610392016906, 14.039496995984026 50.962983189010785, 
14.039153009982101 50.96408360520823, 14.036527043440328 50.964253532621974, 
14.03671903618213 50.962842471755415, 14.037626765889286 50.962400512989845, 
14.039081773666089 50.96165364817774, 14.039871682812647 50.96060271915765, 
14.040057039055357 50.95948010265582, 14.037002440491422 50.9541537619, 
14.035134869565928 50.9605846093134, 14.033106236231959 50.96185328415876, 
14.03250868364643 50.962997407710056, 14.03011455632278 50.963214199168796, 
14.030856865387815 50.96219178096697, 14.0308016783549 50.96188314726897, 
14.02902553543847 50.96211457060064, 14.028183254277486 50.961264041726984, 
14.025263665203713 50.962073167844004, 14.021542191222775 5

[jira] [Updated] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Nguyen Manh Tien (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Manh Tien updated SOLR-5379:
---

Attachment: (was: synonym-expander.patch)

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nguyen Manh Tien
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.5.1, 4.6
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query by whitespace, so it 
> splits a multi-word term into separate terms before feeding them to the 
> synonym filter; the synonym filter therefore can't recognize the multi-word 
> term and expand it.
> - Second, if the synonym filter expands into multiple terms containing a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
> different numbers of words.
> For the first, we can quote all multi-word synonyms in the user query so that 
> the Lucene query parser doesn't split them. There is a related JIRA task: 
> https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate 
> BooleanQuery of SHOULD clauses containing multiple PhraseQueries when the 
> token stream has a multi-word synonym.
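The second proposal above can be sketched as follows. This is a minimal, hypothetical illustration of the resulting query shape only (single-word synonyms as term queries, multi-word synonyms as phrase queries, OR'ed together as SHOULD clauses); the real patch builds BooleanQuery/PhraseQuery objects inside SolrQueryParserBase rather than query strings, and the field/synonym values here are invented examples.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SynonymExpander {
    // Render the expansion of one term's synonym set in Lucene query
    // syntax: a single-word synonym stays a bare term query, a multi-word
    // synonym becomes a phrase query, and all alternatives are combined
    // with OR (i.e. BooleanQuery SHOULD clauses). This is exactly the
    // structure the issue proposes instead of a single MultiPhraseQuery,
    // which cannot mix alternatives of different word counts.
    static String expand(String field, List<String> synonyms) {
        return synonyms.stream()
            .map(s -> s.contains(" ")
                ? field + ":\"" + s + "\""   // multi-word -> phrase query
                : field + ":" + s)           // single word -> term query
            .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        // Hypothetical synonym set mixing one-, one-, and three-word entries.
        String q = expand("title", List.of("tv", "television", "cathode ray tube"));
        String expected =
            "(title:tv OR title:television OR title:\"cathode ray tube\")";
        if (!q.equals(expected)) throw new AssertionError(q);
        System.out.println(q);
    }
}
```

A BooleanQuery of SHOULD clauses scores each alternative independently, which is why it tolerates synonyms of different lengths where MultiPhraseQuery (which requires aligned positions) does not.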






[jira] [Updated] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Nguyen Manh Tien (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Manh Tien updated SOLR-5379:
---

Attachment: synonym-expander.patch

Patch checks for synonym terms

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nguyen Manh Tien
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.5.1, 4.6
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query by whitespace, so it 
> splits a multi-word term into separate terms before feeding them to the 
> synonym filter; the synonym filter therefore can't recognize the multi-word 
> term and expand it.
> - Second, if the synonym filter expands into multiple terms containing a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
> different numbers of words.
> For the first, we can quote all multi-word synonyms in the user query so that 
> the Lucene query parser doesn't split them. There is a related JIRA task: 
> https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate 
> BooleanQuery of SHOULD clauses containing multiple PhraseQueries when the 
> token stream has a multi-word synonym.






[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Nguyen Manh Tien (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805188#comment-13805188
 ] 

Nguyen Manh Tien commented on SOLR-5379:


Yes, it will emit PhraseQuery(Term(a), Term(b)).
We need an additional check to tokenize a term only when it is a synonym.
I will change the patch.

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nguyen Manh Tien
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.5.1, 4.6
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query by whitespace, so it 
> splits a multi-word term into separate terms before feeding them to the 
> synonym filter; the synonym filter therefore can't recognize the multi-word 
> term and expand it.
> - Second, if the synonym filter expands into multiple terms containing a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
> different numbers of words.
> For the first, we can quote all multi-word synonyms in the user query so that 
> the Lucene query parser doesn't split them. There is a related JIRA task: 
> https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate 
> BooleanQuery of SHOULD clauses containing multiple PhraseQueries when the 
> token stream has a multi-word synonym.






[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805179#comment-13805179
 ] 

Marco Wong edited comment on SOLR-5379 at 10/25/13 8:55 AM:


Excuse me, for the synonym-expander.patch: will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in the query-time 
analyzer that emits bigrams like Term(a b), and make my existing tokenization 
logic fail?


was (Author: marcowong):
Excuse me, for the synonym-expander.patch: will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in the query-time 
analyzer that emits bigrams like Term(a b), which would make my existing 
tokenization logic fail?

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nguyen Manh Tien
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.5.1, 4.6
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query by whitespace, so it 
> splits a multi-word term into separate terms before feeding them to the 
> synonym filter; the synonym filter therefore can't recognize the multi-word 
> term and expand it.
> - Second, if the synonym filter expands into multiple terms containing a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
> different numbers of words.
> For the first, we can quote all multi-word synonyms in the user query so that 
> the Lucene query parser doesn't split them. There is a related JIRA task: 
> https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate 
> BooleanQuery of SHOULD clauses containing multiple PhraseQueries when the 
> token stream has a multi-word synonym.






[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805179#comment-13805179
 ] 

Marco Wong edited comment on SOLR-5379 at 10/25/13 8:56 AM:


Excuse me, for the synonym-expander.patch: when I have a ShingleFilter in the 
query-time analyzer that emits bigram TermQueries like Term(a b), will the 
updated SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)), making my 
existing tokenization logic fail?


was (Author: marcowong):
Excuse me, for the synonym-expander.patch: will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in the query-time 
analyzer that emits bigrams like Term(a b), and make my existing tokenization 
logic fail?

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nguyen Manh Tien
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.5.1, 4.6
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query by whitespace, so it 
> splits a multi-word term into separate terms before feeding them to the 
> synonym filter; the synonym filter therefore can't recognize the multi-word 
> term and expand it.
> - Second, if the synonym filter expands into multiple terms containing a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
> different numbers of words.
> For the first, we can quote all multi-word synonyms in the user query so that 
> the Lucene query parser doesn't split them. There is a related JIRA task: 
> https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate 
> BooleanQuery of SHOULD clauses containing multiple PhraseQueries when the 
> token stream has a multi-word synonym.






[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805179#comment-13805179
 ] 

Marco Wong commented on SOLR-5379:
--

Excuse me, for the synonym-expander.patch: will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in the query-time 
analyzer that emits bigrams like Term(a b), which would make my existing 
tokenization logic fail?

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nguyen Manh Tien
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.5.1, 4.6
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to work with multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query by whitespace, so it 
> splits a multi-word term into separate terms before feeding them to the 
> synonym filter; the synonym filter therefore can't recognize the multi-word 
> term and expand it.
> - Second, if the synonym filter expands into multiple terms containing a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
> different numbers of words.
> For the first, we can quote all multi-word synonyms in the user query so that 
> the Lucene query parser doesn't split them. There is a related JIRA task: 
> https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate 
> BooleanQuery of SHOULD clauses containing multiple PhraseQueries when the 
> token stream has a multi-word synonym.






[jira] [Commented] (SOLR-5364) SolrCloud stops accepting updates

2013-10-25 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805172#comment-13805172
 ] 

Chris commented on SOLR-5364:
-

Mark, thanks for taking the time to test this.

It turns out it is a garbage collection issue. I was running the default Java 
args too; however, my machines have 128GB RAM and had been allocated a 20GB 
heap. It seems this really needs tuning with Java 7. Since updating my GC 
settings, I have not had any issues. For reference, these are the GC settings 
I'm running with:

JAVA_OPTS="$JAVA_OPTS -server -XX:NewRatio=1 -XX:SurvivorRatio=6 \
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
-XX:CMSIncrementalDutyCycleMin=0 \
-XX:CMSIncrementalDutyCycle=10 -XX:+CMSIncrementalPacing \
-XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC \
-XX:ConcGCThreads=10 \
-XX:ParallelGCThreads=10 \
-XX:MaxGCPauseMillis=3"

I've also set the heap to 12g and eden to 5g: -Xmx12g -Xms12g -Xmn5g


> SolrCloud stops accepting updates
> -
>
> Key: SOLR-5364
> URL: https://issues.apache.org/jira/browse/SOLR-5364
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4, 4.5, 4.6
>Reporter: Chris
>Priority: Blocker
>
> I'm attempting to import data into a SolrCloud cluster. After a certain 
> amount of time, the cluster stops accepting updates.
> I have tried numerous suggestions in IRC from Elyorag and others without 
> resolve.
> I have had this issue with 4.4, and understood there was a deadlock issue 
> fixed in 4.5, which hasn't resolved the issue, neither have the 4.6 snapshots.
> I've tried with Tomcat, various tomcat configuration changes to threading, 
> and with Jetty. Tried with various index merging configurations as I 
> initially thought there was a deadlock with concurrent merg scheduler, 
> however same issue with SerialMergeScheduler.
> The cluster stops accepting updates after some amount of time, this seems to 
> vary and is inconsistent. Sometimes I manage to index 400k docs, other times 
> ~1million . Querying  the cluster continues to work. I can reproduce the 
> issue consistently, and is currently blocking our transition to Solr.
> I can provide stack traces, thread dumps, jstack dumps as required.
> Here are two jstacks thus far:
> http://pastebin.com/1ktjBYbf
> http://pastebin.com/8JiQc3rb
> I have got these jstacks from the latest 4.6 snapshot, also running the 
> SolrJ snapshot. The issue is also consistently reproducible with the 
> BinaryRequestWriter.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-5331) SolrCloud 4.5 bulk add errors

2013-10-25 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris closed SOLR-5331.
---

Resolution: Invalid

> SolrCloud 4.5 bulk add errors
> -
>
> Key: SOLR-5331
> URL: https://issues.apache.org/jira/browse/SOLR-5331
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.5
>Reporter: Chris
> Fix For: 4.5.1
>
>
> Since Solr 4.5, bulk adding documents via SolrJ (at least) causes errors.
> // build array list of SolrInputDocuments
> server.add(docs);
> I've tried with CUSS (which swallows exceptions, as expected), however the 
> errors are shown in the server logs, and with CloudSolrServer, which returns 
> the errors in addition to logging them on the server.
> I've tried downgrading my SolrJ to 4.4, still errors, so it looks like a 
> regression in the server code. Reverting to Solr 4.4 on the server, I don't 
> get errors (however I run into deadlock issues).
> I raised this issue in IRC - NOT the mailing list - and elyorag suggested 
> opening a ticket and mentioning that this has now been discussed in IRC.
> The exceptions would indicate I'm attempting to do multiple operations in a 
> single request, which would be malformed. I am not; I am only attempting to 
> add documents.
> Stack traces seen here:
> 14:57:13 ERROR SolrCmdDistributor shard update error RetryNode: 
> http://X.X.X.X:8080/solr/collection1_shard16_replica2/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>  Illegal to have multiple roots (start tag in epilog?).
>  
> shard update error RetryNode: 
> http://X.X.X.X:8080/solr/collection1_shard16_replica2/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>  Illegal to have multiple roots (start tag in epilog?).
> at [row,col {unknown-source}]: [18,327]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
> at 
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
> at 
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> 
> org.apache.solr.common.SolrException: Illegal to have multiple roots 
> (start tag in epilog?).
> at [row,col {unknown-source}]: [7,6314]
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> at 
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> at 
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
> at 
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
> at 
> java.util.c

[jira] [Commented] (SOLR-5331) SolrCloud 4.5 bulk add errors

2013-10-25 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805167#comment-13805167
 ] 

Chris commented on SOLR-5331:
-

Closing this. With the above changes, and changes to my Tomcat connector 
limits, this issue has gone away.

> SolrCloud 4.5 bulk add errors
> -
>
> Key: SOLR-5331
> URL: https://issues.apache.org/jira/browse/SOLR-5331
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.5
>Reporter: Chris
> Fix For: 4.5.1
>
>

[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-10-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189-4x.patch

The patch covers the work from all the issues, ported to 4x (created with 
--show-copies-as-adds). I think it's ready (tests have passed several times).

If there are no objections, I will add a CHANGES entry and commit it.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
> LUCENE-5189-no-lost-updates.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
> LUCENE-5189_process_events.patch, LUCENE-5189-segdv.patch, 
> LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV, etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update if, e.g., we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have a working patch already, which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org