[JENKINS] Lucene-Solr-tests-only-3.x - Build # 10820 - Failure

2011-10-10 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/10820/

2 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest.testCommitWithin

Error Message:
expected:<1> but was:<0>

Stack Trace:
junit.framework.AssertionFailedError: expected:<1> but was:<0>
at 
org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:363)
at 
org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:435)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)


REGRESSION:  
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testCommitWithin

Error Message:
expected:<1> but was:<0>

Stack Trace:
junit.framework.AssertionFailedError: expected:<1> but was:<0>
at 
org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:341)
at 
org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:435)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)




Build Log (for compile errors):
[...truncated 14483 lines...]






[jira] [Created] (SOLR-2819) Wrong isHex()-method in HTMLStripCharFilter

2011-10-10 Thread Bernhard Berger (Created) (JIRA)
Wrong isHex()-method in HTMLStripCharFilter
---

 Key: SOLR-2819
 URL: https://issues.apache.org/jira/browse/SOLR-2819
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.4
Reporter: Bernhard Berger
Priority: Trivial


In org.apache.solr.analysis, HTMLStripCharFilter uses a wrong isHex() method that 
accepts characters like 'X' and 'Y' as valid hex chars:

{code}
  private boolean isHex(int ch) {
return (ch>='0' && ch<='9') ||
   (ch>='A' && ch<='Z') ||
   (ch>='a' && ch<='z');
  }
{code}

If only genuine hex characters ([0-9a-fA-F]) were allowed, the readNumericEntity 
method would detect a mismatch faster.
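
For reference, a possible corrected version (an editorial sketch, not taken from any 
attached patch) that restricts the check to genuine hexadecimal digits:

{code}
  // Sketch of a fix: accept only 0-9, A-F and a-f as hex digits.
  private boolean isHex(int ch) {
    return (ch>='0' && ch<='9') ||
           (ch>='A' && ch<='F') ||
           (ch>='a' && ch<='f');
  }
{code}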





[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 608 - Failure

2011-10-10 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/608/

1 test failed.
REGRESSION:  org.apache.solr.update.AutoCommitTest.testMaxTime

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError: 
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.solr.update.AutoCommitTest.testMaxTime(AutoCommitTest.java:247)
at 
org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:435)




Build Log (for compile errors):
[...truncated 14033 lines...]






[jira] [Updated] (LUCENE-3186) DocValues type should be recorded in FNX file to fail early if user specifies incompatible type

2011-10-10 Thread Simon Willnauer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3186:


Attachment: LUCENE-3186.patch

Next iteration: added JavaDoc, removed all nocommits and fixed all tests.

This version of the patch promotes incompatible variants to BYTES_VAR_STRAIGHT 
instead of dropping the data entirely, so at least no data is lost if somebody 
messes up their types. I think this is ready - if nobody objects I am going to 
commit this tomorrow...

> DocValues type should be recorded in FNX file to fail early if user specifies 
> incompatible type
> --
>
> Key: LUCENE-3186
> URL: https://issues.apache.org/jira/browse/LUCENE-3186
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3186.patch, LUCENE-3186.patch, LUCENE-3186.patch
>
>
> Currently the segment merger fails if the docvalues type is not compatible across 
> segments. We already catch this problem if somebody changes the values type 
> for a field within one segment, but not across segments. In order to do that 
> we should record the type in the fnx file along with the field numbers.
> I marked this 4.0 since it should not block the landing on trunk.




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123998#comment-13123998
 ] 

Michael McCandless commented on LUCENE-1536:


On the diff that luceneutil hits, it looks like there's a float iota
difference:

On trunk we get these results:

{noformat}
TASK: cat=Fuzzy1F90.0 q=body:changer~1.0 s=null 
f=CachingWrapperFilter(PreComputedRandomFilter(pctAccept=90.0)) group=null 
hits=198715
  32.160243 msec
  thread 5
  doc=6199951 score=40.27584
  doc=6199960 score=40.27584
  doc=6200023 score=40.27584
  doc=7580697 score=40.27584
  doc=7995191 score=33.34529
  doc=8684145 score=31.100195
  doc=6260043 score=31.100193
  doc=7320778 score=31.100193
  doc=7454704 score=31.100193
  doc=7979518 score=26.333052
  50 expanded terms
{noformat}

With the patch we get this:

{noformat}
TASK: cat=Fuzzy1F90.0 q=body:changer~1.0 s=null 
f=CachingWrapperFilter(PreComputedRandomFilter(pctAccept=90.0)) group=null 
hits=198715
  19.300811 msec
  thread 4
  doc=6199951 score=40.27584
  doc=6199960 score=40.27584
  doc=6200023 score=40.27584
  doc=7580697 score=40.27584
  doc=7995191 score=33.34529
  doc=6260043 score=31.100195
  doc=7454704 score=31.100195
  doc=7320778 score=31.100193
  doc=8684145 score=31.100193
  doc=7979518 score=26.333052
  50 expanded terms
{noformat}

Several of the docs with score 31.100193 or 31.100195 flipped around.
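
For context (an editorial illustration, not part of the benchmark output): the two 
score values in the diff differ by roughly a single ulp, the smallest float difference 
representable at that magnitude, which is what makes this a "float iota" difference:

{code}
// 31.100193f and 31.100195f are neighbouring float values, about one ulp apart.
float lo = 31.100193f, hi = 31.100195f;
System.out.println(Math.ulp(lo)); // ~1.9073486E-6
System.out.println(hi - lo);      // the same magnitude: one ulp
{code}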


> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Chris Male (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124001#comment-13124001
 ] 

Chris Male commented on LUCENE-1536:


So where does this leave us?

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124003#comment-13124003
 ] 

Robert Muir commented on LUCENE-1536:
-

This patch shouldn't be changing scores. I think even a small difference could 
be indicative of a larger problem: we need to understand what is causing this.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Chris Male (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124004#comment-13124004
 ] 

Chris Male commented on LUCENE-1536:


Are any deletes made in the above benchmarking? I might try to simulate the same 
change in a small unit test.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124006#comment-13124006
 ] 

Robert Muir commented on LUCENE-1536:
-

Deletes don't affect scoring.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Chris Male (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124007#comment-13124007
 ] 

Chris Male commented on LUCENE-1536:


I realise that; I just wanted to replicate the same conditions.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).




[jira] [Created] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Michael McCandless (Created) (JIRA)
Packed ints: move .getArray into Reader API
---

 Key: LUCENE-3502
 URL: https://issues.apache.org/jira/browse/LUCENE-3502
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.5, 4.0


This is a simple code cleanup... it's messy that a consumer of
PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
order to get an array; it's better to move up the .getArray into the
Reader interface and then make the DirectN impls package private.
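
To illustrate the consumer-side difference (an editorial sketch only; the method names 
follow this discussion and are assumptions, not the attached patch):

{code}
// Hypothetical consumer of org.apache.lucene.util.packed.PackedInts.Reader after
// the change; previously this required "reader instanceof Direct8/16/32/64" plus
// a cast to reach the backing array.
static Object backingArrayOrNull(PackedInts.Reader reader) {
  if (reader.hasArray()) {
    return reader.getArray();   // e.g. an int[] for the 32-bit impl
  }
  return null;                  // no native array; callers use reader.get(index)
}
{code}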





[jira] [Updated] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Michael McCandless (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3502:
---

Attachment: LUCENE-3502.patch

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.




[jira] [Commented] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Simon Willnauer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124043#comment-13124043
 ] 

Simon Willnauer commented on LUCENE-3502:
-

Mike, this looks good. Can we rename getNativeArray to getArray? This seems more 
consistent with what we have in IDV and what Java has in ByteBuffer etc. I also 
think we should have a boolean hasArray() for consistency, or did I miss it?

Simon

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.




[jira] [Commented] (SOLR-2440) Schema Browser more user friendly

2011-10-10 Thread Joan Codina (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124056#comment-13124056
 ] 

Joan Codina commented on SOLR-2440:
---

OK, 
here is the patch for screen.css (but this is a matter of taste). What I changed 
was to improve the contrast a bit, as the grey was "almost white". With a 
projector these low-contrast colors are even worse, but even on my screen they 
were difficult to distinguish.

The patch includes 3 functionalities:
* Drill down: you can click on any of the items in the list of most common 
words to get the top documents containing that item.
* Field selector: to the right of the field's name there is a minus sign; 
clicking on it changes it to a plus sign. After changing which fields to view, 
subsequent drill-down actions will include those fields in the result (if none 
is selected, only the current field is shown).
* Filter query: when a filter query is specified, the list of most common 
words and their frequencies is obtained from the facets after doing a search 
with that filter. Not very scalable, but it helps when guessing some properties 
of the imported data.


> Schema Browser more user friendly
> -
>
> Key: SOLR-2440
> URL: https://issues.apache.org/jira/browse/SOLR-2440
> Project: Solr
>  Issue Type: New Feature
>  Components: web gui
>Affects Versions: 1.4.1
> Environment: The schema browser of the admin web application
>Reporter: Joan Codina
>Priority: Minor
>  Labels: browser, schema
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE_4_schema_jsp.patch, schema_jsp.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The schema browser has some drawbacks:
> * Does not sort the fields (the actual sorting seems arbitrary)
> * Capitalises all field names, making them difficult to match
> * Does not allow a drill down
> This small patch solves the three issues: 
> #  Changes the CSS so that the links are not capitalised
> #  Sorts the field names
> #  Replaces the tokens with links to a search query for that token
> That's all.




[jira] [Updated] (SOLR-2440) Schema Browser more user friendly

2011-10-10 Thread Joan Codina (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joan Codina updated SOLR-2440:
--

Attachment: LUCENE_4_screen_css.patch

Patch with some changes to increase the contrast in the CSS.

> Schema Browser more user friendly
> -
>
> Key: SOLR-2440
> URL: https://issues.apache.org/jira/browse/SOLR-2440
> Project: Solr
>  Issue Type: New Feature
>  Components: web gui
>Affects Versions: 1.4.1
> Environment: The schema browser of the admin web application
>Reporter: Joan Codina
>Priority: Minor
>  Labels: browser, schema
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE_4_schema_jsp.patch, LUCENE_4_screen_css.patch, 
> schema_jsp.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The schema browser has some drawbacks:
> * Does not sort the fields (the actual sorting seems arbitrary)
> * Capitalises all field names, making them difficult to match
> * Does not allow a drill down
> This small patch solves the three issues: 
> #  Changes the CSS so that the links are not capitalised
> #  Sorts the field names
> #  Replaces the tokens with links to a search query for that token
> That's all.




[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to fail early if user specifies incompatible type

2011-10-10 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124061#comment-13124061
 ] 

Michael McCandless commented on LUCENE-3186:


Patch looks good Simon!  Thanks.

> DocValues type should be recorded in FNX file to fail early if user specifies 
> incompatible type
> --
>
> Key: LUCENE-3186
> URL: https://issues.apache.org/jira/browse/LUCENE-3186
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3186.patch, LUCENE-3186.patch, LUCENE-3186.patch
>
>
> Currently the segment merger fails if the docvalues type is not compatible across 
> segments. We already catch this problem if somebody changes the values type 
> for a field within one segment, but not across segments. In order to do that 
> we should record the type in the fnx file along with the field numbers.
> I marked this 4.0 since it should not block the landing on trunk.




[jira] [Commented] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124064#comment-13124064
 ] 

Michael McCandless commented on LUCENE-3502:


OK I'll rename to getArray.

On the .hasArray -- why do we need that?  Can't we just have .getArray and it 
returns null if there is none?  (None of these classes have a "costly" 
.getArray impl).  Likewise for DocValues...

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.




[jira] [Commented] (SOLR-2667) Finish Solr Admin UI

2011-10-10 Thread Joan Codina (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124069#comment-13124069
 ] 

Joan Codina commented on SOLR-2667:
---

Some issues when using it:
* It is a pity that one cannot indicate the number of terms to view, and can only 
click "more... more..." without modifying the number (to ask, for example, for the top 
2000 terms); we do that sometimes to check whether there are many misspelled terms.
* A stupid issue: there is no place where the name of the current field appears in 
plain text so that you can cut&paste it, to be sure you get the current spelling ;-)
* Finally, maybe the graphic could be drawn with an HTML5 charting tool?


> Finish Solr Admin UI
> 
>
> Key: SOLR-2667
> URL: https://issues.apache.org/jira/browse/SOLR-2667
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Fix For: 4.0
>
> Attachments: SOLR-2667-110722.patch
>
>
> In SOLR-2399, we added a new admin UI. The issue has gotten too long to 
> follow, so this is a new issue to track remaining tasks.




[jira] [Commented] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124067#comment-13124067
 ] 

Robert Muir commented on LUCENE-3502:
-

I like the hasArray for API consistency with ByteBuffer etc too (same with 
Docvalues).


> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.




[jira] [Commented] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Simon Willnauer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124070#comment-13124070
 ] 

Simon Willnauer commented on LUCENE-3502:
-

bq. I like the hasArray for API consistency with ByteBuffer etc too (same with 
Docvalues).
Consistency is good, and you might need a hasArray() for future docvalues impls 
or subclasses that compute the array on demand?

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.




[jira] [Commented] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124072#comment-13124072
 ] 

Michael McCandless commented on LUCENE-3502:


OK I agree, I'll put the .hasArray back.

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124080#comment-13124080
 ] 

Robert Muir commented on LUCENE-1536:
-

Well, I think you are on the right path.

If our unit tests pass but luceneutil 'fails', I think that's a bad sign for the 
quality of our tests... it sounds like we need to improve the tests to have more 
coverage for filters & deletions?

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).




[jira] [Updated] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Michael McCandless (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3502:
---

Attachment: LUCENE-3502.patch

New patch, using .hasArray/getArray.

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3502.patch, LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124109#comment-13124109
 ] 

Robert Muir commented on LUCENE-1536:
-

One bug is that FilteredQuery in the patch runs some heuristics *per segment* 
which determine how the booleans get set that drive the BS1-versus-BS2 decision.

This means that some segments could get BS1 and others BS2, meaning we 
will rank some documents arbitrarily higher than others when they actually have 
the same underlying index statistics... this is bad!

So I think at least the parameters to the subscorer (topLevel/inOrder) must be 
applied consistently to all segments from that Weight.


> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).




[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant

2011-10-10 Thread Steven Rowe (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124118#comment-13124118
 ] 

Steven Rowe commented on LUCENE-3167:
-

Luca, you dropped the changes from my patch to {{solr/}} and {{modules/}}.  
Please put them back.

bq. I add a new property 'development'. This property could be used switch many 
other cases that can decrease the performances during the build.

As I have stated previously, I don't like this idea.

The default build with no properties specified should be development mode 
(i.e., don't do the extra work needed to build OSGi manifests).  The 
Lucene/Solr build is for the developers; it must be as fast as possible by 
default.

There should be a property named "build.osgi.manifests" or something similar 
that says what's happening, rather than hiding behind some anonymous "publish 
mode".  That is, don't call the property "development" or "publish.mode".  OSGi 
manifest building will be the only optional performance-decreasing element in 
the build, so there is no reason to generalize it at this point.


> Make lucene/solr a OSGI bundle through Ant
> --
>
> Key: LUCENE-3167
> URL: https://issues.apache.org/jira/browse/LUCENE-3167
> Project: Lucene - Java
>  Issue Type: New Feature
> Environment: bndtools
>Reporter: Luca Stancapiano
> Attachments: LUCENE-3167.patch, LUCENE-3167.patch, LUCENE-3167.patch, 
> lucene_trunk.patch, lucene_trunk.patch
>
>
> We need to make a bundle through Ant, so the binary can be published and there is 
> no more need to download the sources. Currently, to get an OSGi bundle we need 
> to use Maven tools and build the sources. Here is the reference for the creation 
> of the OSGi bundle through Maven:
> https://issues.apache.org/jira/browse/LUCENE-1344
> Bndtools could be used inside Ant.




[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124129#comment-13124129
 ] 

Michael McCandless commented on LUCENE-1536:


Hmm, another bug is: we are never using BS1 when the filter is applied 'down 
low'; this is because FilteredQuery's Weight impl does not override 
scoresDocsOutOfOrder.  I think it should do so?  And if the filter will be 
applied 'down low', it should delegate to the wrapped Weight?
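
For illustration only, a rough sketch of the delegation being suggested (the field names 
filterAppliedDownLow and innerWeight are hypothetical and not from the patch):

{code}
// Inside FilteredQuery's Weight impl (hypothetical fields shown):
@Override
public boolean scoresDocsOutOfOrder() {
  // When the filter is applied down low, the wrapped query's scorer drives
  // collection order, so delegate to the wrapped Weight; otherwise the
  // iterator-based filtered scorer always scores in order.
  return filterAppliedDownLow ? innerWeight.scoresDocsOutOfOrder() : false;
}
{code}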

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124135#comment-13124135
 ] 

Uwe Schindler commented on LUCENE-1536:
---

Mike: That could be the reason for the problems: currently it delegates to the 
wrapped Weight, but it does not wrap all methods.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124138#comment-13124138
 ] 

Uwe Schindler commented on LUCENE-1536:
---

bq. one bug is that FilteredQuery in the patch runs some heuristics per segment 
which determine how the booleans get set that drive the BS1 versus B2 decision.

How can BS1 and BS2 return different scores? That would be a bug. Theoretically 
it should be possible to have one segment scored with BS1 and the other one with BS2.

By the way: that was no different without FilteredQuery in the older patches.

Of course the selection of the right scorer based on out-of-order scoring should be 
done based on scoresDocsOutOfOrder returned by the weight. This is a bug in 
FilteredQuery#Weight, but it is easy to fix.

By the way: this was also no different without FilteredQuery in the older 
patches.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124144#comment-13124144
 ] 

Robert Muir commented on LUCENE-1536:
-

{quote}
How can BS1 and BS2 return different scores, this would be a bug? Theoretically 
it should be possible to have one segment with BS1 the other one with BS2.
{quote}

Well, they are different Scorer implementations? I think it's bad to use different 
code to score different segments; in this case two different algorithms
could cause floating point operations to be done in a different order.

It's also a bug that scoresDocsOutOfOrder is wrong, and this is really the 
whole bug. Somehow FilteredQuery#Weight needs to determine what it is going to do
up front so that collector specialization works and we use BS1 or BS2 
consistently across all segments, etc.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Simon Willnauer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124167#comment-13124167
 ] 

Simon Willnauer commented on LUCENE-3502:
-

looks good mike!

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3502.patch, LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.
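(Illustration only, using simplified stand-ins rather than the real PackedInts classes; the method names below are assumptions. It just shows the consumer-side difference this cleanup is after: the caller asks the Reader for the backing array instead of instanceof-checking the DirectN impls.)

{code}
// Simplified stand-ins for illustration; not the real PackedInts API.
interface PackedReaderSketch {
  long get(int index);
  boolean hasArray();   // does a backing array exist?
  long[] getArray();    // the backing array, when hasArray() is true
}

final class DirectLikeReader implements PackedReaderSketch {
  private final long[] values;
  DirectLikeReader(long[] values) { this.values = values; }
  public long get(int index) { return values[index]; }
  public boolean hasArray() { return true; }
  public long[] getArray() { return values; }
}

class PackedConsumer {
  // Before the cleanup a consumer had to check for Direct8/16/32/64 to reach the
  // raw array; with getArray() on the Reader it can simply ask.
  static long sum(PackedReaderSketch reader, int size) {
    long total = 0;
    if (reader.hasArray()) {
      long[] arr = reader.getArray();
      for (int i = 0; i < size; i++) total += arr[i];        // fast path on the raw array
    } else {
      for (int i = 0; i < size; i++) total += reader.get(i); // generic path
    }
    return total;
  }
}
{code}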

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3186) DocValues type should be recored in FNX file to early fail if user specifies incompatible type

2011-10-10 Thread Simon Willnauer (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-3186.
-

   Resolution: Fixed
Lucene Fields: New,Patch Available  (was: New)

committed to trunk in rev. 1181020

thanks

> DocValues type should be recored in FNX file to early fail if user specifies 
> incompatible type
> --
>
> Key: LUCENE-3186
> URL: https://issues.apache.org/jira/browse/LUCENE-3186
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3186.patch, LUCENE-3186.patch, LUCENE-3186.patch
>
>
> Currently the segment merger fails if the docvalues type is not compatible across 
> segments. We already catch this problem if somebody changes the value type 
> for a field within one segment, but not across segments. In order to do that 
> we should record the type in the fnx file along with the field numbers.
> I marked this 4.0 since it should not block the landing on trunk.
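(A loose sketch of the early-fail idea only, with made-up names rather than Lucene's actual fnx/FieldInfos code: once the doc values type is recorded per field number, an incompatible redefinition can be rejected immediately instead of surfacing later during a merge.)

{code}
import java.util.HashMap;
import java.util.Map;

// Names here (ValueKind, FieldTypeRegistry) are invented for illustration.
enum ValueKind { FIXED_INTS, FLOAT_32, FLOAT_64, BYTES_FIXED, BYTES_VAR }

class FieldTypeRegistry {
  private final Map<Integer, ValueKind> typeByFieldNumber =
      new HashMap<Integer, ValueKind>();

  // Record the type the first time a field is seen; fail fast on a mismatch.
  void declare(int fieldNumber, ValueKind kind) {
    ValueKind previous = typeByFieldNumber.get(fieldNumber);
    if (previous == null) {
      typeByFieldNumber.put(fieldNumber, kind);
    } else if (previous != kind) {
      throw new IllegalArgumentException("field " + fieldNumber
          + " already uses doc values type " + previous
          + ", cannot change to " + kind);
    }
  }
}
{code}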

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Michael McCandless (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3502:
---

Fix Version/s: (was: 3.5)

Removing 3.5 fix version... I keep forgetting packed ints aren't backported yet.

But for LUCENE-2205 we have to make sure we sync to trunk when we backport it.

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3502.patch, LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Michael McCandless (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3502.


Resolution: Fixed

Thanks Simon and Robert.

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3502.patch, LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Robert Muir (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1536:


Attachment: LUCENE-1536_hack.patch

hack patch that computes the heuristic up front in weight init, so it scores 
all segments consistently and returns the proper scoresDocsOutOfOrder for BS1.

Uwe's new test (the nestedFilterQuery) doesn't pass yet; I don't know why.

I recomputed the benchmarks:
{noformat}
Task   QPS trunkStdDev trunk   QPS patchStdDev patch  Pct 
diff
  PhraseF1.0   11.990.207.790.23  -37% -  
-31%
TermF0.5  135.147.62  116.570.36  -18% -   
-8%
   AndHighHighF100.0   17.340.78   15.440.15  -15% -   
-5%
AndHighHighF95.0   17.280.66   15.480.17  -14% -   
-5%
AndHighHighF90.0   17.310.76   15.580.19  -14% -   
-4%
AndHighHighF99.0   17.051.02   15.450.17  -15% -   
-2%
AndHighHighF75.0   17.470.78   16.030.15  -12% -   
-3%
 AndHighHighF5.0   20.690.95   19.780.23   -9% -
1%
 AndHighHighF1.0   35.111.46   33.640.36   -8% -
1%
 AndHighHighF0.1  136.043.70  132.001.41   -6% -
0%
 AndHighHigh   18.250.70   17.740.20   -7% -
2%
 AndHighHighF0.5   49.841.72   48.580.49   -6% -
1%
TermF0.1  351.18   11.01  345.851.73   -4% -
2%
Fuzzy2F100.0   95.524.21   94.332.07   -7% -
5%
  SloppyPhraseF100.08.010.287.910.09   -5% -
3%
 Fuzzy2F90.0   95.423.86   94.511.74   -6% -
5%
 Fuzzy2F95.0   95.204.86   94.331.83   -7% -
6%
  Fuzzy1F1.0   54.021.67   53.561.07   -5% -
4%
  PhraseF2.07.730.077.680.18   -3% -
2%
   SloppyPhraseF99.07.990.237.950.10   -4% -
3%
AndHighHighF50.0   17.540.79   17.460.12   -5% -
4%
  Fuzzy2F0.1  105.393.93  105.343.74   -7% -
7%
  SpanNearF100.03.160.063.160.04   -2% -
2%
 Fuzzy2F99.0   94.026.86   94.211.97   -8% -   
10%
 Fuzzy2F75.0   95.563.51   95.762.02   -5% -
6%
WildcardF2.0   52.790.27   53.050.57   -1% -
2%
  Fuzzy1F0.5   58.121.83   58.431.22   -4% -
5%
  PhraseF0.1   66.340.78   66.731.68   -3% -
4%
SloppyPhraseF0.1   56.151.52   56.790.64   -2% -
5%
SloppyPhrase8.080.268.180.08   -2% -
5%
PKLookup  176.595.07  178.965.71   -4% -
7%
SpanNearF0.1   32.360.56   32.830.54   -1% -
4%
  OrHighHighF0.1   78.200.52   79.440.740% -
3%
   SloppyPhraseF95.07.910.088.050.090% -
3%
  Fuzzy2   94.873.72   96.491.62   -3% -
7%
  OrHighHighF0.5   31.410.47   31.960.330% -
4%
   SpanNearF99.03.120.063.180.030% -
4%
WildcardF0.5   61.970.56   63.280.820% -
4%
  PhraseF0.5   19.780.26   20.290.310% -
5%
SpanNear3.190.083.270.05   -1% -
6%
WildcardF0.1   67.450.64   69.240.890% -
4%
   SloppyPhraseF90.08.000.298.210.12   -2% -
8%
   SpanNearF95.03.130.043.230.031% -
5%
Wildcard   43.190.34   44.641.400% -
7%
 Fuzzy2F50.0   95.124.22   98.692.28   -2% -   
11%
  Fuzzy1   55.284.53   57.680.76   -4% -   
15%
  OrHighHigh   12.130.99   12.710.43   -6% -   
18%
  Phrase3.600.043.810.043% -
7%
   SpanNearF90.03.150.053.350.043% -
9%
Term   71.690.40   76.534.130% -   
13%
 PhraseF99.03.430.033.680.045% -
9%
PhraseF100.03.390.053.670.045% -   
10%
   SloppyPhraseF75.08.04

[jira] [Created] (LUCENE-3503) DisjunctionSumScorer gives slightly (float iotas) different scores when you .nextDoc vs .advance

2011-10-10 Thread Michael McCandless (Created) (JIRA)
DisjunctionSumScorer gives slightly (float iotas) different scores when you 
.nextDoc vs .advance


 Key: LUCENE-3503
 URL: https://issues.apache.org/jira/browse/LUCENE-3503
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Attachments: LUCENE-3503.patch

Spinoff from LUCENE-1536.

I dug into why we hit a score diff when using luceneutil to benchmark
the patch.

At first I thought it was BS1/BS2 difference, but because of a bug in
the patch it was still using BS2 (but should be BS1) -- Robert's last
patch fixes that.

But it's actually a diff in BS2 itself, whether you next or advance
through the docs.

It's because DisjunctionSumScorer, when summing the float scores for a
given doc that matches multiple sub-scorers, might sum in a different
order, when you had .nextDoc'd to that doc than when you had .advance'd
to it.

This in turn is because the PQ used by that scorer (ScorerDocQueue)
makes no effort to break ties.  So, when the top N scorers are on the
same doc, the PQ doesn't care what order they are in.

Fixing ScorerDocQueue to break ties will likely be a non-trivial perf
hit, though, so I'm not sure whether we should do anything here...
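(A tiny standalone demo of the underlying effect, not Lucene code; the values are contrived purely to make the rounding visible: float addition is not associative, so summing the same sub-scores in a different order can give a slightly different total.)

{code}
public class SumOrderDemo {
  public static void main(String[] args) {
    // Contrived values chosen only so the difference shows up clearly.
    float a = 16777216f; // 2^24
    float b = 1f;
    float c = 1f;

    float leftToRight = (a + b) + c; // each +1 is rounded away:   1.6777216E7
    float rightToLeft = a + (b + c); // the combined +2 survives:  1.6777218E7

    System.out.println(leftToRight == rightToLeft); // false
  }
}
{code}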

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3503) DisjunctionSumScorer gives slightly (float iotas) different scores when you .nextDoc vs .advance

2011-10-10 Thread Michael McCandless (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3503:
---

Attachment: LUCENE-3503.patch

Failing test case showing the bug.

> DisjunctionSumScorer gives slightly (float iotas) different scores when you 
> .nextDoc vs .advance
> 
>
> Key: LUCENE-3503
> URL: https://issues.apache.org/jira/browse/LUCENE-3503
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
> Attachments: LUCENE-3503.patch
>
>
> Spinoff from LUCENE-1536.
> I dug into why we hit a score diff when using luceneutil to benchmark
> the patch.
> At first I thought it was BS1/BS2 difference, but because of a bug in
> the patch it was still using BS2 (but should be BS1) -- Robert's last
> patch fixes that.
> But it's actually a diff in BS2 itself, whether you next or advance
> through the docs.
> It's because DisjunctionSumScorer, when summing the float scores for a
> given doc that matches multiple sub-scorers, might sum in a different
> order, when you had .nextDoc'd to that doc than when you had .advance'd
> to it.
> This in turn is because the PQ used by that scorer (ScorerDocQueue)
> makes no effort to break ties.  So, when the top N scorers are on the
> same doc, the PQ doesn't care what order they are in.
> Fixing ScorerDocQueue to break ties will likely be a non-trivial perf
> hit, though, so I'm not sure whether we should do anything here...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2820) Add both model and state to ZooKeeper layout for SolrCloud

2011-10-10 Thread Mark Miller (Created) (JIRA)
Add both model and state to ZooKeeper layout for SolrCloud
--

 Key: SOLR-2820
 URL: https://issues.apache.org/jira/browse/SOLR-2820
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Mark Miller


Currently we skimp here by having just the model and simple node state - longer 
term we really want the model plus the full cluster state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3503) DisjunctionSumScorer gives slightly (float iotas) different scores when you .nextDoc vs .advance

2011-10-10 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124315#comment-13124315
 ] 

Robert Muir commented on LUCENE-3503:
-

I think we have to fix it, get the correctness and then worry about performance 
later.

Giving a document a different score (no matter how small, it will affect 
ranking) just because you next'ed versus advance'd to it is bad news.

> DisjunctionSumScorer gives slightly (float iotas) different scores when you 
> .nextDoc vs .advance
> 
>
> Key: LUCENE-3503
> URL: https://issues.apache.org/jira/browse/LUCENE-3503
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
> Attachments: LUCENE-3503.patch
>
>
> Spinoff from LUCENE-1536.
> I dug into why we hit a score diff when using luceneutil to benchmark
> the patch.
> At first I thought it was BS1/BS2 difference, but because of a bug in
> the patch it was still using BS2 (but should be BS1) -- Robert's last
> patch fixes that.
> But it's actually a diff in BS2 itself, whether you next or advance
> through the docs.
> It's because DisjunctionSumScorer, when summing the float scores for a
> given doc that matches multiple sub-scorers, might sum in a different
> order, when you had .nextDoc'd to that doc than when you had .advance'd
> to it.
> This in turn is because the PQ used by that scorer (ScorerDocQueue)
> makes no effort to break ties.  So, when the top N scorers are on the
> same doc, the PQ doesn't care what order they are in.
> Fixing ScorerDocQueue to break ties will likely be a non-trivial perf
> hit, though, so I'm not sure whether we should do anything here...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124317#comment-13124317
 ] 

Michael McCandless commented on LUCENE-1536:


I opened LUCENE-3503 for the score diff issue; it's a pre-existing bug.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536_hack.patch, changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2821) Improve how cluster state is managed in ZooKeeper.

2011-10-10 Thread Mark Miller (Created) (JIRA)
Improve how cluster state is managed in ZooKeeper.
--

 Key: SOLR-2821
 URL: https://issues.apache.org/jira/browse/SOLR-2821
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
 Fix For: 4.0


Currently, we have issues supporting both incremental cluster state updates 
(needed because reading the state with many ZK requests does not scale) and 
allowing shard/node properties to change on the fly. It would be nice to have a 
solution that allows faster cluster state reads and easy on-the-fly shard/node 
property changes.
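(Illustrative only; the znode paths below are made up and this is not SolrCloud's actual layout or code. It just shows why per-node reads stop scaling: a combined state node costs one ZooKeeper round trip, while walking children costs a request per child.)

{code}
import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ClusterStateReads {
  // One round trip: a single znode holding the whole, incrementally updated state.
  static byte[] readCombinedState(ZooKeeper zk) throws KeeperException, InterruptedException {
    return zk.getData("/cluster_state", false, new Stat()); // path is illustrative
  }

  // One request per child: this is the pattern that stops scaling as the
  // number of shards/nodes grows.
  static int readPerNodeState(ZooKeeper zk) throws KeeperException, InterruptedException {
    int totalBytes = 0;
    List<String> children = zk.getChildren("/node_states", false); // path is illustrative
    for (String child : children) {
      totalBytes += zk.getData("/node_states/" + child, false, new Stat()).length;
    }
    return totalBytes;
  }
}
{code}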

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2765) Shard/Node states

2011-10-10 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124318#comment-13124318
 ] 

Mark Miller commented on SOLR-2765:
---

I've spun off:

SOLR-2821 Improve how cluster state is managed in ZooKeeper.
SOLR-2820 Add both model and state to ZooKeeper layout for SolrCloud.

Even if we hit them with one patch, this makes it easier to track these changes.

> Shard/Node states
> -
>
> Key: SOLR-2765
> URL: https://issues.apache.org/jira/browse/SOLR-2765
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud, update
>Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: cluster_state-file.patch, combined.patch, 
> incremental_update.patch, scheduled_executors.patch, shard-roles.patch
>
>
> Need state for shards that indicate they are recovering, active/enabled, or 
> disabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2765) Shard/Node states

2011-10-10 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124326#comment-13124326
 ] 

Mark Miller commented on SOLR-2765:
---

Thanks for the patch Jaime! I'll be able to take a closer look at it later 
today.

> Shard/Node states
> -
>
> Key: SOLR-2765
> URL: https://issues.apache.org/jira/browse/SOLR-2765
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud, update
>Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: cluster_state-file.patch, combined.patch, 
> incremental_update.patch, scheduled_executors.patch, shard-roles.patch
>
>
> Need state for shards that indicate they are recovering, active/enabled, or 
> disabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1181020 - in /lucene/dev/trunk/lucene/src/java/org/apache/lucene/index: ./ codecs/ values/

2011-10-10 Thread Mark Miller
I think this commit is missing the TypePromoter class file?


On Oct 10, 2011, at 11:28 AM, sim...@apache.org wrote:

> Author: simonw
> Date: Mon Oct 10 15:28:07 2011
> New Revision: 1181020
> 
> URL: http://svn.apache.org/viewvc?rev=1181020&view=rev
> Log:
> LUCENE-3186: Promote docvalues types during merge if ValueType or size 
> differes across segments
> 
> Modified:
>lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java
>
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java
>lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java
>
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/codecs/DocValuesConsumer.java
>
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/codecs/PerDocConsumer.java
>
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/FixedDerefBytesImpl.java
>
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/FixedSortedBytesImpl.java
>
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/FixedStraightBytesImpl.java
>lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/Floats.java
>
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/IndexDocValues.java
>lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/Ints.java
>
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/MultiIndexDocValues.java
>
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/VarStraightBytesImpl.java
>lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/Writer.java
> 
> Modified: 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java?rev=1181020&r1=1181019&r2=1181020&view=diff
> ==
> --- lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java 
> (original)
> +++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java 
> Mon Oct 10 15:28:07 2011
> @@ -86,7 +86,7 @@ public final class FieldInfo {
>   public int getCodecId() {
> return codecId;
>   }
> -
> +  
>   @Override
>   public Object clone() {
> FieldInfo clone = new FieldInfo(name, isIndexed, number, storeTermVector, 
> storePositionWithTermVector,
> @@ -132,6 +132,12 @@ public final class FieldInfo {
> }
>   }
> 
> +  public void resetDocValues(ValueType v) {
> +if (docValues != null) {
> +  docValues = v;
> +}
> +  }
> +  
>   public boolean hasDocValues() {
> return docValues != null;
>   }
> 
> Modified: 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java?rev=1181020&r1=1181019&r2=1181020&view=diff
> ==
> --- 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java
>  (original)
> +++ 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java
>  Mon Oct 10 15:28:07 2011
> @@ -128,7 +128,7 @@ public class MultiPerDocValues extends P
>   if (docsUpto != start) {
> type = values.type();
> docValuesIndex.add(new MultiIndexDocValues.DocValuesIndex(
> -new MultiIndexDocValues.DummyDocValues(start, type), 
> docsUpto, start
> +new MultiIndexDocValues.EmptyDocValues(start, type), 
> docsUpto, start
> - docsUpto));
>   }
>   docValuesIndex.add(new MultiIndexDocValues.DocValuesIndex(values, 
> start,
> @@ -137,7 +137,7 @@ public class MultiPerDocValues extends P
> 
> } else if (i + 1 == subs.length && !docValuesIndex.isEmpty()) {
>   docValuesIndex.add(new MultiIndexDocValues.DocValuesIndex(
> -  new MultiIndexDocValues.DummyDocValues(start, type), docsUpto, 
> start
> +  new MultiIndexDocValues.EmptyDocValues(start, type), docsUpto, 
> start
>   - docsUpto));
> }
>   }
> 
> Modified: 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java?rev=1181020&r1=1181019&r2=1181020&view=diff
> ==
> --- 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java 
> (original)
> +++ 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java 
> Mon Oct 10 15:28:07 2011
> @@ -33,7 +33,6 @@ import org.apache.lucene.index.codecs.Fi
> import org.apache.lucene.index.codecs.FieldsWriter;
> import org.apache.lucene.index.codecs.MergeState;
> import org.apache.lucene.index.codecs.PerDocConsum

[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-10-10 Thread T Jake Luciani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124343#comment-13124343
 ] 

T Jake Luciani commented on SOLR-1979:
--

The build on the 3x branch is still failing because 
solr/contrib/langid/src/java/overview.html was only committed to trunk. This 
file needs to be added to branch_3x as well.

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - LangId, update
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Minor
>  Labels: UpdateProcessor
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-1979-branch_3x.patch, SOLR-1979.patch, 
> SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch
>
>
> Language identification from document fields, and mapping of field names to 
> language-specific fields based on detected language.
> Wrap the Tika LanguageIdentifier in an UpdateProcessor.
> See user documentation at http://wiki.apache.org/solr/LanguageDetection
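(A rough sketch of the core detection call only; the UpdateProcessor wiring and configurable field mappings are omitted, and the "field + '_' + code" naming is just an assumed example convention. Tika's LanguageIdentifier is the class referenced above.)

{code}
import org.apache.tika.language.LanguageIdentifier;

public class LangIdSketch {
  // Detect the language of a field value and derive a language-specific target
  // field name, e.g. "title" -> "title_en". The suffix convention is illustrative.
  static String languageFieldName(String fieldName, String fieldValue) {
    LanguageIdentifier identifier = new LanguageIdentifier(fieldValue);
    String code = identifier.getLanguage();  // e.g. "en" or "de"
    return identifier.isReasonablyCertain() ? fieldName + "_" + code : fieldName;
  }

  public static void main(String[] args) {
    System.out.println(languageFieldName("title",
        "The quick brown fox jumps over the lazy dog"));
  }
}
{code}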

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3503) DisjunctionSumScorer gives slightly (float iotas) different scores when you .nextDoc vs .advance

2011-10-10 Thread Robert Muir (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3503:


Attachment: LUCENE-3503.patch

not a real 'fix' but maybe this solves it for practical purposes?

> DisjunctionSumScorer gives slightly (float iotas) different scores when you 
> .nextDoc vs .advance
> 
>
> Key: LUCENE-3503
> URL: https://issues.apache.org/jira/browse/LUCENE-3503
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
> Attachments: LUCENE-3503.patch, LUCENE-3503.patch
>
>
> Spinoff from LUCENE-1536.
> I dug into why we hit a score diff when using luceneutil to benchmark
> the patch.
> At first I thought it was BS1/BS2 difference, but because of a bug in
> the patch it was still using BS2 (but should be BS1) -- Robert's last
> patch fixes that.
> But it's actually a diff in BS2 itself, whether you next or advance
> through the docs.
> It's because DisjunctionSumScorer, when summing the float scores for a
> given doc that matches multiple sub-scorers, might sum in a different
> order, when you had .nextDoc'd to that doc than when you had .advance'd
> to it.
> This in turn is because the PQ used by that scorer (ScorerDocQueue)
> makes no effort to break ties.  So, when the top N scorers are on the
> same doc, the PQ doesn't care what order they are in.
> Fixing ScorerDocQueue to break ties will likely be a non-trivial perf
> hit, though, so I'm not sure whether we should do anything here...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1181020 - in /lucene/dev/trunk/lucene/src/java/org/apache/lucene/index: ./ codecs/ values/

2011-10-10 Thread Mark Miller
It looks like Robert put in the class and test - thanks.


On Oct 10, 2011, at 2:03 PM, Mark Miller wrote:

> I think this commit is missing the TypePromoter class file?
> 
> 
> On Oct 10, 2011, at 11:28 AM, sim...@apache.org wrote:
> 
>> Author: simonw
>> Date: Mon Oct 10 15:28:07 2011
>> New Revision: 1181020
>> 
>> URL: http://svn.apache.org/viewvc?rev=1181020&view=rev
>> Log:
>> LUCENE-3186: Promote docvalues types during merge if ValueType or size 
>> differes across segments
>> 
>> Modified:
>>   lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java
>>   
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java
>>   lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java
>>   
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/codecs/DocValuesConsumer.java
>>   
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/codecs/PerDocConsumer.java
>>   
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/FixedDerefBytesImpl.java
>>   
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/FixedSortedBytesImpl.java
>>   
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/FixedStraightBytesImpl.java
>>   lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/Floats.java
>>   
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/IndexDocValues.java
>>   lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/Ints.java
>>   
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/MultiIndexDocValues.java
>>   
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/VarStraightBytesImpl.java
>>   lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/Writer.java
>> 
>> Modified: 
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java
>> URL: 
>> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java?rev=1181020&r1=1181019&r2=1181020&view=diff
>> ==
>> --- lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java 
>> (original)
>> +++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/FieldInfo.java 
>> Mon Oct 10 15:28:07 2011
>> @@ -86,7 +86,7 @@ public final class FieldInfo {
>>  public int getCodecId() {
>>return codecId;
>>  }
>> -
>> +  
>>  @Override
>>  public Object clone() {
>>FieldInfo clone = new FieldInfo(name, isIndexed, number, storeTermVector, 
>> storePositionWithTermVector,
>> @@ -132,6 +132,12 @@ public final class FieldInfo {
>>}
>>  }
>> 
>> +  public void resetDocValues(ValueType v) {
>> +if (docValues != null) {
>> +  docValues = v;
>> +}
>> +  }
>> +  
>>  public boolean hasDocValues() {
>>return docValues != null;
>>  }
>> 
>> Modified: 
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java
>> URL: 
>> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java?rev=1181020&r1=1181019&r2=1181020&view=diff
>> ==
>> --- 
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java
>>  (original)
>> +++ 
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/MultiPerDocValues.java
>>  Mon Oct 10 15:28:07 2011
>> @@ -128,7 +128,7 @@ public class MultiPerDocValues extends P
>>  if (docsUpto != start) {
>>type = values.type();
>>docValuesIndex.add(new MultiIndexDocValues.DocValuesIndex(
>> -new MultiIndexDocValues.DummyDocValues(start, type), 
>> docsUpto, start
>> +new MultiIndexDocValues.EmptyDocValues(start, type), 
>> docsUpto, start
>>- docsUpto));
>>  }
>>  docValuesIndex.add(new MultiIndexDocValues.DocValuesIndex(values, 
>> start,
>> @@ -137,7 +137,7 @@ public class MultiPerDocValues extends P
>> 
>>} else if (i + 1 == subs.length && !docValuesIndex.isEmpty()) {
>>  docValuesIndex.add(new MultiIndexDocValues.DocValuesIndex(
>> -  new MultiIndexDocValues.DummyDocValues(start, type), 
>> docsUpto, start
>> +  new MultiIndexDocValues.EmptyDocValues(start, type), 
>> docsUpto, start
>>  - docsUpto));
>>}
>>  }
>> 
>> Modified: 
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java
>> URL: 
>> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java?rev=1181020&r1=1181019&r2=1181020&view=diff
>> ==
>> --- 
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java 
>> (original)
>> +++ 
>> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java 
>> Mon Oct 10 15:28:07 2011
>> @@ -33,7 +33,6 @@ import o

[jira] [Resolved] (SOLR-2815) Fields with a "-" in the name are interpreted as functions in the fl= parameter.

2011-10-10 Thread Hoss Man (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2815.


Resolution: Duplicate

Dup of SOLR-2719

> Fields with a "-" in the name are interpreted as functions in the fl= 
> parameter.
> 
>
> Key: SOLR-2815
> URL: https://issues.apache.org/jira/browse/SOLR-2815
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.0
> Environment: Using latest from trunk
>Reporter: Eric Pugh
>
> If you query for a field that has a "-" character in the name, you get odd 
> results.  I took the example schema and added a field called "in-stock" to go 
> along with the existing "inStock" field.  
> A query for http://localhost:8983/solr/select?q=*:*&fl=id,in-stock throws 
> back an error saying the field "in" can't be found.  
> I can sort of work around it by quoting the field name as "in-stock":
> http://localhost:8983/solr/select?q=*:*&fl=id,%22in-stock%22&rows=1
> However the output is still off:
> 
> GB18030TEST
> in-stock
> 
> In looking at it, I think the dash character causes the field name to be 
> interpreted as an actual function!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2444) Update fl syntax to support: pseudo fields, AS, transformers, and wildcards

2011-10-10 Thread Hoss Man (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2444:
---

Fix Version/s: 4.0

since the majority of this has already been committed to trunk, i'm marking 
this for 4.0 -- if there is any outstanding work needed to consider this issue 
"finished", it either needs to be spun off into a new issue or wrapped up before 
4.0 is released.

> Update fl syntax to support: pseudo fields, AS, transformers, and wildcards
> ---
>
> Key: SOLR-2444
> URL: https://issues.apache.org/jira/browse/SOLR-2444
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: SOLR-2444-fl-parsing.patch, SOLR-2444-fl-parsing.patch
>
>
> The ReturnFields parsing needs to be improved.  It should also support 
> wildcards

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2719) REGRESSION ReturnFields incorrect parse fields with hyphen - breaks existing "fl=my-field-name" type usecases

2011-10-10 Thread Hoss Man (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2719:
---

 Priority: Blocker  (was: Major)
Fix Version/s: 4.0
  Summary: REGRESSION ReturnFields incorrect parse fields with hyphen - 
breaks existing "fl=my-field-name" type usecases  (was: ReturnFields incorrect 
parse fields with hyphen )

setting this as a blocker for 4.0 since it is a fairly serious regression for 
anyone using field names with "-" in them

> REGRESSION ReturnFields incorrect parse fields with hyphen - breaks existing 
> "fl=my-field-name" type usecases
> -
>
> Key: SOLR-2719
> URL: https://issues.apache.org/jira/browse/SOLR-2719
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.0
>Reporter: Nik V. Babichev
>Priority: Blocker
>  Labels: field, fl, query, queryparser
> Fix For: 4.0
>
>
> fl=my-hyphen-field in the query params is parsed as "my" instead of 
> "my-hyphen-field".
> OAS.search.ReturnFields uses the method getId() from OAS.search.QueryParsing,
> which checks chars with "if (!Character.isJavaIdentifierPart(ch) && ch != '.')".
> A hyphen is not a JavaIdentifierPart, so this check breaks when the first "-" is found.
> The problem can be solved by passing '-' to the check:
> if (!Character.isJavaIdentifierPart(ch) && ch != '.' && ch != '-') break;
> But I don't know how it would affect the whole project.
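(A minimal standalone illustration of the reported behavior, not Solr's actual QueryParsing code: an identifier scan that stops at characters which are neither Java identifier parts nor '.' splits "my-hyphen-field" at the first '-', unless '-' is additionally allowed as suggested above.)

{code}
public class FieldNameScan {
  // Scan an identifier the way the report describes: stop at the first char that
  // is neither a Java identifier part nor '.', optionally allowing '-' as well.
  static String getId(String fl, boolean allowHyphen) {
    int end = 0;
    while (end < fl.length()) {
      char ch = fl.charAt(end);
      if (!Character.isJavaIdentifierPart(ch) && ch != '.'
          && !(allowHyphen && ch == '-')) {
        break;
      }
      end++;
    }
    return fl.substring(0, end);
  }

  public static void main(String[] args) {
    System.out.println(getId("my-hyphen-field", false)); // "my" (the regression)
    System.out.println(getId("my-hyphen-field", true));  // "my-hyphen-field"
  }
}
{code}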

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-10-10 Thread Jan Høydahl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124366#comment-13124366
 ] 

Jan Høydahl commented on SOLR-1979:
---

Fixed overview.html in branch

> Create LanguageIdentifierUpdateProcessor
> 
>
> Key: SOLR-1979
> URL: https://issues.apache.org/jira/browse/SOLR-1979
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - LangId, update
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Minor
>  Labels: UpdateProcessor
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-1979-branch_3x.patch, SOLR-1979.patch, 
> SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
> SOLR-1979.patch
>
>
> Language identification from document fields, and mapping of field names to 
> language-specific fields based on detected language.
> Wrap the Tika LanguageIdentifier in an UpdateProcessor.
> See user documentation at http://wiki.apache.org/solr/LanguageDetection

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124385#comment-13124385
 ] 

Michael McCandless commented on LUCENE-1536:


I also bench'd Robert's patch (turned off verifyScores in lucenebench because 
of LUCENE-3503); results look very similar:
{noformat}

                Task    QPS base  StdDev base  QPS filterlow  StdDev filterlow      Pct diff
          PhraseF0.5    20.18    0.65    8.05    0.56    -64% -  -55%
          PhraseF1.0    12.26    0.33    7.96    0.54    -41% -  -28%
    AndHighHighF95.0    16.56    0.13    15.98    1.09    -10% -  3%
         Fuzzy2F99.0    80.52    4.67    77.72    2.34    -11% -  5%
    AndHighHighF99.0    16.55    0.12    15.97    1.05    -10% -  3%
   AndHighHighF100.0    16.54    0.13    15.98    1.06    -10% -  3%
        Fuzzy2F100.0    80.32    4.60    77.64    2.34    -11% -  5%
         Fuzzy2F90.0    80.80    5.17    78.19    2.77    -12% -  7%
    AndHighHighF90.0    16.57    0.15    16.05    1.13    -10% -  4%
      OrHighHighF0.1    72.17    3.60    70.11    3.69    -12% -  7%
      OrHighHighF0.5    29.26    1.23    28.44    1.50    -11% -  6%
         Fuzzy2F95.0    79.95    4.49    77.86    2.10    -10% -  5%
        WildcardF0.1    59.21    4.21    58.01    3.42    -13% -  11%
        WildcardF0.5    54.94    3.78    53.88    3.08    -13% -  11%
        WildcardF1.0    51.31    3.31    50.35    2.44    -12% -  9%
        WildcardF2.0    46.99    2.93    46.13    2.15    -11% -  9%
            Wildcard    38.73    1.94    38.14    1.78    -10% -  8%
         Fuzzy2F75.0    80.57    5.03    79.38    2.04     -9% -  7%
    AndHighHighF75.0    16.63    0.14    16.41    1.21     -9% -  6%
  SloppyPhraseF100.0     7.73    0.15     7.64    0.25     -6% -  4%
   SloppyPhraseF99.0     7.74    0.15     7.66    0.26     -6% -  4%
            TermF0.1   328.10   15.20   325.54   16.82    -10% -  9%
          OrHighHigh    10.68    1.11    10.61    0.75    -16% -  18%
            TermF0.5   127.55    3.70   126.88    6.02     -7% -  7%
          PhraseF0.1    63.93    2.25    63.62    2.87     -8% -  7%
          PhraseF2.0     7.88    0.19     7.86    0.31     -6% -  6%
     AndHighHighF0.1   129.64    5.02   129.28    6.98     -9% -  9%
    SloppyPhraseF0.1    53.80    0.79    53.86    1.84     -4% -  5%
   SloppyPhraseF95.0     7.74    0.15     7.75    0.27     -5% -  5%
    SloppyPhraseF0.5    18.44    0.31    18.47    0.64     -4% -  5%
    SloppyPhraseF1.0    13.10    0.23    13.13    0.47     -5% -  5%
        SloppyPhrase     7.81    0.10     7.83    0.30     -4% -  5%
     AndHighHighF0.5    47.61    1.00    47.76    2.33     -6% -  7%
          Fuzzy2F1.0    81.49    4.85    81.96    0.96     -6% -  8%
              Fuzzy1    47.97    3.71    48.35    1.94    -10% -  13%
          Fuzzy1F0.1    64.31    3.56    64.82    0.83     -5% -  8%
              Fuzzy2    80.93    6.15    81.61    1.74     -8% -  11%
              Phrase     3.58    0.10     3.63    0.18     -6% -  9%
      SpanNearF100.0     2.98    0.10     3.03    0.12     -5% -  9%
   SloppyPhraseF90.0     7.74    0.15     7.87    0.28     -3% -  7%
         AndHighHigh    17.31    0.24    17.62    0.64     -3% -  6%
          Fuzzy2F0.1    89.54    5.78    91.38    1.44     -5% -  10%
       SpanNearF99.0     2.98    0.09     3.04    0.13     -5% -  9%
                Term    58.94    6.06    60.38    4.40    -13% -  22%
        SpanNearF0.1    29.91    1.07    30.70    1.43     -5% -  11%
        SpanNearF0.5     8.73    0.30     8.98    0.41     -5% -  11%
        SpanNearF5.0     3.33    0.11     3.42    0.16     -5% -  11%
         Fuzzy2F50.0    80.90    5.19    83.29    2.28     -5% -  13%
            SpanNear     3.01    0.10     3.10    0.14     -4% -  11%
            TermF1.0    87.07    2.01    89.92    6.38     -6% -  13%
       SpanNearF95.0     2.98    0.10     3.10    0.13     -3% -  12%
        PhraseF100.0     3.37    0.06     3.51    0.17     -2% -  11%
         PhraseF99.0     3.37    0.05     3.52    0.17     -2% -  11%
         PhraseF95.0     3.37    0.06     3.56

[jira] [Commented] (SOLR-2080) Create a Related Search Component

2011-10-10 Thread Cameron (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124396#comment-13124396
 ] 

Cameron commented on SOLR-2080:
---

Any progress on this feature? I'm curious as to how one would build a feature 
like this without the need for external data such as logs.

> Create a Related Search Component
> -
>
> Key: SOLR-2080
> URL: https://issues.apache.org/jira/browse/SOLR-2080
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>
> Similar to spell checking, it is often useful to be able to, given a search, 
> get back related searches, as determined by some model (perhaps external, 
> perhaps internal -- as in a different core).  For now, I'm not concerned with 
> the process of adding queries to the model.  
> So, for example, given the query "television", this component _might_ return: 
> LCD tvs, plasma tvs, HDTV, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1181104 - in /lucene/dev/trunk/lucene/src: java/org/apache/lucene/index/values/TypePromoter.java test/org/apache/lucene/index/values/TestTypePromotion.java

2011-10-10 Thread Simon Willnauer
ah crap! thanks robert!


On Mon, Oct 10, 2011 at 8:05 PM,   wrote:
> Author: rmuir
> Date: Mon Oct 10 18:05:18 2011
> New Revision: 1181104
>
> URL: http://svn.apache.org/viewvc?rev=1181104&view=rev
> Log:
> LUCENE-3186: svn add
>
> Added:
>    
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/TypePromoter.java
>    (with props)
>    
> lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/values/TestTypePromotion.java
>    (with props)
>
> Added: 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/TypePromoter.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/TypePromoter.java?rev=1181104&view=auto
> ==
> --- 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/TypePromoter.java
>  (added)
> +++ 
> lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/values/TypePromoter.java
>  Mon Oct 10 18:05:18 2011
> @@ -0,0 +1,204 @@
> +package org.apache.lucene.index.values;
> +
> +/**
> + * Licensed to the Apache Software Foundation (ASF) under one or more
> + * contributor license agreements.  See the NOTICE file distributed with
> + * this work for additional information regarding copyright ownership.
> + * The ASF licenses this file to You under the Apache License, Version 2.0
> + * (the "License"); you may not use this file except in compliance with
> + * the License.  You may obtain a copy of the License at
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +import java.util.HashMap;
> +import java.util.Map;
> +
> +/**
> + * Type promoter that promotes {@link IndexDocValues} during merge based on
> + * their {@link ValueType} and {@link #getValueSize()}
> + *
> + * @lucene.internal
> + */
> +public class TypePromoter {
> +
> +  private final static Map<Integer,ValueType> FLAGS_MAP = new HashMap<Integer,ValueType>();
> +  private static final TypePromoter IDENTITY_PROMOTER = new 
> IdentityTypePromoter();
> +  public static final int VAR_TYPE_VALUE_SIZE = -1;
> +
> +  private static final int IS_INT = 1 << 0;
> +  private static final int IS_BYTE = 1 << 1;
> +  private static final int IS_FLOAT = 1 << 2;
> +  /* VAR & FIXED == VAR */
> +  private static final int IS_VAR = 1 << 3;
> +  private static final int IS_FIXED = 1 << 3 | 1 << 4;
> +  /* if we have FIXED & FIXED with different size we promote to VAR */
> +  private static final int PROMOTE_TO_VAR_SIZE_MASK = ~(1 << 3);
> +  /* STRAIGHT & DEREF == STRAIGHT (dense values win) */
> +  private static final int IS_STRAIGHT = 1 << 5;
> +  private static final int IS_DEREF = 1 << 5 | 1 << 6;
> +  private static final int IS_SORTED = 1 << 7;
> +  /* more bits wins (int16 & int32 == int32) */
> +  private static final int IS_8_BIT = 1 << 8 | 1 << 9 | 1 << 10 | 1 << 11;
> +  private static final int IS_16_BIT = 1 << 9 | 1 << 10 | 1 << 11;
> +  private static final int IS_32_BIT = 1 << 10 | 1 << 11;
> +  private static final int IS_64_BIT = 1 << 11;
> +
> +  private final ValueType type;
> +  private final int flags;
> +  private final int valueSize;
> +
> +  /**
> +   * Returns a positive value size if this {@link TypePromoter} represents a
> +   * fixed variant, otherwise -1
> +   *
> +   * @return a positive value size if this {@link TypePromoter} represents a
> +   *         fixed variant, otherwise -1
> +   */
> +  public int getValueSize() {
> +    return valueSize;
> +  }
> +
> +  static {
> +    for (ValueType type : ValueType.values()) {
> +      TypePromoter create = create(type, VAR_TYPE_VALUE_SIZE);
> +      FLAGS_MAP.put(create.flags, type);
> +    }
> +  }
> +
> +  /**
> +   * Creates a new {@link TypePromoter}
> +   *
> +   * @param type
> +   *          the {@link ValueType} this promoter represents
> +   * @param flags
> +   *          the promoters flags
> +   * @param valueSize
> +   *          the value size if {@link #IS_FIXED} or -1 
> otherwise.
> +   */
> +  protected TypePromoter(ValueType type, int flags, int valueSize) {
> +    this.type = type;
> +    this.flags = flags;
> +    this.valueSize = valueSize;
> +  }
> +
> +  /**
> +   * Creates a new promoted {@link TypePromoter} based on this and the given
> +   * {@link TypePromoter} or null iff the {@link TypePromoter}
> +   * aren't compatible.
> +   *
> +   * @param promoter
> +   *          the incoming promoter
> +   * @return a new promoted {@link TypePromoter} based on this and the given
> +   *         {@link TypePromoter} or null iff the
> +   *         {@link TypePromoter} aren't compatible.
> +   */
> +  public TypePromoter promote(TypePromoter promoter) {
> +
> +    int promo

[jira] [Updated] (LUCENE-3503) DisjunctionSumScorer gives slightly (float iotas) different scores when you .nextDoc vs .advance

2011-10-10 Thread Robert Muir (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3503:


Attachment: LUCENE-3503.patch

patch with a bugfix to the test (in case it gets slowmultireaderwrapper)

> DisjunctionSumScorer gives slightly (float iotas) different scores when you 
> .nextDoc vs .advance
> 
>
> Key: LUCENE-3503
> URL: https://issues.apache.org/jira/browse/LUCENE-3503
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
> Attachments: LUCENE-3503.patch, LUCENE-3503.patch, LUCENE-3503.patch
>
>
> Spinoff from LUCENE-1536.
> I dug into why we hit a score diff when using luceneutil to benchmark
> the patch.
> At first I thought it was BS1/BS2 difference, but because of a bug in
> the patch it was still using BS2 (but should be BS1) -- Robert's last
> patch fixes that.
> But it's actually a diff in BS2 itself, whether you next or advance
> through the docs.
> It's because DisjunctionSumScorer, when summing the float scores for a
> given doc that matches multiple sub-scorers, might sum in a different
> order, when you had .nextDoc'd to that doc than when you had .advance'd
> to it.
> This in turn is because the PQ used by that scorer (ScorerDocQueue)
> makes no effort to break ties.  So, when the top N scorers are on the
> same doc, the PQ doesn't care what order they are in.
> Fixing ScorerDocQueue to break ties will likely be a non-trivial perf
> hit, though, so I'm not sure whether we should do anything here...
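
Since the order-dependence boils down to float addition not being associative, here is a tiny standalone demo (independent of the patch; the values are arbitrary) showing that summing the same floats in two different orders can give two different results:

{code}
public class FloatOrderSketch {
  public static void main(String[] args) {
    float big = 1e8f;           // ulp(1e8f) is 8, so small addends get rounded away
    float s1 = (big + 5f) + 5f; // the two 5s are rounded into the sum one at a time
    float s2 = big + (5f + 5f); // the two 5s are combined first
    System.out.println(s1 + " vs " + s2 + " -> equal? " + (s1 == s2)); // not equal
  }
}
{code}

The same thing happens when DisjunctionSumScorer sums its sub-scorers' scores in a different PQ order after .nextDoc vs .advance.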

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124414#comment-13124414
 ] 

Michael McCandless commented on LUCENE-1536:


bq. Mike: We can no longer do this, as the acceptDocs passed to the 
getDocIdSet() are no longer always liveDocs, they can be everything.

But CWF's job is still the same with this patch?

It's just that the cache key is now a reader + acceptDocs (instead of
just reader), and the "policy" must be more careful not to cache just
any acceptDocs.

Ie, we could easily add back the RECACHE option (maybe just a boolean
"cacheLiveDocs" or something)?  Or am I missing something?

The IGNORE option must go away, since no filter impl is allowed to ignore
the incoming acceptDocs.  The DYNAMIC option is what the patch now
hardwires.

I think this use case (app using CWF, doing deletes and reopening
periodically) is important.  For this use case we should do the AND w/
liveDocs only once on each reopen, and cache & reuse that, instead of
re-ANDing over and over for every query.


> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536_hack.patch, changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



What is the sync rate between svn and git

2011-10-10 Thread Pulkit Singhal
How often does a sync between the following two occur?

https://github.com/apache/lucene-solr
http://svn.apache.org/repos/asf/lucene/dev/trunk

Upon a shallow inspection I would say they are at least 1 day apart, am I
correct?

Thanks!
- Pulkit


Re: What is the sync rate between svn and git

2011-10-10 Thread Chris Hostetter

: How often does a sync between the following two occur?
: 
: https://github.com/apache/lucene-solr
: http://svn.apache.org/repos/asf/lucene/dev/trunk

I think you'll have to ask the github team that question, it's their 
mirror.

This is all the definitive info I know about Git mirrors @ Apache...

https://www.apache.org/dev/git.html
https://wiki.apache.org/general/GitAtApache

-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-10-10 Thread Updated

 [ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2487:
--

Attachment: SOLR-2487.patch

This patch attempts a solution which hopefully solves all needs. It adds a new 
ant target "dist-war-minimal" which creates a minimal Solr WAR Distribution 
file, without slf4j jars, except for the slf4j-api itself.

The target is not included in the "dist" task, so the minimal war will not be 
packaged redundantly. However, being an explicit target it is more easily 
understood and discoverable than a build parameter. Users packaging the war 
using this target must supply the relevant slf4j jars elsewhere on their 
classpath.

This minimal war could be a candidate for uploading to maven.

> Do not include slf4j-jdk14 jar in WAR
> -
>
> Key: SOLR-2487
> URL: https://issues.apache.org/jira/browse/SOLR-2487
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2, 4.0
>Reporter: Jan Høydahl
>  Labels: logging, slf4j
> Attachments: SOLR-2487.patch
>
>
> I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
> newbies get up and running. But I find myself re-packaging the war for every 
> customer when adapting to their choice of logger framework, which is 
> counter-productive.
> It would be sufficient to have the jdk-logging binding in example/lib to let 
> the example and tutorial still work OOTB but as soon as you deploy solr.war 
> to production you're forced to explicitly decide what logging to use.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: What is the sync rate between svn and git

2011-10-10 Thread Mark Miller

On Oct 10, 2011, at 4:01 PM, Pulkit Singhal wrote:

> How often does a sync between the following two occur?
> 
> https://github.com/apache/lucene-solr
> http://svn.apache.org/repos/asf/lucene/dev/trunk
> 
> Upon a shallow inspection I would say they are at least 1 day apart, am I 
> correct?
> 
> Thanks!
> - Pulkit

It varies - usually it's around a day as you note - at least many hours. I've 
seen it up to a week even though. The last time I saw that is when I stopped 
using the GitHub mirrors.

http://git.apache.org/

The repos there are always right up to date in my experience. I tend to add 
those as upstream sources and pull from them instead.

- Mark Miller
lucidimagination.com
2011.lucene-eurocon.org | Oct 17-20 | Barcelona











-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2765) Shard/Node states

2011-10-10 Thread Jamie Johnson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124477#comment-13124477
 ] 

Jamie Johnson commented on SOLR-2765:
-

I'm preparing a new patch tonight. I believe I've narrowed down what is causing 
the issues I mentioned before. Also, I am going back to making ClusterState and 
Slice immutable instead of using locking.
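
For context, "immutable instead of locking" means readers always see a consistent snapshot and updates build a new object; a tiny illustrative sketch (field and method names invented, not from the patch):

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ClusterStateSketch {
  private final Map<String,String> shardStates; // e.g. shard name -> state

  ClusterStateSketch(Map<String,String> shardStates) {
    this.shardStates = Collections.unmodifiableMap(new HashMap<String,String>(shardStates));
  }

  String stateOf(String shard) { return shardStates.get(shard); }

  /** "Mutation" returns a new snapshot; concurrent readers keep their old one, no locks needed. */
  ClusterStateSketch withShardState(String shard, String state) {
    Map<String,String> copy = new HashMap<String,String>(shardStates);
    copy.put(shard, state);
    return new ClusterStateSketch(copy);
  }
}
{code}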

> Shard/Node states
> -
>
> Key: SOLR-2765
> URL: https://issues.apache.org/jira/browse/SOLR-2765
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud, update
>Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: cluster_state-file.patch, combined.patch, 
> incremental_update.patch, scheduled_executors.patch, shard-roles.patch
>
>
> Need state for shards that indicate they are recovering, active/enabled, or 
> disabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3504) DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value

2011-10-10 Thread Michael McCandless (Created) (JIRA)
DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc 
didn't have a value
--

 Key: LUCENE-3504
 URL: https://issues.apache.org/jira/browse/LUCENE-3504
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0


I'm looking at making a FieldComparator that uses DV's SortedSource to
sort by string field (ie just like TermOrdValComparator, except using
DV instead of FieldCache).  We already have comparators for DV int and
float DV fields.

But one thing I noticed is we can't detect documents that didn't have
any value indexed vs documents that had empty byte[] indexed.

This is easy to fix (and we used to do this), because these types are
deref'd (ie, each doc stores an address, and then separately looks up
the byte[] at that address), we can reserve ord/address 0 to mean "doc
didn't have the field".  Then we should return null when you retrieve
the BytesRef value for that field.
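
A compact sketch of the proposed convention (hypothetical class and method names, not the DocValues API itself): reserve ord/address 0 for "no value" and surface it as null instead of an empty byte[]:

{code}
import org.apache.lucene.util.BytesRef;

final class SortedBytesSketch {
  private final int[] ordByDoc;        // per-doc ord; 0 is reserved for "doc had no value"
  private final BytesRef[] valueByOrd; // valueByOrd[0] is never handed out

  SortedBytesSketch(int[] ordByDoc, BytesRef[] valueByOrd) {
    this.ordByDoc = ordByDoc;
    this.valueByOrd = valueByOrd;
  }

  /** Returns null when the document had no value, rather than an empty BytesRef. */
  BytesRef getByDoc(int docID) {
    int ord = ordByDoc[docID];
    return ord == 0 ? null : valueByOrd[ord];
  }
}
{code}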


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-10-10 Thread Pulkit Singhal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124522#comment-13124522
 ] 

Pulkit Singhal commented on SOLR-1499:
--

Tried to update the patch, but I can't get the Solr admin interface up & running 
anymore after applying it with the modifications. Perhaps someone can advise?

> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -
>
> Key: SOLR-1499
> URL: https://issues.apache.org/jira/browse/SOLR-1499
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch
>
>
> The SolrEntityProcessor queries an external Solr instance. The Solr documents 
> returned are unpacked and emitted as DIH fields.
> The SolrEntityProcessor uses the following attributes:
> * solr='http://localhost:8983/solr/sms'
> ** This gives the URL of the target Solr instance.
> *** Note: the connection to the target Solr uses the binary SolrJ format.
> * query='Jefferson&sort=id+asc'
> ** This gives the base query string use with Solr. It can include any 
> standard Solr request parameter. This attribute is processed under the 
> variable resolution rules and can be driven in an inner stage of the indexing 
> pipeline.
> * rows='10'
> ** This gives the number of rows to fetch per request..
> ** The SolrEntityProcessor always fetches every document that matches the 
> request..
> * fields='id,tag'
> ** This selects the fields to be returned from the Solr request.
> ** These must also be declared as  elements.
> ** As with all fields, template processors can be used to alter the contents 
> to be passed downwards.
> * timeout='30'
> ** This limits the query to 5 seconds. This can be used as a fail-safe to 
> prevent the indexing session from freezing up. By default the timeout is 5 
> minutes.
> Limitations:
> * Solr errors are not handled correctly.
> * Loop control constructs have not been tested.
> * Multi-valued returned fields have not been tested.
> The unit tests give examples of how to use it as the root entity and an inner 
> entity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-10-10 Thread Pulkit Singhal (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124522#comment-13124522
 ] 

Pulkit Singhal edited comment on SOLR-1499 at 10/10/11 10:09 PM:
-

Tried to update the patch for the lucene-solr trunk ... but I can't get the Solr 
admin interface up & running anymore after applying it with the modifications. 
Perhaps someone can advise?

  was (Author: pulkitsing...@gmail.com):
Tried to update the patch but I can't get Solr admin interface up & running 
anymore after applying it with the modifications, perhaps someone can advice?
  
> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -
>
> Key: SOLR-1499
> URL: https://issues.apache.org/jira/browse/SOLR-1499
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch
>
>
> The SolrEntityProcessor queries an external Solr instance. The Solr documents 
> returned are unpacked and emitted as DIH fields.
> The SolrEntityProcessor uses the following attributes:
> * solr='http://localhost:8983/solr/sms'
> ** This gives the URL of the target Solr instance.
> *** Note: the connection to the target Solr uses the binary SolrJ format.
> * query='Jefferson&sort=id+asc'
> ** This gives the base query string use with Solr. It can include any 
> standard Solr request parameter. This attribute is processed under the 
> variable resolution rules and can be driven in an inner stage of the indexing 
> pipeline.
> * rows='10'
> ** This gives the number of rows to fetch per request..
> ** The SolrEntityProcessor always fetches every document that matches the 
> request..
> * fields='id,tag'
> ** This selects the fields to be returned from the Solr request.
> ** These must also be declared as  elements.
> ** As with all fields, template processors can be used to alter the contents 
> to be passed downwards.
> * timeout='30'
> ** This limits the query to 5 seconds. This can be used as a fail-safe to 
> prevent the indexing session from freezing up. By default the timeout is 5 
> minutes.
> Limitations:
> * Solr errors are not handled correctly.
> * Loop control constructs have not been tested.
> * Multi-valued returned fields have not been tested.
> The unit tests give examples of how to use it as the root entity and an inner 
> entity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-10-10 Thread Pulkit Singhal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pulkit Singhal updated SOLR-1499:
-

Attachment: SOLR-1499.rev1181269.buggy.patch

> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -
>
> Key: SOLR-1499
> URL: https://issues.apache.org/jira/browse/SOLR-1499
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.rev1181269.buggy.patch
>
>
> The SolrEntityProcessor queries an external Solr instance. The Solr documents 
> returned are unpacked and emitted as DIH fields.
> The SolrEntityProcessor uses the following attributes:
> * solr='http://localhost:8983/solr/sms'
> ** This gives the URL of the target Solr instance.
> *** Note: the connection to the target Solr uses the binary SolrJ format.
> * query='Jefferson&sort=id+asc'
> ** This gives the base query string use with Solr. It can include any 
> standard Solr request parameter. This attribute is processed under the 
> variable resolution rules and can be driven in an inner stage of the indexing 
> pipeline.
> * rows='10'
> ** This gives the number of rows to fetch per request..
> ** The SolrEntityProcessor always fetches every document that matches the 
> request..
> * fields='id,tag'
> ** This selects the fields to be returned from the Solr request.
> ** These must also be declared as  elements.
> ** As with all fields, template processors can be used to alter the contents 
> to be passed downwards.
> * timeout='30'
> ** This limits the query to 5 seconds. This can be used as a fail-safe to 
> prevent the indexing session from freezing up. By default the timeout is 5 
> minutes.
> Limitations:
> * Solr errors are not handled correctly.
> * Loop control constructs have not been tested.
> * Multi-valued returned fields have not been tested.
> The unit tests give examples of how to use it as the root entity and an inner 
> entity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3486) Add SearcherLifetimeManager, so you can retrieve the same searcher you previously used

2011-10-10 Thread Michael McCandless (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3486.


Resolution: Fixed

> Add SearcherLifetimeManager, so you can retrieve the same searcher you 
> previously used
> --
>
> Key: LUCENE-3486
> URL: https://issues.apache.org/jira/browse/LUCENE-3486
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3486.patch, LUCENE-3486.patch, LUCENE-3486.patch
>
>
> The idea is similar to SOLR-2809 (adding searcher leases to Solr).
> This utility class sits above whatever your source is for "the
> current" searcher (eg NRTManager, SearcherManager, etc.), and records
> (holds a reference to) each searcher in recent history.
> The idea is to ensure that when a user does a follow-on action (clicks
> next page, drills down/up), or when two or more searcher invocations
> within a single user search need to happen against the same searcher
> (eg in distributed search), you can retrieve the same searcher you
> used "last time".
> I think with the new searchAfter API (LUCENE-2215), doing follow-on
> searches on the same searcher is more important, since the "bottom"
> (score/docID) held for that API can easily shift when a new searcher
> is opened.
> When you do a "new" search, you record the searcher you used with the
> manager, and it returns to you a long token (currently just the
> IR.getVersion()), which you can later use to retrieve the same
> searcher.
> Separately you must periodically call prune(), to prune the old
> searchers, ideally from the same thread / at the same time that
> you open a new searcher.
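
Roughly, the usage pattern described above looks like the following sketch (assuming the record/acquire/release/prune API as described in this issue; treat details such as the PruneByAge constructor as approximate):

{code}
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherLifetimeManager;

public class SearcherLeaseSketch {
  private final SearcherLifetimeManager mgr = new SearcherLifetimeManager();

  /** New search: record the searcher used and hand the token back with the results. */
  public long newSearch(IndexSearcher current) throws IOException {
    long token = mgr.record(current);
    // ... run the query against `current`, return hits plus `token` ...
    return token;
  }

  /** Follow-on action (next page, drill down): try to reuse the exact same searcher. */
  public void followOn(long token) throws IOException {
    IndexSearcher same = mgr.acquire(token);
    if (same == null) {
      // Already pruned: fall back to the current searcher.
      return;
    }
    try {
      // ... e.g. searchAfter(...) against the same point-in-time view ...
    } finally {
      mgr.release(same);
    }
  }

  /** Call periodically, ideally right after opening a new searcher. */
  public void maintenance() throws IOException {
    mgr.prune(new SearcherLifetimeManager.PruneByAge(600.0)); // keep ~10 minutes of searchers
  }
}
{code}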

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3479) TestGrouping failure

2011-10-10 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124563#comment-13124563
 ] 

Michael McCandless commented on LUCENE-3479:


I think this fix is OK?  Can we commit this?

> TestGrouping failure
> 
>
> Key: LUCENE-3479
> URL: https://issues.apache.org/jira/browse/LUCENE-3479
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/grouping
>Reporter: Michael McCandless
>Assignee: Robert Muir
> Attachments: LUCENE-3479.patch
>
>
> {noformat}
> ant test -Dtestcase=TestGrouping -Dtestmethod=testRandom 
> -Dtests.seed=295cdb78b4a442d4:-4c5d64ef4d698c27:-425d4c1eb87211ba
> {noformat}
> fails with this on current trunk:
> {noformat}
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestGrouping 
> -Dtestmethod=testRandom 
> -Dtests.seed=295cdb78b4a442d4:-4c5d64ef4d698c27:-425d4c1eb87211ba
> [junit] NOTE: test params are: codec=RandomCodecProvider: {id=MockRandom, 
> content=MockSep, sort2=SimpleText, groupend=Pulsing(freqCutoff=3 
> minBlockSize=65 maxBlockSize=132), sort1=Memory, group=Memory}, 
> sim=RandomSimilarityProvider(queryNorm=true,coord=false): {id=DFR I(F)L2, 
> content=DFR BeL3(800.0), sort2=DFR GL3(800.0), groupend=DFR G2, sort1=DFR 
> GB3(800.0), group=LM Jelinek-Mercer(0.70)}, locale=zh_TW, 
> timezone=America/Indiana/Indianapolis
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestGrouping]
> [junit] NOTE: Linux 2.6.33.6-147.fc13.x86_64 amd64/Sun Microsystems Inc. 
> 1.6.0_21 (64-bit)/cpus=24,threads=1,free=143246344,total=281804800
> [junit] -  ---
> [junit] Testcase: 
> testRandom(org.apache.lucene.search.grouping.TestGrouping): FAILED
> [junit] expected:<11> but was:<7>
> [junit] junit.framework.AssertionFailedError: expected:<11> but was:<7>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
> [junit]   at 
> org.apache.lucene.search.grouping.TestGrouping.assertEquals(TestGrouping.java:980)
> [junit]   at 
> org.apache.lucene.search.grouping.TestGrouping.testRandom(TestGrouping.java:865)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:611)
> [junit] 
> [junit] 
> {noformat}
> I dug for a while... the test is a bit sneaky because it compares sorted docs 
> (by score) across 2 indexes.  Index #1 has no deletions; Index #2 has same 
> docs, but organized into doc blocks by group, and has some deletions.  In 
> theory (I think) even though the deletions will cause scores to differ across 
> the two indices, it should not alter the sort order of the docs.  Here is the 
> explain output of the docs that sorted differently:
> {noformat}
> #1: top hit in the "has deletes doc-block" index (id=239):
> explain: 2.394486 = (MATCH) weight(content:real1 in 292)
> [DFRSimilarity], result of:
>  2.394486 = score(DFRSimilarity, doc=292, freq=1.0), computed from:
>1.0 = termFreq=1
>41.944084 = NormalizationH3, computed from:
>  1.0 = tf
>  5.3102274 = avgFieldLength
>  2.56 = len
>102.829 = BasicModelBE, computed from:
>  41.944084 = tfn
>  880.0 = numberOfDocuments
>  239.0 = totalTermFreq
>0.023286095 = AfterEffectL, computed from:
>  41.944084 = tfn
> #2: hit in the "no deletes normal index" (id=229)
> ID=229 explain=2.382285 = (MATCH) weight(content:real1 in 225)
> [DFRSimilarity], result of:
>  2.382285 = score(DFRSimilarity, doc=225, freq=1.0), computed from:
>1.0 = termFreq=1
>41.765594 = NormalizationH3, computed from:
>  1.0 = tf
>  5.3218827 = avgFieldLength
>  10.24 = len
>101.879845 = BasicModelBE, computed from:
>  41.765594 = tfn
>  786.0 = numberOfDocuments
>  215.0 = totalTermFreq
>0.023383282 = AfterEffectL, computed from:
>  41.765594 = tfn
> Then I went and called explain on the "no deletes normal index" for
> the top doc (id=239):
> explain: 2.3822558 = (MATCH) weight(content:real1 in 17)
> [DFRSimilarity], result of:
>  2.3822558 = score(DFRSimilarity, doc=17, freq=1.0), computed from:
>1.0 = termFreq=1
>42.165264 = NormalizationH3, computed from:
>  1.0 = tf
>  5.3218827 = avgFieldLength
>  2.56 = len
>102.8307 = BasicModelBE, computed from:
>  42.165264 = tfn
>  786.0 = numberOfDocuments
>  215.0 = totalTermFreq
>0.023166776 = AfterEffectL, computed from:
>  42.165264 = tfn
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 

[jira] [Resolved] (LUCENE-3479) TestGrouping failure

2011-10-10 Thread Robert Muir (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3479.
-

   Resolution: Fixed
Fix Version/s: 4.0

> TestGrouping failure
> 
>
> Key: LUCENE-3479
> URL: https://issues.apache.org/jira/browse/LUCENE-3479
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/grouping
>Reporter: Michael McCandless
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3479.patch
>
>
> {noformat}
> ant test -Dtestcase=TestGrouping -Dtestmethod=testRandom 
> -Dtests.seed=295cdb78b4a442d4:-4c5d64ef4d698c27:-425d4c1eb87211ba
> {noformat}
> fails with this on current trunk:
> {noformat}
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestGrouping 
> -Dtestmethod=testRandom 
> -Dtests.seed=295cdb78b4a442d4:-4c5d64ef4d698c27:-425d4c1eb87211ba
> [junit] NOTE: test params are: codec=RandomCodecProvider: {id=MockRandom, 
> content=MockSep, sort2=SimpleText, groupend=Pulsing(freqCutoff=3 
> minBlockSize=65 maxBlockSize=132), sort1=Memory, group=Memory}, 
> sim=RandomSimilarityProvider(queryNorm=true,coord=false): {id=DFR I(F)L2, 
> content=DFR BeL3(800.0), sort2=DFR GL3(800.0), groupend=DFR G2, sort1=DFR 
> GB3(800.0), group=LM Jelinek-Mercer(0.70)}, locale=zh_TW, 
> timezone=America/Indiana/Indianapolis
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestGrouping]
> [junit] NOTE: Linux 2.6.33.6-147.fc13.x86_64 amd64/Sun Microsystems Inc. 
> 1.6.0_21 (64-bit)/cpus=24,threads=1,free=143246344,total=281804800
> [junit] -  ---
> [junit] Testcase: 
> testRandom(org.apache.lucene.search.grouping.TestGrouping): FAILED
> [junit] expected:<11> but was:<7>
> [junit] junit.framework.AssertionFailedError: expected:<11> but was:<7>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
> [junit]   at 
> org.apache.lucene.search.grouping.TestGrouping.assertEquals(TestGrouping.java:980)
> [junit]   at 
> org.apache.lucene.search.grouping.TestGrouping.testRandom(TestGrouping.java:865)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:611)
> [junit] 
> [junit] 
> {noformat}
> I dug for a while... the test is a bit sneaky because it compares sorted docs 
> (by score) across 2 indexes.  Index #1 has no deletions; Index #2 has same 
> docs, but organized into doc blocks by group, and has some deletions.  In 
> theory (I think) even though the deletions will cause scores to differ across 
> the two indices, it should not alter the sort order of the docs.  Here is the 
> explain output of the docs that sorted differently:
> {noformat}
> #1: top hit in the "has deletes doc-block" index (id=239):
> explain: 2.394486 = (MATCH) weight(content:real1 in 292)
> [DFRSimilarity], result of:
>  2.394486 = score(DFRSimilarity, doc=292, freq=1.0), computed from:
>1.0 = termFreq=1
>41.944084 = NormalizationH3, computed from:
>  1.0 = tf
>  5.3102274 = avgFieldLength
>  2.56 = len
>102.829 = BasicModelBE, computed from:
>  41.944084 = tfn
>  880.0 = numberOfDocuments
>  239.0 = totalTermFreq
>0.023286095 = AfterEffectL, computed from:
>  41.944084 = tfn
> #2: hit in the "no deletes normal index" (id=229)
> ID=229 explain=2.382285 = (MATCH) weight(content:real1 in 225)
> [DFRSimilarity], result of:
>  2.382285 = score(DFRSimilarity, doc=225, freq=1.0), computed from:
>1.0 = termFreq=1
>41.765594 = NormalizationH3, computed from:
>  1.0 = tf
>  5.3218827 = avgFieldLength
>  10.24 = len
>101.879845 = BasicModelBE, computed from:
>  41.765594 = tfn
>  786.0 = numberOfDocuments
>  215.0 = totalTermFreq
>0.023383282 = AfterEffectL, computed from:
>  41.765594 = tfn
> Then I went and called explain on the "no deletes normal index" for
> the top doc (id=239):
> explain: 2.3822558 = (MATCH) weight(content:real1 in 17)
> [DFRSimilarity], result of:
>  2.3822558 = score(DFRSimilarity, doc=17, freq=1.0), computed from:
>1.0 = termFreq=1
>42.165264 = NormalizationH3, computed from:
>  1.0 = tf
>  5.3218827 = avgFieldLength
>  2.56 = len
>102.8307 = BasicModelBE, computed from:
>  42.165264 = tfn
>  786.0 = numberOfDocuments
>  215.0 = totalTermFreq
>0.023166776 = AfterEffectL, computed from:
>  42.165264 = tfn
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdm

[jira] [Assigned] (LUCENE-3487) TestBooleanMinShouldMatch test failure

2011-10-10 Thread Robert Muir (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-3487:
---

Assignee: Robert Muir

> TestBooleanMinShouldMatch test failure
> --
>
> Key: LUCENE-3487
> URL: https://issues.apache.org/jira/browse/LUCENE-3487
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
>
> ant test -Dtestcase=TestBooleanMinShouldMatch -Dtestmethod=testRandomQueries 
> -Dtests.seed=505d62a62e9f90d0:-60daa428161b404b:-406411290a98f416
> I think its an absolute/relative epsilon issue

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2366) Facet Range Gaps

2011-10-10 Thread Hoss Man (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124591#comment-13124591
 ] 

Hoss Man commented on SOLR-2366:



Jan: I've got to be completely honest here -- catching up on this issue, I got 
really confused and lost by some of your comments and the updated docs.

This sequence of comments really stands out at me...

{quote}
I have no good answer to this, other than inventing some syntax.
...
I think the values facet.range.include=upper/lower is clear. Outer/edge would 
need some more work/definition.
...
*My primary reason for suggesting this is to give users a terse, intuitive 
syntax for ranges.*
...
One thing this improvement needs to tackle is how to return the range buckets 
in the Response. It will not be enough with the simple range_facet format ... 
We need something which can return the explicit ranges,
{quote}

(emphasis added by me)

I really liked the simplicity of your earlier proposal, and I agree that it 
would be really powerful/helpful to give users a terse, intuitive syntax for 
specifying sequential ranges of variable sizes -- but it seems like we're 
really moving away from the syntax being "intuitive" because of the hoops 
you're having to jump through to treat this as an extension of the existing 
"facet.range" param in your design.

I think we really ought to revisit my earlier suggestion to approach this as an 
entirely new "type" of faceting - not a new plugin or a contrib, but a new 
first-class type of faceting that FacetComponent would support, right along 
side facet.field, facet.query, and facet.range.  Let's ignore everything about 
the existing facet.range.* param syntax, and the facet_range response format, 
and think about what makes the most sense for this feature on it's own.  If 
there are ideas from facet.range that make sense to carry over (like 
facet.range.include) then great -- but let's approach it from the "something 
new that can borrow from facet.range" standpoint instead of the "extension to 
facet.range that has a bunch of caveats with how facet.range already works"

I mean: if it looks like a duck, walks like a duck, and quacks like a duck, 
then i'm happy to call it a duck -- but in this case:
 * doesn't make sense with facet.range.other
 * needs special start/end syntax to play nice with facet.range.start/end
 * needs to change the response format

...ie: it doesn't look the same, it doesn't walk the same, and it doesn't quack.

---

Regardless of whether this functionality becomes part of facet.range or not, I 
wanted to comment specifically on this idea...

bq. If all gaps are specified as explicit ranges this is no ambiguity, so we 
could require all gaps to be explicit ranges if one wants to use it?

This seems like a really harsh limitation to impose.  If the only way to use an 
explicit range is in use cases where you *only* use explicit ranges, then what 
value add does this feature give you over just using multiple facet.query 
params? (it might be marginally fewer characters, but multiple facet.query 
params seem more intuitive and easier to read).  I mean: I don't have a 
solution to propose, it just seems like there's not much point in supporting 
explicit ranges in that case.

---

Having not thought about this issue in almost a month, and revisiting it with 
(fairly) fresh eyes, and thinking about all the use cases that have been 
discussed, it seems like the main goals we should address are really:

 * an intuitive syntax for specifying end points for ranges of varying sizes
 * ability to specify range end points using either fixed values or increments
 * ability to specify that ranges should be either use sequential end points, 
or be overlapping relative some fixed min/max value

In other words: the only reason (that i know of) why overlapping ranges even 
came up in this issue was use cases like...

{noformat}
   Price: $0-10, $0-20, $0-50, $0-100
   Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW, NOW-1YEAR TO NOW
{noformat}

...there doesn't seem to be a lot of motivations for using overlapping ranges 
in the "middle" of a sequence, and these types of use cases where *all* the 
ranges overlap seem just as important as use cases where the ranges don't 
overlap at all...

{noformat}
   Price: $0-10, $10-20, $20-50, $50-100
   Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
{noformat}

...so let's try to focus on a syntax that makes both easy, using both fixed and 
relative values, w/o worrying about supporting arbitrary overlapping ranges 
(since I can't think of a use case for it, and it could always be achieved 
using facet.query)

So how about something like...

{noformat}
 facet.sequence=
 facet.sequence.spec=[,]?,[,]*[,]?
 facet.sequence.type=[before|after|between]
 facet.sequence.include=(same as facet.range.include)
{noformat}

Where "relval" would either be a concrete value, or a relativ

[jira] [Issue Comment Edited] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-10-10 Thread Pulkit Singhal (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124522#comment-13124522
 ] 

Pulkit Singhal edited comment on SOLR-1499 at 10/11/11 12:28 AM:
-

The updated patch for lucene-solr trunk is attached.

I need to massage multivalued fields; is there any guidance around that? I know 
it's not tested, but how should one go about experimenting with it?

FYI: To prove the patch works, I got a basic sanity test to work where the 
data-config.xml file in my bbyopen2 core got its data from the initial bbyopen 
core:
  1 
  2   
  3 http://localhost:8983/solr/bbyopen";
  6 query="sku:1000159"
  7 format="javabin"
  8 transformer="TemplateTransformer">
  9   
 10 
 11
 12 

  was (Author: pulkitsing...@gmail.com):
Tried to update the patch for the lucene-solr trunk ... but I can't get 
Solr admin interface up & running anymore after applying it with the 
modifications, perhaps someone can advice?
  
> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -
>
> Key: SOLR-1499
> URL: https://issues.apache.org/jira/browse/SOLR-1499
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.rev1181269.buggy.patch
>
>
> The SolrEntityProcessor queries an external Solr instance. The Solr documents 
> returned are unpacked and emitted as DIH fields.
> The SolrEntityProcessor uses the following attributes:
> * solr='http://localhost:8983/solr/sms'
> ** This gives the URL of the target Solr instance.
> *** Note: the connection to the target Solr uses the binary SolrJ format.
> * query='Jefferson&sort=id+asc'
> ** This gives the base query string use with Solr. It can include any 
> standard Solr request parameter. This attribute is processed under the 
> variable resolution rules and can be driven in an inner stage of the indexing 
> pipeline.
> * rows='10'
> ** This gives the number of rows to fetch per request..
> ** The SolrEntityProcessor always fetches every document that matches the 
> request..
> * fields='id,tag'
> ** This selects the fields to be returned from the Solr request.
> ** These must also be declared as  elements.
> ** As with all fields, template processors can be used to alter the contents 
> to be passed downwards.
> * timeout='30'
> ** This limits the query to 5 seconds. This can be used as a fail-safe to 
> prevent the indexing session from freezing up. By default the timeout is 5 
> minutes.
> Limitations:
> * Solr errors are not handled correctly.
> * Loop control constructs have not been tested.
> * Multi-valued returned fields have not been tested.
> The unit tests give examples of how to use it as the root entity and an inner 
> entity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-10-10 Thread Pulkit Singhal (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124522#comment-13124522
 ] 

Pulkit Singhal edited comment on SOLR-1499 at 10/11/11 12:30 AM:
-

The updated patch for lucene-solr trunk is attached. Sorry for naming it 
badly, but apparently I can't edit the file name after attaching it: 
SOLR-1499.rev1181269.buggy.patch

I need to massage multivalued fields; is there any guidance around that? I know 
it's not tested, but how should one go about experimenting with it?

FYI: To prove the patch works, I got a basic sanity test to work where the 
data-config.xml file in my bbyopen2 core got its data from the initial bbyopen 
core:
  1 
  2   
  3 http://localhost:8983/solr/bbyopen";
  6 query="sku:1000159"
  7 format="javabin"
  8 transformer="TemplateTransformer">
  9   
 10 
 11
 12 

  was (Author: pulkitsing...@gmail.com):
The updated patch is for lucene-solr trunk is attached.

I need to message multivalued fields, is there any guidance around that? I know 
its not tested but how should one go about experimenting with it?

FYI: To prove the patch works, I got a basic sanity-test to work where the 
data-config.xml file in my bbyopen2 core got its data from the initital bbyopen 
core:
  1 
  2   
  3 http://localhost:8983/solr/bbyopen";
  6 query="sku:1000159"
  7 format="javabin"
  8 transformer="TemplateTransformer">
  9   
 10 
 11
 12 
  
> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -
>
> Key: SOLR-1499
> URL: https://issues.apache.org/jira/browse/SOLR-1499
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.rev1181269.buggy.patch
>
>
> The SolrEntityProcessor queries an external Solr instance. The Solr documents 
> returned are unpacked and emitted as DIH fields.
> The SolrEntityProcessor uses the following attributes:
> * solr='http://localhost:8983/solr/sms'
> ** This gives the URL of the target Solr instance.
> *** Note: the connection to the target Solr uses the binary SolrJ format.
> * query='Jefferson&sort=id+asc'
> ** This gives the base query string use with Solr. It can include any 
> standard Solr request parameter. This attribute is processed under the 
> variable resolution rules and can be driven in an inner stage of the indexing 
> pipeline.
> * rows='10'
> ** This gives the number of rows to fetch per request..
> ** The SolrEntityProcessor always fetches every document that matches the 
> request..
> * fields='id,tag'
> ** This selects the fields to be returned from the Solr request.
> ** These must also be declared as  elements.
> ** As with all fields, template processors can be used to alter the contents 
> to be passed downwards.
> * timeout='30'
> ** This limits the query to 5 seconds. This can be used as a fail-safe to 
> prevent the indexing session from freezing up. By default the timeout is 5 
> minutes.
> Limitations:
> * Solr errors are not handled correctly.
> * Loop control constructs have not been tested.
> * Multi-valued returned fields have not been tested.
> The unit tests give examples of how to use it as the root entity and an inner 
> entity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2820) Add both model and state to ZooKeeper layout for SolrCloud

2011-10-10 Thread Ted Dunning (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124642#comment-13124642
 ] 

Ted Dunning commented on SOLR-2820:
---

Would it help to have a toy implementation for discussion here?  I don't have 
enough time to make clean updates to Solr itself, but I have built this kind of 
code several times and could build a simple framework very quickly.

> Add both model and state to ZooKeeper layout for SolrCloud
> --
>
> Key: SOLR-2820
> URL: https://issues.apache.org/jira/browse/SOLR-2820
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Mark Miller
>
> Current we skimp by here by having the model and simple node state - we 
> really want the model and full cluster state longer term though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3505) BooleanScorer2.freq() doesnt work unless you call score() first.

2011-10-10 Thread Robert Muir (Created) (JIRA)
BooleanScorer2.freq() doesnt work unless you call score() first.


 Key: LUCENE-3505
 URL: https://issues.apache.org/jira/browse/LUCENE-3505
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


It's 0; freq() is only calculated as a side effect of score()... we should 
at least document this or throw UOE from freq() instead.
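
Until that's decided, callers that want both values need to read score() before freq() for the current doc; a hedged sketch of the safe ordering (assuming a Scorer obtained elsewhere):

{code}
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;

final class ScoreThenFreqSketch {
  static void collect(Scorer scorer) throws IOException {
    int doc;
    while ((doc = scorer.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
      float score = scorer.score(); // BS2 computes freq as a side effect here
      float freq = scorer.freq();   // only meaningful after score() on this doc
      // ... use doc, score and freq ...
    }
  }
}
{code}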

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3487) TestBooleanMinShouldMatch test failure

2011-10-10 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124646#comment-13124646
 ] 

Robert Muir commented on LUCENE-3487:
-

I looked this over; this is because it's comparing two different queries (one 
with minShouldMatch, one without), so there are some minor floating-point 
differences because BS2 uses different scorers.

This is fine; it's going to be consistent with itself, so we just need to fix 
the test to use a relative epsilon (like the queryutil check).
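
For readers unfamiliar with the distinction, a relative-epsilon comparison of the kind suggested here could look roughly like the following sketch (the helper name and tolerance values are assumptions, not the committed fix):

{code}
// Sketch: allow larger scores proportionally larger drift, with a tiny absolute
// floor so scores near zero still compare sensibly.
static void assertScoresClose(double expected, double actual) {
  double bound = Math.max(Math.abs(expected), Math.abs(actual)) * 1e-6 + 1e-9;
  org.junit.Assert.assertEquals(expected, actual, bound);
}
{code}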


> TestBooleanMinShouldMatch test failure
> --
>
> Key: LUCENE-3487
> URL: https://issues.apache.org/jira/browse/LUCENE-3487
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
>
> ant test -Dtestcase=TestBooleanMinShouldMatch -Dtestmethod=testRandomQueries 
> -Dtests.seed=505d62a62e9f90d0:-60daa428161b404b:-406411290a98f416
> I think its an absolute/relative epsilon issue

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3487) TestBooleanMinShouldMatch test failure

2011-10-10 Thread Robert Muir (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3487.
-

   Resolution: Fixed
Fix Version/s: 4.0

> TestBooleanMinShouldMatch test failure
> --
>
> Key: LUCENE-3487
> URL: https://issues.apache.org/jira/browse/LUCENE-3487
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
>
> ant test -Dtestcase=TestBooleanMinShouldMatch -Dtestmethod=testRandomQueries 
> -Dtests.seed=505d62a62e9f90d0:-60daa428161b404b:-406411290a98f416
> I think its an absolute/relative epsilon issue

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-10-10 Thread Lance Norskog (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124649#comment-13124649
 ] 

Lance Norskog commented on SOLR-1499:
-

Hi-

First, get the unit tests to work. After that, we're ready to work on it. You 
do a full build at the top with 
{code}
ant compile
{code}
and then cd to solr/contrib/dataimporthandler and 
{code}
ant test
{code}
When the unit tests do not work, something fundamental is broken and there is 
no point going further. In this case, the tests are broken because a 
solrconfig.xml sample file they depended on has gone away and you need to find 
replacements.

> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -
>
> Key: SOLR-1499
> URL: https://issues.apache.org/jira/browse/SOLR-1499
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.rev1181269.buggy.patch
>
>
> The SolrEntityProcessor queries an external Solr instance. The Solr documents 
> returned are unpacked and emitted as DIH fields.
> The SolrEntityProcessor uses the following attributes:
> * solr='http://localhost:8983/solr/sms'
> ** This gives the URL of the target Solr instance.
> *** Note: the connection to the target Solr uses the binary SolrJ format.
> * query='Jefferson&sort=id+asc'
> ** This gives the base query string used with Solr. It can include any 
> standard Solr request parameter. This attribute is processed under the 
> variable resolution rules and can be driven in an inner stage of the indexing 
> pipeline.
> * rows='10'
> ** This gives the number of rows to fetch per request.
> ** The SolrEntityProcessor always fetches every document that matches the 
> request.
> * fields='id,tag'
> ** This selects the fields to be returned from the Solr request.
> ** These must also be declared as <field> elements.
> ** As with all fields, template processors can be used to alter the contents 
> to be passed downwards.
> * timeout='30'
> ** This limits the query to the configured number of seconds (here, 30). This 
> can be used as a fail-safe to prevent the indexing session from freezing up. 
> By default the timeout is 5 minutes.
> Limitations:
> * Solr errors are not handled correctly.
> * Loop control constructs have not been tested.
> * Multi-valued returned fields have not been tested.
> The unit tests give examples of how to use it as the root entity and an inner 
> entity.
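
Putting the attributes above together, a data-config.xml entity declaration might look roughly like the sketch below (the entity name, target core URL, and field mappings are illustrative assumptions, not taken from the patch):

{code}
<!-- Editorial sketch only. Note that '&' must be written as '&amp;'
     inside an XML attribute value. -->
<entity name="solrSource" processor="SolrEntityProcessor"
        solr="http://localhost:8983/solr/sms"
        query="Jefferson&amp;sort=id+asc"
        rows="10"
        fields="id,tag"
        timeout="30">
  <field column="id" name="id"/>
  <field column="tag" name="tag"/>
</entity>
{code}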

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2765) Shard/Node states

2011-10-10 Thread Jamie Johnson (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jamie Johnson updated SOLR-2765:


Attachment: solrcloud.patch

The latest patch fixes the issue I had mentioned before, where live_nodes 
wasn't properly getting updated.  This implementation also keeps ClusterState 
(used to be CloudState) and Slice immutable.  

The following is a list of things that can still be done:

# Consider removing the ZkStateReader.updateCloudState methods; these aren't 
called by anything other than tests right now. On a side note, the current 
implementation processes every watch event instead of having the 5s delay. 
This could cause some performance issues when the cluster is first coming up, 
since everyone will be trying to write to the cluster state. If we have to add 
that back, it should be a really simple change (a rough sketch of one way to 
do it follows below).
# Update the tests to read from /cluster_state instead of /collections.
# Decide whether ClusterState should be the ideal state or the actual state 
(i.e. do we maintain information in /collections as ideal and update 
ClusterState to track nodes going down). Currently ClusterState is ideal. This 
would require some leader to track when a node is no longer live, so it should 
probably be pushed off.


Any thoughts on these?  Specifically 1.
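
On item 1, one possible shape of the removed 5s coalescing delay is sketched below; this is purely illustrative (all class, field, and method names are assumed, and it is not taken from any attached patch):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: fold a burst of ZooKeeper watch events into a single cluster-state
// refresh by scheduling at most one refresh per 5-second window.
class CoalescingStateUpdater {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final AtomicBoolean pending = new AtomicBoolean(false);
  private final Runnable refresh; // e.g. a call back into ZkStateReader

  CoalescingStateUpdater(Runnable refresh) {
    this.refresh = refresh;
  }

  // Called from the ZooKeeper Watcher; only the first event in a window schedules work.
  void onWatchEvent() {
    if (pending.compareAndSet(false, true)) {
      scheduler.schedule(new Runnable() {
        public void run() {
          pending.set(false);
          refresh.run();
        }
      }, 5, TimeUnit.SECONDS);
    }
  }
}
{code}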

> Shard/Node states
> -
>
> Key: SOLR-2765
> URL: https://issues.apache.org/jira/browse/SOLR-2765
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud, update
>Reporter: Yonik Seeley
> Fix For: 4.0
>
> Attachments: cluster_state-file.patch, combined.patch, 
> incremental_update.patch, scheduled_executors.patch, shard-roles.patch, 
> solrcloud.patch
>
>
> Need state for shards that indicate they are recovering, active/enabled, or 
> disabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Patch commit question

2011-10-10 Thread Pulkit Singhal
Hello,

I was wondering what the process is for getting patches committed.

For example: SOLR-1499, SOLR-2549, and SOLR-2382 would all make great commits.

As I'm sure many others are too, I'm obviously biased :)

What is it that they lack process-wise to make it in?

Thanks!
- Pulkit


[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124720#comment-13124720
 ] 

Uwe Schindler commented on LUCENE-1536:
---

{quote}
hack patch that computes the heuristic up front in weight init, so it scores 
all segments consistently and returns the proper scoresDocsOutOfOrder for BS1.

Uwe's new test (the nestedFilterQuery) doesnt pass yet, don't know why.
{quote}

Very easy to explain: because it's a hack! The problem is simple: the new test 
explicitly checks that acceptDocs are correctly handled by the query, which is 
not the case for your modifications. In createWeight you get the first segment 
and create the filter's DocIdSet on it, passing *liveDocs* (because you have 
nothing else). You cache this first DocIdSet (so getDocIdSet does not have to 
be executed twice for the first filter) and thereby miss the real acceptDocs 
(which are != liveDocs in this test). The first segment therefore returns more 
documents than it should.

Altogether, the hack is of course uncommittable, and the source of our problem 
lies only in the out-of-order setting. The fix in your patch is fine, but too 
much. The scoresDocsOutOfOrder method should simply return what the inner 
weight returns, because it *may* return docs out of order. It can still return 
them in order (if a filter needs to be applied using the iterator). This is no 
different from the behaviour before. So the fix is easy: do the same as in 
ConstantScoreQuery, where we return the setting from the inner weight.
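
In code terms the suggested fix is tiny; a sketch of the delegating method inside the filtered weight wrapper follows (the inner-weight field name is an assumption):

{code}
// Sketch: report whatever the wrapped weight reports, exactly as
// ConstantScoreQuery's weight does, instead of deciding the ordering here.
@Override
public boolean scoresDocsOutOfOrder() {
  return innerWeight.scoresDocsOutOfOrder();
}
{code}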

Being consistent in selecting scorer implementations between segments is not an 
issue specific to this case; it's a general problem and cannot be solved by a 
hack. The selection of the Scorer for BooleanQuery can be different even without 
FilteredQuery, as BooleanWeight might return different scorers, too (so the 
problem is BooleanScorer doing the selection of its Scorer per segment). To fix 
this, BooleanWeight would have to make all the scorer decisions in its ctor, so 
we would also need to pass scoreInOrder and other parameters to the Weight's 
ctor.

Please remove the hack and only correctly implement scoresDocsOutOfOrder (which 
is the reason for the problem, as it suddenly returns documents in a different 
order). With that patch we can still get the documents in a different order if 
we have random access enabled together with the filter, whereas the old 
IndexSearcher used DocIdSetIterator (in order). We should ignore those 
differences in document order if the score is identical (and Mike's output 
shows the scores are equal). If we want to check that the results are 
identical, the benchmark test must explicitly request docs-in-order on trunk 
vs. patch to be consistent. But then it's no longer a benchmark.

Conclusion: in general we have explained the differences between the patches, 
and I think my original patch is fine except for Weight.scoresDocsOutOfOrder, 
which should return the inner Weight's setting (like CSQ does) - no magic 
needed. Our patch does *not* return wrong documents; only the order of 
equal-scoring documents is different, which is perfectly fine.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536_hack.patch, changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> S

[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it

2011-10-10 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1536:
--

Attachment: LUCENE-1536.patch

Patch that fixes the Weight.scoresDocsOutOfOrder method to return the inner 
weight's setting. The scorer can still return docs in order, but that was 
identical to the behaviour in previous unpatched trunk (IS looked at the 
out-of-order setting of the weight and used the correct collector, but once a 
filter was applied, the documents came in order). My patch only failed to pass 
this setting to our wrapper query.

Mike: If you have time, can you check this? We may need a test that uses a 
larger index and tests FilteredQuery on top of it; the current indexes used for 
filtering are simply too small and in most cases have only one segment :(

There is no need for Robert's hack (which does not work correctly with 
acceptDocs != liveDocs). If different BooleanScorers return significantly 
different scores, that is a bug, not a problem in FilteredQuery. Slight score 
changes, and therefore a different order in the results, are not a problem at 
all - this is just my opinion.

bq. If we want to check that the results are identical, the benchmark test must 
explicitly request docs-in-order on trunk vs. patch to be consistent. But then 
it's no longer a benchmark.

This is of course untrue, sorry. If the weight reports that docs *may* come out 
of order, the collector should handle this.

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536_hack.patch, changes-yonik-uwe.patch, 
> luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org