[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978158#action_12978158
 ] 

Simon Willnauer commented on LUCENE-2831:
-

bq. So we should either change to AtomicReaderContext, or put a getBaseInTop() 
method on ReaderContext.

We should move to AtomicReaderContext if possible. Would you like to open a new 
issue to migrate the Solr parts, or should we do that in this one?
Similarly, Weight#scorer should also take an AtomicReaderContext if possible...


> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.
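To make the proposed struct concrete, here is a minimal Java sketch that uses plain ints in place of real readers. `LeafContextSketch`, `build`, and `toTopLevel` are hypothetical names and are not Lucene's ReaderContext API. The sketch shows how keeping the ord and doc base next to each sub reader lets a caller map a segment-local doc ID into the parent's doc ID space without any extra lookups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-sub-reader context: the parent reader (a label here),
// the ord of the sub reader within its parent, the sub's doc base in the
// parent's doc ID space, and the sub's size. Not Lucene's real API.
final class LeafContextSketch {
    final String parent;
    final int ord;      // position of this sub reader in its parent
    final int docBase;  // first parent-level doc ID owned by this sub
    final int maxDoc;   // number of docs in this sub reader

    LeafContextSketch(String parent, int ord, int docBase, int maxDoc) {
        this.parent = parent;
        this.ord = ord;
        this.docBase = docBase;
        this.maxDoc = maxDoc;
    }

    // Maps a sub-reader-local doc ID into the parent's doc ID space.
    int toTopLevel(int localDoc) {
        if (localDoc < 0 || localDoc >= maxDoc) {
            throw new IllegalArgumentException("doc out of range: " + localDoc);
        }
        return docBase + localDoc;
    }

    // Builds one context per sub reader, assigning ords and doc bases in order.
    static List<LeafContextSketch> build(String parent, int[] subMaxDocs) {
        List<LeafContextSketch> contexts = new ArrayList<>();
        int base = 0;
        for (int ord = 0; ord < subMaxDocs.length; ord++) {
            contexts.add(new LeafContextSketch(parent, ord, base, subMaxDocs[ord]));
            base += subMaxDocs[ord];
        }
        return contexts;
    }
}
```

Suppose the sub readers have sizes 10, 5 and 7. The third context then gets ord 2 and doc base 15, so its local doc 3 maps to top-level doc 18.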

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2847) Support all of unicode in StandardTokenizer

2011-01-05 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978141#action_12978141
 ] 

Steven Rowe commented on LUCENE-2847:
-

bq. We could also consolidate tools, because in general I would rather all the 
analyzers be consolidated; they are only split up due to dependencies, large 
files, etc. But tools are different; they're just there to assist the build.

How far would you go with this tools consolidation?  All tools across the whole 
of Lucene/Solr?  Or just the ones under {{modules/analysis/}}?

> Support all of unicode in StandardTokenizer
> ---
>
> Key: LUCENE-2847
> URL: https://issues.apache.org/jira/browse/LUCENE-2847
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Reporter: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2847.patch, LUCENE-2847.patch, LUCENE-2847.patch
>
>
> StandardTokenizer currently only supports the BMP.
> If it encounters characters outside of the BMP, it just discards them... 
> it should instead fully implement UAX#29 across all of Unicode.
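Some background on the BMP limitation helps here. Java strings are UTF-16, so a code point outside the BMP is stored as a surrogate pair. A scanner that handles one `char` at a time therefore sees two code units where there is only one character. The sketch below is not the StandardTokenizer or JFlex code. It uses only standard JDK calls (`Character.toChars`, `String.codePointCount`, `String.codePoints`, `Character.isSupplementaryCodePoint`) to show the difference on a three-character string.

```java
// Illustrates why char-at-a-time (BMP-only) processing mishandles
// supplementary characters: each one occupies two UTF-16 code units.
final class SupplementaryDemo {
    // "a", U+1D49C MATHEMATICAL SCRIPT CAPITAL A (outside the BMP), "b".
    static final String TEXT = "a" + new String(Character.toChars(0x1D49C)) + "b";

    // Number of UTF-16 code units, i.e. what a char-based scanner iterates over.
    static int charCount(String s) {
        return s.length();
    }

    // Number of actual Unicode code points.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Code points outside the BMP, which a BMP-only scanner sees as two
    // separate surrogate chars.
    static long supplementaryCount(String s) {
        return s.codePoints().filter(Character::isSupplementaryCodePoint).count();
    }

    public static void main(String[] args) {
        System.out.println(charCount(TEXT));          // 4
        System.out.println(codePointCount(TEXT));     // 3
        System.out.println(supplementaryCount(TEXT)); // 1
    }
}
```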




[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-05 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2324:
-

Attachment: test.out

Here's a new test.out. I'll look at TestCheckIndex, which should probably work.

"IndexFileDeleter doesn't know about file" seems odd.  We're OOMing in 
TestIndexWriter because we're not flushing by RAM (e.g., it currently defaults 
to returning false).

> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, 
> LUCENE-2324.patch, test.out, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.




[jira] Issue Comment Edited: (LUCENE-2847) Support all of unicode in StandardTokenizer

2011-01-05 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978130#action_12978130
 ] 

Steven Rowe edited comment on LUCENE-2847 at 1/6/11 12:16 AM:
--

New patch, with the following changes:

# Added a new target {{gen-uax29-supp-macros}} to 
{{modules/analysis/icu/build.xml}}, and a {{}} call to it from the 
{{jflex}} task in {{modules/analysis/common/build.xml}}.
# Included {{SUPPLEMENTARY.jflex-macro}} in {{UAX29URLEmailTokenizer.jflex}} in 
the same way as it is included in {{StandardTokenizer.jflex}}
# Copied the simple supplementary characters test from 
{{TestStandardAnalyzer.java}} to {{TestUAX29URLEmailTokenizer.java}}.
# Modified the CHANGES.txt entry for the UAX#29 issues to include a reference 
to this issue.

All tests pass.

  was (Author: steve_rowe):
New patch, with the following changes:

# Added a new target {{gen-uax29-supp-macros}} to 
{{modules/analysis/icu/build.xml}}, and a {{}} call to it from the 
{{jflex}} task in {{modules/analysis/common/build.xml}}.
# Included SUPPLEMENTARY.jflex-macro}} in {{UAX29URLEmailTokenizer.jflex}} in 
the same way as it is included in {{StandardTokenizer.jflex}}
# Copied the simple supplementary characters test from 
{{TestStandardAnalyzer.java}} to {{TestUAX29URLEmailTokenizer.java}}.
# Modified the CHANGES.txt entry for the UAX#29 issues to include a reference 
to this issue.

All tests pass.
  
> Support all of unicode in StandardTokenizer
> ---
>
> Key: LUCENE-2847
> URL: https://issues.apache.org/jira/browse/LUCENE-2847
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Reporter: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2847.patch, LUCENE-2847.patch, LUCENE-2847.patch
>
>
> StandardTokenizer currently only supports the BMP.
> If it encounters characters outside of the BMP, it just discards them... 
> it should instead fully implement UAX#29 across all of Unicode.




[jira] Updated: (LUCENE-2847) Support all of unicode in StandardTokenizer

2011-01-05 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2847:


Attachment: LUCENE-2847.patch

Removed the WARNING from the {{UAX29URLEmailTokenizer}} class javadocs about 
Unicode supplementary character non-coverage.

> Support all of unicode in StandardTokenizer
> ---
>
> Key: LUCENE-2847
> URL: https://issues.apache.org/jira/browse/LUCENE-2847
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Reporter: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2847.patch, LUCENE-2847.patch, LUCENE-2847.patch
>
>
> StandardTokenizer currently only supports the BMP.
> If it encounters characters outside of the BMP, it just discards them... 
> it should instead fully implement UAX#29 across all of Unicode.




[jira] Updated: (LUCENE-2847) Support all of unicode in StandardTokenizer

2011-01-05 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2847:


Attachment: LUCENE-2847.patch

New patch, with the following changes:

# Added a new target {{gen-uax29-supp-macros}} to 
{{modules/analysis/icu/build.xml}}, and a {{}} call to it from the 
{{jflex}} task in {{modules/analysis/common/build.xml}}.
# Included {{SUPPLEMENTARY.jflex-macro}} in {{UAX29URLEmailTokenizer.jflex}} in 
the same way as it is included in {{StandardTokenizer.jflex}}
# Copied the simple supplementary characters test from 
{{TestStandardAnalyzer.java}} to {{TestUAX29URLEmailTokenizer.java}}.
# Modified the CHANGES.txt entry for the UAX#29 issues to include a reference 
to this issue.

All tests pass.

> Support all of unicode in StandardTokenizer
> ---
>
> Key: LUCENE-2847
> URL: https://issues.apache.org/jira/browse/LUCENE-2847
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Reporter: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2847.patch, LUCENE-2847.patch
>
>
> StandardTokenizer currently only supports the BMP.
> If it encounters characters outside of the BMP, it just discards them... 
> it should instead fully implement UAX#29 across all of Unicode.




[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-05 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2324:
-

Attachment: LUCENE-2324-SMALL.patch

Same as the last patch, except that DW now has a default deletes buffer, which 
receives deletes when no DWPTs are available.  On a flush of all threads, the 
default deletes are applied to the last segment with no doc limit.  

TestIndexReaderReopen now passes.
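The delete-routing rule discussed on this issue can be sketched as below. A delete is buffered only into DWPTs that hold at least one document and are not flushing, and it falls back to a default buffer when no DWPT qualifies. This is a simplified illustration using hypothetical `Dwpt` and `DeleteRouter` classes, not the realtime-branch code. In particular, it ignores deletes that must also reach segments that have already been flushed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-thread writer state: only what the routing rule needs.
final class Dwpt {
    int numDocs;
    boolean flushing;
    final List<String> pendingDeletes = new ArrayList<>();

    Dwpt(int numDocs, boolean flushing) {
        this.numDocs = numDocs;
        this.flushing = flushing;
    }
}

final class DeleteRouter {
    // Deletes no DWPT accepted; in the patch these go to the last segment on flush.
    final List<String> defaultDeletes = new ArrayList<>();

    // Buffers the term into each non-flushing DWPT with at least one doc and
    // returns how many DWPTs received it. Simplified: already-flushed
    // segments are not considered here.
    int deleteTerm(List<Dwpt> dwpts, String term) {
        int buffered = 0;
        for (Dwpt d : dwpts) {
            if (!d.flushing && d.numDocs > 0) {
                d.pendingDeletes.add(term);
                buffered++;
            }
        }
        if (buffered == 0) {
            defaultDeletes.add(term);
        }
        return buffered;
    }
}
```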

> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, 
> LUCENE-2324.patch, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.




Re: strange problem of PForDelta decoder

2011-01-05 Thread Li Li
We've recently become interested in this problem. If we come up with a
patch, I'd like to share it with everyone.

2011/1/4 Michael McCandless :
> 2011/1/4 Li Li :
>> I agree with you that we should not tie concurrency w/in a single search to
>> index segments.
>> That solution is just a hack.
>> Will Lucene 4 support multithreaded search for a single query?
>> I haven't found any patch about this.
>
> Well, as things stand now, Lucene 4 will only support the "thread per
> segment" hack.  The patch on LUCENE-2837 (still needs work) merges
> ParallelMultiSearcher into IndexSearcher, carrying over that hack.
>
> But this discussion seems like it could lead to a nice patch?  (If
> someone has the time/energy/itch to cons one up).
>
> Just dividing up the docID space equally seems like a simple solution
> that'd work well...
>
> Mike
>
>
>
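Mike's suggestion of dividing the docID space equally could start from something like the sketch below. It splits `[0, maxDoc)` into one contiguous, near-equal range per thread. `DocIdPartitioner` and `split` are hypothetical names, and the sketch leaves out both thread scheduling and per-range hit collection.

```java
import java.util.ArrayList;
import java.util.List;

// Splits [0, maxDoc) into numThreads contiguous, near-equal ranges so each
// thread can score its own slice of the docID space, independent of how the
// index happens to be divided into segments.
final class DocIdPartitioner {
    // Each element is {start, end} with end exclusive.
    static List<int[]> split(int maxDoc, int numThreads) {
        if (maxDoc < 0 || numThreads <= 0) {
            throw new IllegalArgumentException("bad arguments");
        }
        List<int[]> ranges = new ArrayList<>();
        int base = maxDoc / numThreads;
        int extra = maxDoc % numThreads; // the first 'extra' ranges get one more doc
        int start = 0;
        for (int i = 0; i < numThreads; i++) {
            int size = base + (i < extra ? 1 : 0);
            ranges.add(new int[] {start, start + size});
            start += size;
        }
        return ranges;
    }
}
```

With maxDoc = 10 and 3 threads, this yields [0, 4), [4, 7) and [7, 10). A real implementation would still have to map each range onto the segments it spans, because scorers are created per segment.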




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978096#action_12978096
 ] 

Yonik Seeley commented on LUCENE-2831:
--

Do we have any good MultiReader tests? 
wrapUnderlyingReader() sort of does... but not enough to tell if someone 
accidentally used baseInParent as opposed to the global base.
Perhaps it should construct a MultiReader with an arbitrary but equivalent 
structure based on children and leaves?
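One way to read this suggestion: regroup the same leaves into a different nesting, then check that every leaf's global base stays the same while its base within its parent usually changes. Code that confuses the two would then fail. The sketch simulates this on plain leaf sizes rather than building real MultiReaders, and `NestedLayout` and `regroup` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical test helper: regroups a flat list of leaf sizes into a random
// two-level structure and reports, per leaf, its base within its parent group
// and its global base. A correct implementation must see the same global
// bases as the flat layout; code that wrongly uses baseInParent will not.
final class NestedLayout {
    // Returns, for each leaf, {baseInParent, globalBase}.
    static List<int[]> regroup(int[] leafSizes, long seed) {
        Random random = new Random(seed);
        List<int[]> result = new ArrayList<>();
        int global = 0;
        int baseInParent = 0;
        for (int i = 0; i < leafSizes.length; i++) {
            // Randomly start a new sub-composite before this leaf
            // (always for the first leaf).
            if (i == 0 || random.nextBoolean()) {
                baseInParent = 0;
            }
            result.add(new int[] {baseInParent, global});
            baseInParent += leafSizes[i];
            global += leafSizes[i];
        }
        return result;
    }
}
```

For leaves of sizes 10, 5 and 7, the global bases are 0, 10 and 15 regardless of the seed. Only the baseInParent values change.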

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978088#action_12978088
 ] 

Yonik Seeley commented on LUCENE-2831:
--

>Should Filter.getDocIDSet take an AtomicReaderContext? We don't have
>to do that in this patch, though... this patch is a big enough first
>step!

bq. Yeah, I would like to do so, similar to Weight#scorer, but currently it's 
mainly Solr that prevents us from doing this.

Which part?  I was looking into migrating some of SolrIndexSearcher to 
ReaderContext, and realized I needed the global base.
I could walk up the tree to calculate it, but then I realized that 
AtomicReaderContext already has that!  So we should either change to 
AtomicReaderContext, or put a getBaseInTop() method on ReaderContext.

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




Lucene-trunk - Build # 1417 - Still Failing

2011-01-05 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1417/

No tests ran.

Build Log (for compile errors):
[...truncated 4457 lines...]






[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-05 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978075#action_12978075
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

{quote}I believe we can drop the delete in that case. We only need to buffer
into DWPTs that have at least 1 doc.{quote}

Right, if the DWPT is flushing we can skip it. In the queue model we'd consume 
the queued delete, locate the existing DWPTs, and add the delete to each DWPT 
that isn't flushing. However, in the zero-DWPT case we still need to record the 
delete somewhere; most likely we'd need to create a zero-doc DWPT? Oh wait, we 
need to add the delete to the last segment? Ah, I can fix that in the existing 
code (e.g., fix the reopen test case failures).

> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.




[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-05 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978065#action_12978065
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

It seems we're going to great lengths to emulate a producer/consumer queue 
(e.g., ordering of calls with sequence IDs, thread pooling) without actually 
implementing one. A fixed-size blocking queue would simply block threads as 
needed and would probably look cleaner in code. We could still implement thread 
affinities, though I can't see most applications requiring affinity, so perhaps 
we can avoid it for now and add it back later? 
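A fixed-size blocking queue of the kind proposed can be sketched with `java.util.concurrent.ArrayBlockingQueue`. Callers block in `put()` when the queue is full, and a fixed pool of workers drains it. `BoundedOpQueue` is a hypothetical name, and the operations are plain `Runnable`s rather than IndexWriter calls. The queue fixes the order in which operations are dequeued. With more than one worker, though, operations can still complete out of order, so total ordering of IW calls would need either a single consumer or explicit sequencing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal producer/consumer sketch: callers enqueue operations into a
// fixed-size queue (put() blocks while it is full) and a fixed pool of
// worker threads drains it.
final class BoundedOpQueue {
    // Sentinel telling one worker to exit.
    private static final Runnable POISON = () -> { };
    private final BlockingQueue<Runnable> queue;
    private final List<Thread> workers = new ArrayList<>();

    BoundedOpQueue(int capacity, int numWorkers) {
        queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < numWorkers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Runnable op = queue.take();
                        if (op == POISON) {
                            return;
                        }
                        op.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
            workers.add(t);
        }
    }

    // Blocks the calling thread while the queue is full.
    void submit(Runnable op) throws InterruptedException {
        queue.put(op);
    }

    // Enqueues one sentinel per worker after all prior ops, then waits,
    // so every previously submitted op has run when this returns.
    void close() throws InterruptedException {
        for (int i = 0; i < workers.size(); i++) {
            queue.put(POISON);
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}
```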

{quote}I think flush control must be global? Ie when we've used too much RAM we
start flushing?{quote}

Right, it should. I'm just not sure we still need FC's global waiting during 
flush; that would seem to go away because the RAM usage tracking is in DW. If we 
record the incremental RAM used per add/update/delete (which I think we do), 
then we can enable a pluggable, user-defined flush policy. 
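A pluggable flush policy driven by per-operation RAM deltas might take roughly the shape below. `FlushPolicy` and `RamBudgetFlushPolicy` are hypothetical names used for illustration. The policy assumes that a triggered flush releases all buffered RAM, which a real DW would need to track explicitly.

```java
// Hypothetical pluggable flush policy driven by incremental RAM accounting.
interface FlushPolicy {
    // Called after each add/update/delete with the bytes that op added;
    // returns true if a flush should be triggered.
    boolean onRamDelta(long bytesAdded);
}

// One possible policy: flush once the tracked total reaches a fixed budget.
final class RamBudgetFlushPolicy implements FlushPolicy {
    private final long budgetBytes;
    private long usedBytes;

    RamBudgetFlushPolicy(long budgetBytes) {
        this.budgetBytes = budgetBytes;
    }

    @Override
    public synchronized boolean onRamDelta(long bytesAdded) {
        usedBytes += bytesAdded;
        if (usedBytes >= budgetBytes) {
            usedBytes = 0; // assumes the triggered flush frees all buffered RAM
            return true;
        }
        return false;
    }
}
```

With a 100-byte budget, deltas of 40, 40, 30 and 10 bytes trigger a flush only on the third call.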

{quote} If a given DWPT is flushing then we pick another? Ie the binding logic
would naturally avoid DWPTs that are not available - either because another
thread has it, or it's flushing. But it would prefer to use the same DWPT it
used last time, if possible (affinity). {quote}

However, once the affinity DWPT's flush completes, we'd need logic to revert to 
the original?

I think the 5% model of LUCENE-2573 may typically yield flushes that occur close 
together, i.e., it's going to slow down aggregate indexing if DWPTs are flushing 
on top of each other. Maybe we should start at 60% and then step up in 
increments of 40% divided by (maxThreadStates - 1)? Ideally we'd statistically 
optimize the flush interval per machine; e.g., SSDs and RAM disks will likely 
require only a small flush percentage interval.
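Here is one reading of the proposed schedule; it is an interpretation, not settled design. The first DWPT flushes at 60% of the RAM buffer, and each following one flushes 40% / (maxThreadStates - 1) later, so the last one flushes at 100%. `StaggeredFlushThresholds` is a hypothetical helper.

```java
// Staggered flush thresholds, as percentages of the RAM buffer:
// threshold(i) = 60 + i * 40 / (maxThreadStates - 1), for i = 0..n-1.
final class StaggeredFlushThresholds {
    static double[] percents(int maxThreadStates) {
        if (maxThreadStates < 2) {
            throw new IllegalArgumentException("need at least 2 thread states");
        }
        double[] thresholds = new double[maxThreadStates];
        double step = 40.0 / (maxThreadStates - 1);
        for (int i = 0; i < maxThreadStates; i++) {
            thresholds[i] = 60.0 + i * step;
        }
        return thresholds;
    }
}
```

With five thread states, this gives thresholds of 60%, 70%, 80%, 90% and 100%.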





> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978061#action_12978061
 ] 

Simon Willnauer commented on LUCENE-2831:
-

bq. This is a bug in ReaderUtil.build() that when passed a segment reader, it 
sets isTopLevel to false.
Ah, good catch! Thanks!

bq. It assumes that a reader is top level if it has leaves.
That one is actually intentional. If it is a CompositeReaderContext, it must 
have leaves, since it is composed of at least one other reader, right? Otherwise 
it should be an atomic reader, or am I missing something?

I have to admit that I didn't try too hard to get the Solr part running 
altogether. 

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978059#action_12978059
 ] 

Michael McCandless commented on LUCENE-2324:


{quote}
Taking a step back, I'm not sure flush control should be global, as flushing is
entirely per thread now?
{quote}

I think flush control must be global?  I.e., when we've used too much RAM, we 
start flushing?

{quote}
If we're adding a delete term for every DWPT, if one
is flushing do we wait or do we simply queue it up? I don't think we can wait
in the delete call for a DWPT to completely flush?
{quote}

I believe we can drop the delete in that case.  We only need to buffer into 
DWPTs that have at least 1 doc.

> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.




[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978060#action_12978060
 ] 

Michael McCandless commented on LUCENE-2324:


{quote}
Another model we could implement is a straight queuing. This'd give us total
ordering on all IW calls. Documents, deletes, and flushes would be queued up
and executed asynchronously. For example in today's DWPT code we will still
block document additions while flushing because we're tying a thread to a given
DWPT. If a thread's DWPT is flushing, wouldn't we want to simply assign the doc
add to a different non-flushing DWPT to gain full efficiency? This seems more
easily doable with a queuing model. If we want synchronous flushing then we'd
place a flush event in the queue and wait for it to complete executing. How
does this sound?
{quote}
I don't think we should have to add queueing to all incoming ops...

If a given DWPT is flushing, then we pick another?  I.e., the binding logic 
would naturally avoid DWPTs that are not available, either because another 
thread has it or because it's flushing.  But it would prefer to use the same 
DWPT it used last time, if possible (affinity).
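The binding logic described here can be sketched as follows. It assumes a hypothetical `AffinityBinder` that tracks busy and flushing flags for each slot and remembers the last slot each thread used. It prefers the thread's previous DWPT and otherwise takes the first available one. Returning null when nothing is free is a simplification; a real writer would wait or add a DWPT.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical binding of indexing threads to DWPT slots with affinity.
final class AffinityBinder {
    static final class Slot {
        final int id;
        boolean busy;     // currently held by some thread
        boolean flushing; // currently writing its segment

        Slot(int id) {
            this.id = id;
        }

        boolean available() {
            return !busy && !flushing;
        }
    }

    private final List<Slot> slots;
    private final Map<Long, Slot> lastUsed = new HashMap<>();

    AffinityBinder(List<Slot> slots) {
        this.slots = slots;
    }

    // Prefers the slot this thread used last time; otherwise takes the first
    // available one. Returns null if no slot is free (simplification).
    synchronized Slot acquire(long threadId) {
        Slot preferred = lastUsed.get(threadId);
        Slot chosen = null;
        if (preferred != null && preferred.available()) {
            chosen = preferred;
        } else {
            for (Slot s : slots) {
                if (s.available()) {
                    chosen = s;
                    break;
                }
            }
        }
        if (chosen != null) {
            chosen.busy = true;
            lastUsed.put(threadId, chosen);
        }
        return chosen;
    }

    synchronized void release(Slot slot) {
        slot.busy = false;
    }
}
```

In this sketch, affinity simply moves to whichever slot a thread acquired last, so nothing has to revert to the original DWPT after its flush completes. Whether that behavior is desirable is the open question raised on this issue.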

> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978052#action_12978052
 ] 

Yonik Seeley commented on LUCENE-2831:
--

I think I see another related bug:
CompositeReaderContext does this:
  super(parent, reader, false, leaves != null, ordInParent, docbaseInParent);
It assumes that a reader is top level if it has leaves.

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




[jira] Commented: (LUCENE-2847) Support all of unicode in StandardTokenizer

2011-01-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978037#action_12978037
 ] 

Robert Muir commented on LUCENE-2847:
-

{quote}
If we add a target in modules/analysis/icu/build.xml to run 
GenerateJFlexSupplementaryMacros#main(), maybe named gen-stdtok-supp-macros, 
the jflex target in modules/analysis/common/build.xml could use a  to 
call it and auto-generate SUPPLEMENTARY.jflex-macro, no?
{quote}

Yeah, I think we could do something like this. We could also consolidate tools:
in general I would rather all the analyzers be consolidated, since they are only
split up due to dependencies, large files, etc. But tools are different; they
exist only to assist the build.

> Support all of unicode in StandardTokenizer
> ---
>
> Key: LUCENE-2847
> URL: https://issues.apache.org/jira/browse/LUCENE-2847
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Reporter: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2847.patch
>
>
> StandardTokenizer currently only supports the BMP.
> If it encounters characters outside of the BMP, it just discards them... 
> it should instead fully implement UAX#29 across all of Unicode.
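For readers less familiar with the BMP/supplementary distinction: in Java, a character outside the BMP occupies two `char`s (a surrogate pair), so code that keeps only BMP characters silently drops it. This is a minimal illustration using only `java.lang` methods; it is not StandardTokenizer's code, and the helper names are hypothetical.

```java
public class SupplementarySketch {
    // Keeps only BMP characters, mimicking the behavior described in the issue.
    static String bmpOnly(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isSupplementaryCodePoint(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // step over both halves of a surrogate pair
        }
        return sb.toString();
    }

    // Counts code points, so a surrogate pair counts as one character.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D400 MATHEMATICAL BOLD CAPITAL A lies outside the BMP.
        String text = "a" + new String(Character.toChars(0x1D400)) + "b";
        System.out.println(text.length());    // 4 (chars)
        System.out.println(codePoints(text)); // 3 (code points)
        System.out.println(bmpOnly(text));    // prints ab
    }
}
```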

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Use of MultiFields.getFields() bad practice?

2011-01-05 Thread Michael McCandless
On Wed, Jan 5, 2011 at 3:50 PM, Smiley, David W.  wrote:
> On Jan 5, 2011, at 1:35 PM, Uwe Schindler wrote:
>
>> BUT:
>>
>> I am just upset about such code:
>>
>>      final DocIdSet dis = filter.getDocIdSet(reader);
>>      if (dis == null)
>>        return null;
>>      final DocIdSetIterator disi = dis.iterator();
>>      if (disi == null)
>>        return null;
>>      return new ConstantScorer(similarity, disi, this);
>>
>> (this is what I have seen during my work for ConstantScoreQuery)
>
> Exactly.  I can't stand such code either.  Null has its place but it is often 
> avoidable.  Given that we're talking about trunk for a major version, I think 
> it's definitely not too late.

But, I think even w/ the sentinels we're going to have to have ifs here...

> It would be awesome if we had @NotNull, @Nullable, (and various threadsafe 
> ones!), and used FindBugs to validate various constraints.  There isn't yet a 
> standard set in the JDK so some projects like Apache Http components have 
> their own in their own package.  FindBugs ignores the package name (I know, 
> I've checked).  We could do the same?  If this would be acceptable then I 
> could create a patch.

This sounds awesome!  So it could catch us if sometimes we return null
from a @NotNull method?

Mike
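David's proposal is to define nullness annotations in a project-owned package, as he says HttpComponents does, and rely on FindBugs matching them by simple name. Below is a minimal sketch of such project-local annotations. The names, retention, and targets are assumptions, and this sketch does not verify whether a given FindBugs version honors them.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class NullnessSketch {
    // Hypothetical project-local annotations. CLASS retention keeps them in the
    // bytecode (where static analyzers look) without any runtime reflection cost.
    @Documented
    @Retention(RetentionPolicy.CLASS)
    @Target({ElementType.METHOD, ElementType.PARAMETER, ElementType.FIELD})
    @interface NotNull {}

    @Documented
    @Retention(RetentionPolicy.CLASS)
    @Target({ElementType.METHOD, ElementType.PARAMETER, ElementType.FIELD})
    @interface Nullable {}

    // Contract: accepts null, never returns null, so callers can skip the null check.
    @NotNull
    static String describe(@Nullable String name) {
        return name == null ? "<none>" : name;
    }

    public static void main(String[] args) {
        System.out.println(describe(null));    // <none>
        System.out.println(describe("field")); // field
    }
}
```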

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Use of MultiFields.getFields() bad practice?

2011-01-05 Thread Michael McCandless
On Wed, Jan 5, 2011 at 2:21 PM, Uwe Schindler  wrote:
> Some other ideas:
> Maybe add an isEmpty() "hint" method to DocIdSet(Iterator). The empty DocIdSet
> would always return true. The problem: this method is costly for OpenBitSet.
> Maybe it's just a "hint": returning false is also OK when it's empty. So if
> you have a DocIdSet that's empty and you can easily detect it, simply
> return true. The default impl returns false.

Hmm... feels like that's overly complicated?  Shouldn't we encourage
impls to just use the empty sentinel?
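Uwe's isEmpty() "hint" could look roughly like the following. The classes are hypothetical simplified stand-ins, not Lucene's DocIdSet. The point is the contract: true means definitely empty, and false promises nothing, so a default of false is always safe.

```java
import java.util.BitSet;

public class EmptyHintSketch {
    abstract static class DocIdSet {
        // Hint only: true guarantees no documents; false means "maybe not empty".
        boolean isEmpty() {
            return false;
        }
    }

    static final class EmptyDocIdSet extends DocIdSet {
        @Override
        boolean isEmpty() {
            return true; // exact and free for the sentinel
        }
    }

    static final class BitsDocIdSet extends DocIdSet {
        final BitSet bits;

        BitsDocIdSet(BitSet bits) {
            this.bits = bits;
        }
        // Deliberately inherits the default, standing in for implementations
        // (like the OpenBitSet case Uwe mentions) where an exact answer is costly.
    }

    // A caller may short-circuit on the hint, but must still cope with
    // sets that answered false yet turn out to be empty.
    static String plan(DocIdSet set) {
        return set.isEmpty() ? "skip" : "iterate";
    }

    public static void main(String[] args) {
        System.out.println(plan(new EmptyDocIdSet()));            // skip
        System.out.println(plan(new BitsDocIdSet(new BitSet()))); // iterate
    }
}
```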

> Something else: EMPTY_DOCIDSETITER could be the same instance as EMPTY_SCORER
> (EMPTY_SCORER == EMPTY_DOCIDSETITER), implemented as a Scorer. You only have to
> add a few methods to this empty instance.

I like this one!

Mike
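Uwe's second idea, a single empty instance serving as both the empty iterator and the empty scorer, works whenever the scorer type extends the iterator type. Below is a sketch with hypothetical minimal classes. NO_MORE_DOCS follows Lucene's Integer.MAX_VALUE convention, but nothing else here is Lucene's actual code.

```java
public class SharedEmptySketch {
    abstract static class DocIdSetIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        abstract int docID();
        abstract int nextDoc();
        abstract int advance(int target);
    }

    abstract static class Scorer extends DocIdSetIterator {
        abstract float score();
    }

    // One stateless instance can safely be shared by every caller in both roles.
    static final Scorer EMPTY = new Scorer() {
        @Override int docID() { return NO_MORE_DOCS; }
        @Override int nextDoc() { return NO_MORE_DOCS; }
        @Override int advance(int target) { return NO_MORE_DOCS; }
        @Override float score() {
            throw new UnsupportedOperationException("no current document");
        }
    };

    static final DocIdSetIterator EMPTY_DOCIDSETITER = EMPTY;
    static final Scorer EMPTY_SCORER = EMPTY;

    // Hypothetical caller: counts matches with no null checks at all.
    static int count(DocIdSetIterator it) {
        int n = 0;
        while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(EMPTY_SCORER == EMPTY_DOCIDSETITER); // true
        System.out.println(count(EMPTY_SCORER));                 // 0
    }
}
```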

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2847) Support all of unicode in StandardTokenizer

2011-01-05 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978027#action_12978027
 ] 

Steven Rowe commented on LUCENE-2847:
-

JFlex generates fine, everything compiles, all tests pass.

If we add a target in {{modules/analysis/icu/build.xml}} to run 
{{GenerateJFlexSupplementaryMacros#main()}}, maybe named 
{{gen-stdtok-supp-macros}}, the {{jflex}} target in 
{{modules/analysis/common/build.xml}} could use a {{}} to call it and 
auto-generate {{SUPPLEMENTARY.jflex-macro}}, no?


> Support all of unicode in StandardTokenizer
> ---
>
> Key: LUCENE-2847
> URL: https://issues.apache.org/jira/browse/LUCENE-2847
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Reporter: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2847.patch
>
>
> StandardTokenizer currently only supports the BMP.
> If it encounters characters outside of the BMP, it just discards them... 
> it should instead fully implement UAX#29 across all of Unicode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978023#action_12978023
 ] 

Yonik Seeley commented on LUCENE-2831:
--

Regarding this assert in IndexSearcher:
// TODO: enable this assert once SolrIndexReader and friends are refactored to use ReaderContext
// We can't assert this here since SolrIndexReader will fail in some contexts - once solr is consistent we should be fine here
// assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader;

This is a bug in ReaderUtil.build(): when passed a segment reader, it sets
isTopLevel to false.
You got bit by those extra booleans ;-) 

When I hacked ReaderContext to just set isTopLevel to parent==null, all the 
solr tests passed w/ the assertion enabled.

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978021#action_12978021
 ] 

Michael McCandless commented on LUCENE-2837:


bq. How about this little patch to avoid creation of IndexSearcher per-segment 
if not needed?

Looks great!

> Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
> into IndexSearcher
> ---
>
> Key: LUCENE-2837
> URL: https://issues.apache.org/jira/browse/LUCENE-2837
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2837.patch, LUCENE-2837.patch, LUCENE-2837.patch
>
>
> We've discussed cleaning up our *Searcher stack for some time... I
> think we should try to do this before releasing 4.0.
> So I'm attaching an initial patch which:
>   * Removes Searcher, Searchable, absorbing all their methods into 
> IndexSearcher
>   * Removes contrib/remote
>   * Removes MultiSearcher
>   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
> pass useThreads=true, or a custom ES to the ctor)
> The patch is rough -- I just ripped stuff out, did search/replace to
> IndexSearcher, etc.  EG nothing is directly testing using threads with
> IndexSearcher, but before committing I think we should add a
> newSearcher to LuceneTestCase, which randomly chooses whether the
> searcher uses threads, and cutover tests to use this instead of making
> their own IndexSearcher.
> I think MultiSearcher has a useful purpose, but as it is today it's
> too low-level, eg it shouldn't be involved in rewriting queries: the
> Query.combine method is scary.  Maybe in its place we make a higher
> level class, with limited API, that's able to federate search across
> multiple IndexSearchers?  It'd also be able to optionally use thread
> per IndexSearcher.
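"A custom ES to the ctor" refers to running searches on an ExecutorService. Below is a minimal sketch of the fan-out/merge pattern that a threaded searcher follows. Each segment is modeled as a plain array of scores, and every class and method is hypothetical. Only the java.util.concurrent types are real, and the sketch is not the patch's IndexSearcher code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSearchSketch {
    // Top-n scores of one "segment", computed independently of the others.
    static List<Integer> topN(int[] segmentScores, int n) {
        int[] copy = segmentScores.clone();
        Arrays.sort(copy); // ascending
        List<Integer> out = new ArrayList<Integer>();
        for (int i = copy.length - 1; i >= 0 && out.size() < n; i--) {
            out.add(copy[i]);
        }
        return out;
    }

    // Submits one task per segment, then merges the partial top-n lists.
    static List<Integer> search(int[][] segments, final int n, ExecutorService es)
            throws Exception {
        List<Future<List<Integer>>> futures = new ArrayList<Future<List<Integer>>>();
        for (final int[] seg : segments) {
            futures.add(es.submit(new Callable<List<Integer>>() {
                @Override
                public List<Integer> call() {
                    return topN(seg, n);
                }
            }));
        }
        // Min-heap bounded to n entries holds the best n scores seen so far.
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>();
        for (Future<List<Integer>> f : futures) {
            for (int score : f.get()) {
                heap.add(score);
                if (heap.size() > n) {
                    heap.poll(); // drop the current minimum
                }
            }
        }
        List<Integer> merged = new ArrayList<Integer>(heap);
        Collections.sort(merged, Collections.reverseOrder());
        return merged;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService es = Executors.newFixedThreadPool(2);
        try {
            int[][] segments = { {5, 1, 9}, {7, 3}, {8} };
            System.out.println(search(segments, 3, es)); // [9, 8, 7]
        } finally {
            es.shutdown();
        }
    }
}
```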

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-236) Field collapsing

2011-01-05 Thread Ron Veenstra (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978012#action_12978012
 ] 

Ron Veenstra commented on SOLR-236:
---

I have also been getting a null pointer exception:
message null java.lang.NullPointerException at 
org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser$PredefinedScorer.docID(NonAdjacentDocumentCollapser.java:397)

The error is repeatable for a given search term when sorted by "score desc," 
followed by any other field. It seems to crop up whenever there is only one 
result that should be returned in the collapsed field group, but does not 
happen for every possible query where this is the case (leading me to believe 
something else is at work).  Changing the sort order to anything else (moving 
score to second, or omitting a second field) eliminates the error.  This was 
the simple solution for my problem, but I wanted to post this in case any of the
information proves useful.

Using Solr 1.4.1 with SOLR-236-1_4_1-paging-totals-working.patch

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
>Assignee: Shalin Shekhar Mangar
> Fix For: Next
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, 
> field-collapse-3.patch, field-collapse-4-with-solrj.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
> quasidistributed.additional.patch, 
> SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, 
> SOLR-236-distinctFacet.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
> SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)
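The collapse.type and collapse.max parameters are described only briefly. Below is a rough sketch of one reading of their semantics over a result list that is already sorted. In "normal" mode, hits are counted per field value across the whole list. In "adjacent" mode, only runs of consecutive equal values are collapsed. Reading "normal" as whole-list counting is an assumption. This code is not the patch's NonAdjacentDocumentCollapser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollapseSketch {
    // Keeps at most `max` hits per field value, counted over the whole list.
    static List<String> collapseNormal(List<String> values, int max) {
        Map<String, Integer> seen = new HashMap<String, Integer>();
        List<String> out = new ArrayList<String>();
        for (String v : values) {
            Integer c = seen.get(v);
            int count = c == null ? 0 : c;
            if (count < max) {
                out.add(v);
            }
            seen.put(v, count + 1);
        }
        return out;
    }

    // Keeps at most `max` hits per run of consecutive equal field values.
    static List<String> collapseAdjacent(List<String> values, int max) {
        List<String> out = new ArrayList<String>();
        String prev = null;
        int run = 0;
        for (String v : values) {
            run = v.equals(prev) ? run + 1 : 1;
            prev = v;
            if (run <= max) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Field values (e.g. site names) of hits, already in score order.
        List<String> hits = Arrays.asList("a", "a", "b", "a", "a", "c");
        System.out.println(collapseNormal(hits, 1));   // [a, b, c]
        System.out.println(collapseAdjacent(hits, 1)); // [a, b, a, c]
    }
}
```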

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2837:
-

Attachment: LUCENE-2837.patch

How about this little patch to avoid creation of IndexSearcher per-segment if 
not needed?

> Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
> into IndexSearcher
> ---
>
> Key: LUCENE-2837
> URL: https://issues.apache.org/jira/browse/LUCENE-2837
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2837.patch, LUCENE-2837.patch, LUCENE-2837.patch
>
>
> We've discussed cleaning up our *Searcher stack for some time... I
> think we should try to do this before releasing 4.0.
> So I'm attaching an initial patch which:
>   * Removes Searcher, Searchable, absorbing all their methods into 
> IndexSearcher
>   * Removes contrib/remote
>   * Removes MultiSearcher
>   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
> pass useThreads=true, or a custom ES to the ctor)
> The patch is rough -- I just ripped stuff out, did search/replace to
> IndexSearcher, etc.  EG nothing is directly testing using threads with
> IndexSearcher, but before committing I think we should add a
> newSearcher to LuceneTestCase, which randomly chooses whether the
> searcher uses threads, and cutover tests to use this instead of making
> their own IndexSearcher.
> I think MultiSearcher has a useful purpose, but as it is today it's
> too low-level, eg it shouldn't be involved in rewriting queries: the
> Query.combine method is scary.  Maybe in its place we make a higher
> level class, with limited API, that's able to federate search across
> multiple IndexSearchers?  It'd also be able to optionally use thread
> per IndexSearcher.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Use of MultiFields.getFields() bad practice?

2011-01-05 Thread Smiley, David W.
On Jan 5, 2011, at 1:35 PM, Uwe Schindler wrote:

> BUT:
> 
> I am just upset about such code:
> 
>      final DocIdSet dis = filter.getDocIdSet(reader);
>      if (dis == null)
>        return null;
>      final DocIdSetIterator disi = dis.iterator();
>      if (disi == null)
>        return null;
>      return new ConstantScorer(similarity, disi, this);
> 
> (this is what I have seen during my work for ConstantScoreQuery)

Exactly.  I can't stand such code either.  Null has its place but it is often 
avoidable.  Given that we're talking about trunk for a major version, I think 
it's definitely not too late.

It would be awesome if we had @NotNull, @Nullable, (and various threadsafe 
ones!), and used FindBugs to validate various constraints.  There isn't yet a 
standard set in the JDK so some projects like Apache Http components have their 
own in their own package.  FindBugs ignores the package name (I know, I've 
checked).  We could do the same?  If this would be acceptable then I could 
create a patch.

~ David Smiley
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-2831.
-

Resolution: Fixed

committed in revision 1055636

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2011-01-05 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-1268:


Fix Version/s: (was: 1.5)

> Incorporate Lucene's FastVectorHighlighter
> --
>
> Key: SOLR-1268
> URL: https://issues.apache.org/jira/browse/SOLR-1268
> Project: Solr
>  Issue Type: New Feature
>  Components: highlighter
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, 
> SOLR-1268.patch, SOLR-1268.patch, SOLR-1268.patch
>
>
> Correcting Fix Version based on CHANGES.txt, see this thread for more 
> details...
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3calpine.deb.1.10.1005251052040.24...@radix.cryptio.net%3e

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977945#action_12977945
 ] 

Uwe Schindler commented on LUCENE-2831:
---

Go ahead, looks good, +1

If there are smaller issues, let's fix them later. The patch is quite big, so
it's better to commit now and let everybody use it! I was also thinking about
using ReaderContext in Query.rewrite() for consistency.

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2307) PHPSerialized fails with sharded queries

2011-01-05 Thread Antonio Verni (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antonio Verni updated SOLR-2307:


Attachment: PHPSerializedResponseWriter.java.patch

> PHPSerialized fails with sharded queries
> 
>
> Key: SOLR-2307
> URL: https://issues.apache.org/jira/browse/SOLR-2307
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Affects Versions: 1.3, 1.4.1
>Reporter: Antonio Verni
>Priority: Minor
> Attachments: PHPSerializedResponseWriter.java.patch
>
>
> Solr throws a "java.lang.IllegalArgumentException: Map size must not be
> negative" exception when using the PHP Serialized response writer with
> sharded queries.
> To reproduce the issue, start your preferred example and try the following
> query:
> http://localhost:8983/solr/select/?q=*:*&wt=phps&shards=localhost:8983/solr,localhost:8983/solr
> It is caused by the JSONWriter implementation of writeSolrDocumentList and
> writeSolrDocument. Overriding these two methods in the
> PHPSerializedResponseWriter to handle the SolrDocument size seems to solve
> the issue.
> Attached is my patch, made against trunk rev 1055588.
> cheers,
> Antonio

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2307) PHPSerialized fails with sharded queries

2011-01-05 Thread Antonio Verni (JIRA)
PHPSerialized fails with sharded queries


 Key: SOLR-2307
 URL: https://issues.apache.org/jira/browse/SOLR-2307
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 1.4.1, 1.3
Reporter: Antonio Verni
Priority: Minor


Solr throws a "java.lang.IllegalArgumentException: Map size must not be
negative" exception when using the PHP Serialized response writer with sharded
queries.
To reproduce the issue, start your preferred example and try the following query:

http://localhost:8983/solr/select/?q=*:*&wt=phps&shards=localhost:8983/solr,localhost:8983/solr

It is caused by the JSONWriter implementation of writeSolrDocumentList and
writeSolrDocument. Overriding these two methods in the
PHPSerializedResponseWriter to handle the SolrDocument size seems to solve the
issue.
Attached is my patch, made against trunk rev 1055588.

cheers,
Antonio
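Some background on why a size is needed at all: PHP's serialize() format writes an array's element count before its contents (a:N:{...}), so the writer must know N up front. A writer that is handed an unknown size, presumably signaled as negative here, cannot produce valid output. Below is a minimal, hypothetical sketch of that constraint, computing the count from the document's fields. The exception message mirrors the reported one, but this is not Solr's PHPSerializedResponseWriter.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PhpSerializeSketch {
    // PHP string: s:<UTF-8 byte length>:"<value>";
    static void writeString(StringBuilder sb, String s) {
        int bytes = s.getBytes(StandardCharsets.UTF_8).length;
        sb.append("s:").append(bytes).append(":\"").append(s).append("\";");
    }

    // PHP array header: a:<count>:{ ... } -- the count must be known before any entry.
    static void writeMapStart(StringBuilder sb, int size) {
        if (size < 0) {
            throw new IllegalArgumentException("Map size must not be negative");
        }
        sb.append("a:").append(size).append(":{");
    }

    // Serializes a document-like map of string fields, taking the size from the map itself.
    static String serialize(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder();
        writeMapStart(sb, doc.size());
        for (Map.Entry<String, String> e : doc.entrySet()) {
            writeString(sb, e.getKey());
            writeString(sb, e.getValue());
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "42");
        doc.put("name", "foo");
        System.out.println(serialize(doc));
        // a:2:{s:2:"id";s:2:"42";s:4:"name";s:3:"foo";}
    }
}
```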


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2831:


Attachment: LUCENE-2831.patch

Final patch: fixed the leaves problem and added a CHANGES.txt entry. I'll commit
shortly.

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977923#action_12977923
 ] 

Yonik Seeley commented on LUCENE-2837:
--

The multithreaded stuff feels like it should be in a subclass of IndexSearcher.
But barring that, perhaps make it so that the subSearcher array is only
populated if an executor is passed in (to try to keep IndexSearcher as
lightweight as possible)?
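Yonik's lighter-weight alternative can be sketched as a constructor that allocates per-segment state only when an executor is supplied. Everything below is hypothetical: Segment, SubSearcher, and Searcher are illustrative stand-ins, not IndexSearcher's real fields.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LazySubSearcherSketch {
    static final class Segment {
        final String name;
        Segment(String name) { this.name = name; }
    }

    static final class SubSearcher {
        final Segment segment;
        SubSearcher(Segment segment) { this.segment = segment; }
    }

    static final class Searcher {
        final Segment[] segments;
        final ExecutorService executor;   // null means single-threaded search
        final SubSearcher[] subSearchers; // null unless an executor was given

        Searcher(Segment[] segments, ExecutorService executor) {
            this.segments = segments;
            this.executor = executor;
            if (executor == null) {
                // Sequential search needs no per-segment searchers at all.
                this.subSearchers = null;
            } else {
                this.subSearchers = new SubSearcher[segments.length];
                for (int i = 0; i < segments.length; i++) {
                    subSearchers[i] = new SubSearcher(segments[i]);
                }
            }
        }
    }

    public static void main(String[] args) {
        Segment[] segs = { new Segment("_0"), new Segment("_1") };
        Searcher plain = new Searcher(segs, null);
        ExecutorService es = Executors.newSingleThreadExecutor();
        try {
            Searcher threaded = new Searcher(segs, es);
            System.out.println(plain.subSearchers == null);   // true
            System.out.println(threaded.subSearchers.length); // 2
        } finally {
            es.shutdown();
        }
    }
}
```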

> Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
> into IndexSearcher
> ---
>
> Key: LUCENE-2837
> URL: https://issues.apache.org/jira/browse/LUCENE-2837
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2837.patch, LUCENE-2837.patch
>
>
> We've discussed cleaning up our *Searcher stack for some time... I
> think we should try to do this before releasing 4.0.
> So I'm attaching an initial patch which:
>   * Removes Searcher, Searchable, absorbing all their methods into 
> IndexSearcher
>   * Removes contrib/remote
>   * Removes MultiSearcher
>   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
> pass useThreads=true, or a custom ES to the ctor)
> The patch is rough -- I just ripped stuff out, did search/replace to
> IndexSearcher, etc.  EG nothing is directly testing using threads with
> IndexSearcher, but before committing I think we should add a
> newSearcher to LuceneTestCase, which randomly chooses whether the
> searcher uses threads, and cutover tests to use this instead of making
> their own IndexSearcher.
> I think MultiSearcher has a useful purpose, but as it is today it's
> too low-level, eg it shouldn't be involved in rewriting queries: the
> Query.combine method is scary.  Maybe in its place we make a higher
> level class, with limited API, that's able to federate search across
> multiple IndexSearchers?  It'd also be able to optionally use thread
> per IndexSearcher.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Reopened: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-2837:



Reopen to also backport merging of PMS into IS in 3.x.

> Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
> into IndexSearcher
> ---
>
> Key: LUCENE-2837
> URL: https://issues.apache.org/jira/browse/LUCENE-2837
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2837.patch, LUCENE-2837.patch
>
>
> We've discussed cleaning up our *Searcher stack for some time... I
> think we should try to do this before releasing 4.0.
> So I'm attaching an initial patch which:
>   * Removes Searcher, Searchable, absorbing all their methods into 
> IndexSearcher
>   * Removes contrib/remote
>   * Removes MultiSearcher
>   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
> pass useThreads=true, or a custom ES to the ctor)
> The patch is rough -- I just ripped stuff out, did search/replace to
> IndexSearcher, etc.  EG nothing is directly testing using threads with
> IndexSearcher, but before committing I think we should add a
> newSearcher to LuceneTestCase, which randomly chooses whether the
> searcher uses threads, and cutover tests to use this instead of making
> their own IndexSearcher.
> I think MultiSearcher has a useful purpose, but as it is today it's
> too low-level, eg it shouldn't be involved in rewriting queries: the
> Query.combine method is scary.  Maybe in its place we make a higher
> level class, with limited API, that's able to federate search across
> multiple IndexSearchers?  It'd also be able to optionally use thread
> per IndexSearcher.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1055587 - in /lucene/dev/branches/branch_3x/lucene: ./ contrib/remote/src/java/org/apache/lucene/search/ src/java/org/apache/lucene/search/

2011-01-05 Thread Michael McCandless
Good idea!  I'll reopen...

Mike

On Wed, Jan 5, 2011 at 2:24 PM, Uwe Schindler  wrote:
> What happens with PMS? Maybe we should backport the parallelization of 
> IndexSearcher to 3.x! Then we can also deprecate.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
>> Sent: Wednesday, January 05, 2011 8:15 PM
>> To: comm...@lucene.apache.org
>> Subject: svn commit: r1055587 - in /lucene/dev/branches/branch_3x/lucene:
>> ./ contrib/remote/src/java/org/apache/lucene/search/
>> src/java/org/apache/lucene/search/
>>
>> Author: mikemccand
>> Date: Wed Jan  5 19:14:47 2011
>> New Revision: 1055587
>>
>> URL: http://svn.apache.org/viewvc?rev=1055587&view=rev
>> Log:
>> LUCENE-2837: deprecate classes in 3.x
>>
>> Modified:
>>     lucene/dev/branches/branch_3x/lucene/CHANGES.txt
>>     lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apache/lucene/search/RMIRemoteSearchable.java
>>     lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apache/lucene/search/RemoteCachingWrapperFilter.java
>>     lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apache/lucene/search/RemoteSearchable.java
>>     lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/MultiSearcher.java
>>     lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/Searchable.java
>>     lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/Searcher.java
>>
>> Modified: lucene/dev/branches/branch_3x/lucene/CHANGES.txt
>> URL:
>> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/CHA
>> NGES.txt?rev=1055587&r1=1055586&r2=1055587&view=diff
>> ==
>> 
>> --- lucene/dev/branches/branch_3x/lucene/CHANGES.txt (original)
>> +++ lucene/dev/branches/branch_3x/lucene/CHANGES.txt Wed Jan  5
>> 19:14:47 2011
>> @@ -77,6 +77,10 @@ Changes in backwards compatibility polic
>>  * LUCENE-2804: Directory.setLockFactory new declares throwing an
>> IOException.
>>    (Shai Erera, Robert Muir)
>>
>> +* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
>> +  Searchable are collapsed into IndexSearcher; contrib/remote and
>> +  MultiSearcher have been removed.  (Mike McCandless)
>> +
>>  Changes in runtime behavior
>>
>>  * LUCENE-1923: Made IndexReader.toString() produce something
>>
>> Modified:
>> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
>> he/lucene/search/RMIRemoteSearchable.java
>> URL:
>> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/con
>> trib/remote/src/java/org/apache/lucene/search/RMIRemoteSearchable.jav
>> a?rev=1055587&r1=1055586&r2=1055587&view=diff
>> ==
>> 
>> ---
>> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
>> he/lucene/search/RMIRemoteSearchable.java (original)
>> +++
>> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
>> he/lucene/search/RMIRemoteSearchable.java Wed Jan  5 19:14:47 2011
>> @@ -38,7 +38,11 @@ import java.rmi.Remote;
>>   *
>>   * 
>>   * 
>> + *
>> + * @deprecated This package (all of contrib/remote) will be
>> + * removed in 4.0.
>>   */
>> +@Deprecated
>>  public interface RMIRemoteSearchable extends Searchable, Remote {
>>
>>  }
>>
>> Modified:
>> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
>> he/lucene/search/RemoteCachingWrapperFilter.java
>> URL:
>> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/con
>> trib/remote/src/java/org/apache/lucene/search/RemoteCachingWrapperFilt
>> er.java?rev=1055587&r1=1055586&r2=1055587&view=diff
>> ==
>> 
>> ---
>> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
>> he/lucene/search/RemoteCachingWrapperFilter.java (original)
>> +++
>> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
>> he/lucene/search/RemoteCachingWrapperFilter.java Wed Jan  5 19:14:47
>> 2011
>> @@ -35,7 +35,11 @@ import org.apache.lucene.index.IndexRead
>>   * To cache a result you must do something like
>>   * RemoteCachingWrapperFilter f = new RemoteCachingWrapperFilter(new
>> CachingWrapperFilter(myFilter));
>>   * 
>> + *
>> + * @deprecated This package (all of contrib/remote) will be
>> + * removed in 4.0.
>>   */
>> +@Deprecated
>>  public class RemoteCachingWrapperFilter extends Filter {
>>    protected Filter filter;
>>
>>
>> Modified:
>> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
>> he/lucene/search/RemoteSearchable.java
>> URL:
>> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/con
>> trib/remote/src/java/org/apache/lucene/search/RemoteSearchable.java?re
>> v=1055587&r1=10555

RE: svn commit: r1055587 - in /lucene/dev/branches/branch_3x/lucene: ./ contrib/remote/src/java/org/apache/lucene/search/ src/java/org/apache/lucene/search/

2011-01-05 Thread Uwe Schindler
What happens with PMS? Maybe we should backport the parallelization of 
IndexSearcher to 3.x! Then we can also deprecate.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
> Sent: Wednesday, January 05, 2011 8:15 PM
> To: comm...@lucene.apache.org
> Subject: svn commit: r1055587 - in /lucene/dev/branches/branch_3x/lucene:
> ./ contrib/remote/src/java/org/apache/lucene/search/
> src/java/org/apache/lucene/search/
> 
> Author: mikemccand
> Date: Wed Jan  5 19:14:47 2011
> New Revision: 1055587
> 
> URL: http://svn.apache.org/viewvc?rev=1055587&view=rev
> Log:
> LUCENE-2837: deprecate classes in 3.x
> 
> Modified:
> lucene/dev/branches/branch_3x/lucene/CHANGES.txt
> 
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RMIRemoteSearchable.java
> 
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RemoteCachingWrapperFilter.java
> 
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RemoteSearchable.java
> 
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/searc
> h/MultiSearcher.java
> 
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/searc
> h/Searchable.java
> 
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/searc
> h/Searcher.java
> 
> Modified: lucene/dev/branches/branch_3x/lucene/CHANGES.txt
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/CHA
> NGES.txt?rev=1055587&r1=1055586&r2=1055587&view=diff
> ==
> 
> --- lucene/dev/branches/branch_3x/lucene/CHANGES.txt (original)
> +++ lucene/dev/branches/branch_3x/lucene/CHANGES.txt Wed Jan  5
> 19:14:47 2011
> @@ -77,6 +77,10 @@ Changes in backwards compatibility polic
>  * LUCENE-2804: Directory.setLockFactory new declares throwing an
> IOException.
>    (Shai Erera, Robert Muir)
> 
> +* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
> +  Searchable are collapsed into IndexSearcher; contrib/remote and
> +  MultiSearcher have been removed.  (Mike McCandless)
> +
>  Changes in runtime behavior
> 
>  * LUCENE-1923: Made IndexReader.toString() produce something
> 
> Modified:
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RMIRemoteSearchable.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/con
> trib/remote/src/java/org/apache/lucene/search/RMIRemoteSearchable.jav
> a?rev=1055587&r1=1055586&r2=1055587&view=diff
> ==
> 
> ---
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RMIRemoteSearchable.java (original)
> +++
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RMIRemoteSearchable.java Wed Jan  5 19:14:47 2011
> @@ -38,7 +38,11 @@ import java.rmi.Remote;
>   *
>   * 
>   * 
> + *
> + * @deprecated This package (all of contrib/remote) will be
> + * removed in 4.0.
>   */
> +@Deprecated
>  public interface RMIRemoteSearchable extends Searchable, Remote {
> 
>  }
> 
> Modified:
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RemoteCachingWrapperFilter.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/con
> trib/remote/src/java/org/apache/lucene/search/RemoteCachingWrapperFilt
> er.java?rev=1055587&r1=1055586&r2=1055587&view=diff
> ==
> 
> ---
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RemoteCachingWrapperFilter.java (original)
> +++
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RemoteCachingWrapperFilter.java Wed Jan  5 19:14:47
> 2011
> @@ -35,7 +35,11 @@ import org.apache.lucene.index.IndexRead
>   * To cache a result you must do something like
>   * RemoteCachingWrapperFilter f = new RemoteCachingWrapperFilter(new
> CachingWrapperFilter(myFilter));
>   * 
> + *
> + * @deprecated This package (all of contrib/remote) will be
> + * removed in 4.0.
>   */
> +@Deprecated
>  public class RemoteCachingWrapperFilter extends Filter {
>    protected Filter filter;
> 
> 
> Modified:
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RemoteSearchable.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/con
> trib/remote/src/java/org/apache/lucene/search/RemoteSearchable.java?re
> v=1055587&r1=1055586&r2=1055587&view=diff
> ==
> 
> ---
> lucene/dev/branches/branch_3x/lucene/contrib/remote/src/java/org/apac
> he/lucene/search/RemoteS

[jira] Resolved: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2837.


Resolution: Fixed

> Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
> into IndexSearcher
> ---
>
> Key: LUCENE-2837
> URL: https://issues.apache.org/jira/browse/LUCENE-2837
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2837.patch, LUCENE-2837.patch
>
>
> We've discussed cleaning up our *Searcher stack for some time... I
> think we should try to do this before releasing 4.0.
> So I'm attaching an initial patch which:
>   * Removes Searcher, Searchable, absorbing all their methods into 
> IndexSearcher
>   * Removes contrib/remote
>   * Removes MultiSearcher
>   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
> pass useThreads=true, or a custom ES to the ctor)
> The patch is rough -- I just ripped stuff out, did search/replace to
> IndexSearcher, etc.  EG nothing is directly testing using threads with
> IndexSearcher, but before committing I think we should add a
> newSearcher to LuceneTestCase, which randomly chooses whether the
> searcher uses threads, and cutover tests to use this instead of making
> their own IndexSearcher.
> I think MultiSearcher has a useful purpose, but as it is today it's
> too low-level, eg it shouldn't be involved in rewriting queries: the
> Query.combine method is scary.  Maybe in its place we make a higher
> level class, with limited API, that's able to federate search across
> multiple IndexSearchers?  It'd also be able to optionally use thread
> per IndexSearcher.
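The patch description above has IndexSearcher absorb ParallelMultiSearcher (useThreads=true, or a custom ES, presumably an ExecutorService, passed to the constructor) and floats a higher-level class that federates search across several IndexSearchers, optionally with a thread per searcher. The sketch below shows only the generic fan-out-and-merge pattern this implies. SubSearcher and Hit are hypothetical stand-ins, not Lucene types, and each sub-searcher is assumed to return doc ids that are already global. This is not the committed IndexSearcher code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class FederatedSearchSketch {
  /** Hypothetical hit: a doc id (assumed already global) and its score. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Hypothetical stand-in for one sub-searcher; not a Lucene type. */
  interface SubSearcher {
    List<Hit> topN(int n);
  }

  /**
   * Runs each sub-search on the executor (or inline when executor is null),
   * then merges the per-searcher results into one top-n list, best first.
   */
  static List<Hit> search(List<SubSearcher> subs, int n, ExecutorService executor) {
    List<List<Hit>> perSub = new ArrayList<>();
    if (executor == null) {
      for (SubSearcher s : subs) perSub.add(s.topN(n)); // sequential path
    } else {
      List<Future<List<Hit>>> futures = new ArrayList<>();
      for (SubSearcher s : subs) futures.add(executor.submit(() -> s.topN(n)));
      try {
        for (Future<List<Hit>> f : futures) perSub.add(f.get());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      }
    }
    // Min-heap on score: the root is the weakest of the n best hits kept so far.
    PriorityQueue<Hit> pq = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    for (List<Hit> hits : perSub) {
      for (Hit h : hits) {
        pq.offer(h);
        if (pq.size() > n) pq.poll();
      }
    }
    List<Hit> result = new ArrayList<>(pq);
    result.sort((a, b) -> Float.compare(b.score, a.score)); // highest score first
    return result;
  }
}
```

Passing null for the executor runs the sub-searches one after another, which mirrors the idea that using threads is optional.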




RE: Use of MultiFields.getFields() bad practice?

2011-01-05 Thread Uwe Schindler
Some other ideas:
Maybe add an isEmpty() "hint" method to DocIdSet(Iterator). An empty DocIdSet
would always return true. The problem is that this method is costly for
OpenBitSet, so maybe it's just a "hint": returning false is also OK when the
set is empty. So if you have a DocIdSet that's empty and you can easily detect
it, simply return true. The default impl returns false.

Something else: EMPTY_DOCIDSETITER could be the same instance as EMPTY_SCORER
(EMPTY_SCORER == EMPTY_DOCIDSETITER, implemented as a Scorer). You only have to
add a few methods to this empty instance.
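Both ideas can be sketched together with simplified, hypothetical types (DocIdIterator, SimpleScorer and DocIdSetLike are stand-ins, not Lucene's DocIdSetIterator, Scorer and DocIdSet). isEmpty() is only a hint that defaults to false, and a single EMPTY object serves as both the empty iterator and the empty scorer, which works because the scorer type extends the iterator type.

```java
final class EmptySentinelSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical, simplified stand-in for DocIdSetIterator. */
  abstract static class DocIdIterator {
    abstract int nextDoc();
  }

  /** Hypothetical stand-in for Scorer; like Scorer, it is itself an iterator. */
  abstract static class SimpleScorer extends DocIdIterator {
    abstract float score();
  }

  /** Hypothetical stand-in for DocIdSet, with the proposed isEmpty() hint. */
  abstract static class DocIdSetLike {
    abstract DocIdIterator iterator();

    /** Only a hint: true means "definitely empty"; false means "non-empty or unknown". */
    boolean isEmpty() {
      return false; // safe default for sets where checking would be costly
    }
  }

  /** One shared instance serves as both the empty iterator and the empty scorer. */
  static final SimpleScorer EMPTY = new SimpleScorer() {
    @Override
    int nextDoc() {
      return NO_MORE_DOCS;
    }

    @Override
    float score() {
      throw new UnsupportedOperationException("no current document");
    }
  };

  /** An empty set that can report its emptiness for free. */
  static final DocIdSetLike EMPTY_SET = new DocIdSetLike() {
    @Override
    DocIdIterator iterator() {
      return EMPTY;
    }

    @Override
    boolean isEmpty() {
      return true;
    }
  };

  /** Array-backed set: emptiness is cheap to detect, so it overrides the hint. */
  static final class SortedIntSet extends DocIdSetLike {
    private final int[] docs;

    SortedIntSet(int... docs) {
      this.docs = docs.clone();
    }

    @Override
    boolean isEmpty() {
      return docs.length == 0;
    }

    @Override
    DocIdIterator iterator() {
      if (docs.length == 0) return EMPTY; // hand out the shared sentinel
      return new DocIdIterator() {
        private int i = -1;

        @Override
        int nextDoc() {
          return ++i < docs.length ? docs[i] : NO_MORE_DOCS;
        }
      };
    }
  }
}
```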

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Wednesday, January 05, 2011 8:12 PM
> To: dev@lucene.apache.org
> Subject: Re: Use of MultiFields.getFields() bad practice?
> 
> On Wed, Jan 5, 2011 at 1:35 PM, Uwe Schindler  wrote:
> 
> > Scorer is an iterator :-)
> 
> Aha!  You are right... I keep forgetting this about Scorer :)
> 
> >> In some cases the null return can make a difference in performance,
> >> eg if BQ is OR'ing two terms, but one of them yields a null scorer
> >> (matches no docs) then the scorer can [almost -- coord] rewrite to a
> >> TermScorer with just the other term vs BooleanScorer.  We don't do
> >> this today, but we should/could.
> >
> > In that case, BQ could simply do a check on scorer==EMPTY_SCORER to
> > achieve the same. As Scorer subclasses DocIdSetIterator, and is an
> > iterator, it should return something empty (see below).
> 
> I think that's a good idea?  Ie the contract would then be "if
> Weight.scorer can determine up front that no docs can match, you should
> return EMPTY_SCORER since caller may optimize for that case".
> Ie, impls shouldn't return their own custom empty impl; they should return
> that specific instance.
> 
> Hmm are we ever gonna run into classloader hell, where we have more than
> one instance of this EMPTY_SCORER somehow...?  null is safer in that regard
> since null is always null...
> 
> >> So my feeling is we have to take it case by case, and I think these
> >> two cases (Weight.scorer and Filter.getDocIdSet) should keep their
> >> current contract (null may be returned if no docs will match).
> >
> > But we should not force them to return either null or empty. So the
> > docs say
> > "*may* return null". They don’t need to. They can return an empty
> > scorer if they like.
> 
> Right -- I think "may return null" is the right contract here.
> 
> > BUT:
> >
> > I am just upset about such code:
> >
> >  final DocIdSet dis = filter.getDocIdSet(reader);
> >  if (dis == null)
> >    return null;
> >  final DocIdSetIterator disi = dis.iterator();
> >  if (disi == null)
> >    return null;
> >  return new ConstantScorer(similarity, disi, this);
> >
> > (this is what I have seen during my work for ConstantScoreQuery)
> 
> Alas the DIS.iterator now states null is a valid return... sigh.  I think
> it should not... but dangerous to change that now.
> 
> But, even if we switch to the EMPTY_X sentinel, you'd still need these
> ifs?  IE, we want CSQ to detect an "empty" filter and forward this on as an
> "empty scorer".
> 
> Mike
> 



[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977913#action_12977913
 ] 

Simon Willnauer commented on LUCENE-2831:
-

bq. But there's still a numLeafes in ReaderUtil 

bloody dyslexic German

I'll go ahead and commit

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977912#action_12977912
 ] 

Michael McCandless commented on LUCENE-2831:


+1!

But there's still a numLeafes in ReaderUtil :)

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




Re: Use of MultiFields.getFields() bad practice?

2011-01-05 Thread Michael McCandless
On Wed, Jan 5, 2011 at 1:35 PM, Uwe Schindler  wrote:

> Scorer is an iterator :-)

Aha!  You are right... I keep forgetting this about Scorer :)

>> In some cases the null return can make a difference in performance, eg if BQ
>> is OR'ing two terms, but one of them yields a null scorer (matches no docs)
>> then the scorer can [almost -- coord] rewrite to a TermScorer with just the
>> other term vs BooleanScorer.  We don't do this today, but we should/could.
>
> In that case, BQ could simply do a check on scorer==EMPTY_SCORER to achieve
> the same. As Scorer subclasses DocIdSetIterator, and is an iterator, it
> should return something empty (see below).

I think that's a good idea?  Ie the contract would then be "if
Weight.scorer can determine up front that no docs can match, you
should return EMPTY_SCORER since caller may optimize for that case".
Ie, impls shouldn't return their own custom empty impl; they should
return that specific instance.

Hmm are we ever gonna run into classloader hell, where we have more
than one instance of this EMPTY_SCORER somehow...?  null is safer in
that regard since null is always null...
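A rough sketch of the BooleanQuery shortcut discussed here, assuming a simplified SimpleScorer interface and a hypothetical or() helper (neither is Lucene API). Clauses that report "no docs" are dropped whether they return null or the shared EMPTY_SCORER, and a single surviving clause is returned as-is instead of being wrapped in a disjunction; as the thread notes, this ignores coord. If some other empty instance ever slipped through, the identity check would simply miss the shortcut and the general disjunction would still return correct results.

```java
import java.util.ArrayList;
import java.util.List;

final class DisjunctionShortcutSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical, simplified scorer: returns increasing doc ids, then NO_MORE_DOCS. */
  interface SimpleScorer {
    int nextDoc();
  }

  /** The one shared "matches nothing" instance that callers may test by identity. */
  static final SimpleScorer EMPTY_SCORER = () -> NO_MORE_DOCS;

  /** Hypothetical helper: OR the clause scorers, dropping clauses that match nothing. */
  static SimpleScorer or(List<SimpleScorer> clauses) {
    List<SimpleScorer> live = new ArrayList<>();
    for (SimpleScorer s : clauses) {
      // Honor both contracts: null and the shared sentinel both mean "no docs".
      if (s != null && s != EMPTY_SCORER) live.add(s);
    }
    if (live.isEmpty()) return EMPTY_SCORER;
    if (live.size() == 1) return live.get(0); // no disjunction needed (coord ignored)
    return new OrScorer(live);
  }

  /** Union of several sub-scorers' doc ids (scores omitted for brevity). */
  static final class OrScorer implements SimpleScorer {
    private final List<SimpleScorer> subs;
    private final int[] current; // doc each sub-scorer is positioned on
    private int doc = -1;

    OrScorer(List<SimpleScorer> subs) {
      this.subs = subs;
      this.current = new int[subs.size()];
      for (int i = 0; i < current.length; i++) current[i] = subs.get(i).nextDoc();
    }

    @Override
    public int nextDoc() {
      // Advance every sub-scorer still sitting on the doc returned last time.
      for (int i = 0; i < current.length; i++) {
        if (current[i] == doc) current[i] = subs.get(i).nextDoc();
      }
      int min = NO_MORE_DOCS;
      for (int c : current) min = Math.min(min, c);
      doc = min;
      return doc;
    }
  }

  /** Illustration-only scorer over a sorted array of doc ids. */
  static SimpleScorer of(int... docs) {
    int[] copy = docs.clone();
    return new SimpleScorer() {
      private int i = -1;

      @Override
      public int nextDoc() {
        return ++i < copy.length ? copy[i] : NO_MORE_DOCS;
      }
    };
  }
}
```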

>> So my feeling is we have to take it case by case, and I think these two cases
>> (Weight.scorer and Filter.getDocIdSet) should keep their current contract
>> (null may be returned if no docs will match).
>
> But we should not force them to return either null or empty. So the docs say
> "*may* return null". They don’t need to. They can return an empty scorer if
> they like.

Right -- I think "may return null" is the right contract here.

> BUT:
>
> I am just upset about such code:
>
>  final DocIdSet dis = filter.getDocIdSet(reader);
>  if (dis == null)
>    return null;
>  final DocIdSetIterator disi = dis.iterator();
>  if (disi == null)
>    return null;
>  return new ConstantScorer(similarity, disi, this);
>
> (this is what I have seen during my work for ConstantScoreQuery)

Alas the DIS.iterator now states null is a valid return... sigh.  I
think it should not... but dangerous to change that now.

But, even if we switch to the EMPTY_X sentinel, you'd still need these
ifs?  IE, we want CSQ to detect an "empty" filter and forward this on
as an "empty scorer".
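Mike's point can be made concrete with two versions of a constant-score scorer factory, written against simplified, hypothetical interfaces (Filter, DocSet, DocIterator and ConstantScorer here are stand-ins, and the per-reader argument is dropped). With the null contract there are two null checks. With a shared EMPTY sentinel the nulls disappear, but an identity check is still needed if the empty filter should be forwarded as an empty scorer.

```java
final class ConstantScoreForwardingSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical iterator; a scorer is also an iterator here. */
  interface DocIterator {
    int nextDoc();
  }

  /** Hypothetical, simplified DocIdSet. */
  interface DocSet {
    DocIterator iterator();
  }

  /** Hypothetical, simplified filter; the per-reader argument is omitted. */
  interface Filter {
    DocSet getDocIdSet();
  }

  /** Shared "no docs" instance, usable as both empty iterator and empty scorer. */
  static final DocIterator EMPTY = () -> NO_MORE_DOCS;

  /** Gives every doc from the wrapped iterator the same score. */
  static final class ConstantScorer implements DocIterator {
    private final DocIterator disi;
    final float score;

    ConstantScorer(DocIterator disi, float score) {
      this.disi = disi;
      this.score = score;
    }

    @Override
    public int nextDoc() {
      return disi.nextDoc();
    }
  }

  /** Null contract: both getDocIdSet() and iterator() may return null. */
  static DocIterator scorerWithNulls(Filter filter, float score) {
    DocSet dis = filter.getDocIdSet();
    if (dis == null) return null;
    DocIterator disi = dis.iterator();
    if (disi == null) return null;
    return new ConstantScorer(disi, score);
  }

  /** Sentinel contract: nothing is null, yet a branch remains to forward "empty". */
  static DocIterator scorerWithSentinel(Filter filter, float score) {
    DocIterator disi = filter.getDocIdSet().iterator();
    if (disi == EMPTY) return EMPTY;
    return new ConstantScorer(disi, score);
  }
}
```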

Mike




[jira] Updated: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2831:


Attachment: LUCENE-2831.patch

Fixed those little spelling issues and added a children() method to
ReaderContext. I also revised the leaves() javadocs to make them clearer.

I think we are good to go!
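The reader-context tree this issue describes (parent reader, sub reader, the sub's ord, a doc base, plus children() and leaves()) might look roughly like the following. Ctx and docBaseInTop() are hypothetical names for illustration; this is a simplified model, not the class in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

final class ReaderContextSketch {
  /** Hypothetical, simplified reader-context node; not Lucene's ReaderContext. */
  static final class Ctx {
    Ctx parent;          // null for the top-level context
    int ordInParent;     // index of this context among its parent's children
    int docBaseInParent; // first doc id of this sub-reader within its parent
    final int maxDoc;
    final boolean atomic;
    private final List<Ctx> children;

    private Ctx(int maxDoc, boolean atomic, List<Ctx> children) {
      this.maxDoc = maxDoc;
      this.atomic = atomic;
      this.children = children;
    }

    /** An atomic (leaf) context for a segment with maxDoc documents. */
    static Ctx leaf(int maxDoc) {
      return new Ctx(maxDoc, true, List.of());
    }

    /** A composite context; assigns parent, ord and doc base to each child. */
    static Ctx composite(Ctx... kids) {
      int total = 0;
      for (Ctx k : kids) total += k.maxDoc;
      Ctx c = new Ctx(total, false, List.of(kids));
      int base = 0;
      for (int i = 0; i < kids.length; i++) {
        kids[i].parent = c;
        kids[i].ordInParent = i;
        kids[i].docBaseInParent = base;
        base += kids[i].maxDoc;
      }
      return c;
    }

    /** Direct sub-contexts (empty for a leaf). */
    List<Ctx> children() {
      return children;
    }

    /** All atomic contexts at or below this one, in doc id order. */
    List<Ctx> leaves() {
      List<Ctx> out = new ArrayList<>();
      collect(this, out);
      return out;
    }

    private static void collect(Ctx c, List<Ctx> out) {
      if (c.atomic) {
        out.add(c);
      } else {
        for (Ctx k : c.children) collect(k, out);
      }
    }

    /** Hypothetical helper: doc base relative to the top-level context. */
    int docBaseInTop() {
      int base = 0;
      for (Ctx c = this; c.parent != null; c = c.parent) base += c.docBaseInParent;
      return base;
    }
  }
}
```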

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




[jira] Updated: (LUCENE-2847) Support all of unicode in StandardTokenizer

2011-01-05 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2847:


Attachment: LUCENE-2847.patch

Here's a patch... I added a simple test.

I'm sure it can be beautified etc.

> Support all of unicode in StandardTokenizer
> ---
>
> Key: LUCENE-2847
> URL: https://issues.apache.org/jira/browse/LUCENE-2847
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Reporter: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2847.patch
>
>
> StandardTokenizer currently only supports the BMP.
> If it encounters characters outside of the BMP, it just discards them...
> it should instead fully implement UAX#29 across all of Unicode.




[jira] Created: (LUCENE-2848) DisjunctionMaxQuery toString uses single pipe to join sub terms

2011-01-05 Thread George Campbell (JIRA)
DisjunctionMaxQuery toString uses single pipe to join sub terms
---

 Key: LUCENE-2848
 URL: https://issues.apache.org/jira/browse/LUCENE-2848
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 3.0.3
Reporter: George Campbell
Priority: Minor


I'm not 100% sure, because I'm relatively new to Lucene, but I think this line in
org.apache.lucene.search.DisjunctionMaxQuery.toString(String field) should be
changed from
  if (i != numDisjunctions-1) buffer.append(" | ");
to
  if (i != numDisjunctions-1) buffer.append(" OR ");
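To see what the proposed change does to the output, here is a simplified, hypothetical rendering loop modeled on the quoted line. The surrounding parentheses are an assumption, and it ignores nested-query and tie-breaker formatting, so it only approximates the real DisjunctionMaxQuery.toString(). The two calls differ only in the separator.

```java
import java.util.List;

final class DisMaxToStringSketch {
  /**
   * Hypothetical, simplified rendering: joins already-rendered disjuncts with the
   * given separator inside parentheses (tie-breaker and nesting are ignored).
   */
  static String render(List<String> disjuncts, String separator) {
    StringBuilder buffer = new StringBuilder("(");
    int numDisjunctions = disjuncts.size();
    for (int i = 0; i < numDisjunctions; i++) {
      buffer.append(disjuncts.get(i));
      // The separator is the only thing the proposed change touches.
      if (i != numDisjunctions - 1) buffer.append(separator);
    }
    buffer.append(")");
    return buffer.toString();
  }
}
```

For example, render(List.of("title:foo", "body:foo"), " | ") gives "(title:foo | body:foo)", while the " OR " separator gives "(title:foo OR body:foo)".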






[jira] Created: (LUCENE-2847) Support all of unicode in StandardTokenizer

2011-01-05 Thread Robert Muir (JIRA)
Support all of unicode in StandardTokenizer
---

 Key: LUCENE-2847
 URL: https://issues.apache.org/jira/browse/LUCENE-2847
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Reporter: Robert Muir
 Fix For: 3.1, 4.0
 Attachments: LUCENE-2847.patch

StandardTokenizer currently only supports the BMP.

If it encounters characters outside of the BMP, it just discards them...
it should instead fully implement UAX#29 across all of Unicode.
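Characters outside the BMP are stored in a Java String as surrogate pairs, so code that examines one char at a time never recognizes them as letters. The sketch below contrasts char-based and code-point-based letter filtering on U+1D400 (MATHEMATICAL BOLD CAPITAL A). It illustrates the general pitfall only: StandardTokenizer's generated scanner works differently, and the helper names are hypothetical.

```java
final class SupplementarySketch {
  /** Char at a time: a supplementary letter is two surrogate chars, neither a letter. */
  static String lettersByChar(String text) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (Character.isLetter(c)) sb.append(c); // surrogates fail this test and are dropped
    }
    return sb.toString();
  }

  /** Code point at a time: surrogate pairs are decoded first, so the letter survives. */
  static String lettersByCodePoint(String text) {
    StringBuilder sb = new StringBuilder();
    text.codePoints()
        .filter(cp -> Character.isLetter(cp))
        .forEach(cp -> sb.appendCodePoint(cp));
    return sb.toString();
  }
}
```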




RE: Use of MultiFields.getFields() bad practice?

2011-01-05 Thread Uwe Schindler
> >> I do agree we should be returning null not EMPTY_DOCIDSET since
> >> Filter.getDocIDSet's jdoc clearly states a null return means no docs
> >> are accepted.
> >
> > I disagree. NULL is a stupid indicator, if something is empty it
> > should return something empty. I also don't like somebody returning
> > NULL instead of
> > Collections.emptySet() or like that. We should document Filter and
> > also Weight.scorer to always return non-null and also supply a
> > Scorer.EMPTY_SCORER. If you want to optimize something, you can always
> > change your loops to first do next() and then exit early.
> 
> I don't think it's so clear cut.  I think it depends on how the API is
> used, and how advanced an API it is.
> 
> For example, Fields.terms("foobar") returns null if foobar is a
> non-existent field.  I think that's appropriate, because to return a "fake
> instance" loses information.  EG caller can no longer tell if a given field
> exists or not.

OK, I agree with that. If the field does not exist, it makes no sense to
return anything.

> In some cases the null return can make a difference in performance, eg if
> BQ is OR'ing two terms, but one of them yields a null scorer (matches no
> docs) then the scorer can [almost -- coord] rewrite to a TermScorer with
> just the other term vs BooleanScorer.  We don't do this today, but we
> should/could.

In that case, BQ could simply do a check on scorer==EMPTY_SCORER to achieve
the same. As Scorer subclasses DocIdSetIterator, and is an iterator, it
should return something empty (see below).

> So my feeling is we have to take it case by case, and I think these two
> cases (Weight.scorer and Filter.getDocIdSet) should keep their current
> contract (null may be returned if no docs will match).

But we should not force them to return either null or empty. So the docs say
"*may* return null". They don’t need to. They can return an empty scorer if
they like.

BUT:

I am just upset about such code:

  final DocIdSet dis = filter.getDocIdSet(reader);
  if (dis == null)
    return null;
  final DocIdSetIterator disi = dis.iterator();
  if (disi == null)
    return null;
  return new ConstantScorer(similarity, disi, this);

(this is what I have seen during my work for ConstantScoreQuery)

> Regardless, I think each API should clearly state the contract
> unambiguously.  Ie "this method never returns null", or, "this method will
> return null if XYZ", or, "this method may return null if XYZ".
> 
> >> And I think I'm on the opposite side of the fence on null returns, at least
> >> for advanced APIs -- I'd prefer not to hide information (by returning an
> >> empty instance) in this case.  But other cases I think should never return
> >> null -- eg once you have a non-null DocIdSet, then its .iterator() should
> >> never return null.
> >
> > I agree. DocIdSet.iterator() should return EMPTY_DOCIDSETITERATOR.
> 
> Right, for an iterator from any class I think it should never return null.

Scorer is an iterator :-)

Uwe





[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-05 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977892#action_12977892
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

Another model we could implement is straight queuing, which would give us a
total ordering on all IW calls. Documents, deletes, and flushes would be queued
up and executed asynchronously. For example, in today's DWPT code we still
block document additions while flushing, because we tie a thread to a given
DWPT. If a thread's DWPT is flushing, wouldn't we want to simply assign the doc
add to a different, non-flushing DWPT to gain full efficiency? This seems more
easily doable with a queuing model. If we want synchronous flushing, we'd place
a flush event in the queue and wait for it to finish executing. How does this
sound?
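A minimal sketch of the queuing idea, assuming a single consumer thread and plain strings as documents (QueuedWriterSketch and its methods are hypothetical, not IndexWriter API). A single-threaded executor provides the total ordering, and a synchronous flush is just a flush event placed in the queue followed by a wait on its Future. It deliberately leaves out dispatching adds to multiple DWPTs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class QueuedWriterSketch implements AutoCloseable {
  // One consumer thread drains the queue, so operations run in submission order.
  private final ExecutorService queue = Executors.newSingleThreadExecutor();
  // Only touched from the queue thread, so no extra locking is needed.
  private final List<String> buffer = new ArrayList<>();
  private final List<List<String>> segments = new ArrayList<>();

  /** Enqueues a document add and returns immediately. */
  void addDocument(String doc) {
    queue.execute(() -> buffer.add(doc));
  }

  /** Enqueues a delete of a buffered document (hypothetical: matched by equality). */
  void deleteDocument(String doc) {
    queue.execute(() -> buffer.remove(doc));
  }

  /** Enqueues a flush event; the future completes with the number of docs flushed. */
  Future<Integer> flushAsync() {
    return queue.submit(() -> {
      segments.add(new ArrayList<>(buffer));
      int flushed = buffer.size();
      buffer.clear();
      return flushed;
    });
  }

  /** Synchronous flush: place a flush event in the queue and wait for it to run. */
  int flush() {
    return await(flushAsync());
  }

  /** Number of flushed segments, read on the queue thread to stay ordered with writes. */
  int segmentCount() {
    return await(queue.submit(() -> segments.size()));
  }

  private static <T> T await(Future<T> f) {
    try {
      return f.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
  }

  @Override
  public void close() {
    queue.shutdown();
  }
}
```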



> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.




Re: Use of MultiFields.getFields() bad practice?

2011-01-05 Thread Michael McCandless
On Wed, Jan 5, 2011 at 12:35 PM, Uwe Schindler  wrote:
>> I do agree we should be returning null not EMPTY_DOCIDSET since
>> Filter.getDocIDSet's jdoc clearly states a null return means no docs are
>> accepted.
>
> I disagree. NULL is a stupid indicator, if something is empty it should
> return something empty. I also don't like somebody returning NULL instead of
> Collections.emptySet() or like that. We should document Filter and also
> Weight.scorer to always return non-null and also supply a
> Scorer.EMPTY_SCORER. If you want to optimize something, you can always
> change your loops to first do next() and then exit early.

I don't think it's so clear cut.  I think it depends on how the API is
used, and how advanced an API it is.

For example, Fields.terms("foobar") returns null if foobar is a
non-existent field.  I think that's appropriate, because to return a
"fake instance" loses information.  EG caller can no longer tell if a
given field exists or not.
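The information-loss argument can be shown with a hypothetical in-memory term dictionary (FieldsLike is not Lucene's Fields class). Under the null contract the caller can detect a missing field. The never-null variant returns an empty set, which says nothing about whether the field exists.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class NullVsEmptySketch {
  /** Hypothetical per-field term dictionary (field name -> sorted terms). */
  static final class FieldsLike {
    private final Map<String, SortedSet<String>> byField = new TreeMap<>();

    void add(String field, String term) {
      byField.computeIfAbsent(field, f -> new TreeSet<>()).add(term);
    }

    /** Null contract: returns null when the field does not exist. */
    SortedSet<String> terms(String field) {
      return byField.get(field);
    }

    /** "Never null" contract: a missing field comes back as an empty set. */
    SortedSet<String> termsOrEmpty(String field) {
      SortedSet<String> t = byField.get(field);
      return t != null ? t : new TreeSet<String>();
    }
  }

  /** With the null contract the caller can tell "no such field" apart from real terms. */
  static String describe(FieldsLike fields, String field) {
    SortedSet<String> t = fields.terms(field);
    return t == null ? "no such field" : t.size() + " terms";
  }
}
```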

In some cases the null return can make a difference in performance, eg
if BQ is OR'ing two terms, but one of them yields a null scorer
(matches no docs) then the scorer can [almost -- coord] rewrite to a
TermScorer with just the other term vs BooleanScorer.  We don't do
this today, but we should/could.

So my feeling is we have to take it case by case, and I think these
two cases (Weight.scorer and Filter.getDocIdSet) should keep their
current contract (null may be returned if no docs will match).

Regardless, I think each API should clearly state the contract
unambiguously.  Ie "this method never returns null", or, "this method
will return null if XYZ", or, "this method may return null if XYZ".
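One way to spell out the three contract styles in javadoc, using a hypothetical interface, plus a small caller-side adapter that turns a "may return null" result into a never-null iterator. All names here are illustrative assumptions, not Lucene API.

```java
import java.util.SortedSet;

final class ContractSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical iterator over increasing doc ids. */
  interface DocIterator {
    int nextDoc();
  }

  static final DocIterator EMPTY = () -> NO_MORE_DOCS;

  /** Hypothetical interface showing the three contract styles. */
  interface ContractExamples {
    /** Returns an iterator over the matching docs. This method never returns null. */
    DocIterator iterator();

    /** Returns the terms of {@code field}; returns null if the field does not exist. */
    SortedSet<String> terms(String field);

    /** Returns a scorer; may return null if no documents can match. */
    DocIterator scorer();
  }

  /** Caller-side adapter from a "may return null" result to a never-null iterator. */
  static DocIterator orEmpty(DocIterator maybeNull) {
    return maybeNull != null ? maybeNull : EMPTY;
  }

  /** Counts docs; orEmpty() removes the null case, so one loop handles everything. */
  static int count(DocIterator maybeNull) {
    DocIterator it = orEmpty(maybeNull);
    int n = 0;
    while (it.nextDoc() != NO_MORE_DOCS) n++;
    return n;
  }
}
```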

>> And I think I'm on the opposite side of the fence on null returns, at least
>> for advanced APIs -- I'd prefer not to hide information (by returning an empty
>> instance) in this case.  But other cases I think should never return null --
>> eg once you have a non-null DocIdSet, then its .iterator() should never return
>> null.
>
> I agree. DocIdSet.iterator() should return EMPTY_DOCIDSETITERATOR.

Right, for an iterator from any class I think it should never return
null.

Mike




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977888#action_12977888
 ] 

Michael McCandless commented on LUCENE-2831:


Patch looks good!

There's a numLeafs in ReaderUtil still, and s/docbaseInParent/docBaseInParent.

I think children() would be good.

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977882#action_12977882
 ] 

Michael McCandless commented on LUCENE-2837:


I agree we should fix 3.x, too.

Also I didn't fix all the jdocs that reference Searcher/Searchable!

I'll reopen...

> Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
> into IndexSearcher
> ---
>
> Key: LUCENE-2837
> URL: https://issues.apache.org/jira/browse/LUCENE-2837
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2837.patch, LUCENE-2837.patch
>
>
> We've discussed cleaning up our *Searcher stack for some time... I
> think we should try to do this before releasing 4.0.
> So I'm attaching an initial patch which:
>   * Removes Searcher, Searchable, absorbing all their methods into 
> IndexSearcher
>   * Removes contrib/remote
>   * Removes MultiSearcher
>   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
> pass useThreads=true, or a custom ES to the ctor)
> The patch is rough -- I just ripped stuff out, did search/replace to
> IndexSearcher, etc.  EG nothing is directly testing using threads with
> IndexSearcher, but before committing I think we should add a
> newSearcher to LuceneTestCase, which randomly chooses whether the
> searcher uses threads, and cutover tests to use this instead of making
> their own IndexSearcher.
> I think MultiSearcher has a useful purpose, but as it is today it's
> too low-level, eg it shouldn't be involved in rewriting queries: the
> Query.combine method is scary.  Maybe in its place we make a higher
> level class, with limited API, that's able to federate search across
> multiple IndexSearchers?  It'd also be able to optionally use a thread
> per IndexSearcher.
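Below is a rough sketch of the "one searcher, optional threads" idea. It assumes each sub-searcher can be modelled as a `Supplier` of scored hits. `FederatedSearchSketch` and `Hit` are hypothetical names, not the patch's API, and a `null` executor stands in for useThreads=false.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

// Hypothetical sketch: one search entry point that optionally fans out to threads.
final class FederatedSearchSketch {

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Runs every sub-search (sequentially if executor is null, otherwise on the
     * executor) and merges all hits into one list of the topN highest scores.
     */
    static List<Hit> search(List<Supplier<List<Hit>>> subSearches,
                            ExecutorService executor, int topN) {
        List<Hit> all = new ArrayList<>();
        if (executor == null) {
            for (Supplier<List<Hit>> sub : subSearches) {
                all.addAll(sub.get());
            }
        } else {
            List<CompletableFuture<List<Hit>>> pending = new ArrayList<>();
            for (Supplier<List<Hit>> sub : subSearches) {
                pending.add(CompletableFuture.supplyAsync(sub, executor));
            }
            for (CompletableFuture<List<Hit>> f : pending) {
                all.addAll(f.join()); // join() rethrows a sub-search failure unchecked
            }
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return all.size() > topN ? new ArrayList<>(all.subList(0, topN)) : all;
    }
}
```

Hits are gathered in submission order and only sorted afterwards, so the merged top-N does not depend on thread timing.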




[jira] Updated: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2837:
---

Fix Version/s: 3.1

> Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
> into IndexSearcher
> ---
>
> Key: LUCENE-2837
> URL: https://issues.apache.org/jira/browse/LUCENE-2837
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2837.patch, LUCENE-2837.patch
>
>
> We've discussed cleaning up our *Searcher stack for some time... I
> think we should try to do this before releasing 4.0.
> So I'm attaching an initial patch which:
>   * Removes Searcher, Searchable, absorbing all their methods into 
> IndexSearcher
>   * Removes contrib/remote
>   * Removes MultiSearcher
>   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
> pass useThreads=true, or a custom ES to the ctor)
> The patch is rough -- I just ripped stuff out, did search/replace to
> IndexSearcher, etc.  EG nothing is directly testing using threads with
> IndexSearcher, but before committing I think we should add a
> newSearcher to LuceneTestCase, which randomly chooses whether the
> searcher uses threads, and cutover tests to use this instead of making
> their own IndexSearcher.
> I think MultiSearcher has a useful purpose, but as it is today it's
> too low-level, eg it shouldn't be involved in rewriting queries: the
> Query.combine method is scary.  Maybe in its place we make a higher
> level class, with limited API, that's able to federate search across
> multiple IndexSearchers?  It'd also be able to optionally use a thread
> per IndexSearcher.




[jira] Reopened: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-2837:


  Assignee: Michael McCandless

> Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
> into IndexSearcher
> ---
>
> Key: LUCENE-2837
> URL: https://issues.apache.org/jira/browse/LUCENE-2837
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2837.patch, LUCENE-2837.patch
>
>
> We've discussed cleaning up our *Searcher stack for some time... I
> think we should try to do this before releasing 4.0.
> So I'm attaching an initial patch which:
>   * Removes Searcher, Searchable, absorbing all their methods into 
> IndexSearcher
>   * Removes contrib/remote
>   * Removes MultiSearcher
>   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
> pass useThreads=true, or a custom ES to the ctor)
> The patch is rough -- I just ripped stuff out, did search/replace to
> IndexSearcher, etc.  EG nothing is directly testing using threads with
> IndexSearcher, but before committing I think we should add a
> newSearcher to LuceneTestCase, which randomly chooses whether the
> searcher uses threads, and cutover tests to use this instead of making
> their own IndexSearcher.
> I think MultiSearcher has a useful purpose, but as it is today it's
> too low-level, eg it shouldn't be involved in rewriting queries: the
> Query.combine method is scary.  Maybe in its place we make a higher
> level class, with limited API, that's able to federate search across
> multiple IndexSearchers?  It'd also be able to optionally use a thread
> per IndexSearcher.




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977878#action_12977878
 ] 

Simon Willnauer commented on LUCENE-2831:
-

bq. ReaderContextBuilder.numLeafes uses an AtomicInt, but ReaderUtil.Gather 
doesn't do any threading.
That way I can update it in the anonymous class. The fact that there's no 
threading doesn't really matter, since the operation is not time-critical at 
all. It's an implementation detail IMO, and just convenient.

bq. ReaderContext.leaves() is a method - shouldn't it just be a member for 
consistency? I don't really understand the javadoc on that method either, since 
I don't see how I could walk the tree myself - there are no child pointers.
Well, they are in CompositeReaderContext, but that jdoc is misleading. I added 
leaves() to avoid a cast just to check whether there are leaves, and I don't see 
why this is problematic here. For consistency, though, I would rather add a 
children() method.

bq. Is ReaderContext.isTopLevel redundant (i.e. it will always be equal to 
parent==null)? Maybe the same thing for isAtomic and leaves==null?
yeah we could do that but I would prefer the simple booleans since they are way 
more expressive and easier to understand.



> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




[jira] Updated: (SOLR-2306) Modify default solrconfig parameters via JMX

2011-01-05 Thread Amit Nithian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Nithian updated SOLR-2306:
---

Attachment: tuning.patch

This is the first version of the patch making parameters writable via JMX.

> Modify default solrconfig parameters via JMX
> 
>
> Key: SOLR-2306
> URL: https://issues.apache.org/jira/browse/SOLR-2306
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 1.5
>Reporter: Amit Nithian
>Priority: Minor
> Fix For: 1.5
>
> Attachments: tuning.patch
>
>
> Solr JMX support is great for reading the state of the search engine but it 
> should also support writing parameters that can affect the runtime 
> performance of the engine. At Zvents, our team wrote a custom web-UI in the 
> /admin folder to accomplish this, but we have now made a preliminary patch to 
> move this into JMX so that JConsole can be used to modify runtime parameters. 
> This is mostly used to tune ranking parameters in the configuration file 
> without passing them via the URL (to prevent changes to our front-end site) 
> or restarting the servlet container.




[jira] Created: (SOLR-2306) Modify default solrconfig parameters via JMX

2011-01-05 Thread Amit Nithian (JIRA)
Modify default solrconfig parameters via JMX


 Key: SOLR-2306
 URL: https://issues.apache.org/jira/browse/SOLR-2306
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.5
Reporter: Amit Nithian
Priority: Minor
 Fix For: 1.5
 Attachments: tuning.patch

Solr JMX support is great for reading the state of the search engine, but it 
should also support writing parameters that can affect the runtime performance 
of the engine. At Zvents, our team wrote a custom web UI in the /admin folder 
to accomplish this, but we have now made a preliminary patch to move this into 
JMX so that JConsole can be used to modify runtime parameters. This is mostly 
used to tune ranking parameters in the configuration file without passing them 
via the URL (to prevent changes to our front-end site) or restarting the servlet 
container.
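For readers unfamiliar with writable JMX attributes, here is a minimal sketch that uses only the JDK's `javax.management` API. `TunableParams`, the `titleBoost` attribute and the object name are hypothetical, and this is not the contents of tuning.patch. The sketch uses a `DynamicMBean` so the parameter set can come from configuration rather than a fixed getter/setter interface.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.InvalidAttributeValueException;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.ObjectName;
import javax.management.ReflectionException;

// Hypothetical sketch: named numeric parameters exposed as readable AND writable
// JMX attributes, so a client such as JConsole can edit them at runtime.
final class TunableParams implements DynamicMBean {
    private final Map<String, Double> params = new ConcurrentHashMap<>();

    TunableParams(Map<String, Double> defaults) { params.putAll(defaults); }

    /** Value the search code would read per request. */
    double get(String name) { return params.get(name); }

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        Double value = params.get(name);
        if (value == null) throw new AttributeNotFoundException(name);
        return value;
    }

    @Override
    public void setAttribute(Attribute attr)
            throws AttributeNotFoundException, InvalidAttributeValueException {
        if (!params.containsKey(attr.getName())) throw new AttributeNotFoundException(attr.getName());
        if (!(attr.getValue() instanceof Double)) {
            throw new InvalidAttributeValueException(attr.getName() + " must be a Double");
        }
        params.put(attr.getName(), (Double) attr.getValue());
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList result = new AttributeList();
        for (String name : names) {
            Double value = params.get(name);
            if (value != null) result.add(new Attribute(name, value));
        }
        return result;
    }

    @Override
    public AttributeList setAttributes(AttributeList attrs) {
        AttributeList applied = new AttributeList();
        for (Attribute attr : attrs.asList()) {
            try {
                setAttribute(attr);
                applied.add(attr);
            } catch (JMException e) {
                // skip unknown or mistyped attributes; only applied ones are returned
            }
        }
        return applied;
    }

    @Override
    public Object invoke(String action, Object[] args, String[] signature) throws ReflectionException {
        throw new ReflectionException(new NoSuchMethodException(action)); // no operations exposed
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attrs = params.keySet().stream()
                .map(n -> new MBeanAttributeInfo(n, Double.class.getName(), "tunable parameter", true, true, false))
                .toArray(MBeanAttributeInfo[]::new);
        return new MBeanInfo(getClass().getName(), "Runtime-tunable parameters", attrs, null, null, null);
    }

    /** Registers with the platform MBeanServer, which JConsole can attach to. */
    static ObjectName register(TunableParams bean, String objectName) {
        try {
            ObjectName name = new ObjectName(objectName);
            ManagementFactory.getPlatformMBeanServer().registerMBean(bean, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Roughly what a JMX client does when an attribute is edited. */
    static void setThroughServer(ObjectName name, String attribute, double value) {
        try {
            ManagementFactory.getPlatformMBeanServer().setAttribute(name, new Attribute(attribute, value));
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once the bean is registered, JConsole would list each parameter as an editable attribute. Because the search code reads `get(...)` on every request, an edit takes effect without restarting the servlet container.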




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977872#action_12977872
 ] 

Yonik Seeley commented on LUCENE-2831:
--

I'm browsing through this latest patch a bit...
- ReaderContextBuilder.numLeafes uses an AtomicInt, but ReaderUtil.Gather 
doesn't do any threading.
- ReaderContext.leaves() is a method - shouldn't it just be a member for 
consistency?  I don't really understand the javadoc on that method either, 
since I don't see how I could walk the tree myself - there are no child 
pointers.
- Is ReaderContext.isTopLevel redundant (i.e. it will always be equal to 
parent==null)?  Maybe the same thing for isAtomic and leaves==null?
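Here is a hypothetical, simplified context tree that makes the questions above concrete. `ContextNode` and its members are illustrative, not the classes in the patch. In this version `isTopLevel()` and `isAtomic()` are derived from `parent` and `children`, and `leaves()` is computed by walking the child pointers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified reader-context tree; not the classes in the patch.
final class ContextNode {
    final ContextNode parent;        // null only for the top-level context
    final int ordInParent;           // position among the parent's children
    final int docBaseInParent;       // first doc id of this reader in the parent's id space
    final int maxDoc;
    private final List<ContextNode> children; // null for atomic (leaf) contexts

    private ContextNode(ContextNode parent, int ordInParent, int docBaseInParent,
                        int maxDoc, boolean atomic) {
        this.parent = parent;
        this.ordInParent = ordInParent;
        this.docBaseInParent = docBaseInParent;
        this.maxDoc = maxDoc;
        this.children = atomic ? null : new ArrayList<>();
    }

    /** Builds a two-level tree: one composite parent over atomic leaves. */
    static ContextNode composite(int... leafMaxDocs) {
        int total = 0;
        for (int m : leafMaxDocs) total += m;
        ContextNode top = new ContextNode(null, 0, 0, total, false);
        int docBase = 0;
        for (int ord = 0; ord < leafMaxDocs.length; ord++) {
            top.children.add(new ContextNode(top, ord, docBase, leafMaxDocs[ord], true));
            docBase += leafMaxDocs[ord];
        }
        return top;
    }

    // Derived from the structure, so they can never disagree with it.
    boolean isTopLevel() { return parent == null; }
    boolean isAtomic() { return children == null; }

    /** Direct children; empty for an atomic context. */
    List<ContextNode> children() {
        return isAtomic() ? Collections.<ContextNode>emptyList() : Collections.unmodifiableList(children);
    }

    /** All atomic descendants in doc-id order, found by walking the children. */
    List<ContextNode> leaves() {
        List<ContextNode> out = new ArrayList<>();
        collectLeaves(this, out);
        return out;
    }

    private static void collectLeaves(ContextNode node, List<ContextNode> out) {
        if (node.isAtomic()) {
            out.add(node);
            return;
        }
        for (ContextNode child : node.children) collectLeaves(child, out);
    }
}
```

Deriving the flags keeps readable method names and rules out inconsistency with the tree. Storing explicit booleans is an equally workable choice if readability of the fields themselves is the priority.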

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




RE: Use of MultiFields.getFields() bad practice?

2011-01-05 Thread Uwe Schindler
> I do agree we should be returning null not EMPTY_DOCIDSET since
> Filter.getDocIDSet's jdoc clearly states a null return means no docs are
> accepted.

I disagree. NULL is a stupid indicator, if something is empty it should
return something empty. I also don't like somebody returning NULL instead of
Collections.emptySet() or similar. We should document Filter and also
Weight.scorer to always return non-null and also supply a
Scorer.EMPTY_SCORER. If you want to optimize something, you can always
change your loops to first do next() and then exit early.
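Here is a minimal sketch of the non-null style described above, using a simplified iterator. `DocIterator`, `EMPTY`, `of` and `count` are hypothetical names. The `NO_MORE_DOCS` sentinel follows the convention of Lucene's `DocIdSetIterator`. Callers need no null check, because the loop exits on the first `nextDoc()` call of the empty instance.

```java
// Hypothetical sketch of the "never return null" style: a shared empty iterator
// whose first nextDoc() call already reports exhaustion.
abstract class DocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Advances to the next matching doc and returns it, or NO_MORE_DOCS. */
    abstract int nextDoc();

    /** Null-object instance returned instead of null when nothing matches. */
    static final DocIterator EMPTY = new DocIterator() {
        @Override
        int nextDoc() { return NO_MORE_DOCS; }
    };

    static DocIterator of(int... sortedDocs) {
        return new DocIterator() {
            private int i = 0;

            @Override
            int nextDoc() { return i < sortedDocs.length ? sortedDocs[i++] : NO_MORE_DOCS; }
        };
    }

    /** No null check needed: for EMPTY the loop exits on the first call. */
    static int count(DocIterator it) {
        int n = 0;
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) n++;
        return n;
    }
}
```

The trade-off raised elsewhere in this thread still applies. With an empty instance the caller cannot learn up front that nothing will match without calling `nextDoc()`, so shortcuts that depend on knowing this in advance are harder.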

> And I think I'm on the opposite side of the fence on null returns, at
> least for advanced APIs -- I'd prefer not to hide information (by
> returning an empty instance) in this case.  But other cases I think
> should never return null -- eg once you have a non-null DocIdSet, then
> its .iterator() should never return null.

I agree. DocIdSet.iterator() should return EMPTY_DOCIDSETITERATOR.

> Mike
> 
> On Tue, Jan 4, 2011 at 5:38 PM, Smiley, David W. wrote:
> > I'm looking through the trunk code on various implementations of
> > Filter.getDocIdSet(IndexReader).  It is often needed to get an instance of
> > Terms and then do other work from there.  Looking at
> > MultiTermQueryWrapperFilter, the first set of lines to do this is:
> >
> >  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
> >    final Fields fields = MultiFields.getFields(reader);
> >    if (fields == null) {
> >      // reader has no fields
> >      return DocIdSet.EMPTY_DOCIDSET;
> >    }
> >
> >    final Terms terms = fields.terms(query.field);
> >    if (terms == null) {
> >      // field does not exist
> >      return DocIdSet.EMPTY_DOCIDSET;
> >    }
> > 
> >
> > When I look at the javadoc for MultiFields.getFields(reader), I see some
> > Javadoc (apparently written by Michael McCandless, CC'ed), with the
> > following javadoc snippet:
> >   *  NOTE: this is a slow way to access postings.
> >   *  It's better to get the sub-readers (using {@link
> >   *  Gather}) and iterate through them
> >   *  yourself.
> >
> > If this is the case, then why is MultiFields.getFields(reader) used 43 times
> > across Lucene/Solr whereas ReaderUtil.Gather is only used 5 times?  If it's a
> > TODO then perhaps a JIRA issue needs to be created.  I don't find helpful
> > examples of how to use ReaderUtil.Gather... the existing 5 uses are all within
> > MultiFields & ReaderUtil.
> >
> > FWIW, in a Lucene Filter I wrote, I've been using this code snippet
> > successfully:
> >
> >        Terms terms = reader.fields().terms(fieldName);
> >
> > On a related topic, I think that if Filter.getDocIdSet() is documented that it
> > may return null, then it's better code design to consequently return null in
> > appropriate circumstances instead of DocIdSet.EMPTY_DOCIDSET.  That said,
> > FWIW, I prefer API design that favors non-null when you can get away with
> > it, like this case.  So I'm in favor of making getDocIdSet() be documented to
> > not return null (and follow through throughout the codebase).  Admittedly
> > some callers might have short-circuit logic.
> >
> > ~ David Smiley
> 






[jira] Resolved: (LUCENE-2814) stop writing shared doc stores across segments

2011-01-05 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2814.


   Resolution: Fixed
Fix Version/s: 4.0
   3.1

> stop writing shared doc stores across segments
> --
>
> Key: LUCENE-2814
> URL: https://issues.apache.org/jira/browse/LUCENE-2814
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch, 
> LUCENE-2814.patch, LUCENE-2814.patch
>
>
> Shared doc stores enable the files for stored fields and term vectors to be 
> shared across multiple segments.  We've had this optimization since 2.1 I 
> think.
> It works best against a new index, where you open an IW, add lots of docs, 
> and then close it.  In that case all of the written segments will reference 
> slices of a single shared doc store segment.
> This was a good optimization because it means we never need to merge these 
> files.  But, when you open another IW on that index, it writes a new set of 
> doc stores, and then whenever merges take place across doc stores, they must 
> now be merged.
> However, since we switched to shared doc stores, there have been two 
> optimizations for merging the stores.  First, we now bulk-copy the bytes in 
> these files if the field name/number assignment is "congruent".  Second, we 
> now force congruent field name/number mapping in IndexWriter.  This means 
> this optimization is much less potent than it used to be.
> Furthermore, the optimization adds *a lot* of hair to 
> IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over 
> time, and causes odd behavior like a merge possibly forcing a flush when it 
> starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent 
> flushing, we can no longer share doc stores.
> So, I think we should turn off the write-side of shared doc stores to pave 
> the path for DWPT to land on trunk and simplify IW/DW.  We still must support 
> reading them (until 5.0), but the read side is far less hairy.
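Here is a hypothetical sketch of what a "congruent" field-numbering check might look like. It assumes a segment's numbering is given as an array indexed by field number. The representation and `FieldNumbering.isCongruent` are illustrative, not IndexWriter or SegmentMerger code. When the check holds, a segment's stored-field bytes can be copied without renumbering fields.

```java
// Hypothetical simplification: index = field number, value = field name.
final class FieldNumbering {

    /**
     * True if every field number used by the segment maps to the same name in
     * the merged numbering, so the segment's bytes could be bulk-copied instead
     * of being decoded and rewritten document by document.
     */
    static boolean isCongruent(String[] segmentFields, String[] mergedFields) {
        if (segmentFields.length > mergedFields.length) return false;
        for (int number = 0; number < segmentFields.length; number++) {
            if (!segmentFields[number].equals(mergedFields[number])) return false;
        }
        return true;
    }
}
```

If IndexWriter always assigns the same number to the same name, segments it writes would pass such a check, which is why forcing congruent numbering reduces what shared doc stores buy.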




Re: Use of MultiFields.getFields() bad practice?

2011-01-05 Thread Michael McCandless
I just cleaned up a few more unnecessary ones.

Really, we need to break out atomic vs composite IndexReaders.
Effectively, we already have, it's just that it's dynamically typed
(you hit exceptions at runtime), not statically typed.  I'd like to make it
statically typed so it's clear which APIs take what readers.  EG
Filter.getDocIDSet always takes an atomic reader.
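Here is a hypothetical sketch of that static typing. All class names are illustrative, not Lucene API. A filter method that accepts only an atomic reader turns passing a composite reader into a compile error instead of a runtime exception. Composite readers are handled explicitly by visiting their leaves.

```java
import java.util.List;

// Hypothetical sketch: the atomic/composite split expressed in the type system.
abstract class ReaderSketch {
    abstract int maxDoc();
}

final class AtomicReaderSketch extends ReaderSketch {
    private final int maxDoc;

    AtomicReaderSketch(int maxDoc) { this.maxDoc = maxDoc; }

    @Override
    int maxDoc() { return maxDoc; }
}

final class CompositeReaderSketch extends ReaderSketch {
    private final List<AtomicReaderSketch> leaves;

    CompositeReaderSketch(List<AtomicReaderSketch> leaves) { this.leaves = List.copyOf(leaves); }

    List<AtomicReaderSketch> leaves() { return leaves; }

    @Override
    int maxDoc() {
        int total = 0;
        for (AtomicReaderSketch leaf : leaves) total += leaf.maxDoc();
        return total;
    }
}

interface FilterSketch {
    // Accepts only a per-segment reader; passing a composite reader does not compile.
    int countAccepted(AtomicReaderSketch segment);

    // Composite readers are handled by visiting their leaves explicitly.
    static int countAll(FilterSketch filter, CompositeReaderSketch reader) {
        int total = 0;
        for (AtomicReaderSketch leaf : reader.leaves()) total += filter.countAccepted(leaf);
        return total;
    }
}
```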

I do agree we should be returning null not EMPTY_DOCIDSET since
Filter.getDocIDSet's jdoc clearly states a null return means no docs
are accepted.

And I think I'm on the opposite side of the fence on null returns, at
least for advanced APIs -- I'd prefer not to hide information (by
returning an empty instance) in this case.  But other cases I think
should never return null -- eg once you have a non-null DocIdSet, then
its .iterator() should never return null.

Mike

On Tue, Jan 4, 2011 at 5:38 PM, Smiley, David W.  wrote:
> I'm looking through the trunk code on various implementations of 
> Filter.getDocIdSet(IndexReader).  It is often needed to get an instance of 
> Terms and then do other work from there.  Looking at 
> MultiTermQueryWrapperFilter, the first set of lines to do this is:
>
>  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
>    final Fields fields = MultiFields.getFields(reader);
>    if (fields == null) {
>      // reader has no fields
>      return DocIdSet.EMPTY_DOCIDSET;
>    }
>
>    final Terms terms = fields.terms(query.field);
>    if (terms == null) {
>      // field does not exist
>      return DocIdSet.EMPTY_DOCIDSET;
>    }
> 
>
> When I look at the javadoc for MultiFields.getFields(reader), I see some 
> Javadoc (apparently written by Michael McCandless, CC'ed), with the following 
> javadoc snippet:
>   *  NOTE: this is a slow way to access postings.
>   *  It's better to get the sub-readers (using {@link
>   *  Gather}) and iterate through them
>   *  yourself.
>
> If this is the case, then why is MultiFields.getFields(reader) used 43 times 
> across Lucene/Solr whereas ReaderUtil.Gather is only used 5 times?  If it's a 
> TODO then perhaps a JIRA issue needs to be created.  I don't find helpful 
> examples of how to use ReaderUtil.Gather... the existing 5 uses are all 
> within MultiFields & ReaderUtil.
>
> FWIW, in a Lucene Filter I wrote, I've been using this code snippet 
> successfully:
>
>        Terms terms = reader.fields().terms(fieldName);
>
> On a related topic, I think that if Filter.getDocIdSet() is documented that 
> it may return null, then it's better code design to consequently return null 
> in appropriate circumstances instead of DocIdSet.EMPTY_DOCIDSET.  That said, 
> FWIW, I prefer API design that favors non-null when you can get away with it, 
> like this case.  So I'm in favor of making getDocIdSet() be documented to not 
> return null (and follow through throughout the codebase).  Admittedly some 
> callers might have short-circuit logic.
>
> ~ David Smiley




[jira] Updated: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2831:


Attachment: LUCENE-2831.patch

Updated to trunk and fixed some variable naming s/info/context

all tests pass

> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
> LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-05 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977833#action_12977833
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

Perhaps it's best to place the RAM tracking into FlushControl, where the RAM
consumed by delete queries, delete terms, and added documents can be recorded,
so that the proper flush decision can be made in one central, global object. To
get this idea working we'd need to implement LUCENE-2573 in FlushControl. I'll
likely get started on this.
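Here is a rough sketch of a central RAM accountant. It assumes indexing threads report per-category byte counts. `FlushControlSketch` and its methods are hypothetical, and LUCENE-2573's actual flush policy is not modelled. Summing separate `AtomicLong`s is not an atomic snapshot, which seems acceptable for a flush trigger.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of one global RAM accountant shared by all indexing threads.
final class FlushControlSketch {
    private final long ramBudgetBytes;
    private final AtomicLong deleteQueryBytes = new AtomicLong();
    private final AtomicLong deleteTermBytes = new AtomicLong();
    private final AtomicLong addedDocBytes = new AtomicLong();

    FlushControlSketch(long ramBudgetBytes) { this.ramBudgetBytes = ramBudgetBytes; }

    void onDeleteQuery(long bytes) { deleteQueryBytes.addAndGet(bytes); }
    void onDeleteTerm(long bytes) { deleteTermBytes.addAndGet(bytes); }
    void onAddDocument(long bytes) { addedDocBytes.addAndGet(bytes); }

    /** Approximate total across all threads (not an atomic snapshot). */
    long ramUsed() {
        return deleteQueryBytes.get() + deleteTermBytes.get() + addedDocBytes.get();
    }

    /** The flush decision is made in one place, against the global total. */
    boolean shouldFlush() { return ramUsed() >= ramBudgetBytes; }

    /** Simplification: a flush frees all buffered memory. */
    void onFlushed() {
        deleteQueryBytes.set(0);
        deleteTermBytes.set(0);
        addedDocBytes.set(0);
    }
}
```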

> Per thread DocumentsWriters that write their own private segments
> -
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.




[jira] Commented: (SOLR-2155) Geospatial search using geohash prefixes

2011-01-05 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977791#action_12977791
 ] 

David Smiley commented on SOLR-2155:


Bill, you can find examples here:  http://wiki.apache.org/solr/SpatialSearch   
In particular, look for the filter queries involving geofilt or bbox.

> Geospatial search using geohash prefixes
> 
>
> Key: SOLR-2155
> URL: https://issues.apache.org/jira/browse/SOLR-2155
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
> Attachments: GeoHashPrefixFilter.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on 
> documents that have a variable number of points.  This scenario occurs when 
> there is location extraction (e.g. via a "gazetteer") occurring on free text.  
> None, one, or many geospatial locations might be extracted from any given 
> document and users want to limit their search results to those occurring in a 
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr 
> with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
> earth.  Each successive character added further subdivides the box into a 4x8 
> (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
> step in this scheme is figuring out which geohash grid squares cover the 
> user's search query.  I've added various extra methods to GeoHashUtils (and 
> added tests) to assist in this purpose.  The next step is an actual Lucene 
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
> TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
> matching geohash grid is found, the points therein are compared against the 
> user's query to see if they match.  I created an abstraction GeoShape 
> extended by subclasses named PointDistance... and CartesianBox to support 
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
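Here is a minimal geohash encoder, written from the standard public algorithm rather than taken from `GeoHashUtils`; `GeoHashSketch` and its methods are hypothetical. It shows why each added character yields an 8x4 or 4x8 grid: every base-32 character carries 5 bits, and the bits alternate between longitude and latitude.

```java
// Minimal geohash encoder from the standard algorithm; names are illustrative.
final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point as a geohash with the given number of characters. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0, value = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else { value <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else { value <<= 1; maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // 5 bits per base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /**
     * Grid a cell is split into by character number `index` (0-based), as
     * {lonDivisions, latDivisions}: even indexes start on a longitude bit
     * (3 lon + 2 lat bits = 8x4), odd ones on a latitude bit (2 + 3 = 4x8).
     */
    static int[] subdivision(int index) {
        return index % 2 == 0 ? new int[] {8, 4} : new int[] {4, 8};
    }
}
```

For a given point, a longer hash always extends a shorter one, so all points in a grid square share a term prefix. That shared prefix is what lets a prefix-based filter seek to a square in the terms dictionary.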




[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-05 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2657:


Attachment: LUCENE-2657.patch

Removed {{lucene/contrib/remote/}} (LUCENE-2837)

> Replace Maven POM templates with full POMs, and change documentation 
> accordingly
> 
>
> Key: LUCENE-2657
> URL: https://issues.apache.org/jira/browse/LUCENE-2657
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
> LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
> LUCENE-2657.patch, LUCENE-2657.patch
>
>
> The current Maven POM templates only contain dependency information, the bare 
> bones necessary for uploading artifacts to the Maven repository.
> The full Maven POMs in the attached patch include the information necessary 
> to run a multi-module Maven build, in addition to serving the same purpose as 
> the current POM templates.
> Several dependencies are not available through public Maven repositories.  A 
> profile in the top-level POM can be activated to install these dependencies 
> from the various {{lib/}} directories into your local repository.  From the 
> top-level directory:
> {code}
> mvn -N -Pbootstrap install
> {code}
> Once these non-Maven dependencies have been installed, execute the following 
> from the top-level directory to run all Lucene/Solr tests via Maven's surefire 
> plugin and populate your local repository with all artifacts:
> {code}
> mvn install
> {code}
> When one Lucene/Solr module depends on another, the dependency is declared on 
> the *artifact(s)* produced by the other module and deposited in your local 
> repository, rather than on the other module's un-jarred compiler output in 
> the {{build/}} directory, so you must run {{mvn install}} on the other module 
> before its changes are visible to the module that depends on it.
> To create all the artifacts without running tests:
> {code}
> mvn -DskipTests install
> {code}
> I almost always include the {{clean}} phase when I do a build, e.g.:
> {code}
> mvn -DskipTests clean install
> {code}




[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977784#action_12977784
 ] 

Steven Rowe commented on LUCENE-2837:
-

{quote}
This broke the ant target 'eclipse'; I just fixed it (removed the 'remote' dir).
The same is probably also needed for "Idea", but I'm not sure how to do this. 
{quote}

Done:  r1055474

> Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
> into IndexSearcher
> ---
>
> Key: LUCENE-2837
> URL: https://issues.apache.org/jira/browse/LUCENE-2837
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2837.patch, LUCENE-2837.patch
>
>
> We've discussed cleaning up our *Searcher stack for some time... I
> think we should try to do this before releasing 4.0.
> So I'm attaching an initial patch which:
>   * Removes Searcher, Searchable, absorbing all their methods into 
> IndexSearcher
>   * Removes contrib/remote
>   * Removes MultiSearcher
>   * Absorbs ParallelMultiSearcher into IndexSearcher (i.e. you can now
> pass useThreads=true, or a custom ExecutorService, to the ctor)
> The patch is rough -- I just ripped stuff out, did search/replace to
> IndexSearcher, etc.  E.g. nothing directly tests using threads with
> IndexSearcher, but before committing I think we should add a
> newSearcher to LuceneTestCase, which randomly chooses whether the
> searcher uses threads, and cut over tests to use this instead of making
> their own IndexSearcher.
> I think MultiSearcher has a useful purpose, but as it is today it's
> too low-level, e.g. it shouldn't be involved in rewriting queries: the
> Query.combine method is scary.  Maybe in its place we make a higher
> level class, with limited API, that's able to federate search across
> multiple IndexSearchers?  It'd also be able to optionally use a thread
> per IndexSearcher.
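To make the thread-per-sub-searcher idea concrete, here is a minimal, JDK-only sketch of the fan-out a thread-using IndexSearcher could perform. It submits one task per sub-reader, and each task produces a local top-k whose doc ids are rebased by that sub-reader's docBase. The partial lists are then merged into a global top-k. The class `ParallelSearchSketch`, its `Hit` type and the per-segment score arrays are hypothetical stand-ins, not the API in the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical sketch of per-sub-reader parallel search; not Lucene's API. */
final class ParallelSearchSketch {

  /** One hit: a top-level doc id and its score. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Keeps the k best hits of one segment, rebasing local doc ids by docBase. */
  static List<Hit> searchSegment(float[] scores, int docBase, int k) {
    // Min-heap on score: the weakest kept hit sits at the head.
    PriorityQueue<Hit> pq = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
    for (int i = 0; i < scores.length; i++) {
      pq.add(new Hit(docBase + i, scores[i]));
      if (pq.size() > k) {
        pq.poll(); // drop the weakest hit once we hold more than k
      }
    }
    return new ArrayList<>(pq);
  }

  /** Submits one task per segment, then merges the partial results into a global top k. */
  static List<Hit> search(List<float[]> segments, int k, ExecutorService executor)
      throws InterruptedException, ExecutionException {
    List<Future<List<Hit>>> futures = new ArrayList<>();
    int docBase = 0;
    for (float[] segment : segments) {
      final int base = docBase;
      futures.add(executor.submit(() -> searchSegment(segment, base, k)));
      docBase += segment.length;
    }
    List<Hit> merged = new ArrayList<>();
    for (Future<List<Hit>> f : futures) {
      merged.addAll(f.get()); // blocks until that segment's task is done
    }
    // Highest score first; ties broken by the lower doc id.
    merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
        .thenComparingInt(h -> h.doc));
    return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
  }
}
```

Passing the executor in (rather than creating threads internally) leaves its lifecycle to the caller, which is the same trade-off discussed later in this thread.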




[jira] Updated: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2831:


Attachment: LUCENE-2831.patch

Next iteration; this time I think we are very close.
- I renamed AtomicContext to AtomicReaderContext and likewise for 
CompositeContext
- s/topLevelReaderContext/getTopReaderContext
- updated to latest trunk and adapted to the changes to IS in LUCENE-2837
- Removed the dummy searcher in QueryWrapperFilter, which now works just fine 
with an IS instance
- added ReaderContext ctors to IS
- replaced some members in IS in favor of AtomicReader[] leaves <--- leafs :)
- s/leafs/leaves
- Sharpened the JDocs in Weight - review please
- added missing JDocs to IR, IS & ReaderContext + subs
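For readers new to the context idea, here is a simplified sketch of what such a tree can look like. Each context records its parent, its ord within the parent, and the docBase where its documents start, and the top-level context keeps a flat list of leaves. A top-level doc id can then be mapped to its leaf with a binary search over docBase. The class names are hypothetical stand-ins for AtomicReaderContext/CompositeReaderContext, and each segment reader is modeled only by its document count; the classes in the patch presumably wrap the readers themselves.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of a reader-context tree; not Lucene's classes. */
final class ReaderContextSketch {

  /** Common part of every context: where it sits in the tree. */
  abstract static class Context {
    final CompositeContext parent; // null only for the top-level context
    final int ord;                 // index of this context within its parent's children
    final int docBase;             // first top-level doc id covered by this context

    Context(CompositeContext parent, int ord, int docBase) {
      this.parent = parent;
      this.ord = ord;
      this.docBase = docBase;
    }

    abstract int maxDoc();
  }

  /** A leaf: stands in for one segment-level ("atomic") reader, modeled by its size. */
  static final class AtomicContext extends Context {
    final int docCount;

    AtomicContext(CompositeContext parent, int ord, int docBase, int docCount) {
      super(parent, ord, docBase);
      this.docCount = docCount;
    }

    @Override
    int maxDoc() {
      return docCount;
    }
  }

  /** An inner node; the top-level one also keeps a flat list of all leaves. */
  static final class CompositeContext extends Context {
    final List<Context> children = new ArrayList<>();
    final List<AtomicContext> leaves = new ArrayList<>();

    CompositeContext(CompositeContext parent, int ord, int docBase) {
      super(parent, ord, docBase);
    }

    @Override
    int maxDoc() {
      int n = 0;
      for (Context c : children) n += c.maxDoc();
      return n;
    }
  }

  /** Builds a one-level tree: a top composite context over the given segment sizes. */
  static CompositeContext buildTop(int... segmentMaxDocs) {
    CompositeContext top = new CompositeContext(null, 0, 0);
    int docBase = 0;
    for (int ord = 0; ord < segmentMaxDocs.length; ord++) {
      AtomicContext leaf = new AtomicContext(top, ord, docBase, segmentMaxDocs[ord]);
      top.children.add(leaf);
      top.leaves.add(leaf);
      docBase += segmentMaxDocs[ord];
    }
    return top;
  }

  /** Finds the leaf holding a top-level doc id: the last leaf whose docBase <= doc. */
  static AtomicContext leafFor(CompositeContext top, int doc) {
    List<AtomicContext> leaves = top.leaves;
    int lo = 0, hi = leaves.size() - 1;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1;
      if (leaves.get(mid).docBase <= doc) lo = mid; else hi = mid - 1;
    }
    return leaves.get(lo);
  }
}
```

With segments of 3, 5 and 2 docs, top-level doc 4 falls in the leaf with ord 1 (docBase 3), at local doc 1.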

{quote}
Should Filter.getDocIDSet take an AtomicReaderContext? We don't have
to do that in this patch, though... this patch is a big enough first
step!
{quote}

Yeah, I would like to do so, similar to Weight#scorer, but currently Solr is 
mainly what prevents us. There are also the function queries that still 
operate on IR instead of ReaderContext, but maybe this is a good use case to 
consolidate them, move them into a module and get them out of core?!
Anyway, we should do this in a separate issue; this one has its purpose, as 
you stated.
Likewise I would open issues for CachingWrapperFilter & DuplicateFilter.

simon


> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




Re: top-level README or similar?

2011-01-05 Thread Doron Cohen
> Maybe because I have a fast internet connection and it's easier for me
> to checkout a clean svn area for each issue (am I the only one that
> does this?)
>
>
I have used the 'eclipse' target several times already, thanking you each time
for adding it!

So I prefer the short names.

BTW, in the proposed README it may be worth mentioning that it assumes Java 6
as the IDE's default JVM; otherwise you need to change it specifically for the
project after each use of this target.


[jira] Commented: (LUCENE-2821) FilterManager starts threads with no way to stop, and should be in contrib/remote, not core

2011-01-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977765#action_12977765
 ] 

Robert Muir commented on LUCENE-2821:
-

Since contrib/remote is now gone, I want to:
* deprecate this functionality in 3.x, with the wording "use your own 
LinkedHashMap", but apply this patch to clean it up.
* remove this class in trunk.
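For anyone following the "use your own LinkedHashMap" advice, a minimal sketch of a size-bounded LRU cache follows. It relies on LinkedHashMap's access-order constructor and its removeEldestEntry hook, so eviction happens on insert instead of in a background thread. The class name `LruCache` and the capacity are illustrative assumptions, and the map is not thread-safe on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache on top of LinkedHashMap; key/value types are placeholders. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;

  private final int maxEntries;

  LruCache(int maxEntries) {
    super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after each put(); evicts the least-recently-used entry when over capacity.
    return size() > maxEntries;
  }
}
```

If the cache is shared across threads, wrapping it, e.g. `Collections.synchronizedMap(new LruCache<>(100))`, is one simple option; note that get() also needs the lock, because in access-order mode it reorders the map.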

> FilterManager starts threads with no way to stop, and should be in 
> contrib/remote, not core
> ---
>
> Key: LUCENE-2821
> URL: https://issues.apache.org/jira/browse/LUCENE-2821
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-2821.patch
>
>
> See the warning produced by contrib/remote's tests.




[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977764#action_12977764
 ] 

Robert Muir commented on LUCENE-2837:
-

Hello, I think we should revisit branch_3x here. I'm not asking for a backport, 
but I think we should do some targeted javadoc+deprecations:
* I think we should deprecate Searcher, suggesting to use IndexSearcher 
instead. Searcher->IndexSearcher is probably the only way this change will 
affect 99% of users, so this covers that case: most users only need to make a 
simple change to their code.
* We could add some wording to MultiSearcher, such as 'if you are making a MS 
of IS you might want to consider MR instead'. This would
be nice since we are still going to have the lurking combine() bug; at least 
people would then know that MR is recommended.
* I think it would be nice to add something to contrib/remote so users expect 
to change their code? Personally I think we should deprecate it.
Deprecating doesn't mean there has to be a 1-1 replacement... sometimes we 
re-think and it wasn't the best idea all along. But deprecation
helps alert anyone using it so they won't be surprised by 4.0.
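A minimal sketch of the kind of targeted 3.x deprecation described above, using hypothetical stand-in classes (not Lucene's real Searcher/IndexSearcher). The base type carries both the @Deprecated annotation, which makes javac warn at use sites, and a @deprecated Javadoc tag naming the replacement. For most users the migration is then a change of the declared type.

```java
/** Hypothetical stand-ins illustrating a 3.x-style deprecation; not Lucene classes. */
final class DeprecationSketch {

  /**
   * Abstract base kept only for source compatibility.
   *
   * @deprecated Declare variables as {@link IndexSearcherStandIn} instead;
   *             this base class is expected to go away in the next major release.
   */
  @Deprecated
  abstract static class SearcherStandIn {
    /** Placeholder search returning a hit count. */
    abstract int search(String query);
  }

  /** The concrete class users are pointed to. */
  static class IndexSearcherStandIn extends SearcherStandIn {
    @Override
    int search(String query) {
      return query.isEmpty() ? 0 : 1; // placeholder hit count, not real scoring
    }
  }

  /** The typical user-side migration: only the declared type changes. */
  static int migrated() {
    // was: SearcherStandIn searcher = new IndexSearcherStandIn();
    IndexSearcherStandIn searcher = new IndexSearcherStandIn();
    return searcher.search("lucene");
  }
}
```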




Re: [jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Simon Willnauer
On Wed, Jan 5, 2011 at 1:18 PM, Doron Cohen (JIRA)  wrote:
>
>    [ 
> https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977753#action_12977753
>  ]
>
> Doron Cohen commented on LUCENE-2837:
> -
>
> This broke the ant target 'eclipse'; I just fixed it (removed the 'remote' dir).
> The same is probably also needed for "Idea", but I'm not sure how to do this.

bye bye remote :)

simon



[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977753#action_12977753
 ] 

Doron Cohen commented on LUCENE-2837:
-

This broke the ant target 'eclipse'; I just fixed it (removed the 'remote' dir).
The same is probably also needed for "Idea", but I'm not sure how to do this.




[jira] Resolved: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2837.


Resolution: Fixed




Lucene-Solr-tests-only-3.x - Build # 3385 - Failure

2011-01-05 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/3385/

1 tests failed.
REGRESSION:  org.apache.lucene.search.TestThreadSafe.testLazyLoadThreadSafety

Error Message:
unable to create new native thread

Stack Trace:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:614)
at org.apache.lucene.search.TestThreadSafe.doTest(TestThreadSafe.java:133)
at org.apache.lucene.search.TestThreadSafe.testLazyLoadThreadSafety(TestThreadSafe.java:152)
at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:255)




Build Log (for compile errors):
[...truncated 8551 lines...]






WARNING: re-index all Lucene trunk indices

2011-01-05 Thread Michael McCandless
If you are using Lucene's trunk (to be 4.0) builds, read on...

I just committed LUCENE-2843, which is a hard break on the index file format.

If you are living on Lucene's trunk then you have to remove any
previously created indices and re-index, after updating.

The change cuts over to a more RAM-efficient and faster terms index
implementation, using FSTs (finite state transducers) to hold the terms
index data.

Mike




[jira] Resolved: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-05 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2843.


Resolution: Fixed

> Add variable-gap terms index impl.
> --
>
> Key: LUCENE-2843
> URL: https://issues.apache.org/jira/browse/LUCENE-2843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2843.patch, LUCENE-2843.patch
>
>
> PrefixCodedTermsReader/Writer (used by all "real" core codecs) already
> supports pluggable terms index impls.
> The only impl we have now is FixedGapTermsIndexReader/Writer, which
> picks every Nth (default 32) term and holds it in efficient packed
> int/byte arrays in RAM.  This is already an enormous improvement (RAM
> reduction, init time) over 3.x.
> This patch adds another impl, VariableGapTermsIndexReader/Writer,
> which lets you specify an arbitrary IndexTermSelector to pick which
> terms are indexed, and then uses an FST to hold the indexed terms.
> This is typically even more memory efficient than packed int/byte
> arrays, though it does not support ord(), so it's not quite a fair
> comparison.
> I had to relax the terms index plugin API for
> PrefixCodedTermsReader/Writer to not assume that the terms index impl
> supports ord.
> I also did some cleanup of the FST/FSTEnum APIs and impls, and broke
> out separate seekCeil and seekFloor in FSTEnum.  E.g. we need seekFloor
> when the FST is used as a terms index, but seekCeil when it holds
> all terms in the index (i.e. what SimpleText uses FSTs for).
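The index-term selection and the seekFloor/seekCeil distinction described above can be illustrated with a toy, JDK-only model. This is a sketch under stated assumptions, not Lucene's PrefixCodedTermsReader or FST code: a TreeMap stands in for the FST, a List stands in for the on-disk terms dictionary, and the `IndexTermSelector` interface here is a hypothetical simplification whose real signature may differ. A floor lookup tells you where to start scanning. A ceiling lookup is what you want when the structure already holds every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;

/** Toy terms index: selected terms in a sorted in-memory map, each mapped to its ord. */
final class TermsIndexSketch {

  /** Hypothetical selector: decides, per term, whether it goes into the index. */
  interface IndexTermSelector {
    boolean isIndexTerm(String term, int ord);
  }

  /** Fixed-gap policy: index every Nth term (the issue cites N = 32 as the default). */
  static IndexTermSelector everyNth(int n) {
    return (term, ord) -> ord % n == 0;
  }

  final List<String> allTerms;                            // stand-in for the on-disk terms dict
  final TreeMap<String, Integer> index = new TreeMap<>(); // indexed term -> its ord

  TermsIndexSketch(List<String> sortedTerms, IndexTermSelector selector) {
    this.allTerms = new ArrayList<>(sortedTerms);
    for (int ord = 0; ord < allTerms.size(); ord++) {
      if (selector.isIndexTerm(allTerms.get(ord), ord)) {
        index.put(allTerms.get(ord), ord);
      }
    }
  }

  /** Exact lookup: seekFloor in the index, then scan forward. Returns the ord, or -1. */
  int lookup(String target) {
    Map.Entry<String, Integer> floor = index.floorEntry(target); // largest indexed term <= target
    int start = (floor == null) ? 0 : floor.getValue();
    for (int ord = start; ord < allTerms.size(); ord++) {
      int cmp = allTerms.get(ord).compareTo(target);
      if (cmp == 0) return ord;
      if (cmp > 0) break; // passed the position where target would be: not present
    }
    return -1;
  }

  /** When the structure holds every term, seekCeil yields the first term >= target directly. */
  static String seekCeil(NavigableSet<String> allTermsSet, String target) {
    return allTermsSet.ceiling(target);
  }
}
```

With seven sorted terms and `everyNth(3)`, only ords 0, 3 and 6 are held in memory; looking up any other term costs at most two extra comparisons in the scan.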




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977719#action_12977719
 ] 

Michael McCandless commented on LUCENE-2831:


Looks good Simon!  Random comments...

Maybe rename AtomicContext -> AtomicReaderContext?  And same for
CompositeContext?

Should Filter.getDocIDSet take an AtomicReaderContext?  We don't have
to do that in this patch, though... this patch is a big enough first
step!

Leafes -> Leaves

Maybe IR.getTopReaderContext() instead of IR.topLevelReaderContext()?
(Or .getRootReaderContext()?).

I agree this should eventually subsume
.getSequentialReaders... though, if so, we probably should change the IR
base method to return null instead of throwing UOE (until we succeed in
statically typing composite vs atomic readers...).

I think we can change the expert IndexSearcher ctor that takes the
forced subReaders to instead take a root ReaderContext?  In fact,
maybe we can remove it altogether?  It was added to avoid the
"relatively costly" gatherSubReaders that IS does if you just pass it
an IR, but, we are now fixing that w/ this issue, by having IR cache
the root ReaderContext...
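Both ideas in the comment above can be sketched with a hypothetical `Reader` stand-in (not Lucene's IndexReader). The first is signalling "atomic" by returning null from the sub-reader accessor instead of throwing UnsupportedOperationException. The second is having the reader cache what it gathers. A recursive walk treats null as "this is a leaf", and the flattened leaf list is computed once and then reused, so building several searchers over the same reader would not redo the traversal. The method names mirror the discussion but are assumptions here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical reader stand-ins; not Lucene's IndexReader API. */
final class LeafGatheringSketch {

  static final class Reader {
    private final List<Reader> subs;   // null means this reader is atomic (a leaf)
    private final int docCount;        // only meaningful for atomic readers
    private List<Reader> cachedLeaves; // computed on first use, then reused

    /** Atomic reader with the given number of documents. */
    Reader(int docCount) {
      this.subs = null;
      this.docCount = docCount;
    }

    /** Composite reader over the given sub-readers, in doc-id order. */
    Reader(Reader... subs) {
      this.subs = Collections.unmodifiableList(Arrays.asList(subs.clone()));
      this.docCount = 0;
    }

    /** Direct sub-readers, or null for an atomic reader (instead of throwing UOE). */
    List<Reader> getSequentialSubReaders() {
      return subs;
    }

    int maxDoc() {
      if (subs == null) return docCount;
      int n = 0;
      for (Reader r : subs) n += r.maxDoc();
      return n;
    }

    /** All atomic readers in doc-id order; gathered once, then served from the cache. */
    synchronized List<Reader> leaves() {
      if (cachedLeaves == null) {
        List<Reader> out = new ArrayList<>();
        gather(this, out);
        cachedLeaves = Collections.unmodifiableList(out);
      }
      return cachedLeaves;
    }

    private static void gather(Reader r, List<Reader> out) {
      List<Reader> children = r.getSequentialSubReaders();
      if (children == null) { // atomic: this reader is itself a leaf
        out.add(r);
        return;
      }
      for (Reader child : children) gather(child, out);
    }
  }
}
```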

If we did that could we go back to having QueryWrapperFilter just make
an IndexSearcher?

Do we really need forceLeafs()?  Can't QueryWrapperFilter make a
MultiReader holding just its atomic IR and pass that to IS?  And then
we can remove the AtomicContext ctor that takes a "naked" atomic
reader?

QueryWrapperFilter's WeightOnlyearcher should be WeightOnlySearcher.


> Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
> -
>
> Key: LUCENE-2831
> URL: https://issues.apache.org/jira/browse/LUCENE-2831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2831.patch, LUCENE-2831.patch
>
>
> Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
> boolean, boolean) we should / could revise the API and pass in a struct that 
> has parent reader, sub reader, ord of that sub. The ord mapping plus the 
> context with its parent would make several issues way easier. See 
> LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977717#action_12977717
 ] 

Michael McCandless commented on LUCENE-2837:


Good point Simon -- I'll add that warning to the jdoc.




[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-05 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977677#action_12977677
 ] 

Simon Willnauer commented on LUCENE-2837:
-

Mike, that patch looks good to me. I just have one small comment about the 
executor service. You stated that the user has to shut down the service upon 
IS#close(). This is absolutely the recommended way, but I see a small risk of 
people calling ExecutorService#shutdownNow(), which interrupts the executing 
threads and can cause an AlreadyClosedException down in one of our NIO 
Directories if there are still searches going on, etc. I don't think this is 
super important, but I would point it out in the JDoc to give folks a heads-up. 
Thoughts?
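A minimal sketch of the shutdown order being suggested for the JDoc. It closes the searcher, calls shutdown() so that searches already running can finish, and waits with awaitTermination() rather than calling shutdownNow(). The `closeGracefully` helper and the AutoCloseable searcher stand-in are illustrative assumptions; only the ExecutorService calls are real JDK API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing an orderly shutdown of a searcher's executor. */
final class ExecutorShutdownSketch {

  /**
   * Closes the searcher, then shuts the executor down without interrupting
   * tasks that are still running. Returns true if all tasks finished in time.
   */
  static boolean closeGracefully(AutoCloseable searcher, ExecutorService executor,
                                 long timeout, TimeUnit unit) throws Exception {
    searcher.close();    // stand-in searcher: stop issuing new searches
    executor.shutdown(); // reject new tasks; running tasks are not interrupted
    // Deliberately not calling executor.shutdownNow(): it interrupts worker
    // threads, which (per the note above) can break I/O in NIO-based Directories.
    return executor.awaitTermination(timeout, unit);
  }
}
```

If awaitTermination() returns false, the caller can still decide whether waiting longer or forcing shutdownNow() is the lesser evil.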

simon
