
[jira] [Resolved] (LUCENE-3970) Rename getUnique[Field/Terms]Count() into size()

2012-04-10 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3970.


Resolution: Fixed

Thanks Iulius!

> Rename getUnique[Field/Terms]Count() into size()
> 
>
> Key: LUCENE-3970
> URL: https://issues.apache.org/jira/browse/LUCENE-3970
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/index
>Reporter: Iulius Curt
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3970.patch
>
>
> Like Robert Muir said in LUCENE-3109:
> {quote}Also I think there are other improvements we can do here that would be 
> more natural:
> Fields.getUniqueFieldCount() -> Fields.size()
> Terms.getUniqueTermCount() -> Terms.size(){quote}
> I believe this dramatically improves understandability (way less 'scary', 
> actually beautiful).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-3942) SynonymFilter should set pos length att

2012-04-08 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3942.


Resolution: Fixed

> SynonymFilter should set pos length att
> ---
>
> Key: LUCENE-3942
> URL: https://issues.apache.org/jira/browse/LUCENE-3942
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3942.patch
>
>
> Tokenizers/Filters can now produce graphs instead of a single linear
> chain of tokens, by setting the PositionLengthAttribute, expressing
> where (how many positions ahead) this token "ends".
> The default is 1, meaning it ends at the next position, to be
> backwards compatible.
> SynonymFilter produces graph output tokens, as long as the output is a
> single token, but currently never sets the pos length to express this.
> EG for the rule "wifi network -> hotspot", the hotspot token should
> have pos length = 2.  With LUCENE-3940 this will allow us to verify
> that the offsets for such tokens are correct...
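
To illustrate the position-length arithmetic described above, here is a minimal, self-contained sketch. The names `PosLenSketch`, `Tok` and `arcs` are hypothetical and are not Lucene's attribute API; the sketch only assumes that a token starts at the running sum of position increments and ends `posLen` positions later, using the "wifi network -> hotspot" rule from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PosLenSketch {
    // Hypothetical stand-in for a token; not Lucene's attribute classes.
    static final class Tok {
        final String term;
        final int posInc;   // positions advanced from the previous token
        final int posLen;   // how many positions ahead this token ends (default 1)

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    // Returns "term:start->end" per token, where start is the running sum of
    // position increments (first token at 0) and end = start + posLen.
    static List<String> arcs(List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc;
            out.add(t.term + ":" + pos + "->" + (pos + t.posLen));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
            new Tok("wifi", 1, 1),
            new Tok("hotspot", 0, 2),   // stacked on "wifi", spans two positions
            new Tok("network", 1, 1));
        System.out.println(arcs(tokens));
    }
}
```

With pos length 2, the "hotspot" arc ends at the same position as "network", which is the graph shape the issue asks SynonymFilter to express.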


[jira] [Resolved] (LUCENE-3940) When Japanese (Kuromoji) tokenizer removes a punctuation token it should leave a hole

2012-04-08 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3940.


Resolution: Fixed

> When Japanese (Kuromoji) tokenizer removes a punctuation token it should 
> leave a hole
> -
>
> Key: LUCENE-3940
> URL: https://issues.apache.org/jira/browse/LUCENE-3940
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3940.patch, LUCENE-3940.patch, LUCENE-3940.patch, 
> LUCENE-3940.patch
>
>
> I modified BaseTokenStreamTestCase to assert that the start/end
> offsets match for graph (posLen > 1) tokens, and this caught a bug in
> Kuromoji when the decompounding of a compound token has a punctuation
> token that's dropped.
> In this case we should leave hole(s) so that the graph is intact, ie,
> the graph should look the same as if the punctuation tokens were not
> initially removed, but then a StopFilter had removed them.
> This also affects tokens that have no compound over them, ie we fail
> to leave a hole today when we remove the punctuation tokens.
> I'm not sure this is serious enough to warrant fixing in 3.6 at the
> last minute...
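
The "leave a hole" behaviour can be sketched as follows. This is an illustration only, not the Kuromoji fix itself: `HoleSketch`, `dropPunctuation` and `isPunctuation` are hypothetical names, and the sketch assumes every input token advances exactly one position. A dropped token's position increment is carried over to the next kept token, so the gap stays visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HoleSketch {
    // Drops punctuation tokens but carries their position increment over to
    // the next kept token, so a "hole" remains where the token was removed.
    static List<String> dropPunctuation(List<String> terms) {
        List<String> out = new ArrayList<>();
        int pendingInc = 0;
        for (String term : terms) {
            pendingInc += 1;                 // each input token advances one position
            if (isPunctuation(term)) {
                continue;                    // dropped; its increment stays pending
            }
            out.add(term + "(+" + pendingInc + ")");
            pendingInc = 0;
        }
        return out;
    }

    // Simplified assumption: a single non-alphanumeric character is punctuation.
    static boolean isPunctuation(String term) {
        return term.length() == 1 && !Character.isLetterOrDigit(term.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(dropPunctuation(Arrays.asList("a", ",", "b")));
    }
}
```

Here "b" is emitted with an increment of 2 rather than 1, which matches what a stop-word style filter would produce if it had removed the comma afterwards.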


[jira] [Resolved] (LUCENE-3932) Improve load time of .tii files

2012-04-08 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3932.


   Resolution: Fixed
Fix Version/s: 4.0
 Assignee: Michael McCandless

> Improve load time of .tii files
> ---
>
> Key: LUCENE-3932
> URL: https://issues.apache.org/jira/browse/LUCENE-3932
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.5
> Environment: Linux
>Reporter: Sean Bridges
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3932.trunk.patch, perf.csv
>
>
> We have a large 50 GB index which is optimized as one segment, with a 66 MB 
> .tii file.  This index has no norms, and no field cache.
> It takes about 5 seconds to load this index. Profiling reveals that 60% of 
> the time is spent in GrowableWriter.set(index, value), and most of the time in 
> set(...) is spent resizing PackedInts.Mutable current.
> In the constructor for TermInfosReaderIndex, you initialize the writer with 
> the line,
> {quote}GrowableWriter indexToTerms = new GrowableWriter(4, indexSize, 
> false);{quote}
> For our index using four as the bit estimate results in 27 resizes.
> The last value in indexToTerms is going to be ~ tiiFileLength, and if instead 
> you use,
> {quote}int bitEstimate = (int) Math.ceil(Math.log10(tiiFileLength) / 
> Math.log10(2));
> GrowableWriter indexToTerms = new GrowableWriter(bitEstimate, indexSize, 
> false);{quote}
> Load time improves to ~ 2 seconds.
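
As a worked example of the bit estimate above, the sketch below computes it two ways. `BitEstimateSketch` and its method names are hypothetical, the file length is an assumption (the reported 66 megabytes taken as 66 * 1024 * 1024 bytes), and the integer-only variant is an alternative shown for comparison, not something proposed in the issue.

```java
final class BitEstimateSketch {
    // The estimate from the report: ceil(log2(n)) computed via base-10 logs.
    static int bitsByLog(long n) {
        return (int) Math.ceil(Math.log10(n) / Math.log10(2));
    }

    // Integer-only alternative: number of bits needed to represent n (n > 0).
    static int bitsByShift(long n) {
        return 64 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        // Assumed length: the reported 66 megabytes taken as 66 * 1024 * 1024 bytes.
        long tiiFileLength = 66L * 1024 * 1024;
        System.out.println(bitsByLog(tiiFileLength));
        System.out.println(bitsByShift(tiiFileLength));
    }
}
```

Both variants give 27 bits for this length. They can differ by one at exact powers of two, where the integer version counts the extra bit needed to store the value itself.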


[jira] [Resolved] (LUCENE-3966) smokeTestRelease should accept a local (file://) staging URL

2012-04-08 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3966.


   Resolution: Fixed
Fix Version/s: 4.0

> smokeTestRelease should accept a local (file://) staging URL
> 
>
> Key: LUCENE-3966
> URL: https://issues.apache.org/jira/browse/LUCENE-3966
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3966.patch
>
>
> I'll also fix buildAndPushRelease so it can push to a local URL; this way at 
> any time we can build, push to local staging, and run smoke tester on it, and 
> hopefully nothing fails...
> But really any tests in smoke tester should ideally be pushed back earlier in 
> our dev process (into jenkins, into "ant test").


[jira] [Resolved] (LUCENE-3109) Rename FieldsConsumer to InvertedFieldsConsumer

2012-04-08 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3109.


Resolution: Fixed
  Assignee: Michael McCandless

Thanks Iulius!

> Rename FieldsConsumer to InvertedFieldsConsumer
> ---
>
> Key: LUCENE-3109
> URL: https://issues.apache.org/jira/browse/LUCENE-3109
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/codecs
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3109.patch, LUCENE-3109.patch, LUCENE-3109.patch, 
> LUCENE-3109.patch, LUCENE-3109.patch
>
>
> The name FieldsConsumer is misleading here; it really is an 
> InvertedFieldsConsumer, and since we are extending codecs to consume 
> non-inverted Fields we should be clear here. The same applies to Fields.java as 
> well as FieldsProducer.


[jira] [Resolved] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

2012-04-07 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3873.


   Resolution: Fixed
Fix Version/s: 4.0

> tie MockGraphTokenFilter into all analyzers tests
> -
>
> Key: LUCENE-3873
> URL: https://issues.apache.org/jira/browse/LUCENE-3873
> Project: Lucene - Java
>  Issue Type: Task
>  Components: modules/analysis
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3873.patch, LUCENE-3873.patch
>
>
> Mike made a MockGraphTokenFilter on LUCENE-3848.
> Many filters currently aren't tested with anything but a simple tokenstream.
> We should test them with this, too; it might find bugs (zero-length terms,
> stacked terms/synonyms, etc.)


[jira] [Resolved] (LUCENE-3955) smokeTestRelease should test solr example

2012-04-06 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3955.


Resolution: Fixed

This was fixed w/ SOLR-3331.

> smokeTestRelease should test solr example
> -
>
> Key: LUCENE-3955
> URL: https://issues.apache.org/jira/browse/LUCENE-3955
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> I think most anyone reviewing the solr artifacts will do this,
> so really the RM has to do it manually:
> but we can test 'ant example' from the source dist + java -jar start.jar from 
> solr/example
> (or/and 'ant run-example'), and also java -jar start.jar from the binary 
> distribution.
> some basic checks we can do are to run the test_utf8.sh, and to index the 
> example docs 
> (post.jar/post.sh the docs in exampledocs) then do a search.


[jira] [Resolved] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content

2012-03-23 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3905.


   Resolution: Fixed
Fix Version/s: 4.0
   3.6

> BaseTokenStreamTestCase should test analyzers on real-ish content
> -
>
> Key: LUCENE-3905
> URL: https://issues.apache.org/jira/browse/LUCENE-3905
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3905.patch
>
>
> We already have LineFileDocs, that pulls content generated from europarl or 
> wikipedia... I think sometimes BTSTC should test the analyzers on that as 
> well.


[jira] [Resolved] (LUCENE-3898) possible SynonymFilter bug: hudson fail

2012-03-21 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3898.


   Resolution: Fixed
Fix Version/s: 4.0
   3.6

I think this is fixed...

> possible SynonymFilter bug: hudson fail
> ---
>
> Key: LUCENE-3898
> URL: https://issues.apache.org/jira/browse/LUCENE-3898
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
>
> See https://builds.apache.org/job/Lucene-trunk/1867/consoleText (no seed)


[jira] [Resolved] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3894.


Resolution: Fixed

> Make BaseTokenStreamTestCase a bit more evil
> 
>
> Key: LUCENE-3894
> URL: https://issues.apache.org/jira/browse/LUCENE-3894
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3894.patch, LUCENE-3894.patch, LUCENE-3894.patch
>
>
> Throw an exception from the Reader while tokenizing, stop after not consuming 
> all tokens, sometimes spoon-feed chars from the reader...


[jira] [Resolved] (LUCENE-783) Store all metadata in human-readable segments file

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-783.
---

Resolution: Fixed

Actually I think SimpleText's SegmentInfosFormat does this well?


> Store all metadata in human-readable segments file
> --
>
> Key: LUCENE-783
> URL: https://issues.apache.org/jira/browse/LUCENE-783
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Marvin Humphrey
>Priority: Minor
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.0
>
>
> Various index-reading components in Lucene need metadata in addition to data.
> This metadata is presently stored in arbitrary binary headers and spread out
> over several files.  We should move to concentrate it in a single file, and 
> this file should be encoded using a human-readable, extensible, standardized 
> data serialization language -- either XML or YAML.
> * Making metadata human-readable makes debugging easier.  Centralizing it
>   makes debugging easier still.  Developers benefit from being able to scan
>   and locate relevant information quickly and with less debug printing.  Users
>   get a new window through which to peer into the index structure.
> * Since metadata is written to a separate file, there would no longer be a 
>   need to seek back to the beginning of any data file to finish a header, 
>   solving issue LUCENE-532.
> * Special-case parsing code needed for extracting metadata supplied by 
>   different index formats can be pared down.  If a value is no longer 
>   necessary, it can just be ignored/discarded.
> * Removing headers from the data files simplifies them and makes the file
>   format easier to implement. 
> * With headers removed, all or nearly all data structures can take the
>   form of records stacked end to end, so that once a decoder has been
>   selected, an iterator can read the file from top to tail.  To an extent,
>   this allows us to separate our data-processing algorithms from our
>   serialization algorithms, decoupling Lucene's code base from its file
>   format.  For instance, instead of further subclassing TermDocs to deal with
>   "flexible indexing" formats, we might replace it with a PostingList which
>   returns a subclass of Posting.  The deserialization code would be wholly
>   contained within the Posting subclass rather than spread out over several
>   subclasses of TermDocs.
> * YAML and XML are equally well suited for the task of storing metadata, 
>   but in either case a complete parser would not be needed -- a small subset 
>   of the language will do.  KinoSearch 0.20's custom-coded YAML parser 
>   occupies about 600 lines of C -- not too bad, considering how miserable C's 
>   string handling capabilities are. 


[jira] [Resolved] (LUCENE-1750) Create a MergePolicy that limits the maximum size of its segments

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1750.


   Resolution: Duplicate
Fix Version/s: 3.2

TieredMergePolicy does this...

> Create a MergePolicy that limits the maximum size of its segments
> --
>
> Key: LUCENE-1750
> URL: https://issues.apache.org/jira/browse/LUCENE-1750
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 4.0, 3.2
>
> Attachments: LUCENE-1750.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Basically I'm trying to create largish 2-4GB shards using
> LogByteSizeMergePolicy; however, in the attached unit test I've
> found segments that exceed maxMergeMB.
> The goal is for segments to be merged up to 2GB, then all
> merging to that segment stops, and then another 2GB segment is
> created. This helps when replicating in Solr where if a single
> optimized 60GB segment is created, the machine stops working due
> to IO and CPU starvation. 
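
The goal stated above (merge up to a cap, then start a new segment) can be sketched with a simple greedy grouping. This is a hypothetical illustration under stated assumptions: `CappedMergeSketch` and `mergeWithCap` are made-up names, sizes are plain numbers in MB, and this is not the selection algorithm of TieredMergePolicy or LogByteSizeMergePolicy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CappedMergeSketch {
    // Greedily groups segment sizes (in MB) so that no merged group exceeds
    // maxMergedMB; a segment already larger than the cap is left on its own.
    static List<Long> mergeWithCap(List<Long> segmentSizesMB, long maxMergedMB) {
        List<Long> merged = new ArrayList<>();
        long current = 0;
        for (long size : segmentSizesMB) {
            if (current > 0 && current + size > maxMergedMB) {
                merged.add(current);   // cap would be exceeded: stop merging here
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Five small segments with a 2 GB (2048 MB) cap.
        System.out.println(mergeWithCap(Arrays.asList(900L, 800L, 700L, 600L, 500L), 2048L));
    }
}
```

With a 2048 MB cap the five inputs end up as two merged segments of 1700 MB and 1800 MB instead of one 3500 MB segment.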


[jira] [Resolved] (LUCENE-1922) exposing the ability to get the number of unique term count per field

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1922.


   Resolution: Duplicate
Fix Version/s: 2.9

Fixed in LUCENE-1586.

> exposing the ability to get the number of unique term count per field
> -
>
> Key: LUCENE-1922
> URL: https://issues.apache.org/jira/browse/LUCENE-1922
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Affects Versions: 4.0
>Reporter: John Wang
> Fix For: 4.0, 2.9
>
>
> Add an API to get the number of unique terms for a given field name, e.g.:
> IndexReader.getUniqueTermCount(String field)
> This issue has a dependency on LUCENE-1458


[jira] [Resolved] (LUCENE-1948) Deprecating InstantiatedIndexWriter

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1948.


Resolution: Fixed

> Deprecating InstantiatedIndexWriter
> ---
>
> Key: LUCENE-1948
> URL: https://issues.apache.org/jira/browse/LUCENE-1948
> Project: Lucene - Java
>  Issue Type: Task
>  Components: modules/other
>Affects Versions: 2.9
>Reporter: Karl Wettin
>Assignee: Karl Wettin
> Fix For: 4.0
>
> Attachments: LUCENE-1948.patch
>
>
> http://markmail.org/message/j6ip266fpzuaibf7
> I suppose that should have been suggested before 2.9 rather than
> after...
> There are at least three reasons why I want to do this:
> * The code is based on the behaviour of the Directory IndexWriter as of
>   2.3 and I have not touched it since then. If there are changes in
>   the future one will have to keep IIW in sync, something that's easy
>   to forget.
> * There is no locking, which will cause concurrent modification
>   exceptions when accessing the index via searcher/reader while
>   committing.
> * It uses the old token stream API, so it would have to be upgraded if
>   it were to stay.
> The java- and package-level docs have, since it was committed, been
> suggesting that one should consider using II as if it was immutable
> due to the locklessness. My suggestion is that we make it immutable
> for real.
> Since II is meant for small corpora there is very little time lost by
> using the constructor that builds the index from an IndexReader. I.e.
> rather than using InstantiatedIndexWriter one would have to use a
> Directory and an IndexWriter and then pass an IndexReader to a new
> InstantiatedIndex.
> Any objections?


[jira] [Resolved] (LUCENE-2120) Possible file handle leak in near real-time reader

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2120.


Resolution: Cannot Reproduce

> Possible file handle leak in near real-time reader
> --
>
> Key: LUCENE-2120
> URL: https://issues.apache.org/jira/browse/LUCENE-2120
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 3.1
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> Spinoff of LUCENE-1526: Jake/John hit file descriptor exhaustion when testing 
> NRT.
> I've tried to repro this, stress testing NRT, saturating reopens, indexing, 
> searching, but haven't found any issue.
> Let's try to get to the bottom of it, here...


[jira] [Resolved] (LUCENE-2276) Add IndexReader.document(int, Document, FieldSelector)

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2276.


   Resolution: Duplicate
Fix Version/s: 4.0

The StoredFieldVisitor API (4.0) makes this possible...

> Add IndexReader.document(int, Document, FieldSelector)
> --
>
> Key: LUCENE-2276
> URL: https://issues.apache.org/jira/browse/LUCENE-2276
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: core/search
>Reporter: Tim Smith
> Fix For: 4.0
>
> Attachments: LUCENE-2276+2539.patch, LUCENE-2276.patch
>
>
> The Document object passed in would be populated with the fields identified 
> by the FieldSelector for the specified internal document id
> This method would allow reuse of Document objects when retrieving stored 
> fields from the index


[jira] [Resolved] (LUCENE-2334) IndexReader.close() should call IndexReader.decRef() unconditionally ??

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2334.


Resolution: Won't Fix

> IndexReader.close() should call IndexReader.decRef() unconditionally ??
> ---
>
> Key: LUCENE-2334
> URL: https://issues.apache.org/jira/browse/LUCENE-2334
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.0.1
>Reporter: Mike Hanafey
>Priority: Minor
>
> IndexReader.close() is defined:
> {code}
>   /**
>    * Closes files associated with this index.
>    * Also saves any new deletions to disk.
>    * No other methods should be called after this has been called.
>    * @throws IOException if there is a low-level IO error
>    */
>   public final synchronized void close() throws IOException {
>     if (!closed) {
>       decRef();
>       closed = true;
>     }
>   }
> {code}
> This means that if the refCount is bigger than one, close() does not 
> actually close, but it is also true that calling close() again has no effect.
> Why does close() not simply call decRef() unconditionally? That way, if 
> incRef() is called each time an instance of IndexReader is handed out and 
> close() is called by each recipient when they are done, the last one to call 
> close() will actually close the index. As written, the API seems very 
> confusing -- the first close() does one thing, but the next close() does 
> something different.
> At a minimum the JavaDoc should clarify the behavior.
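
The behaviour being questioned can be reproduced in a small stand-alone sketch. `RefCountedSketch` is a hypothetical class that only mirrors the quoted close()/decRef() pattern; it is not IndexReader, and "released" is a stand-in for actually closing files.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountedSketch {
    private final AtomicInteger refCount = new AtomicInteger(1);
    private boolean closed;              // guards against a second close()
    private volatile boolean released;   // true once the last reference is dropped

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            released = true;             // stand-in for actually closing files
        }
    }

    // Mirrors the quoted method: only the first call drops a reference.
    synchronized void close() {
        if (!closed) {
            decRef();
            closed = true;
        }
    }

    boolean isReleased() {
        return released;
    }

    public static void main(String[] args) {
        RefCountedSketch reader = new RefCountedSketch();
        reader.incRef();                 // handed out to a second user
        reader.close();                  // drops one reference
        reader.close();                  // no effect: already marked closed
        System.out.println(reader.isReleased());
        reader.decRef();                 // the second user must call decRef()
        System.out.println(reader.isReleased());
    }
}
```

Under this pattern a second holder has to pair its incRef() with decRef() rather than close(), since repeated close() calls drop at most one reference.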


[jira] [Resolved] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2310.


   Resolution: Fixed
Fix Version/s: 4.0

> Reduce Fieldable, AbstractField and Field complexity
> 
>
> Key: LUCENE-2310
> URL: https://issues.apache.org/jira/browse/LUCENE-2310
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: core/index
>Reporter: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch
>
>
> In order to move field-type-like functionality into its own class, we really 
> need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
> Currently AbstractField depends on Field, and does not provide much more 
> functionality than storing fields, most of which are being moved over to 
> FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
> possibly Fieldable), moving much of the functionality into Field and 
> FieldType.

[jira] [Resolved] (LUCENE-2338) Some tests catch Exceptions in separate threads and just print a stack trace - the test does not fail

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2338.


   Resolution: Fixed
Fix Version/s: 4.0
   3.6

Our test framework fails tests w/ errant exceptions from threads now...

> Some tests catch Exceptions in separate threads and just print a stack trace 
> - the test does not fail
> -
>
> Key: LUCENE-2338
> URL: https://issues.apache.org/jira/browse/LUCENE-2338
> Project: Lucene - Java
>  Issue Type: Test
>  Components: general/build
>Reporter: Uwe Schindler
> Fix For: 3.6, 4.0
>
>
> Some tests catch Exceptions in separate threads and just print a stack trace 
> - the test does not fail. The test should fail. Since LUCENE-2274, the 
> LuceneTestCase(J4) class installs an UncaughtExceptionHandler, so catching 
> an Exception and only printing its stack trace is a bad idea. The problem is 
> that the run() method of a thread is not allowed to throw checked Exceptions.
> Two possibilities:
> - Catch checked Exceptions in the run() method and wrap them in a 
> RuntimeException, or call Assert.fail() instead
> - Use Executors
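
The "Use Executors" option can be sketched roughly as follows. This is a minimal illustration, not code from the Lucene test framework: the class ThreadFailureDemo and the method runAndPropagate are hypothetical names. It relies only on the standard behavior of Future.get(), which rethrows a task's failure as an ExecutionException on the calling thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of LuceneTestCase.
final class ThreadFailureDemo {

  // Runs the task on a worker thread. If the task throws, the failure is
  // rethrown on the calling thread wrapped in a RuntimeException, so a test
  // calling this method fails instead of only printing a stack trace.
  static <T> T runAndPropagate(Callable<T> task) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<T> future = executor.submit(task);
      return future.get(); // blocks until the worker finishes
    } catch (ExecutionException e) {
      throw new RuntimeException(e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    String result = runAndPropagate(() -> "ok");
    System.out.println(result);
    try {
      runAndPropagate(() -> { throw new java.io.IOException("simulated failure"); });
    } catch (RuntimeException e) {
      System.out.println("propagated: " + e.getCause());
    }
  }
}
```

Because Callable.call() may throw checked exceptions, the worker code needs no try/catch of its own, which sidesteps the run() restriction mentioned in the description.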


[jira] [Resolved] (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2364.


Resolution: Fixed

Term now stores BytesRef internally...

> Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & 
> Co.
> -
>
> Key: LUCENE-2364
> URL: https://issues.apache.org/jira/browse/LUCENE-2364
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 4.0
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery 
> (as both queries convert the strings to BytesRef internally). For 
> NumericRange support in Solr, we will need to support numerics as BytesRef 
> in single-term queries.
> When this is added, don't forget to change TestNumericRangeQueryXX to 
> use the BytesRef ctor of TRQ.


[jira] [Resolved] (LUCENE-2445) Perf improvements for the DocsEnum bulk read API

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2445.


Resolution: Won't Fix

We removed bulk API in 4.0.

> Perf improvements for the DocsEnum bulk read API
> 
>
> Key: LUCENE-2445
> URL: https://issues.apache.org/jira/browse/LUCENE-2445
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
> Fix For: 4.0
>
>
> I started to work on LUCENE-2443, to create a test showing the
> problems, but it turns out none of the core codecs (even sep/intblock)
> ever set a non-zero offset.
> So I set forth to fix sep to do so, but ran into some issues w/ the
> current bulk-read API that we should fix to make it higher
> performance:
>   * Filtering of deleted docs should be the caller's job (saves an
> extra pass through the docs)
>   * Probably docs should arrive as deltas and caller sums these up to
> get the actual docID
>   * Whether to load freqs or not should be separately controllable
>   * We may want to require that the int[] for docs and freqs are
> "aligned", ie the offset into each is the same
>   * Maybe we should separate out a BulkDocsEnum from DocsEnum.  We can
> make it optional for codecs (ie, we can emulate BulkDocsEnum from
> the DocsEnum)
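
Two of the bullets above (docs arriving as deltas, and deleted-doc filtering done by the caller) can be illustrated with a small caller-side loop. This is only a sketch under assumptions: DeltaDocsDemo and decodeLive are hypothetical names, the first delta is taken relative to docID 0, and a java.util.BitSet stands in for whatever deleted-docs structure the caller holds. It is not the bulk-read API itself, which was later removed in 4.0.

```java
import java.util.BitSet;

// Hypothetical caller-side decoder; not Lucene's DocsEnum API.
final class DeltaDocsDemo {

  // Sums delta-encoded doc IDs into absolute doc IDs and drops deleted docs
  // in the same pass. Returns the number of live doc IDs written to out.
  // Assumption: the first delta is relative to docID 0.
  static int decodeLive(int[] deltas, int count, BitSet deleted, int[] out) {
    int docID = 0;
    int live = 0;
    for (int i = 0; i < count; i++) {
      docID += deltas[i];
      if (!deleted.get(docID)) {
        out[live++] = docID;
      }
    }
    return live;
  }

  public static void main(String[] args) {
    BitSet deleted = new BitSet();
    deleted.set(5);
    int[] out = new int[4];
    int live = decodeLive(new int[] {3, 2, 5, 1}, 4, deleted, out);
    for (int i = 0; i < live; i++) {
      System.out.println(out[i]);
    }
  }
}
```

Doing the summation and the deleted-doc check in one loop is what saves the extra pass through the docs.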


[jira] [Resolved] (LUCENE-2441) Create 3.x -> 4.0 index migration tool

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2441.


Resolution: Duplicate

We already have IndexUpgrader now.

> Create 3.x -> 4.0 index migration tool
> --
>
> Key: LUCENE-2441
> URL: https://issues.apache.org/jira/browse/LUCENE-2441
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Michael McCandless
> Fix For: 4.0
>
>
> We need a tool to upgrade an index so that 4.0 can read it.  I think the only 
> change right now is the cutover to flex's standard codec format, but with 
> LUCENE-2426 we also need to correct the term sort order to be true unicode 
> code point order.
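
To show why the sort order matters, here is a small sketch comparing two strings by Unicode code point. It assumes the older order corresponds to UTF-16 code unit order, which is what String.compareTo produces; TermOrderDemo and compareCodePoints are hypothetical names, not Lucene code.

```java
// Hypothetical illustration of code point order vs. UTF-16 code unit order.
final class TermOrderDemo {

  // Compares two strings by Unicode code point rather than by UTF-16 code unit.
  static int compareCodePoints(String a, String b) {
    int i = 0;
    int j = 0;
    while (i < a.length() && j < b.length()) {
      int cpA = a.codePointAt(i);
      int cpB = b.codePointAt(j);
      if (cpA != cpB) {
        return Integer.compare(cpA, cpB);
      }
      i += Character.charCount(cpA);
      j += Character.charCount(cpB);
    }
    // The string with code points left over sorts after its prefix.
    return Integer.compare(a.length() - i, b.length() - j);
  }

  public static void main(String[] args) {
    String bmp = String.valueOf((char) 0xFFFF);                // U+FFFF, one code unit
    String supplementary = new String(Character.toChars(0x10000)); // U+10000, surrogate pair
    System.out.println("UTF-16 order:     " + Integer.signum(bmp.compareTo(supplementary)));
    System.out.println("code point order: " + Integer.signum(compareCodePoints(bmp, supplementary)));
  }
}
```

The two orders disagree only when supplementary characters (encoded as surrogate pairs) are compared with BMP characters at or above U+E000, so an index without such terms would sort identically under both.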


[jira] [Resolved] (LUCENE-2505) The system cannot find the file specified - _0.fdt

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2505.


Resolution: Incomplete

> The system cannot find the file specified - _0.fdt
> --
>
> Key: LUCENE-2505
> URL: https://issues.apache.org/jira/browse/LUCENE-2505
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 2.4.1
>Reporter: Tej Kiran Sharma
>
> Hi,
> I am using Lucene version 2.4.1, and while indexing my files I got the 
> following exception.
> I set up the IndexWriter as follows:
> Directory lucDirectory = FSDirectory.getDirectory(_sIndexPath);
> lucDirectory.setLockFactory(new SimpleFSLockFactory(_sIndexPath));
> lucWriter = new IndexWriter(lucDirectory, true, new KeywordAnalyzer(), true);
> lucWriter.setMergeFactor(10);
> lucWriter.setMaxMergeDocs(2147483647);
> lucWriter.setMaxBufferedDocs(1);
> lucWriter.setRAMBufferSizeMB(32);
> lucWriter.setUseCompoundFile(false);
> I am indexing and searching simultaneously, and I am getting the 
> following exception (the system cannot find the file specified):
> "ERROR Exception while checking size - 
> C:\00scripts\Temp\TempIndex\20104261030775\_0.fdt (The system cannot find the file specified)
> Stacktrace java.io.FileNotFoundException: C:\00scripts\Temp\TempIndex\20104261030775\_0.fdt (The system cannot find the file specified)
>    at java.io.RandomAccessFile.open(Native Method)
>    at java.io.RandomAccessFile.<init>(Unknown Source)
>    at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(Unknown Source)
>    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(Unknown Source)
>    at org.apache.lucene.store.FSDirectory.openInput(Unknown Source)
>    at org.apache.lucene.index.FieldsReader.<init>(Unknown Source)
>    at org.apache.lucene.index.SegmentReader.initialize(Unknown Source)
>    at org.apache.lucene.index.SegmentReader.get(Unknown Source)
>    at org.apache.lucene.index.SegmentReader.get(Unknown Source)
>    at org.apache.lucene.index.DirectoryIndexReader$1.doBody(Unknown Source)
>    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown Source)
>    at org.apache.lucene.index.DirectoryIndexReader.open(Unknown Source)
>    at org.apache.lucene.index.IndexReader.open(Unknown Source)
>    at org.apache.lucene.index.IndexReader.open(Unknown Source)
>    at org.apache.lucene.search.IndexSearcher.<init>(Unknown Source)
>    at com..main.apu.d(Unknown Source)
>    at com..main.apu.a(Unknown Source)
>    at com.main.arn.a(Unknown Source)
>    at com.main.abh.b(Unknown Source)
>    at com.main.abh.a(Unknown Source)
>    at com..main.abh.f(Unknown Source)
>    at com.main.eu.run(Unknown Source)"


[jira] [Resolved] (LUCENE-2530) rename docsEnum.getBulkResult() to make its role clearer

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2530.


Resolution: Won't Fix

We removed bulk API in 4.0.

> rename docsEnum.getBulkResult() to make its role clearer
> 
>
> Key: LUCENE-2530
> URL: https://issues.apache.org/jira/browse/LUCENE-2530
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Andi Vajda
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
>
> Before docsEnum.read() can be called a BulkResult instance must be allocated 
> for it (it == the default implementation of that method).
> This is done by calling docsEnum.getBulkResult(). Failure to call this method 
> before read() is called results in a NullPointerException.
> It is somewhat counterintuitive to "get" the results of an operation before 
> calling said operation.
> Maybe this method should be renamed to something more definite-sounding like 
> obtainBulkResult() or prepareBulkResult() ?


[jira] [Resolved] (LUCENE-2948) Make var gap terms index a partial prefix trie

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2948.


Resolution: Won't Fix

I think BlockTree terms dict accomplished the same thing.

> Make var gap terms index a partial prefix trie
> --
>
> Key: LUCENE-2948
> URL: https://issues.apache.org/jira/browse/LUCENE-2948
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2948.patch, LUCENE-2948.patch, LUCENE-2948.patch, 
> LUCENE-2948_automaton.patch, Results.png
>
>
> Var gap stores (in an FST) the indexed terms (every 32nd term, by
> default), minus their non-distinguishing suffixes.
> However, the resulting FST is often "close" to a prefix trie in
> some portion of the terms space.
> By allowing some nodes of the FST to store all outgoing edges,
> including ones that do not lead to an indexed term, and by recording
> that this node is then "authoritative" as to what terms exist in the
> terms dict from that prefix, we can get some important benefits:
>   * It becomes possible to know that a certain term prefix cannot
> exist in the terms index, which means we can save a disk seek in
> some cases (like PK lookup, docFreq, etc.)
>   * We can query for the next possible prefix in the index, allowing
> some MTQs (eg FuzzyQuery) to save disk seeks.
> Basically, the terms index is able to answer questions that previously
> required seeking/scanning in the terms dict file.
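
The "authoritative node" idea can be sketched with a toy trie. This is an illustration only: it uses plain HashMap-backed nodes rather than an FST, and PartialTrieDemo, Node, Answer and lookup are hypothetical names that do not come from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy trie; the real terms index is an FST, not a HashMap trie.
final class PartialTrieDemo {

  enum Answer { CANNOT_EXIST, MUST_CHECK_TERMS_DICT }

  static final class Node {
    final Map<Character, Node> edges = new HashMap<>();
    // True if edges lists every character that can follow this prefix
    // in the terms dict, including edges that lead to no indexed term.
    boolean authoritative;
  }

  final Node root = new Node();

  // Adds the path for a prefix and returns its final node.
  Node add(String prefix) {
    Node node = root;
    for (int i = 0; i < prefix.length(); i++) {
      node = node.edges.computeIfAbsent(prefix.charAt(i), c -> new Node());
    }
    return node;
  }

  // Answers from the index alone when possible; otherwise the caller
  // still has to seek/scan the terms dict.
  Answer lookup(String term) {
    Node node = root;
    for (int i = 0; i < term.length(); i++) {
      Node next = node.edges.get(term.charAt(i));
      if (next == null) {
        return node.authoritative ? Answer.CANNOT_EXIST : Answer.MUST_CHECK_TERMS_DICT;
      }
      node = next;
    }
    return Answer.MUST_CHECK_TERMS_DICT;
  }

  public static void main(String[] args) {
    PartialTrieDemo index = new PartialTrieDemo();
    index.add("a");
    index.add("b");
    index.root.authoritative = true; // only 'a' and 'b' can start a term
    System.out.println(index.lookup("cat"));   // answered without a seek
    System.out.println(index.lookup("apple")); // still needs the terms dict
  }
}
```

A missing edge at an authoritative node is the case where the disk seek is saved; a missing edge anywhere else says nothing about whether the term exists.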


[jira] [Resolved] (LUCENE-3177) Decouple indexer from Document/Field impls

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3177.


Resolution: Fixed

> Decouple indexer from Document/Field impls
> --
>
> Key: LUCENE-3177
> URL: https://issues.apache.org/jira/browse/LUCENE-3177
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3177.patch, LUCENE-3177.patch
>
>
> I think we should define minimal iterator interfaces,
> IndexableDocument/Field, that indexer requires to index documents.
> Indexer would consume only these bare minimum interfaces, not the
> concrete Document/Field/FieldType classes from oal.document package.
> Then, the Document/Field/FieldType hierarchy is one concrete impl of
> these interfaces. Apps are free to make their own impls as well.
> Maybe eventually we make another impl that enforces a global schema,
> eg factored out of Solr's impl.
> I think this frees design pressure on our Document/Field/FieldType
> hierarchy, ie, these classes are free to become concrete
> fully-featured "user-space" classes with all sorts of friendly sugar
> APIs for adding/removing fields, getting/setting values, types, etc.,
> but they don't need substantial extensibility/hierarchy. Ie, the
> extensibility point shifts to IndexableDocument/Field interface.
> I think this means we can collapse the three classes we now have for a
> Field (Fieldable/AbstractField/Field) down to a single concrete class
> (well, except for LUCENE-2308 where we want to break out dedicated
> classes for different field types...).
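
A rough sketch of the proposed split follows. The interface here is deliberately reduced to two methods and is an assumption for illustration; it is not the signature that was committed. MinimalIndexingDemo, SimpleField and index are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified illustration of the proposed decoupling.
final class MinimalIndexingDemo {

  // The bare minimum the indexer needs from a field (simplified assumption).
  interface IndexableField {
    String name();
    String stringValue();
  }

  // The "indexer" consumes only the minimal interface; a document is just
  // something that can be iterated to yield fields.
  static List<String> index(Iterable<? extends IndexableField> document) {
    List<String> indexed = new ArrayList<>();
    for (IndexableField field : document) {
      indexed.add(field.name() + "=" + field.stringValue());
    }
    return indexed;
  }

  // One concrete "user-space" implementation; applications could supply others.
  static final class SimpleField implements IndexableField {
    private final String name;
    private final String value;

    SimpleField(String name, String value) {
      this.name = name;
      this.value = value;
    }

    @Override public String name() { return name; }
    @Override public String stringValue() { return value; }
  }

  public static void main(String[] args) {
    List<SimpleField> doc = Arrays.asList(
        new SimpleField("title", "hello"),
        new SimpleField("body", "world"));
    System.out.println(index(doc));
  }
}
```

Since index() never names SimpleField, the concrete class is free to grow convenience methods without affecting the indexer, which is the shift in extensibility point the description argues for.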


[jira] [Resolved] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3272.


   Resolution: Fixed
Fix Version/s: 4.0

> Consolidate Lucene's QueryParsers into a module
> ---
>
> Key: LUCENE-3272
> URL: https://issues.apache.org/jira/browse/LUCENE-3272
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/queryparser
>Reporter: Chris Male
> Fix For: 4.0
>
>
> Lucene has a lot of QueryParsers and we should have them all in a single 
> consistent place.  
> The following are QueryParsers I can find that warrant moving to the new 
> module:
> - Lucene Core's QueryParser
> - AnalyzingQueryParser
> - ComplexPhraseQueryParser
> - ExtendableQueryParser
> - Surround's QueryParser
> - PrecedenceQueryParser
> - StandardQueryParser
> - XML-Query-Parser's CoreParser
> All seem to do a good job at their kind of parsing with extensive tests.
> One challenge of consolidating these is that many tests use Lucene Core's 
> QueryParser.  One option is to just replicate this class in src/test and call 
> it TestingQueryParser.  Another option is to convert all tests over to 
> programmatically building their queries (seems like a lot of work).


[jira] [Resolved] (LUCENE-3422) IndexWriter.optimize() throws FileNotFoundException and IOException

2012-03-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3422.


Resolution: Incomplete

> IndexWriter.optimize() throws FileNotFoundException and IOException
> ---
>
> Key: LUCENE-3422
> URL: https://issues.apache.org/jira/browse/LUCENE-3422
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Elizabeth Nisha
>
> I am using the Lucene 3.0.2 search APIs in my application. 
> The indexed data is about 350MB and indexing takes 25 hours. Search 
> indexing and optimization run in two different threads. Optimization runs 
> every hour; it does not run while indexing is in progress, and vice 
> versa. While optimization is running via IndexWriter.optimize(), 
> FileNotFoundException and IOException appear in my log and the index 
> is getting corrupted. The log says:
> 1. java.io.IOException: No sub-file with id _5r8.fdt found 
> [The file name in this message changes over time (_5r8.fdt, _6fa.fdt, 
> _6uh.fdt, ..., _emv.fdt) ]
> 2. java.io.FileNotFoundException: 
> /local/groups/necim/index_5.3/index/_bdx.cfs (No such file or directory)  
> 3. java.io.FileNotFoundException: 
> /local/groups/necim/index_5.3/index/_hkq.cfs (No such file or directory)
>   Stack trace: java.io.IOException: background merge hit exception: 
> _hkp:c100->_hkp _hkq:c100->_hkp _hkr:c100->_hkr _hks:c100->_hkr _hxb:c5500 
> _hx5:c1000 _hxc:c19884 into _hxd [optimize] [mergeDocStores]
>    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2359)
>    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2298)
>    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2268)
>    at com.telelogic.cs.search.SearchIndex.doOptimize(SearchIndex.java:130)
>    at com.telelogic.cs.search.SearchIndexerThread$1.run(SearchIndexerThread.java:337)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException: 
> /local/groups/necim/index_5.3/index/_hkq.cfs (No such file or directory)
>    at java.io.RandomAccessFile.open(Native Method)
>    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:76)
>    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
>    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:87)
>    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:67)
>    at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:67)
>    at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:114)
>    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:590)
>    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:616)
>    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4309)
>    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3965)
>    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:231)
>    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:288)


[jira] [Resolved] (LUCENE-3872) Index changes are lost if you call prepareCommit() then close()

2012-03-15 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3872.


Resolution: Fixed

Thanks Tim!

> Index changes are lost if you call prepareCommit() then close()
> ---
>
> Key: LUCENE-3872
> URL: https://issues.apache.org/jira/browse/LUCENE-3872
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3872.patch, LUCENE-3872.patch
>
>
> You are supposed to call commit() after calling prepareCommit(), but... if 
> you forget, and call close() after prepareCommit() without calling commit(), 
> then any changes done after the prepareCommit() are silently lost (including 
> adding/deleting docs, but also any completed merges).
> Spinoff from java-user thread "lots of .cfs (compound files) in the index 
> directory" from Tim Bogaert.
> I think to fix this, IW.close should throw an IllegalStateException if 
> prepareCommit() was called with no matching call to commit().
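
The proposed guard can be sketched with a toy writer. This is an illustration of the idea in the description, not IndexWriter's code: ToyWriter and its counters are hypothetical, and prepareCommit() here only records that a commit is pending.

```java
// Hypothetical toy writer; it only models the prepareCommit()/commit()/close() states.
final class ToyWriter {
  private int buffered;          // changes not yet committed
  private int committed;         // changes made durable by commit()
  private boolean commitPending; // true between prepareCommit() and commit()

  void addDocument() {
    buffered++;
  }

  void prepareCommit() {
    commitPending = true;
  }

  void commit() {
    committed += buffered;
    buffered = 0;
    commitPending = false;
  }

  // Refuses to close while a prepared commit is outstanding, instead of
  // silently discarding the changes made after prepareCommit().
  void close() {
    if (commitPending) {
      throw new IllegalStateException(
          "prepareCommit() was called with no matching call to commit()");
    }
    commit();
  }

  int committedDocs() {
    return committed;
  }

  public static void main(String[] args) {
    ToyWriter writer = new ToyWriter();
    writer.addDocument();
    writer.prepareCommit();
    writer.addDocument(); // change made after prepareCommit()
    try {
      writer.close();
    } catch (IllegalStateException e) {
      System.out.println("close() refused: " + e.getMessage());
    }
    writer.commit();
    writer.close();
    System.out.println("committed docs: " + writer.committedDocs());
  }
}
```

Failing loudly in close() turns a silent data loss into an error the caller sees immediately.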


[jira] [Resolved] (LUCENE-3841) CloseableThreadLocal does not work well with Tomcat thread pooling

2012-03-14 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3841.


Resolution: Fixed

Thanks Matthew!

> CloseableThreadLocal does not work well with Tomcat thread pooling
> --
>
> Key: LUCENE-3841
> URL: https://issues.apache.org/jira/browse/LUCENE-3841
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: 3.5
> Environment: Lucene/Tika/Snowball running in a Tomcat web application
>Reporter: Matthew Bellew
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3841.patch
>
>
> We tracked down a large memory leak (effectively a leak anyway) caused
> by how Analyzer uses CloseableThreadLocal.
> CloseableThreadLocal.hardRefs holds references to Thread objects as
> keys.  The problem is that it only frees these references in the set()
> method, and SnowballAnalyzer will only call set() when it is used by a
> NEW thread.
> The problem scenario is as follows:
> The server experiences a spike in usage (say by robots or whatever)
> and many threads are created and referenced by
> CloseableThreadLocal.hardRefs.  The server quiesces and lets many of
> these threads expire normally.  Now we have a smaller, but adequate
> thread pool.  So CloseableThreadLocal.set() may not be called by
> SnowballAnalyzer (via Analyzer) for a _long_ time.  The purge code is
> never called, and these threads along with their thread local storage
> (lucene related or not) are never cleaned up.
> I think calling the purge code in both get() and set() would have
> avoided this problem, but is potentially expensive.  Perhaps using 
> WeakHashMap instead of HashMap may also have helped.  WeakHashMap 
> purges on get() and set().  So this might be an efficient way to
> clean up threads in get(), while set() might do the more expensive
> Map.keySet() iteration.
> Our current workaround is to not share SnowballAnalyzer instances
> among HTTP searcher threads.  We open and close one on every request.
> Thanks,
> Matt
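
The first suggestion in the description (purging in both get() and set()) can be sketched as below. This is a simplified stand-in, not CloseableThreadLocal or the committed patch: PurgingThreadLocalDemo and runInNewThread are hypothetical names, and purging on every call is the potentially expensive variant the reporter mentions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified stand-in for a thread-keyed value holder.
final class PurgingThreadLocalDemo<T> {
  // Hard references keyed by thread, as in the scenario described above.
  private final Map<Thread, T> hardRefs = new HashMap<>();

  synchronized void set(T value) {
    hardRefs.put(Thread.currentThread(), value);
    purge();
  }

  synchronized T get() {
    purge(); // purging here too means idle pools still release dead threads
    return hardRefs.get(Thread.currentThread());
  }

  synchronized int size() {
    return hardRefs.size();
  }

  // Drops entries whose thread has terminated.
  private void purge() {
    hardRefs.keySet().removeIf(thread -> !thread.isAlive());
  }

  // Runs the action in a fresh thread and waits for that thread to terminate.
  static void runInNewThread(Runnable action) {
    Thread worker = new Thread(action);
    worker.start();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    PurgingThreadLocalDemo<String> local = new PurgingThreadLocalDemo<>();
    runInNewThread(() -> local.set("worker value"));
    System.out.println("entries before get(): " + local.size());
    local.get();
    System.out.println("entries after get():  " + local.size());
  }
}
```

In the demo, the terminated worker's entry survives until some live thread calls get(), at which point it is dropped without any further set() call.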


[jira] [Resolved] (LUCENE-3855) TestStressNRT failures (reproducible)

2012-03-12 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3855.


Resolution: Fixed

> TestStressNRT failures (reproducible)
> -
>
> Key: LUCENE-3855
> URL: https://issues.apache.org/jira/browse/LUCENE-3855
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3855.patch, 
> hoss-r1298470-fixed-seed__TEST-org.apache.lucene.index.TestStressNRT.xml, 
> output1.log, output2.log, output3.log, output4.log
>
>
> Build server logs. Reproduces on at least two machines.
> {noformat}
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressNRT 
> -Dtestmethod=test 
> -Dtests.seed=69468941c1bbf693:19e66d58475da929:69e9d2f81769b6d0 
> -Dargs="-Dfile.encoding=UTF-8"
> [junit] NOTE: test params are: codec=Lucene3x, 
> sim=RandomSimilarityProvider(queryNorm=true,coord=false): {}, locale=ro, 
> timezone=Etc/GMT+1
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestStressNRT]
> [junit] NOTE: Linux 3.0.0-16-generic amd64/Sun Microsystems Inc. 1.6.0_27 
> (64-bit)/cpus=2,threads=1,free=74960064,total=135987200
> [junit] -  ---
> [junit] Testcase: test(org.apache.lucene.index.TestStressNRT):Caused 
> an ERROR
> [junit] MockDirectoryWrapper: cannot close: there are still open files: 
> {_ng.cfs=8}
> [junit] java.lang.RuntimeException: MockDirectoryWrapper: cannot close: 
> there are still open files: {_ng.cfs=8}
> [junit]   at 
> org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:555)
> [junit]   at 
> org.apache.lucene.index.TestStressNRT.test(TestStressNRT.java:385)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:743)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:639)
> [junit]   at 
> org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:538)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:600)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
> [junit]   at 
> org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:21)
> [junit]   at 
> org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
> [junit] Caused by: java.lang.RuntimeException: unclosed IndexInput: 
> _ng.cfs
> [junit]   at 
> org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:479)
> [junit]   at 
> org.apache.lucene.store.MockDirectoryWrapper$1.openSlice(MockDirectoryWrapper.java:777)
> [junit]   at 
> org.apache.lucene.store.CompoundFileDirectory.openInput(CompoundFileDirectory.java:221)
> [junit]   at 
> org.apache.lucene.codecs.lucene3x.TermInfosReader.<init>(TermInfosReader.java:112)
> [junit]   at 
> org.apache.lucene.codecs.lucene3x.Lucene3xFields.<init>(Lucene3xFields.java:84)
> [junit]   at 
> org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat$1.<init>(PreFlexRWPostingsFormat.java:51)
> [junit]   at 
> org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat.fieldsProducer(PreFlexRWPostingsFormat.java:51)
> [junit]   at 
> org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:108)
> [junit]   at 
> org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:51)
> [junit]   at 
> org.apache.lucene.index.IndexWriter$ReadersAndLiveDocs.getMergeReader(IndexWriter.java:521)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3587)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3257)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
> [junit] 
> [junit] 
> [junit] Test org.apache.lucene.index.TestStressNRT FAILED
> {noformat}


[jira] [Resolved] (LUCENE-3831) Passing a null fieldname to MemoryFields#terms in MemoryIndex throws a NPE

2012-03-07 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3831.


Resolution: Fixed
Fix Version/s: 4.0, 3.6

Thanks Alan.

I couldn't provoke an NPE on 3.x but I still fixed SpanWeight to not pass on a 
null field to IR.norms.

> Passing a null fieldname to MemoryFields#terms in MemoryIndex throws a NPE
> --
>
> Key: LUCENE-3831
> URL: https://issues.apache.org/jira/browse/LUCENE-3831
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Alan Woodward
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3831.patch, TestNullFieldAfterRegexpRewrite.java, 
> mindex-null-field.patch
>
>
> I found this when querying a MemoryIndex using a RegexpQuery wrapped by a 
> SpanMultiTermQueryWrapper.  If the regexp doesn't match anything in the 
> index, it gets rewritten to an empty SpanOrQuery with a null field value, 
> which then triggers the NPE.


[jira] [Resolved] (LUCENE-3851) TestTermInfosReaderIndex failing (always reproducible)

2012-03-06 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3851.


   Resolution: Fixed
Fix Version/s: 3.6

Thanks Dawid!

> TestTermInfosReaderIndex failing (always reproducible)
> --
>
> Key: LUCENE-3851
> URL: https://issues.apache.org/jira/browse/LUCENE-3851
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.6, 4.0
>
>
> Always fails on branch (use reproduce string below):
> git clone --depth 1 -b rr g...@github.com:dweiss/lucene_solr.git
> {noformat}
> [junit4] Running org.apache.lucene.codecs.lucene3x.TestTermInfosReaderIndex
> [junit4] FAILURE 0.04s J0 | TestTermInfosReaderIndex.testSeekEnum
> [junit4]> Throwable #1: java.lang.AssertionError: 
> expected: but was:<:>
> [junit4]> at 
> __randomizedtesting.SeedInfo.seed([C7597DFBBE0B3D7D:C6D9CEDD0700AAFF]:0)
> [junit4]> at org.junit.Assert.fail(Assert.java:93)
> [junit4]> at org.junit.Assert.failNotEquals(Assert.java:647)
> [junit4]> at org.junit.Assert.assertEquals(Assert.java:128)
> [junit4]> at org.junit.Assert.assertEquals(Assert.java:147)
> [junit4]> at 
> org.apache.lucene.codecs.lucene3x.TestTermInfosReaderIndex.testSeekEnum(TestTermInfosReaderIndex.java:137)
> [junit4]> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [junit4]> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [junit4]> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [junit4]> at java.lang.reflect.Method.invoke(Method.java:597)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1766)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1000(RandomizedRunner.java:141)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:728)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:789)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:803)
> [junit4]> at 
> org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:744)
> [junit4]> at 
> org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:636)
> [junit4]> at 
> org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
> [junit4]> at 
> org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:550)
> [junit4]> at 
> org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:600)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:735)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:141)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:586)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:605)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:641)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:652)
> [junit4]> at 
> org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:533)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:141)
> [junit4]> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:479)
> [junit4]> 
> [junit4]   2> NOTE: reproduce with: ant test 
> -Dtests.filter=*.TestTermInfosReaderIndex -Dtests.filter.method=testSeekEnum 
> -Drt.seed=C7597DFBBE0B3D7D -Dargs="-Dfile.encoding=UTF-8"
> [junit4]   2>
> [junit4]> (@AfterClass output)
> [junit4]   2> NOTE: test params are: codec=Appending, sim=DefaultSimilarity, 
> locale=en, timezone=Atlantic/Stanley
> [junit4]   2> NOTE: all tests run in this JVM:
> [junit4]   2> [TestLock, TestFileSwitchDirectory, TestWildcardRandom, 
> TestVersionComparator, TestTermdocPerf, TestBi

[jira] [Resolved] (LUCENE-3003) Move UnInvertedField into Lucene core

2012-03-06 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3003.


   Resolution: Fixed
Fix Version/s: (was: 3.6)

> Move UnInvertedField into Lucene core
> -
>
> Key: LUCENE-3003
> URL: https://issues.apache.org/jira/browse/LUCENE-3003
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3003.patch, LUCENE-3003.patch, 
> byte_size_32-bit-openjdk6.txt
>
>
> Solr's UnInvertedField lets you quickly look up all term ords for a
> given doc/field.
> Like FieldCache, it inverts the index to produce this, and creates a
> RAM-resident data structure holding the bits; but, unlike FieldCache,
> it can handle multiple values per doc, and, it does not hold the term
> bytes in RAM.  Rather, it holds only term ords, and then uses
> TermsEnum to resolve ord -> term.
> This is great eg for faceting, where you want to use int ords for all
> of your counting, and then only at the end you need to resolve the
> "top N" ords to their text.
> I think this is a useful core functionality, and we should move most
> of it into Lucene's core.  It's a good complement to FieldCache.  For
> this first baby step, I just move it into core and refactor Solr's
> usage of it.
> After this, as separate issues, I think there are some things we could
> explore/improve:
>   * The first-pass that allocates lots of tiny byte[] looks like it
> could be inefficient.  Maybe we could use the byte slices from the
> indexer for this...
>   * We can improve the RAM efficiency of the TermIndex: if the codec
> supports ords, and we are operating on one segment, we should just
> use it.  If not, we can use a more RAM-efficient data structure,
> eg an FST mapping to the ord.
>   * We may be able to improve on the main byte[] representation by
> using packed ints instead of delta-vInt?
>   * Eventually we should fold this ability into docvalues, ie we'd
> write the byte[] image at indexing time, and then loading would be
> fast, instead of uninverting
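
The faceting pattern described above (count with int ords, resolve only the "top N" ords to text at the end) might look roughly like the sketch below. It uses plain arrays rather than UnInvertedField: `docOrds` and `ordToTerm` are hypothetical stand-ins for the per-document ord lists and the TermsEnum ord -> term lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of ord-based facet counting; not Lucene code.
final class OrdFacetSketch {
    // docOrds[doc] holds the term ords of one document (multi-valued).
    // ordToTerm stands in for resolving ord -> term text.
    static List<String> topTerms(int[][] docOrds, String[] ordToTerm, int topN) {
        int[] counts = new int[ordToTerm.length];
        for (int[] ords : docOrds) {
            for (int ord : ords) {
                counts[ord]++; // all counting happens on ints
            }
        }
        List<Integer> seen = new ArrayList<>();
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] > 0) {
                seen.add(ord);
            }
        }
        // Highest count first; ties broken by lower ord.
        seen.sort((a, b) -> counts[a] != counts[b]
                ? Integer.compare(counts[b], counts[a])
                : Integer.compare(a, b));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, seen.size()); i++) {
            int ord = seen.get(i);
            result.add(ordToTerm[ord] + "=" + counts[ord]); // text resolved only here
        }
        return result;
    }
}
```

Only `topN` ord -> term resolutions are performed, however many documents were counted.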


[jira] [Resolved] (LUCENE-3824) TermOrdVal/DocValuesComparator does too much work in compareBottom

2012-02-28 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3824.


Resolution: Fixed

> TermOrdVal/DocValuesComparator does too much work in compareBottom
> --
>
> Key: LUCENE-3824
> URL: https://issues.apache.org/jira/browse/LUCENE-3824
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3824.patch
>
>
> We now have logic to fall back to by-value comparison, when the bottom
> slot is not from the current reader.
> But this is silly, because if the bottom slot is from a different
> reader, it means the tie-break case is not possible (since the current
> reader didn't have the bottom value), so when the incoming ord equals
> the bottom ord we should always return x > 0.
> I added a new random string sort test case to TestSort...
> I also renamed DocValues.SortedSource.getByValue -> getOrdByValue and
> cleaned up some whitespace.
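
A minimal sketch of the comparison rule described above, assuming the bottom ord is set to the floor (lower bound) of the bottom value among the current reader's sorted terms whenever the bottom slot comes from a different reader. The class and method names are hypothetical, and a sorted `String[]` stands in for the reader's term dictionary.

```java
import java.util.Arrays;

// Hypothetical illustration of ord-only compareBottom; not the committed patch.
final class OrdCompareSketch {
    // Ord of the largest term <= value, or -1 if value sorts before all terms.
    static int floorOrd(String[] sortedTerms, String value) {
        int idx = Arrays.binarySearch(sortedTerms, value);
        return idx >= 0 ? idx : -idx - 2; // -idx - 1 is the insertion point
    }

    // Positive result means the bottom value sorts after the document's value.
    static int compareBottom(int bottomOrd, boolean bottomSameReader, int docOrd) {
        if (bottomSameReader) {
            return bottomOrd - docOrd; // ords are exactly comparable, ties included
        }
        // The bottom value is absent from this reader, so equal ords cannot be a
        // tie: the bottom value lies strictly above the term at bottomOrd.
        return bottomOrd >= docOrd ? 1 : -1;
    }
}
```

No by-value comparison is needed in either branch, which is the saving the issue is after.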


[jira] [Resolved] (LUCENE-3829) Lucene40 codec's DocValues DirectSource impls aren't thread-safe

2012-02-28 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3829.


Resolution: Invalid

Duh, thanks Simon ;)

Once I fixed the test to use the API correctly, it passes!

> Lucene40 codec's DocValues DirectSource impls aren't thread-safe
> 
>
> Key: LUCENE-3829
> URL: https://issues.apache.org/jira/browse/LUCENE-3829
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3829.patch
>
>
> Our DirectSource impls hold IndexInput(s) open against the dat/idx
> files, which we then seek + read when loading a specific document's
> value.  But this is in no way protected against multiple threads
> I think...?


[jira] [Resolved] (LUCENE-3820) Wrong trailing index calculation in PatternReplaceCharFilter

2012-02-28 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3820.


Resolution: Fixed

Thanks Dawid!

> Wrong trailing index calculation in PatternReplaceCharFilter
> 
>
> Key: LUCENE-3820
> URL: https://issues.apache.org/jira/browse/LUCENE-3820
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3820.patch, LUCENE-3820.patch, 
> LUCENE-3820_test.patch, LUCENE-3820_test.patch
>
>
> Reimplementation of PatternReplaceCharFilter to pass randomized tests (used 
> to throw exceptions previously). Simplified code, dropped boundary 
> characters, full input buffered for pattern matching.
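
To make the "full input buffered" approach concrete, here is a rough sketch of buffering the whole input, applying the pattern, and keeping a table that maps output offsets back to input offsets. It is not the attached patch: `PatternReplaceSketch` and `correctOffset` are hypothetical names here, and mapping every replacement character into the matched region is an assumed policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of pattern replacement with offset correction.
final class PatternReplaceSketch {
    final String output;
    // offsets.get(i) is the input offset reported for output offset i; the
    // extra last entry maps the end of the output to the end of the input.
    private final List<Integer> offsets = new ArrayList<>();

    PatternReplaceSketch(CharSequence input, Pattern pattern, String replacement) {
        Matcher m = pattern.matcher(input); // the whole input is available up front
        StringBuffer out = new StringBuffer();
        int lastEnd = 0;
        while (m.find()) {
            int before = out.length();
            m.appendReplacement(out, replacement);
            int copied = m.start() - lastEnd; // unmatched text, copied 1:1
            for (int k = 0; k < copied; k++) {
                offsets.add(lastEnd + k);
            }
            int replaced = out.length() - before - copied; // expanded replacement
            int lastMatched = Math.max(m.start(), m.end() - 1);
            for (int k = 0; k < replaced; k++) {
                offsets.add(Math.min(m.start() + k, lastMatched));
            }
            lastEnd = m.end();
        }
        m.appendTail(out);
        for (int i = lastEnd; i <= input.length(); i++) {
            offsets.add(i); // trailing text plus the final end-of-input entry
        }
        output = out.toString();
    }

    int correctOffset(int outputOffset) {
        return offsets.get(outputOffset);
    }
}
```

The trailing entry is the part that is easy to get wrong: the end of the output must map to the end of the input even when replacements changed the length.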


[jira] [Resolved] (LUCENE-3827) Make term offsets work in MemoryIndex

2012-02-27 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3827.


   Resolution: Fixed
Fix Version/s: 4.0

I just committed this.

Thanks Alan!

> Make term offsets work in MemoryIndex
> -
>
> Key: LUCENE-3827
> URL: https://issues.apache.org/jira/browse/LUCENE-3827
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Alan Woodward
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: mindex.patch
>
>
> Fix the logic for retrieving term offsets from DocsAndPositionsEnum on a 
> MemoryIndex, and allow subclasses to access them.


[jira] [Resolved] (LUCENE-3776) NRTManager shouldn't expose its private SearcherManager

2012-02-17 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3776.


Resolution: Fixed

Thanks Shai!

> NRTManager shouldn't expose its private SearcherManager
> ---
>
> Key: LUCENE-3776
> URL: https://issues.apache.org/jira/browse/LUCENE-3776
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Blocker
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3776.patch, LUCENE-3776.patch, LUCENE-3776.patch
>
>
> Spinoff from LUCENE-3769.
> To actually obtain an IndexSearcher from NRTManager, it's a 2-step process 
> now.
> You must .getSearcherManager(), then .acquire() from the returned 
> SearcherManager.
> This is very trappy... because if the app incorrectly calls maybeReopen on 
> that private SearcherManager (instead of NRTManager.maybeReopen) then it can 
> unexpectedly cause threads to block forever, waiting for the necessary gen to 
> become visible.  This will be hard to debug... I don't like creating trappy 
> APIs.
> Hopefully once LUCENE-3761 is in, we can fix NRTManager to no longer expose 
> its private SM, instead subclassing ReferenceManager.
> Or alternatively, or in addition, maybe we factor out a new interface 
> (SearcherProvider or something...) that only has acquire and release methods, 
> and both NRTManager and ReferenceManager/SM impl that, and we keep 
> NRTManager's SM private.
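
The second option above (a narrow interface with only acquire and release) could be sketched as follows. Everything here is hypothetical: `SearcherProviderSketch` borrows the name floated in the description, `NrtManagerSketch` is a toy stand-in for NRTManager, and a `String` plays the role of an IndexSearcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical narrow interface: callers can acquire and release, nothing else.
interface SearcherProviderSketch<S> {
    S acquire();

    void release(S searcher);
}

// Toy stand-in for NRTManager; its internal state is never handed out.
final class NrtManagerSketch implements SearcherProviderSketch<String> {
    private final AtomicInteger generation = new AtomicInteger();
    private final AtomicInteger outstanding = new AtomicInteger();

    @Override
    public String acquire() {
        outstanding.incrementAndGet();
        return "searcher@gen" + generation.get();
    }

    @Override
    public void release(String searcher) {
        outstanding.decrementAndGet();
    }

    // Reopening is reachable only through the outer manager, so an app cannot
    // bypass the generation bookkeeping by reopening an inner object directly.
    void maybeReopen() {
        generation.incrementAndGet();
    }

    int outstanding() {
        return outstanding.get();
    }
}
```

Code that only searches can depend on the interface alone, which removes the one-step-too-many `getSearcherManager()` call and the trap that comes with it.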


[jira] [Resolved] (LUCENE-3769) Simplify NRTManager

2012-02-13 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3769.


Resolution: Fixed

I'll open a follow-on issue for the nasty trap...

> Simplify NRTManager
> ---
>
> Key: LUCENE-3769
> URL: https://issues.apache.org/jira/browse/LUCENE-3769
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3769.patch, LUCENE-3769.patch
>
>
> NRTManager is hairy now, because the applyDeletes is separately passed
> to ctor, passed to maybeReopen, passed to getSearcherManager, etc.
> I think, instead, you should pass it only to the ctor, and if you have
> some cases needing deletes and others not then you can make two
> NRTManagers.  This should be no less efficient than we have today,
> just simpler.
> I think it will also enable NRTManager to subclass ThingyManager
> (LUCENE-3761).


[jira] [Resolved] (LUCENE-3760) Cleanup DR.getCurrentVersion/DR.getUserData/DR.getIndexCommit().getUserData()

2012-02-10 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3760.


Resolution: Fixed

> Cleanup DR.getCurrentVersion/DR.getUserData/DR.getIndexCommit().getUserData()
> -
>
> Key: LUCENE-3760
> URL: https://issues.apache.org/jira/browse/LUCENE-3760
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3760.patch, LUCENE-3760.patch
>
>
> Spinoff from Ryan's dev thread "DR.getCommitUserData() vs 
> DR.getIndexCommit().getUserData()"... these methods are confusing/dups right 
> now.


[jira] [Resolved] (LUCENE-3672) IndexCommit.equals() bug

2012-02-07 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3672.


   Resolution: Fixed
Fix Version/s: 4.0
   3.6

> IndexCommit.equals() bug
> 
>
> Key: LUCENE-3672
> URL: https://issues.apache.org/jira/browse/LUCENE-3672
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Andrzej Bialecki 
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3672.patch
>
>
> IndexCommit.equals() checks for equality of Directories and versions, but it 
> doesn't check IMHO the more important generation numbers. It looks like 
> commits are really identified by a combination of directory and segments_XXX, 
> which means the generation number, because that's what the 
> DirectoryReader.open() checks for.
> This bug leads to an unexpected behavior when the only change to be committed 
> is in userData - we get two commits then that are declared equal, they have 
> the same version but they have different generation numbers. I have no idea 
> how this situation is treated in a few dozen references to 
> IndexCommit.equals() across Lucene...
> On the surface the fix is trivial - either add the gen number to equals(), or 
> use gen number instead of version. However, it's puzzling why these two would 
> ever get out of sync??? and if they are always supposed to be in sync then 
> maybe we don't need both of them at all, maybe just generation or version is 
> sufficient?
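
The difference between the two equality definitions discussed above can be shown with a small value class. This is a sketch of the first suggested fix (compare directory plus generation), not the committed change: `CommitPointSketch` is a hypothetical class, and a `String` stands in for the Directory.

```java
import java.util.Objects;

// Hypothetical stand-in for IndexCommit, reduced to the fields under discussion.
final class CommitPointSketch {
    final String directory;
    final long version;
    final long generation;

    CommitPointSketch(String directory, long version, long generation) {
        this.directory = directory;
        this.version = version;
        this.generation = generation;
    }

    // The behavior reported in the issue: directory and version only.
    boolean equalsByVersion(CommitPointSketch other) {
        return directory.equals(other.directory) && version == other.version;
    }

    // One suggested fix: identify a commit by directory and generation.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CommitPointSketch)) {
            return false;
        }
        CommitPointSketch other = (CommitPointSketch) o;
        return directory.equals(other.directory) && generation == other.generation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(directory, generation);
    }
}
```

Two commits that differ only in userData (same version, different generation, as the report describes) compare equal under the first method and distinct under the second.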


[jira] [Resolved] (LUCENE-3742) SynFilter doesn't set offsets for outputs that hang off the end of the input tokens

2012-01-31 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3742.


Resolution: Fixed

I set the offset to match the last input token...

> SynFilter doesn't set offsets for outputs that hang off the end of the input 
> tokens
> ---
>
> Key: LUCENE-3742
> URL: https://issues.apache.org/jira/browse/LUCENE-3742
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3742.patch
>
>
> If you have syn rule a -> x y and input a then output is a/x y but... what 
> should y's offsets be?  Right now we set to 0/0.


[jira] [Resolved] (LUCENE-3725) Add optional packing to FST building

2012-01-29 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3725.


Resolution: Fixed

> Add optional packing to FST building
> 
>
> Key: LUCENE-3725
> URL: https://issues.apache.org/jira/browse/LUCENE-3725
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3725.patch, LUCENE-3725.patch, LUCENE-3725.patch, 
> Perf.java
>
>
> The FSTs produced by Builder can be further shrunk if you are willing
> to spend highish transient RAM to do so... our Builder today tries
> hard not to use much RAM (and has options to tweak down the RAM usage,
> in exchange for a somewhat larger FST), even when building immense FSTs.
> But for apps that can afford highish transient RAM to get a smaller
> net FST, I think we should offer packing.


[jira] [Resolved] (LUCENE-2795) Genericize DirectIOLinuxDir -> UnixDir

2012-01-29 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2795.


   Resolution: Fixed
Fix Version/s: 4.0

Thanks Varun!

> Genericize DirectIOLinuxDir -> UnixDir
> --
>
> Key: LUCENE-2795
> URL: https://issues.apache.org/jira/browse/LUCENE-2795
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Varun Thacker
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, 
> LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, 
> LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch
>
>
> Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to 
> use it for IndexWriter and not IndexReader (searching).  It's a trap.
> But, once we do LUCENE-2793, we can make it fully general purpose because 
> then a single native Dir impl can be used.
> I'd also like to make it generic to other Unices, if we can, so that it 
> becomes UnixDirectory.


[jira] [Resolved] (LUCENE-3121) FST should offer lookup-by-output API when output strictly increases

2012-01-19 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3121.


   Resolution: Fixed
Fix Version/s: 3.6

> FST should offer lookup-by-output API when output strictly increases
> 
>
> Key: LUCENE-3121
> URL: https://issues.apache.org/jira/browse/LUCENE-3121
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3121.patch
>
>
> Spinoff from "FST and FieldCache" java-dev thread 
> http://lucene.markmail.org/thread/swoawlv3fq4dntvl
> FST is able to associate arbitrary outputs with the sorted input keys, but in 
> the special (and, common) case where the function is strictly monotonic (each 
> output only "increases" vs prior outputs), such as mapping to term ords or 
> mapping to file offsets in the terms dict, we should offer a lookup-by-output 
> API that efficiently walks the FST and locates input key (exact or floor or 
> ceil) matching that output.
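
Why strictly increasing outputs make lookup-by-output possible can be seen with a much simpler structure than an FST. The sketch below uses parallel sorted arrays and a binary search for the floor entry; it only illustrates the monotonicity argument and is not how an FST would be walked. All names are hypothetical.

```java
// Hypothetical illustration: floor lookup by output over parallel arrays.
final class LookupByOutputSketch {
    // outputs[i] is the output for sortedKeys[i] and must be strictly
    // increasing, e.g. term ords or file offsets in the terms dict.
    static String floorKey(String[] sortedKeys, long[] outputs, long target) {
        int lo = 0;
        int hi = outputs.length - 1;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (outputs[mid] <= target) {
                best = mid; // candidate floor, keep looking to the right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best < 0 ? null : sortedKeys[best];
    }
}
```

Because each output only increases relative to earlier keys, discarding half of the candidates at every step is safe; the same ordering property is what would let an FST walk choose among arcs by output.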


[jira] [Resolved] (LUCENE-3694) DocValuesField should not overload setInt/setFloat etc

2012-01-16 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3694.


Resolution: Fixed

Fixed with LUCENE-3453.

> DocValuesField should not overload setInt/setFloat etc
> --
>
> Key: LUCENE-3694
> URL: https://issues.apache.org/jira/browse/LUCENE-3694
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> See my description on LUCENE-3687. In general we should avoid this for 
> primitive types and give them each unique names.
> So I think instead of setInt(byte), setInt(short), setInt(int), setInt(long), 
> setFloat(float) and setFloat(double),
> we should have setByte(byte), setShort(short), setInt(int), setLong(long), 
> setFloat(float) and setDouble(double).
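
A small, self-contained demonstration of why overloading on primitive types is easy to misuse. The class below is hypothetical and only mimics the overloaded `setInt` signatures listed above; each overload reports which one the compiler selected.

```java
// Hypothetical illustration of overload resolution on primitive arguments.
final class OverloadSketch {
    static String setInt(byte value) { return "byte"; }

    static String setInt(short value) { return "short"; }

    static String setInt(int value) { return "int"; }

    static String setInt(long value) { return "long"; }

    static String demo() {
        short s = 1;
        // s selects the short overload, but s + 1 is promoted to int, so a
        // different overload is chosen silently for what looks like a short.
        return setInt(s) + "," + setInt(s + 1) + "," + setInt(1L);
    }
}
```

With uniquely named setters (setShort, setInt, setLong, ...) the second call would either state its intent or fail to compile without a cast, instead of quietly changing the stored type.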


[jira] [Resolved] (LUCENE-3682) Add deprecated 'transition' api for Document/Field

2012-01-16 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3682.


Resolution: Fixed

> Add deprecated 'transition' api for Document/Field
> --
>
> Key: LUCENE-3682
> URL: https://issues.apache.org/jira/browse/LUCENE-3682
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> I think for 4.0 we should have a deprecated transition api for Field so you 
> can do new Field(..., Field.Store.xxx, Field.Index.yyy) like before.
> These combinations would just be some predefined fieldtypes that are used 
> behind the scenes if you use these deprecated ctors
> Sure it wouldn't be 'totally' backwards binary compat for Field.java, but why 
> must it be all or nothing? I think this would eliminate a big
> hurdle for people that want to check out 4.x
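
One way such a transition constructor could be structured is sketched below. The enums, the `FieldTypeSketch` class and the translation rules are all hypothetical; they only illustrate the idea of mapping the old Store/Index combinations onto predefined field types behind the scenes.

```java
// Hypothetical illustration of a deprecated transition constructor.
final class TransitionFieldSketch {
    enum Store { YES, NO }

    enum Index { NO, ANALYZED, NOT_ANALYZED }

    // Stand-in for a 4.0-style field type.
    static final class FieldTypeSketch {
        final boolean stored;
        final boolean indexed;
        final boolean tokenized;

        FieldTypeSketch(boolean stored, boolean indexed, boolean tokenized) {
            this.stored = stored;
            this.indexed = indexed;
            this.tokenized = tokenized;
        }
    }

    final String name;
    final String value;
    final FieldTypeSketch type;

    TransitionFieldSketch(String name, String value, FieldTypeSketch type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }

    /** @deprecated transition API: prefer the constructor taking a field type. */
    @Deprecated
    TransitionFieldSketch(String name, String value, Store store, Index index) {
        this(name, value, translate(store, index));
    }

    // Maps an old-style combination onto an equivalent field type.
    static FieldTypeSketch translate(Store store, Index index) {
        return new FieldTypeSketch(
                store == Store.YES,
                index != Index.NO,
                index == Index.ANALYZED);
    }
}
```

Old call sites keep compiling (with a deprecation warning) while the new field type remains the only representation used internally.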


[jira] [Resolved] (LUCENE-3684) Add offsets to postings (D&PEnum)

2012-01-15 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3684.


Resolution: Fixed

> Add offsets to postings (D&PEnum)
> -
>
> Key: LUCENE-3684
> URL: https://issues.apache.org/jira/browse/LUCENE-3684
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3684.patch, LUCENE-3684.patch, LUCENE-3684.patch
>
>
> I think we should explore making start/end offsets a first-class attr in the
> postings APIs, and fixing the indexer to index them into postings.
> This will make term vector access cleaner (we now have to jump through
> hoops w/ non-first-class offset attr).  It can also enable efficient
> highlighting without term vectors / reanalyzing, if the app indexes
> offsets into the postings.


[jira] [Resolved] (LUCENE-3453) remove IndexDocValuesField

2012-01-15 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3453.


Resolution: Fixed

> remove IndexDocValuesField
> --
>
> Key: LUCENE-3453
> URL: https://issues.apache.org/jira/browse/LUCENE-3453
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3453.patch, LUCENE-3453.patch
>
>
> It's confusing how we present CSF functionality to the user; it's actually not 
> a "field" but an "attribute" of a field like STORED or INDEXED.
> Otherwise, it's really hard to think about CSF because there is a mismatch 
> between the APIs and the index format.


[jira] [Resolved] (LUCENE-3685) Add top-down version of BlockJoinQuery

2012-01-14 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3685.


Resolution: Fixed

> Add top-down version of BlockJoinQuery
> --
>
> Key: LUCENE-3685
> URL: https://issues.apache.org/jira/browse/LUCENE-3685
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/join
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3685.patch
>
>
> Today, BlockJoinQuery can join from child docIDs up to parent docIDs.
> EG this works well for product (parent) + many SKUs (child) search.
> But the reverse, which BJQ cannot do, is also useful in some cases.
> EG say you index songs (child) within albums (parent), but you want to
> search and present by song not album while involving some fields from
> the album in the query.  In this case you want to wrap a parent query
> (against album), joining down to the child document space.
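
The parent-to-child direction can be sketched with a bit set of parent docIDs. The sketch assumes documents are indexed in blocks with the children first and the parent last, so that a parent's children are the docIDs between the previous parent and itself; `ParentToChildSketch` is a hypothetical name and the code does not use the join module.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration of joining from a parent docID down to its children.
final class ParentToChildSketch {
    // parents has a bit set for every parent docID (e.g. every album).
    static List<Integer> childrenOf(BitSet parents, int parentDoc) {
        if (!parents.get(parentDoc)) {
            throw new IllegalArgumentException("doc " + parentDoc + " is not a parent");
        }
        // The previous parent (or -1) marks the start of this block.
        int prevParent = parentDoc == 0 ? -1 : parents.previousSetBit(parentDoc - 1);
        List<Integer> children = new ArrayList<>();
        for (int doc = prevParent + 1; doc < parentDoc; doc++) {
            children.add(doc); // e.g. each song of the album
        }
        return children;
    }
}
```

A query matching albums would yield parent docIDs, and each of those expands to the song docIDs that are then scored and presented.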


[jira] [Resolved] (LUCENE-3679) Replace IndexReader.getFieldNames with IndexReader.getFieldInfos

2012-01-09 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3679.


Resolution: Fixed

> Replace IndexReader.getFieldNames with IndexReader.getFieldInfos
> 
>
> Key: LUCENE-3679
> URL: https://issues.apache.org/jira/browse/LUCENE-3679
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3679.patch, LUCENE-3679.patch
>
>



[jira] [Resolved] (LUCENE-3681) FST.BYTE2 should save as fixed 2 byte not as vInt

2012-01-08 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3681.


Resolution: Fixed

> FST.BYTE2 should save as fixed 2 byte not as vInt
> -
>
> Key: LUCENE-3681
> URL: https://issues.apache.org/jira/browse/LUCENE-3681
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3681.patch
>
>
> We currently write BYTE1 as a single byte, but BYTE2/4 as vInt, but I think 
> that's confusing.  Also, for the FST for the new Kuromoji analyzer 
> (LUCENE-3305), writing as 2 bytes instead shrank the FST and ran faster, 
> presumably because more values were >= 16384 than were < 128.
> Separately the whole INPUT_TYPE is very confusing... really all it's doing is 
> "declaring" the allowed range of the characters of the input alphabet, and 
> then the only thing that uses that is the write/readLabel methods (well and 
> some confusing sugar methods in Builder!).  Not sure how to fix that yet...
> It's a simple change but it changes the FST binary format so any users w/ 
> FSTs out there will have to rebuild (FST is marked experimental...).
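The size argument in the description above can be checked with a small sketch. It assumes the usual vInt layout of 7 payload bits per byte with the high bit as a continuation flag; the class and method names (`VIntSizeSketch`, `vIntLength`) are hypothetical and are not part of Lucene's API.

```java
/** Hypothetical sketch: bytes used by a vInt label versus a fixed 2-byte label. */
final class VIntSizeSketch {

    /** Bytes needed for a non-negative value when each byte carries 7 payload bits. */
    static int vIntLength(int value) {
        int length = 1;
        // Keep emitting continuation bytes while more than 7 bits remain.
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        int[] labels = {0, 127, 128, 16383, 16384, 65535};
        for (int label : labels) {
            System.out.println(label + ": vInt=" + vIntLength(label)
                + " byte(s), fixed BYTE2=2 bytes");
        }
    }
}
```

Under that assumption, a 16-bit label saves a byte as a vInt only when it is below 128 and costs an extra byte at 16384 and above, which is consistent with the observation that the fixed 2-byte form shrank the Kuromoji FST.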


[jira] [Resolved] (LUCENE-3668) offsets issues with multiword synonyms

2012-01-07 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3668.


Resolution: Fixed
Fix Version/s: 4.0, 3.6

Thanks Koji!

> offsets issues with multiword synonyms
> --
>
> Key: LUCENE-3668
> URL: https://issues.apache.org/jira/browse/LUCENE-3668
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3668.patch, LUCENE-3668_test.patch
>
>
> As reported on the list, there are some strange offsets with FSTSynonyms, in 
> the case of multiword synonyms.
> As a workaround it was suggested to use the older synonym impl, but it has 
> bugs too (just in a different way).


[jira] [Resolved] (LUCENE-830) norms file can become unexpectedly enormous

2012-01-05 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-830.
---

   Resolution: Fixed
Fix Version/s: 4.0

As of 4.0, when norms are missing we drop norms for the entire field, unlike 
before, when we invented a fake norm for documents missing that field or omitting 
norms for it.

Also, as of 4.0, you can now make a custom norm provider and custom similarity 
so if you really want to it's possible (in theory!) to have a sparse norms data 
structure...

> norms file can become unexpectedly enormous
> ---
>
> Key: LUCENE-830
> URL: https://issues.apache.org/jira/browse/LUCENE-830
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 2.1
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
>
> Spinoff from this user thread:
>http://www.gossamer-threads.com/lists/lucene/java-user/46754
> Norms are not stored sparsely, so even if a doc doesn't have field X
> we still use up 1 byte in the norms file (and in memory when that
> field is searched) for that segment.  I think this is done for
> performance at search time?
> For indexes that have a large # documents where each document can have
> wildly varying fields, each segment will use # documents times # fields
> seen in that segment.  When optimize merges all segments, that product
> grows multiplicatively so the norms file for the single segment will
> require far more storage than the sum of all previous segments' norm
> files.
> I think it's uncommon to have a huge number of distinct fields (?) so
> we would need a solution that doesn't hurt the more common case where
> most documents have the same fields.  Maybe something analogous to how
> bitvectors are now optionally stored sparsely?
> One simple workaround is to disable norms.
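The growth described above can be worked through with a short arithmetic sketch. The segment and field counts below are made-up illustration values, and `NormsSizeSketch` / `normsBytes` are hypothetical names; the only thing taken from the description is the rule of one norm byte per document per field seen in the segment.

```java
/** Hypothetical sketch of non-sparse norms storage: one byte per document per field. */
final class NormsSizeSketch {

    /** Norms bytes for a segment with the given document count and distinct field count. */
    static long normsBytes(long docCount, long fieldCount) {
        return docCount * fieldCount;
    }

    public static void main(String[] args) {
        // Assumed example: 10 segments, 1,000,000 docs each,
        // 10 fields per segment, and no field shared between segments.
        long segments = 10;
        long docsPerSegment = 1_000_000L;
        long fieldsPerSegment = 10;

        long beforeMerge = segments * normsBytes(docsPerSegment, fieldsPerSegment);
        long afterMerge = normsBytes(segments * docsPerSegment, segments * fieldsPerSegment);

        System.out.println("sum over segments: " + beforeMerge + " bytes");
        System.out.println("single merged segment: " + afterMerge + " bytes");
    }
}
```

With these assumed numbers the per-segment norms add up to 100,000,000 bytes, while the merged segment needs 1,000,000,000 bytes, because every document now pays for every field seen anywhere in the index.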


[jira] [Resolved] (LUCENE-3634) remove old static main methods in core

2012-01-03 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3634.


Resolution: Fixed

> remove old static main methods in core
> --
>
> Key: LUCENE-3634
> URL: https://issues.apache.org/jira/browse/LUCENE-3634
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3634.patch
>
>
> We have a few random static main methods that I think are very rarely used... 
> we should remove them (IndexReader, UTF32ToUTF8, English).
> The IndexReader main lets you list / extract the sub-files from a CFS... I 
> think we should move this to a new tool in contrib/misc.


[jira] [Resolved] (LUCENE-3605) revisit segments.gen sleeping

2011-12-22 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3605.


Resolution: Fixed
Fix Version/s: 4.0, 3.6

> revisit segments.gen sleeping
> -
>
> Key: LUCENE-3605
> URL: https://issues.apache.org/jira/browse/LUCENE-3605
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3605.patch
>
>
> In LUCENE-3601, I worked up a change where we intentionally crash() all 
> un-fsynced files 
> in tests to ensure that we are calling sync on files when we should.
> I think this would be nice to do always (and with some fixes all tests pass).
> But this is super-slow sometimes because when we corrupt the unsynced 
> segments.gen, it causes
> SIS.read to take 500ms each time (and in CheckIndex for some reason we do 
> this twice, which seems wrong).
> I can work around this for now for tests (just do a partial crash that avoids 
> corrupting the segments.gen),
> but I wanted to create this issue for discussion about the 
> sleeping/non-fsyncing of segments.gen, just
> because I guess it's possible someone could hit this slowness.
>  


[jira] [Resolved] (LUCENE-3631) Remove write access from SegmentReader and possibly move to separate class or IndexWriter/BufferedDeletes/...

2011-12-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3631.


Resolution: Fixed

> Remove write access from SegmentReader and possibly move to separate class or 
> IndexWriter/BufferedDeletes/...
> -
>
> Key: LUCENE-3631
> URL: https://issues.apache.org/jira/browse/LUCENE-3631
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Attachments: LUCENE-3631.patch, LUCENE-3631.patch
>
>
> After LUCENE-3606 is finished, there are some TODOs:
> SegmentReader still contains (package-private) all delete logic including 
> crazy copyOnWrite for validDocs Bits. It would be good if SegmentReader 
> itself could be read-only like all other IndexReaders.
> There are two possibilities to do this:
> # the simple one: Subclass SegmentReader and make a RWSegmentReader that is 
> only used by IndexWriter/BufferedDeletes/... DirectoryReader will only use 
> the read-only SegmentReader. This would move all TODOs to a separate class. 
> Its reopen/clone method would always create a RO-SegmentReader (for NRT).
> # Remove all write and commit stuff from SegmentReader completely and move it 
> to IndexWriter's readerPool (it must be in readerPool as deletions need a 
> not-changing view on an index snapshot).
> Unfortunately the code is so complicated and I have no real experience in 
> those internals of IndexWriter so I did not want to do it with LUCENE-3606, I 
> just separated the code in SegmentReader and marked with TODO. Maybe Mike 
> McCandless can help :-)


[jira] [Resolved] (LUCENE-3658) NRTCachingDir has invalid asserts (if same file name is written twice)

2011-12-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3658.


Resolution: Fixed

> NRTCachingDir has invalid asserts (if same file name is written twice)
> --
>
> Key: LUCENE-3658
> URL: https://issues.apache.org/jira/browse/LUCENE-3658
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3658.patch
>
>
> Normally Lucene is write-once (except for segments.gen file, which 
> NRTCachingDir never caches), but in some tests (TestDoc, TestCrash) we can 
> write the same file more than once.
> I don't think NRTCachingDir should have these asserts, and I think on 
> createOutput it should remove any old file if present.
> I also found & fixed a possible concurrency issue (if more than one thread 
> syncs at the same time; IndexWriter doesn't ever do this today but it has in 
> the past).


[jira] [Resolved] (LUCENE-3598) Improve InfoStream class in trunk to be more consistent with logging-frameworks like slf4j/log4j/commons-logging

2011-12-20 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3598.


   Resolution: Fixed
Fix Version/s: 4.0

> Improve InfoStream class in trunk to be more consistent with 
> logging-frameworks like slf4j/log4j/commons-logging
> 
>
> Key: LUCENE-3598
> URL: https://issues.apache.org/jira/browse/LUCENE-3598
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 4.0
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.0
>
> Attachments: LUCENE-3598.patch, LUCENE-3598.patch, LUCENE-3598.patch, 
> LUCENE-3598.patch, LUCENE-3598.patch
>
>
> Followup on a [thread by Shai Erera on 
> java-dev@lao|http://lucene.472066.n3.nabble.com/IndexWriter-infoStream-is-final-td3537485.html]:
>  I already discussed with Robert that there is one thing missing. 
> Currently the IW only checks if the infoStream!=null and then passes the 
> message to the method, and that *may* ignore it. For your requirement it is 
> the case that this is enabled or disabled dynamically. Unfortunately if the 
> construction of the message is heavy, then this wastes resources.
> I would like to add another method to this class: abstract boolean 
> isEnabled() that can also be implemented. I would then replace all null 
> checks in IW by this method. The default config in IW would be changed to use 
> a NoOutputInfoStream that returns false here and ignores the message.
> A simple logger wrapper for e.g. log4j / slf4j then could look like (ignoring 
> component, could be enabled):
> {code:java}
> Logger log = YourLoggingFramework.getLogger(IndexWriter.class);
> public void message(String component, String message) {
>   log.debug(component + ": " + message);
> }
> public boolean isEnabled(String component) {
>   return log.isDebugEnabled();
> }
> {code}
> Using this you could enable/disable logging live by e.g. the log4j management 
> console of your app server by enabling/disabling IndexWriter.class logging.
> The changes are really simple:
> - PrintStreamInfoStream returns true, always, maybe make it dynamically 
> enable/disable to allow Shai's request
> - infoStream.getDefault() is never null and can never be set to null. Instead 
> the default is a singleton NoOutputInfoStream that returns false from 
> isEnabled(component).
> - All null checks on infoStream should be replaced by 
> infoStream.isEnabled(component), this is possible as always != null. There 
> are no slowdowns by this - it's like Collections.emptyList() instead of stupid 
> null checks.
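The pattern proposed above (a never-null default that reports itself disabled, plus an isEnabled check before building a costly message) can be sketched in plain Java. Everything here is an illustrative stand-in: `InfoStreamSketch`, `CollectingInfoStream`, `InfoStreamDemo`, and the `"IW"` component name are hypothetical and are not the classes from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical caller showing the isEnabled guard around an expensive message. */
final class InfoStreamDemo {
    static int expensiveCalls = 0;

    /** Stands in for message construction that is costly to perform. */
    static String expensiveDescription() {
        expensiveCalls++;
        return "flushed 3 segments";
    }

    static void logFlush(InfoStreamSketch infoStream) {
        // No null check needed: the default is a no-op instance, never null.
        if (infoStream.isEnabled("IW")) {
            infoStream.message("IW", expensiveDescription());
        }
    }

    public static void main(String[] args) {
        logFlush(InfoStreamSketch.NO_OUTPUT);
        CollectingInfoStream collecting = new CollectingInfoStream();
        logFlush(collecting);
        System.out.println(collecting.lines + ", expensive calls: " + expensiveCalls);
    }
}

/** Hypothetical stand-in for the proposed abstract API. */
abstract class InfoStreamSketch {
    abstract void message(String component, String message);

    abstract boolean isEnabled(String component);

    /** Singleton default: always disabled, drops every message. */
    static final InfoStreamSketch NO_OUTPUT = new InfoStreamSketch() {
        @Override void message(String component, String message) { /* ignored */ }
        @Override boolean isEnabled(String component) { return false; }
    };
}

/** Always-enabled implementation that keeps messages in memory for inspection. */
final class CollectingInfoStream extends InfoStreamSketch {
    final List<String> lines = new ArrayList<>();

    @Override void message(String component, String message) {
        lines.add(component + ": " + message);
    }

    @Override boolean isEnabled(String component) {
        return true;
    }
}
```

With the no-op default, the message is never constructed, so a disabled stream costs one virtual call instead of a string build.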


[jira] [Resolved] (LUCENE-3639) Add test case support for shard searching

2011-12-18 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3639.


Resolution: Fixed

> Add test case support for shard searching
> -
>
> Key: LUCENE-3639
> URL: https://issues.apache.org/jira/browse/LUCENE-3639
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0, 3.5
>
> Attachments: LUCENE-3639.patch, LUCENE-3639.patch
>
>
> New test case that helps stress test the APIs to support sharding


[jira] [Resolved] (LUCENE-3638) IndexReader.document always return a doc with all the stored fields loaded. And this can be slow for the indexed document contain huge fields

2011-12-14 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3638.


Resolution: Fixed

Thanks Peter!

> IndexReader.document always return a doc with all the stored fields loaded. 
> And this can be slow for the indexed document contain huge fields
> -
>
> Key: LUCENE-3638
> URL: https://issues.apache.org/jira/browse/LUCENE-3638
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0
> Environment: 64bit linux java 1.6
>Reporter: peter chang
>Priority: Minor
>  Labels: patch
> Fix For: 4.0
>
> Attachments: LUCENE-3638.patch, doc.fields.patch
>
>
> When generating a digest for documents with huge fields, it should be 
> unnecessary to load the whole field; only the interesting part of the field, 
> located via the offset information, is needed. But IndexReader always returns 
> the whole field content, so a customized StoredFieldsReader ends up loading 
> the field a second time.


[jira] [Resolved] (LUCENE-3531) Improve CachingWrapperFilter to optionally also cache acceptDocs, if identical to liveDocs

2011-12-13 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3531.


   Resolution: Fixed
Fix Version/s: 4.0

> Improve CachingWrapperFilter to optionally also cache acceptDocs, if 
> identical to liveDocs
> --
>
> Key: LUCENE-3531
> URL: https://issues.apache.org/jira/browse/LUCENE-3531
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 4.0
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3531.patch
>
>
> Spinoff from LUCENE-1536: This issue removed the different cache modes 
> completely and always applies the acceptDocs using 
> BitsFilteredDocIdSet.wrap(), the cache only contains raw DocIdSet without any 
> deletions/acceptDocs. For IndexReaders that are seldom reopened, this might 
> not be as performant as it could be. If the acceptDocs==IR.liveDocs, those 
> DocIdSet could also be cached with liveDocs applied.


[jira] [Resolved] (LUCENE-3586) Choose a specific Directory implementation running the CheckIndex main

2011-12-13 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3586.


Resolution: Fixed
Fix Version/s: 4.0, 3.6

Thanks Luca!

> Choose a specific Directory implementation running the CheckIndex main
> --
>
> Key: LUCENE-3586
> URL: https://issues.apache.org/jira/browse/LUCENE-3586
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Luca Cavanna
>Assignee: Luca Cavanna
>Priority: Minor
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3586.patch, LUCENE-3586.patch, LUCENE-3586.patch, 
> LUCENE-3586.patch
>
>
> It should be possible to choose a specific Directory implementation to use 
> during the CheckIndex process when we run it from its main.
> What about an additional main parameter?
> In fact, I'm experiencing some problems with MMapDirectory working with a big 
> segment, and after some failed attempts playing with maxChunkSize, I decided 
> to switch to another FSDirectory implementation but I needed to do that on my 
> own main.
> Should we also consider using a FileSwitchDirectory?
> I'm willing to contribute, could you please let me know your thoughts about 
> it?


[jira] [Resolved] (LUCENE-3627) CorruptIndexException on indexing after a failure occurs after segments file creation but before any bytes are written

2011-12-08 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3627.


Resolution: Fixed
Fix Version/s: 4.0, 3.6

> CorruptIndexException on indexing after a failure occurs after segments file 
> creation but before any bytes are written
> --
>
> Key: LUCENE-3627
> URL: https://issues.apache.org/jira/browse/LUCENE-3627
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.5
> Environment: lucene-3.5.0, src download from GA release 
> lucene.apache.org.
> Mac OS X 10.6.5, running tests in Eclipse Build id: 20100218-1602, 
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02-334, mixed mode)
>Reporter: Ken McCracken
>Assignee: Michael McCandless
>Priority: Critical
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3627.patch, LUCENE-3627_initial_proposal.txt, 
> TestCrashCausesCorruptIndex.java
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> FSDirectory.createOutput(..) uses a RandomAccessFile to do its work.  On my 
> system the default FSDirectory.open(..) creates an NIOFSDirectory.  If 
> createOutput is called on a segments_* file and a crash occurs after 
> RandomAccessFile creation (file system shows a segments_* file exists but has 
> zero bytes) but before any bytes are written to the file, subsequent 
> IndexWriters cannot proceed.  The difficulty is that it does not know how to 
> clear the empty segments_* file.  None of the file deletions will happen on 
> such a segment file because the opening bytes cannot be read to determine 
> format and version.
> An initial proposed patch file is attached below.


[jira] [Resolved] (LUCENE-3600) BlockJoinQuery advance fails on an assert in case of a single parent with child segment

2011-11-27 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3600.


Resolution: Fixed

> BlockJoinQuery advance fails on an assert in case of a single parent with 
> child segment
> ---
>
> Key: LUCENE-3600
> URL: https://issues.apache.org/jira/browse/LUCENE-3600
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/join
>Affects Versions: 3.5, 4.0
>Reporter: Shay Banon
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
>
> The BlockJoinQuery will fail on an assert when advance is called on a segment 
> with a single parent with a child. The call to 
> parentBits.prevSetBit(parentTarget - 1) will cause -1 to be returned, and the 
> assert will fail, though it's valid. Just removing the assert fixes the 
> problem, since nextDoc will handle it properly.
> Also, I don't understand the "assert parentTarget != 0;", with a comment of 
> each parent must have one child. There isn't really a reason to add this 
> constraint, as far as I can tell..., just call nextDoc in this case, no?
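The -1 case described above can be reproduced with a small sketch. It uses `java.util.BitSet.previousSetBit` as a stand-in for the `parentBits.prevSetBit` call in the report, so the bit-set class, `PrevParentSketch`, and `previousParent` are assumptions for illustration rather than the BlockJoinQuery code.

```java
import java.util.BitSet;

/** Hypothetical sketch of the "previous parent" lookup, using java.util.BitSet. */
final class PrevParentSketch {

    /** Returns the docID of the nearest parent before parentTarget, or -1 if none exists. */
    static int previousParent(BitSet parentBits, int parentTarget) {
        return parentBits.previousSetBit(parentTarget - 1);
    }

    public static void main(String[] args) {
        // Segment with one block: child at docID 0, its parent at docID 1.
        BitSet parentBits = new BitSet();
        parentBits.set(1);

        int prev = previousParent(parentBits, 1);
        System.out.println("previous parent = " + prev);
        // The children of the target parent start right after the previous parent.
        System.out.println("first child docID = " + (prev + 1));
    }
}
```

In this single-block segment there is no earlier parent, so -1 is a legitimate answer, and the first child is simply docID 0. An assert that requires a non-negative result would reject this valid state.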


[jira] [Resolved] (LUCENE-3577) rename expungeDeletes

2011-11-18 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3577.


Resolution: Fixed
Fix Version/s: 4.0, 3.5

> rename expungeDeletes
> -
>
> Key: LUCENE-3577
> URL: https://issues.apache.org/jira/browse/LUCENE-3577
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3577.patch
>
>
> Similar to optimize(), expungeDeletes() has a misleading name.
> We already had problems with this on the user list because TieredMergePolicy
> didn't 'expunge' all their deletes.
> Also I think expunge is the wrong word, because expunge makes it seem
> like you just wrangle up the deletes and kick them out of the party and
> that it should be fast.


[jira] [Resolved] (LUCENE-3562) Stop storing TermsEnum in CloseableThreadLocal inside Terms instance

2011-11-17 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3562.


Resolution: Fixed

> Stop storing TermsEnum in CloseableThreadLocal inside Terms instance
> 
>
> Key: LUCENE-3562
> URL: https://issues.apache.org/jira/browse/LUCENE-3562
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3562.patch, LUCENE-3562.patch
>
>
> We have sugar methods in Terms.java (docFreq, totalTermFreq, docs,
> docsAndPositions) that use a saved thread-private TermsEnum to do the
> lookups.
> But on apps that send many threads through Lucene, and/or have many
> segments, this can add up to a lot of RAM, especially if the codecs
> impl holds onto stuff.
> Also, Terms has a close method (closes the CloseableThreadLocal) which
> must be called, but we fail to do so in some places.
> These saved enums are the cause of the recent OOME in TestNRTManager
> (TestNRTManager.testNRTManager -seed
> 2aa27e1aec20c4a2:-4a5a5ecf46837d0e:-7c4f651f1f0b75d7 -mult 3
> -nightly).
> Really sharing these enums is a holdover from before Lucene queries
> would share state (ie, save the TermState from the first pass, and use
> it later to pull enums, get docFreq, etc.).  It's not helpful anymore,
> and it can use gobs of RAM, so I'd like to remove it.


[jira] [Resolved] (LUCENE-3575) Field names can be wrong for stored fields / term vectors after merging

2011-11-16 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3575.


Resolution: Fixed

I also ported the test case back to 3.x.

> Field names can be wrong for stored fields / term vectors after merging
> ---
>
> Key: LUCENE-3575
> URL: https://issues.apache.org/jira/browse/LUCENE-3575
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3575.patch
>
>
> The good news is this bug only exists in trunk... the bad news is it's
> been here for some time (created by accident in LUCENE-2881).  But the
> good news is it should strike fairly rarely.
> SegmentMerger sometimes incorrectly thinks it can bulk-copy TVs/stored
> fields when it cannot (because field numbers don't map to the same
> names across segments).
> I think it happens only with addIndexes, or indexes that have
> pre-trunk segments, and then SM falsely thinks it can bulk-merge only
> when the last field number has the same field name across segments.


[jira] [Resolved] (LUCENE-3578) TestSort testParallelMultiSort reproducible seed failure

2011-11-15 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3578.


   Resolution: Fixed
Fix Version/s: 4.0

Thanks selckin!

I had to generalize the check I committed for LUCENE-3572 to catch any embedded 
SlowMultiReaderWrappers...

> TestSort testParallelMultiSort reproducible seed failure
> 
>
> Key: LUCENE-3578
> URL: https://issues.apache.org/jira/browse/LUCENE-3578
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> trunk r1202157
> {code}
> [junit] Testsuite: org.apache.lucene.search.TestSort
> [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.978 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestSort 
> -Dtestmethod=testParallelMultiSort 
> -Dtests.seed=-2996f3e0f5d118c2:32c8e62dd9611f63:7a90f44586ae8263 
> -Dargs="-Dfile.encoding=UTF-8"
> [junit] WARNING: test method: 'testParallelMultiSort' left thread 
> running: Thread[pool-1-thread-1,5,main]
> [junit] WARNING: test method: 'testParallelMultiSort' left thread 
> running: Thread[pool-1-thread-2,5,main]
> [junit] WARNING: test method: 'testParallelMultiSort' left thread 
> running: Thread[pool-1-thread-3,5,main]
> [junit] NOTE: test params are: codec=Lucene40: 
> {short=Lucene40(minBlockSize=98 maxBlockSize=214), 
> contents=PostingsFormat(name=MockSep), byte=PostingsFormat(name=SimpleText), 
> int=Pulsing40(freqCutoff=4 minBlockSize=58 maxBlockSize=186), 
> string=PostingsFormat(name=NestedPulsing), i18n=Lucene40(minBlockSize=98 
> maxBlockSize=214), long=PostingsFormat(name=Memory), 
> double=Pulsing40(freqCutoff=4 minBlockSize=58 maxBlockSize=186), 
> parser=MockVariableIntBlock(baseBlockSize=88), float=Lucene40(minBlockSize=98 
> maxBlockSize=214), custom=PostingsFormat(name=MockRandom)}, 
> sim=RandomSimilarityProvider(queryNorm=false,coord=false): 
> {short=BM25(k1=1.2,b=0.75), tracer=DFR I(ne)B2, byte=DFR I(ne)B3(800.0), 
> contents=IB LL-LZ(0.3), int=DFR I(n)BZ(0.3), string=IB LL-D3(800.0), i18n=DFR 
> GB2, double=DFR I(ne)B2, long=DFR GB1, parser=DFR GL2, 
> float=BM25(k1=1.2,b=0.75), custom=DFR I(ne)Z(0.3)}, locale=ga_IE, 
> timezone=America/Louisville
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestSort]
> [junit] NOTE: Linux 3.0.6-gentoo amd64/Sun Microsystems Inc. 1.6.0_29 
> (64-bit)/cpus=8,threads=4,free=78022136,total=125632512
> [junit] -  ---
> [junit] Testcase: 
> testParallelMultiSort(org.apache.lucene.search.TestSort): FAILED
> [junit] expected:<[ZJ]I> but was:<[JZ]I>
> [junit] junit.framework.AssertionFailedError: expected:<[ZJ]I> but 
> was:<[JZ]I>
> [junit] at 
> org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1245)
> [junit] at 
> org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1216)
> [junit] at 
> org.apache.lucene.search.TestSort.runMultiSorts(TestSort.java:1202)
> [junit] at 
> org.apache.lucene.search.TestSort.testParallelMultiSort(TestSort.java:855)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:523)
> [junit] at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149)
> [junit] at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51)
> [junit] 
> [junit] 
> [junit] Test org.apache.lucene.search.TestSort FAILED
> {code}


[jira] [Resolved] (LUCENE-3572) MultiIndexDocValues pretends it can merge sorted sources

2011-11-15 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3572.


Resolution: Fixed

> MultiIndexDocValues pretends it can merge sorted sources
> 
>
> Key: LUCENE-3572
> URL: https://issues.apache.org/jira/browse/LUCENE-3572
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3572.patch
>
>
> Nightly build hit this failure:
> {noformat}
> ant test-core -Dtestcase=TestSort -Dtestmethod=testReverseSort 
> -Dtests.seed=791b126576b0cfab:-48895c7243ecc5d0:743c683d1c9f7768 
> -Dtests.multiplier=3 -Dargs="-Dfile.encoding=ISO8859-1"
> [junit] Testcase: testReverseSort(org.apache.lucene.search.TestSort): 
> Caused an ERROR
> [junit] expected:<[CEGIA]> but was:<[ACEGI]>
> [junit]   at 
> org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1248)
> [junit]   at 
> org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1216)
> [junit]   at 
> org.apache.lucene.search.TestSort.testReverseSort(TestSort.java:759)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:523)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51)
> {noformat}
> It's happening in the test for reverse-sort of a string field with DocValues, 
> when the test had gotten SlowMultiReaderWrapper.
> I committed a fix to the test to avoid testing this case, but we need a 
> better fix to the underlying bug.
> MultiIndexDocValues cannot merge sorted sources (I think?), yet somehow it's 
> pretending it can: in the above test, the three subs had BYTES_FIXED_SORTED 
> type, and the TypePromoter happily claims to merge these to 
> BYTES_FIXED_SORTED. I think MultiIndexDocValues should return null for the 
> sorted source in this case?


[jira] [Resolved] (LUCENE-3518) Add sort-by-term with DocValues

2011-11-13 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3518.


Resolution: Fixed

> Add sort-by-term with DocValues
> ---
>
> Key: LUCENE-3518
> URL: https://issues.apache.org/jira/browse/LUCENE-3518
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3518.patch, LUCENE-3518.patch, LUCENE-3518.patch
>
>
> There are two sorted byte[] types with DocValues (BYTES_VAR_SORTED,
> BYTES_FIXED_SORTED), so you can index this type, but you can't yet
> sort by it.
> So I added a FieldComparator just like TermOrdValComparator, except it
> pulls from the doc values instead.
> There are some small diffs, eg with doc values there are never null
> values (see LUCENE-3504).


[jira] [Resolved] (LUCENE-3454) rename optimize to a less cool-sounding name

2011-11-12 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3454.


   Resolution: Fixed
Fix Version/s: 4.0
   3.5

> rename optimize to a less cool-sounding name
> 
>
> Key: LUCENE-3454
> URL: https://issues.apache.org/jira/browse/LUCENE-3454
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.4, 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3454.patch, LUCENE-3454.patch
>
>
> I think users see the name optimize and feel they must do this, because who 
> wants a suboptimal system? but this probably just results in wasted time and 
> resources.
> maybe rename to collapseSegments or something?


[jira] [Resolved] (LUCENE-3443) Port 3.x FieldCache.getDocsWithField() to trunk

2011-11-10 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3443.


Resolution: Fixed

> Port 3.x FieldCache.getDocsWithField() to trunk
> ---
>
> Key: LUCENE-3443
> URL: https://issues.apache.org/jira/browse/LUCENE-3443
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3443.patch, LUCENE-3443.patch, LUCENE-3443.patch
>
>
> [Spinoff from LUCENE-3390]
> I think the approach in 3.x for handling un-valued docs, and making it
> possible to specify how such docs are sorted, is better than the
> solution we have in trunk.
> I like that FC has a dedicated method to get the Bits for docs with field
> -- easy for apps to directly use.  And I like that the
> bits have their own entry in the FC.
> One downside is that it's 2 passes to get values and valid bits, but
> I think we can fix this by passing optional bool to FC.getXXX methods
> indicating you want the bits, and then populate the FC entry for the
> missing bits as well.  (We can do that for 3.x and trunk). Then it's
> single pass.
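
The single-pass idea above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, java.util.BitSet stands in for Lucene's Bits, and a null entry in the input array stands in for a document with no value for the field.

```java
import java.util.BitSet;

// Hypothetical sketch: fill the values array and the "docs with field" bits in one pass.
final class ValuesWithBits {
    final int[] values;
    final BitSet docsWithField;

    private ValuesWithBits(int[] values, BitSet docsWithField) {
        this.values = values;
        this.docsWithField = docsWithField;
    }

    // rawValues[doc] == null means the doc had no value for the field.
    static ValuesWithBits load(Integer[] rawValues) {
        int[] values = new int[rawValues.length];
        BitSet bits = new BitSet(rawValues.length);
        for (int doc = 0; doc < rawValues.length; doc++) {
            if (rawValues[doc] != null) {
                values[doc] = rawValues[doc];
                bits.set(doc); // remember that this doc really had a value
            }
        }
        return new ValuesWithBits(values, bits);
    }
}
```

The bits are what let a caller tell a document that indexed 0 apart from a document that indexed nothing, since both read back as 0 from the values array.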


[jira] [Resolved] (LUCENE-3339) TestNRTThreads hangs in nightly 3.x builds

2011-11-09 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3339.


   Resolution: Fixed
Fix Version/s: (was: 3.5)
   3.4

Simon, yes I think so; I believe this was fixed in 3.4.0.

> TestNRTThreads hangs in nightly 3.x builds
> --
>
> Key: LUCENE-3339
> URL: https://issues.apache.org/jira/browse/LUCENE-3339
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 3.4
>
> Attachments: LUCENE-3339.patch
>
>
> Maybe we have a problem, maybe it's a bug in the test.
> But it's strange that lately the 3.x nightlies have been hanging here.


[jira] [Resolved] (LUCENE-3524) Add "direct" PackedInts.Reader impl, that reads directly from disk on each get

2011-11-07 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3524.


   Resolution: Fixed
Fix Version/s: 4.0

> Add "direct" PackedInts.Reader impl, that reads directly from disk on each get
> --
>
> Key: LUCENE-3524
> URL: https://issues.apache.org/jira/browse/LUCENE-3524
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3524.patch, LUCENE-3524.patch
>
>
> Spinoff from LUCENE-3518.
> If we had a direct PackedInts.Reader impl we could use that instead of
> the RandomAccessReaderIterator.


[jira] [Resolved] (LUCENE-3563) TestPagedBytes failure

2011-11-06 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3563.


   Resolution: Fixed
Fix Version/s: 4.0
   3.5

> TestPagedBytes failure
> --
>
> Key: LUCENE-3563
> URL: https://issues.apache.org/jira/browse/LUCENE-3563
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.5
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
>
> ant test -Dtestcase=TestPagedBytes -Dtestmethod=testDataInputOutput 
> -Dtests.seed=268db1f3329b70d:3125365bc9c56c90:116e02aa4a70ec2f 
> -Dtests.multiplier=5


[jira] [Resolved] (LUCENE-3539) IndexFormatTooOld/NewExc should try to include fileName + directory when possible

2011-11-04 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3539.


Resolution: Fixed

> IndexFormatTooOld/NewExc should try to include fileName + directory when 
> possible
> -
>
> Key: LUCENE-3539
> URL: https://issues.apache.org/jira/browse/LUCENE-3539
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3539.patch, LUCENE-3539.patch, LUCENE-3539.patch
>
>
> (Spinoff from http://markmail.org/thread/t6s7nn3ve765nojc )
> When we throw a too old/new exc we should try to include the full path to the 
> offending file, if possible.


[jira] [Resolved] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

2011-10-27 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2205.


   Resolution: Fixed
Fix Version/s: 4.0

Finally resolved; thanks Aaron!

> Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and 
> the index pointer long[] and create a more memory efficient data structure.
> ---
>
> Key: LUCENE-2205
> URL: https://issues.apache.org/jira/browse/LUCENE-2205
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
> Environment: Java5
>Reporter: Aaron McCurry
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-2205.patch, LUCENE-2205.patch, 
> RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, 
> TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, 
> lowmemory_w_utf8_encoding.patch, lowmemory_w_utf8_encoding.v4.patch, 
> patch-final.txt, rawoutput.txt
>
>
> Basically packing those three arrays into a byte array with an int array as 
> an index offset.  
> The performance benefits are staggering. On my test index (of size 6.2 GB, with 
> ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the 
> terminfos into memory was reduced to 17% of its original size, from 291.5 
> MB to 49.7 MB.  The random access speed has been made better by 1-2%, load 
> time of the segments is ~40% faster as well, and full GC's on my JVM were 
> made 7 times faster.
> I have already performed the work and am offering this code as a patch.  
> Currently all tests in the trunk pass with this new code enabled.  I did write 
> a system property switch to allow for the original implementation to be used 
> as well.
> -Dorg.apache.lucene.index.TermInfosReader=default or small
> I have also written a blog post about this patch; here is the link:
> http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
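
The general technique named above, many small objects packed into one byte array with an int array of offsets, can be sketched like this. It is not the code from the patch: the class name is hypothetical and only the term strings are packed here, whereas the issue also packs the TermInfos and index pointers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one byte[] for all term bytes, one int[] for where each term starts.
final class PackedTerms {
    private final byte[] data;
    private final int[] offsets; // offsets[i] = start of term i; offsets[size] = data.length

    PackedTerms(String[] terms) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets = new int[terms.length + 1];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = out.size();
            byte[] utf8 = terms[i].getBytes(StandardCharsets.UTF_8);
            out.write(utf8, 0, utf8.length);
        }
        offsets[terms.length] = out.size();
        data = out.toByteArray();
    }

    int size() {
        return offsets.length - 1;
    }

    // Decodes term i on demand from its slice of the shared array.
    String get(int i) {
        return new String(data, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
    }
}
```

The memory saving in a layout like this comes from replacing one heap object per term with two arrays in total, which also gives the garbage collector far fewer objects to trace.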


[jira] [Resolved] (LUCENE-3522) TermsFilter.getDocIdSet(context) NPE on missing field

2011-10-17 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3522.


   Resolution: Fixed
Fix Version/s: 4.0

Thanks Dan!

I committed to trunk and backported the test case to 3.x.  I had to add missing 
rd1/2.close() at the end of the test case.

> TermsFilter.getDocIdSet(context) NPE on missing field
> -
>
> Key: LUCENE-3522
> URL: https://issues.apache.org/jira/browse/LUCENE-3522
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/other
>Affects Versions: 4.0
>Reporter: Dan Climan
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3522.patch
>
>
> If the context does not contain the field for a term when calling 
> TermsFilter.getDocIdSet(AtomicReaderContext context) then a 
> NullPointerException is thrown due to not checking for null Terms before 
> getting iterator.


[jira] [Resolved] (LUCENE-3520) If the NRT reader hasn't changed then IndexReader.openIfChanged should return null

2011-10-16 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3520.


Resolution: Fixed

> If the NRT reader hasn't changed then IndexReader.openIfChanged should return 
> null
> --
>
> Key: LUCENE-3520
> URL: https://issues.apache.org/jira/browse/LUCENE-3520
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3520.patch, LUCENE-3520.patch
>
>
> I hit a failure in TestSearcherManager (NOTE: doesn't always fail):
> {noformat}
>   ant test -Dtestcase=TestSearcherManager -Dtestmethod=testSearcherManager 
> -Dtests.seed=459ac99a4256789c:-29b8a7f52497c3b4:145ae632ae9e1ecf
> {noformat}
> It was tripping the assert inside SearcherLifetimeManager.record,
> because two different IndexSearcher instances had different IR
> instances sharing the same version.  This was happening because
> IW.getReader always returns a new reader even when there are no
> changes.  I think we should fix that...
> Separately I found a deadlock in
> TestSearcherManager.testIntermediateClose, if the test gets
> SerialMergeScheduler and needs to merge during the second commit.


[jira] [Resolved] (LUCENE-3515) Possible slowdown of indexing/merging on 3.x vs trunk

2011-10-14 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3515.


Resolution: Fixed

Thank you Marc and Erick!  This was a devious issue and severely impacted merge 
performance for non-MMapDir impls.

> Possible slowdown of indexing/merging on 3.x vs trunk
> -
>
> Key: LUCENE-3515
> URL: https://issues.apache.org/jira/browse/LUCENE-3515
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3515.patch, LUCENE-3515.patch, 
> LUCENE-index-34.patch, LUCENE-index-40.patch, TestGenerationTime.java.3x, 
> TestGenerationTime.java.40, stdout-snow-leopard.tar.gz
>
>
> Opening an issue to pursue the possible slowdown Marc Sturlese uncovered.


[jira] [Resolved] (LUCENE-3519) BlockJoinCollector only allows retrieving groups for only one BlockJoinQuery

2011-10-14 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3519.


Resolution: Fixed

Thanks Mark!

> BlockJoinCollector only allows retrieving groups for only one BlockJoinQuery
> 
>
> Key: LUCENE-3519
> URL: https://issues.apache.org/jira/browse/LUCENE-3519
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/join
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3519.patch
>
>
> Spinoff from Mark Harwood's email (subject "BlockJoin concerns") to
> dev list.
> It's fine to use multiple nested joins in a single query, and
> BlockJoinCollector should let you retrieve the top groups for all of
> them.
> But currently it always returns null after the first query's groups
> have been retrieved, because of a silly bug.


[jira] [Resolved] (LUCENE-3510) BooleanScorer should not limit number of prohibited clauses

2011-10-14 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3510.


Resolution: Fixed

> BooleanScorer should not limit number of prohibited clauses
> ---
>
> Key: LUCENE-3510
> URL: https://issues.apache.org/jira/browse/LUCENE-3510
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3510.patch
>
>
> Today it's limited to 32, because it uses a separate bit in the mask
> for each clause.
> But I don't understand why it does this; I think all prohibited
> clauses can share a single boolean/bit?  Any match on a prohibited
> clause sets this bit and the doc is not collected; we don't need each
> prohibited clause to have a dedicated bit?
> We also use the mask for required clauses, but this code is now
> commented out (we always use BS2 if there are any required clauses);
> if we re-enable this code (and I think we should, at least in certain
> cases: I suspect it'd be faster than BS2 in many cases), I think we
> can cutover to an int count instead of bit masks, and then have no
> limit on the required clauses sent to BooleanScorer also.
> Separately I cleaned a few things up about BooleanScorer: all of the
> embedded scorer methods (nextDoc, docID, advance, score) now throw
> UOE, and the buckets are now pre-allocated instead of being allocated
> lazily per-sub-collect.
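
The argument about prohibited clauses can be sketched as below. This is an illustration only, not BooleanScorer itself; the class and method names are hypothetical, and each clause is reduced to a boolean saying whether it matched the current doc.

```java
// Hypothetical sketch: per-clause bit mask versus one shared "prohibited" flag.
final class ProhibitedFlagSketch {

    // One bit per prohibited clause: an int mask caps the clause count at 32.
    static boolean collectWithMask(boolean[] prohibitedMatches) {
        if (prohibitedMatches.length > 32) {
            throw new IllegalArgumentException("an int mask holds at most 32 clauses");
        }
        int mask = 0;
        for (int i = 0; i < prohibitedMatches.length; i++) {
            if (prohibitedMatches[i]) {
                mask |= 1 << i;
            }
        }
        return mask == 0; // collect only if no prohibited clause matched
    }

    // One shared flag: any prohibited match vetoes the doc, with no clause limit.
    static boolean collectWithSharedFlag(boolean[] prohibitedMatches) {
        boolean prohibited = false;
        for (boolean matched : prohibitedMatches) {
            prohibited |= matched;
        }
        return !prohibited;
    }
}
```

Both methods answer the same question, since the mask is only ever compared against zero; that is why the per-clause bits carry no extra information for prohibited clauses.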


[jira] [Resolved] (LUCENE-3504) DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value

2011-10-11 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3504.


Resolution: Won't Fix

> DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc 
> didn't have a value
> --
>
> Key: LUCENE-3504
> URL: https://issues.apache.org/jira/browse/LUCENE-3504
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> I'm looking at making a FieldComparator that uses DV's SortedSource to
> sort by string field (ie just like TermOrdValComparator, except using
> DV instead of FieldCache).  We already have comparators for int and
> float DV fields.
> But one thing I noticed is we can't detect documents that didn't have
> any value indexed vs documents that had empty byte[] indexed.
> This is easy to fix (and we used to do this): because these types are
> deref'd (ie, each doc stores an address, and then separately looks up
> the byte[] at that address), we can reserve ord/address 0 to mean "doc
> didn't have the field".  Then we should return null when you retrieve
> the BytesRef value for that field.
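
The reserved-ord proposal above can be sketched as follows. Note the issue was resolved as Won't Fix, so this shows the idea in the description, not what Lucene does. The class and its methods are hypothetical, plain byte[] stands in for BytesRef, and no de-duplication of equal values is attempted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ord 0 is reserved to mean "doc didn't have the field".
final class DerefBytesSketch {
    private final List<byte[]> valuesByOrd = new ArrayList<>();
    private final int[] ordByDoc; // defaults to 0, i.e. "missing"

    DerefBytesSketch(int maxDoc) {
        ordByDoc = new int[maxDoc];
        valuesByOrd.add(null); // slot 0 is never a real value
    }

    void set(int doc, byte[] value) {
        valuesByOrd.add(value);
        ordByDoc[doc] = valuesByOrd.size() - 1;
    }

    // Returns null for a doc with no value, and an empty array for an indexed empty value.
    byte[] get(int doc) {
        return valuesByOrd.get(ordByDoc[doc]);
    }
}
```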


[jira] [Resolved] (LUCENE-3486) Add SearcherLifetimeManager, so you can retrieve the same searcher you previously used

2011-10-10 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3486.


Resolution: Fixed

> Add SearcherLifetimeManager, so you can retrieve the same searcher you 
> previously used
> --
>
> Key: LUCENE-3486
> URL: https://issues.apache.org/jira/browse/LUCENE-3486
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3486.patch, LUCENE-3486.patch, LUCENE-3486.patch
>
>
> The idea is similar to SOLR-2809 (adding searcher leases to Solr).
> This utility class sits above whatever your source is for "the
> current" searcher (eg NRTManager, SearcherManager, etc.), and records
> (holds a reference to) each searcher in recent history.
> The idea is to ensure that when a user does a follow-on action (clicks
> next page, drills down/up), or when two or more searcher invocations
> within a single user search need to happen against the same searcher
> (eg in distributed search), you can retrieve the same searcher you
> used "last time".
> I think with the new searchAfter API (LUCENE-2215), doing follow-on
> searches on the same searcher is more important, since the "bottom"
> (score/docID) held for that API can easily shift when a new searcher
> is opened.
> When you do a "new" search, you record the searcher you used with the
> manager, and it returns to you a long token (currently just the
> IR.getVersion()), which you can later use to retrieve the same
> searcher.
> Separately you must periodically call prune(), to prune the old
> searchers, ideally from the same thread / at the same time that
> you open a new searcher.
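
The record / retrieve / prune cycle described above can be sketched like this. It is a simplified stand-in, not the class from the patch: the names and signatures are hypothetical, the searcher is a generic type, the caller passes in the version used as the token, pruning is by age only, and the reference counting a real searcher would need is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a token -> searcher registry with age-based pruning.
final class LifetimeManagerSketch<S> {

    private static final class Entry<T> {
        final T searcher;
        final long recordedAtMs;

        Entry(T searcher, long recordedAtMs) {
            this.searcher = searcher;
            this.recordedAtMs = recordedAtMs;
        }
    }

    private final Map<Long, Entry<S>> byToken = new ConcurrentHashMap<>();

    // Records the searcher under its version and returns that version as the token.
    long record(long version, S searcher, long nowMs) {
        byToken.putIfAbsent(version, new Entry<>(searcher, nowMs));
        return version;
    }

    // Returns the searcher recorded under the token, or null if it was pruned or never seen.
    S acquire(long token) {
        Entry<S> entry = byToken.get(token);
        return entry == null ? null : entry.searcher;
    }

    // Drops every searcher recorded more than maxAgeMs before nowMs.
    void prune(long nowMs, long maxAgeMs) {
        byToken.values().removeIf(e -> nowMs - e.recordedAtMs > maxAgeMs);
    }
}
```

A caller would hand the token back to the user along with the first page of results, then call acquire with it on the follow-on request and fall back to a fresh search if null comes back.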


[jira] [Resolved] (LUCENE-3502) Packed ints: move .getArray into Reader API

2011-10-10 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3502.


Resolution: Fixed

Thanks Simon and Robert.

> Packed ints: move .getArray into Reader API
> ---
>
> Key: LUCENE-3502
> URL: https://issues.apache.org/jira/browse/LUCENE-3502
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3502.patch, LUCENE-3502.patch
>
>
> This is a simple code cleanup... it's messy that a consumer of
> PackedInts.Reader must check whether the impl is Direct8/16/32/64 in
> order to get an array; it's better to move up the .getArray into the
> Reader interface and then make the DirectN impls package private.
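
The cleanup described above can be sketched as follows. The interface and classes are hypothetical stand-ins, not the PackedInts sources, and the convention of returning null when no backing array exists is an assumption made for this example.

```java
// Hypothetical sketch: getArray lives on the interface, so callers never check the impl type.
interface IntReaderSketch {
    long get(int index);

    int size();

    // Backing array if this impl has one, else null (assumed convention).
    Object getArray();
}

// Array-backed impl, one byte per value; can stay package-private.
final class Direct8Sketch implements IntReaderSketch {
    private final byte[] values;

    Direct8Sketch(byte[] values) {
        this.values = values;
    }

    public long get(int index) {
        return values[index] & 0xFFL; // treat the byte as unsigned
    }

    public int size() {
        return values.length;
    }

    public Object getArray() {
        return values;
    }
}

// Bit-packed impl holding up to 64 one-bit values in a long; it has no array to expose.
final class BitPackedSketch implements IntReaderSketch {
    private final long bits;
    private final int size;

    BitPackedSketch(long bits, int size) {
        this.bits = bits;
        this.size = size;
    }

    public long get(int index) {
        return (bits >>> index) & 1L;
    }

    public int size() {
        return size;
    }

    public Object getArray() {
        return null;
    }
}
```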


[jira] [Resolved] (LUCENE-3464) Rename IndexReader.reopen to make it clear that reopen may not happen

2011-10-05 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3464.


Resolution: Fixed

Thanks everyone!

> Rename IndexReader.reopen to make it clear that reopen may not happen
> -
>
> Key: LUCENE-3464
> URL: https://issues.apache.org/jira/browse/LUCENE-3464
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3464.3x.patch, LUCENE-3464.patch, 
> LUCENE-3464.patch
>
>
> Spinoff from LUCENE-3454 where Shai noted this inconsistency.
> IR.reopen sounds like an unconditional operation, which has trapped users in 
> the past into always closing the old reader instead of only closing it if the 
> returned reader is new.
> I think this hidden maybe-ness is trappy and we should rename it 
> (maybeReopen?  reopenIfNeeded?).
> In addition, instead of returning "this" when the reopen didn't happen, I 
> think we should return null to enforce proper usage of the maybe-ness of this 
> API.
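The calling pattern proposed above could look roughly like the following. This is a sketch under assumptions: Reader and Index are hypothetical stand-ins rather than Lucene classes, and reopenIfNeeded is just one of the candidate names floated in the issue. It shows why returning null makes the "maybe" explicit: the caller has to check before closing the old reader.

```java
final class MaybeReopenSketch {
    /** Hypothetical index; version advances whenever something is committed. */
    static final class Index {
        long version;
    }

    /** Hypothetical reader pinned to the index version it was opened on. */
    static final class Reader {
        final long version;
        boolean closed;
        Reader(long version) { this.version = version; }
        void close() { closed = true; }
    }

    /** Returns a new reader if the index changed, otherwise null. */
    static Reader reopenIfNeeded(Reader old, Index index) {
        return index.version == old.version ? null : new Reader(index.version);
    }

    /** Correct usage: close the old reader only when a new one came back. */
    static Reader refresh(Reader current, Index index) {
        Reader newer = reopenIfNeeded(current, index);
        if (newer == null) {
            return current; // nothing changed, keep using the old reader
        }
        current.close();
        return newer;
    }
}
```

With the older "return this" behavior, code that always closed the old reader would end up closing the reader it was about to keep using. A null return turns that silent mistake into one that is hard to overlook.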

[jira] [Resolved] (LUCENE-3477) Fix JFlex tokenizer compiler warnings

2011-09-30 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3477.


Resolution: Fixed

> Fix JFlex tokenizer compiler warnings
> -
>
> Key: LUCENE-3477
> URL: https://issues.apache.org/jira/browse/LUCENE-3477
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3477.patch
>
>
> We get lots of distracting fallthrough warnings running "ant compile"
> in modules/analysis, from the tokenizers generated from JFlex.
> Digging a bit, they actually do look spooky.
> So I managed to edit the JFlex inputs to insert a bunch of break
> statements in our rules, but I have no idea if this is
> right/dangerous, and it seems a bit weird having to do such insertions
> of "naked" breaks.
> But, this does fix all the warnings, and all tests pass...
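For readers unfamiliar with the warning, here is a small illustration of what a switch fallthrough is and what inserting a break changes. It is not taken from the JFlex-generated tokenizers; the class and method names are made up for this example. Compiling code like withFallthrough with javac -Xlint:fallthrough reports the kind of warning discussed above.

```java
final class FallthroughSketch {
    /** Case 1 has no break, so execution continues into case 2. */
    static String withFallthrough(int action) {
        StringBuilder out = new StringBuilder();
        switch (action) {
            case 1:
                out.append("A");
                // no break here: control falls into case 2
            case 2:
                out.append("B");
                break;
            default:
                out.append("?");
        }
        return out.toString();
    }

    /** Same rules with an explicit break after every action. */
    static String withBreaks(int action) {
        StringBuilder out = new StringBuilder();
        switch (action) {
            case 1:
                out.append("A");
                break;
            case 2:
                out.append("B");
                break;
            default:
                out.append("?");
        }
        return out.toString();
    }
}
```

The two methods differ only for action 1, which is why adding breaks is safe only if no rule relied on running the next rule's action.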

[jira] [Resolved] (LUCENE-3472) add back Document.getValues()

2011-09-27 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3472.


Resolution: Fixed

> add back Document.getValues()
> -
>
> Key: LUCENE-3472
> URL: https://issues.apache.org/jira/browse/LUCENE-3472
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Robert Muir
> Assignee: Michael McCandless
> Fix For: 4.0
>
>
> I'm porting some code to trunk's new Doc/Field APIs, and I keep running into 
> this pattern:
> {noformat}
> String[] values = doc.getValues("field");
> {noformat}
> But with the new APIs, this becomes a little too verbose:
> {noformat}
> IndexableField[] fields = doc.getFields("field");
> String[] values = new String[fields.length];
> for (int i = 0; i < values.length; i++) {
>   values[i] = fields[i].stringValue();
> }
> {noformat}
> I think we should probably add back the sugar API (with the same name)?
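One way such a sugar method could be shaped is sketched below. Document, IndexableField and StringField here are simplified, hypothetical stand-ins defined only for this example, and skipping fields without a string value is an assumption, not something stated in the issue.

```java
import java.util.ArrayList;
import java.util.List;

final class GetValuesSketch {
    /** Simplified stand-in for the field abstraction. */
    interface IndexableField {
        String name();
        String stringValue();
    }

    static final class StringField implements IndexableField {
        private final String name;
        private final String value;
        StringField(String name, String value) {
            this.name = name;
            this.value = value;
        }
        public String name() { return name; }
        public String stringValue() { return value; }
    }

    /** Simplified stand-in for a document holding an ordered field list. */
    static final class Document {
        private final List<IndexableField> fields = new ArrayList<>();

        void add(IndexableField field) { fields.add(field); }

        /** Sugar: all string values for the named field, in insertion order. */
        String[] getValues(String name) {
            List<String> result = new ArrayList<>();
            for (IndexableField field : fields) {
                // Assumption: fields without a string value are skipped.
                if (field.name().equals(name) && field.stringValue() != null) {
                    result.add(field.stringValue());
                }
            }
            return result.toArray(new String[0]);
        }
    }
}
```

The caller then gets back to the one-line form from the first snippet, with the loop living in one place instead of at every call site.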

[jira] [Resolved] (LUCENE-3471) TestNRTManager test failure

2011-09-27 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3471.


Resolution: Fixed
Fix Version/s: 4.0, 3.5

Thank you Charlie!

> TestNRTManager test failure
> ---
>
> Key: LUCENE-3471
> URL: https://issues.apache.org/jira/browse/LUCENE-3471
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Robert Muir
> Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
>
> reproduces for me

[jira] [Resolved] (LUCENE-3465) IndexSearcher fails to pass docBase to Collector when using ExecutorService

2011-09-26 Thread Michael McCandless (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3465.


Resolution: Fixed

> IndexSearcher fails to pass docBase to Collector when using ExecutorService
> ---
>
> Key: LUCENE-3465
> URL: https://issues.apache.org/jira/browse/LUCENE-3465
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.5
>
> Attachments: LUCENE-3465.patch
>
>
> This bug is causing the failure in TestSearchAfter.
> We are now always passing docBase 0 to Collector when you use ExecutorService 
> with IndexSearcher.
> This doesn't affect trunk (AtomicReaderContext carries the right docBase); 
> only 3.x.
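To see why a docBase of 0 is wrong, consider the toy model below. It is not IndexSearcher code; GlobalIdCollector and search are hypothetical names, and the per-segment loop is an assumed simplification. Each segment numbers its documents from 0, so the collector needs the segment's starting offset to produce index-wide document ids.

```java
import java.util.ArrayList;
import java.util.List;

final class DocBaseSketch {
    /** Toy collector that turns segment-local doc ids into global ones. */
    static final class GlobalIdCollector {
        private int docBase;
        final List<Integer> globalIds = new ArrayList<>();

        /** Called once per segment before its hits are collected. */
        void setNextReader(int docBase) { this.docBase = docBase; }

        /** doc is relative to the current segment. */
        void collect(int doc) { globalIds.add(docBase + doc); }
    }

    /**
     * Visits segments in order. When passDocBase is false, every segment
     * is announced with docBase 0, mimicking the bug described above.
     */
    static List<Integer> search(int[] segmentSizes, int[][] hitsPerSegment,
                                boolean passDocBase) {
        GlobalIdCollector collector = new GlobalIdCollector();
        int docBase = 0;
        for (int seg = 0; seg < segmentSizes.length; seg++) {
            collector.setNextReader(passDocBase ? docBase : 0);
            for (int doc : hitsPerSegment[seg]) {
                collector.collect(doc);
            }
            docBase += segmentSizes[seg]; // next segment starts after this one
        }
        return collector.globalIds;
    }
}
```

With two segments of sizes 3 and 2, a hit on local doc 1 of the second segment should map to global id 4. With docBase stuck at 0 it maps to 1, which collides with a document in the first segment.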

93 matches
