[jira] [Resolved] (LUCENE-3970) Rename getUnique[Field/Terms]Count() into size()
[ https://issues.apache.org/jira/browse/LUCENE-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3970.
Resolution: Fixed

Thanks Iulius!

> Rename getUnique[Field/Terms]Count() into size()
>
> Key: LUCENE-3970
> URL: https://issues.apache.org/jira/browse/LUCENE-3970
> Project: Lucene - Java
> Issue Type: Task
> Components: core/index
> Reporter: Iulius Curt
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3970.patch
>
> Like Robert Muir said in LUCENE-3109:
> {quote}Also I think there are other improvements we can do here that would be
> more natural:
> Fields.getUniqueFieldCount() -> Fields.size()
> Terms.getUniqueTermCount() -> Terms.size(){quote}
> I believe this dramatically improves understandability (way less 'scary',
> actually beautiful).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3942) SynonymFilter should set pos length att
[ https://issues.apache.org/jira/browse/LUCENE-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3942.
Resolution: Fixed

> SynonymFilter should set pos length att
> ---
>
> Key: LUCENE-3942
> URL: https://issues.apache.org/jira/browse/LUCENE-3942
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3942.patch
>
> Tokenizers/Filters can now produce graphs instead of a single linear
> chain of tokens, by setting the PositionLengthAttribute, expressing
> where (how many positions ahead) this token "ends".
> The default is 1, meaning it ends at the next position, to be
> backwards compatible.
> SynonymFilter produces graph output tokens, as long as the output is a
> single token, but currently never sets the pos length to express this.
> EG for the rule "wifi network -> hotspot", the hotspot token should
> have pos length = 2. With LUCENE-3940 this will allow us to verify
> that the offsets for such tokens are correct...
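The position-length idea in LUCENE-3942 can be illustrated with a small standalone Java sketch. It does not use Lucene's classes: `Token`, `positions()` and the token order shown are hypothetical and chosen only for illustration. Each token starts at the previous position plus its position increment and ends `posLen` positions later, so a "hotspot" token that replaces "wifi network" should span two positions.

```java
// Standalone illustration; Token and positions() are hypothetical names, not Lucene APIs.
final class PosLengthSketch {

    static final class Token {
        final String term;
        final int posInc; // positions advanced relative to the previous token
        final int posLen; // how many positions ahead this token ends (default 1)

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Returns {start, end} positions for each token, in stream order. */
    static int[][] positions(Token[] tokens) {
        int[][] spans = new int[tokens.length][2];
        int pos = -1; // so that a first token with posInc = 1 lands on position 0
        for (int i = 0; i < tokens.length; i++) {
            pos += tokens[i].posInc;
            spans[i][0] = pos;
            spans[i][1] = pos + tokens[i].posLen;
        }
        return spans;
    }

    public static void main(String[] args) {
        // Assumed output for the rule "wifi network -> hotspot" with the originals kept.
        Token[] tokens = {
            new Token("wifi", 1, 1),
            new Token("hotspot", 0, 2), // stacked on "wifi", spans both input positions
            new Token("network", 1, 1),
        };
        int[][] spans = positions(tokens);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i].term + ": " + spans[i][0] + " -> " + spans[i][1]);
        }
    }
}
```

With a position length of 1 the "hotspot" token would appear to end where "wifi" ends, which is the mismatch the issue describes.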
[jira] [Resolved] (LUCENE-3940) When Japanese (Kuromoji) tokenizer removes a punctuation token it should leave a hole
[ https://issues.apache.org/jira/browse/LUCENE-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3940.
Resolution: Fixed

> When Japanese (Kuromoji) tokenizer removes a punctuation token it should
> leave a hole
> -
>
> Key: LUCENE-3940
> URL: https://issues.apache.org/jira/browse/LUCENE-3940
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3940.patch, LUCENE-3940.patch, LUCENE-3940.patch,
> LUCENE-3940.patch
>
> I modified BaseTokenStreamTestCase to assert that the start/end
> offsets match for graph (posLen > 1) tokens, and this caught a bug in
> Kuromoji when the decompounding of a compound token has a punctuation
> token that's dropped.
> In this case we should leave hole(s) so that the graph is intact, ie,
> the graph should look the same as if the punctuation tokens were not
> initially removed, but then a StopFilter had removed them.
> This also affects tokens that have no compound over them, ie we fail
> to leave a hole today when we remove the punctuation tokens.
> I'm not sure this is serious enough to warrant fixing in 3.6 at the
> last minute...
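The "leave a hole" behaviour in LUCENE-3940 can be sketched in plain Java without any Lucene types. The class and method names here are hypothetical. The idea shown is the one the issue compares against: when a token is dropped, its position increment is folded into the next kept token, so the surviving tokens keep the positions they would have had.

```java
// Standalone illustration of leaving holes; not the Kuromoji or StopFilter implementation.
final class HoleSketch {

    /**
     * Drops the tokens flagged as punctuation and returns the position
     * increments of the remaining tokens, with each dropped token's
     * increment added to the next kept token.
     * Both arrays are assumed to have the same length.
     */
    static int[] incrementsAfterRemoval(int[] posIncs, boolean[] isPunct) {
        int kept = 0;
        for (boolean p : isPunct) {
            if (!p) kept++;
        }
        int[] out = new int[kept];
        int skipped = 0; // increments of dropped tokens not yet handed on
        int j = 0;
        for (int i = 0; i < posIncs.length; i++) {
            if (isPunct[i]) {
                skipped += posIncs[i];
            } else {
                out[j++] = posIncs[i] + skipped;
                skipped = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three tokens, the middle one is punctuation: the last token
        // ends up with an increment of 2, which is the hole.
        int[] result = incrementsAfterRemoval(new int[] {1, 1, 1},
                                              new boolean[] {false, true, false});
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Without the hole, the last token would have an increment of 1 and would look adjacent to the first one.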
[jira] [Resolved] (LUCENE-3932) Improve load time of .tii files
[ https://issues.apache.org/jira/browse/LUCENE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3932.
Resolution: Fixed
Fix Version/s: 4.0
Assignee: Michael McCandless

> Improve load time of .tii files
> ---
>
> Key: LUCENE-3932
> URL: https://issues.apache.org/jira/browse/LUCENE-3932
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 3.5
> Environment: Linux
> Reporter: Sean Bridges
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3932.trunk.patch, perf.csv
>
> We have a large 50 gig index which is optimized as one segment, with a 66 MEG
> .tii file. This index has no norms, and no field cache.
> It takes about 5 seconds to load this index. Profiling reveals that 60% of
> the time is spent in GrowableWriter.set(index, value), and most of the time in
> set(...) is spent resizing PackedInts.Mutable current.
> In the constructor for TermInfosReaderIndex, you initialize the writer with
> the line,
> {quote}GrowableWriter indexToTerms = new GrowableWriter(4, indexSize, false);{quote}
> For our index, using four as the bit estimate results in 27 resizes.
> The last value in indexToTerms is going to be ~ tiiFileLength, and if instead
> you use,
> {quote}int bitEstimate = (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
> GrowableWriter indexToTerms = new GrowableWriter(bitEstimate, indexSize, false);{quote}
> Load time improves to ~ 2 seconds.
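As a worked example of the bit estimate proposed in LUCENE-3932, the following standalone Java sketch evaluates the quoted formula for a 66 MB file. It assumes "66 MEG" means 66 * 1024 * 1024 bytes. The class name and the integer-only variant `bitsRequired` are additions for comparison, not code from the patch.

```java
// Standalone arithmetic sketch; names are hypothetical and nothing here calls Lucene.
final class BitEstimateSketch {

    /** The estimate quoted in the issue: ceil(log2(value)), computed with base-10 logs. */
    static int bitsByLog(long value) {
        return (int) Math.ceil(Math.log10(value) / Math.log10(2));
    }

    /** Integer-only alternative: bits needed to represent a non-negative value. */
    static int bitsRequired(long value) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(value));
    }

    public static void main(String[] args) {
        long tiiFileLength = 66L * 1024 * 1024; // assumption: 66 MiB = 69,206,016 bytes
        System.out.println("log-based estimate: " + bitsByLog(tiiFileLength));
        System.out.println("integer estimate:   " + bitsRequired(tiiFileLength));
    }
}
```

Both methods give 27 bits for this input, since 2^26 < 69,206,016 < 2^27. Mathematically the two differ at exact powers of two (log2(8) is 3, but storing the value 8 needs 4 bits), so the integer form may be the safer starting point if the largest value can land on such a boundary.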
[jira] [Resolved] (LUCENE-3966) smokeTestRelease should accept a local (file://) staging URL
[ https://issues.apache.org/jira/browse/LUCENE-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3966.
Resolution: Fixed
Fix Version/s: 4.0

> smokeTestRelease should accept a local (file://) staging URL
>
> Key: LUCENE-3966
> URL: https://issues.apache.org/jira/browse/LUCENE-3966
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3966.patch
>
> I'll also fix buildAndPushRelease so it can push to a local URL; this way at
> any time we can build, push to local staging, and run smoke tester on it, and
> hopefully nothing fails...
> But really any tests in smoke tester should ideally be pushed back earlier in
> our dev process (into jenkins, into "ant test").
[jira] [Resolved] (LUCENE-3109) Rename FieldsConsumer to InvertedFieldsConsumer
[ https://issues.apache.org/jira/browse/LUCENE-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3109.
Resolution: Fixed
Assignee: Michael McCandless

Thanks Iulius!

> Rename FieldsConsumer to InvertedFieldsConsumer
> ---
>
> Key: LUCENE-3109
> URL: https://issues.apache.org/jira/browse/LUCENE-3109
> Project: Lucene - Java
> Issue Type: Task
> Components: core/codecs
> Affects Versions: 4.0
> Reporter: Simon Willnauer
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3109.patch, LUCENE-3109.patch, LUCENE-3109.patch,
> LUCENE-3109.patch, LUCENE-3109.patch
>
> The name FieldsConsumer is misleading here; it really is an
> InvertedFieldsConsumer, and since we are extending codecs to consume
> non-inverted Fields we should be clear here. The same applies to Fields.java as
> well as FieldsProducer.
[jira] [Resolved] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests
[ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3873.
Resolution: Fixed
Fix Version/s: 4.0

> tie MockGraphTokenFilter into all analyzers tests
> -
>
> Key: LUCENE-3873
> URL: https://issues.apache.org/jira/browse/LUCENE-3873
> Project: Lucene - Java
> Issue Type: Task
> Components: modules/analysis
> Reporter: Robert Muir
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3873.patch, LUCENE-3873.patch
>
> Mike made a MockGraphTokenFilter on LUCENE-3848.
> Many filters currently aren't tested with anything but a simple tokenstream.
> We should test them with this, too; it might find bugs (zero-length terms,
> stacked terms/synonyms, etc)
[jira] [Resolved] (LUCENE-3955) smokeTestRelease should test solr example
[ https://issues.apache.org/jira/browse/LUCENE-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3955.
Resolution: Fixed

This was fixed w/ SOLR-3331.

> smokeTestRelease should test solr example
> -
>
> Key: LUCENE-3955
> URL: https://issues.apache.org/jira/browse/LUCENE-3955
> Project: Lucene - Java
> Issue Type: Improvement
> Components: general/build
> Reporter: Robert Muir
> Assignee: Michael McCandless
> Fix For: 4.0
>
> I think most anyone reviewing the solr artifacts will do this,
> so really the RM has to do it manually:
> but we can test 'ant example' from the source dist + java -jar start.jar from
> solr/example
> (or/and 'ant run-example'), and also java -jar start.jar from the binary
> distribution.
> some basic checks we can do are to run the test_utf8.sh, and to index the
> example docs
> (post.jar/post.sh the docs in exampledocs) then do a search.
[jira] [Resolved] (LUCENE-3905) BaseTokenStreamTestCase should test analyzers on real-ish content
[ https://issues.apache.org/jira/browse/LUCENE-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3905.
Resolution: Fixed
Fix Version/s: 4.0
               3.6

> BaseTokenStreamTestCase should test analyzers on real-ish content
> -
>
> Key: LUCENE-3905
> URL: https://issues.apache.org/jira/browse/LUCENE-3905
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3905.patch
>
> We already have LineFileDocs, that pulls content generated from europarl or
> wikipedia... I think sometimes BTSTC should test the analyzers on that as
> well.
[jira] [Resolved] (LUCENE-3898) possible SynonymFilter bug: hudson fail
[ https://issues.apache.org/jira/browse/LUCENE-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3898.
Resolution: Fixed
Fix Version/s: 4.0
               3.6

I think this is fixed...

> possible SynonymFilter bug: hudson fail
> ---
>
> Key: LUCENE-3898
> URL: https://issues.apache.org/jira/browse/LUCENE-3898
> Project: Lucene - Java
> Issue Type: Bug
> Components: modules/analysis
> Reporter: Robert Muir
> Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> See https://builds.apache.org/job/Lucene-trunk/1867/consoleText (no seed)
[jira] [Resolved] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil
[ https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3894.
Resolution: Fixed

> Make BaseTokenStreamTestCase a bit more evil
>
> Key: LUCENE-3894
> URL: https://issues.apache.org/jira/browse/LUCENE-3894
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3894.patch, LUCENE-3894.patch, LUCENE-3894.patch
>
> Throw an exception from the Reader while tokenizing, stop after not consuming
> all tokens, sometimes spoon-feed chars from the reader...
[jira] [Resolved] (LUCENE-783) Store all metadata in human-readable segments file
[ https://issues.apache.org/jira/browse/LUCENE-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-783.
---
Resolution: Fixed

Actually I think SimpleText's SegmentInfosFormat does this well?

> Store all metadata in human-readable segments file
> --
>
> Key: LUCENE-783
> URL: https://issues.apache.org/jira/browse/LUCENE-783
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Reporter: Marvin Humphrey
> Priority: Minor
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.0
>
> Various index-reading components in Lucene need metadata in addition to data.
> This metadata is presently stored in arbitrary binary headers and spread out
> over several files. We should move to concentrate it in a single file, and
> this file should be encoded using a human-readable, extensible, standardized
> data serialization language -- either XML or YAML.
> * Making metadata human-readable makes debugging easier. Centralizing it
> makes debugging easier still. Developers benefit from being able to scan
> and locate relevant information quickly and with less debug printing. Users
> get a new window through which to peer into the index structure.
> * Since metadata is written to a separate file, there would no longer be a
> need to seek back to the beginning of any data file to finish a header,
> solving issue LUCENE-532.
> * Special-case parsing code needed for extracting metadata supplied by
> different index formats can be pared down. If a value is no longer
> necessary, it can just be ignored/discarded.
> * Removing headers from the data files simplifies them and makes the file
> format easier to implement.
> * With headers removed, all or nearly all data structures can take the
> form of records stacked end to end, so that once a decoder has been
> selected, an iterator can read the file from top to tail. To an extent,
> this allows us to separate our data-processing algorithms from our
> serialization algorithms, decoupling Lucene's code base from its file
> format. For instance, instead of further subclassing TermDocs to deal with
> "flexible indexing" formats, we might replace it with a PostingList which
> returns a subclass of Posting. The deserialization code would be wholly
> contained within the Posting subclass rather than spread out over several
> subclasses of TermDocs.
> * YAML and XML are equally well suited for the task of storing metadata,
> but in either case a complete parser would not be needed -- a small subset
> of the language will do. KinoSearch 0.20's custom-coded YAML parser
> occupies about 600 lines of C -- not too bad, considering how miserable C's
> string handling capabilities are.
[jira] [Resolved] (LUCENE-1750) Create a MergePolicy that limits the maximum size of it's segments
[ https://issues.apache.org/jira/browse/LUCENE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1750.
Resolution: Duplicate
Fix Version/s: 3.2

TieredMergePolicy does this...

> Create a MergePolicy that limits the maximum size of it's segments
> --
>
> Key: LUCENE-1750
> URL: https://issues.apache.org/jira/browse/LUCENE-1750
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 2.4.1
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 4.0, 3.2
>
> Attachments: LUCENE-1750.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Basically I'm trying to create largish 2-4GB shards using
> LogByteSizeMergePolicy, however I've found in the attached unit
> test segments that exceed maxMergeMB.
> The goal is for segments to be merged up to 2GB, then all
> merging to that segment stops, and then another 2GB segment is
> created. This helps when replicating in Solr where if a single
> optimized 60GB segment is created, the machine stops working due
> to IO and CPU starvation.
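The goal stated in LUCENE-1750 (merge segments up to a size cap, then start a new one) can be sketched as a simple greedy grouping in Java. This is only an illustration of the requirement: it is not the algorithm used by TieredMergePolicy or LogByteSizeMergePolicy, and the class and method names are hypothetical.

```java
// Greedy size-capped grouping; a sketch of the stated goal, not a Lucene MergePolicy.
final class CappedMergeSketch {

    /**
     * Assigns each segment a merge group so that no group's total size
     * exceeds maxMergedBytes. Segments already at or above the cap are
     * left alone and get group -1. Segments are taken in the given order.
     */
    static int[] assignGroups(long[] segmentBytes, long maxMergedBytes) {
        int[] group = new int[segmentBytes.length];
        int currentGroup = 0;
        long currentBytes = 0;
        for (int i = 0; i < segmentBytes.length; i++) {
            long size = segmentBytes[i];
            if (size >= maxMergedBytes) {
                group[i] = -1; // already "full": never merged again
                continue;
            }
            if (currentBytes + size > maxMergedBytes) {
                currentGroup++; // adding this segment would exceed the cap
                currentBytes = 0;
            }
            group[i] = currentGroup;
            currentBytes += size;
        }
        return group;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long[] sizes = {3 * gb, gb, gb, gb, 2 * gb, gb};
        System.out.println(java.util.Arrays.toString(assignGroups(sizes, 2 * gb)));
    }
}
```

A group that ends up with a single segment simply means that segment is not merged in this round.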
[jira] [Resolved] (LUCENE-1922) exposing the ability to get the number of unique term count per field
[ https://issues.apache.org/jira/browse/LUCENE-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1922.
Resolution: Duplicate
Fix Version/s: 2.9

Fixed in LUCENE-1586.

> exposing the ability to get the number of unique term count per field
> -
>
> Key: LUCENE-1922
> URL: https://issues.apache.org/jira/browse/LUCENE-1922
> Project: Lucene - Java
> Issue Type: New Feature
> Components: core/index
> Affects Versions: 4.0
> Reporter: John Wang
> Fix For: 4.0, 2.9
>
> Add an api to get the number of unique term count given a field name, e.g.:
> IndexReader.getUniqueTermCount(String field)
> This issue has a dependency on LUCENE-1458
[jira] [Resolved] (LUCENE-1948) Deprecating InstantiatedIndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1948.
Resolution: Fixed

> Deprecating InstantiatedIndexWriter
> ---
>
> Key: LUCENE-1948
> URL: https://issues.apache.org/jira/browse/LUCENE-1948
> Project: Lucene - Java
> Issue Type: Task
> Components: modules/other
> Affects Versions: 2.9
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Fix For: 4.0
>
> Attachments: LUCENE-1948.patch
>
> http://markmail.org/message/j6ip266fpzuaibf7
> I suppose that should have been suggested before 2.9 rather than
> after...
> There are at least three reasons why I want to do this:
> The code is based on the behaviour of the Directory IndexWriter as of
> 2.3 and I have not been touching it since then. If there will be
> changes in the future one will have to keep IIW in sync, something
> that's easy to forget.
> There is no locking, which will cause concurrent modification
> exceptions when accessing the index via searcher/reader while
> committing.
> It uses the old token stream API so it has to be upgraded in case it
> should stay.
> The java- and package-level docs have, since it was committed, been
> suggesting that one should consider using II as if it was immutable
> due to the locklessness. My suggestion is that we make it immutable
> for real.
> Since II is meant for small corpora there is very little time lost by
> using the constructor that builds the index from an IndexReader. I.e.
> rather than using InstantiatedIndexWriter one would have to use a
> Directory and an IndexWriter and then pass an IndexReader to a new
> InstantiatedIndex.
> Any objections?
[jira] [Resolved] (LUCENE-2120) Possible file handle leak in near real-time reader
[ https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2120.
Resolution: Cannot Reproduce

> Possible file handle leak in near real-time reader
> --
>
> Key: LUCENE-2120
> URL: https://issues.apache.org/jira/browse/LUCENE-2120
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Affects Versions: 3.1
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Spinoff of LUCENE-1526: Jake/John hit file descriptor exhaustion when testing
> NRT.
> I've tried to repro this, stress testing NRT, saturating reopens, indexing,
> searching, but haven't found any issue.
> Let's try to get to the bottom of it, here...
[jira] [Resolved] (LUCENE-2276) Add IndexReader.document(int, Document, FieldSelector)
[ https://issues.apache.org/jira/browse/LUCENE-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2276.
Resolution: Duplicate
Fix Version/s: 4.0

The StoredFieldVisitor API (4.0) makes this possible...

> Add IndexReader.document(int, Document, FieldSelector)
> --
>
> Key: LUCENE-2276
> URL: https://issues.apache.org/jira/browse/LUCENE-2276
> Project: Lucene - Java
> Issue Type: Wish
> Components: core/search
> Reporter: Tim Smith
> Fix For: 4.0
>
> Attachments: LUCENE-2276+2539.patch, LUCENE-2276.patch
>
> The Document object passed in would be populated with the fields identified
> by the FieldSelector for the specified internal document id
> This method would allow reuse of Document objects when retrieving stored
> fields from the index
[jira] [Resolved] (LUCENE-2334) IndexReader.close() should call IndexReader.decRef() unconditionally ??
[ https://issues.apache.org/jira/browse/LUCENE-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2334.
Resolution: Won't Fix

> IndexReader.close() should call IndexReader.decRef() unconditionally ??
> ---
>
> Key: LUCENE-2334
> URL: https://issues.apache.org/jira/browse/LUCENE-2334
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 3.0.1
> Reporter: Mike Hanafey
> Priority: Minor
>
> IndexReader.close() is defined:
> {code}
>   /**
>    * Closes files associated with this index.
>    * Also saves any new deletions to disk.
>    * No other methods should be called after this has been called.
>    * @throws IOException if there is a low-level IO error
>    */
>   public final synchronized void close() throws IOException {
>     if (!closed) {
>       decRef();
>       closed = true;
>     }
>   }
> {code}
> This means that if the refCount is bigger than one, close() does not
> actually close, but it is also true that calling close() again has no effect.
> Why does close() not simply call decRef() unconditionally? This way if
> incRef() is called each time an instance of IndexReader were handed out, if
> close() is called by each recipient when they are done, the last one to call
> close will actually close the index. As written it seems the API is very
> confusing -- the first close() does one thing, but the next close() does
> something different.
> At a minimum the JavaDoc should clarify the behavior.
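The behaviour questioned in LUCENE-2334 can be reproduced with a small standalone Java sketch. It models only the reference counting described in the quoted snippet; `RefCountSketch` and its `isReleased()` helper are hypothetical and stand in for the real reader and its file handles.

```java
// Standalone model of the quoted close()/decRef() interaction; not Lucene's IndexReader.
final class RefCountSketch {
    private int refCount = 1;           // the creator holds the first reference
    private boolean closed = false;
    private boolean released = false;   // stand-in for "files actually closed"

    synchronized void incRef() {
        if (refCount <= 0) throw new IllegalStateException("already released");
        refCount++;
    }

    synchronized void decRef() {
        if (refCount <= 0) throw new IllegalStateException("too many decRef calls");
        if (--refCount == 0) {
            released = true;
        }
    }

    /** Mirrors the quoted close(): drops one reference, and only on the first call. */
    synchronized void close() {
        if (!closed) {
            decRef();
            closed = true;
        }
    }

    synchronized boolean isReleased() {
        return released;
    }

    public static void main(String[] args) {
        RefCountSketch reader = new RefCountSketch();
        reader.incRef();                          // handed to a second user
        reader.close();                           // drops the creator's reference
        reader.close();                           // no effect: close() is idempotent
        System.out.println(reader.isReleased());  // still referenced by the second user
        reader.decRef();                          // second user is done
        System.out.println(reader.isReleased());
    }
}
```

In this model a second holder must release with decRef() rather than close(), which is the asymmetry the reporter asks the JavaDoc to spell out.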
[jira] [Resolved] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2310.
Resolution: Fixed
Fix Version/s: 4.0

> Reduce Fieldable, AbstractField and Field complexity
>
> Key: LUCENE-2310
> URL: https://issues.apache.org/jira/browse/LUCENE-2310
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/index
> Reporter: Chris Male
> Fix For: 4.0
>
> Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch,
> LUCENE-2310-Deprecate-AbstractField.patch,
> LUCENE-2310-Deprecate-AbstractField.patch,
> LUCENE-2310-Deprecate-AbstractField.patch,
> LUCENE-2310-Deprecate-DocumentGetFields-core.patch,
> LUCENE-2310-Deprecate-DocumentGetFields.patch,
> LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch
>
> In order to move field-type-like functionality into its own class, we really
> need to try to tackle the hierarchy of Fieldable, AbstractField and Field.
> Currently AbstractField depends on Field, and does not provide much more
> functionality than storing fields, most of which are being moved over to
> FieldType. Therefore it seems ideal to try to deprecate AbstractField (and
> possibly Fieldable), moving much of the functionality into Field and
> FieldType.
[jira] [Resolved] (LUCENE-2338) Some tests catch Exceptions in separate threads and just print a stack trace - the test does not fail
[ https://issues.apache.org/jira/browse/LUCENE-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2338.
Resolution: Fixed
Fix Version/s: 4.0
               3.6

Our test framework fails tests w/ errant exceptions from threads now...

> Some tests catch Exceptions in separate threads and just print a stack trace
> - the test does not fail
> -
>
> Key: LUCENE-2338
> URL: https://issues.apache.org/jira/browse/LUCENE-2338
> Project: Lucene - Java
> Issue Type: Test
> Components: general/build
> Reporter: Uwe Schindler
> Fix For: 3.6, 4.0
>
> Some tests catch Exceptions in separate threads and just print a stack trace
> - the test does not fail. The test should fail. Since LUCENE-2274, the
> LuceneTestCase(J4) class installs an UncaughtExceptionHandler, so this type
> of catching and solely printing a Stack trace is a bad idea. Problem is, that
> the run() method of threads is not allowed to throw checked Exceptions.
> Two possibilities:
> - Catch checked Exceptions in the run() method and wrap into RuntimeException
> or call Assert.fail() instead
> - Use Executors
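One way to make a worker-thread failure fail the calling test, in the spirit of the first possibility listed in LUCENE-2338, is sketched below in plain Java. It is not the LuceneTestCase mechanism; `ThreadFailureSketch` and `runInThreadAndRethrow` are hypothetical names. The worker records whatever it throws, and the caller rethrows it after joining.

```java
// Standalone sketch: propagate a worker thread's failure to the calling thread.
final class ThreadFailureSketch {

    /** Runs work on a new thread and rethrows any failure as an AssertionError. */
    static void runInThreadAndRethrow(Runnable work) {
        final Throwable[] failure = new Throwable[1];
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } catch (Throwable t) {
                failure[0] = t; // record instead of only printing a stack trace
            }
        });
        worker.start();
        try {
            worker.join(); // after join() returns, the worker's write to failure[0] is visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting for worker", e);
        }
        if (failure[0] != null) {
            throw new AssertionError("worker thread failed", failure[0]);
        }
    }

    public static void main(String[] args) {
        runInThreadAndRethrow(() -> System.out.println("worker ok"));
        try {
            runInThreadAndRethrow(() -> { throw new IllegalStateException("boom"); });
        } catch (AssertionError expected) {
            System.out.println("caught: " + expected.getCause());
        }
    }
}
```

The second possibility in the issue, an ExecutorService, gives a similar effect because `Future.get()` rethrows the task's exception wrapped in an `ExecutionException`.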
[jira] [Resolved] (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.
[ https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2364.
Resolution: Fixed

Term now stores BytesRef internally...

> Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery &
> Co.
> -
>
> Key: LUCENE-2364
> URL: https://issues.apache.org/jira/browse/LUCENE-2364
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 4.0
> Reporter: Uwe Schindler
> Assignee: Michael McCandless
> Fix For: 4.0
>
> It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery
> (as both queries convert the strings to BytesRef internally). For
> NumericRange support in Solr it will be needed to support numerics as BytesRef
> in single-term queries.
> When this is added, don't forget to change TestNumericRangeQueryXX to
> use the BytesRef ctor of TRQ.
[jira] [Resolved] (LUCENE-2445) Perf improvements for the DocsEnum bulk read API
[ https://issues.apache.org/jira/browse/LUCENE-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2445.
    Resolution: Won't Fix

We removed bulk API in 4.0.

> Perf improvements for the DocsEnum bulk read API
> 
>
> Key: LUCENE-2445
> URL: https://issues.apache.org/jira/browse/LUCENE-2445
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Reporter: Michael McCandless
> Fix For: 4.0
>
> I started to work on LUCENE-2443, to create a test showing the problems, but it turns out none of the core codecs (even sep/intblock) ever set a non-zero offset.
> So I set forth to fix sep to do so, but ran into some issues w/ the current bulk-read API that we should fix to make it higher performance:
> * Filtering of deleted docs should be the caller's job (saves an extra pass through the docs)
> * Probably docs should arrive as deltas and caller sums these up to get the actual docID
> * Whether to load freqs or not should be separately controllable
> * We may want to require that the int[] for docs and freqs are "aligned", ie the offset into each is the same
> * Maybe we should separate out a BulkDocsEnum from DocsEnum. We can make it optional for codecs (ie, we can emulate BulkDocsEnum from the DocsEnum)
[jira] [Resolved] (LUCENE-2441) Create 3.x -> 4.0 index migration tool
[ https://issues.apache.org/jira/browse/LUCENE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2441.
    Resolution: Duplicate

We already have IndexUpgrader now.

> Create 3.x -> 4.0 index migration tool
> --
>
> Key: LUCENE-2441
> URL: https://issues.apache.org/jira/browse/LUCENE-2441
> Project: Lucene - Java
> Issue Type: New Feature
> Components: core/index
> Reporter: Michael McCandless
> Fix For: 4.0
>
> We need a tool to upgrade an index so that 4.0 can read it. I think the only change right now is the cutover to flex's standard codec format, but with LUCENE-2426 we also need to correct the term sort order to be true Unicode code point order.
[jira] [Resolved] (LUCENE-2505) The system cannot find the file specified - _0.fdt
[ https://issues.apache.org/jira/browse/LUCENE-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2505.
    Resolution: Incomplete

> The system cannot find the file specified - _0.fdt
> --
>
> Key: LUCENE-2505
> URL: https://issues.apache.org/jira/browse/LUCENE-2505
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Affects Versions: 2.4.1
> Reporter: Tej Kiran Sharma
>
> Hi,
> I am using Lucene version 2.4.1, and while indexing my files I got the following exception.
> I set up the IndexWriter as follows:
> Directory lucDirectory = FSDirectory.getDirectory(_sIndexPath);
> lucDirectory.setLockFactory(new SimpleFSLockFactory(_sIndexPath));
> lucWriter = new IndexWriter(lucDirectory, true, new KeywordAnalyzer(), true);
> lucWriter.setMergeFactor(10);
> lucWriter.setMaxMergeDocs(2147483647);
> lucWriter.setMaxBufferedDocs(1);
> lucWriter.setRAMBufferSizeMB(32);
> lucWriter.setUseCompoundFile(false);
> I am indexing and searching simultaneously, and I am getting the following exception < the system cannot find the file specified >
> "ERROR Exception while checking size - C:\00scripts\Temp\TempIndex\20104261030775\_0.fdt (The system cannot find the file specified)
> Stacktrace java.io.FileNotFoundException: C:\00scripts\Temp\TempIndex\20104261030775\_0.fdt (The system cannot find the file specified)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.<init>(Unknown Source)
> at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(Unknown Source)
> at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(Unknown Source)
> at org.apache.lucene.store.FSDirectory.openInput(Unknown Source)
> at org.apache.lucene.index.FieldsReader.<init>(Unknown Source)
> at org.apache.lucene.index.SegmentReader.initialize(Unknown Source)
> at org.apache.lucene.index.SegmentReader.get(Unknown Source)
> at org.apache.lucene.index.SegmentReader.get(Unknown Source)
> at org.apache.lucene.index.DirectoryIndexReader$1.doBody(Unknown Source)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown Source)
> at org.apache.lucene.index.DirectoryIndexReader.open(Unknown Source)
> at org.apache.lucene.index.IndexReader.open(Unknown Source)
> at org.apache.lucene.index.IndexReader.open(Unknown Source)
> at org.apache.lucene.search.IndexSearcher.<init>(Unknown Source)
> at com..main.apu.d(Unknown Source)
> at com..main.apu.a(Unknown Source)
> at com.main.arn.a(Unknown Source)
> at com.main.abh.b(Unknown Source)
> at com.main.abh.a(Unknown Source)
> at com..main.abh.f(Unknown Source)
> at com.main.eu.run(Unknown Source)"
[jira] [Resolved] (LUCENE-2530) rename docsEnum.getBulkResult() to make its role clearer
[ https://issues.apache.org/jira/browse/LUCENE-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2530.
    Resolution: Won't Fix

We removed bulk API in 4.0.

> rename docsEnum.getBulkResult() to make its role clearer
> 
>
> Key: LUCENE-2530
> URL: https://issues.apache.org/jira/browse/LUCENE-2530
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0
> Reporter: Andi Vajda
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 4.0
>
> Before docsEnum.read() can be called a BulkResult instance must be allocated for it (it == the default implementation of that method).
> This is done by calling docsEnum.getBulkResult(). Failure to call this method before read() is called results in a NullPointerException.
> It is somewhat counterintuitive to "get" the results of an operation before calling said operation.
> Maybe this method should be renamed to something more definite-sounding like obtainBulkResult() or prepareBulkResult() ?
[jira] [Resolved] (LUCENE-2948) Make var gap terms index a partial prefix trie
[ https://issues.apache.org/jira/browse/LUCENE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2948.
    Resolution: Won't Fix

I think BlockTree terms dict accomplished the same thing.

> Make var gap terms index a partial prefix trie
> --
>
> Key: LUCENE-2948
> URL: https://issues.apache.org/jira/browse/LUCENE-2948
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2948.patch, LUCENE-2948.patch, LUCENE-2948.patch, LUCENE-2948_automaton.patch, Results.png
>
> Var gap stores (in an FST) the indexed terms (every 32nd term, by default), minus their non-distinguishing suffixes.
> However, often times the resulting FST is "close" to a prefix trie in some portion of the terms space.
> By allowing some nodes of the FST to store all outgoing edges, including ones that do not lead to an indexed term, and by recording that this node is then "authoritative" as to what terms exist in the terms dict from that prefix, we can get some important benefits:
> * It becomes possible to know that a certain term prefix cannot exist in the terms index, which means we can save a disk seek in some cases (like PK lookup, docFreq, etc.)
> * We can query for the next possible prefix in the index, allowing some MTQs (eg FuzzyQuery) to save disk seeks.
> Basically, the terms index is able to answer questions that previously required seeking/scanning in the terms dict file.
[jira] [Resolved] (LUCENE-3177) Decouple indexer from Document/Field impls
[ https://issues.apache.org/jira/browse/LUCENE-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3177.
    Resolution: Fixed

> Decouple indexer from Document/Field impls
> --
>
> Key: LUCENE-3177
> URL: https://issues.apache.org/jira/browse/LUCENE-3177
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-3177.patch, LUCENE-3177.patch
>
> I think we should define minimal iterator interfaces, IndexableDocument/Field, that indexer requires to index documents.
> Indexer would consume only these bare minimum interfaces, not the concrete Document/Field/FieldType classes from oal.document package.
> Then, the Document/Field/FieldType hierarchy is one concrete impl of these interfaces. Apps are free to make their own impls as well.
> Maybe eventually we make another impl that enforces a global schema, eg factored out of Solr's impl.
> I think this frees design pressure on our Document/Field/FieldType hierarchy, ie, these classes are free to become concrete fully-featured "user-space" classes with all sorts of friendly sugar APIs for adding/removing fields, getting/setting values, types, etc., but they don't need substantial extensibility/hierarchy. Ie, the extensibility point shifts to IndexableDocument/Field interface.
> I think this means we can collapse the three classes we now have for a Field (Fieldable/AbstractField/Field) down to a single concrete class (well, except for LUCENE-2308 where we want to break out dedicated classes for different field types...).
[jira] [Resolved] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module
[ https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3272.
    Resolution: Fixed
    Fix Version/s: 4.0

> Consolidate Lucene's QueryParsers into a module
> ---
>
> Key: LUCENE-3272
> URL: https://issues.apache.org/jira/browse/LUCENE-3272
> Project: Lucene - Java
> Issue Type: Improvement
> Components: modules/queryparser
> Reporter: Chris Male
> Fix For: 4.0
>
> Lucene has a lot of QueryParsers and we should have them all in a single consistent place.
> The following are QueryParsers I can find that warrant moving to the new module:
> - Lucene Core's QueryParser
> - AnalyzingQueryParser
> - ComplexPhraseQueryParser
> - ExtendableQueryParser
> - Surround's QueryParser
> - PrecedenceQueryParser
> - StandardQueryParser
> - XML-Query-Parser's CoreParser
> All seem to do a good job at their kind of parsing with extensive tests.
> One challenge of consolidating these is that many tests use Lucene Core's QueryParser. One option is to just replicate this class in src/test and call it TestingQueryParser. Another option is to convert all tests over to programmatically building their queries (seems like a lot of work).
[jira] [Resolved] (LUCENE-3422) IndeIndexWriter.optimize() throws FileNotFoundException and IOException
[ https://issues.apache.org/jira/browse/LUCENE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3422. Resolution: Incomplete > IndeIndexWriter.optimize() throws FileNotFoundException and IOException > --- > > Key: LUCENE-3422 > URL: https://issues.apache.org/jira/browse/LUCENE-3422 > Project: Lucene - Java > Issue Type: Bug >Reporter: Elizabeth Nisha > > I am using lucene 3.0.2 search APIs for my application. > Indexed data is about 350MB and time taken for indexing is 25 hrs. Search > indexing and Optimization runs in two different threads. Optimization runs > for every 1 hour and it doesn't run while indexing is going on and vice > versa. When optimization is going on using IndexWriter.optimize(), > FileNotFoundException and IOException are seen in my log and the index file > is getting corrupted, log says > 1. java.io.IOException: No sub-file with id _5r8.fdt found > [The file name in this message changes over time (_5r8.fdt, _6fa.fdt, > _6uh.fdt, ..., _emv.fdt) ] > 2. java.io.FileNotFoundException: > /local/groups/necim/index_5.3/index/_bdx.cfs (No such file or directory) > 3. 
java.io.FileNotFoundException: > /local/groups/necim/index_5.3/index/_hkq.cfs (No such file or directory) > Stack trace: java.io.IOException: background merge hit exception: > _hkp:c100->_hkp _hkq:c100->_hkp _hkr:c100->_hkr _hks:c100->_hkr _hxb:c5500 > _hx5:c1000 _hxc:c198 > 84 into _hxd [optimize] [mergeDocStores] >at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2359) >at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2298) >at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2268) >at com.telelogic.cs.search.SearchIndex.doOptimize(SearchIndex.java:130) >at > com.telelogic.cs.search.SearchIndexerThread$1.run(SearchIndexerThread.java:337) >at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >at java.lang.Thread.run(Thread.java:662) > Caused by: java.io.FileNotFoundException: > /local/groups/necim/index_5.3/index/_hkq.cfs (No such file or directory) >at java.io.RandomAccessFile.open(Native Method) >at java.io.RandomAccessFile.(RandomAccessFile.java:212) >at > org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.(SimpleFSDirectory.java:76) >at > org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.(SimpleFSDirectory.java:97) >at > org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.(NIOFSDirectory.java:87) >at > org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:67) >at > org.apache.lucene.index.CompoundFileReader.(CompoundFileReader.java:67) >at > org.apache.lucene.index.SegmentReader$CoreReaders.(SegmentReader.java:114) >at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:590) >at > org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:616) >at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4309) >at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3965) >at > 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:231) >at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:288) > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3872) Index changes are lost if you call prepareCommit() then close()
[ https://issues.apache.org/jira/browse/LUCENE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3872.
    Resolution: Fixed

Thanks Tim!

> Index changes are lost if you call prepareCommit() then close()
> ---
>
> Key: LUCENE-3872
> URL: https://issues.apache.org/jira/browse/LUCENE-3872
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3872.patch, LUCENE-3872.patch
>
> You are supposed to call commit() after calling prepareCommit(), but... if you forget, and call close() after prepareCommit() without calling commit(), then any changes done after the prepareCommit() are silently lost (including adding/deleting docs, but also any completed merges).
> Spinoff from java-user thread "lots of .cfs (compound files) in the index directory" from Tim Bogaert.
> I think to fix this, IW.close should throw an IllegalStateException if prepareCommit() was called with no matching call to commit().
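The fix proposed in the issue (close() refusing to proceed while a prepared commit is outstanding) can be illustrated with a toy state machine. This is a minimal sketch, assuming a hypothetical ToyWriter class; it is not Lucene's IndexWriter and does not reflect how the committed patch is actually implemented.

```java
// Hypothetical sketch; ToyWriter only models the pending-commit check.
public class PendingCommitSketch {

  public static final class ToyWriter {
    private boolean pendingCommit; // true between prepareCommit() and commit()
    private boolean closed;

    public void prepareCommit() {
      ensureOpen();
      pendingCommit = true;
    }

    public void commit() {
      ensureOpen();
      pendingCommit = false; // the prepared state has been made durable
    }

    public void close() {
      // Refuse to close while a prepared commit was never finished, so later
      // changes cannot be dropped silently.
      if (pendingCommit) {
        throw new IllegalStateException(
            "prepareCommit() was called with no matching call to commit()");
      }
      closed = true;
    }

    public boolean isClosed() {
      return closed;
    }

    private void ensureOpen() {
      if (closed) {
        throw new IllegalStateException("writer is closed");
      }
    }
  }

  public static void main(String[] args) {
    ToyWriter writer = new ToyWriter();
    writer.prepareCommit();
    try {
      writer.close();
    } catch (IllegalStateException e) {
      System.out.println("close() rejected: " + e.getMessage());
    }
    writer.commit();
    writer.close();
  }
}
```

Failing loudly in close() turns a silent data loss into an error the caller sees immediately, which is the design choice the issue argues for.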
[jira] [Resolved] (LUCENE-3841) CloseableThreadLocal does not work well with Tomcat thread pooling
[ https://issues.apache.org/jira/browse/LUCENE-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3841.
    Resolution: Fixed

Thanks Matthew!

> CloseableThreadLocal does not work well with Tomcat thread pooling
> --
>
> Key: LUCENE-3841
> URL: https://issues.apache.org/jira/browse/LUCENE-3841
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/other
> Affects Versions: 3.5
> Environment: Lucene/Tika/Snowball running in a Tomcat web application
> Reporter: Matthew Bellew
> Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3841.patch
>
> We tracked down a large memory leak (effectively a leak anyway) caused by how Analyzer uses CloseableThreadLocal.
> CloseableThreadLocal.hardRefs holds references to Thread objects as keys. The problem is that it only frees these references in the set() method, and SnowballAnalyzer will only call set() when it is used by a NEW thread.
> The problem scenario is as follows:
> The server experiences a spike in usage (say by robots or whatever) and many threads are created and referenced by CloseableThreadLocal.hardRefs. The server quiesces and lets many of these threads expire normally. Now we have a smaller, but adequate thread pool. So CloseableThreadLocal.set() may not be called by SnowballAnalyzer (via Analyzer) for a _long_ time. The purge code is never called, and these threads, along with their thread local storage (Lucene related or not), are never cleaned up.
> I think calling the purge code in both get() and set() would have avoided this problem, but is potentially expensive. Perhaps using WeakHashMap instead of HashMap may also have helped. WeakHashMap purges on get() and set(). So this might be an efficient way to clean up threads in get(), while set() might do the more expensive Map.keySet() iteration.
> Our current workaround is to not share SnowballAnalyzer instances among HTTP searcher threads. We open and close one on every request.
> Thanks,
> Matt
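The WeakHashMap idea raised in the report can be sketched as a small per-thread holder. This is a simplified illustration under assumptions: the class name and its methods are hypothetical, it is not the structure of Lucene's CloseableThreadLocal, and it is not necessarily what the committed patch does. It also synchronizes on every access, which a real implementation would likely want to avoid.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch: per-thread values keyed weakly by Thread, so entries for
// threads that have died become eligible for removal once the Thread object is
// garbage collected. WeakHashMap expunges stale entries during its own
// operations, so reads help with cleanup too, not only writes.
public class WeakThreadLocalSketch<T> {

  // Values must not strongly reference their owning Thread, or the weak key
  // could never be collected.
  private final Map<Thread, T> values = new WeakHashMap<>();

  public synchronized T get() {
    return values.get(Thread.currentThread());
  }

  public synchronized void set(T value) {
    values.put(Thread.currentThread(), value);
  }

  // Number of threads that currently have an entry (after expunging stale ones).
  public synchronized int trackedThreads() {
    return values.size();
  }

  // Drops all per-thread values, mirroring the "closeable" part of the idea.
  public synchronized void close() {
    values.clear();
  }

  public static void main(String[] args) {
    WeakThreadLocalSketch<String> local = new WeakThreadLocalSketch<>();
    local.set("analyzer state for " + Thread.currentThread().getName());
    System.out.println(local.get());
    local.close();
  }
}
```

When stale entries are actually reclaimed depends on the garbage collector, so this sketch bounds the leak rather than guaranteeing prompt cleanup.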
[jira] [Resolved] (LUCENE-3855) TestStressNRT failures (reproducible)
[ https://issues.apache.org/jira/browse/LUCENE-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3855. Resolution: Fixed > TestStressNRT failures (reproducible) > - > > Key: LUCENE-3855 > URL: https://issues.apache.org/jira/browse/LUCENE-3855 > Project: Lucene - Java > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Michael McCandless >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3855.patch, > hoss-r1298470-fixed-seed__TEST-org.apache.lucene.index.TestStressNRT.xml, > output1.log, output2.log, output3.log, output4.log > > > Build server logs. Reproduces on at least two machines. > {noformat} > [junit] - Standard Error - > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressNRT > -Dtestmethod=test > -Dtests.seed=69468941c1bbf693:19e66d58475da929:69e9d2f81769b6d0 > -Dargs="-Dfile.encoding=UTF-8" > [junit] NOTE: test params are: codec=Lucene3x, > sim=RandomSimilarityProvider(queryNorm=true,coord=false): {}, locale=ro, > timezone=Etc/GMT+1 > [junit] NOTE: all tests run in this JVM: > [junit] [TestStressNRT] > [junit] NOTE: Linux 3.0.0-16-generic amd64/Sun Microsystems Inc. 
1.6.0_27 > (64-bit)/cpus=2,threads=1,free=74960064,total=135987200 > [junit] - --- > [junit] Testcase: test(org.apache.lucene.index.TestStressNRT):Caused > an ERROR > [junit] MockDirectoryWrapper: cannot close: there are still open files: > {_ng.cfs=8} > [junit] java.lang.RuntimeException: MockDirectoryWrapper: cannot close: > there are still open files: {_ng.cfs=8} > [junit] at > org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:555) > [junit] at > org.apache.lucene.index.TestStressNRT.test(TestStressNRT.java:385) > [junit] at > org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:743) > [junit] at > org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:639) > [junit] at > org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22) > [junit] at > org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:538) > [junit] at > org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:600) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) > [junit] at > org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:21) > [junit] at > org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22) > [junit] Caused by: java.lang.RuntimeException: unclosed IndexInput: > _ng.cfs > [junit] at > org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:479) > [junit] at > org.apache.lucene.store.MockDirectoryWrapper$1.openSlice(MockDirectoryWrapper.java:777) > [junit] at > org.apache.lucene.store.CompoundFileDirectory.openInput(CompoundFileDirectory.java:221) > [junit] at > 
org.apache.lucene.codecs.lucene3x.TermInfosReader.(TermInfosReader.java:112) > [junit] at > org.apache.lucene.codecs.lucene3x.Lucene3xFields.(Lucene3xFields.java:84) > [junit] at > org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat$1.(PreFlexRWPostingsFormat.java:51) > [junit] at > org.apache.lucene.codecs.lucene3x.PreFlexRWPostingsFormat.fieldsProducer(PreFlexRWPostingsFormat.java:51) > [junit] at > org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:108) > [junit] at > org.apache.lucene.index.SegmentReader.(SegmentReader.java:51) > [junit] at > org.apache.lucene.index.IndexWriter$ReadersAndLiveDocs.getMergeReader(IndexWriter.java:521) > [junit] at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3587) > [junit] at > org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3257) > [junit] at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382) > [junit] at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451) > [junit] > [junit] > [junit] Test org.apache.lucene.index.TestStressNRT FAILED > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information
[jira] [Resolved] (LUCENE-3831) Passing a null fieldname to MemoryFields#terms in MemoryIndex throws a NPE
[ https://issues.apache.org/jira/browse/LUCENE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3831.
    Resolution: Fixed
    Fix Version/s: 4.0, 3.6

Thanks Alan.
I couldn't provoke an NPE on 3.x but I still fixed SpanWeight to not pass on a null field to IR.norms.

> Passing a null fieldname to MemoryFields#terms in MemoryIndex throws a NPE
> --
>
> Key: LUCENE-3831
> URL: https://issues.apache.org/jira/browse/LUCENE-3831
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0
> Reporter: Alan Woodward
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3831.patch, TestNullFieldAfterRegexpRewrite.java, mindex-null-field.patch
>
> I found this when querying a MemoryIndex using a RegexpQuery wrapped by a SpanMultiTermQueryWrapper. If the regexp doesn't match anything in the index, it gets rewritten to an empty SpanOrQuery with a null field value, which then triggers the NPE.
[jira] [Resolved] (LUCENE-3851) TestTermInfosReaderIndex failing (always reproducible)
[ https://issues.apache.org/jira/browse/LUCENE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3851. Resolution: Fixed Fix Version/s: 3.6 Thanks Dawid! > TestTermInfosReaderIndex failing (always reproducible) > -- > > Key: LUCENE-3851 > URL: https://issues.apache.org/jira/browse/LUCENE-3851 > Project: Lucene - Java > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Michael McCandless >Priority: Minor > Fix For: 3.6, 4.0 > > > Always fails on branch (use reproduce string below): > git clone --depth 1 -b rr g...@github.com:dweiss/lucene_solr.git > {noformat} > [junit4] Running org.apache.lucene.codecs.lucene3x.TestTermInfosReaderIndex > [junit4] FAILURE 0.04s J0 | TestTermInfosReaderIndex.testSeekEnum > [junit4]> Throwable #1: java.lang.AssertionError: > expected: but was:<:> > [junit4]> at > __randomizedtesting.SeedInfo.seed([C7597DFBBE0B3D7D:C6D9CEDD0700AAFF]:0) > [junit4]> at org.junit.Assert.fail(Assert.java:93) > [junit4]> at org.junit.Assert.failNotEquals(Assert.java:647) > [junit4]> at org.junit.Assert.assertEquals(Assert.java:128) > [junit4]> at org.junit.Assert.assertEquals(Assert.java:147) > [junit4]> at > org.apache.lucene.codecs.lucene3x.TestTermInfosReaderIndex.testSeekEnum(TestTermInfosReaderIndex.java:137) > [junit4]> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > [junit4]> at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > [junit4]> at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > [junit4]> at java.lang.reflect.Method.invoke(Method.java:597) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1766) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$1000(RandomizedRunner.java:141) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:728) > [junit4]> at > 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:789) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:803) > [junit4]> at > org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:744) > [junit4]> at > org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:636) > [junit4]> at > org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22) > [junit4]> at > org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:550) > [junit4]> at > org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:600) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:735) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:141) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:586) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:605) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:641) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:652) > [junit4]> at > org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:533) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:141) > [junit4]> at > com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:479) > [junit4]> > [junit4] 2> NOTE: reproduce with: ant test > -Dtests.filter=*.TestTermInfosReaderIndex -Dtests.filter.method=testSeekEnum > 
-Drt.seed=C7597DFBBE0B3D7D -Dargs="-Dfile.encoding=UTF-8" > [junit4] 2> > [junit4]> (@AfterClass output) > [junit4] 2> NOTE: test params are: codec=Appending, sim=DefaultSimilarity, > locale=en, timezone=Atlantic/Stanley > [junit4] 2> NOTE: all tests run in this JVM: > [junit4] 2> [TestLock, TestFileSwitchDirectory, TestWildcardRandom, > TestVersionComparator, TestTermdocPerf, TestBi
[jira] [Resolved] (LUCENE-3003) Move UnInvertedField into Lucene core
[ https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3003.
    Resolution: Fixed
    Fix Version/s: (was: 3.6)

> Move UnInvertedField into Lucene core
> -
>
> Key: LUCENE-3003
> URL: https://issues.apache.org/jira/browse/LUCENE-3003
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3003.patch, LUCENE-3003.patch, byte_size_32-bit-openjdk6.txt
>
> Solr's UnInvertedField lets you quickly look up all term ords for a given doc/field.
> Like FieldCache, it inverts the index to produce this, and creates a RAM-resident data structure holding the bits; but, unlike FieldCache, it can handle multiple values per doc, and it does not hold the term bytes in RAM. Rather, it holds only term ords, and then uses TermsEnum to resolve ord -> term.
> This is great eg for faceting, where you want to use int ords for all of your counting, and then only at the end you need to resolve the "top N" ords to their text.
> I think this is a useful core functionality, and we should move most of it into Lucene's core. It's a good complement to FieldCache. For this first baby step, I just move it into core and refactor Solr's usage of it.
> After this, as separate issues, I think there are some things we could explore/improve:
> * The first-pass that allocates lots of tiny byte[] looks like it could be inefficient. Maybe we could use the byte slices from the indexer for this...
> * We can improve the RAM efficiency of the TermIndex: if the codec supports ords, and we are operating on one segment, we should just use it. If not, we can use a more RAM-efficient data structure, eg an FST mapping to the ord.
> * We may be able to improve on the main byte[] representation by using packed ints instead of delta-vInt?
> * Eventually we should fold this ability into docvalues, ie we'd write the byte[] image at indexing time, and then loading would be fast, instead of uninverting
[jira] [Resolved] (LUCENE-3824) TermOrdVal/DocValuesComparator does too much work in compareBottom
[ https://issues.apache.org/jira/browse/LUCENE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3824. Resolution: Fixed > TermOrdVal/DocValuesComparator does too much work in compareBottom > -- > > Key: LUCENE-3824 > URL: https://issues.apache.org/jira/browse/LUCENE-3824 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3824.patch > > > We now have logic to fall back to by-value comparison, when the bottom > slot is not from the current reader. > But this is silly, because if the bottom slot is from a different > reader, it means the tie-break case is not possible (since the current > reader didn't have the bottom value), so when the incoming ord equals > the bottom ord we should always return x > 0. > I added a new random string sort test case to TestSort... > I also renamed DocValues.SortedSource.getByValue -> getOrdByValue and > cleaned up some whitespace. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
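The reasoning in LUCENE-3824 can be sketched in plain Java. This is a simplified model, not Lucene's comparator: the class name, the term array, and the `lowerBoundOrd` helper are assumptions made for illustration. It assumes that when the bottom value comes from another reader and is absent from the current segment, the bottom ord is set to the ord of the largest term that sorts before it.

```java
import java.util.Arrays;

// Hypothetical, simplified model of an ord-based sort comparator; not Lucene's actual class.
class OrdCompareSketch {
    // Sorted, distinct terms of the current segment; a term's ord is its array index.
    static final String[] TERMS = {"apple", "cherry", "melon"};

    // Ord of the largest term <= value, or -1 if every term sorts after value.
    static int lowerBoundOrd(String value) {
        int idx = Arrays.binarySearch(TERMS, value);
        return idx >= 0 ? idx : -idx - 2;
    }

    // Returns a value > 0 when the bottom value sorts after the document's value.
    static int compareBottom(int bottomOrd, boolean bottomSameReader, int docOrd) {
        if (bottomSameReader) {
            // Ords are exactly comparable, ties included.
            return bottomOrd - docOrd;
        }
        // The bottom value is absent from this segment, so equal ords cannot be a tie:
        // the document's term is strictly smaller than the bottom value.
        return bottomOrd >= docOrd ? 1 : -1;
    }

    public static void main(String[] args) {
        int bottomOrd = lowerBoundOrd("banana"); // "banana" is not in TERMS
        System.out.println(bottomOrd);
        System.out.println(compareBottom(bottomOrd, false, 0)); // doc holds "apple"
        System.out.println(compareBottom(bottomOrd, false, 1)); // doc holds "cherry"
    }
}
```

No by-value comparison is needed in either branch, which is the saving the issue describes.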
[jira] [Resolved] (LUCENE-3829) Lucene40 codec's DocValues DirectSource impls aren't thread-safe
[ https://issues.apache.org/jira/browse/LUCENE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3829. Resolution: Invalid Duh, thanks Simon ;) Once I fixed the test to use the API correctly, it passes! > Lucene40 codec's DocValues DirectSource impls aren't thread-safe > > > Key: LUCENE-3829 > URL: https://issues.apache.org/jira/browse/LUCENE-3829 > Project: Lucene - Java > Issue Type: Bug >Reporter: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3829.patch > > > Our DirectSource impls hold IndexInput(s) open against the dat/idx > files, which we then seek + read when loading a specific document's > value. But this is in no way protected against multiple threads > I think...? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3820) Wrong trailing index calculation in PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/LUCENE-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3820. Resolution: Fixed Thanks Dawid! > Wrong trailing index calculation in PatternReplaceCharFilter > > > Key: LUCENE-3820 > URL: https://issues.apache.org/jira/browse/LUCENE-3820 > Project: Lucene - Java > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3820.patch, LUCENE-3820.patch, > LUCENE-3820_test.patch, LUCENE-3820_test.patch > > > Reimplementation of PatternReplaceCharFilter to pass randomized tests (used > to throw exceptions previously). Simplified code, dropped boundary > characters, full input buffered for pattern matching. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3827) Make term offsets work in MemoryIndex
[ https://issues.apache.org/jira/browse/LUCENE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3827. Resolution: Fixed Fix Version/s: 4.0 I just committed this. Thanks Alan! > Make term offsets work in MemoryIndex > - > > Key: LUCENE-3827 > URL: https://issues.apache.org/jira/browse/LUCENE-3827 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Affects Versions: 4.0 >Reporter: Alan Woodward >Assignee: Michael McCandless >Priority: Minor > Fix For: 4.0 > > Attachments: mindex.patch > > > Fix the logic for retrieving term offsets from DocsAndPositionsEnum on a > MemoryIndex, and allow subclasses to access them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3776) NRTManager shouldn't expose its private SearcherManager
[ https://issues.apache.org/jira/browse/LUCENE-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3776. Resolution: Fixed Thanks Shai! > NRTManager shouldn't expose its private SearcherManager > --- > > Key: LUCENE-3776 > URL: https://issues.apache.org/jira/browse/LUCENE-3776 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Blocker > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3776.patch, LUCENE-3776.patch, LUCENE-3776.patch > > > Spinoff from LUCENE-3769. > To actually obtain an IndexSearcher from NRTManager, it's a 2-step process > now. > You must .getSearcherManager(), then .acquire() from the returned > SearcherManager. > This is very trappy... because if the app incorrectly calls maybeReopen on > that private SearcherManager (instead of NRTManager.maybeReopen) then it can > unexpectedly cause threads to block forever, waiting for the necessary gen to > become visible. This will be hard to debug... I don't like creating trappy > APIs. > Hopefully once LUCENE-3761 is in, we can fix NRTManager to no longer expose > its private SM, instead subclassing ReferenceManager. > Or alternatively, or in addition, maybe we factor out a new interface > (SearcherProvider or something...) that only has acquire and release methods, > and both NRTManager and ReferenceManager/SM impl that, and we keep > NRTManager's SM private.
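The narrower interface floated in LUCENE-3776 might look roughly like the sketch below. Only the name SearcherProvider and the acquire/release pair come from the issue text; the generic parameter, `CountingProvider`, and the demo class are assumptions made for illustration and are not Lucene code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface: exposes only acquire and release, nothing that can trigger a reopen.
interface SearcherProvider<S> {
    S acquire();
    void release(S searcher);
}

// Toy implementation that counts outstanding references; a String stands in for a searcher.
class CountingProvider implements SearcherProvider<String> {
    final AtomicInteger outstanding = new AtomicInteger();

    @Override
    public String acquire() {
        outstanding.incrementAndGet();
        return "searcher";
    }

    @Override
    public void release(String searcher) {
        outstanding.decrementAndGet();
    }
}

class SearcherProviderDemo {
    // Callers pair every acquire with a release in a finally block.
    static int search(SearcherProvider<String> provider) {
        String s = provider.acquire();
        try {
            return s.length(); // placeholder for running a query
        } finally {
            provider.release(s);
        }
    }

    public static void main(String[] args) {
        CountingProvider provider = new CountingProvider();
        System.out.println(search(provider));
        System.out.println(provider.outstanding.get());
    }
}
```

Because callers only ever see acquire and release, they have no handle on which to call maybeReopen by mistake.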
[jira] [Resolved] (LUCENE-3769) Simplify NRTManager
[ https://issues.apache.org/jira/browse/LUCENE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3769. Resolution: Fixed I'll open follow-on issue for the nasty trap... > Simplify NRTManager > --- > > Key: LUCENE-3769 > URL: https://issues.apache.org/jira/browse/LUCENE-3769 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3769.patch, LUCENE-3769.patch > > > NRTManager is hairy now, because the applyDeletes is separately passed > to ctor, passed to maybeReopen, passed to getSearcherManager, etc. > I think, instead, you should pass it only to the ctor, and if you have > some cases needing deletes and others not then you can make two > NRTManagers. This should be no less efficient than we have today, > just simpler. > I think it will also enable NRTManager to subclass ThingyManager > (LUCENE-3761). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3760) Cleanup DR.getCurrentVersion/DR.getUserData/DR.getIndexCommit().getUserData()
[ https://issues.apache.org/jira/browse/LUCENE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3760. Resolution: Fixed > Cleanup DR.getCurrentVersion/DR.getUserData/DR.getIndexCommit().getUserData() > - > > Key: LUCENE-3760 > URL: https://issues.apache.org/jira/browse/LUCENE-3760 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3760.patch, LUCENE-3760.patch > > > Spinoff from Ryan's dev thread "DR.getCommitUserData() vs > DR.getIndexCommit().getUserData()"... these methods are confusing/dups right > now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3672) IndexCommit.equals() bug
[ https://issues.apache.org/jira/browse/LUCENE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3672. Resolution: Fixed Fix Version/s: 4.0 3.6 > IndexCommit.equals() bug > > > Key: LUCENE-3672 > URL: https://issues.apache.org/jira/browse/LUCENE-3672 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Affects Versions: 4.0 >Reporter: Andrzej Bialecki >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3672.patch > > > IndexCommit.equals() checks for equality of Directories and versions, but it > doesn't check IMHO the more important generation numbers. It looks like > commits are really identified by a combination of directory and segments_XXX, > which means the generation number, because that's what the > DirectoryReader.open() checks for. > This bug leads to an unexpected behavior when the only change to be committed > is in userData - we get two commits then that are declared equal, they have > the same version but they have different generation numbers. I have no idea > how this situation is treated in a few dozen references to > IndexCommit.equals() across Lucene... > On the surface the fix is trivial - either add the gen number to equals(), or > use gen number instead of version. However, it's puzzling why these two would > ever get out of sync??? and if they are always supposed to be in sync then > maybe we don't need both of them at all, maybe just generation or version is > sufficient? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
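A minimal sketch of the first fix proposed in LUCENE-3672 (adding the generation to equals), using a hypothetical stand-in class rather than Lucene's IndexCommit. The field names and the String directory are assumptions made for illustration.

```java
import java.util.Objects;

// Hypothetical stand-in for a commit point; not Lucene's IndexCommit.
class CommitSketch {
    final String directory;
    final long version;
    final long generation;

    CommitSketch(String directory, long version, long generation) {
        this.directory = directory;
        this.version = version;
        this.generation = generation;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CommitSketch)) return false;
        CommitSketch other = (CommitSketch) o;
        // Include the generation so two commits that differ only in userData stay distinct.
        return directory.equals(other.directory)
                && version == other.version
                && generation == other.generation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(directory, version, generation);
    }

    public static void main(String[] args) {
        CommitSketch a = new CommitSketch("/index", 7L, 3L);
        CommitSketch b = new CommitSketch("/index", 7L, 4L); // same version, later generation
        System.out.println(a.equals(b));
    }
}
```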
[jira] [Resolved] (LUCENE-3742) SynFilter doesn't set offsets for outputs that hang off the end of the input tokens
[ https://issues.apache.org/jira/browse/LUCENE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3742. Resolution: Fixed I set the offset to match the last input token... > SynFilter doesn't set offsets for outputs that hang off the end of the input > tokens > --- > > Key: LUCENE-3742 > URL: https://issues.apache.org/jira/browse/LUCENE-3742 > Project: Lucene - Java > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3742.patch > > > If you have syn rule a -> x y and input a then output is a/x y but... what > should y's offsets be? Right now we set to 0/0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3725) Add optional packing to FST building
[ https://issues.apache.org/jira/browse/LUCENE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3725. Resolution: Fixed > Add optional packing to FST building > > > Key: LUCENE-3725 > URL: https://issues.apache.org/jira/browse/LUCENE-3725 > Project: Lucene - Java > Issue Type: Improvement > Components: core/FSTs >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3725.patch, LUCENE-3725.patch, LUCENE-3725.patch, > Perf.java > > > The FSTs produced by Builder can be further shrunk if you are willing > to spend highish transient RAM to do so... our Builder today tries > hard not to use much RAM (and has options to tweak down the RAM usage, > in exchange for somewhat larger FST), even when building immense FSTs. > But for apps that can afford highish transient RAM to get a smaller > net FST, I think we should offer packing.
[jira] [Resolved] (LUCENE-2795) Genericize DirectIOLinuxDir -> UnixDir
[ https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2795. Resolution: Fixed Fix Version/s: 4.0 Thanks Varun! > Genericize DirectIOLinuxDir -> UnixDir > -- > > Key: LUCENE-2795 > URL: https://issues.apache.org/jira/browse/LUCENE-2795 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, > LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, > LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch > > > Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to > use it for indexWriter and not IndexReader (searching). It's a trap. > But, once we do LUCENE-2793, we can make it fully general purpose because > then a single native Dir impl can be used. > I'd also like to make it generic to other Unices, if we can, so that it > becomes UnixDirectory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3121) FST should offer lookup-by-output API when output strictly increases
[ https://issues.apache.org/jira/browse/LUCENE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3121. Resolution: Fixed Fix Version/s: 3.6 > FST should offer lookup-by-output API when output strictly increases > > > Key: LUCENE-3121 > URL: https://issues.apache.org/jira/browse/LUCENE-3121 > Project: Lucene - Java > Issue Type: Improvement > Components: core/other >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3121.patch > > > Spinoff from "FST and FieldCache" java-dev thread > http://lucene.markmail.org/thread/swoawlv3fq4dntvl > FST is able to associate arbitrary outputs with the sorted input keys, but in > the special (and, common) case where the function is strictly monotonic (each > output only "increases" vs prior outputs), such as mapping to term ords or > mapping to file offsets in the terms dict, we should offer a lookup-by-output > API that efficiently walks the FST and locates input key (exact or floor or > ceil) matching that output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
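The property LUCENE-3121 relies on can be shown without an FST at all: when outputs strictly increase along the sorted keys, a target output can be located by an ordered search. The sketch below uses two parallel arrays and a binary search as a stand-in. A real FST would walk arcs instead, and the class name, keys, and outputs here are invented for illustration.

```java
// Hypothetical illustration of floor lookup by output; not the FST API.
class MonotonicLookupSketch {
    // Keys in sorted order, with strictly increasing outputs (e.g. term ords or file offsets).
    static final String[] KEYS = {"ant", "bee", "cat", "dog"};
    static final long[] OUTPUTS = {3, 10, 42, 97};

    // Returns the key whose output is the largest one <= target, or null if none qualifies.
    static String floorByOutput(long target) {
        int lo = 0;
        int hi = OUTPUTS.length - 1;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (OUTPUTS[mid] <= target) {
                best = mid;      // candidate floor; keep looking to the right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best < 0 ? null : KEYS[best];
    }

    public static void main(String[] args) {
        System.out.println(floorByOutput(42)); // exact match
        System.out.println(floorByOutput(50)); // floor
        System.out.println(floorByOutput(2));  // below the smallest output
    }
}
```

The search is only valid because of the monotonicity assumption; with arbitrary outputs every key would have to be visited.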
[jira] [Resolved] (LUCENE-3694) DocValuesField should not overload setInt/setFloat etc
[ https://issues.apache.org/jira/browse/LUCENE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3694. Resolution: Fixed Fixed with LUCENE-3453. > DocValuesField should not overload setInt/setFloat etc > -- > > Key: LUCENE-3694 > URL: https://issues.apache.org/jira/browse/LUCENE-3694 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 4.0 > > > See my description on LUCENE-3687. In general we should avoid this for > primitive types and give them each unique names. > So I think instead of setInt(byte), setInt(short), setInt(int), setInt(long), > setFloat(float) and setFloat(double), > we should have setByte(byte), setShort(short), setInt(int), setLong(long), > setFloat(float) and setDouble(double). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
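Why overloading on primitive types is risky can be shown with a small, self-contained example. The class below is hypothetical and only borrows the method names from the issue; each method returns the name of the overload the compiler selected.

```java
// Hypothetical demo class; not Lucene's DocValuesField.
class OverloadSketch {
    // Overloaded style: the compiler picks the overload from the static type of the argument.
    static String setInt(byte v)  { return "byte"; }
    static String setInt(short v) { return "short"; }
    static String setInt(int v)   { return "int"; }
    static String setInt(long v)  { return "long"; }

    // Uniquely named style: the intended width is explicit at the call site.
    static String setByte(byte v) { return "byte"; }
    static String setLong(long v) { return "long"; }

    public static void main(String[] args) {
        byte b = 1;
        System.out.println(setInt(b));     // byte overload
        System.out.println(setInt(b + 1)); // int overload: b + 1 is promoted to int
        System.out.println(setInt(1));     // int overload, although the literal fits in a byte
        // setByte(b + 1) would not compile without a cast, so the narrowing is visible.
        System.out.println(setByte((byte) (b + 1)));
    }
}
```

With the overloaded form, a small change to an expression silently switches the stored width; with unique names the same change is a compile error or an explicit cast.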
[jira] [Resolved] (LUCENE-3682) Add deprecated 'transition' api for Document/Field
[ https://issues.apache.org/jira/browse/LUCENE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3682. Resolution: Fixed > Add deprecated 'transition' api for Document/Field > -- > > Key: LUCENE-3682 > URL: https://issues.apache.org/jira/browse/LUCENE-3682 > Project: Lucene - Java > Issue Type: Task >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 4.0 > > > I think for 4.0 we should have a deprecated transition api for Field so you > can do new Field(..., Field.Store.xxx, Field.Index.yyy) like before. > These combinations would just be some predefined fieldtypes that are used > behind the scenes if you use these deprecated ctors > Sure it wouldn't be 'totally' backwards binary compat for Field.java, but why > must it be all or nothing? I think this would eliminate a big > hurdle for people that want to check out 4.x -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3684) Add offsets to postings (D&PEnum)
[ https://issues.apache.org/jira/browse/LUCENE-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3684. Resolution: Fixed > Add offsets to postings (D&PEnum) > - > > Key: LUCENE-3684 > URL: https://issues.apache.org/jira/browse/LUCENE-3684 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3684.patch, LUCENE-3684.patch, LUCENE-3684.patch > > > I think we should explore making start/end offsets a first-class attr in the > postings APIs, and fixing the indexer to index them into postings. > This will make term vector access cleaner (we now have to jump through > hoops w/ non-first-class offset attr). It can also enable efficient > highlighting without term vectors / reanalyzing, if the app indexes > offsets into the postings.
[jira] [Resolved] (LUCENE-3453) remove IndexDocValuesField
[ https://issues.apache.org/jira/browse/LUCENE-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3453. Resolution: Fixed > remove IndexDocValuesField > -- > > Key: LUCENE-3453 > URL: https://issues.apache.org/jira/browse/LUCENE-3453 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3453.patch, LUCENE-3453.patch > > > It's confusing how we present CSF functionality to the user; it's actually not > a "field" but an "attribute" of a field like STORED or INDEXED. > Otherwise, it's really hard to think about CSF because there is a mismatch > between the APIs and the index format.
[jira] [Resolved] (LUCENE-3685) Add top-down version of BlockJoinQuery
[ https://issues.apache.org/jira/browse/LUCENE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3685. Resolution: Fixed > Add top-down version of BlockJoinQuery > -- > > Key: LUCENE-3685 > URL: https://issues.apache.org/jira/browse/LUCENE-3685 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/join >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3685.patch > > > Today, BlockJoinQuery can join from child docIDs up to parent docIDs. > EG this works well for product (parent) + many SKUs (child) search. > But the reverse, which BJQ cannot do, is also useful in some cases. > EG say you index songs (child) within albums (parent), but you want to > search and present by song not album while involving some fields from > the album in the query. In this case you want to wrap a parent query > (against album), joining down to the child document space. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3679) Replace IndexReader.getFieldNames with IndexReader.getFieldInfos
[ https://issues.apache.org/jira/browse/LUCENE-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3679. Resolution: Fixed > Replace IndexReader.getFieldNames with IndexReader.getFieldInfos > > > Key: LUCENE-3679 > URL: https://issues.apache.org/jira/browse/LUCENE-3679 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3679.patch, LUCENE-3679.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3681) FST.BYTE2 should save as fixed 2 byte not as vInt
[ https://issues.apache.org/jira/browse/LUCENE-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3681. Resolution: Fixed > FST.BYTE2 should save as fixed 2 byte not as vInt > - > > Key: LUCENE-3681 > URL: https://issues.apache.org/jira/browse/LUCENE-3681 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3681.patch > > > We currently write BYTE1 as a single byte, but BYTE2/4 as vInt, but I think > that's confusing. Also, for the FST for the new Kuromoji analyzer > (LUCENE-3305), writing as 2 bytes instead shrank the FST and ran faster, > presumably because more values were >= 16384 than were < 128. > Separately the whole INPUT_TYPE is very confusing... really all it's doing is > "declaring" the allowed range of the characters of the input alphabet, and > then the only thing that uses that is the write/readLabel methods (well and > some confusing sugar methods in Builder!). Not sure how to fix that yet... > It's a simple change but it changes the FST binary format so any users w/ > FSTs out there will have to rebuild (FST is marked experimental...). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
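The size argument in LUCENE-3681 follows from how a 7-bits-per-byte variable-length integer grows. The sketch below computes the encoded length of a non-negative value and compares it with a fixed 2-byte label. The class and method names are assumptions made for illustration, not the FST code.

```java
// Hypothetical helper; computes sizes only and does not write any bytes.
class LabelSizeSketch {
    static final int FIXED_BYTE2_LENGTH = 2;

    // Number of bytes a vInt-style encoding (7 payload bits per byte) needs for value >= 0.
    static int vIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        int[] labels = {127, 128, 16383, 16384, 65535};
        for (int label : labels) {
            System.out.println(label + ": vInt=" + vIntLength(label)
                    + " fixed=" + FIXED_BYTE2_LENGTH);
        }
    }
}
```

A vInt beats the fixed form only for labels below 128 and loses for labels of 16384 and above, so an alphabet dominated by large labels, as the issue reports for the Kuromoji FST, comes out smaller with fixed 2-byte labels.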
[jira] [Resolved] (LUCENE-3668) offsets issues with multiword synonyms
[ https://issues.apache.org/jira/browse/LUCENE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3668. Resolution: Fixed Fix Version/s: 4.0 3.6 Thanks Koji! > offsets issues with multiword synonyms > -- > > Key: LUCENE-3668 > URL: https://issues.apache.org/jira/browse/LUCENE-3668 > Project: Lucene - Java > Issue Type: Bug > Components: modules/analysis >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3668.patch, LUCENE-3668_test.patch > > > as reported on the list, there are some strange offsets with FSTSynonyms, in > the case of multiword synonyms. > as a workaround it was suggested to use the older synonym impl, but it has > bugs too (just in a different way). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-830) norms file can become unexpectedly enormous
[ https://issues.apache.org/jira/browse/LUCENE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-830. --- Resolution: Fixed Fix Version/s: 4.0 As of 4.0, when norms are missing we drop norms for the entire field, unlike before when we invent a fake norm for documents missing that field or omitting norm for it. Also, as of 4.0, you can now make a custom norm provider and custom similarity so if you really want to it's possible (in theory!) to have a sparse norms data structure... > norms file can become unexpectedly enormous > --- > > Key: LUCENE-830 > URL: https://issues.apache.org/jira/browse/LUCENE-830 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Affects Versions: 2.1 >Reporter: Michael McCandless >Priority: Minor > Fix For: 4.0 > > > Spinoff from this user thread: >http://www.gossamer-threads.com/lists/lucene/java-user/46754 > Norms are not stored sparsely, so even if a doc doesn't have field X > we still use up 1 byte in the norms file (and in memory when that > field is searched) for that segment. I think this is done for > performance at search time? > For indexes that have a large # documents where each document can have > wildly varying fields, each segment will use # documents times # fields > seen in that segment. When optimize merges all segments, that product > grows multiplicatively so the norms file for the single segment will > require far more storage than the sum of all previous segments' norm > files. > I think it's uncommon to have a huge number of distinct fields (?) so > we would need a solution that doesn't hurt the more common case where > most documents have the same fields. Maybe something analogous to how > bitvectors are now optionally stored sparsely? > One simple workaround is to disable norms. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
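The growth described in LUCENE-830 is simple arithmetic: with one norm byte per document per field seen in a segment, the cost of a segment is documents times fields. The sketch below works through an invented example (two segments with disjoint field sets); the numbers and names are assumptions, and it models the pre-4.0 behavior the issue describes.

```java
// Hypothetical arithmetic sketch of non-sparse norms storage (1 byte per doc per field).
class NormsSizeSketch {
    static long normsBytes(long docs, long fieldsSeen) {
        return docs * fieldsSeen;
    }

    public static void main(String[] args) {
        long docsPerSegment = 1_000_000L;
        long fieldsPerSegment = 10L; // assumed: the two segments share no fields

        // Before merging: each segment only pays for the fields it has seen.
        long before = 2 * normsBytes(docsPerSegment, fieldsPerSegment);

        // After merging: every document pays for every field seen in the merged segment.
        long after = normsBytes(2 * docsPerSegment, 2 * fieldsPerSegment);

        System.out.println("before merge: " + before + " bytes");
        System.out.println("after merge:  " + after + " bytes");
    }
}
```

With k such segments the merged size is k times the sum of the parts, which is why the norms file for the single merged segment can exceed the total of the previous segments' files.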
[jira] [Resolved] (LUCENE-3634) remove old static main methods in core
[ https://issues.apache.org/jira/browse/LUCENE-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3634. Resolution: Fixed > remove old static main methods in core > -- > > Key: LUCENE-3634 > URL: https://issues.apache.org/jira/browse/LUCENE-3634 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3634.patch > > > We have a few random static main methods that I think are very rarely used... > we should remove them (IndexReader, UTF32ToUTF8, English). > The IndexReader main lets you list / extract the sub-files from a CFS... I > think we should move this to a new tool in contrib/misc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3605) revisit segments.gen sleeping
[ https://issues.apache.org/jira/browse/LUCENE-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3605. Resolution: Fixed Fix Version/s: 4.0 3.6 > revisit segments.gen sleeping > - > > Key: LUCENE-3605 > URL: https://issues.apache.org/jira/browse/LUCENE-3605 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3605.patch > > > in LUCENE-3601, i worked up a change where we intentionally crash() all > un-fsynced files > in tests to ensure that we are calling sync on files when we should. > I think this would be nice to do always (and with some fixes all tests pass). > But this is super-slow sometimes because when we corrupt the unsynced > segments.gen, it causes > SIS.read to take 500ms each time (and in checkindex for some reason we do > this twice, which seems wrong). > I can workaround this for now for tests (just do a partial crash that avoids > corrupting the segments.gen), > but I wanted to create this issue for discussion about the > sleeping/non-fsyncing of segments.gen, just > because i guess its possible someone could hit this slowness. > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3631) Remove write access from SegmentReader and possibly move to separate class or IndexWriter/BufferedDeletes/...
[ https://issues.apache.org/jira/browse/LUCENE-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3631. Resolution: Fixed > Remove write access from SegmentReader and possibly move to separate class or > IndexWriter/BufferedDeletes/... > - > > Key: LUCENE-3631 > URL: https://issues.apache.org/jira/browse/LUCENE-3631 > Project: Lucene - Java > Issue Type: Task > Components: core/index >Affects Versions: 4.0 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Attachments: LUCENE-3631.patch, LUCENE-3631.patch > > > After LUCENE-3606 is finished, there are some TODOs: > SegmentReader still contains (package-private) all delete logic including > crazy copyOnWrite for validDocs Bits. It would be good, if SegmentReader > itself could be read-only like all other IndexReaders. > There are two possibilities to do this: > # the simple one: Subclass SegmentReader and make a RWSegmentReader that is > only used by IndexWriter/BufferedDeletes/... DirectoryReader will only use > the read-only SegmentReader. This would move all TODOs to a separate class. > Its reopen/clone method would always create a RO-SegmentReader (for NRT). > # Remove all write and commit stuff from SegmentReader completely and move it > to IndexWriter's readerPool (it must be in readerPool as deletions need a > not-changing view on an index snapshot). > Unfortunately the code is so complicated and I have no real experience in > those internals of IndexWriter so I did not want to do it with LUCENE-3606, I > just separated the code in SegmentReader and marked with TODO. Maybe Mike > McCandless can help :-) -- This message is automatically generated by JIRA. 
[jira] [Resolved] (LUCENE-3658) NRTCachingDir has invalid asserts (if same file name is written twice)
[ https://issues.apache.org/jira/browse/LUCENE-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3658. Resolution: Fixed > NRTCachingDir has invalid asserts (if same file name is written twice) > -- > > Key: LUCENE-3658 > URL: https://issues.apache.org/jira/browse/LUCENE-3658 > Project: Lucene - Java > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3658.patch > > > Normally Lucene is write-once (except for segments.gen file, which > NRTCachingDir never caches), but in some tests (TestDoc, TestCrash) we can > write the same file more than once. > I don't think NRTCachingDir should have these asserts, and I think on > createOutput it should remove any old file if present. > I also found & fixed a possible concurrency issue (if more than one thread > syncs at the same time; IndexWriter doesn't ever do this today but it has in > the past). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3598) Improve InfoStream class in trunk to be more consistent with logging-frameworks like slf4j/log4j/commons-logging
[ https://issues.apache.org/jira/browse/LUCENE-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3598. Resolution: Fixed Fix Version/s: 4.0 > Improve InfoStream class in trunk to be more consistent with > logging-frameworks like slf4j/log4j/commons-logging > > > Key: LUCENE-3598 > URL: https://issues.apache.org/jira/browse/LUCENE-3598 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Affects Versions: 4.0 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 4.0 > > Attachments: LUCENE-3598.patch, LUCENE-3598.patch, LUCENE-3598.patch, > LUCENE-3598.patch, LUCENE-3598.patch > > > Followup on a [thread by Shai Erea on > java-dev@lao|http://lucene.472066.n3.nabble.com/IndexWriter-infoStream-is-final-td3537485.html]: > I already discussed with Robert about that, that there is one thing missing. > Currently the IW only checks if the infoStream!=null and then passes the > message to the method, and that *may* ignore it. For your requirement it is > the case that this is enabled or disabled dynamically. Unfortunately if the > construction of the message is heavy, then this wastes resources. > I would like to add another method to this class: abstract boolean > isEnabled() that can also be implemented. I would then replace all null > checks in IW by this method. The default config in IW would be changed to use > a NoOutputInfoStream that returns false here and ignores the message. > A simple logger wrapper for e.g. log4j / slf4j then could look like (ignoring > component, could be enabled): > {code:java} > Logger log = YourLoggingFramework.getLogger(IndexWriter.class); > public void message(String component, String message) { > log.debug(component + ": " + message); > } > public boolean isEnabled(String component) { > return log.isDebugEnabled(); > } > {code} > Using this you could enable/disable logging live by e.g. 
the log4j management > console of your app server by enabling/disabling IndexWriter.class logging. > The changes are really simple: > - PrintStreamInfoStream returns true, always, maybe make it dynamically > enable/disable to allow Shai's request > - infoStream.getDefault() is never null and can never be set to null. Instead > the default is a singleton NoOutputInfoStream that returns false for > isEnabled(component). > - All null checks on infoStream should be replaced by > infoStream.isEnabled(component), this is possible as always != null. There > are no slowdowns by this - it's like Collections.emptyList() instead of stupid > null checks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
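The isEnabled() proposal in LUCENE-3598 can be sketched in a self-contained way as below. This is only an illustration of the pattern: InfoStreamSketch, NO_OUTPUT, InfoStreamDemo and Collecting are hypothetical names, not the classes that were committed, and the "IW" component string is just an example value.

```java
// Hypothetical sketch of the isEnabled() idea from LUCENE-3598; not Lucene's actual classes.
abstract class InfoStreamSketch {
    abstract void message(String component, String message);

    abstract boolean isEnabled(String component);

    /** Singleton default: never null, never enabled, ignores every message. */
    static final InfoStreamSketch NO_OUTPUT = new InfoStreamSketch() {
        @Override void message(String component, String message) { /* ignored */ }
        @Override boolean isEnabled(String component) { return false; }
    };
}

final class InfoStreamDemo {
    /** Counts how often the costly message was actually built. */
    static int expensiveCalls = 0;

    /** An always-enabled stream that collects messages in memory. */
    static final class Collecting extends InfoStreamSketch {
        final StringBuilder out = new StringBuilder();
        @Override void message(String component, String message) {
            out.append(component).append(": ").append(message).append('\n');
        }
        @Override boolean isEnabled(String component) { return true; }
    }

    /** Stands in for a message that is costly to construct. */
    static String expensiveDescription() {
        expensiveCalls++;
        return "state dump";
    }

    /** The caller guards message construction with isEnabled instead of a null check. */
    static void flush(InfoStreamSketch infoStream) {
        if (infoStream.isEnabled("IW")) {
            infoStream.message("IW", "flush: " + expensiveDescription());
        }
    }

    public static void main(String[] args) {
        flush(InfoStreamSketch.NO_OUTPUT);   // message is never built
        Collecting collecting = new Collecting();
        flush(collecting);                   // message is built once
        System.out.print(collecting.out);
        System.out.println("expensive calls: " + expensiveCalls);
    }
}
```

The point of the design is that the disabled default costs one virtual call and no string concatenation, which is why the issue compares it to Collections.emptyList() versus null checks.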
[jira] [Resolved] (LUCENE-3639) Add test case support for shard searching
[ https://issues.apache.org/jira/browse/LUCENE-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3639. Resolution: Fixed > Add test case support for shard searching > - > > Key: LUCENE-3639 > URL: https://issues.apache.org/jira/browse/LUCENE-3639 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0, 3.5 > > Attachments: LUCENE-3639.patch, LUCENE-3639.patch > > > New test case that helps stress test the APIs to support sharding -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3638) IndexReader.document always return a doc with all the stored fields loaded. And this can be slow for the indexed document contain huge fields
[ https://issues.apache.org/jira/browse/LUCENE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3638. Resolution: Fixed Thanks Peter! > IndexReader.document always return a doc with all the stored fields loaded. > And this can be slow for the indexed document contain huge fields > - > > Key: LUCENE-3638 > URL: https://issues.apache.org/jira/browse/LUCENE-3638 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index, core/search >Affects Versions: 4.0 > Environment: 64bit linux java 1.6 >Reporter: peter chang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: LUCENE-3638.patch, doc.fields.patch > > > when generating digest for some documents with huge fields, it should be > unnecessary to load the field but just interesting part of the field with the > offset information. but indexreader always return the whole field content. > afterward, the customized storedfieldsreader will got a repeated loading -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3531) Improve CachingWrapperFilter to optionally also cache acceptDocs, if identical to liveDocs
[ https://issues.apache.org/jira/browse/LUCENE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3531. Resolution: Fixed Fix Version/s: 4.0 > Improve CachingWrapperFilter to optionally also cache acceptDocs, if > identical to liveDocs > -- > > Key: LUCENE-3531 > URL: https://issues.apache.org/jira/browse/LUCENE-3531 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: 4.0 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3531.patch > > > Spinoff from LUCENE-1536: This issue removed the different cache modes > completely and always applies the acceptDocs using > BitsFilteredDocIdSet.wrap(), the cache only contains raw DocIdSet without any > deletions/acceptDocs. For IndexReaders that are seldom reopened, this might > not be as performant as it could be. If the acceptDocs==IR.liveDocs, those > DocIdSet could also be cached with liveDocs applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3586) Choose a specific Directory implementation running the CheckIndex main
[ https://issues.apache.org/jira/browse/LUCENE-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3586. Resolution: Fixed Fix Version/s: 4.0 3.6 Thanks Luca! > Choose a specific Directory implementation running the CheckIndex main > -- > > Key: LUCENE-3586 > URL: https://issues.apache.org/jira/browse/LUCENE-3586 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Luca Cavanna >Assignee: Luca Cavanna >Priority: Minor > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3586.patch, LUCENE-3586.patch, LUCENE-3586.patch, > LUCENE-3586.patch > > > It should be possible to choose a specific Directory implementation to use > during the CheckIndex process when we run it from its main. > What about an additional main parameter? > In fact, I'm experiencing some problems with MMapDirectory working with a big > segment, and after some failed attempts playing with maxChunkSize, I decided > to switch to another FSDirectory implementation but I needed to do that on my > own main. > Should we also consider to use a FileSwitchDirectory? > I'm willing to contribute, could you please let me know your thoughts about > it? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3627) CorruptIndexException on indexing after a failure occurs after segments file creation but before any bytes are written
[ https://issues.apache.org/jira/browse/LUCENE-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3627. Resolution: Fixed Fix Version/s: 4.0 3.6 > CorruptIndexException on indexing after a failure occurs after segments file > creation but before any bytes are written > -- > > Key: LUCENE-3627 > URL: https://issues.apache.org/jira/browse/LUCENE-3627 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 3.5 > Environment: lucene-3.5.0, src download from GA release > lucene.apache.org. > Mac OS X 10.6.5, running tests in Eclipse Build id: 20100218-1602, > java version "1.6.0_24" > Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326) > Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02-334, mixed mode) >Reporter: Ken McCracken >Assignee: Michael McCandless >Priority: Critical > Fix For: 3.6, 4.0 > > Attachments: LUCENE-3627.patch, LUCENE-3627_initial_proposal.txt, > TestCrashCausesCorruptIndex.java > > Original Estimate: 48h > Remaining Estimate: 48h > > FSDirectory.createOutput(..) uses a RandomAccessFile to do its work. On my > system the default FSDirectory.open(..) creates an NIOFSDirectory. If > createOutput is called on a segments_* file and a crash occurs between > RandomAccessFile creation (file system shows a segments_* file exists but has > zero bytes) but before any bytes are written to the file, subsequent > IndexWriters cannot proceed. The difficulty is that it does not know how to > clear the empty segments_* file. None of the file deletions will happen on > such a segment file because the opening bytes cannot be read to determine > format and version. > An initial proposed patch file is attached below. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
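The failure state described in LUCENE-3627 (a segments_* file that exists but holds zero bytes) can be reproduced and detected with plain java.nio, as in the sketch below. This is not the fix that was committed, only a way to see the condition; EmptySegmentsFileSketch and its methods are hypothetical helpers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: finds zero-byte segments_* files.
final class EmptySegmentsFileSketch {

    /** Returns the segments_* files in dir that exist but contain no bytes. */
    static List<Path> findEmptySegmentsFiles(Path dir) throws IOException {
        List<Path> empty = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "segments_*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && Files.size(p) == 0) {
                    empty.add(p);
                }
            }
        }
        return empty;
    }

    /** Simulates the crash window and returns how many empty files were found. */
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("index");
            Files.write(dir.resolve("segments_1"), new byte[] {1, 2, 3}); // a written commit
            Files.createFile(dir.resolve("segments_2")); // created, but nothing written yet
            return findEmptySegmentsFiles(dir).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("empty segments files: " + demo());
    }
}
```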
[jira] [Resolved] (LUCENE-3600) BlockJoinQuery advance fails on an assert in case of a single parent with child segment
[ https://issues.apache.org/jira/browse/LUCENE-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3600. Resolution: Fixed > BlockJoinQuery advance fails on an assert in case of a single parent with > child segment > --- > > Key: LUCENE-3600 > URL: https://issues.apache.org/jira/browse/LUCENE-3600 > Project: Lucene - Java > Issue Type: Bug > Components: modules/join >Affects Versions: 3.5, 4.0 >Reporter: Shay Banon >Assignee: Michael McCandless > Fix For: 3.6, 4.0 > > > The BlockJoinQuery will fail on an assert when advance is called on a segment > with a single parent with a child. The call to > parentBits.prevSetBit(parentTarget - 1) will cause -1 to be returned, and the > assert will fail, though it's valid. Just removing the assert fixes the > problem, since nextDoc will handle it properly. > Also, I don't understand the "assert parentTarget != 0;", with a comment of > each parent must have one child. There isn't really a reason to add this > constraint, as far as I can tell..., just call nextDoc in this case, no? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
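The -1 result reported in LUCENE-3600 is easy to reproduce with a plain bit set. The sketch below uses java.util.BitSet.previousSetBit as a stand-in for the parentBits.prevSetBit call quoted in the issue (an assumption made so the example runs without Lucene); PrevSetBitSketch is a made-up name.

```java
import java.util.BitSet;

// Stand-in sketch: java.util.BitSet replaces the parent bit set from the issue.
final class PrevSetBitSketch {

    /** Returns the parent doc before parentTarget, or -1 if there is none. */
    static int previousParent(BitSet parentBits, int parentTarget) {
        return parentBits.previousSetBit(parentTarget - 1);
    }

    public static void main(String[] args) {
        // A segment with one block: doc 0 is the child, doc 1 is its parent.
        BitSet parentBits = new BitSet();
        parentBits.set(1);

        // No parent precedes doc 1, so the lookup legitimately yields -1.
        System.out.println(previousParent(parentBits, 1));
    }
}
```

With only one parent in the segment there is no earlier parent to find, so -1 is a valid answer rather than a sign of corruption, which matches the reporter's argument for dropping the assert.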
[jira] [Resolved] (LUCENE-3577) rename expungeDeletes
[ https://issues.apache.org/jira/browse/LUCENE-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3577. Resolution: Fixed Fix Version/s: 4.0 3.5 > rename expungeDeletes > - > > Key: LUCENE-3577 > URL: https://issues.apache.org/jira/browse/LUCENE-3577 > Project: Lucene - Java > Issue Type: Task >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3577.patch > > > Similar to optimize(), expungeDeletes() has a misleading name. > We already had problems with this on the user list because TieredMergePolicy > didn't 'expunge' all their deletes. > Also I think expunge is the wrong word, because expunge makes it seem > like you just wrangle up the deletes and kick them out of the party and > that it should be fast. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3562) Stop storing TermsEnum in CloseableThreadLocal inside Terms instance
[ https://issues.apache.org/jira/browse/LUCENE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3562. Resolution: Fixed > Stop storing TermsEnum in CloseableThreadLocal inside Terms instance > > > Key: LUCENE-3562 > URL: https://issues.apache.org/jira/browse/LUCENE-3562 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3562.patch, LUCENE-3562.patch > > > We have sugar methods in Terms.java (docFreq, totalTermFreq, docs, > docsAndPositions) that use a saved thread-private TermsEnum to do the > lookups. > But on apps that send many threads through Lucene, and/or have many > segments, this can add up to a lot of RAM, especially if the codecs > impl holds onto stuff. > Also, Terms has a close method (closes the CloseableThreadLocal) which > must be called, but we fail to do so in some places. > These saved enums are the cause of the recent OOME in TestNRTManager > (TestNRTManager.testNRTManager -seed > 2aa27e1aec20c4a2:-4a5a5ecf46837d0e:-7c4f651f1f0b75d7 -mult 3 > -nightly). > Really sharing these enums is a holdover from before Lucene queries > would share state (ie, save the TermState from the first pass, and use > it later to pull enums, get docFreq, etc.). It's not helpful anymore, > and it can use gobbs of RAM, so I'd like to remove it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3575) Field names can be wrong for stored fields / term vectors after merging
[ https://issues.apache.org/jira/browse/LUCENE-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3575. Resolution: Fixed I also ported the test case back to 3.x. > Field names can be wrong for stored fields / term vectors after merging > --- > > Key: LUCENE-3575 > URL: https://issues.apache.org/jira/browse/LUCENE-3575 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3575.patch > > > The good news is this bug only exists in trunk... the bad news is it's > been here for some time (created by accident in LUCENE-2881). But the > good news is it should strike fairly rarely. > SegmentMerger sometimes incorrectly thinks it can bulk-copy TVs/stored > fields when it cannot (because field numbers don't map to the same > names across segments). > I think it happens only with addIndexes, or indexes that have > pre-trunk segments, and then SM falsely thinks it can bulk-merge only > when the last field number has the same field name across segments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
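The kind of mismatch behind LUCENE-3575 can be modelled with two lists of field names indexed by field number. This is an illustrative model only: the issue does not show SegmentMerger's real check, so both methods below are assumptions written to contrast "last field matches" with "every field number maps to the same name".

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model, not SegmentMerger code: index in the list = field number.
final class FieldNumberMatchSketch {

    /** Too-weak check: only the highest field number is compared. */
    static boolean sameLastFieldOnly(List<String> a, List<String> b) {
        return !a.isEmpty()
            && a.size() == b.size()
            && a.get(a.size() - 1).equals(b.get(b.size() - 1));
    }

    /** Safe check: every field number must map to the same name in both segments. */
    static boolean sameMapping(List<String> a, List<String> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        List<String> segA = Arrays.asList("id", "title", "body");
        List<String> segB = Arrays.asList("title", "id", "body"); // e.g. from addIndexes

        System.out.println("last field only: " + sameLastFieldOnly(segA, segB));
        System.out.println("full mapping:    " + sameMapping(segA, segB));
    }
}
```

If bulk copying were allowed on the weaker check, stored fields written under number 0 in one segment would be read back under a different name in the merged segment.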
[jira] [Resolved] (LUCENE-3578) TestSort testParallelMultiSort reproducible seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3578. Resolution: Fixed Fix Version/s: 4.0 Thanks selckin! I had to generalize the check I committed for LUCENE-3572 to catch any embedded SlowMultiReaderWrappers... > TestSort testParallelMultiSort reproducible seed failure > > > Key: LUCENE-3578 > URL: https://issues.apache.org/jira/browse/LUCENE-3578 > Project: Lucene - Java > Issue Type: Bug >Reporter: selckin >Assignee: Michael McCandless > Fix For: 4.0 > > > trunk r1202157 > {code} > [junit] Testsuite: org.apache.lucene.search.TestSort > [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.978 sec > [junit] > [junit] - Standard Error - > [junit] NOTE: reproduce with: ant test -Dtestcase=TestSort > -Dtestmethod=testParallelMultiSort > -Dtests.seed=-2996f3e0f5d118c2:32c8e62dd9611f63:7a90f44586ae8263 > -Dargs="-Dfile.encoding=UTF-8" > [junit] WARNING: test method: 'testParallelMultiSort' left thread > running: Thread[pool-1-thread-1,5,main] > [junit] WARNING: test method: 'testParallelMultiSort' left thread > running: Thread[pool-1-thread-2,5,main] > [junit] WARNING: test method: 'testParallelMultiSort' left thread > running: Thread[pool-1-thread-3,5,main] > [junit] NOTE: test params are: codec=Lucene40: > {short=Lucene40(minBlockSize=98 maxBlockSize=214), > contents=PostingsFormat(name=MockSep), byte=PostingsFormat(name=SimpleText), > int=Pulsing40(freqCutoff=4 minBlockSize=58 maxBlockSize=186), > string=PostingsFormat(name=NestedPulsing), i18n=Lucene40(minBlockSize=98 > maxBlockSize=214), long=PostingsFormat(name=Memory), > double=Pulsing40(freqCutoff=4 minBlockSize=58 maxBlockSize=186), > parser=MockVariableIntBlock(baseBlockSize=88), float=Lucene40(minBlockSize=98 > maxBlockSize=214), custom=PostingsFormat(name=MockRandom)}, > sim=RandomSimilarityProvider(queryNorm=false,coord=false): > {short=BM25(k1=1.2,b=0.75), tracer=DFR I(ne)B2, byte=DFR 
I(ne)B3(800.0), > contents=IB LL-LZ(0.3), int=DFR I(n)BZ(0.3), string=IB LL-D3(800.0), i18n=DFR > GB2, double=DFR I(ne)B2, long=DFR GB1, parser=DFR GL2, > float=BM25(k1=1.2,b=0.75), custom=DFR I(ne)Z(0.3)}, locale=ga_IE, > timezone=America/Louisville > [junit] NOTE: all tests run in this JVM: > [junit] [TestSort] > [junit] NOTE: Linux 3.0.6-gentoo amd64/Sun Microsystems Inc. 1.6.0_29 > (64-bit)/cpus=8,threads=4,free=78022136,total=125632512 > [junit] - --- > [junit] Testcase: > testParallelMultiSort(org.apache.lucene.search.TestSort): FAILED > [junit] expected:<[ZJ]I> but was:<[JZ]I> > [junit] junit.framework.AssertionFailedError: expected:<[ZJ]I> but > was:<[JZ]I> > [junit] at > org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1245) > [junit] at > org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1216) > [junit] at > org.apache.lucene.search.TestSort.runMultiSorts(TestSort.java:1202) > [junit] at > org.apache.lucene.search.TestSort.testParallelMultiSort(TestSort.java:855) > [junit] at > org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:523) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51) > [junit] > [junit] > [junit] Test org.apache.lucene.search.TestSort FAILED > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3572) MultiIndexDocValues pretends it can merge sorted sources
[ https://issues.apache.org/jira/browse/LUCENE-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3572. Resolution: Fixed > MultiIndexDocValues pretends it can merge sorted sources > > > Key: LUCENE-3572 > URL: https://issues.apache.org/jira/browse/LUCENE-3572 > Project: Lucene - Java > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3572.patch > > > Nightly build hit this failure: > {noformat} > ant test-core -Dtestcase=TestSort -Dtestmethod=testReverseSort > -Dtests.seed=791b126576b0cfab:-48895c7243ecc5d0:743c683d1c9f7768 > -Dtests.multiplier=3 -Dargs="-Dfile.encoding=ISO8859-1" > [junit] Testcase: testReverseSort(org.apache.lucene.search.TestSort): > Caused an ERROR > [junit] expected:<[CEGIA]> but was:<[ACEGI]> > [junit] at > org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1248) > [junit] at > org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1216) > [junit] at > org.apache.lucene.search.TestSort.testReverseSort(TestSort.java:759) > [junit] at > org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:523) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149) > [junit] at > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51) > {noformat} > It's happening in the test for reverse-sort of a string field with DocValues, > when the test had gotten SlowMultiReaderWrapper. > I committed a fix to the test to avoid testing this case, but we need a > better fix to the underlying bug. > MultiIndexDocValues cannot merge sorted sources (I think?), yet somehow it's > pretending it can (in the above test, the three subs had BYTES_FIXED_SORTED > type, and the TypePromoter happily claims to merge these to > BYTES_FIXED_SORTED; I think MultiIndexDocValues should return null for the > sorted source in this case? 
[jira] [Resolved] (LUCENE-3518) Add sort-by-term with DocValues
[ https://issues.apache.org/jira/browse/LUCENE-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3518. Resolution: Fixed > Add sort-by-term with DocValues > --- > > Key: LUCENE-3518 > URL: https://issues.apache.org/jira/browse/LUCENE-3518 > Project: Lucene - Java > Issue Type: New Feature > Components: core/search >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3518.patch, LUCENE-3518.patch, LUCENE-3518.patch > > > There are two sorted byte[] types with DocValues (BYTES_VAR_SORTED, > BYTES_FIXED_SORTED), so you can index this type, but you can't yet > sort by it. > So I added a FieldComparator just like TermOrdValComparator, except it > pulls from the doc values instead. > There are some small diffs, eg with doc values there are never null > values (see LUCENE-3504). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3454) rename optimize to a less cool-sounding name
[ https://issues.apache.org/jira/browse/LUCENE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3454. Resolution: Fixed Fix Version/s: 4.0 3.5 > rename optimize to a less cool-sounding name > > > Key: LUCENE-3454 > URL: https://issues.apache.org/jira/browse/LUCENE-3454 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 3.4, 4.0 >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3454.patch, LUCENE-3454.patch > > > I think users see the name optimize and feel they must do this, because who > wants a suboptimal system? but this probably just results in wasted time and > resources. > maybe rename to collapseSegments or something? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3443) Port 3.x FieldCache.getDocsWithField() to trunk
[ https://issues.apache.org/jira/browse/LUCENE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3443. Resolution: Fixed > Port 3.x FieldCache.getDocsWithField() to trunk > --- > > Key: LUCENE-3443 > URL: https://issues.apache.org/jira/browse/LUCENE-3443 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3443.patch, LUCENE-3443.patch, LUCENE-3443.patch > > > [Spinoff from LUCENE-3390] > I think the approach in 3.x for handling un-valued docs, and making it > possible to specify how such docs are sorted, is better than the > solution we have in trunk. > I like that FC has a dedicated method to get the Bits for docs with field > -- easy for apps to directly use. And I like that the > bits have their own entry in the FC. > One downside is that it's 2 passes to get values and valid bits, but > I think we can fix this by passing optional bool to FC.getXXX methods > indicating you want the bits, and then populate the FC entry for the > missing bits as well. (We can do that for 3.x and trunk). Then it's > single pass. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
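The single-pass idea from LUCENE-3443 (fill the values and the docs-with-field bits in the same loop) might look roughly like this. The sketch does not use the FieldCache API; DocsWithFieldSketch, Result and load are hypothetical, and a null entry in the input array stands for a document without a value.

```java
import java.util.BitSet;

// Hypothetical sketch, not the FieldCache API: one pass fills values and valid bits.
final class DocsWithFieldSketch {

    /** Values plus a bit per document saying whether the value is real. */
    static final class Result {
        final int[] values;
        final BitSet docsWithField;

        Result(int[] values, BitSet docsWithField) {
            this.values = values;
            this.docsWithField = docsWithField;
        }
    }

    /** perDoc[doc] == null means the document has no value for the field. */
    static Result load(Integer[] perDoc) {
        int[] values = new int[perDoc.length];
        BitSet docsWithField = new BitSet(perDoc.length);
        for (int doc = 0; doc < perDoc.length; doc++) {
            if (perDoc[doc] != null) {
                values[doc] = perDoc[doc];
                docsWithField.set(doc); // recorded in the same pass as the value
            }
        }
        return new Result(values, docsWithField);
    }

    public static void main(String[] args) {
        Result r = load(new Integer[] {5, null, 0});
        // Docs 1 and 2 both read as 0, but only doc 2 really has the value 0.
        System.out.println(r.values[1] + " hasValue=" + r.docsWithField.get(1));
        System.out.println(r.values[2] + " hasValue=" + r.docsWithField.get(2));
    }
}
```

The bits are what let a sort distinguish a missing value from a genuine zero, which is the behaviour the issue wants to carry over from 3.x.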
[jira] [Resolved] (LUCENE-3339) TestNRTThreads hangs in nightly 3.x builds
[ https://issues.apache.org/jira/browse/LUCENE-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3339. Resolution: Fixed Fix Version/s: (was: 3.5) 3.4 Simon, yes I think so; I believe this was fixed in 3.4.0. > TestNRTThreads hangs in nightly 3.x builds > -- > > Key: LUCENE-3339 > URL: https://issues.apache.org/jira/browse/LUCENE-3339 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 3.4 > > Attachments: LUCENE-3339.patch > > > Maybe we have a problem, maybe it's a bug in the test. > But it's strange that lately the 3.x nightlies have been hanging here.
[jira] [Resolved] (LUCENE-3524) Add "direct" PackedInts.Reader impl, that reads directly from disk on each get
[ https://issues.apache.org/jira/browse/LUCENE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3524. Resolution: Fixed Fix Version/s: 4.0 > Add "direct" PackedInts.Reader impl, that reads directly from disk on each get > -- > > Key: LUCENE-3524 > URL: https://issues.apache.org/jira/browse/LUCENE-3524 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3524.patch, LUCENE-3524.patch > > > Spinoff from LUCENE-3518. > If we had a direct PackedInts.Reader impl we could use that instead of > the RandomAccessReaderIterator.
[jira] [Resolved] (LUCENE-3563) TestPagedBytes failure
[ https://issues.apache.org/jira/browse/LUCENE-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3563. Resolution: Fixed Fix Version/s: 4.0 3.5 > TestPagedBytes failure > -- > > Key: LUCENE-3563 > URL: https://issues.apache.org/jira/browse/LUCENE-3563 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 3.5 >Reporter: Robert Muir > Fix For: 3.5, 4.0 > > > ant test -Dtestcase=TestPagedBytes -Dtestmethod=testDataInputOutput > -Dtests.seed=268db1f3329b70d:3125365bc9c56c90:116e02aa4a70ec2f > -Dtests.multiplier=5
[jira] [Resolved] (LUCENE-3539) IndexFormatTooOld/NewExc should try to include fileName + directory when possible
[ https://issues.apache.org/jira/browse/LUCENE-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3539. Resolution: Fixed > IndexFormatTooOld/NewExc should try to include fileName + directory when > possible > - > > Key: LUCENE-3539 > URL: https://issues.apache.org/jira/browse/LUCENE-3539 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3539.patch, LUCENE-3539.patch, LUCENE-3539.patch > > > (Spinoff from http://markmail.org/thread/t6s7nn3ve765nojc ) > When we throw a too old/new exc we should try to include the full path to the > offending file, if possible.
[jira] [Resolved] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2205. Resolution: Fixed Fix Version/s: 4.0 Finally resolved; thanks Aaron! > Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and > the index pointer long[] and create a more memory efficient data structure. > --- > > Key: LUCENE-2205 > URL: https://issues.apache.org/jira/browse/LUCENE-2205 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index > Environment: Java5 >Reporter: Aaron McCurry >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-2205.patch, LUCENE-2205.patch, > RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, > TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, > lowmemory_w_utf8_encoding.patch, lowmemory_w_utf8_encoding.v4.patch, > patch-final.txt, rawoutput.txt > > > Basically packing those three arrays into a byte array with an int array as > an index offset. > The performance benefits are staggering: on my test index (of size 6.2 GB, with > ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the > terminfos was reduced to 17% of its original size, from 291.5 > MB to 49.7 MB. The random access speed has been made better by 1-2%, load > time of the segments is ~40% faster as well, and full GC's on my JVM were > made 7 times faster. > I have already performed the work and am offering this code as a patch. > Currently all tests in the trunk pass with this new code enabled. I did write > a system property switch to allow for the original implementation to be used > as well. > -Dorg.apache.lucene.index.TermInfosReader=default or small > I have also written a blog post about this patch; here is the link. > http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
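The packing described in LUCENE-2205 (one byte array holding all entries, plus an int array of offsets into it) can be illustrated with a toy sketch. This is not the TermInfosReaderIndex code from the patch: the class PackedTermIndex and the entry layout (an 8-byte pointer followed by the UTF-8 term bytes) are assumptions chosen only to show the shape of the data structure.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the TermInfosReaderIndex from the patch.
final class PackedTermIndex {
    private final byte[] data;    // all entries stored back to back
    private final int[] offsets;  // offsets[i] = start of entry i in data

    PackedTermIndex(String[] terms, long[] pointers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = out.size();
            // Assumed entry layout: 8-byte pointer, then the UTF-8 term bytes.
            byte[] pointerBytes = ByteBuffer.allocate(Long.BYTES).putLong(pointers[i]).array();
            byte[] utf8 = terms[i].getBytes(StandardCharsets.UTF_8);
            out.write(pointerBytes, 0, pointerBytes.length);
            out.write(utf8, 0, utf8.length);
        }
        data = out.toByteArray();
    }

    // End of entry i is the start of entry i + 1, or the end of the data.
    private int end(int i) {
        return i + 1 < offsets.length ? offsets[i + 1] : data.length;
    }

    String term(int i) {
        int start = offsets[i] + Long.BYTES;
        return new String(data, start, end(i) - start, StandardCharsets.UTF_8);
    }

    long pointer(int i) {
        return ByteBuffer.wrap(data, offsets[i], Long.BYTES).getLong();
    }

    int size() {
        return offsets.length;
    }
}
```

Two large primitive arrays replace many small per-term objects, which is the kind of change that would plausibly reduce both heap usage and GC work, in line with the numbers reported above.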
[jira] [Resolved] (LUCENE-3522) TermsFilter.getDocIdSet(context) NPE on missing field
[ https://issues.apache.org/jira/browse/LUCENE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3522. Resolution: Fixed Fix Version/s: 4.0 Thanks Dan! I committed to trunk and backported the test case to 3.x. I had to add missing rd1/2.close() at the end of the test case. > TermsFilter.getDocIdSet(context) NPE on missing field > - > > Key: LUCENE-3522 > URL: https://issues.apache.org/jira/browse/LUCENE-3522 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 4.0 >Reporter: Dan Climan >Assignee: Michael McCandless >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3522.patch > > > If the context does not contain the field for a term when calling > TermsFilter.getDocIdSet(AtomicReaderContext context) then a > NullPointerException is thrown due to not checking for null Terms before > getting iterator.
[jira] [Resolved] (LUCENE-3520) If the NRT reader hasn't changed then IndexReader.openIfChanged should return null
[ https://issues.apache.org/jira/browse/LUCENE-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3520. Resolution: Fixed > If the NRT reader hasn't changed then IndexReader.openIfChanged should return > null > -- > > Key: LUCENE-3520 > URL: https://issues.apache.org/jira/browse/LUCENE-3520 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3520.patch, LUCENE-3520.patch > > > I hit a failure in TestSearcherManager (NOTE: doesn't always fail): > {noformat} > ant test -Dtestcase=TestSearcherManager -Dtestmethod=testSearcherManager > -Dtests.seed=459ac99a4256789c:-29b8a7f52497c3b4:145ae632ae9e1ecf > {noformat} > It was tripping the assert inside SearcherLifetimeManager.record, > because two different IndexSearcher instances had different IR > instances sharing the same version. This was happening because > IW.getReader always returns a new reader even when there are no > changes. I think we should fix that... > Separately I found a deadlock in > TestSearcherManager.testIntermediateClose, if the test gets > SerialMergeScheduler and needs to merge during the second commit.
[jira] [Resolved] (LUCENE-3515) Possible slowdown of indexing/merging on 3.x vs trunk
[ https://issues.apache.org/jira/browse/LUCENE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3515. Resolution: Fixed Thank you Marc and Erick! This was a devious issue and severely impacted merge performance for non-MMapDir impls. > Possible slowdown of indexing/merging on 3.x vs trunk > - > > Key: LUCENE-3515 > URL: https://issues.apache.org/jira/browse/LUCENE-3515 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3515.patch, LUCENE-3515.patch, > LUCENE-index-34.patch, LUCENE-index-40.patch, TestGenerationTime.java.3x, > TestGenerationTime.java.40, stdout-snow-leopard.tar.gz > > > Opening an issue to pursue the possible slowdown Marc Sturlese uncovered.
[jira] [Resolved] (LUCENE-3519) BlockJoinCollector only allows retrieving groups for only one BlockJoinQuery
[ https://issues.apache.org/jira/browse/LUCENE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3519. Resolution: Fixed Thanks Mark! > BlockJoinCollector only allows retrieving groups for only one BlockJoinQuery > > > Key: LUCENE-3519 > URL: https://issues.apache.org/jira/browse/LUCENE-3519 > Project: Lucene - Java > Issue Type: Bug > Components: modules/join >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3519.patch > > > Spinoff from Mark Harwood's email (subject "BlockJoin concerns") to > dev list. > It's fine to use multiple nested joins in a single query, and > BlockJoinCollector should let you retrieve the top groups for all of > them. > But currently it always returns null after the first query's groups > have been retrieved, because of a silly bug.
[jira] [Resolved] (LUCENE-3510) BooleanScorer should not limit number of prohibited clauses
[ https://issues.apache.org/jira/browse/LUCENE-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3510. Resolution: Fixed > BooleanScorer should not limit number of prohibited clauses > --- > > Key: LUCENE-3510 > URL: https://issues.apache.org/jira/browse/LUCENE-3510 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3510.patch > > > Today it's limited to 32, because it uses a separate bit in the mask > for each clause. > But I don't understand why it does this; I think all prohibited > clauses can share a single boolean/bit? Any match on a prohibited > clause sets this bit and the doc is not collected; we don't need each > prohibited clause to have a dedicated bit? > We also use the mask for required clauses, but this code is now > commented out (we always use BS2 if there are any required clauses); > if we re-enable this code (and I think we should, at least in certain > cases: I suspect it'd be faster than BS2 in many cases), I think we > can cutover to an int count instead of bit masks, and then have no > limit on the required clauses sent to BooleanScorer also. > Separately I cleaned a few things up about BooleanScorer: all of the > embedded scorer methods (nextDoc, docID, advance, score) now throw > UOE; pre-allocate the buckets instead of doing it lazily > per-sub-collect.
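The argument in LUCENE-3510 (all prohibited clauses can share one flag per document, so the clause count no longer has to fit in a 32-bit mask) can be illustrated with a toy sketch. This is not BooleanScorer: the class SharedProhibitedFlagDemo, the per-document arrays, and the representation of a clause as an int[] of matching doc IDs are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not Lucene's BooleanScorer.
final class SharedProhibitedFlagDemo {
    /**
     * Returns the doc IDs in [0, maxDoc) matched by at least one optional
     * clause and by no prohibited clause. Each clause is given as the array
     * of doc IDs it matches.
     */
    static List<Integer> collect(int maxDoc, int[][] optionalClauses, int[][] prohibitedClauses) {
        boolean[] prohibited = new boolean[maxDoc]; // one shared flag per doc
        int[] matchCount = new int[maxDoc];         // an int count instead of a bit mask

        // Any match on any prohibited clause sets the same flag.
        for (int[] clause : prohibitedClauses) {
            for (int doc : clause) {
                prohibited[doc] = true;
            }
        }
        for (int[] clause : optionalClauses) {
            for (int doc : clause) {
                matchCount[doc]++;
            }
        }

        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matchCount[doc] > 0 && !prohibited[doc]) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Because no clause owns a dedicated bit, the sketch accepts any number of prohibited clauses; which clause excluded a document is never needed, only that some clause did.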
[jira] [Resolved] (LUCENE-3504) DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value
[ https://issues.apache.org/jira/browse/LUCENE-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3504. Resolution: Won't Fix > DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc > didn't have a value > -- > > Key: LUCENE-3504 > URL: https://issues.apache.org/jira/browse/LUCENE-3504 > Project: Lucene - Java > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > > I'm looking at making a FieldComparator that uses DV's SortedSource to > sort by string field (ie just like TermOrdValComparator, except using > DV instead of FieldCache). We already have comparators for DV int and > float fields. > But one thing I noticed is we can't detect documents that didn't have > any value indexed vs documents that had empty byte[] indexed. > This is easy to fix (and we used to do this), because these types are > deref'd (ie, each doc stores an address, and then separately looks up > the byte[] at that address), we can reserve ord/address 0 to mean "doc > didn't have the field". Then we should return null when you retrieve > the BytesRef value for that field.
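The proposal in LUCENE-3504 (reserve ord/address 0 to mean "doc didn't have the field") can be illustrated with a toy sketch. The issue was resolved as Won't Fix, so this shows only the idea as proposed, not behaviour Lucene adopted. The class DerefBytesSketch, its constructor argument, and the simple "ord = docID + 1" assignment are assumptions made for illustration.

```java
// Hypothetical sketch of the proposed idea; not the DocValues implementation.
final class DerefBytesSketch {
    private final byte[][] valuesByOrd; // slot 0 is reserved: ord 0 means "no value"
    private final int[] docToOrd;       // defaults to 0, i.e. missing

    // perDoc[doc] is null when the document had no value indexed.
    DerefBytesSketch(byte[][] perDoc) {
        valuesByOrd = new byte[perDoc.length + 1][];
        docToOrd = new int[perDoc.length];
        for (int doc = 0; doc < perDoc.length; doc++) {
            if (perDoc[doc] != null) {
                int ord = doc + 1; // toy assignment: no de-duplication of equal values
                valuesByOrd[ord] = perDoc[doc];
                docToOrd[doc] = ord;
            }
        }
    }

    /** Returns null for a missing value, and a (possibly empty) array otherwise. */
    byte[] get(int docID) {
        int ord = docToOrd[docID];
        return ord == 0 ? null : valuesByOrd[ord];
    }
}
```

The point of the reserved slot is that an indexed empty byte[] and an absent value take different paths: the first has a non-zero ord, the second does not.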
[jira] [Resolved] (LUCENE-3486) Add SearcherLifetimeManager, so you can retrieve the same searcher you previously used
[ https://issues.apache.org/jira/browse/LUCENE-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3486. Resolution: Fixed > Add SearcherLifetimeManager, so you can retrieve the same searcher you > previously used > -- > > Key: LUCENE-3486 > URL: https://issues.apache.org/jira/browse/LUCENE-3486 > Project: Lucene - Java > Issue Type: New Feature > Components: core/search >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3486.patch, LUCENE-3486.patch, LUCENE-3486.patch > > > The idea is similar to SOLR-2809 (adding searcher leases to Solr). > This utility class sits above whatever your source is for "the > current" searcher (eg NRTManager, SearcherManager, etc.), and records > (holds a reference to) each searcher in recent history. > The idea is to ensure that when a user does a follow-on action (clicks > next page, drills down/up), or when two or more searcher invocations > within a single user search need to happen against the same searcher > (eg in distributed search), you can retrieve the same searcher you > used "last time". > I think with the new searchAfter API (LUCENE-2215), doing follow-on > searches on the same searcher is more important, since the "bottom" > (score/docID) held for that API can easily shift when a new searcher > is opened. > When you do a "new" search, you record the searcher you used with the > manager, and it returns to you a long token (currently just the > IR.getVersion()), which you can later use to retrieve the same > searcher. > Separately you must periodically call prune(), to prune the old > searchers, ideally from the same thread / at the same time that > you open a new searcher.
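The record / retrieve-by-token / prune cycle described in LUCENE-3486 can be illustrated with a toy sketch. This is not SearcherLifetimeManager: the class SearcherHistory, its generic searcher type, the explicit timestamps, and the age-based pruning rule are assumptions made for illustration, and the sketch does no reference counting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; not Lucene's SearcherLifetimeManager.
final class SearcherHistory<S> {
    private static final class Entry<T> {
        final T searcher;
        final long recordedAtMillis;

        Entry(T searcher, long recordedAtMillis) {
            this.searcher = searcher;
            this.recordedAtMillis = recordedAtMillis;
        }
    }

    private final Map<Long, Entry<S>> byToken = new ConcurrentHashMap<>();

    /** Records the searcher under its version and returns that version as the token. */
    long record(long version, S searcher, long nowMillis) {
        // Keep the first searcher seen for a given version.
        byToken.putIfAbsent(version, new Entry<>(searcher, nowMillis));
        return version;
    }

    /** Returns the searcher recorded for the token, or null if it was pruned or never recorded. */
    S acquire(long token) {
        Entry<S> entry = byToken.get(token);
        return entry == null ? null : entry.searcher;
    }

    /** Drops every searcher recorded more than maxAgeMillis before nowMillis. */
    void prune(long nowMillis, long maxAgeMillis) {
        byToken.values().removeIf(e -> nowMillis - e.recordedAtMillis > maxAgeMillis);
    }
}
```

A caller would hand the token back to the client with the first page of results, then pass it to acquire() on "next page"; a null return means the searcher has been pruned and the search has to start over on the current searcher.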
[jira] [Resolved] (LUCENE-3502) Packed ints: move .getArray into Reader API
[ https://issues.apache.org/jira/browse/LUCENE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3502. Resolution: Fixed Thanks Simon and Robert. > Packed ints: move .getArray into Reader API > --- > > Key: LUCENE-3502 > URL: https://issues.apache.org/jira/browse/LUCENE-3502 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3502.patch, LUCENE-3502.patch > > > This is a simple code cleanup... it's messy that a consumer of > PackedInts.Reader must check whether the impl is Direct8/16/32/64 in > order to get an array; it's better to move up the .getArray into the > Reader interface and then make the DirectN impls package private.
[jira] [Resolved] (LUCENE-3464) Rename IndexReader.reopen to make it clear that reopen may not happen
[ https://issues.apache.org/jira/browse/LUCENE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3464. Resolution: Fixed Thanks everyone! > Rename IndexReader.reopen to make it clear that reopen may not happen > - > > Key: LUCENE-3464 > URL: https://issues.apache.org/jira/browse/LUCENE-3464 > Project: Lucene - Java > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3464.3x.patch, LUCENE-3464.patch, > LUCENE-3464.patch > > > Spinoff from LUCENE-3454 where Shai noted this inconsistency. > IR.reopen sounds like an unconditional operation, which has trapped users in > the past into always closing the old reader instead of only closing it if the > returned reader is new. > I think this hidden maybe-ness is trappy and we should rename it > (maybeReopen? reopenIfNeeded?). > In addition, instead of returning "this" when the reopen didn't happen, I > think we should return null to enforce proper usage of the maybe-ness of this > API.
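The usage pattern that a null return is meant to enforce in LUCENE-3464 can be illustrated with a toy sketch. ToyReader is a hypothetical stand-in, not IndexReader: its version field, the currentIndexVersion parameter, and the refresh helper are assumptions made for illustration. Only the shape of the contract (null means "nothing changed, keep using the old reader") is taken from the issue.

```java
// Hypothetical stand-in for a reader; not Lucene's IndexReader.
final class ToyReader {
    final long version;
    boolean closed;

    ToyReader(long version) {
        this.version = version;
    }

    void close() {
        closed = true;
    }

    /** Returns a new reader if the index moved on, or null if nothing changed. */
    static ToyReader openIfChanged(ToyReader old, long currentIndexVersion) {
        return currentIndexVersion == old.version ? null : new ToyReader(currentIndexVersion);
    }

    /** Caller-side pattern: close the old reader only when a new one was returned. */
    static ToyReader refresh(ToyReader current, long currentIndexVersion) {
        ToyReader newer = openIfChanged(current, currentIndexVersion);
        if (newer == null) {
            return current; // unchanged: the old reader stays open and in use
        }
        current.close();
        return newer;
    }
}
```

With a method that returns "this" on no change, code that always closes the old reader silently closes the reader it is about to keep using; with null, the same mistake cannot go unnoticed because the caller has to handle the null.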
[jira] [Resolved] (LUCENE-3477) Fix JFlex tokenizer compiler warnings
[ https://issues.apache.org/jira/browse/LUCENE-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3477. Resolution: Fixed > Fix JFlex tokenizer compiler warnings > - > > Key: LUCENE-3477 > URL: https://issues.apache.org/jira/browse/LUCENE-3477 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3477.patch > > > We get lots of distracting fallthrough warnings running "ant compile" > in modules/analysis, from the tokenizers generated from JFlex. > Digging a bit, they actually do look spooky. > So I managed to edit the JFlex inputs to insert a bunch of break > statements in our rules, but I have no idea if this is > right/dangerous, and it seems a bit weird having to do such insertions > of "naked" breaks. > But, this does fix all the warnings, and all tests pass...
[jira] [Resolved] (LUCENE-3472) add back Document.getValues()
[ https://issues.apache.org/jira/browse/LUCENE-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3472. Resolution: Fixed > add back Document.getValues() > - > > Key: LUCENE-3472 > URL: https://issues.apache.org/jira/browse/LUCENE-3472 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 4.0 > > > I'm porting some code to trunk's new Doc/Field apis, and I keep running into > this pattern: > {noformat} > String[] values = doc.getValues("field"); > {noformat} > But with the new apis, this becomes a little too verbose: > {noformat} > IndexableField[] fields = doc.getFields("field"); > String[] values = new String[fields.length]; > for (int i = 0; i < values.length; i++) { > values[i] = fields[i].stringValue(); > } > {noformat} > I think we should probably add back the sugar api (with the same name) ?
[jira] [Resolved] (LUCENE-3471) TestNRTManager test failure
[ https://issues.apache.org/jira/browse/LUCENE-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3471. Resolution: Fixed Fix Version/s: 4.0 3.5 Thank you Charlie! > TestNRTManager test failure > --- > > Key: LUCENE-3471 > URL: https://issues.apache.org/jira/browse/LUCENE-3471 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > > reproduces for me
[jira] [Resolved] (LUCENE-3465) IndexSearcher fails to pass docBase to Collector when using ExecutorService
[ https://issues.apache.org/jira/browse/LUCENE-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3465. Resolution: Fixed > IndexSearcher fails to pass docBase to Collector when using ExecutorService > --- > > Key: LUCENE-3465 > URL: https://issues.apache.org/jira/browse/LUCENE-3465 > Project: Lucene - Java > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5 > > Attachments: LUCENE-3465.patch > > > This bug is causing the failure in TestSearchAfter. > We are now always passing docBase 0 to Collector when you use ExecutorService > with IndexSearcher. > This doesn't affect trunk (AtomicReaderContext carries the right docBase); > only 3.x.
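Why the docBase matters in LUCENE-3465 can be illustrated with a toy sketch. This is not IndexSearcher or Collector code: the class DocBaseDemo and the representation of segments as sizes plus arrays of segment-local hits are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not Lucene's IndexSearcher or Collector.
final class DocBaseDemo {
    /**
     * Translates segment-local doc IDs into index-wide doc IDs.
     * segmentSizes[i] is the number of docs in segment i, and
     * localHitsPerSegment[i] holds the hits found inside segment i.
     */
    static List<Integer> globalHits(int[] segmentSizes, int[][] localHitsPerSegment) {
        List<Integer> hits = new ArrayList<>();
        int docBase = 0; // index-wide ID of the first doc in the current segment
        for (int seg = 0; seg < segmentSizes.length; seg++) {
            for (int local : localHitsPerSegment[seg]) {
                hits.add(docBase + local);
            }
            docBase += segmentSizes[seg];
        }
        return hits;
    }
}
```

If every segment were collected with docBase 0, as the bug report describes, local ID 1 in the second segment would be reported as document 1 instead of its real index-wide ID, so hits from different segments would collide.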