Build failed in Hudson: Lucene-trunk #720
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/720/changes Changes: [mikemccand] LUCENE-1483: switch to newly added MultiReaderHitCollector for all core collectors, that is aware of segment transitions during searching, to improve performance of searching and warming [uschindler] Implement a shortcut, when range has min>max. In this case a static empty SortedVIntList is returned. [uschindler] LUCENE-1530: Support inclusive/exclusive for TrieRangeQuery/-Filter, remove default trie variant setters/getters -- [...truncated 10489 lines...] init: compile-test: [echo] Building swing... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: clover.setup: clover.info: clover: compile-core: common.compile-test: common.test: [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/swing/test [junit] Testsuite: org.apache.lucene.swing.models.TestBasicList [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.614 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestBasicTable [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.6 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestSearchingList [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.595 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestSearchingTable [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.636 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestUpdatingList [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.73 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestUpdatingTable [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.09 sec [junit] [delete] Deleting: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/swing/test/junitfailed.flag [echo] Building wikipedia... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: test: [echo] Building wikipedia... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: compile-test: [echo] Building wikipedia... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: clover.setup: clover.info: clover: compile-core: common.compile-test: common.test: [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/wikipedia/test [junit] Testsuite: org.apache.lucene.wikipedia.analysis.WikipediaTokenizerTest [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.407 sec [junit] [delete] Deleting: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/wikipedia/test/junitfailed.flag [echo] Building wordnet... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: test: [echo] Building xml-query-parser... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: test: [echo] Building xml-query-parser... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: compile-test: [echo] Building xml-query-parser... 
build-queries:
javacc-uptodate-check:
javacc-notice:
jflex-uptodate-check:
jflex-notice:
common.init:
build-lucene:
build-lucene-tests:
init:
clover.setup:
clover.info:
clover:
common.compile-core:
compile-core:
common.compile-test:
common.test:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/xml-query-parser/test
    [junit] Testsuite: org.apache.lucene.xmlparser.TestParser
    [junit] Tests run: 17, Failures: 0, Errors: 0, Time elapsed: 3.062 sec
    [junit]
    [junit] Testsuite: org.apache.lucene.xmlparser.TestQueryTemplateManager
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.782 sec
    [junit]
    [delete] Deleting: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/xml-query-parser/test/junitfailed.flag
test:
download-tag:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/tags/tags/lucene_2_4_back_compat_tests_20090127
    [exec] Error validating server certificate for 'https://svn.apache.org:443':
    [exec]  - The certificate is not issued by a trusted authority. Use the
    [exec]    fingerprint to validate the certificate manually!
    [exec] Certificate information:
    [exec]  - Hostname: s
[jira] Commented: (LUCENE-1484) Remove SegmentReader.document synchronization
[ https://issues.apache.org/jira/browse/LUCENE-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667917#action_12667917 ] Jason Bennett commented on LUCENE-1484: --- Is there any chance this patch could be released in 2.4.1, instead of waiting for 2.9? > Remove SegmentReader.document synchronization > - > > Key: LUCENE-1484 > URL: https://issues.apache.org/jira/browse/LUCENE-1484 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Assignee: Michael McCandless > Fix For: 2.9 > > Attachments: LUCENE-1484.patch, LUCENE-1484.patch > > Original Estimate: 96h > Remaining Estimate: 96h > > This is probably the last synchronization issue in Lucene. It is the > document method in SegmentReader. It is avoidable by using a threadlocal for > FieldsReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
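For anyone curious what the ThreadLocal approach boils down to, a minimal sketch is below. It is not taken from the attached patch: the field name fieldsReaderOrig and the assumption that FieldsReader can be cloned cheaply per thread are illustrative only.
{code:java}
// Sketch: give each thread its own FieldsReader so that
// SegmentReader.document() no longer needs a synchronized block.
private final ThreadLocal fieldsReaderLocal = new ThreadLocal() {
  protected Object initialValue() {
    // fieldsReaderOrig: the FieldsReader opened with the segment (hypothetical name)
    return fieldsReaderOrig.clone();
  }
};

public Document document(int n, FieldSelector fieldSelector) throws IOException {
  FieldsReader fieldsReader = (FieldsReader) fieldsReaderLocal.get();
  return fieldsReader.doc(n, fieldSelector);
}
{code}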
what is the correct javacc version to generate the queryparser?
Hi, Javacc 4.2 is out, but the code generated by this version is different from the code generated by javacc 4.1, which I think is the version used to generate the Lucene queryparser files. What is the official javacc version used when generating the queryparser classes? Is it a good idea to submit a patch with code generated by javacc 4.2? -Lafa - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: RE: Hudson Java Docs?
Hi, When looking through the All-Javadocs index generated on hudson, I have seen some small bugs due to the new spatial contrib and the trie package. I modified build.xml and site.xml (attached patch). Can somebody with commit rights apply this? Maybe, the spatial contrib should go into CHANGES.txt of contrib, too. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Tuesday, January 27, 2009 11:56 AM > To: java-dev@lucene.apache.org > Subject: RE: RE: Hudson Java Docs? > > > Alternately, we could turn off the "Publish Javadoc" feature, and > instead > > add trunk/build/docs/api to the list of files to "Archive" and then > start > > refering to a URL like this (doesn't work at the moment) for all the > > javadocs... > > > > http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene- > > trunk/lastSuccessfulBuild/artifact/trunk/build/docs/api/ > > > > turning that Javadoc feature off should eliminate the existing Javadoc > > links in the hudson navigation, but I suspect the old files would still > be > > there (and in search engine caches) > > Can we do a one-time cleanup (rm -rf in the directory) and then have a new > and clean start (maybe ask the hudson team at Apache)? The index.html file > for the javadocs in the root javadoc folder is possible, but it would not > remove the old files (the others, not index.html) from Googles > cache/index. > > Uwe > > > - > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org fix-javadocs-build.patch Description: Binary data - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1528) Add support for Ideographic Space to the queryparser - also known as fullwidth space and wide-space
[ https://issues.apache.org/jira/browse/LUCENE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667837#action_12667837 ] Luis Alves commented on LUCENE-1528: Hi Michael, I checked the book "Generating parser with JavaCC" and I checked the javacc website (https://javacc.dev.java.net/doc/javaccgrm.html) for grammar, here is the syntax for a character list: character_list ::= [ "~" ] "[" [ character_descriptor ( "," character_descriptor )* ] "]" character_descriptor::= java_string_literal [ "-" java_string_literal ] also the '|' character in javacc syntax is used like an XOR, and there is no OR or AND operator to be used in the javacc syntax that I'm aware. So the expression <_WHITESPACE> | [ "+", ... ] would have to look like ~(<_WHITESPACE> & [ "+", ... ]) but this is not possible in javacc grammar. So I think the best option for now, is to keep the current syntax. If you like, I can change <#_WHITESPACE: ( " " | "\t" | "\n" | "\r") > to a character_list to make it more consistent, but that would not help to remove the duplicated list of characters. <#_WHITESPACE: [ " ", "\t", "\n", "\r" ] > > Add support for Ideographic Space to the queryparser - also know as fullwith > space and wide-space > - > > Key: LUCENE-1528 > URL: https://issues.apache.org/jira/browse/LUCENE-1528 > Project: Lucene - Java > Issue Type: Improvement > Components: QueryParser >Affects Versions: 2.4.1 >Reporter: Luis Alves >Assignee: Michael Busch >Priority: Minor > Fix For: 2.4.1 > > Attachments: lucene_wide_space_v1_src.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > The Ideographic Space is a space character that is as wide as a normal CJK > character cell. > It is also known as wide-space or fullwith space.This type of space is used > in CJK languages. > This patch adds support for the wide space, making the queryparser component > more friendly > to queries that contain CJK text. > Reference: > 'http://en.wikipedia.org/wiki/Space_(punctuation)' - see Table of spaces, > char U+3000. > I also added a new testcase that fails before the patch. > After the patch is applied all junits pass. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
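As a quick illustration of what the patch is meant to achieve, a check along these lines could be used (a rough sketch, not part of the attached patch; the field name "f" and WhitespaceAnalyzer are arbitrary choices, and it assumes the patched grammar treats U+3000 exactly like an ASCII space):
{code:java}
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class IdeographicSpaceCheck {
  public static void main(String[] args) throws Exception {
    QueryParser qp = new QueryParser("f", new WhitespaceAnalyzer());
    // U+3000 IDEOGRAPHIC SPACE between the two terms
    Query wide = qp.parse("foo\u3000bar");
    Query ascii = qp.parse("foo bar");
    // with the patch applied, both strings should parse to the same two-clause query
    System.out.println(wide.equals(ascii) ? "OK: " + wide : "differs: " + wide + " vs " + ascii);
  }
}
{code}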
[jira] Commented: (LUCENE-1507) adding EmptyDocIdSet/Iterator
[ https://issues.apache.org/jira/browse/LUCENE-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667793#action_12667793 ] Michael McCandless commented on LUCENE-1507: New patch looks good, thanks Uwe. > adding EmptyDocIdSet/Iterator > - > > Key: LUCENE-1507 > URL: https://issues.apache.org/jira/browse/LUCENE-1507 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: John Wang >Assignee: Michael McCandless > Attachments: emptydocidset.txt, LUCENE-1507.patch, LUCENE-1507.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Adding convenience classes for EmptyDocIdSet and EmptyDocIdSetIterator -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1483. Resolution: Fixed Fix Version/s: 2.9 Committed revision 738219. Thanks to everyone who helped out here...and many thanks to Mark for working through so many iterations as we explored the different approaches here! > Change IndexSearcher multisegment searches to search each individual segment > using a single HitCollector > > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1483-backcompat.patch, LUCENE-1483-partial.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py > > > This issue changes how an IndexSearcher searches over multiple segments. The > current method of searching multiple segments is to use a MultiSegmentReader > and treat all of the segments as one. This causes filters and FieldCaches to > be keyed to the MultiReader and makes reopen expensive. If only a few > segments change, the FieldCache is still loaded for all of them. > This patch changes things by searching each individual segment one at a time, > but sharing the HitCollector used across each segment. This allows > FieldCaches and Filters to be keyed on individual SegmentReaders, making > reopen much cheaper. FieldCache loading over multiple segments can be much > faster as well - with the old method, all unique terms for every segment is > enumerated against each segment - because of the likely logarithmic change in > terms per segment, this can be very wasteful. Searching individual segments > avoids this cost. The term/document statistics from the multireader are used > to score results for each segment. > When sorting, its more difficult to use a single HitCollector for each sub > searcher. Ordinals are not comparable across segments. To account for this, a > new field sort enabled HitCollector is introduced that is able to collect and > sort across segments (because of its ability to compare ordinals across > segments). This TopFieldCollector class will collect the values/ordinals for > a given segment, and upon moving to the next segment, translate any > ordinals/values so that they can be compared against the values for the new > segment. This is done lazily. > All and all, the switch seems to provide numerous performance benefits, in > both sorted and non sorted search. We were seeing a good loss on indices with > lots of segments (1000?) and certain queue sizes / queries, but the latest > results seem to show thats been mostly taken care of (you shouldnt be using > such a large queue on such a segmented index anyway). > * Introduces > ** MultiReaderHitCollector - a HitCollector that can collect across multiple > IndexReaders. 
Old HitCollectors are wrapped to support multiple IndexReaders. > ** TopFieldCollector - a HitCollector that can compare values/ordinals across > IndexReaders and sort on fields. > ** FieldValueHitQueue - a Priority queue that is part of the > TopFieldCollector implementation. > ** FieldComparator - a new Comparator class that works across IndexReaders. > Part of the TopFieldCollector implementation. > ** FieldComparatorSource - new class to allow for custom Comparators. > * Alters > ** IndexSearcher uses a single HitCollector to collect hits against each > individual SegmentReader. All the other changes stem from this ;) > * Deprecates > ** TopFieldDocCollector > ** FieldSortedHitQueue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
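To give a feel for the new API, here is a rough sketch of a minimal collector under this change. It assumes the setNextReader(IndexReader reader, int docBase) callback described in this issue and is not code from the committed patch:
{code:java}
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.MultiReaderHitCollector;

// Counts hits across all segments and remembers the last hit in the
// top-level (MultiReader) doc id space.
public class CountingCollector extends MultiReaderHitCollector {
  private int docBase;
  private int count;
  private int lastDoc = -1;

  public void setNextReader(IndexReader reader, int base) {
    // called once per segment before collect(); per-segment FieldCache
    // lookups would be done here, keyed on the SegmentReader
    this.docBase = base;
  }

  public void collect(int doc, float score) {
    count++;
    lastDoc = docBase + doc; // translate the segment-relative doc id
  }

  public int getCount() { return count; }
  public int getLastDoc() { return lastDoc; }
}
{code}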
[jira] Updated: (LUCENE-1507) adding EmptyDocIdSet/Iterator
[ https://issues.apache.org/jira/browse/LUCENE-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1507: -- Attachment: LUCENE-1507.patch Hi Mike, I just updated the patch a little bit to supply javadocs for iterator() method, too. It also contains the first example usage in TrieRangeFilter (where a private instance was used until now). This can be committed together with this. Maybe the conventional RangeFilter/RangeQuery can be optimized in that way, too. > adding EmptyDocIdSet/Iterator > - > > Key: LUCENE-1507 > URL: https://issues.apache.org/jira/browse/LUCENE-1507 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: John Wang >Assignee: Michael McCandless > Attachments: emptydocidset.txt, LUCENE-1507.patch, LUCENE-1507.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Adding convenience classes for EmptyDocIdSet and EmptyDocIdSetIterator -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667775#action_12667775 ] Michael McCandless commented on LUCENE-1483: bq. The only problem with parallelization is that the MultiReaderHitCollector must be synchronized in some way. I think we'd have to collect to separate collectors and then merge (like ParallelMultiSearcher does today)? I think this (separate thread for the "big" segments, and one thread for the "long tail") would be a good approach, except I don't like that the performance would depend so much on the structure of the index. EG after you've optimized your index you'd suddenly get no concurrency, and presumably worse performance than when you had a few big segments. Could we instead divide the index into chunks and have each thread skipTo the start of its chunk? EG if the index has N docs, and you want to use M threads, each thread visits N/M docs. If that can work it should be less dependent on the index structure. > Change IndexSearcher multisegment searches to search each individual segment > using a single HitCollector > > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483-backcompat.patch, LUCENE-1483-partial.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py > > > This issue changes how an IndexSearcher searches over multiple segments. The > current method of searching multiple segments is to use a MultiSegmentReader > and treat all of the segments as one. This causes filters and FieldCaches to > be keyed to the MultiReader and makes reopen expensive. If only a few > segments change, the FieldCache is still loaded for all of them. > This patch changes things by searching each individual segment one at a time, > but sharing the HitCollector used across each segment. This allows > FieldCaches and Filters to be keyed on individual SegmentReaders, making > reopen much cheaper. FieldCache loading over multiple segments can be much > faster as well - with the old method, all unique terms for every segment is > enumerated against each segment - because of the likely logarithmic change in > terms per segment, this can be very wasteful. Searching individual segments > avoids this cost. The term/document statistics from the multireader are used > to score results for each segment. > When sorting, its more difficult to use a single HitCollector for each sub > searcher. Ordinals are not comparable across segments. To account for this, a > new field sort enabled HitCollector is introduced that is able to collect and > sort across segments (because of its ability to compare ordinals across > segments). 
This TopFieldCollector class will collect the values/ordinals for > a given segment, and upon moving to the next segment, translate any > ordinals/values so that they can be compared against the values for the new > segment. This is done lazily. > All and all, the switch seems to provide numerous performance benefits, in > both sorted and non sorted search. We were seeing a good loss on indices with > lots of segments (1000?) and certain queue sizes / queries, but the latest > results seem to show thats been mostly taken care of (you shouldnt be using > such a large queue on such a segmented index anyway). > * Introduces > ** MultiReaderHitCollector - a HitCollector that can collect across multiple > IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders. > ** TopFieldCollector - a HitCollector that can compare values/ordinals across > IndexReaders and sort on fields. > ** FieldValueHitQueue - a Priority queue that is part of the > TopFieldCollector implementation. > ** FieldComparator - a new Comparator class that works across IndexReaders. > Part of the TopFieldCollector implementation. > ** FieldComparatorSource - new class to allow for custom Comparators. > * Alters > ** IndexSearcher uses a sing
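The doc-range splitting suggested above is just arithmetic over maxDoc; a sketch of how the chunks might be computed is below (illustrative only — each worker would still need its own Scorer positioned with skipTo(start) and stopped at end):
{code:java}
public class DocRangeSplitter {
  // Split [0, maxDoc) into numThreads roughly equal, contiguous ranges.
  // ranges[t][0] is the inclusive start, ranges[t][1] the exclusive end.
  static int[][] docRanges(int maxDoc, int numThreads) {
    int chunkSize = (maxDoc + numThreads - 1) / numThreads;
    int[][] ranges = new int[numThreads][2];
    for (int t = 0; t < numThreads; t++) {
      ranges[t][0] = Math.min(t * chunkSize, maxDoc);
      ranges[t][1] = Math.min((t + 1) * chunkSize, maxDoc);
    }
    return ranges;
  }
}
{code}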
[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667774#action_12667774 ] Michael McCandless commented on LUCENE-1483: bq. since last Friday, we had no problem with the new sort implementation. OK, excellent. I will commit shortly! > Change IndexSearcher multisegment searches to search each individual segment > using a single HitCollector > > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483-backcompat.patch, LUCENE-1483-partial.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py > > > This issue changes how an IndexSearcher searches over multiple segments. The > current method of searching multiple segments is to use a MultiSegmentReader > and treat all of the segments as one. This causes filters and FieldCaches to > be keyed to the MultiReader and makes reopen expensive. If only a few > segments change, the FieldCache is still loaded for all of them. > This patch changes things by searching each individual segment one at a time, > but sharing the HitCollector used across each segment. This allows > FieldCaches and Filters to be keyed on individual SegmentReaders, making > reopen much cheaper. FieldCache loading over multiple segments can be much > faster as well - with the old method, all unique terms for every segment is > enumerated against each segment - because of the likely logarithmic change in > terms per segment, this can be very wasteful. Searching individual segments > avoids this cost. The term/document statistics from the multireader are used > to score results for each segment. > When sorting, its more difficult to use a single HitCollector for each sub > searcher. Ordinals are not comparable across segments. To account for this, a > new field sort enabled HitCollector is introduced that is able to collect and > sort across segments (because of its ability to compare ordinals across > segments). This TopFieldCollector class will collect the values/ordinals for > a given segment, and upon moving to the next segment, translate any > ordinals/values so that they can be compared against the values for the new > segment. This is done lazily. > All and all, the switch seems to provide numerous performance benefits, in > both sorted and non sorted search. We were seeing a good loss on indices with > lots of segments (1000?) and certain queue sizes / queries, but the latest > results seem to show thats been mostly taken care of (you shouldnt be using > such a large queue on such a segmented index anyway). > * Introduces > ** MultiReaderHitCollector - a HitCollector that can collect across multiple > IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders. 
> ** TopFieldCollector - a HitCollector that can compare values/ordinals across > IndexReaders and sort on fields. > ** FieldValueHitQueue - a Priority queue that is part of the > TopFieldCollector implementation. > ** FieldComparator - a new Comparator class that works across IndexReaders. > Part of the TopFieldCollector implementation. > ** FieldComparatorSource - new class to allow for custom Comparators. > * Alters > ** IndexSearcher uses a single HitCollector to collect hits against each > individual SegmentReader. All the other changes stem from this ;) > * Deprecates > ** TopFieldDocCollector > ** FieldSortedHitQueue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667770#action_12667770 ] Michael McCandless commented on LUCENE-1476: bq. Perhaps this should be an option in the benchmark output? That's a great idea! Something silly must be going on... 99% performance drop can't be right. > BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs > --- > > Key: LUCENE-1476 > URL: https://issues.apache.org/jira/browse/LUCENE-1476 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Trivial > Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, > quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff, > searchdeletes.alg > > Original Estimate: 12h > Remaining Estimate: 12h > > Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from > IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1507) adding EmptyDocIdSet/Iterator
[ https://issues.apache.org/jira/browse/LUCENE-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667762#action_12667762 ] Michael McCandless commented on LUCENE-1507: That looks great to me! I'll commit in a day or two. > adding EmptyDocIdSet/Iterator > - > > Key: LUCENE-1507 > URL: https://issues.apache.org/jira/browse/LUCENE-1507 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: John Wang > Attachments: emptydocidset.txt, LUCENE-1507.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Adding convenience classes for EmptyDocIdSet and EmptyDocIdSetIterator -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Assigned: (LUCENE-1507) adding EmptyDocIdSet/Iterator
[ https://issues.apache.org/jira/browse/LUCENE-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1507: -- Assignee: Michael McCandless > adding EmptyDocIdSet/Iterator > - > > Key: LUCENE-1507 > URL: https://issues.apache.org/jira/browse/LUCENE-1507 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: John Wang >Assignee: Michael McCandless > Attachments: emptydocidset.txt, LUCENE-1507.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Adding convenience classes for EmptyDocIdSet and EmptyDocIdSetIterator -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667755#action_12667755 ] Uwe Schindler commented on LUCENE-1483: --- Jason: We should open a new issue for that after this one is solved. Maybe we can create a good parallelized implementation after solving the problems with MultiReaderHitCollector (if more than one thread call setNextReader with collect calls inbetween, it would not work). > Change IndexSearcher multisegment searches to search each individual segment > using a single HitCollector > > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483-backcompat.patch, LUCENE-1483-partial.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py > > > This issue changes how an IndexSearcher searches over multiple segments. The > current method of searching multiple segments is to use a MultiSegmentReader > and treat all of the segments as one. This causes filters and FieldCaches to > be keyed to the MultiReader and makes reopen expensive. If only a few > segments change, the FieldCache is still loaded for all of them. > This patch changes things by searching each individual segment one at a time, > but sharing the HitCollector used across each segment. This allows > FieldCaches and Filters to be keyed on individual SegmentReaders, making > reopen much cheaper. FieldCache loading over multiple segments can be much > faster as well - with the old method, all unique terms for every segment is > enumerated against each segment - because of the likely logarithmic change in > terms per segment, this can be very wasteful. Searching individual segments > avoids this cost. The term/document statistics from the multireader are used > to score results for each segment. > When sorting, its more difficult to use a single HitCollector for each sub > searcher. Ordinals are not comparable across segments. To account for this, a > new field sort enabled HitCollector is introduced that is able to collect and > sort across segments (because of its ability to compare ordinals across > segments). This TopFieldCollector class will collect the values/ordinals for > a given segment, and upon moving to the next segment, translate any > ordinals/values so that they can be compared against the values for the new > segment. This is done lazily. > All and all, the switch seems to provide numerous performance benefits, in > both sorted and non sorted search. We were seeing a good loss on indices with > lots of segments (1000?) and certain queue sizes / queries, but the latest > results seem to show thats been mostly taken care of (you shouldnt be using > such a large queue on such a segmented index anyway). 
> * Introduces > ** MultiReaderHitCollector - a HitCollector that can collect across multiple > IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders. > ** TopFieldCollector - a HitCollector that can compare values/ordinals across > IndexReaders and sort on fields. > ** FieldValueHitQueue - a Priority queue that is part of the > TopFieldCollector implementation. > ** FieldComparator - a new Comparator class that works across IndexReaders. > Part of the TopFieldCollector implementation. > ** FieldComparatorSource - new class to allow for custom Comparators. > * Alters > ** IndexSearcher uses a single HitCollector to collect hits against each > individual SegmentReader. All the other changes stem from this ;) > * Deprecates > ** TopFieldDocCollector > ** FieldSortedHitQueue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667752#action_12667752 ] Uwe Schindler commented on LUCENE-1483: --- Hi Mike, since last Friday, we had no problem with the new sort implementation. No exceptions from Lucene or any problems with Lucene. The sorting of results was (as far as I have seen) always correct (tested was SortField.INT, SortField.STRING). The index was updated each half hour and reopened, really great performance. There were also no errors after an optimize() and reopen again on Sunday (only that it took longer than to warmup the sorting). > Change IndexSearcher multisegment searches to search each individual segment > using a single HitCollector > > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483-backcompat.patch, LUCENE-1483-partial.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, > LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py > > > This issue changes how an IndexSearcher searches over multiple segments. The > current method of searching multiple segments is to use a MultiSegmentReader > and treat all of the segments as one. This causes filters and FieldCaches to > be keyed to the MultiReader and makes reopen expensive. If only a few > segments change, the FieldCache is still loaded for all of them. > This patch changes things by searching each individual segment one at a time, > but sharing the HitCollector used across each segment. This allows > FieldCaches and Filters to be keyed on individual SegmentReaders, making > reopen much cheaper. FieldCache loading over multiple segments can be much > faster as well - with the old method, all unique terms for every segment is > enumerated against each segment - because of the likely logarithmic change in > terms per segment, this can be very wasteful. Searching individual segments > avoids this cost. The term/document statistics from the multireader are used > to score results for each segment. > When sorting, its more difficult to use a single HitCollector for each sub > searcher. Ordinals are not comparable across segments. To account for this, a > new field sort enabled HitCollector is introduced that is able to collect and > sort across segments (because of its ability to compare ordinals across > segments). This TopFieldCollector class will collect the values/ordinals for > a given segment, and upon moving to the next segment, translate any > ordinals/values so that they can be compared against the values for the new > segment. This is done lazily. > All and all, the switch seems to provide numerous performance benefits, in > both sorted and non sorted search. We were seeing a good loss on indices with > lots of segments (1000?) 
and certain queue sizes / queries, but the latest > results seem to show thats been mostly taken care of (you shouldnt be using > such a large queue on such a segmented index anyway). > * Introduces > ** MultiReaderHitCollector - a HitCollector that can collect across multiple > IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders. > ** TopFieldCollector - a HitCollector that can compare values/ordinals across > IndexReaders and sort on fields. > ** FieldValueHitQueue - a Priority queue that is part of the > TopFieldCollector implementation. > ** FieldComparator - a new Comparator class that works across IndexReaders. > Part of the TopFieldCollector implementation. > ** FieldComparatorSource - new class to allow for custom Comparators. > * Alters > ** IndexSearcher uses a single HitCollector to collect hits against each > individual SegmentReader. All the other changes stem from this ;) > * Deprecates > ** TopFieldDocCollector > ** FieldSortedHitQueue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe
[jira] Updated: (LUCENE-1507) adding EmptyDocIdSet/Iterator
[ https://issues.apache.org/jira/browse/LUCENE-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1507: -- Attachment: LUCENE-1507.patch How about that patch? It just a static final for usage like this in filters: {code} if (shortcut condition) return DocIdSet.EMPTY_DOCIDSET {code} > adding EmptyDocIdSet/Iterator > - > > Key: LUCENE-1507 > URL: https://issues.apache.org/jira/browse/LUCENE-1507 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: John Wang > Attachments: emptydocidset.txt, LUCENE-1507.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > Adding convenience classes for EmptyDocIdSet and EmptyDocIdSetIterator -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
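In a filter the shortcut would look roughly like the sketch below. It assumes the static DocIdSet.EMPTY_DOCIDSET field this patch adds and the trunk Filter API where getDocIdSet(IndexReader) is the method to override; the filter class itself is made up for illustration:
{code:java}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;

public class ExampleLongRangeFilter extends Filter {
  private final long min, max;

  public ExampleLongRangeFilter(long min, long max) {
    this.min = min;
    this.max = max;
  }

  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
    if (min > max) {
      // empty range: return the shared empty set instead of allocating one
      return DocIdSet.EMPTY_DOCIDSET;
    }
    // ... build and return the real DocIdSet for [min..max] here ...
    return DocIdSet.EMPTY_DOCIDSET;
  }
}
{code}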
[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667748#action_12667748 ] Marvin Humphrey commented on LUCENE-1476: - > The percentage performance decrease in the previous > results is 99%. That's pretty strange. I look forward to seeing profiling data. > BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs > --- > > Key: LUCENE-1476 > URL: https://issues.apache.org/jira/browse/LUCENE-1476 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Trivial > Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, > quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff, > searchdeletes.alg > > Original Estimate: 12h > Remaining Estimate: 12h > > Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from > IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667713#action_12667713 ] Jason Rutherglen commented on LUCENE-1476: -- The percentage performance decrease in the previous results is 99%. {quote} Jason can you format those results using a Jira table? {quote} Perhaps this should be an option in the benchmark output? {quote} M.M. LUCENE-1516 comment: "I think the larger number of [harder-for-cpu-to-predict] if statements may be the cause of the slowdown once %tg deletes gets high enough?" {quote} I have been looking at the performance with YourKit and don't have any conclusions yet. The main difference between using skipto and BV.get is the if statements and some added method calls, which even if they are inlined I suspect will not make up the difference. Next steps: 1. Deletes as a NOT boolean query which probably should be it's own patch 2. Pluggable alternative representations such as OpenBitSet and int array, part of this patch? > BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs > --- > > Key: LUCENE-1476 > URL: https://issues.apache.org/jira/browse/LUCENE-1476 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Trivial > Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, > quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff, > searchdeletes.alg > > Original Estimate: 12h > Remaining Estimate: 12h > > Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from > IndexReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
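To make the comparison concrete, the two styles of deletion check differ roughly as in the sketch below (not the patch code; it assumes the deletions iterator has already been positioned with an initial next() or skipTo(0)):
{code:java}
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitVector;

public class DeletionCheckSketch {

  // (a) random-access check, as the BitVector-based code does today
  static boolean deletedByGet(BitVector deletedDocs, int doc) {
    return deletedDocs != null && deletedDocs.get(doc);
  }

  // (b) iterator-style check from the quasi-iterator patches: keep the
  // deletions iterator in lock step with the candidate doc. The extra
  // branch and method call per doc are the suspected cost discussed above.
  static boolean deletedBySkipTo(DocIdSetIterator deleted, int doc) throws IOException {
    if (deleted.doc() < doc && !deleted.skipTo(doc)) {
      return false; // iterator exhausted: no deletions at or after doc
    }
    return deleted.doc() == doc;
  }
}
{code}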
[jira] Resolved: (LUCENE-1530) Support inclusive/exclusive for TrieRangeQuery/-Filter, remove default trie variant setters/getters
[ https://issues.apache.org/jira/browse/LUCENE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-1530. --- Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Comitted revision 738109 > Support inclusive/exclusive for TrieRangeQuery/-Filter, remove default trie > variant setters/getters > --- > > Key: LUCENE-1530 > URL: https://issues.apache.org/jira/browse/LUCENE-1530 > Project: Lucene - Java > Issue Type: New Feature > Components: contrib/* >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1530.patch, LUCENE-1530.patch > > > TrieRangeQuery/Filter is missing one thing: Ranges that have exclusive > bounds. For TrieRangeQuery this may not be important for ranges on long or > Date (==long) values (because [1..5] is the same like ]0..6[ or ]0..5]). This > is not so simple for doubles because you must add/substract 1 from the trie > encoded unsigned long. > To be conform with the other range queries, I will submit a patch that has > two additional boolean parameters in the ctors to support inclusive/exclusive > ranges for both ends. Internally it will be implemented using > TrieUtils.incrementTrieCoded/decrementTrieCoded() but makes life simplier for > double ranges (a simple exclusive replacement for the floating point range > [0.0..1.0] is not possible without having the underlying unsigned long). > In December, when trie contrib was included (LUCENE-1470), 3 trie variants > were supplied by TrieUtils. For new APIs a statically configureable default > Trie variant does not conform to an API we want in Lucene (currently we want > to deprecate all these static setters/getters). The important thing: It does > not make code shorter or easier to understand, its more error prone. Before > release of 2.9 it is a good time to remove the default trie variant and > always force the parameter in TrieRangeQuery/Filter. It is better to choose > the variant in the application and do not automatically manage it. > As Lucene 2.9 was not yet released, I will change the ctors and not preserve > the old ones. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
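For long ranges the inclusive/exclusive handling is just a one-step adjustment of the bounds, as in the sketch below; per the description, for doubles the same step is applied to the trie-encoded unsigned long via TrieUtils.incrementTrieCoded/decrementTrieCoded rather than to the double value itself:
{code:java}
public class BoundsSketch {
  // Normalize exclusive bounds to inclusive ones for a long range.
  // This is why ]0..6[ selects exactly the same values as [1..5].
  static long lowerInclusive(long min, boolean minInclusive) {
    return minInclusive ? min : min + 1;
  }

  static long upperInclusive(long max, boolean maxInclusive) {
    return maxInclusive ? max : max - 1;
  }
}
{code}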
[jira] Commented: (LUCENE-1489) highlighter problem with n-gram tokens
[ https://issues.apache.org/jira/browse/LUCENE-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667654#action_12667654 ] Mark Harwood commented on LUCENE-1489: -- It looks to me like this could be fixed in the "Formatter" classes when marking up the output string. Currently classes such as SimpleHTMLFormatter in their "highlightTerm" method put a tag around the whole section of text, if it contains a hit, i.e.
{code:title=SimpleHTMLFormatter.java|borderStyle=solid}
public String highlightTerm(String originalText, TokenGroup tokenGroup)
{
  StringBuffer returnBuffer;
  if(tokenGroup.getTotalScore()>0)
  {
    returnBuffer=new StringBuffer();
    returnBuffer.append(preTag);
    returnBuffer.append(originalText);
    returnBuffer.append(postTag);
    return returnBuffer.toString();
  }
  return originalText;
}
{code}
The TokenGroup object passed to this method contains all of the tokens and their scores so it should be possible to use this information to deconstruct the originalText parameter and inject markup according to which tokens in the group had a match rather than putting a tag around the whole block. Some complexity may lie in handling token streams that produce tokens that "rewind" to earlier offsets. SimpleHtmlFormatter suddenly seems less simple! TokenStreams that produce entirely overlapping streams of tokens will automatically be broken into multiple TokenGroups because TokenGroup has a maximum number of linked Tokens it will ever hold in a single group. I haven't got the time to fix this right now but if someone has a burning need to leap in, the above seems like what may be required. Cheers Mark
> highlighter problem with n-gram tokens
> --
>
> Key: LUCENE-1489
> URL: https://issues.apache.org/jira/browse/LUCENE-1489
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/highlighter
> Reporter: Koji Sekiguchi
> Priority: Minor
>
> I have a problem when using n-gram and highlighter. I thought it had been
> solved in LUCENE-627...
> Actually, I found this problem when I was using CJKTokenizer on Solr, though,
> here is lucene program to reproduce it using NGramTokenizer(min=2,max=2)
> instead of CJKTokenizer:
> {code:java}
> public class TestNGramHighlighter {
>   public static void main(String[] args) throws Exception {
>     Analyzer analyzer = new NGramAnalyzer();
>     final String TEXT = "Lucene can make index. Then Lucene can search.";
>     final String QUERY = "can";
>     QueryParser parser = new QueryParser("f",analyzer);
>     Query query = parser.parse(QUERY);
>     QueryScorer scorer = new QueryScorer(query,"f");
>     Highlighter h = new Highlighter( scorer );
>     System.out.println( h.getBestFragment(analyzer, "f", TEXT) );
>   }
>   static class NGramAnalyzer extends Analyzer {
>     public TokenStream tokenStream(String field, Reader input) {
>       return new NGramTokenizer(input,2,2);
>     }
>   }
> }
> {code}
> expected output is:
> Lucene can make index. Then Lucene can search.
> but the actual output is:
> Lucene can make index. Then Lucene can search. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
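A rough sketch of the per-token markup Mark describes might look like the following. It is not an attached patch: it assumes originalText starts at the first token's start offset and it simply skips tokens that overlap or rewind:
{code:java}
import org.apache.lucene.analysis.Token;
import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.TokenGroup;

// Wraps only the tokens that actually scored, instead of the whole group.
public class PerTokenHTMLFormatter implements Formatter {
  public String highlightTerm(String originalText, TokenGroup tokenGroup) {
    if (tokenGroup.getTotalScore() <= 0) {
      return originalText;
    }
    int base = tokenGroup.getToken(0).startOffset();
    StringBuffer sb = new StringBuffer();
    int written = 0; // how much of originalText has been copied so far
    for (int i = 0; i < tokenGroup.getNumTokens(); i++) {
      Token t = tokenGroup.getToken(i);
      int start = t.startOffset() - base;
      int end = t.endOffset() - base;
      if (start < written || end > originalText.length()) {
        continue; // overlapping or rewinding token: out of scope for this sketch
      }
      sb.append(originalText.substring(written, start));
      String body = originalText.substring(start, end);
      if (tokenGroup.getScore(i) > 0) {
        sb.append("<B>").append(body).append("</B>");
      } else {
        sb.append(body);
      }
      written = end;
    }
    sb.append(originalText.substring(written));
    return sb.toString();
  }
}
{code}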
[jira] Commented: (LUCENE-1507) adding EmptyDocIdSet/Iterator
[ https://issues.apache.org/jira/browse/LUCENE-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667645#action_12667645 ] Michael McCandless commented on LUCENE-1507: We could simply add a static method somewhere (getEmptyDocIdSet()) to retrieve a single re-used instance of 0-sized SortedVIntList? > adding EmptyDocIdSet/Iterator > - > > Key: LUCENE-1507 > URL: https://issues.apache.org/jira/browse/LUCENE-1507 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: John Wang > Attachments: emptydocidset.txt > > Original Estimate: 1h > Remaining Estimate: 1h > > Adding convenience classes for EmptyDocIdSet and EmptyDocIdSetIterator -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1530) Support inclusive/exclusive for TrieRangeQuery/-Filter, remove default trie variant setters/getters
[ https://issues.apache.org/jira/browse/LUCENE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667644#action_12667644 ] Michael McCandless commented on LUCENE-1530: bq. If nobody complains and the removal of not yet released constructors without the boolean parameters is ok, This is perfectly fine. Not yet released APIs are free to change. > Support inclusive/exclusive for TrieRangeQuery/-Filter, remove default trie > variant setters/getters > --- > > Key: LUCENE-1530 > URL: https://issues.apache.org/jira/browse/LUCENE-1530 > Project: Lucene - Java > Issue Type: New Feature > Components: contrib/* >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1530.patch, LUCENE-1530.patch > > > TrieRangeQuery/Filter is missing one thing: Ranges that have exclusive > bounds. For TrieRangeQuery this may not be important for ranges on long or > Date (==long) values (because [1..5] is the same like ]0..6[ or ]0..5]). This > is not so simple for doubles because you must add/substract 1 from the trie > encoded unsigned long. > To be conform with the other range queries, I will submit a patch that has > two additional boolean parameters in the ctors to support inclusive/exclusive > ranges for both ends. Internally it will be implemented using > TrieUtils.incrementTrieCoded/decrementTrieCoded() but makes life simplier for > double ranges (a simple exclusive replacement for the floating point range > [0.0..1.0] is not possible without having the underlying unsigned long). > In December, when trie contrib was included (LUCENE-1470), 3 trie variants > were supplied by TrieUtils. For new APIs a statically configureable default > Trie variant does not conform to an API we want in Lucene (currently we want > to deprecate all these static setters/getters). The important thing: It does > not make code shorter or easier to understand, its more error prone. Before > release of 2.9 it is a good time to remove the default trie variant and > always force the parameter in TrieRangeQuery/Filter. It is better to choose > the variant in the application and do not automatically manage it. > As Lucene 2.9 was not yet released, I will change the ctors and not preserve > the old ones. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: RE: Hudson Java Docs?
> Alternately, we could turn off the "Publish Javadoc" feature, and instead > add trunk/build/docs/api to the list of files to "Archive" and then start > refering to a URL like this (doesn't work at the moment) for all the > javadocs... > > http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene- > trunk/lastSuccessfulBuild/artifact/trunk/build/docs/api/ > > turning that Javadoc feature off should eliminate the existing Javadoc > links in the hudson navigation, but I suspect the old files would still be > there (and in search engine caches) Can we do a one-time cleanup (rm -rf in the directory) and then have a new and clean start (maybe ask the hudson team at Apache)? The index.html file for the javadocs in the root javadoc folder is possible, but it would not remove the old files (the others, not index.html) from Googles cache/index. Uwe - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1530) Support inclusive/exclusive for TrieRangeQuery/-Filter, remove default trie variant setters/getters
[ https://issues.apache.org/jira/browse/LUCENE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1530: -- Description: TrieRangeQuery/Filter is missing one thing: Ranges that have exclusive bounds. For TrieRangeQuery this may not be important for ranges on long or Date (==long) values (because [1..5] is the same like ]0..6[ or ]0..5]). This is not so simple for doubles because you must add/substract 1 from the trie encoded unsigned long. To be conform with the other range queries, I will submit a patch that has two additional boolean parameters in the ctors to support inclusive/exclusive ranges for both ends. Internally it will be implemented using TrieUtils.incrementTrieCoded/decrementTrieCoded() but makes life simplier for double ranges (a simple exclusive replacement for the floating point range [0.0..1.0] is not possible without having the underlying unsigned long). In December, when trie contrib was included (LUCENE-1470), 3 trie variants were supplied by TrieUtils. For new APIs a statically configureable default Trie variant does not conform to an API we want in Lucene (currently we want to deprecate all these static setters/getters). The important thing: It does not make code shorter or easier to understand, its more error prone. Before release of 2.9 it is a good time to remove the default trie variant and always force the parameter in TrieRangeQuery/Filter. It is better to choose the variant in the application and do not automatically manage it. As Lucene 2.9 was not yet released, I will change the ctors and not preserve the old ones. was: TrieRangeQuery/Filter is missing one thing: Ranges that have exclusive bounds. For TrieRangeQuery this may not be important for ranges on long or Date (==long) values (because [1..5] is the same like ]0..6[ or ]0..5]). This is not so simple for doubles because you must add/substract 1 from the trie encoded unsigned long. To be conform with the other range queries, I will submit a patch that has two additional boolean parameters in the ctors to support inclusive/exclusive ranges for both ends. Internally it will be implemented using TrieUtils.incrementTrieCoded/decrementTrieCoded() but makes life simplier for double ranges (a simple exclusive replacement for the floating point range [0.0..1.0] is not possible without having the underlying unsigned long). As Lucene 2.9 was not yet released, I will change the ctors and not preserve the old ones. Lucene Fields: [New, Patch Available] (was: [New]) Update issue description to include both changes. > Support inclusive/exclusive for TrieRangeQuery/-Filter, remove default trie > variant setters/getters > --- > > Key: LUCENE-1530 > URL: https://issues.apache.org/jira/browse/LUCENE-1530 > Project: Lucene - Java > Issue Type: New Feature > Components: contrib/* >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1530.patch, LUCENE-1530.patch > > > TrieRangeQuery/Filter is missing one thing: Ranges that have exclusive > bounds. For TrieRangeQuery this may not be important for ranges on long or > Date (==long) values (because [1..5] is the same like ]0..6[ or ]0..5]). This > is not so simple for doubles because you must add/substract 1 from the trie > encoded unsigned long. > To be conform with the other range queries, I will submit a patch that has > two additional boolean parameters in the ctors to support inclusive/exclusive > ranges for both ends. 
Internally it will be implemented using > TrieUtils.incrementTrieCoded/decrementTrieCoded() but makes life simplier for > double ranges (a simple exclusive replacement for the floating point range > [0.0..1.0] is not possible without having the underlying unsigned long). > In December, when trie contrib was included (LUCENE-1470), 3 trie variants > were supplied by TrieUtils. For new APIs a statically configureable default > Trie variant does not conform to an API we want in Lucene (currently we want > to deprecate all these static setters/getters). The important thing: It does > not make code shorter or easier to understand, its more error prone. Before > release of 2.9 it is a good time to remove the default trie variant and > always force the parameter in TrieRangeQuery/Filter. It is better to choose > the variant in the application and do not automatically manage it. > As Lucene 2.9 was not yet released, I will change the ctors and not preserve > the old ones. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To u
RE: Hudson Java Docs?
> Chris Hostetter wrote: > > : > I think, the outdated docs should be removed from the server to also > > : > disappear from search engines. > > We do not want unofficial builds to be indexed by search engines anyway. > Folks who're searching for information about Lucene should not be > referred to unreleased documentation on an Apache host that can easily > be confused with official documentation. I am frankly appalled to see > that nightly build documentation still appears at the top of the search > results for queries such as "lucene api". > > We should add a robots.txt for Hudson that prohibits crawling, no? > > Why waste effort on documentation for use only by those very same people > who can easily create their own copy? > > > Alternately, we could turn off the "Publish Javadoc" feature, and > instead > > add trunk/build/docs/api to the list of files to "Archive" and then > start > > referring to a URL like this (doesn't work at the moment) for all the > > javadocs... > > > > http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene- > trunk/lastSuccessfulBuild/artifact/trunk/build/docs/api/ > > +1, except the referring part. Why not refer? A robots.txt is OK, but the docs should be accessible via a link from Hudson and the developer resources page. If search engines do not harvest them, there is no problem with the linking; I think it would be fine. Uwe - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
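For reference, the crawl block suggested above would just be a robots.txt served from the root of the Hudson host, along these lines (a minimal sketch; the exact rules would be up to the Hudson admins, and a narrower Disallow could keep other projects crawlable):

    User-agent: *
    Disallow: /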