Hudson build is back to normal: Lucene-trunk #904
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/904/changes - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
RE: Build failed in Hudson: Lucene-trunk #902
This seems to be fixed now. But there is something completely wrong with clover: if you look into the clover reports, a lot of classes have 0% code coverage even though tests are available (e.g. my new NumericRange things). Also, *all* contribs have 0%.

After thinking about it a little, it seems that the coverage report is not built from the normal test run but is generated from the results of test-tag. This explains why NumericRange and Spatial seem to have no tests for clover.

Does anybody know how to fix this? Maybe clover coverage should be disabled for the test run in test-tag? What can be changed in build.xml to do this? I have no clover installed locally, so I cannot try this out.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Tuesday, July 28, 2009 12:13 PM
To: java-dev@lucene.apache.org
Subject: Re: Build failed in Hudson: Lucene-trunk #902

Hmm... the build looks like it failed because of some odd clover licensing issue:

  [clover] Sorry, you are not licensed to instrument files in the package ''.

Anyone have any ideas?

Mike

On Mon, Jul 27, 2009 at 11:26 PM, Apache Hudson Server <hud...@hudson.zones.apache.org> wrote:
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/902/changes

Changes:

[uschindler] LUCENE-1754: JavaDoc updates
[mikemccand] LUCENE-1754: EMPTY_DOCIDSET subclasses DocIdSet directly
[mikemccand] LUCENE-1754: just use EMPTY_DOCIDSET.iterator() instead of new EmptyDocIdSetIterator
[mikemccand] LUCENE-1595: don't use SortField.AUTO; deprecate LineDocMaker EnwikiDocMaker
[mikemccand] LUCENE-1754: add EmptyDocIdSetIterator
[mikemccand] LUCENE-1754: update back-compat test
[mikemccand] LUCENE-1754: BooleanQuery detects up front if it won't match any docs and returns null from its scorer() instead of NonMatchingScorer

------------------------------------------
[...truncated 21062 lines...]
  [javadoc] Note: Custom tags that were not seen: @todo, @uml.property
  [javadoc] 1 error
  [javadoc] 32 warnings
      [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/spatial/lucene-spatial-2009-07-28_02-04-46-javadoc.jar
     [echo] Building spellchecker...

javadocs:
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.search.spell...
  [javadoc] Constructing Javadoc information...
  [javadoc] javadoc: warning - Error reading file: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/docs/api/contrib-spellchecker/../package-list
  [javadoc] Standard Doclet version 1.5.0_14
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] javadoc: error - Error while reading file http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/contrib/spellchecker/src/java/overview.html
  [javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/docs/api/contrib-spellchecker/stylesheet.css...
  [javadoc] Note: Custom tags that could override future standard tags: @todo. To avoid potential overrides, use at least one period character (.) in custom tag names.
  [javadoc] Note: Custom tags that were not seen: @todo, @uml.property
  [javadoc] 1 error
  [javadoc] 1 warning
      [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/spellchecker/lucene-spellchecker-2009-07-28_02-04-46-javadoc.jar
     [echo] Building surround...

javadocs:
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.queryParser.surround.parser...
  [javadoc] Loading source files for package org.apache.lucene.queryParser.surround.query...
  [javadoc] Constructing Javadoc information...
  [javadoc] javadoc: warning - Error reading file: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/docs/api/contrib-surround/../package-list
  [javadoc] Standard Doclet version 1.5.0_14
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] javadoc: error - Error while reading file http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/contrib/surround/src/java/overview.html
  [javadoc] Generating http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/docs/api/contrib-surround/stylesheet.css...
  [javadoc] Note: Custom tags that could override future standard tags: @todo. To avoid potential overrides, use at least one period character (.) in
[jira] Created: (LUCENE-1765) incorrect doc description of fielded query syntax
incorrect doc description of fielded query syntax

Key: LUCENE-1765
URL: https://issues.apache.org/jira/browse/LUCENE-1765
Project: Lucene - Java
Issue Type: Bug
Components: Other
Affects Versions: 2.4.1
Environment: lucene.apache.org docs
Reporter: solrize
Priority: Minor

http://lucene.apache.org/java/2_4_1/queryparsersyntax.html#Fields says:

  You can search any field by typing the field name followed by a colon : and then the term you are looking for.

This is slightly incomplete since the stuff after the fieldname can be a more complex query, not necessarily a term. For example,

  title:(do it right)

seems to work when I tried it. It would be good if the doc was updated to describe the syntax exactly.

Also, documentation should be one of the components selectable in bug reports.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1690) Morelikethis queries are very slow compared to other search types
[ https://issues.apache.org/jira/browse/LUCENE-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736525#action_12736525 ]

Richard Marr commented on LUCENE-1690:

There's also another problem I've just noticed. Please ignore the latest patch.

Morelikethis queries are very slow compared to other search types

Key: LUCENE-1690
URL: https://issues.apache.org/jira/browse/LUCENE-1690
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/*
Affects Versions: 2.4.1
Reporter: Richard Marr
Priority: Minor
Attachments: LruCache.patch, LUCENE-1690.patch
Original Estimate: 2h
Remaining Estimate: 2h

The MoreLikeThis object performs term frequency lookups for every query. From my testing that's what seems to take up the majority of time for MoreLikeThis searches.

For some (I'd venture many) applications it's not necessary for term statistics to be looked up every time. A fairly naive opt-in caching mechanism tied to the life of the MoreLikeThis object would allow applications to cache term statistics for the duration that suits them.

I've got this working in my test code. I'll put together a patch file when I get a minute. From my testing this can improve performance by a factor of around 10.
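The opt-in caching idea described in LUCENE-1690 can be sketched as a small LRU cache placed in front of the term statistics lookup. This is only an illustration built on the JDK's `LinkedHashMap`; it is not the content of LruCache.patch, and the names `TermStatsCache` and `DocFreqSource` are hypothetical stand-ins for whatever the real patch and the index reader provide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TermStatsCache {

    /** Hypothetical stand-in for the expensive lookup (e.g. a document-frequency call on a reader). */
    interface DocFreqSource {
        int docFreq(String field, String term);
    }

    private final DocFreqSource source;
    private final Map<String, Integer> cache;

    TermStatsCache(DocFreqSource source, final int maxEntries) {
        this.source = source;
        // accessOrder=true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the entry that has gone unused longest.
        this.cache = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached statistic, asking the underlying source only on a miss. */
    synchronized int docFreq(String field, String term) {
        String key = field + '\u0000' + term; // separator keeps field/term pairs unambiguous
        Integer cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        int df = source.docFreq(field, term);
        cache.put(key, df);
        return df;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

Tying the cache's lifetime to the object that owns it, as the issue suggests, means stale statistics disappear whenever the application discards that object; the size bound keeps memory use predictable in the meantime.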
Re: Build failed in Hudson: Lucene-trunk #902
I'm guessing it was the empty source file I accidentally left in for LUCENE-1754, that Hoss removed (thanks!). I think clover saw that as an attempt to instrument a source in the empty-string package.

I'm unfamiliar w/ how to configure clover, but I agree we should make sure it's testing coverage for our normal unit tests. Rather than turn it off for test-tag, can we measure coverage of all tests (test-tag, test-core, test-contrib)?

Is there someone familiar w/ clover who can look into this?

Mike

On Wed, Jul 29, 2009 at 3:10 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> This seems to be fixed now. But there is something completely wrong with clover: [...]
[jira] Created: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
Add Thread-Safety note to IndexWriter JavaDoc

Key: LUCENE-1766
URL: https://issues.apache.org/jira/browse/LUCENE-1766
Project: Lucene - Java
Issue Type: Improvement
Affects Versions: 2.9
Reporter: Simon Willnauer
Priority: Minor
Fix For: 2.9

IndexWriter Javadocs should contain a note about thread-safety. This is already mentioned on the wiki FAQ page, but such essential information should be part of the module documentation too.
[jira] Assigned: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1766:

Assignee: Michael McCandless
[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1766:

Attachment: LUCENE-1766.patch

Tweaked the wording... Simon if this looks OK to you I'll commit shortly!
[jira] Commented: (LUCENE-1758) improve arabic analyzer: light8 - light10
[ https://issues.apache.org/jira/browse/LUCENE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736548#action_12736548 ]

Michael McCandless commented on LUCENE-1758:

bq. perhaps both this and LUCENE-1628 should include LowerCaseFilter.

That seems reasonable?

improve arabic analyzer: light8 - light10

Key: LUCENE-1758
URL: https://issues.apache.org/jira/browse/LUCENE-1758
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1758.patch, LUCENE-1758.txt

Someone mentioned on the java user list that the arabic analysis was not as good as they would like.

This patch adds the لل- prefix (light10 algorithm versus light8 algorithm). In the light10 paper, this improves precision from .390 to .413. They mention this is not statistically significant, but it makes linguistic sense and at least has been shown not to hurt.

In the future, I hope openrelevance will allow us to try some more approaches.
[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736551#action_12736551 ]

Simon Willnauer commented on LUCENE-1766:

looks good to me.
[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736556#action_12736556 ]

Uwe Schindler commented on LUCENE-1766:

By the way: Do we have a TS note for IndexReader?
[jira] Resolved: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1766.

Resolution: Fixed

OK thanks Simon!
[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736557#action_12736557 ]

Simon Willnauer commented on LUCENE-1766:

We don't afaik.
[jira] Reopened: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-1766:

I'll add to IndexReader & IndexSearcher as well.
[jira] Commented: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction
[ https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736614#action_12736614 ]

Michael McCandless commented on LUCENE-1763:

How about we:

  * Simply change the methods. Yes it's technically a break in back-compat, but since they are package private, and so advanced (I think very few people have customized their merge policy/scheduler), a compile time error on upgrade seems fine.
  * Make the APIs public (perhaps add a unit test, outside of oal.index package, asserting that all that's required is in fact public)
  * Mark the APIs as subject to change.

MergePolicy should require an IndexWriter upon construction

Key: LUCENE-1763
URL: https://issues.apache.org/jira/browse/LUCENE-1763
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.9

MergePolicy does not require an IW upon construction, but requires one to be passed as method arg to various methods. This gives the impression that a single MP instance can be shared across various IW instances, which is not true for all MPs (if at all). In addition, LogMergePolicy uses the IW instance passed to these methods inconsistently, and is currently exposed to potential NPEs.

This issue will change MP to require an IW instance, however for back-compat reasons the following changes will be made:
# A new MP ctor w/ IW as arg will be introduced. Additionally, for back-compat a default ctor will also be declared which will assign null to the member IW.
# Methods that require IW will be deprecated, and new ones will be declared.
#* For back-compat, the new ones will not be made abstract, but will throw UOE, with a comment that they will become abstract in 3.0.
# All current MP impls will move to use the member instance.
# The code which calls MP methods will continue to use the deprecated methods, passing an IW even though it won't be necessary -- this is strictly for back-compat.

In 3.0, we'll remove the deprecated default ctor and methods, and change the code to not call the IW method variants anymore.

I hope that I didn't leave anything out. I'm sure I'll find out when I work on the patch :).
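The design under discussion, a policy that is bound to one writer at construction instead of receiving it on every call, can be sketched roughly as follows. This follows the "simply change the methods" variant from the comment rather than the deprecation route in the issue description. `Writer`, `Policy` and `ThresholdPolicy` are hypothetical stand-ins invented for this sketch; they are not the real IndexWriter, MergePolicy or LogMergePolicy, and the method names are illustrative only.

```java
class MergePolicySketch {

    /** Hypothetical stand-in for IndexWriter; exists only so the sketch compiles. */
    static final class Writer {
        private final int segmentCount;

        Writer(int segmentCount) {
            this.segmentCount = segmentCount;
        }

        int segmentCount() {
            return segmentCount;
        }
    }

    /** A policy bound to exactly one writer for its whole lifetime. */
    abstract static class Policy {
        protected final Writer writer;

        protected Policy(Writer writer) {
            // Rejecting null up front avoids the later NPEs the issue mentions.
            if (writer == null) {
                throw new IllegalArgumentException("writer must not be null");
            }
            this.writer = writer;
        }

        /** No writer argument: the policy always consults its own writer. */
        abstract boolean mergeNeeded();
    }

    /** Illustrative policy: ask for a merge once the segment count reaches mergeFactor. */
    static final class ThresholdPolicy extends Policy {
        private final int mergeFactor;

        ThresholdPolicy(Writer writer, int mergeFactor) {
            super(writer);
            this.mergeFactor = mergeFactor;
        }

        @Override
        boolean mergeNeeded() {
            return writer.segmentCount() >= mergeFactor;
        }
    }

    public static void main(String[] args) {
        Policy policy = new ThresholdPolicy(new Writer(12), 10);
        System.out.println(policy.mergeNeeded());
    }
}
```

Because the writer is a final field, the API itself makes it clear that one policy instance belongs to one writer, which is the impression the issue says the method-argument style fails to give.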
[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1766:

Attachment: LUCENE-1766.patch

IndexReader & IndexSearcher as well.
[jira] Commented: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction
[ https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736617#action_12736617 ]

Shai Erera commented on LUCENE-1763:

I don't mind doing that ... but note that LMP's methods are public (it overrides them and declares them public), so I was thinking that someone could potentially have written their own LMP (no one can write their own MP today). But if you're fine w/ me doing that, it's fine by me as well.

BTW - I don't need to come up w/ new names after all, since just adding the same method w/o the IW arg changes its signature. But I agree that having just the right form makes more sense.
[jira] Updated: (LUCENE-1752) incorrect snippet returned with SpanScorer
[ https://issues.apache.org/jira/browse/LUCENE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-1752:

Fix Version/s: 2.9

I'd like to set this to 2.9. With the patch, the highlighter works perfectly in our production environment.

incorrect snippet returned with SpanScorer

Key: LUCENE-1752
URL: https://issues.apache.org/jira/browse/LUCENE-1752
Project: Lucene - Java
Issue Type: Bug
Components: contrib/highlighter
Affects Versions: 2.9
Reporter: Koji Sekiguchi
Assignee: Mark Miller
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1752.patch

This problem was reported by my customer. They are using Solr 1.3 and uni-gram, but it can be reproduced with Lucene 2.9 and WhitespaceAnalyzer.

{panel:title=Query}
(f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
{panel}

The snippet we expected is:
{panel}
x y z <B>a</B> <B>b</B> <B>c</B> <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>
{panel}

but we got:
{panel}
x y z <B>a</B> b c <B>d</B> e f g <B>b</B> <B>c</B> <B>g</B>
{panel}

Program to reproduce the problem:
{code}
// Imports assume the Lucene 2.9-dev core and contrib/highlighter package layout.
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.Scorer;
import org.apache.lucene.search.highlight.SpanScorer;

public class TestHighlighter {

  static final String CONTENT = "x y z a b c d e f g b c g";
  // Phrase queries; the embedded quotes are part of the query syntax.
  static final String PH1 = "\"a b c d\"";
  static final String PH2 = "\"b c g\"";
  static final String F1 = "f1";
  static final String F2 = "f2";
  static final String F1C = F1 + ":";
  static final String F2C = F2 + ":";
  // (f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
  static final String QUERY_STRING =
    "(" + F1C + PH1 + " OR " + F2C + PH1 + ") AND (" + F1C + PH2 + " OR " + F2C + PH2 + ")";
  static Analyzer analyzer = new WhitespaceAnalyzer();

  public static void main(String[] args) throws Exception {
    QueryParser qp = new QueryParser( F1, analyzer );
    Query query = qp.parse( QUERY_STRING );
    // SpanScorer needs a cached stream so the tokens can be replayed.
    CachingTokenFilter stream = new CachingTokenFilter( analyzer.tokenStream( F1, new StringReader( CONTENT ) ) );
    Scorer scorer = new SpanScorer( query, F1, stream, false );
    Highlighter h = new Highlighter( scorer );
    System.out.println( "query : " + QUERY_STRING );
    System.out.println( h.getBestFragment( analyzer, F1, CONTENT ) );
  }
}
{code}
[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-1766:

Attachment: LUCENE-1766.patch

Added small but important fact about the synchronization Object. Everything else looks good to me!
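The thread does not quote the wording of the note, so the following is only a sketch under an assumption: that the "synchronization Object" remark means callers who need their own locking around a shared, thread-safe writer should lock on a private object rather than on the writer instance. `StubWriter` is a hypothetical stand-in and not Lucene's IndexWriter; it only shows several threads sharing one thread-safe instance while application-level locking uses a separate object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SharedWriterSketch {

    /** Hypothetical stand-in for a thread-safe writer; this is NOT Lucene's IndexWriter. */
    static final class StubWriter {
        private final List<String> docs = new ArrayList<>();

        synchronized void addDocument(String doc) {
            docs.add(doc);
        }

        synchronized int numDocs() {
            return docs.size();
        }
    }

    /** Application-owned lock; deliberately a private object, not the writer itself. */
    private static final Object BATCH_LOCK = new Object();

    /** Adds two documents as one unit so that no other pair interleaves between them. */
    static void addPair(StubWriter writer, String first, String second) {
        synchronized (BATCH_LOCK) {
            writer.addDocument(first);
            writer.addDocument(second);
        }
    }

    /** Runs several threads against one shared writer and returns the final document count. */
    static int indexConcurrently(int threads, int pairsPerThread) {
        StubWriter writer = new StubWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < pairsPerThread; i++) {
                    addPair(writer, "t" + id + "-a" + i, "t" + id + "-b" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexing threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writer.numDocs();
    }

    public static void main(String[] args) {
        System.out.println(indexConcurrently(4, 100));
    }
}
```

Keeping the application lock separate means the writer's own internal locking and the caller's locking never compete for the same monitor.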
[jira] Commented: (LUCENE-1752) incorrect snippet returned with SpanScorer
[ https://issues.apache.org/jira/browse/LUCENE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736629#action_12736629 ]

Mark Miller commented on LUCENE-1752:

Thanks Koji - I had forgotten about this one. I'll commit it in a bit.
[jira] Commented: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736634#action_12736634 ] Robert Muir commented on LUCENE-1460: - Michael, sorry to leave it incomplete, I think I am not the best for the remaining ones. For example I am a little intimidated by things such as this note in ShingleMatrix: {code} * This method exists in order to avoid reursive calls to the method * as the complexity of a fairlt small matrix then easily would require * a gigabyte sized stack per thread. {code} Change all contrib TokenStreams/Filters to use the new TokenStream API -- Key: LUCENE-1460 URL: https://issues.apache.org/jira/browse/LUCENE-1460 Project: Lucene - Java Issue Type: Task Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: 2.9 Attachments: LUCENE-1460.patch, lucene-1460.patch, lucene-1460.patch, lucene-1460.patch, LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, LUCENE-1460_core.txt, LUCENE-1460_partial.txt Now that we have the new TokenStream API (LUCENE-1422) we should change all contrib modules to use it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction
[ https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736643#action_12736643 ]

Michael McCandless commented on LUCENE-1763:
--------------------------------------------

I think subclassing LMP is also extremely advanced, ie, it's OK to make an exception to our back-compat policy.

MergePolicy should require an IndexWriter upon construction
-----------------------------------------------------------

Key: LUCENE-1763
URL: https://issues.apache.org/jira/browse/LUCENE-1763
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.9

MergePolicy does not require an IW upon construction, but requires one to be passed as a method arg to various methods. This gives the impression that a single MP instance can be shared across various IW instances, which is not true for all MPs (if at all). In addition, LogMergePolicy uses the IW instance passed to these methods inconsistently, and is currently exposed to potential NPEs.

This issue will change MP to require an IW instance; however, for back-compat reasons the following changes will be made:
# A new MP ctor w/ IW as arg will be introduced. Additionally, for back-compat a default ctor will also be declared which will assign null to the member IW.
# Methods that require IW will be deprecated, and new ones will be declared.
#* For back-compat, the new ones will not be made abstract, but will throw UOE, with a comment that they will become abstract in 3.0.
# All current MP impls will move to use the member instance.
# The code which calls MP methods will continue to use the deprecated methods, passing an IW even though it won't be necessary -- this is strictly for back-compat.

In 3.0, we'll remove the deprecated default ctor and methods, and change the code to not call the IW method variants anymore.

I hope that I didn't leave anything out. I'm sure I'll find out when I work on the patch :).
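The back-compat scheme in the LUCENE-1763 description (new ctor taking the writer, deprecated default ctor, deprecated writer-taking methods, new non-abstract methods that throw UOE) could look roughly like the sketch below. This is only an illustration under stated assumptions: WriterStub, PolicySketch, LegacyPolicy, MigratedPolicy and shouldMerge are hypothetical names invented here, not the real IndexWriter/MergePolicy API and not the attached patch.

```java
// Hypothetical stand-ins for IndexWriter and MergePolicy; NOT the real Lucene classes.
class WriterStub {
    final String name;
    WriterStub(String name) { this.name = name; }
}

abstract class PolicySketch {
    /** Member writer; null when a legacy subclass used the default ctor. */
    protected final WriterStub writer;

    /** New ctor: binds the policy to exactly one writer. */
    protected PolicySketch(WriterStub writer) { this.writer = writer; }

    /** @deprecated back-compat only; leaves the member writer null. */
    @Deprecated
    protected PolicySketch() { this(null); }

    /** @deprecated old variant that still takes the writer as an argument. */
    @Deprecated
    public abstract boolean shouldMerge(WriterStub writer);

    /** New variant: not abstract yet, so subclasses written against the old API still compile. */
    public boolean shouldMerge() {
        throw new UnsupportedOperationException("will become abstract in 3.0");
    }
}

/** Written against the old API: default ctor, overrides only the deprecated method. */
class LegacyPolicy extends PolicySketch {
    @Override public boolean shouldMerge(WriterStub writer) { return writer != null; }
}

/** Migrated subclass: uses the member writer and forwards the deprecated variant to the new one. */
class MigratedPolicy extends PolicySketch {
    MigratedPolicy(WriterStub writer) { super(writer); }
    @Override public boolean shouldMerge(WriterStub ignored) { return shouldMerge(); }
    @Override public boolean shouldMerge() { return writer != null; }
}

class MergePolicyBackCompatDemo {
    public static void main(String[] args) {
        WriterStub w = new WriterStub("w1");
        System.out.println(new LegacyPolicy().shouldMerge(w));    // old call path still works
        System.out.println(new MigratedPolicy(w).shouldMerge());  // new call path
    }
}
```

The point of the non-abstract new method is that a subclass compiled against the old API keeps compiling and only fails (with UOE) if somebody actually calls the new variant on it.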
[jira] Commented: (LUCENE-1695) Update the Highlighter to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736646#action_12736646 ] Mark Miller commented on LUCENE-1695: - So without further objection, I'm going to commit this so that I can finish the 'make spanscorer the default' issue. Update the Highlighter to use the new TokenStream API - Key: LUCENE-1695 URL: https://issues.apache.org/jira/browse/LUCENE-1695 Project: Lucene - Java Issue Type: Improvement Components: contrib/highlighter Reporter: Mark Miller Assignee: Mark Miller Fix For: 2.9 Attachments: LUCENE-1695.patch, LUCENE-1695.patch, LUCENE-1695.patch, LUCENE-1695.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
Hi

I think such methods are useful for a Lucene app which needs to roll back a single document delete. Today, IndexReader offers undeleteAll(), which is a bit extreme. There are a few scenarios for this that I know of:

1) (recently showed up on the user list) I'd like to synchronize documents on disk and in the index. So if I have a document in the index which I want to delete, and also a file on the file system (corresponding to an ID or something), and the file delete fails, I may want to undelete that document. This has alternatives, but an undeleteDocument would still be useful in this case.

2) ParallelReader allows one to add a document to two indexes, some fields to one index and others to the second index, and then read those indexes in parallel. Such applications will need to delete documents sometimes, and an undeleteDocument will be useful if a transactional delete is needed: i.e., if the first delete succeeds and the second fails, undo the first delete.

3) ParallelReader doesn't support deleteDocument well currently - i.e., if one of the deletes fails, some readers will be left w/ the document and some won't (this is, I think, a bug).

What do you think?

Shai
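To make the "transactional delete" in scenario 2 concrete, here is a minimal sketch of the delete-then-undo flow across two indexes. Everything in it is an assumption for illustration: DeletableIndex, InMemoryIndex and TransactionalDelete are hypothetical names, and the undelete method is exactly the thing being proposed, so it does not exist in Lucene today.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical minimal index abstraction; not a Lucene interface. */
interface DeletableIndex {
    void delete(String id) throws Exception;
    void undelete(String id);
}

/** Toy in-memory index that can be told to fail on delete. */
class InMemoryIndex implements DeletableIndex {
    final Set<String> live = new HashSet<String>();
    private final boolean failOnDelete;

    InMemoryIndex(boolean failOnDelete, String... ids) {
        this.failOnDelete = failOnDelete;
        live.addAll(Arrays.asList(ids));
    }

    public void delete(String id) throws Exception {
        if (failOnDelete) throw new Exception("simulated delete failure");
        live.remove(id);
    }

    /** Assumes the caller only undeletes something it has just deleted. */
    public void undelete(String id) { live.add(id); }
}

class TransactionalDelete {
    /** Deletes id from both indexes or from neither; returns true on success. */
    static boolean deleteFromBoth(DeletableIndex first, DeletableIndex second, String id) {
        try {
            first.delete(id);
        } catch (Exception e) {
            return false;             // nothing was changed yet
        }
        try {
            second.delete(id);
            return true;
        } catch (Exception e) {
            first.undelete(id);       // roll back the first delete
            return false;
        }
    }
}
```

The rollback only touches the one document that was just deleted, which is the difference from calling undeleteAll().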
[jira] Updated: (LUCENE-1763) MergePolicy should require an IndexWriter upon construction
[ https://issues.apache.org/jira/browse/LUCENE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-1763:
-------------------------------

Attachment: LUCENE-1763.patch

Adds a ctor w/ IndexWriter to MergePolicy, LogMergePolicy, and its extensions.
Fixed tests and IndexWriter code
Fixed tags
All tests pass
[jira] Updated: (LUCENE-1749) FieldCache introspection API
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-1749:
-----------------------------

Attachment: LUCENE-1749.patch

checkpoint: refactored the sanity checking code into a utility class and wrote tests specifically for it to prove it finds insane stuff.

TODO:
* clean up the api, make it less clunky (and not static)
** return structured data showing exactly which combinations in FieldCache are insane
* javadocs
* figure out why previously mentioned tests are breaking (need help with this one ... don't know enough about the code these tests exercise)

FieldCache introspection API
----------------------------

Key: LUCENE-1749
URL: https://issues.apache.org/jira/browse/LUCENE-1749
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Hoss Man
Priority: Minor
Fix For: 2.9
Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch

FieldCache should expose an Expert level API for runtime introspection of the FieldCache to provide info about what is in the FieldCache at any given moment.

We should also provide utility methods for sanity checking that the FieldCache doesn't contain anything odd...
* entries for the same reader/field with different types/parsers
* entries for the same field/type/parser in a reader and its subreader(s)
* etc...
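As an illustration of the first sanity check listed in the description (the same reader/field cached under different types), a check could be sketched as below. This is not the code from the attached patch: CacheEntrySketch and FieldCacheSanitySketch are hypothetical names, and the reader is reduced to a plain string id to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified view of one FieldCache entry. */
class CacheEntrySketch {
    final String readerId;
    final String field;
    final String type;   // e.g. "int", "String"

    CacheEntrySketch(String readerId, String field, String type) {
        this.readerId = readerId;
        this.field = field;
        this.type = type;
    }
}

class FieldCacheSanitySketch {
    /** Returns the "reader/field" keys that are cached under more than one type. */
    static List<String> findTypeConflicts(List<CacheEntrySketch> entries) {
        Map<String, String> firstTypeSeen = new HashMap<String, String>();
        List<String> insane = new ArrayList<String>();
        for (CacheEntrySketch e : entries) {
            String key = e.readerId + "/" + e.field;
            String seen = firstTypeSeen.get(key);
            if (seen == null) {
                firstTypeSeen.put(key, e.type);          // first entry for this reader/field
            } else if (!seen.equals(e.type) && !insane.contains(key)) {
                insane.add(key);                         // same reader/field, different type
            }
        }
        return insane;
    }
}
```

Returning the offending keys rather than a boolean is in the spirit of the TODO item about structured data showing exactly which combinations are insane.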
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736662#action_12736662 ]

Uwe Schindler commented on LUCENE-1567:
---------------------------------------

Just a question: Will it be possible to specify some type of schema for the query parser in the future, to automatically create NumericRangeQuery for different numeric types? It would then be possible to index a numeric value (double,float,long,int) using NumericField, and the query parser would then know which type of field this is and so correctly create a NumericRangeQuery for strings like [1.567..*] or (1.787..19.5].

NumericRangeQuery also supports the rewrite modes; only some type of schema support is missing. I ask this because someone asked on java-user for such a feature in the query parser.

New flexible query parser
-------------------------

Key: LUCENE-1567
URL: https://issues.apache.org/jira/browse/LUCENE-1567
Project: Lucene - Java
Issue Type: New Feature
Components: QueryParser
Environment: N/A
Reporter: Luis Alves
Assignee: Michael Busch
Fix For: 2.9
Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt

From "New flexible query parser" thread by Michael Busch:

in my team at IBM we have used a different query parser than Lucene's in our products for quite a while.
Recently we spent a significant amount of time in refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes.

This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is an Apache committer and PMC member on the Tuscany project and is getting familiar with Lucene now too.

We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here; Adriano and Luis can then answer more detailed questions as they're much more familiar with the code than I am.

The goal was to separate syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch.

The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b':

    AND
   /   \
  A     B

The three layers are:
1. QueryParser
2. QueryNodeProcessor
3. QueryBuilder

1. The upper layer is the parsing layer which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc.
2. The query node processors do most of the work. It is in fact a configurable chain of processors. Each processor can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms.
3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects.

Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow attaching resource bundles. This makes it possible to translate messages, which is an important feature of a query parser.

This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser,
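The three layers described in this overview can be pictured with a toy example. The sketch below is only a rough illustration of the idea and makes several assumptions: NodeSketch, ProcessorSketch, LowercaseTerms and PipelineSketch are hypothetical names rather than classes from the contributed code, the parsing layer is skipped by building the tree for 'A AND B' by hand, and the builder layer emits a '+a +b' string instead of Lucene Query objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical query node: an operator ("AND") with children, or a leaf term. */
class NodeSketch {
    final String label;
    final List<NodeSketch> children = new ArrayList<NodeSketch>();

    NodeSketch(String label, NodeSketch... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }

    boolean isLeaf() { return children.isEmpty(); }
}

/** Layer 2: one link in the processor chain; returns a (possibly new) tree. */
interface ProcessorSketch {
    NodeSketch process(NodeSketch root);
}

/** Example processor: normalizes every term to lower case, leaving operators alone. */
class LowercaseTerms implements ProcessorSketch {
    public NodeSketch process(NodeSketch n) {
        if (n.isLeaf()) return new NodeSketch(n.label.toLowerCase(Locale.ROOT));
        NodeSketch copy = new NodeSketch(n.label);
        for (NodeSketch c : n.children) copy.children.add(process(c));
        return copy;
    }
}

class PipelineSketch {
    /** Runs the configurable chain of processors in order. */
    static NodeSketch runProcessors(NodeSketch root, ProcessorSketch... chain) {
        NodeSketch current = root;
        for (ProcessorSketch p : chain) current = p.process(current);
        return current;
    }

    /** Layer 3: turns the processed tree into an output form, here '+a +b' style text. */
    static String build(NodeSketch n) {
        if (n.isLeaf()) return n.label;
        StringBuilder sb = new StringBuilder();
        for (NodeSketch c : n.children) {
            if (sb.length() > 0) sb.append(' ');
            if ("AND".equals(n.label)) sb.append('+');   // AND makes every clause required
            sb.append(c.isLeaf() ? build(c) : "(" + build(c) + ")");
        }
        return sb.toString();
    }
}
```

Because the tree is the only thing the layers share, a second syntax would only need a different way of producing the NodeSketch tree; the processor and builder steps stay the same.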
[jira] Commented: (LUCENE-1748) getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract
[ https://issues.apache.org/jira/browse/LUCENE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736674#action_12736674 ] Mark Miller commented on LUCENE-1748: - This is going to require a patch to the 2.4 back compat branch to pass tests. getPayloadSpans on org.apache.lucene.search.spans.SpanQuery should be abstract -- Key: LUCENE-1748 URL: https://issues.apache.org/jira/browse/LUCENE-1748 Project: Lucene - Java Issue Type: Bug Components: Query/Scoring Affects Versions: 2.4, 2.4.1 Environment: all Reporter: Hugh Cayless Assignee: Mark Miller Fix For: 2.9, 3.0, 3.1 Attachments: LUCENE-1748.patch I just spent a long time tracking down a bug resulting from upgrading to Lucene 2.4.1 on a project that implements some SpanQuerys of its own and was written against 2.3. Since the project's SpanQuerys didn't implement getPayloadSpans, the call to that method went to SpanQuery.getPayloadSpans which returned null and caused a NullPointerException in the Lucene code, far away from the actual source of the problem. It would be much better for this kind of thing to show up at compile time, I think. Thanks! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
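The difference the reporter is asking for can be shown with a small stand-alone sketch. The classes below are hypothetical stand-ins (SpanQueryBefore, SpanQueryAfter and friends are names made up for this illustration, and the return type is simplified to Object); they are not the real SpanQuery API.

```java
/** Hypothetical "before": a default implementation that silently returns null. */
abstract class SpanQueryBefore {
    Object getPayloadSpans() { return null; }
}

/** Hypothetical "after": subclasses that forget the method no longer compile. */
abstract class SpanQueryAfter {
    abstract Object getPayloadSpans();
}

/** Written against the older API: compiles fine, fails later and far away. */
class OldCustomQuery extends SpanQueryBefore {
}

/** With the abstract declaration the author is forced to provide an implementation. */
class NewCustomQuery extends SpanQueryAfter {
    @Override Object getPayloadSpans() { return "spans"; }
}

class PayloadSpansDemo {
    /** Mimics library code that uses the result without expecting null. */
    static boolean failsAtRuntime() {
        try {
            new OldCustomQuery().getPayloadSpans().toString();
            return false;
        } catch (NullPointerException e) {
            return true;   // the failure surfaces here, not where the method was forgotten
        }
    }
}
```

Making the method abstract breaks source compatibility for existing subclasses, which is presumably why the back compat branch needs a patch, as noted in the comment above.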
[jira] Updated: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1766:
---------------------------------------

Attachment: LUCENE-1766.patch

OK another rev! I backed away from giving particulars on how one should synchronize and just said generically to use your own (non-Lucene) objects instead.

Add Thread-Safety note to IndexWriter JavaDoc
---------------------------------------------

Key: LUCENE-1766
URL: https://issues.apache.org/jira/browse/LUCENE-1766
Project: Lucene - Java
Issue Type: Improvement
Affects Versions: 2.9
Reporter: Simon Willnauer
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch

IndexWriter Javadocs should contain a note about thread-safety. This is already mentioned on the wiki FAQ page, but such essential information should be part of the module documentation too.
backwards compat tests
Is there a wiki page on how to handle updating the back compat tests? I found some mail regarding it, but most of what I found was older. The latest I saw talked about the separate branch, and updating that branch with fixes if you need to - but I see now it seems to work with tags? Do I update the branch, tag it with the current date, then update the build file to point to the new tag (compatibility.tag)?

--
- Mark
http://www.lucidimagination.com
Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
+1 Though not by docID (since they aren't reliable in context of IndexWriter)... and it should be undeleteDocuments (with an s) since it could affect more than one doc. Mike On Wed, Jul 29, 2009 at 10:55 AM, Shai Ereraser...@gmail.com wrote: Hi I think such methods are useful for a Lucene app, which needs to rollback a single document delete. Today, IndexReader offers undeleteAll(), which is a bit extreme. There are two scenarios for this, that I know of: 1) (recently showed up on the user list) I'd like to synchronize documents on disk and in the index. So if I have a document in the index which I want to delete, and also a file on the file system (corresponds to an ID or something), and the file delete fails, I may want to undelete that document. This has alternatives, but still and undeleteDocument will be useful in this case. 2) ParallelReader allows one to add a document to two indexes, some fields to one index and other to the second index, and then read those indexes in parallel. Such applications will need to delete documents sometimes, and an undeleteDocument will be useful if a transactional delete is needed: i.e., if the first delete succeeds, and the second fails, undo the first delete. 3) ParallelReader doesn't support deleteDocument well currently - i.e., if one of the deletes fail, some readers will be left w/ the document and some won't (this is I think a bug). What do you think? Shai - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: backwards compat tests
I think it's not documented anywhere... roughly these are the steps:

* Make mods to tags/lucene_2_4_.../* so ant test-tag passes
* Use svn switch to switch that tags checkout from a tag to the 2_4 back compat branch
* Commit from that dir & plant a new tag
* Update common-build.xml to point to the new tag
* Maybe run ant test-tag again and confirm everything passes
* Commit at the top level

Mike

On Wed, Jul 29, 2009 at 12:23 PM, Mark Millermarkrmil...@gmail.com wrote:
Is their a wiki page on how to handle updating the back compat tests? I found some mail regarding it, but most of what I found was older. The latest I saw talked about the separate branch, and updating that branch with fixes if you need too - but I see now it seems to work with tags? Do I update the branch, tag it with the current date, then update the build file to point to the new tag (compatibility.tag)?
-- - Mark http://www.lucidimagination.com
[jira] Commented: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736696#action_12736696 ]

Simon Willnauer commented on LUCENE-1766:
-----------------------------------------

Looks good. Using a private final Object is a general best practice rather than something Lucene or module specific.

simon
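For readers who have not seen the "private final Object" idiom mentioned in this comment, here is a minimal sketch. It assumes an application class that needs its own external synchronization; IndexingServiceSketch and its members are hypothetical names, and no Lucene class is involved, which is the point of the idiom.

```java
/** Hypothetical application class guarding its own state with a private monitor. */
class IndexingServiceSketch {
    // Private, non-Lucene lock object: nobody outside this class can synchronize on it,
    // so it cannot interfere with locks taken elsewhere.
    private final Object lock = new Object();
    private int pendingDocs = 0;

    void addDoc() {
        synchronized (lock) {
            pendingDocs++;
        }
    }

    int pending() {
        synchronized (lock) {
            return pendingDocs;
        }
    }

    /** Small driver: several threads call addDoc() concurrently; returns the final count. */
    static int runConcurrently(int threads, final int perThread) {
        final IndexingServiceSketch service = new IndexingServiceSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < perThread; j++) service.addDoc();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return service.pending();
    }
}
```

Synchronizing on an object that other code (a library, for instance) may also lock on is what makes deadlocks hard to rule out; a private monitor avoids that by construction.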
[jira] Resolved: (LUCENE-1752) incorrect snippet returned with SpanScorer
[ https://issues.apache.org/jira/browse/LUCENE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved LUCENE-1752.
---------------------------------

Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [New])

Thanks Koji!
[jira] Created: (LUCENE-1767) Add sizeof to OpenBitSet
Add sizeof to OpenBitSet Key: LUCENE-1767 URL: https://issues.apache.org/jira/browse/LUCENE-1767 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Priority: Trivial Fix For: 2.9 Adding a sizeof method to OpenBitSet will facilitate estimating RAM usage when many OBS' are cached (such as Solr). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1767) Add sizeof to OpenBitSet
[ https://issues.apache.org/jira/browse/LUCENE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1767: - Attachment: LUCENE-1767.patch Added sizeOf method Add sizeof to OpenBitSet Key: LUCENE-1767 URL: https://issues.apache.org/jira/browse/LUCENE-1767 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Priority: Trivial Fix For: 2.9 Attachments: LUCENE-1767.patch Original Estimate: 2h Remaining Estimate: 2h Adding a sizeof method to OpenBitSet will facilitate estimating RAM usage when many OBS' are cached (such as Solr). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1767) Add sizeof to OpenBitSet
[ https://issues.apache.org/jira/browse/LUCENE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736724#action_12736724 ]

Simon Willnauer commented on LUCENE-1767:
-----------------------------------------

Jason, I would expect a sizeOf method to return the size of the bitset itself (what #size() returns). Maybe you can find another name for that method. I also think you can safely leave the constants out - once you leave those out this method is almost identical to #capacity / #size. I'm not sure whether such a method would confuse users / developers. If we add it I would rather go for a very meaningful name like allocatedBytes.

simon
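To illustrate the naming point, here is a small sketch of a long[]-backed bit set that exposes both a bit capacity and a byte estimate. It is not the code from the attached patch: BitSetSketch is a hypothetical class, allocatedBytes is the name suggested in the comment above, and the estimate deliberately counts only the backing array payload (no object or array header constants).

```java
/** Hypothetical long[]-backed bit set; not the real OpenBitSet. */
class BitSetSketch {
    private final long[] bits;

    BitSetSketch(long numBits) {
        // one 64-bit word per 64 bits, rounded up
        bits = new long[(int) ((numBits + 63) >>> 6)];
    }

    /** Number of bits that fit into the backing array. */
    long capacity() {
        return (long) bits.length << 6;
    }

    /** Bytes used by the backing array payload; JVM header overhead is ignored. */
    long allocatedBytes() {
        return (long) bits.length * 8;
    }
}
```

A cache holding many such sets could sum allocatedBytes() over its entries to get a rough RAM estimate, which is the use case given in the issue description.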
[jira] Commented: (LUCENE-1749) FieldCache introspection API
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736732#action_12736732 ]

Mark Miller commented on LUCENE-1749:
-------------------------------------

bq. figure out why previously mentioned tests are breaking (need help with this one ... don't know enough about the code these tests exercise)

Eh - it's yucky. There are parts where the tests are passing the top level reader (say to a collector) when they should be using the sub readers. I fixed one :) But then there is more - I looked at a couple of more difficult ones that also pass the top level reader for the test.

And then there is explain - IndexSearcher passes the top level reader to the weight explain, and valuesourcequery will get a fieldcache based on that reader. I guess that one is a bug.

And there are prob a few other similar type things...
[jira] Resolved: (LUCENE-1766) Add Thread-Safety note to IndexWriter JavaDoc
[ https://issues.apache.org/jira/browse/LUCENE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1766. Resolution: Fixed Add Thread-Safety note to IndexWriter JavaDoc - Key: LUCENE-1766 URL: https://issues.apache.org/jira/browse/LUCENE-1766 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9 Reporter: Simon Willnauer Assignee: Michael McCandless Priority: Minor Fix For: 2.9 Attachments: LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch, LUCENE-1766.patch IndexWriter Javadocs should contain a note about thread-safety. This is already mentioned on the wiki FAQ page but such an essential information should be part of the module documentation too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1749) FieldCache introspection API
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736750#action_12736750 ]

Mark Miller commented on LUCENE-1749:
-------------------------------------

bq. And then there is explain - IndexSearcher passes the top level reader to the weight explain, and valuesourcequery will get a fieldcache based on that reader. I guess that one is a bug.

I don't even know what to do about this one. All I can think is that you pump out an explain for each sub reader - but that's pretty unhelpful. Perhaps the best we can do is javadoc the extra requirements that may be needed when you use explain?
RE: backwards compat tests
I do it this way:
- Check out the backwards branch (not the tag) to trunk/tags/lucene_2_4_back_compat_tests. I keep this checkout there permanently and update it regularly together with trunk.
- Place and leave a build.properties file in your trunk dir containing the following line: tag=lucene_2_4_back_compat_tests
- You can then test using ant test / test-tag and so on; the Java property pins the tag directory to your branch checkout. The good thing is that you always have the latest revision of the branch and can modify and commit it directly.
- If everything is OK, create a tag from your checked-out branch (svn copy ...) and then update the main common-build.xml.
I was always wondering: why do we need tags for the backwards tests? Why not just automatically check out the revision equal to the current trunk revision for testing (which is what I did manually)? Currently we always have to create a new tag after each commit to the backwards branch, which is somewhat strange. (OK, that way you fix the revision used for testing this trunk checkout, but if you check out the backwards branch at the same revision that trunk currently has, the two would always be correctly related.)
- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, July 29, 2009 6:24 PM To: java-dev@lucene.apache.org Subject: backwards compat tests
Is there a wiki page on how to handle updating the back compat tests? I found some mail regarding it, but most of what I found was older. The latest I saw talked about the separate branch, and updating that branch with fixes if you need to - but I see now it seems to work with tags? Do I update the branch, tag it with the current date, then update the build file to point to the new tag (compatibility.tag)?
-- - Mark http://www.lucidimagination.com
Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
Yes of course. I meant to create an undeleteDoc variant for every deleteDoc. So if IndexWriter has deleteDocuments(Term), I will add undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add undeleteDocument(int).
It is up to the caller to make sure that whatever he undeletes was indeed deleted, i.e., if you call reader.deleteDocument(4) and then reader.undeleteDocument(4), you should make sure that 4 still represents the same document.
In fact, I think it might be useful to restrict the undeleteDoc methods to the same reader instance with which the documents were deleted? It's easy to do by checking if deletedDocs does not contain any of the docs passed to the undelete method. The rationale is that I believe the best use case for these undelete methods is a mini undo of the last delete. Using the same reader instance you're guaranteed that the document is still deleted between delete() and undelete(). Also, since I can only open the index for write once, whether by IndexWriter or IndexReader w/ readOnly=false, we can guarantee that an undelete followed by delete is safe?
Shai
On Wed, Jul 29, 2009 at 7:26 PM, Michael McCandless luc...@mikemccandless.com wrote:
> +1 Though not by docID (since they aren't reliable in context of IndexWriter)... and it should be undeleteDocuments (with an s) since it could affect more than one doc.
> Mike
> On Wed, Jul 29, 2009 at 10:55 AM, Shai Erera ser...@gmail.com wrote:
>> Hi
>> I think such methods are useful for a Lucene app which needs to roll back a single document delete. Today, IndexReader offers undeleteAll(), which is a bit extreme. There are two scenarios for this that I know of:
>> 1) (recently showed up on the user list) I'd like to synchronize documents on disk and in the index. So if I have a document in the index which I want to delete, and also a file on the file system (corresponding to an ID or something), and the file delete fails, I may want to undelete that document. This has alternatives, but still an undeleteDocument will be useful in this case.
>> 2) ParallelReader allows one to add a document to two indexes, some fields to one index and others to the second index, and then read those indexes in parallel. Such applications will need to delete documents sometimes, and an undeleteDocument will be useful if a transactional delete is needed: i.e., if the first delete succeeds and the second fails, undo the first delete.
>> 3) ParallelReader doesn't support deleteDocument well currently - i.e., if one of the deletes fails, some readers will be left w/ the document and some won't (this is, I think, a bug).
>> What do you think?
>> Shai
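The transactional delete from scenario 2 can be sketched roughly as follows. This is only an illustration of the rollback idea under stated assumptions: DeletableIndex, ToyIndex and TransactionalDelete are hypothetical names invented here, not Lucene classes, and undelete(int) stands in for the proposed (not yet existing) undeleteDocument method.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an index offering delete plus the proposed undelete. Not a Lucene type. */
interface DeletableIndex {
    void delete(int docId) throws Exception;
    void undelete(int docId);
}

/** Toy in-memory index, used only to exercise the rollback logic. */
final class ToyIndex implements DeletableIndex {
    private final Set<Integer> deleted = new HashSet<>();
    private final boolean failOnDelete;

    ToyIndex(boolean failOnDelete) { this.failOnDelete = failOnDelete; }

    @Override public void delete(int docId) throws Exception {
        if (failOnDelete) throw new Exception("simulated delete failure");
        deleted.add(docId);
    }

    @Override public void undelete(int docId) { deleted.remove(docId); }

    boolean isDeleted(int docId) { return deleted.contains(docId); }
}

final class TransactionalDelete {
    private TransactionalDelete() {}

    /** Deletes docId from every index or from none: if one delete fails, earlier deletes are undone. */
    static boolean deleteFromAll(List<? extends DeletableIndex> indexes, int docId) {
        List<DeletableIndex> done = new ArrayList<>();
        for (DeletableIndex index : indexes) {
            try {
                index.delete(docId);
                done.add(index);
            } catch (Exception e) {
                // Roll back the deletes that already succeeded.
                for (DeletableIndex d : done) d.undelete(docId);
                return false;
            }
        }
        return true;
    }
}
```

The sketch assumes the doc ID still refers to the same document when the rollback runs, which is exactly the caller-side guarantee discussed in this thread.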
[jira] Updated: (LUCENE-1762) Slightly more readable code in Token/TermAttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1762: -- Description: No big deal. growTermBuffer(int newSize) was using correct, but slightly hard to follow code. the method was returning null as a hint that the current termBuffer has enough space to the upstream code or reallocated buffer. this patch simplifies logic making this method to only reallocate buffer, nothing more. It reduces number of if(null) checks in a few methods and reduces amount of code. all tests pass. This also adds tests for the new basic attribute impls (copies of the Token tests). was: No big deal. growTermBuffer(int newSize) was using correct, but slightly hard to follow code. the method was returning null as a hint that the current termBuffer has enough space to the upstream code or reallocated buffer. this patch simplifies logic making this method to only reallocate buffer, nothing more. It reduces number of if(null) checks in a few methods and reduces amount of code. all tests pass. Summary: Slightly more readable code in Token/TermAttributeImpl (was: Slightly more readable code in TermAttributeImpl ) Slightly more readable code in Token/TermAttributeImpl -- Key: LUCENE-1762 URL: https://issues.apache.org/jira/browse/LUCENE-1762 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 2.9 Reporter: Eks Dev Assignee: Uwe Schindler Priority: Trivial Fix For: 2.9 Attachments: LUCENE-1762-bw.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch No big deal. growTermBuffer(int newSize) was using correct, but slightly hard to follow code. the method was returning null as a hint that the current termBuffer has enough space to the upstream code or reallocated buffer. this patch simplifies logic making this method to only reallocate buffer, nothing more. 
It reduces number of if(null) checks in a few methods and reduces amount of code. all tests pass. This also adds tests for the new basic attribute impls (copies of the Token tests). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Closed: (LUCENE-1762) Slightly more readable code in Token/TermAttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler closed LUCENE-1762. Resolution: Fixed
Committed revision: 799025
This is without CHANGES.txt updates, because nothing was changed that is visible to the outside :-) Thanks Eks!
Slightly more readable code in Token/TermAttributeImpl
Key: LUCENE-1762 URL: https://issues.apache.org/jira/browse/LUCENE-1762 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 2.9 Reporter: Eks Dev Assignee: Uwe Schindler Priority: Trivial Fix For: 2.9
Attachments: LUCENE-1762-bw.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch
No big deal. growTermBuffer(int newSize) was using correct, but slightly hard to follow code. The method returned null as a hint to the upstream code that the current termBuffer already has enough space, and returned the reallocated buffer otherwise. This patch simplifies the logic so that the method only reallocates the buffer, nothing more. It reduces the number of if(null) checks in a few methods and reduces the amount of code. All tests pass. This also adds tests for the new basic attribute impls (copies of the Token tests).
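For readers who have not looked at the patch, the shape of the simplification described in LUCENE-1762 might look like the sketch below. It is not the committed code: TermBufferSketch, setTerm, the initial size and the doubling growth policy are assumptions made up for illustration. The point is only that growTermBuffer reallocates when needed and returns nothing, so callers need no null check.

```java
/** Illustrative sketch only; not the actual Token/TermAttributeImpl code. */
final class TermBufferSketch {
    private char[] termBuffer = new char[4]; // initial size is an arbitrary assumption
    private int termLength;

    /** Ensures room for newSize chars. Only reallocates; never signals anything through a null return. */
    private void growTermBuffer(int newSize) {
        if (termBuffer.length < newSize) {
            // Growth policy (at least double) is an assumption of this sketch.
            char[] bigger = new char[Math.max(newSize, termBuffer.length * 2)];
            System.arraycopy(termBuffer, 0, bigger, 0, termLength);
            termBuffer = bigger;
        }
    }

    /** Caller side: no if(null) check is needed after growing. */
    void setTerm(String text) {
        growTermBuffer(text.length());
        text.getChars(0, text.length(), termBuffer, 0);
        termLength = text.length();
    }

    String term() { return new String(termBuffer, 0, termLength); }

    int capacity() { return termBuffer.length; }
}
```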
Re: backwards compat tests
Uwe - I asked this question a while ago on LUCENE-1529, and this is the answer Mike gave: http://issues.apache.org/jira/browse/LUCENE-1529?focusedCommentId=12699177&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12699177
I think it's related to what you ask.
Shai
On Wed, Jul 29, 2009 at 10:01 PM, Uwe Schindler u...@thetaphi.de wrote:
> I do it that way:
> - Checkout the backwards branch (not the tag) to trunk/tags/lucene_2_4_back_compat_tests. I have this checkout everytime there, I update it regularily together with trunk.
> - Place and leave a build.properties files with the following line in your trunk dir: “tag=lucene_2_4_back_compat_tests”
> - You can then test using ant test / test-tag and so on, the java property fixes the tag directory to your branch checkout. The good thing is, that you always have the last revision of branch and can modify and commit it directly.
> - If everything is ok, do a tag from your checked out branch (svn copy …) and then update the main common-build.xml
> I was always wondering: Why do we need tags for the backwards tests? Why not just automatically checkout the revision equal to the current trunk revision for testing (what I did manually)? Currently we always have to create a new tag after each commit to backwards branch, this is somehow strange (ok, by that you fix the revision used for testing this trunk checkout, but if you checkout the same revision now in the backwards branch that trunk currently has, it would always be correctly related).
> - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
> From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, July 29, 2009 6:24 PM To: java-dev@lucene.apache.org Subject: backwards compat tests
>> Is there a wiki page on how to handle updating the back compat tests? I found some mail regarding it, but most of what I found was older. The latest I saw talked about the separate branch, and updating that branch with fixes if you need to - but I see now it seems to work with tags? Do I update the branch, tag it with the current date, then update the build file to point to the new tag (compatibility.tag)?
>> -- - Mark http://www.lucidimagination.com
Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
On Wed, Jul 29, 2009 at 3:05 PM, Shai Erera ser...@gmail.com wrote:
> Yes of course. I meant to create an undeleteDoc variant for every deleteDoc. So if IndexWriter has deleteDocuments(Term), I will add undeleteDocuments(Term). If IndexReader has deleteDocument(int), I will add undeleteDocument(int).
OK.
> It is up to the caller to make sure whatever he undeletes was indeed deleted, i.e., if you reader.deleteDocument(4) and then reader.undeleteDocument(4), you should make sure that 4 represents the same document.
Presumably in IndexReader we can return int count (how many deleted), but in IndexWriter it's void.
> In fact, I think it might be useful to restrict the undeleteDoc methods to the same reader instance with which they were deleted? It's easy to do by checking if deletedDocs does not contain any of the docs passed to the undelete method. The rational is that I believe the best use case for these undelete methods to be a mini undo of the last delete. Using the same reader instance you're guaranteed that the document is still deleted between delete() and undelete().
That might be too restrictive? Ie, this is the best use case we can picture today, but others could come up with different use cases, and there's no technical reason for such a restriction? undeleteAll doesn't have such a restriction.
> Also, since I can only open the index for write once, whether by IndexWriter or IndexReader w/ readOnly=false, we can guarantee that an undelete followed by delete is safe?
Or the undelete methods in IndexReader could just acquire the write lock?
Mike
Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
> Or the undelete methods in IndexReader could just acquire the write lock?
I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete a document, no? And then I'll need to acquire the write lock, just like any other write operation done through IndexReader, right? Or do you suggest we allow this for readOnly IndexReaders too?
> That might be too restrictive?
Yes - I pointed that out just as a safety measure. However, sometimes (especially following the 'agile' guidelines) it's better to develop something for a problem we know exists, rather than trying to over-engineer for something we 'think might exist'. If a good use case is presented in the future which requires undelete to also work in readers that did not do the delete themselves, we can change that behavior then, no? Maybe I'll start to work on it and we can decide that as we go? There's no point making decisions now, when we don't know if it is a major thing to support or not. Maybe it can be supported 'for free', and then it won't be a question at all.
Shai
On Wed, Jul 29, 2009 at 10:58 PM, Michael McCandless luc...@mikemccandless.com wrote:
> undeleteAll doesn't have such a restriction.
Re: Adding undeleteDocument(docId | Term | Query) to IndexReader (and IndexWriter?)
On Wed, Jul 29, 2009 at 4:06 PM, Shai Erera ser...@gmail.com wrote:
>> Or the undelete methods in IndexReader could just acquire the write lock?
> I'll need to open IndexReader w/ readOnly=false if I want to delete/undelete a document, no? And then I'll need to acquire the write lock, just like any other write operation done through IndexReader, right? Or do you suggest we allow this for readOnly IndexReaders too?
Right, you'll definitely need to acquire the write lock for undeleteDoc.
>> That might be too restrictive?
> Yes - I pointed that just as a safety measure. However, sometimes (especially following the 'agile' guidelines) it's better to develop something for a problem we know exist, rather than trying to over-engineer for something we 'think might exist'. If a good use case will be presented in the future which requires the undelete to work also in readers that did not do the delete themselves, we can change that behavior then, no? Maybe I'll start to work on it and we can decide that as we go? There's no point making decisions now, when we don't know if it is a major thing to support or not. Maybe it can be supported 'for free', and then it won't be a question at all.
I agree! There's no need to decide now. So let's defer.
Mike
Re: backwards compat tests
On Wed, Jul 29, 2009 at 4:31 PM, Uwe Schindler u...@thetaphi.de wrote:
> My suggestion was to write the build script in a way that it checks out the branch with the same revision number as the current base dir (trunk).
I think this would work, as long as we always commit top-level and back-compat tag in one transaction (commit)? (And, even if we don't do it as one commit, the risk that someone happens to do a checkout between the two commits is presumably negligible).
> Alternatively instead of putting a tag name into common-build.xml, it could be the revision number. So it would check out …/branches/lucene_2_4_back_compat_tests with the revision given in common-build.
This would also be better than what we have today (saves the extra svn copy step), but if we can make the first approach work that's even better!
Mike
RE: backwards compat tests
>> My suggestion was to write the build script in a way that it checks out the branch with the same revision number as the current base dir (trunk).
> I think this would work, as long as we always commit top-level and back-compat tag in one transaction (commit)? (And, even if we don't do it as one commit, the risk that someone happens to do a checkout between the two commits is presumably negligible).
I think if you first commit in the backwards branch and then in trunk, you never get an inconsistent state. The trunk revision is lower than the new branch revision, so nothing changes: a trunk checkout and test-tag would run the tests from its current revision (which did not change). This is the same as now. You can modify the bw-branch and create a new tag, but as trunk's common-build is not updated, nobody would see it. You only get an inconsistent state if you have run test-tag before and have a current checkout of the bw-branch. If you then do svn update on the bw-branch, you will update it to the latest revision. But if you do this, you will also update trunk (otherwise it would not make sense).
There is only one problem: if you have already checked out the branch at a specific revision and then update trunk, the next test run will use the old tests (because the dir already exists; currently it would check out a new tag because the dir name changed). Because of this, test-tag should also do an svn update to the current trunk's revision.
>> Alternatively instead of putting a tag name into common-build.xml, it could be the revision number. So it would check out …/branches/lucene_2_4_back_compat_tests with the revision given in common-build.
> This would also be better than what we have today (saves the extra svn copy step), but if we can make the first approach work that's even better!
I suggest two variables in common-build.xml:
- backwards-branch or backwards-branch-url (must be changed when 3.0 is out and 3.1 starts in trunk).
- backwards-revision
The same problem (trunk updated while the old branch checkout is still present) also happens here. So each run of test-tag should first do an svn update to the revision from the config (maybe with an option to switch this off, or to only update and never downgrade).
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736851#action_12736851 ]
Mark Miller commented on LUCENE-1486:
If we don't have a clear path for this very soon I think we should pull it from this release.
Wildcards, ORs etc inside Phrase queries
Key: LUCENE-1486 URL: https://issues.apache.org/jira/browse/LUCENE-1486 Project: Lucene - Java Issue Type: Improvement Components: QueryParser Affects Versions: 2.4 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Fix For: 2.9
Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java
An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries. The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the Junit test include:
checkMatches("\"j* smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works.
checkBadQuery("\"jo* id:1 smith\""); //mixing fields in a phrase is bad
checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
Code plus Junit test to follow...
[jira] Commented: (LUCENE-625) Query auto completer
[ https://issues.apache.org/jira/browse/LUCENE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736858#action_12736858 ] Jason Rutherglen commented on LUCENE-625: - Karl, did you ever proceed on this patch? I'm interested in adding autosuggest to Solr. Query auto completer Key: LUCENE-625 URL: https://issues.apache.org/jira/browse/LUCENE-625 Project: Lucene - Java Issue Type: New Feature Components: Search Reporter: Karl Wettin Priority: Minor Attachments: autocomplete_0.0.1.tar.gz, autocomplete_20060730.tar.gz A trie that helps users to type in their query. Made for AJAX, works great with ruby on rails common scripts http://script.aculo.us/. Similar to the Google labs suggester. Trained by user queries. Optimizable. Uses an in memory corpus. Serializable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736879#action_12736879 ] Michael Busch commented on LUCENE-1567: --- {quote} Could you also please fix the javadocs? When I'm building the javadocs I'm getting a lot of warnings about not found references. {quote} The warnings occur because you put links to the new contrib queryparser into the core queryparser. That doesn't work as the contribs are not in the classpath of the core, so I think we should remove those links and change them just to plain text. Also, please make sure to add to the main build.xml appropriate entries for the javadocs, otherwise the All javadocs will not contain the contrib QP classes. There are also some TODOs in the docs; especially in top-level places, such as the package.html of your new package, we should not have TODOs in the docs. Please fix that soon, 2.9 is coming quickly. New flexible query parser - Key: LUCENE-1567 URL: https://issues.apache.org/jira/browse/LUCENE-1567 Project: Lucene - Java Issue Type: New Feature Components: QueryParser Environment: N/A Reporter: Luis Alves Assignee: Michael Busch Fix For: 2.9 Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt From New flexible query parser thread by Micheal Busch in my team at IBM we have used a different query parser than Lucene's in our products for 
quite a while. Recently we spent a significant amount of time refactoring the code and designing a very generic architecture, so that this query parser can easily be used for different products with varying query syntaxes. This work was originally driven by Andreas Neumann (who, however, has left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is an Apache committer and PMC member on the Tuscany project and is getting familiar with Lucene now too.
We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here; Adriano and Luis can then answer more detailed questions, as they're much more familiar with the code than I am.
The goal was to separate the syntax and the semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms, or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch.
The query parser has three layers, and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b':
    AND
   /   \
  A     B
The three layers are:
1. QueryParser
2. QueryNodeProcessor
3. QueryBuilder
1. The upper layer is the parsing layer, which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc.
2. The query node processors do most of the work. This layer is in fact a configurable chain of processors. Each processor can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed, or to tokenize terms.
3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects.
Furthermore, the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow resource bundles to be attached. This makes it possible to translate messages, which is an important feature of a query parser. This design allows us to develop different query syntaxes very quickly. Adriano wrote the
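To make the three layers concrete, here is a toy sketch of the parse / process / build flow. Everything in it is hypothetical: Node, NodeProcessor, LowercaseTerms and QueryPipeline are names invented for illustration and are not the classes of the contributed parser. The "parser" only understands 'x AND y', and the "builder" emits a '+a +b' string instead of real Lucene Query objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical syntax-tree node: an operator with children, or a leaf term. */
final class Node {
    final String value;
    final List<Node> children = new ArrayList<>();

    Node(String value) { this.value = value; }

    boolean isLeaf() { return children.isEmpty(); }
}

/** Layer 2: one link in a configurable processor chain. */
interface NodeProcessor {
    Node process(Node root);
}

/** Example processor: normalizes term text, leaving operators untouched. */
final class LowercaseTerms implements NodeProcessor {
    @Override public Node process(Node n) {
        if (n.isLeaf()) return new Node(n.value.toLowerCase(Locale.ROOT));
        Node copy = new Node(n.value);
        for (Node c : n.children) copy.children.add(process(c));
        return copy;
    }
}

final class QueryPipeline {
    private QueryPipeline() {}

    /** Layer 1: toy parser for "x AND y AND z" (the real layer is described as javacc-based). */
    static Node parse(String text) {
        String[] parts = text.trim().split("\\s+AND\\s+");
        if (parts.length == 1) return new Node(parts[0]);
        Node and = new Node("AND");
        for (String p : parts) and.children.add(new Node(p));
        return and;
    }

    /** Layer 3: toy builder; renders required clauses as '+term' instead of creating Query objects. */
    static String build(Node n) {
        if (n.isLeaf()) return n.value;
        StringBuilder sb = new StringBuilder();
        for (Node c : n.children) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(build(c));
        }
        return sb.toString();
    }

    /** Runs text through parser, processor chain, and builder in order. */
    static String run(String text, List<NodeProcessor> chain) {
        Node tree = parse(text);
        for (NodeProcessor p : chain) tree = p.process(tree);
        return build(tree);
    }
}
```

Swapping in a different parse method while keeping the same processors and builder is the kind of reuse the message describes: a new syntax on top of unchanged semantics.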
[jira] Updated: (LUCENE-1749) FieldCache introspection API
[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-1749: Attachment: LUCENE-1749.patch
Updates:
* merged in updated ram usage estimator code
* updated most failing tests to work without creating top level FieldCaches
* removed offending calls to explain - I left nocommit comments here - depending on what we decide, we could turn off the subreader check for these
* Turned off the subreader check for stress sort test - it sorts in back compat mode and compares to the new mode - so it loads both on purpose.
* I don't remember if I touched anything else.
Tests pass now.
FieldCache introspection API
Key: LUCENE-1749 URL: https://issues.apache.org/jira/browse/LUCENE-1749 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Hoss Man Priority: Minor Fix For: 2.9
Attachments: fieldcache-introspection.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch
FieldCache should expose an Expert level API for runtime introspection of the FieldCache to provide info about what is in the FieldCache at any given moment. We should also provide utility methods for sanity checking that the FieldCache doesn't contain anything odd...
* entries for the same reader/field with different types/parsers
* entries for the same field/type/parser in a reader and its subreader(s)
* etc...
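The first sanity check listed in the issue (same reader/field cached under different types) could be approximated as below. This is a sketch under assumptions, not the API in the attached patches: CacheEntrySketch and CacheSanitySketch are hypothetical names, and cache entries are reduced to three strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified view of one cache entry: which reader, which field, cached as which type. */
final class CacheEntrySketch {
    final String readerKey;
    final String field;
    final String type;

    CacheEntrySketch(String readerKey, String field, String type) {
        this.readerKey = readerKey;
        this.field = field;
        this.type = type;
    }
}

final class CacheSanitySketch {
    private CacheSanitySketch() {}

    /** Reports every reader/field pair that is cached under more than one type. */
    static List<String> conflictingKeys(List<CacheEntrySketch> entries) {
        Map<String, Set<String>> typesByKey = new LinkedHashMap<>();
        for (CacheEntrySketch e : entries) {
            typesByKey.computeIfAbsent(e.readerKey + "/" + e.field, k -> new LinkedHashSet<>()).add(e.type);
        }
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, Set<String>> m : typesByKey.entrySet()) {
            if (m.getValue().size() > 1) conflicts.add(m.getKey() + " -> " + m.getValue());
        }
        return conflicts;
    }
}
```

The reader/subreader check from the second bullet would additionally need to know which readers are children of which, something this sketch deliberately leaves out.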
[jira] Commented: (LUCENE-625) Query auto completer
[ https://issues.apache.org/jira/browse/LUCENE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736923#action_12736923 ]
Karl Wettin commented on LUCENE-625:
bq. Karl, did you ever proceed on this patch? I'm interested in adding autosuggest to Solr.
I used this patch for a few things a couple of years ago. If I recall everything right, I ended up using the bootstrapped apriori corpus of LUCENE-626 as training data the last time. That made the corpus rather small, speedy and still relevant for most users. But the major caveat is that this patch is a trie and is thus a precise, forward-only thing. So it might not fit all use cases. It might be easier to get things going using an index with ngrams of untokenized user queries (i.e. including whitespace) or subject-like fields. But I really prefer user queries, as using only the last n queries will make it sensitive to trends. That will however require quite a bit of data to work well: a lot, as in hundreds of thousands of user queries, in my experience. Not sure if this was an answer to your question.. : )
Query auto completer
Key: LUCENE-625 URL: https://issues.apache.org/jira/browse/LUCENE-625 Project: Lucene - Java Issue Type: New Feature Components: Search Reporter: Karl Wettin Priority: Minor
Attachments: autocomplete_0.0.1.tar.gz, autocomplete_20060730.tar.gz
A trie that helps users to type in their query. Made for AJAX, works great with ruby on rails common scripts http://script.aculo.us/. Similar to the Google labs suggester. Trained by user queries. Optimizable. Uses an in memory corpus. Serializable.
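The "precise, forward-only" caveat is easy to see in a minimal trie. The sketch below is not the code from the attached tarballs; QueryTrie and its methods are hypothetical, and it ignores ranking, training weights and serialization. It only shows that completions are found by walking forward from the first typed character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Minimal prefix trie over whole user queries; illustrative only. */
final class QueryTrie {
    private static final class TrieNode {
        final Map<Character, TrieNode> next = new TreeMap<>();
        boolean endOfQuery;
    }

    private final TrieNode root = new TrieNode();

    /** Records one user query, whitespace included. */
    void add(String query) {
        TrieNode n = root;
        for (char c : query.toCharArray()) {
            n = n.next.computeIfAbsent(c, k -> new TrieNode());
        }
        n.endOfQuery = true;
    }

    /** Returns up to limit stored queries starting with prefix, in character order. */
    List<String> complete(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        TrieNode n = root;
        for (char c : prefix.toCharArray()) {
            n = n.next.get(c);
            if (n == null) return out; // nothing stored under this prefix
        }
        collect(n, new StringBuilder(prefix), limit, out);
        return out;
    }

    private static void collect(TrieNode n, StringBuilder path, int limit, List<String> out) {
        if (out.size() >= limit) return;
        if (n.endOfQuery) out.add(path.toString());
        for (Map.Entry<Character, TrieNode> e : n.next.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, limit, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }
}
```

A lookup for "cene" finds nothing even if "lucene" was stored, which is the limitation the ngram-index alternative mentioned above would address.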
[jira] Updated: (LUCENE-1695) Update the Highlighter to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1695: Attachment: LUCENE-1695.patch To trunk Update the Highlighter to use the new TokenStream API - Key: LUCENE-1695 URL: https://issues.apache.org/jira/browse/LUCENE-1695 Project: Lucene - Java Issue Type: Improvement Components: contrib/highlighter Reporter: Mark Miller Assignee: Mark Miller Fix For: 2.9 Attachments: LUCENE-1695.patch, LUCENE-1695.patch, LUCENE-1695.patch, LUCENE-1695.patch, LUCENE-1695.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736965#action_12736965 ]

Luis Alves commented on LUCENE-1486:

My understanding is that with the new flexible query parser (LUCENE-1567), the old QueryParser classes will be deprecated in 2.9 and removed in 3.0 (or moved to contrib in 3.0). This change will also make ComplexPhraseQueryParser deprecated, because it currently extends the old QueryParser.

ComplexPhraseQueryParser was not part of any Lucene release and was only checked into trunk 2 months ago. For the reasons above, I think we should re-implement this functionality using the new flexible query parser. The 3.0 and 2.9 releases will be very similar, but 3.0 will have all deprecated APIs removed (at least this is my understanding).

In my view the path should be:
- Wait for LUCENE-1567 to be in trunk
- Re-implement this feature using the new flexible query parser
- Probably do it using a superset of the current syntax with a new TextParser

I'm not sure I'll have the time to write a compatible implementation of ComplexPhraseQueryParser before the 2.9 release :( I'm currently working on 1567 to finalize the patch, cleaning up javadocs and making some small cleanups to the APIs. I'll try to work on ComplexPhraseQueryParser once LUCENE-1567 is in trunk.

So in my view, ComplexPhraseQueryParser depends on 1567 and will require some extra work after 1567 is in trunk.

I think we have the following options:
1. We could wait until 1567 is in trunk, and wait for a compatible implementation of ComplexPhraseQueryParser using 1567, before we release 2.9. (This would still remove the current ComplexPhraseQueryParser class and provide these features with the LuceneQueryParserHelper class, or with a new TextParser named complexphrase.)
2. We can release 2.9 with only 1567, but that will require ComplexPhraseQueryParser to be removed from trunk, or at least deprecated, in 2.9, and re-implemented in 3.X using the new flexible query parser APIs.

I hope this helps :)

Wildcards, ORs etc inside Phrase queries

Key: LUCENE-1486
URL: https://issues.apache.org/jira/browse/LUCENE-1486
Project: Lucene - Java
Issue Type: Improvement
Components: QueryParser
Affects Versions: 2.4
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
Fix For: 2.9
Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java

An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries. The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the JUnit test include:

checkMatches("\"j* smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works
checkBadQuery("\"jo* id:1 smith\""); // mixing fields in a phrase is bad
checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported

Code plus JUnit test to follow...
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736966#action_12736966 ]

Mark Miller commented on LUCENE-1486:

Okay thanks. I think we should pull it for 2.9.
RE: [jira] Commented: (LUCENE-1764) SampleComparable doesn't work well in contrib/remote tests
: SortField.equals() and hashCode() contain a hint:
:
:   /** Returns true if <code>o</code> is equal to this. If a
:    * {@link SortComparatorSource} (deprecated) or {@link
:    * FieldCache.Parser} was provided, it must properly
:    * implement equals (unless a singleton is always used). */
:
: Maybe we should make this more visible, contain all different SortField
: comparator/parsers and place it in the setter methods for parser and
: comparators.

SortField doesn't seem like the right place at all -- people constructing instances of SortField, or calling setter methods of SortField, shouldn't have to care about this at all -- it's people who extend SortComparatorSource or FieldCache.Parser who need to be aware of these issues, so shouldn't the class level javadocs for those packages spell it out?

(Ideally those abstract classes would declare hashCode and equals as abstract to *force* people to implement them ... but that ship has sailed.)

-Hoss
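To make the equals/hashCode point concrete, here is a minimal sketch of a configurable parser that implements both, so that two equivalent instances map to the same cache entry. It deliberately uses a stand-in IntParser interface rather than the real FieldCache.Parser, and RadixIntParser and ParserEqualityDemo are hypothetical names invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a parser interface; this is NOT the real Lucene FieldCache.Parser.
interface IntParser {
    int parseInt(String value);
}

// Hypothetical parser configured by a radix. Two instances with the same
// radix behave identically, so they should compare equal.
final class RadixIntParser implements IntParser {
    private final int radix;

    RadixIntParser(int radix) {
        this.radix = radix;
    }

    @Override
    public int parseInt(String value) {
        return Integer.parseInt(value, radix);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RadixIntParser)) return false;
        return radix == ((RadixIntParser) o).radix;
    }

    @Override
    public int hashCode() {
        return 31 * RadixIntParser.class.hashCode() + radix;
    }
}

class ParserEqualityDemo {
    // Simulates a cache keyed by parser and returns how many entries it ends up with.
    static int distinctCacheEntries(IntParser a, IntParser b) {
        Map<IntParser, String> cache = new HashMap<>();
        cache.put(a, "values parsed by a");
        cache.put(b, "values parsed by b");
        return cache.size();
    }

    public static void main(String[] args) {
        // Equal parsers share one cache entry.
        System.out.println(distinctCacheEntries(new RadixIntParser(16), new RadixIntParser(16)));
        // Different configurations get separate entries.
        System.out.println(distinctCacheEntries(new RadixIntParser(16), new RadixIntParser(10)));
    }
}
```

Without the two overrides, every new RadixIntParser(16) would be a distinct key and the simulated cache would grow by one entry per instance, which is the failure mode the javadoc hint warns about.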
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736982#action_12736982 ]

Luis Alves commented on LUCENE-1567:

Hi Uwe,

{quote}
Will it be possible to specify some type of schema for the query parser in future, to automatically create NumericRangeQuery for different numeric types? It would then be possible to index a numeric value (double, float, long, int) using NumericField and then the query parser knows which type of field this is, and so it correctly creates a NumericRangeQuery for strings like [1.567..*] or (1.787..19.5]. NumericRangeQuery also supports the rewrite modes, only some type of schema support is missing.
{quote}

I think this is doable. I don't think there is a way to determine from the index whether a field is numeric, so the user will have to configure the FieldConfig objects in the ConfigHandler. But once this is done, it will not be that difficult to implement the rest.

Can you create a new JIRA issue with a description of the feature, so we can discuss the details there? I'll try to implement it once we agree on all the details.
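As a rough sketch of the schema idea discussed above: a per-field type map that a query parser could consult to decide whether a range should become a numeric range or an ordinary term range. Everything here (NumericSchemaSketch, declare, planRange) is a hypothetical name for illustration; it is not the FieldConfig/ConfigHandler API, and it only returns a description string instead of building a real NumericRangeQuery.

```java
import java.util.HashMap;
import java.util.Map;

class NumericSchemaSketch {
    enum NumericType { INT, LONG, FLOAT, DOUBLE }

    // User-supplied configuration: which fields are numeric, and of which type.
    private final Map<String, NumericType> numericFields = new HashMap<>();

    void declare(String field, NumericType type) {
        numericFields.put(field, type);
    }

    // "*" stands for an open bound; for brevity every other bound is parsed
    // as a double, regardless of the declared type.
    static Double parseBound(String text) {
        return "*".equals(text) ? null : Double.valueOf(text);
    }

    // Describes which kind of range query a parser would create for the field.
    String planRange(String field, String lower, String upper) {
        NumericType type = numericFields.get(field);
        if (type == null) {
            return "term range on " + field + " from " + lower + " to " + upper;
        }
        Double lo = parseBound(lower);
        Double hi = parseBound(upper);
        return "numeric " + type + " range on " + field
                + " from " + (lo == null ? "open" : lo.toString())
                + " to " + (hi == null ? "open" : hi.toString());
    }

    public static void main(String[] args) {
        NumericSchemaSketch schema = new NumericSchemaSketch();
        schema.declare("price", NumericType.DOUBLE);
        System.out.println(schema.planRange("price", "1.567", "*"));
        System.out.println(schema.planRange("title", "a", "m"));
    }
}
```

The point of the design is only that the decision is driven by configuration the user supplies up front, since the index itself does not say which fields are numeric.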
New flexible query parser

Key: LUCENE-1567
URL: https://issues.apache.org/jira/browse/LUCENE-1567
Project: Lucene - Java
Issue Type: New Feature
Components: QueryParser
Environment: N/A
Reporter: Luis Alves
Assignee: Michael Busch
Fix For: 2.9
Attachments: lucene-1567.patch, lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009july15_v6.patch, lucene_trunk_FlexQueryParser_2009july16_v7.patch, lucene_trunk_FlexQueryParser_2009july23_v8.patch, lucene_trunk_FlexQueryParser_2009july27_v9.patch, lucene_trunk_FlexQueryParser_2009july28_v10.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf, wiki_switching_to_the_new_query_parser.txt

From the "New flexible query parser" thread by Michael Busch:

In my team at IBM we have used a different query parser than Lucene's in our products for quite a while. Recently we spent a significant amount of time refactoring the code and designing a very generic architecture, so that this query parser can be easily used for different products with varying query syntaxes.

This work was originally driven by Andreas Neumann (who, however, left our team); most of the code was written by Luis Alves, who has been a bit active in Lucene in the past, and Adriano Campos, who joined our team at IBM half a year ago. Adriano is an Apache committer and PMC member on the Tuscany project and is getting familiar with Lucene now too.

We think this code is much more flexible and extensible than the current Lucene query parser, and would therefore like to contribute it to Lucene. I'd like to give a very brief architecture overview here; Adriano and Luis can then answer more detailed questions, as they're much more familiar with the code than I am.

The goal was to separate the syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms, or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible. In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch.

The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b':

      AND
     /   \
    A     B

The three layers are:
1. QueryParser
2. QueryNodeProcessor
3. QueryBuilder

1. The upper layer is the parsing layer, which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc.
2. The query node processors do most of the work. This layer is in fact a configurable chain of processors. Each processor can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed, or to tokenize terms.
3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects.

Furthermore, the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow attaching resource bundles.
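The three layers described above can be sketched in a few lines. This is only a toy model under stated assumptions: the names TextParser, NodeProcessor, Builder and QueryPipelineSketch are invented for this sketch and are not the classes in the LUCENE-1567 patch, only AND is modeled, and the "builder" emits a string where the real one would create Lucene Query objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class QueryPipelineSketch {
    // Minimal query node: an operator with children, or a leaf term.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    interface TextParser { Node parse(String query); }      // layer 1: text -> tree
    interface NodeProcessor { Node process(Node tree); }    // layer 2: tree -> tree
    interface Builder { String build(Node tree); }          // layer 3: tree -> query

    private static Node and(String[] terms, int skipChars) {
        Node[] leaves = new Node[terms.length];
        for (int i = 0; i < terms.length; i++) {
            leaves[i] = new Node(terms[i].substring(skipChars));
        }
        return new Node("AND", leaves);
    }

    // Two toy syntaxes that produce the same tree: "a AND b" and "+a +b".
    static final TextParser INFIX = query -> and(query.trim().split("\\s+AND\\s+"), 0);
    static final TextParser PLUS = query -> and(query.trim().split("\\s+"), 1);

    // A processor that normalizes leaf terms by lowercasing them.
    static final NodeProcessor LOWERCASE = new NodeProcessor() {
        public Node process(Node tree) {
            if (tree.children.isEmpty()) {
                return new Node(tree.label.toLowerCase(Locale.ROOT));
            }
            Node[] kids = new Node[tree.children.size()];
            for (int i = 0; i < kids.length; i++) {
                kids[i] = process(tree.children.get(i));
            }
            return new Node(tree.label, kids);
        }
    };

    // A builder that renders required clauses; stands in for real Query construction.
    static final Builder REQUIRED_CLAUSES = new Builder() {
        public String build(Node tree) {
            if (tree.children.isEmpty()) return tree.label;
            StringBuilder sb = new StringBuilder();
            for (Node child : tree.children) {
                if (sb.length() > 0) sb.append(' ');
                sb.append('+').append(build(child));
            }
            return sb.toString();
        }
    };

    static String run(TextParser parser, List<NodeProcessor> chain, Builder builder, String text) {
        Node tree = parser.parse(text);
        for (NodeProcessor processor : chain) {
            tree = processor.process(tree);
        }
        return builder.build(tree);
    }

    public static void main(String[] args) {
        List<NodeProcessor> chain = Collections.singletonList(LOWERCASE);
        System.out.println(run(INFIX, chain, REQUIRED_CLAUSES, "A AND B")); // prints "+a +b"
        System.out.println(run(PLUS, chain, REQUIRED_CLAUSES, "+A +B"));    // prints "+a +b"
    }
}
```

Swapping INFIX for PLUS changes only the first layer; the processor chain and the builder are reused unchanged, which is the separation of syntax from semantics that the overview describes.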
[jira] Updated: (LUCENE-1758) improve arabic analyzer: light8 -> light10
[ https://issues.apache.org/jira/browse/LUCENE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1758:

Attachment: LUCENE-1758.patch

Add LowerCaseFilter, and replace "TODO: more tests" with some tests.

improve arabic analyzer: light8 -> light10

Key: LUCENE-1758
URL: https://issues.apache.org/jira/browse/LUCENE-1758
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1758.patch, LUCENE-1758.patch, LUCENE-1758.txt

Someone mentioned on the java-user list that the Arabic analysis was not as good as they would like. This patch adds the لل- prefix (light10 algorithm versus light8 algorithm). In the light10 paper, this improves precision from .390 to .413. They mention this is not statistically significant, but it makes linguistic sense and at least has been shown not to hurt. In the future, I hope openrelevance will allow us to try some more approaches.
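To illustrate what adding one prefix to a light stemmer means in practice, here is a simplified sketch of prefix stripping with and without the لل- prefix. It is not the code from the patch: ArabicPrefixSketch, the two prefix lists, and the minimum-stem-length rule are assumptions made for this example, and the real light8/light10 lists contain more prefixes than shown.

```java
class ArabicPrefixSketch {
    // Illustrative prefix lists only; the second adds the "lam-lam" prefix (U+0644 U+0644).
    static final String[] WITHOUT_LAM_LAM = { "\u0627\u0644" };
    static final String[] WITH_LAM_LAM = { "\u0627\u0644", "\u0644\u0644" };

    // Removes the first matching prefix, provided enough characters remain
    // afterwards (minStemLength is an assumed safeguard against over-stemming).
    static String stripPrefix(String word, String[] prefixes, int minStemLength) {
        for (String prefix : prefixes) {
            if (word.startsWith(prefix) && word.length() - prefix.length() >= minStemLength) {
                return word.substring(prefix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        String forTheBook = "\u0644\u0644\u0643\u062a\u0627\u0628"; // lam-lam + "kitab"
        // Without the extra prefix the word is left untouched ...
        System.out.println(stripPrefix(forTheBook, WITHOUT_LAM_LAM, 2).length()); // prints 6
        // ... with it, the two-character prefix is removed.
        System.out.println(stripPrefix(forTheBook, WITH_LAM_LAM, 2).length());    // prints 4
    }
}
```

The effect on retrieval is that the prefixed and unprefixed forms of a word index to the same term, which is where the small precision gain reported above would come from.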
[jira] Updated: (LUCENE-1628) Persian Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1628:

Attachment: LUCENE-1628.patch

Add LowerCaseFilter, consistent with the Arabic analyzer; it's user-friendly for the common case where there is also some English text.

Persian Analyzer

Key: LUCENE-1628
URL: https://issues.apache.org/jira/browse/LUCENE-1628
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Mark Miller
Priority: Minor
Fix For: 2.9
Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.txt

A simple Persian analyzer. I measured TREC scores with the benchmark package below against http://ece.ut.ac.ir/DBRG/Hamshahri/ :

SimpleAnalyzer:
SUMMARY
  Search Seconds:      0.012
  DocName Seconds:     0.020
  Num Points:        981.015
  Num Good Points:    33.738
  Max Good Points:    36.185
  Average Precision:   0.374
  MRR:                 0.667
  Recall:              0.905
  Precision At 1:      0.585
  Precision At 2:      0.531
  Precision At 3:      0.513
  Precision At 4:      0.496
  Precision At 5:      0.486
  Precision At 6:      0.487
  Precision At 7:      0.479
  Precision At 8:      0.465
  Precision At 9:      0.458
  Precision At 10:     0.460
  Precision At 11:     0.453
  Precision At 12:     0.453
  Precision At 13:     0.445
  Precision At 14:     0.438
  Precision At 15:     0.438
  Precision At 16:     0.438
  Precision At 17:     0.429
  Precision At 18:     0.429
  Precision At 19:     0.419
  Precision At 20:     0.415

PersianAnalyzer:
SUMMARY
  Search Seconds:      0.004
  DocName Seconds:     0.011
  Num Points:        987.692
  Num Good Points:    36.123
  Max Good Points:    36.185
  Average Precision:   0.481
  MRR:                 0.833
  Recall:              0.998
  Precision At 1:      0.754
  Precision At 2:      0.715
  Precision At 3:      0.646
  Precision At 4:      0.646
  Precision At 5:      0.631
  Precision At 6:      0.621
  Precision At 7:      0.593
  Precision At 8:      0.577
  Precision At 9:      0.573
  Precision At 10:     0.566
  Precision At 11:     0.572
  Precision At 12:     0.562
  Precision At 13:     0.554
  Precision At 14:     0.549
  Precision At 15:     0.542
  Precision At 16:     0.538
  Precision At 17:     0.533
  Precision At 18:     0.527
  Precision At 19:     0.525
  Precision At 20:     0.518
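For readers unfamiliar with the figures above, the following sketch shows the textbook definitions of "Precision At k" and reciprocal rank for a single query, using made-up relevance judgments. RankMetricsSketch is a hypothetical name, and it is an assumption that the benchmark package's per-query numbers follow exactly these definitions before being averaged over all queries.

```java
class RankMetricsSketch {
    // Fraction of the top k results that are relevant.
    // relevant[i] is true when the result at rank i + 1 is judged relevant.
    static double precisionAt(boolean[] relevant, int k) {
        int hits = 0;
        for (int i = 0; i < k && i < relevant.length; i++) {
            if (relevant[i]) hits++;
        }
        return (double) hits / k;
    }

    // 1 / rank of the first relevant result, or 0 if none is relevant.
    // MRR is the mean of this value over all queries.
    static double reciprocalRank(boolean[] relevant) {
        for (int i = 0; i < relevant.length; i++) {
            if (relevant[i]) return 1.0 / (i + 1);
        }
        return 0.0;
    }

    public static void main(String[] args) {
        // Made-up judgments for one query: results 2 and 3 are relevant.
        boolean[] judged = { false, true, true, false };
        System.out.println(precisionAt(judged, 1));   // prints 0.0
        System.out.println(precisionAt(judged, 2));   // prints 0.5
        System.out.println(reciprocalRank(judged));   // prints 0.5
    }
}
```

Read this way, the move in MRR from 0.667 to 0.833 says the first relevant document tends to appear closer to the top of the list with PersianAnalyzer than with SimpleAnalyzer.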