[jira] Commented: (LUCENE-935) Improve maven artifacts
[ https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521710 ]

Michael Busch commented on LUCENE-935:

I'm planning to commit this together with LUCENE-908 in a day or so...

Improve maven artifacts

Key: LUCENE-935
URL: https://issues.apache.org/jira/browse/LUCENE-935
Project: Lucene - Java
Issue Type: Improvement
Components: Build
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: 2.3
Attachments: lucene-935-rename-poms.patch, lucene-935.patch

There are a couple of things we can improve for the next release:
- *pom.xml files should be renamed to *pom.xml.template
- artifacts lucene-parent should extend apache-parent
- add source jars as artifacts
- update generate-maven-artifacts task to work with latest version of maven-ant-tasks.jar
- metadata filenames should not contain local

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-986) Refactor segmentInfos from IndexReader into its subclasses
Refactor segmentInfos from IndexReader into its subclasses

Key: LUCENE-986
URL: https://issues.apache.org/jira/browse/LUCENE-986
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: 2.3

References to segmentInfos in IndexReader cause different kinds of problems for subclasses of IndexReader, e.g. MultiReader. Only subclasses of IndexReader that own the index directory, namely SegmentReader and MultiSegmentReader, should have a SegmentInfos object and be able to access it.

Further information:
http://www.gossamer-threads.com/lists/lucene/java-dev/51808
http://www.gossamer-threads.com/lists/lucene/java-user/52460

A part of the refactoring work was already done in LUCENE-781.
Request to change coord similarity API:
I'm hoping that the coord similarity API can be changed from:

float coord(int overlap, int maxOverlap)

to:

float coord(int overlap, int maxOverlap, int docSize)

where docSize is the number of terms in the document/hit being evaluated for similarity to the query.

The reason for this is that many people are using Lucene to match documents that are not web pages, and in these cases the query and the document MUST be of similar size. For example, if your documents are cars, and there are 3 styles of a Volvo wagon, say:
- Volvo V70 Wagon (just the normal edition)
- Volvo V70 Wagon Luxury Edition
- Volvo V70 Wagon Luxury Edition Sports Package AWD

If somebody searches for a longer name, like Volvo V70 Wagon Luxury Edition Sports Package AWD, then the normal edition Volvo V70 Wagon will most likely be excluded, because the coord factor only sees 3/8 hits.

**However**, in the reverse situation, if somebody wants to search for the normal wagon, Volvo V70 Wagon, it will match all 3 of these with the same score. Nothing can help here. Changing lengthNorm to intentionally lower the score of car names as they get longer doesn't make sense: the Volvo V70 Wagon Luxury Edition Sports Package AWD is just as much of a car as the Volvo V70 Wagon. So the lengthNorm is using the SweetSpot or Plateau methodology, and anything between 2 words and about 10 is a legitimate value.

So, back to my original request. By changing coord to also receive the length of the matching document, it would allow coord to lower scores on docs that are not of similar length to the original query. Again, searching for Volvo V70 Wagon, when the hit for Volvo V70 Wagon Luxury Edition Sports Package AWD is analyzed, coord would tell me that it has 8 terms versus the 3 that I'm looking for, and then I could apply any algorithm I want to reduce the hit score (in this case, most likely returning 3/8).

However, if your application does consider those hits all the same, then you could leave the current implementation as is and return 1.

Hopefully this makes sense. I'm (sort of) aware that I could code this up myself with a custom query and scorer class, but I think it warrants being added to the abstract Similarity class. I'm not a pro on Lucene so I could be missing something. Thank you for reading.

Sincerely,
John
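
To make the request above concrete, here is a minimal standalone sketch of what a three-argument coord could compute. It is not Lucene code: the class name LengthAwareCoord is hypothetical, the two-argument method only reproduces the overlap/maxOverlap fraction implied by the 3/8 figure in the message, and the min/max length ratio is just one possible penalty (an assumption, chosen because it yields the 3/8 John mentions).

```java
/**
 * Hypothetical sketch only; this class is not part of Lucene.
 * It illustrates one way a length-aware coord factor could behave.
 */
public final class LengthAwareCoord {

    private LengthAwareCoord() {}

    /** Two-argument form: fraction of query terms that the document matched. */
    public static float coord(int overlap, int maxOverlap) {
        if (maxOverlap <= 0) {
            return 0f;
        }
        return (float) overlap / maxOverlap;
    }

    /**
     * Proposed three-argument form. docSize is the number of terms in the hit.
     * Assumption: the penalty is the ratio of the shorter length to the longer
     * one, so a hit whose length equals the query length is not penalized.
     */
    public static float coord(int overlap, int maxOverlap, int docSize) {
        if (maxOverlap <= 0 || docSize <= 0) {
            return 0f;
        }
        float base = coord(overlap, maxOverlap);
        float lengthRatio = (float) Math.min(maxOverlap, docSize)
                / Math.max(maxOverlap, docSize);
        return base * lengthRatio;
    }

    public static void main(String[] args) {
        // Query "Volvo V70 Wagon" has 3 terms; all 3 match each car name.
        System.out.println(coord(3, 3, 3)); // normal edition, 3 terms
        System.out.println(coord(3, 3, 5)); // Luxury Edition, 5 terms
        System.out.println(coord(3, 3, 8)); // Sports Package AWD, 8 terms
    }
}
```

With this penalty the three Volvo hits no longer tie for the 3-term query, while an application that treats them as equivalent could ignore docSize and keep the two-argument behaviour.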
[jira] Created: (LUCENE-987) Deprecate IndexModifier
Deprecate IndexModifier

Key: LUCENE-987
URL: https://issues.apache.org/jira/browse/LUCENE-987
Project: Lucene - Java
Issue Type: Test
Components: Index
Reporter: Ning Li
Priority: Minor

See discussion at http://www.gossamer-threads.com/lists/lucene/java-dev/52017?search_string=deprecating%20indexmodifier;#52017

This is to deprecate IndexModifier before 3.0 and remove it in 3.0.

This patch includes:
1. IndexModifier and TestIndexModifier are deprecated.
2. TestIndexWriterModify is added. It is similar to TestIndexModifier but uses IndexWriter and has a few other changes, which stem from the differences between IndexModifier and IndexWriter.
3. TestIndexWriterLockRelease and TestStressIndexing are switched to use IndexWriter instead of IndexModifier.
[jira] Updated: (LUCENE-987) Deprecate IndexModifier
[ https://issues.apache.org/jira/browse/LUCENE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Li updated LUCENE-987:

Attachment: deprecateIndexModifier.patch
[jira] Resolved: (LUCENE-935) Improve maven artifacts
[ https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-935.

Resolution: Fixed

Committed. Revision: 568766
[jira] Resolved: (LUCENE-908) MANIFEST.MF cleanup (main jar and luci customizations)
[ https://issues.apache.org/jira/browse/LUCENE-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-908.

Resolution: Fixed

Committed. Revision: 568766

MANIFEST.MF cleanup (main jar and luci customizations)

Key: LUCENE-908
URL: https://issues.apache.org/jira/browse/LUCENE-908
Project: Lucene - Java
Issue Type: Bug
Components: Build
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Trivial
Fix For: 2.3
Attachments: lucene-908-new.patch, lucene-908.patch, LUCENE-908.patch

There are several problems with the MANIFEST.MF file used in the core jar, and some inconsistencies in the lucli jar:

Lucli's build.xml has its own jar target and does not use the jar target from common-build.xml. The result is that the MANIFEST.MF file is not consistent and the META-INF dir does not contain LICENSE.TXT and NOTICE.TXT.

Is there a reason why lucli behaves differently in this regard? If not, I think we should fix this.
Hudson generated Javadocs not current?
While the main index page for the javadocs generated by hudson...

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/overview-summary.html

...says it is from the 2007-08-23_02-46-02 build, some javadocs are clearly out of date. For example, MultiReader still says it extends IndexReader, and there is no MultiSegmentReader.html file...

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/MultiReader.html

...the tgz download is slow, but based on the Workspace it does seem that the svn checkout and compilation of the new classes is working. The problem just seems to be with the javadocs (even in the workspace)...

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/ws/trunk/build/classes/java/org/apache/lucene/index/
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/ws/trunk/build/docs/api/org/apache/lucene/index/MultiReader.html

Does anyone (with Hudson admin access) have any idea what's going on?

-Hoss
[jira] Commented: (LUCENE-986) Refactor segmentInfos from IndexReader into its subclasses
[ https://issues.apache.org/jira/browse/LUCENE-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522025 ]

Hoss Man commented on LUCENE-986:

One aspect of this that should be considered: it may not make sense for MultiReader to extend MultiSegmentReader. As Michael says, only subclasses that own the index directory should have segmentInfos, and a MultiReader (as defined on the trunk now) can never own its own directory.

I haven't worked through all the implications, but perhaps the most logical refactoring would be...

* IndexReader ...as abstract as possible given that we can't actually make methods abstract
* DirectoryIndexReader extends IndexReader ...new class, encapsulating all the segmentInfos and locking logic currently in IndexReader (can definitely be made abstract if helpful)
* SegmentReader extends DirectoryIndexReader
* MultiSegmentReader extends DirectoryIndexReader
* ParallelIndexReader extends IndexReader
* FilterIndexReader extends IndexReader
* MultiReader extends IndexReader ...(side note that I *really* haven't thought through completely: should MultiReader extend FilterIndexReader?)

There would likely be some utility functionality that could be reused between MultiReader and MultiSegmentReader, possibly as static methods in IndexReader (or a new util class).
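
The hierarchy proposed in the comment above can be sketched as a standalone Java skeleton. The class names and extends relationships are taken from Hoss's list; everything else (the empty SegmentInfos stand-in, the ownsDirectory() method, the constructors, and the wrapping ReaderHierarchySketch class) is hypothetical and only there to make the shape compile. It is not the Lucene source and not necessarily what was eventually committed.

```java
/** Hypothetical skeleton of the proposed reader hierarchy; not Lucene code. */
public final class ReaderHierarchySketch {

    private ReaderHierarchySketch() {}

    /** Empty stand-in for Lucene's SegmentInfos. */
    public static final class SegmentInfos {}

    /** Base class: knows nothing about segmentInfos. */
    public static class IndexReader {
        /** Hypothetical helper: readers do not own a directory by default. */
        public boolean ownsDirectory() {
            return false;
        }
    }

    /** New class: the only place that holds segmentInfos (and locking logic). */
    public abstract static class DirectoryIndexReader extends IndexReader {
        protected final SegmentInfos segmentInfos;

        protected DirectoryIndexReader(SegmentInfos segmentInfos) {
            this.segmentInfos = segmentInfos;
        }

        @Override
        public boolean ownsDirectory() {
            return true;
        }
    }

    public static class SegmentReader extends DirectoryIndexReader {
        public SegmentReader(SegmentInfos infos) {
            super(infos);
        }
    }

    public static class MultiSegmentReader extends DirectoryIndexReader {
        public MultiSegmentReader(SegmentInfos infos) {
            super(infos);
        }
    }

    // Readers that wrap other readers and never own a directory themselves.
    public static class ParallelIndexReader extends IndexReader {}

    public static class FilterIndexReader extends IndexReader {}

    public static class MultiReader extends IndexReader {}
}
```

In this shape, code that needs segmentInfos can only reach it through a DirectoryIndexReader, so MultiReader and the other wrapping readers cannot touch it by accident.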
[jira] Commented: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522030 ]

Hoss Man commented on LUCENE-584:

I, unfortunately, haven't had the time to read through everything in the latest patches, but while catching up on my jira mail one of Paul's comments jumped out at me, so I wanted to make sure it's completely clear: this latest set of patches completely breaks backwards compatibility for any clients who have Filter subclasses, or methods that take a Filter as a param, since the Filter class now has an abstract getMatcher method and no longer supports an abstract BitSet method. Presumably the expectation is that all client code should have a search/replace done from Filter to BitSetFilter.

Which begs the question: why not eliminate BitSetFilter and move its getMatcher impl to the Filter class? (If the concern is just that there be a higher level class in which both methods are abstract, why not insert a parent with some new name above the Filter class?)

For the record: it really bothers me that the old attachments got deleted ...
the inability to refresh my memory by looking at the older patches and comparing them with the current patches is extremely frustrating.

Decouple Filter from BitSet

Key: LUCENE-584
URL: https://issues.apache.org/jira/browse/LUCENE-584
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
Attachments: bench-diff.txt, bench-diff.txt, Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, Some Matchers.zip

{code}
package org.apache.lucene.search;

import java.io.IOException;
import org.apache.lucene.index.IndexReader;

public abstract class Filter implements java.io.Serializable {
  // Returns the set of documents that pass this filter for the given reader.
  public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
}

public interface AbstractBitSet {
  // True if the document with this number passes the filter.
  public boolean get(int index);
}
{code}

It would be useful if the method =Filter.bits()= returned an abstract interface, instead of =java.util.BitSet=.

Use case: there is a very large index, and, depending on the user's privileges, only a small portion of the index is actually visible. Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It would be desirable to have an alternative BitSet implementation with a smaller memory footprint.

Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not designed for that purpose. That's why I propose to use an interface instead. The default implementation could still delegate to =java.util.BitSet=.
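
The interface proposed in the issue description above can be illustrated with a small standalone sketch. Only the AbstractBitSet interface and its get(int) method come from the issue; the two implementations (JavaUtilBitSetAdapter and SortedIntArrayBitSet) and the wrapping class are hypothetical names invented here, and the sorted-array approach is just one assumed way to get a smaller footprint for sparse sets. None of this is taken from the attached patches.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch of the AbstractBitSet idea; not taken from the patches. */
public final class AbstractBitSetSketch {

    private AbstractBitSetSketch() {}

    /** The interface from the issue description. */
    public interface AbstractBitSet {
        boolean get(int index);
    }

    /** Default implementation that simply delegates to java.util.BitSet. */
    public static final class JavaUtilBitSetAdapter implements AbstractBitSet {
        private final BitSet bits;

        public JavaUtilBitSetAdapter(BitSet bits) {
            this.bits = bits;
        }

        @Override
        public boolean get(int index) {
            return bits.get(index);
        }
    }

    /**
     * Sparse alternative: stores only the visible document numbers in a
     * sorted int array and answers get() with a binary search. Memory use
     * grows with the number of set bits rather than with the index size.
     */
    public static final class SortedIntArrayBitSet implements AbstractBitSet {
        private final int[] docs;

        public SortedIntArrayBitSet(int... docIds) {
            this.docs = docIds.clone();
            Arrays.sort(this.docs);
        }

        @Override
        public boolean get(int index) {
            return Arrays.binarySearch(docs, index) >= 0;
        }
    }
}
```

A filter for a user who may see only a handful of documents in a very large index could return the sparse variant, while existing filters keep wrapping their java.util.BitSet in the adapter.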