[jira] Commented: (LUCENE-935) Improve maven artifacts

2007-08-22 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521710
 ] 

Michael Busch commented on LUCENE-935:
--

I'm planning to commit this together with LUCENE-908 in a day or so...

 Improve maven artifacts
 ---

 Key: LUCENE-935
 URL: https://issues.apache.org/jira/browse/LUCENE-935
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.3

 Attachments: lucene-935-rename-poms.patch, lucene-935.patch


 There are a couple of things we can improve for the next release:
 - *pom.xml files should be renamed to *pom.xml.template
 - the lucene-parent artifact should extend apache-parent
 - add source jars as artifacts
 - update the generate-maven-artifacts task to work with the latest version of 
 maven-ant-tasks.jar
 - metadata filenames should not contain "local"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Created: (LUCENE-986) Refactor segmentInfos from IndexReader into its subclasses

2007-08-22 Thread Michael Busch (JIRA)
Refactor segmentInfos from IndexReader into its subclasses
--

 Key: LUCENE-986
 URL: https://issues.apache.org/jira/browse/LUCENE-986
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.3


References to segmentInfos in IndexReader cause various problems
for subclasses of IndexReader, such as MultiReader.

Only subclasses of IndexReader that own the index directory, namely 
SegmentReader and MultiSegmentReader, should have a SegmentInfos object
and be able to access it.

Further information:
http://www.gossamer-threads.com/lists/lucene/java-dev/51808
http://www.gossamer-threads.com/lists/lucene/java-user/52460

Part of the refactoring work was already done in LUCENE-781.




Request to change coord similarity API:

2007-08-22 Thread John Kleven
I'm hoping that the coord API in Similarity can be changed from:
float coord(int overlap, int maxOverlap)

TO

float coord(int overlap, int maxOverlap, int docSize)

where docSize is the number of terms in the document (hit) being evaluated
for similarity to the query.

The reason for this is that many people use Lucene to match documents
that are not web pages, and in these cases the query and the document
MUST be of similar size.  For example...

If your documents are cars, and there are 3 styles of a Volvo wagon, say:
 - Volvo V70 Wagon   (just the normal edition)
 - Volvo V70 Wagon Luxury Edition
 - Volvo V70 Wagon Luxury Edition Sports Package AWD

If somebody searches for a longer name, like Volvo V70 Wagon Luxury Edition
Sports Package AWD, then the normal-edition Volvo V70 Wagon will most likely
be excluded, because its coord factor is only 3/8 (3 of the 8 query terms
match).

**However**, in the reverse situation, if somebody searches for the
normal wagon, Volvo V70 Wagon, all 3 of these will match with the same
score, and nothing can help here.  Changing lengthNorm to intentionally lower
the score of car names as they get longer doesn't make sense: the Volvo V70
Wagon Luxury Edition Sports Package AWD is just as much of a car as the
Volvo V70 Wagon.  So the lengthNorm uses the SweetSpot or Plateau
methodology, where anything between 2 and about 10 words is a legitimate length.
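To make the plateau point concrete, here is a minimal Java sketch of a plateau-shaped length norm. It is modeled on the idea behind contrib's SweetSpotSimilarity but written from scratch; the class name PlateauLengthNorm and the min/max/steepness values are illustrative assumptions, not Lucene API.

```java
// Illustrative plateau-shaped length norm (hypothetical helper, not Lucene API).
final class PlateauLengthNorm {
    private PlateauLengthNorm() {}

    /**
     * Returns 1.0 for any field length in [min, max] and decays outside that range;
     * steepness controls how fast the norm falls off past the plateau.
     */
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        // |x - min| + |x - max| - (max - min) is 0 on the plateau
        // and grows by 2 for every term outside it.
        int distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0));
    }

    public static void main(String[] args) {
        // The 3-term and 8-term car names both sit on a [2, 10] plateau,
        // so lengthNorm gives them the same value.
        System.out.println(lengthNorm(3, 2, 10, 0.5f)); // 1.0
        System.out.println(lengthNorm(8, 2, 10, 0.5f)); // 1.0
    }
}
```

Because both names land on the plateau, lengthNorm cannot separate them; only something that compares document length with query length can.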

So, back to my original request.  If coord also received the length of
the matching document, it could lower the scores of docs whose length is
not similar to the original query.  Again, when searching for Volvo V70 Wagon
and the hit for Volvo V70 Wagon Luxury Edition Sports Package AWD is
analyzed, coord would tell me that the document has 8 terms vs. the 3 I'm
looking for, and then I could apply any algorithm I want to reduce the hit's
score (in this case, most likely returning 3/8).  However, if your
application does consider those hits all the same, then you could leave the
current implementation as is and return 1.
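A minimal sketch of what the proposed three-argument coord could compute. Lucene's Similarity only has coord(int overlap, int maxOverlap); the docSize parameter, the LengthAwareCoord class, and the choice to divide by max(maxOverlap, docSize) are assumptions made for illustration.

```java
// Hypothetical: the three-argument coord below is the *proposed* signature,
// not an existing Lucene API.
final class LengthAwareCoord {
    private LengthAwareCoord() {}

    /**
     * overlap    - number of query terms found in the document
     * maxOverlap - number of terms in the query
     * docSize    - number of terms in the matching document (proposed parameter)
     *
     * Dividing by the larger of query length and document length penalizes
     * a length mismatch in either direction.
     */
    static float coord(int overlap, int maxOverlap, int docSize) {
        return overlap / (float) Math.max(maxOverlap, docSize);
    }

    public static void main(String[] args) {
        // Query "Volvo V70 Wagon" (3 terms) vs. the 8-term Sports Package name.
        System.out.println(coord(3, 3, 8)); // 0.375
        // Same query vs. the 3-term "Volvo V70 Wagon" document.
        System.out.println(coord(3, 3, 3)); // 1.0
    }
}
```

With this choice the reverse case (an 8-term query hitting the 3-term name) still gives coord(3, 8, 3) = 3/8, the same value as the current overlap/maxOverlap, so only the short-query case changes.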

Hopefully this makes sense.  I'm (sort of) aware that I could code this up
myself with a custom query and scorer class, but I think it warrants
being added to the abstract Similarity class.  I'm not a Lucene pro, so I
could be missing something.  Thank you for reading.

Sincerely,
John


[jira] Created: (LUCENE-987) Deprecate IndexModifier

2007-08-22 Thread Ning Li (JIRA)
Deprecate IndexModifier
---

 Key: LUCENE-987
 URL: https://issues.apache.org/jira/browse/LUCENE-987
 Project: Lucene - Java
  Issue Type: Test
  Components: Index
Reporter: Ning Li
Priority: Minor


See discussion at 
http://www.gossamer-threads.com/lists/lucene/java-dev/52017?search_string=deprecating%20indexmodifier;#52017

This is to deprecate IndexModifier before 3.0 and remove it in 3.0.

This patch includes:
  1 IndexModifier and TestIndexModifier are deprecated.
  2 TestIndexWriterModify is added. It is similar to TestIndexModifier but uses 
IndexWriter and has a few other changes, which are needed because of the 
differences between IndexModifier and IndexWriter.
  3 TestIndexWriterLockRelease and TestStressIndexing are switched to use 
IndexWriter instead of IndexModifier.




[jira] Updated: (LUCENE-987) Deprecate IndexModifier

2007-08-22 Thread Ning Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Li updated LUCENE-987:
---

Attachment: deprecateIndexModifier.patch





[jira] Resolved: (LUCENE-935) Improve maven artifacts

2007-08-22 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch resolved LUCENE-935.
--

Resolution: Fixed

Committed. Revision: 568766





[jira] Resolved: (LUCENE-908) MANIFEST.MF cleanup (main jar and luci customizations)

2007-08-22 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch resolved LUCENE-908.
--

Resolution: Fixed

Committed. Revision: 568766

 MANIFEST.MF cleanup (main jar and luci customizations)
 --

 Key: LUCENE-908
 URL: https://issues.apache.org/jira/browse/LUCENE-908
 Project: Lucene - Java
  Issue Type: Bug
  Components: Build
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Trivial
 Fix For: 2.3

 Attachments: lucene-908-new.patch, lucene-908.patch, LUCENE-908.patch


 There are several problems with the MANIFEST.MF file used in the core jar, 
 and some inconsistencies in the lucli jar:
 Lucli's build.xml has its own jar target and does not use the jar target 
 from common-build.xml. As a result, the MANIFEST.MF file is not consistent 
 and the META-INF dir does not contain LICENSE.TXT and NOTICE.TXT.
 Is there a reason why lucli behaves differently in this regard? If not, I think 
 we should fix this.




Hudson generated Javadocs not current?

2007-08-22 Thread Chris Hostetter

While the main index page for the javadocs generated by Hudson...

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/overview-summary.html

...says it is from the 2007-08-23_02-46-02 build, some javadocs are
clearly out of date: for example, MultiReader still says it extends
IndexReader, and there is no MultiSegmentReader.html file...

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/MultiReader.html

... the tgz download is slow, but based on the Workspace it does seem
that the svn checkout and compilation of the new classes are working; the
problem just seems to be with the javadocs (even in the workspace)...

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/ws/trunk/build/classes/java/org/apache/lucene/index/
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/ws/trunk/build/docs/api/org/apache/lucene/index/MultiReader.html


Does anyone (with Hudson admin access) have any idea what's going on?

-Hoss





[jira] Commented: (LUCENE-986) Refactor segmentInfos from IndexReader into its subclasses

2007-08-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522025
 ] 

Hoss Man commented on LUCENE-986:
-

One aspect of this that should be considered: it may not make sense for 
MultiReader to extend MultiSegmentReader. As Michael says, only subclasses 
that own the index directory should have segmentInfos, and a MultiReader (as 
defined on the trunk now) can never own its own directory.

I haven't worked through all the implications, but perhaps the most logical 
refactoring would be...

 * IndexReader
   ...as abstract as possible, given that we can't actually make methods abstract
   * DirectoryIndexReader extends IndexReader
     ...new class, encapsulating all the segmentInfos and locking logic currently
     in IndexReader (can definitely be made abstract if helpful)
     * SegmentReader extends DirectoryIndexReader
     * MultiSegmentReader extends DirectoryIndexReader
   * ParallelIndexReader extends IndexReader
   * FilterIndexReader extends IndexReader
   * MultiReader extends IndexReader
     ...(side note that I *really* haven't thought through completely: should
     MultiReader extend FilterIndexReader?)

There would likely be some utility functionality that could be reused between 
MultiReader and MultiSegmentReader, possibly as static methods in 
IndexReader (or a new util class).
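A rough Java skeleton of the proposed hierarchy, showing only where segmentInfos would live and one piece of shared utility code. The class names follow the proposal above, but every body is a hypothetical stand-in (segmentInfos is modeled as a list of segment names; starts and readerIndex are illustrative helpers), not actual Lucene code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical skeleton: names follow the proposal, bodies are stand-ins.
abstract class IndexReader {
    abstract int maxDoc();

    // Shared utility for composite readers: starts[i] is the first doc id of
    // sub-reader i, and the last entry is the total doc count.
    static int[] starts(IndexReader[] subReaders) {
        int[] starts = new int[subReaders.length + 1];
        for (int i = 0; i < subReaders.length; i++) {
            starts[i + 1] = starts[i] + subReaders[i].maxDoc();
        }
        return starts;
    }

    // Binary search for the sub-reader holding global doc id n
    // (the last sub-reader whose start is <= n, which skips empty ones).
    static int readerIndex(int n, int[] starts) {
        int lo = 0;
        int hi = starts.length - 2; // index of the last sub-reader
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= n) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }
}

// Only readers that own an index directory carry segmentInfos
// (and, in the proposal, the locking logic that goes with it).
abstract class DirectoryIndexReader extends IndexReader {
    // Stand-in for SegmentInfos: just the segment names.
    protected final List<String> segmentInfos;

    DirectoryIndexReader(List<String> segmentInfos) {
        this.segmentInfos = new ArrayList<>(segmentInfos);
    }
}

class SegmentReader extends DirectoryIndexReader {
    private final int maxDoc;

    SegmentReader(String segmentName, int maxDoc) {
        super(Collections.singletonList(segmentName));
        this.maxDoc = maxDoc;
    }

    int maxDoc() { return maxDoc; }
}

class MultiSegmentReader extends DirectoryIndexReader {
    private final int[] starts;

    MultiSegmentReader(List<String> segmentInfos, SegmentReader[] subReaders) {
        super(segmentInfos);
        this.starts = IndexReader.starts(subReaders);
    }

    int maxDoc() { return starts[starts.length - 1]; }
}

// MultiReader combines arbitrary readers and never owns a directory,
// so it has no segmentInfos at all.
class MultiReader extends IndexReader {
    private final int[] starts;

    MultiReader(IndexReader... subReaders) {
        this.starts = IndexReader.starts(subReaders);
    }

    int maxDoc() { return starts[starts.length - 1]; }

    int readerIndex(int n) { return IndexReader.readerIndex(n, starts); }
}
```

The doc-id offset bookkeeping is the kind of code both composite readers need, which is why it sits as a static helper rather than in either subclass.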







[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-08-22 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522030
 ] 

Hoss Man commented on LUCENE-584:
-

I unfortunately haven't had the time to read through everything in the latest 
patches, but while catching up on my Jira mail one of Paul's comments jumped out 
at me, so I wanted to make sure it's completely clear: this latest set of patches 
completely breaks backwards compatibility for any clients who have Filter 
subclasses, or methods that take a Filter as a param, since the Filter class 
now has an abstract getMatcher method and no longer has the abstract 
bits(IndexReader) method that returns a BitSet. Presumably the expectation is 
that all client code should do a search/replace from Filter to BitSetFilter.

Which raises the question: why not eliminate BitSetFilter and move its 
getMatcher impl to the Filter class?  (If the concern is just that there be a 
higher-level class in which both methods are abstract, why not insert a 
parent with some new name above the Filter class?)
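One way to read the suggestion above, as a hedged sketch: Filter keeps its abstract bits() method and gains a concrete getMatcher() that adapts the BitSet, so existing Filter subclasses keep compiling. The Matcher class and its next()/doc() methods are assumptions for this sketch, not the API in the attached patches, and the IndexReader parameter of bits() is dropped to keep the example self-contained.

```java
import java.util.BitSet;

// Hypothetical minimal matcher over doc ids (not the patches' Matcher API).
abstract class Matcher {
    /** Advances to the next matching doc id; returns false once exhausted. */
    abstract boolean next();

    /** Current doc id; only meaningful after next() has returned true. */
    abstract int doc();
}

abstract class Filter {
    // Existing contract kept as-is (IndexReader parameter omitted here),
    // so current Filter subclasses keep compiling.
    abstract BitSet bits();

    // New method with a concrete default that adapts bits(); subclasses with
    // a cheaper, non-BitSet representation can override it.
    Matcher getMatcher() {
        final BitSet bits = bits();
        return new Matcher() {
            private int doc = -1;
            private boolean exhausted = false;

            boolean next() {
                if (exhausted) {
                    return false;
                }
                doc = bits.nextSetBit(doc + 1); // -1 when no more set bits
                if (doc < 0) {
                    exhausted = true;
                    return false;
                }
                return true;
            }

            int doc() {
                return doc;
            }
        };
    }
}
```

Under this shape, only filters that want a non-BitSet representation need to change; everyone else inherits the adapter.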




For the record: it really bothers me that the old attachments got deleted. 
Not being able to refresh my memory by looking at the older patches and comparing 
them with the current ones is extremely frustrating.

 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: bench-diff.txt, bench-diff.txt, 
 Matcher1-ground-20070730.patch, Matcher2-default-20070730.patch, 
 Matcher3-core-20070730.patch, Matcher4-contrib-misc-20070730.patch, 
 Matcher5-contrib-queries-20070730.patch, Matcher6-contrib-xml-20070730.patch, 
 Some Matchers.zip


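 {code}
 package org.apache.lucene.search;

 import java.io.IOException;
 import org.apache.lucene.index.IndexReader;

 public abstract class Filter implements java.io.Serializable 
 {
   // proposed: return an abstract interface instead of java.util.BitSet
   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
 }

 // in its own source file, AbstractBitSet.java (same package)
 public interface AbstractBitSet 
 {
   public boolean get(int index);
 }
 {code}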
 It would be useful if the method Filter.bits() returned an abstract 
 interface instead of java.util.BitSet.
 Use case: there is a very large index, and, depending on the user's 
 privileges, only a small portion of the index is actually visible.
 Sparsely populated java.util.BitSets are not efficient and waste lots of 
 memory. It would be desirable to have an alternative BitSet implementation 
 with a smaller memory footprint.
 Though it _is_ possible to derive classes from java.util.BitSet, it was 
 obviously not designed for that purpose.
 That's why I propose to use an interface instead. The default implementation 
 could still delegate to java.util.BitSet.
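A small sketch of the proposal: the AbstractBitSet interface from the description, a default implementation that delegates to java.util.BitSet, and a hypothetical sparse implementation (SortedIntBitSet, not part of Lucene) for the privilege-filtering use case.

```java
import java.util.Arrays;
import java.util.BitSet;

// The interface proposed in the issue description.
interface AbstractBitSet {
    boolean get(int index);
}

// Default implementation: delegates to java.util.BitSet, as proposed.
final class DefaultBitSet implements AbstractBitSet {
    private final BitSet bits;

    DefaultBitSet(BitSet bits) { this.bits = bits; }

    public boolean get(int index) { return bits.get(index); }
}

// Hypothetical sparse implementation: stores only the set doc ids, sorted,
// so memory grows with the number of visible docs rather than with maxDoc.
final class SortedIntBitSet implements AbstractBitSet {
    private final int[] docs;

    SortedIntBitSet(int[] docs) {
        this.docs = docs.clone();
        Arrays.sort(this.docs);
    }

    public boolean get(int index) {
        // O(log n) binary search instead of BitSet's O(1) lookup,
        // traded for a smaller footprint on sparse sets.
        return Arrays.binarySearch(docs, index) >= 0;
    }
}
```

Back-of-the-envelope: a java.util.BitSet costs about one bit per document up to the highest set bit, while the sorted int[] costs 32 bits per visible document, so the sparse form only pays off when fewer than roughly 1 in 32 documents are visible.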
