Hudson build is back to normal : Solr-3.x #33
See http://hudson.zones.apache.org/hudson/job/Solr-3.x/33/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2492) Make PulsingCodec (wrapping StandardCodec) the default codec
[ https://issues.apache.org/jira/browse/LUCENE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876577#action_12876577 ]

Andrzej Bialecki commented on LUCENE-2492:

How about adding some metadata to SegmentInfos ... if we figure out how to proceed with LUCENE-2491 then SegmentInfos could keep the list of codecs per file plus their init args.

Make PulsingCodec (wrapping StandardCodec) the default codec
Key: LUCENE-2492
URL: https://issues.apache.org/jira/browse/LUCENE-2492
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0

PulsingCodec can provide good gains by inlining the postings into the terms dict for rare terms. This is especially helpful for primary-key-like fields, since every term is rare and batch lookups are common (see http://chbits.blogspot.com/2010/06/lucenes-pulsingcodec-on-primary-key.html for a simple perf test), but it should also be a gain for ordinary fields, thanks to Zipf's law. I think we should make it the default.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
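The inlining idea in the issue description can be illustrated with a small, self-contained sketch. It does not use Lucene's codec API: `PulsingTermsDictSketch`, `MAX_INLINE_DOCS` and the in-memory postings store are hypothetical names chosen for illustration. Terms whose document frequency is at or below a cutoff keep their doc IDs directly in the terms-dictionary entry, while more frequent terms store only an offset into a separate postings store, which costs one extra lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "pulsing": rare terms keep postings inline in the
// terms dictionary; frequent terms point into a separate postings store.
// This is NOT Lucene's PulsingCodec API, only an illustration of the idea.
class PulsingTermsDictSketch {
    // Assumed cutoff: terms with at most this many docs are inlined.
    static final int MAX_INLINE_DOCS = 1;

    // One terms-dictionary entry: either inline doc IDs or an offset.
    static final class Entry {
        final int[] inlineDocs;   // non-null when postings are inlined
        final int postingsOffset; // index into postingsStore otherwise
        Entry(int[] inlineDocs, int postingsOffset) {
            this.inlineDocs = inlineDocs;
            this.postingsOffset = postingsOffset;
        }
    }

    private final Map<String, Entry> termsDict = new HashMap<>();
    // Stand-in for the separate postings file.
    private final List<int[]> postingsStore = new ArrayList<>();
    // Counts how often a lookup had to leave the terms dictionary.
    int postingsSeeks = 0;

    void addTerm(String term, int[] docs) {
        if (docs.length <= MAX_INLINE_DOCS) {
            termsDict.put(term, new Entry(docs.clone(), -1));
        } else {
            postingsStore.add(docs.clone());
            termsDict.put(term, new Entry(null, postingsStore.size() - 1));
        }
    }

    // Returns the doc IDs for a term, or an empty array if it is absent.
    int[] lookup(String term) {
        Entry e = termsDict.get(term);
        if (e == null) return new int[0];
        if (e.inlineDocs != null) return e.inlineDocs; // no extra seek
        postingsSeeks++;                                // simulated seek
        return postingsStore.get(e.postingsOffset);
    }
}
```

With a primary-key-like field every term has a single document, so under these assumptions every lookup is answered from the dictionary entry alone and `postingsSeeks` stays at zero.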
Re: [BUG/ISSUE] Distributed Search doesn't response the result set when use existing lucene index
Are you sure you have a uniqueKey field in your Lucene index? Distributed search needs it.

Koji Sekiguchi from mobile

On 2010/06/08, at 15:52, Scott Zhang macromars...@gmail.com wrote:

Hi all. I am coming from the solr-user mailing list. I have a problem with distributed search that looks like a bug in Solr itself. I am trying to use Solr to search over 2 Lucene indexes. I followed the Solr tutorial and tested the distributed search example, and it works. Then I used my own Lucene indexes. Searching each Solr instance on its own works and returns the expected result. But when I do a distributed search using shards, it only returns numFound=14 and the result contains no documents. None of the docs from my existing Lucene indexes are returned by distributed search, but the docs inserted with Solr's post.jar are returned successfully. I don't know why; it looks like my Lucene docs differ somehow from the ones Solr writes. My situation is that I already have 72 index folders which occupy a lot of disk, and reposting them to Solr would take a very long time, so I have to stick with my existing indexes. Is there a solution for this? Thanks. Regards.
Re: [BUG/ISSUE] Distributed Search doesn't response the result set when use existing lucene index
Hi Koji. I am not sure how to set a uniqueKey field in my Lucene index. I am creating the index with Lucene.Net:

    Document doc = new Document();
    doc.Add(new Field("id", product_obj.product_id.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.Add(new Field("type", "product", Field.Store.YES, Field.Index.UN_TOKENIZED));
    Field bField = new Field("keyword_level1", product_obj.title, Field.Store.NO, Field.Index.ANALYZED);
    bField.SetBoost(10.0F);
    doc.Add(bField);
    // keyword_level1
    doc.Add(new Field("keyword_level1", product_obj.sku, Field.Store.NO, Field.Index.NOT_ANALYZED));
    if (product_obj.is_zuup)
    {
        doc.Add(new Field("keyword_level1", "zuup", Field.Store.NO, Field.Index.NOT_ANALYZED));
    }

Regards.
Scott
Re: [BUG/ISSUE] Distributed Search doesn't response the result set when use existing lucene index
I checked my existing Lucene indexes. All the ID fields are stored but not indexed. I don't want to rebuild these indexes, as it would take days. Can Solr be changed a little to allow the ID field to be not indexed? Thanks.
[jira] Updated: (SOLR-1943) Disable clustering contrib in Solr trunk
[ https://issues.apache.org/jira/browse/SOLR-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-1943:
Attachment: SOLR-1943.patch

This patch effectively adds a readme file and renames build.xml. Will commit soon, to be able to go forward with LUCENE-2484.

Disable clustering contrib in Solr trunk
Key: SOLR-1943
URL: https://issues.apache.org/jira/browse/SOLR-1943
Project: Solr
Issue Type: Bug
Components: contrib - Clustering
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Attachments: SOLR-1943.patch

With LUCENE-2484, Lucene's trunk API changed incompatibly. As the clustering contrib depends on an older carrot2 jar file compiled against an older version of Lucene (3.0), the tests failed to run (TermAttribute class removed). We should be able to change the APIs in trunk without forcing external projects like carrot2 to update their internals to work with Lucene trunk. The attached patch simply renames build.xml to build.xml.disabled, so the module is no longer built. After we create a release branch out of trunk, we can enable it again after upgrading the carrot2.jar files.
[jira] Resolved: (SOLR-1943) Disable clustering contrib in Solr trunk
[ https://issues.apache.org/jira/browse/SOLR-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved SOLR-1943.
Resolution: Fixed
Committed revision: 952613

Disable clustering contrib in Solr trunk
Key: SOLR-1943
URL: https://issues.apache.org/jira/browse/SOLR-1943
Project: Solr
Issue Type: Bug
Components: contrib - Clustering
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Attachments: SOLR-1943.patch
[jira] Updated: (LUCENE-2495) Add In/Out/putStream wrapper around Lucene IndexIn/Out/put
[ https://issues.apache.org/jira/browse/LUCENE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Wang updated LUCENE-2495:
Attachment: lucene-iostream.tar

Added classes: LuceneDirectoryInputStream and LuceneDirectoryOutputStream, which decorate the IndexInput and IndexOutput classes.

Add In/Out/putStream wrapper around Lucene IndexIn/Out/put
Key: LUCENE-2495
URL: https://issues.apache.org/jira/browse/LUCENE-2495
Project: Lucene - Java
Issue Type: New Feature
Components: Store
Reporter: John Wang
Attachments: lucene-iostream.tar

Lucene Directory is an abstraction that builds IndexInput and IndexOutput instances. Sometimes it is useful to add custom files to the index directory for custom searching. In that case it is often useful to have some sort of bridge between this and code that understands the regular Java InputStream/OutputStream classes.
[jira] Created: (LUCENE-2495) Add In/Out/putStream wrapper around Lucene IndexIn/Out/put
Add In/Out/putStream wrapper around Lucene IndexIn/Out/put
Key: LUCENE-2495
URL: https://issues.apache.org/jira/browse/LUCENE-2495
Project: Lucene - Java
Issue Type: New Feature
Components: Store
Reporter: John Wang
Attachments: lucene-iostream.tar

Lucene Directory is an abstraction that builds IndexInput and IndexOutput instances. Sometimes it is useful to add custom files to the index directory for custom searching. In that case it is often useful to have some sort of bridge between this and code that understands the regular Java InputStream/OutputStream classes.
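The kind of bridge the issue describes could look roughly like the sketch below. The attached lucene-iostream.tar is not shown in this thread, so this is not that code. To stay self-contained it adapts a hypothetical `IndexInputLike` interface that mirrors only a few methods of Lucene's IndexInput (`readByte`, `readBytes`, `getFilePointer`, `length`, `close`); `IndexInputStreamSketch` and `ByteArrayIndexInput` are likewise illustrative names.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the subset of Lucene's IndexInput used here.
// The real class lives in org.apache.lucene.store; this interface only
// mirrors the shape needed for the sketch.
interface IndexInputLike {
    byte readByte() throws IOException;
    void readBytes(byte[] b, int offset, int len) throws IOException;
    long getFilePointer();
    long length();
    void close() throws IOException;
}

// Adapts an IndexInputLike to java.io.InputStream so stream-based code can
// read a custom file stored in the index directory.
class IndexInputStreamSketch extends InputStream {
    private final IndexInputLike in;

    IndexInputStreamSketch(IndexInputLike in) { this.in = in; }

    private long remaining() { return in.length() - in.getFilePointer(); }

    @Override
    public int read() throws IOException {
        if (remaining() <= 0) return -1;   // end of file
        return in.readByte() & 0xFF;       // map signed byte to 0..255
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        long left = remaining();
        if (left <= 0) return -1;
        int n = (int) Math.min(len, left); // never read past the end
        in.readBytes(b, off, n);
        return n;
    }

    @Override
    public int available() {
        return (int) Math.min(Integer.MAX_VALUE, Math.max(0, remaining()));
    }

    @Override
    public void close() throws IOException { in.close(); }
}

// In-memory implementation used only to exercise the adapter.
class ByteArrayIndexInput implements IndexInputLike {
    private final byte[] data;
    private int pos = 0;
    ByteArrayIndexInput(byte[] data) { this.data = data; }
    public byte readByte() { return data[pos++]; }
    public void readBytes(byte[] b, int offset, int len) {
        System.arraycopy(data, pos, b, offset, len);
        pos += len;
    }
    public long getFilePointer() { return pos; }
    public long length() { return data.length; }
    public void close() { }
}
```

The adapter has to translate end-of-file itself (returning -1), because in this sketch the underlying reader has no notion of it and would simply fail when read past its length.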
Re: Proposal: Scorer api change
Hi Shai:

I am not sure I understand how changing Similarity would solve this problem; wouldn't you need the reader?

As for PayloadTermQuery, payload is not always the most efficient way of storing such data, especially when number of terms numdocs. (I am not sure accessing the payload when you iterate is a good idea, but that is another discussion.)

Yes, what I described is exactly a simple CustomScoreQuery for a special use-case. The problem is also in CustomScoreQuery, where nextDoc and advance call the sub-scorers through a wrapper. This can be avoided if the Scorer returns an iterator instead.

Separating scoring and doc iteration is a good idea anyway. I don't know why they were combined originally.

Thanks
-John

On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera ser...@gmail.com wrote:

So wouldn't it make sense to add some method to Similarity? Which receives the doc Id in question maybe ... just thinking here.

Factoring Scorer like you propose would create 3 objects for scoring/iterating: Scorer (which really becomes an iterator), Similarity and CustomScoreFunction ...

Maybe you can use CustomScoreQuery? Or PayloadTermQuery? It depends how you compute your age decay function (where you pull the data about the age of the document).

Shai

On Tue, Jun 8, 2010 at 6:41 PM, John Wang john.w...@gmail.com wrote:

Hi Shai:

Similarity in many cases is not sufficient for scoring. For example, to implement age decay of a document (very useful for corpora like news or tweets), you want to project the raw tfidf score onto a time curve, say f(x). To do this, you'd have a custom scorer that decorates the underlying scorer from your, say, boolean query:

    public float score() { return myFunc(innerScorer.score()); }

This is fine, but then you would have to do this as well:

    public int nextDoc() { return innerScorer.nextDoc(); }

and also:

    public int advance(int target) { return innerScorer.advance(target); }

The difference here is that nextDoc and advance are called far more often than score. And you are introducing an extra method call for them, which is not insignificant for queries that result in large recall sets.

Hope this makes sense.

Thanks
-John

On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera ser...@gmail.com wrote:

I'm not sure I understand what you mean - Scorer is a DISI itself, and the scoring formula is mostly controlled by Similarity. What will be the benefits of the proposed change?

Shai

On Tue, Jun 8, 2010 at 8:25 AM, John Wang john.w...@gmail.com wrote:

Hi guys:

I'd like to make a proposal to change the Scorer class/api to the following:

    public abstract class Scorer {
        public abstract DocIdSetIterator getDocIDSetIterator();
        public abstract float score(int docid);
    }

Reasons:

1) To build a Scorer from an existing Scorer (e.g. one that produces raw scores from tfidf), one would decorate it, and that introduces overhead (in function calls) around nextDoc and advance, even if you just want to augment the score method, which is called far fewer times.

2) The current contract forces scoring on the current doc in the underlying iterator. So once you pass the current doc, you can no longer score it. In one of our use-cases, this is very inconvenient.

What do you think? I can go ahead and open an issue and work on a patch if I get some agreement.

Thanks
-John
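The delegation cost described in this message can be made concrete with a small sketch. It uses a hypothetical, simplified `IteratingScorer` in place of Lucene's real Scorer, and `myFunc` is an arbitrary placeholder for the time-curve projection; only the shape of the decorator follows the code quoted in the thread. The counter shows that a decorator which only wants to change `score()` still sits in the path of every iteration call.

```java
// Hypothetical, simplified stand-in for the 2010-era Scorer contract, in
// which the scorer is also the doc-ID iterator. Not Lucene's real classes.
abstract class IteratingScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    abstract int nextDoc();
    abstract int advance(int target);
    abstract float score(); // scores the doc the iterator is positioned on
}

// Toy inner scorer over a sorted array of matching doc IDs.
class ArrayScorer extends IteratingScorer {
    private final int[] docs;
    private final float[] scores;
    private int idx = -1;

    ArrayScorer(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    int nextDoc() {
        idx++;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int advance(int target) {
        int doc;
        while ((doc = nextDoc()) < target) { /* skip docs before target */ }
        return doc;
    }

    float score() { return scores[idx]; }
}

// Decorator that only wants to change score(), yet must forward iteration.
class DecoratingScorer extends IteratingScorer {
    private final IteratingScorer inner;
    int forwardedIterationCalls = 0; // extra calls paid purely for wrapping

    DecoratingScorer(IteratingScorer inner) { this.inner = inner; }

    // Placeholder for the age-decay projection; halving is an assumption.
    private float myFunc(float raw) { return raw * 0.5f; }

    int nextDoc() { forwardedIterationCalls++; return inner.nextDoc(); }
    int advance(int target) { forwardedIterationCalls++; return inner.advance(target); }
    float score() { return myFunc(inner.score()); }
}
```

Whether the forwarding calls matter in practice depends on how well the JIT inlines them, which is the point debated later in the thread.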
Re: any lucene 2.9.3 RC already available?
thanks! found it already

On Tue, Jun 8, 2010 at 6:33 PM, Michael McCandless luc...@mikemccandless.com wrote:

It's gene...@lucene.apache.org -- that list has been around forever. We are generally (heh) supposed to do the release vote on general, but in the past we have also done it on dev. If you search on Lucid, you'll find this thread and the VOTE thread on general: http://www.lucidimagination.com/search/?q=vote+release+2.9.3

Mike

On Tue, Jun 8, 2010 at 12:22 PM, jm jmugur...@gmail.com wrote:

thanks Mike. which list is that? In the past I have seen these sort of threads in this list. I guess it changed with the merge and now I am missing some other list?

On Tue, Jun 8, 2010 at 6:19 PM, Michael McCandless luc...@mikemccandless.com wrote:

Yes, see the thread [VOTE] Apache Lucene Java 2.9.3 and 3.0.2 artifacts to be released on the general list; it has links to the RC changes.

Mike

On Tue, Jun 8, 2010 at 12:11 PM, jm jmugur...@gmail.com wrote:

Hi, I think I read 2.9.3 was about to be released soon. I am chasing some memory issue in our process and it looks like https://issues.apache.org/jira/browse/LUCENE-2467 could be a culprit, so is there already any RC I could try? It does not matter if it's not official yet.

thanks
javi
Re: Proposal: Scorer api change
re: "But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score()"

Nothing. It is just inefficient to pay the method call overhead just to override score.

re: "If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own."

That is what I am doing. I am just proposing the change (see my first email) as an improvement.

re: "Scorer is itself an iterator"

Yes, that is the current definition. The point of the proposal is to change it.

-John

On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera ser...@gmail.com wrote:

Well … I don't know the reason either, and I always thought Scorer and Similarity are confusing. But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score()? And what would the returned DISI do when nextDoc is called, if not delegate to its subs?

If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. But perhaps others think differently?

Shai
Re: Proposal: Scorer api change
I guess I must be missing something fundamental here :). If Scorer is defined as you propose, and I create my Scorer which implements getDISI() as "return this", what do I lose? What's wrong with Scorer already being a DISI?

You mention "it is just inefficient to pay the method call overhead ..." - what overhead? Are you talking about the decorator delegating the call to the wrapped scorer? I really think the compiler can handle that, no? Especially if you make your nextDoc/advance final (which you probably should) ...

That doesn't seem to justify an API change that breaks backward compatibility completely (even if we do it in 4.0 only) and changes all the current Scorers ...

Shai
Re: Proposal: Scorer api change
The problem with your proposal is that, currently, Lucene uses the current iteration state to compute the score. I.e. it already knows which of the SHOULD BQ clauses matched for the current doc, so it's easier to calculate the score. If you change the API to allow scoring arbitrary documents (even those that didn't match the query at all), you're opening a can of worms :)

As an alternative, you can try looking at the MG4J sources. As far as I understand, their scoring is decoupled from matching, just like you (and I bet many more people) want. The matcher is separate, and the scoring entity accepts the current matcher state instead of a document id, so you get the best of both worlds.
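The alternative suggested in this message, a scorer that receives the matcher's current state rather than a bare document id, might be sketched as follows. None of these types come from MG4J or Lucene: `MatcherState`, `StateScorer`, `DisjunctionMatcher` and `CoordScorer` are hypothetical names, and the scoring rule (fraction of optional clauses that matched) is only an assumed example of a score that needs iteration state.

```java
// Hypothetical sketch: matching and scoring as separate objects, with the
// scorer reading the matcher's current state instead of a bare doc ID.
// These types are illustrative only; they are not MG4J's or Lucene's API.
interface MatcherState {
    int docId();                        // current matching document
    int clauseCount();                  // number of optional clauses
    boolean clauseMatched(int clause);  // did this clause match docId()?
}

interface StateScorer {
    float score(MatcherState state);
}

// Toy disjunction over per-clause sorted lists of non-negative doc IDs.
class DisjunctionMatcher implements MatcherState {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[][] clauseDocs;
    private final int[] pos; // cursor per clause
    private int current = -1;

    DisjunctionMatcher(int[][] clauseDocs) {
        this.clauseDocs = clauseDocs;
        this.pos = new int[clauseDocs.length];
    }

    // Advances to the smallest doc ID greater than the current one.
    int nextDoc() {
        int min = NO_MORE_DOCS;
        for (int c = 0; c < clauseDocs.length; c++) {
            while (pos[c] < clauseDocs[c].length && clauseDocs[c][pos[c]] <= current) {
                pos[c]++;
            }
            if (pos[c] < clauseDocs[c].length) {
                min = Math.min(min, clauseDocs[c][pos[c]]);
            }
        }
        current = min;
        return current;
    }

    public int docId() { return current; }
    public int clauseCount() { return clauseDocs.length; }
    public boolean clauseMatched(int c) {
        return pos[c] < clauseDocs[c].length && clauseDocs[c][pos[c]] == current;
    }
}

// Scores by the fraction of clauses that matched the current document.
class CoordScorer implements StateScorer {
    public float score(MatcherState s) {
        int matched = 0;
        for (int c = 0; c < s.clauseCount(); c++) {
            if (s.clauseMatched(c)) matched++;
        }
        return (float) matched / s.clauseCount();
    }
}
```

Because the scorer can only be handed a state that the matcher is actually positioned on, it cannot be asked to score a document that did not match, which is the concern raised about `score(int docid)`.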
Re: Proposal: Scorer api change
Shai, his wrapper Scorer will just look like: DISI getDISI() { return delegate.getDISI(); } float score(int doc) { return calcMyAwesomeScore(doc); } this saves delegate.nextDoc(), delegate.advance() indirection calls. But I already offered a better alternative :) On Tue, Jun 8, 2010 at 21:09, Shai Erera ser...@gmail.com wrote: I guess I must be missing something fundamental here :). If Scorer is defined as you propose, and I create my Scorer which impls getDISI() as return this - what do I lose? What's wrong w/ Scorer already being a DISI? You mention it is just inefficient to pay the method call overhead ... - what overhead? Are you talking about the decorator delegating the call to the wrapped scorer? I really think the compiler can handle that, no? Especially if you make your nextDoc/advance final (which probably you should) ... That doesn't seem to justify an API change, break bw completely (even if we do it in 4.0 only) and change all the current Scorers ... Shai On Tue, Jun 8, 2010 at 8:01 PM, John Wang john.w...@gmail.com wrote: re: But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score() Nothing. It is just inefficient to pay the method call overhead just to overload score. re: If I were in your shoes, I'd simply provider a Query wrapper. If CSQ is not good enough I'd just develop my own. That is what I am doing. I am just proposing the change (see my first email) as an improvement. re: Scorer is itself an iterator yes, that is the current definition. The point of the proposal is to make this change. -John On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera ser...@gmail.com wrote: Well … I don't know the reason as well and always thought Scorer and Similarity are confusing. But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score(). And what would the returned DISI do when nextDoc is called, if not delegate to its subs? 
If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. But perhaps others think differently? Shai On Tuesday, June 8, 2010, John Wang john.w...@gmail.com wrote: Hi Shai: I am not sure I understand how changing Similarity would solve this problem, wouldn't you need the reader? As for PayloadTermQuery, payload is not always the most efficient way of storing such data, especially when number of terms numdocs. (I am not sure accessing the payload when you iterate is a good idea, but that is another discussion) Yes, what I described is exactly a simple CustomScoreQuery for a special use-case. The problem is also in CustomScoreQuery, where nextDoc and advance are calling the sub-scorers as a wrapper. This can be avoided if the Scorer returns an iterator instead. Separating scoring and doc iteration is a good idea anyway. I don't know why they were combined originally. Thanks -John On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera ser...@gmail.com wrote: So wouldn't it make sense to add some method to Similarity? Which receives the doc Id in question maybe ... just thinking here. Factoring Scorer like you propose would create 3 objects for scoring/iterating: Scorer (which really becomes an iterator), Similarity and CustomScoreFunction ... Maybe you can use CustomScoreQuery? or PayloadTermQuery? It depends on how you compute your age decay function (where you pull the data about the age of the document). Shai On Tue, Jun 8, 2010 at 6:41 PM, John Wang john.w...@gmail.com wrote: Hi Shai: Similarity in many cases is not sufficient for scoring. 
For example, to implement age decay of a document (very useful for corpuses like news or tweets), you want to project the raw tfidf score onto a time curve, say f(x). To do this, you'd have a custom scorer that decorates the underlying scorer from, say, your boolean query: public float score(){ return myFunc(innerScorer.score()); } This is fine, but then you would have to do this as well: public int nextDoc(){ return innerScorer.nextDoc(); } and also: public int advance(int target){ return innerScorer.advance(target); } The difference here is that nextDoc and advance are called far more often than score, and you are introducing an extra method call for each of them, which is not insignificant for queries that result in large recall sets. Hope this makes sense. Thanks -John On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera ser...@gmail.com wrote: I'm not sure I understand what you mean - Scorer is a DISI itself, and the scoring formula is mostly controlled by Similarity. What will be the benefits of the proposed change? Shai On Tue, Jun 8, 2010 at 8:25 AM, John Wang john.w...@gmail.com wrote: Hi guys: I'd like to make a proposal to change the Scorer class/api to the
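To make the proposal at the root of this thread easier to picture, here is a minimal, self-contained sketch of a scorer that hands out its iterator and scores by doc id. None of these types are Lucene's: `DocIterator`, `ProposedScorer`, `ArrayDocIterator`, `RawScorer`, `AgeDecayScorer` and the half-life decay curve are hypothetical stand-ins invented for illustration only.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a doc-id iterator; NOT Lucene's DocIdSetIterator. */
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target);
}

/** Hypothetical shape of the proposed API: iteration and scoring are separate. */
interface ProposedScorer {
    DocIterator iterator();
    float score(int docId);
}

/** Iterates over a sorted array of matching doc ids. */
final class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIterator(int[] docs) { this.docs = docs; }

    public int docID() {
        if (pos < 0) return -1;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    public int nextDoc() { pos++; return docID(); }

    public int advance(int target) {
        int doc;
        do { doc = nextDoc(); } while (doc < target);
        return doc;
    }
}

/** Produces a "raw" score per matching doc (assumed precomputed here). */
final class RawScorer implements ProposedScorer {
    private final int[] docs;     // sorted ascending
    private final float[] scores; // parallel to docs

    RawScorer(int[] docs, float[] scores) { this.docs = docs; this.scores = scores; }

    public DocIterator iterator() { return new ArrayDocIterator(docs); }

    public float score(int docId) {
        // Scoring by doc id means looking the doc up again.
        int i = Arrays.binarySearch(docs, docId);
        return i >= 0 ? scores[i] : 0f;
    }
}

/** Decorator that changes only the score and returns the inner iterator as-is. */
final class AgeDecayScorer implements ProposedScorer {
    private final ProposedScorer inner;
    private final float[] ageInDays; // indexed by doc id (assumption)
    private final float halfLifeDays;

    AgeDecayScorer(ProposedScorer inner, float[] ageInDays, float halfLifeDays) {
        this.inner = inner;
        this.ageInDays = ageInDays;
        this.halfLifeDays = halfLifeDays;
    }

    // No wrapper around nextDoc()/advance(): the caller drives the inner iterator directly.
    public DocIterator iterator() { return inner.iterator(); }

    public float score(int docId) {
        double decay = Math.pow(0.5, ageInDays[docId] / halfLifeDays);
        return (float) (inner.score(docId) * decay);
    }
}

final class AgeDecayDemo {
    public static void main(String[] args) {
        ProposedScorer scorer = new AgeDecayScorer(
                new RawScorer(new int[] {1, 4, 7}, new float[] {2f, 4f, 8f}),
                new float[] {0, 0, 0, 0, 10, 0, 0, 20}, 10f);
        DocIterator it = scorer.iterator();
        for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            System.out.println(doc + " -> " + scorer.score(doc));
        }
    }
}
```

Note that `RawScorer.score(int)` has to find the document again because it no longer shares state with the iterator. In this toy that is a binary search; for a real compound query it is the cost Earwin raises elsewhere in the thread.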
Re: Proposal: Scorer api change
Shai: method call overhead in this case is not insignificant because it is in a very tight loop, and no, the compiler cannot optimize it for you; we are not inlining because we are in a Java world. You are right, this breaks backward compatibility. But from 2.4 to 2.9, we have done MUCH worse. :) -John On Tue, Jun 8, 2010 at 10:09 AM, Shai Erera ser...@gmail.com wrote: I guess I must be missing something fundamental here :). If Scorer is defined as you propose, and I create my Scorer which impls getDISI() as return this - what do I lose? What's wrong w/ Scorer already being a DISI? You mention it is just inefficient to pay the method call overhead ... - what overhead? Are you talking about the decorator delegating the call to the wrapped scorer? I really think the compiler can handle that, no? Especially if you make your nextDoc/advance final (which probably you should) ... That doesn't seem to justify an API change, break bw completely (even if we do it in 4.0 only) and change all the current Scorers ... Shai On Tue, Jun 8, 2010 at 8:01 PM, John Wang john.w...@gmail.com wrote: re: But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score() Nothing. It is just inefficient to pay the method call overhead just to override score. re: If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. That is what I am doing. I am just proposing the change (see my first email) as an improvement. re: Scorer is itself an iterator Yes, that is the current definition. The point of the proposal is to make this change. -John On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera ser...@gmail.com wrote: Well … I don't know the reason either and always thought Scorer and Similarity are confusing. But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score()? And what would the returned DISI do when nextDoc is called, if not delegate to its subs? 
If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. But perhaps others think differently? Shai On Tuesday, June 8, 2010, John Wang john.w...@gmail.com wrote: Hi Shai: I am not sure I understand how changing Similarity would solve this problem, wouldn't you need the reader? As for PayloadTermQuery, payload is not always the most efficient way of storing such data, especially when number of terms numdocs. (I am not sure accessing the payload when you iterate is a good idea, but that is another discussion) Yes, what I described is exactly a simple CustomScoreQuery for a special use-case. The problem is also in CustomScoreQuery, where nextDoc and advance are calling the sub-scorers as a wrapper. This can be avoided if the Scorer returns an iterator instead. Separating scoring and doc iteration is a good idea anyway. I don't know why they were combined originally. Thanks -John On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera ser...@gmail.com wrote: So wouldn't it make sense to add some method to Similarity? Which receives the doc Id in question maybe ... just thinking here. Factoring Scorer like you propose would create 3 objects for scoring/iterating: Scorer (which really becomes an iterator), Similarity and CustomScoreFunction ... Maybe you can use CustomScoreQuery? or PayloadTermQuery? It depends on how you compute your age decay function (where you pull the data about the age of the document). Shai On Tue, Jun 8, 2010 at 6:41 PM, John Wang john.w...@gmail.com wrote: Hi Shai: Similarity in many cases is not sufficient for scoring. 
For example, to implement age decay of a document (very useful for corpuses like news or tweets), you want to project the raw tfidf score onto a time curve, say f(x). To do this, you'd have a custom scorer that decorates the underlying scorer from, say, your boolean query: public float score(){ return myFunc(innerScorer.score()); } This is fine, but then you would have to do this as well: public int nextDoc(){ return innerScorer.nextDoc(); } and also: public int advance(int target){ return innerScorer.advance(target); } The difference here is that nextDoc and advance are called far more often than score, and you are introducing an extra method call for each of them, which is not insignificant for queries that result in large recall sets. Hope this makes sense. Thanks -John On Tue, Jun 8, 2010 at 5:02 AM, Shai Erera ser...@gmail.com wrote: I'm not sure I understand what you mean - Scorer is a DISI itself, and the scoring formula is mostly controlled by Similarity. What will be the benefits of the proposed change? Shai On Tue, Jun 8, 2010 at 8:25 AM, John Wang john.w...@gmail.com wrote: Hi guys: I'd like to make a proposal to change the Scorer class/api to the
Re: Proposal: Scorer api change
Shai: Java cannot inline in this case. Actually, there is an urban legend about using final to hint to the underlying compiler to inline :) (it turns out to be false, one reason being dynamic classloading). Write a simple program and see for yourself (remember to turn on -server in the VM options). -John On Tue, Jun 8, 2010 at 10:28 AM, Shai Erera ser...@gmail.com wrote: What do you mean we are not inlining? The compiler inlines methods .. at least it tries. Shai On Tue, Jun 8, 2010 at 8:21 PM, John Wang john.w...@gmail.com wrote: Shai: method call overhead in this case is not insignificant because it is in a very tight loop, and no, the compiler cannot optimize it for you; we are not inlining because we are in a Java world. You are right, this breaks backward compatibility. But from 2.4 to 2.9, we have done MUCH worse. :) -John On Tue, Jun 8, 2010 at 10:09 AM, Shai Erera ser...@gmail.com wrote: I guess I must be missing something fundamental here :). If Scorer is defined as you propose, and I create my Scorer which impls getDISI() as return this - what do I lose? What's wrong w/ Scorer already being a DISI? You mention it is just inefficient to pay the method call overhead ... - what overhead? Are you talking about the decorator delegating the call to the wrapped scorer? I really think the compiler can handle that, no? Especially if you make your nextDoc/advance final (which probably you should) ... That doesn't seem to justify an API change, break bw completely (even if we do it in 4.0 only) and change all the current Scorers ... Shai On Tue, Jun 8, 2010 at 8:01 PM, John Wang john.w...@gmail.com wrote: re: But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score() Nothing. It is just inefficient to pay the method call overhead just to override score. re: If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. That is what I am doing. 
I am just proposing the change (see my first email) as an improvement. re: Scorer is itself an iterator Yes, that is the current definition. The point of the proposal is to make this change. -John On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera ser...@gmail.com wrote: Well … I don't know the reason either and always thought Scorer and Similarity are confusing. But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score()? And what would the returned DISI do when nextDoc is called, if not delegate to its subs? If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. But perhaps others think differently? Shai On Tuesday, June 8, 2010, John Wang john.w...@gmail.com wrote: Hi Shai: I am not sure I understand how changing Similarity would solve this problem, wouldn't you need the reader? As for PayloadTermQuery, payload is not always the most efficient way of storing such data, especially when number of terms numdocs. (I am not sure accessing the payload when you iterate is a good idea, but that is another discussion) Yes, what I described is exactly a simple CustomScoreQuery for a special use-case. The problem is also in CustomScoreQuery, where nextDoc and advance are calling the sub-scorers as a wrapper. This can be avoided if the Scorer returns an iterator instead. Separating scoring and doc iteration is a good idea anyway. I don't know why they were combined originally. Thanks -John On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera ser...@gmail.com wrote: So wouldn't it make sense to add some method to Similarity? Which receives the doc Id in question maybe ... just thinking here. Factoring Scorer like you propose would create 3 objects for scoring/iterating: Scorer (which really becomes an iterator), Similarity and CustomScoreFunction ... Maybe you can use CustomScoreQuery? or PayloadTermQuery? 
It depends on how you compute your age decay function (where you pull the data about the age of the document). Shai On Tue, Jun 8, 2010 at 6:41 PM, John Wang john.w...@gmail.com wrote: Hi Shai: Similarity in many cases is not sufficient for scoring. For example, to implement age decay of a document (very useful for corpuses like news or tweets), you want to project the raw tfidf score onto a time curve, say f(x). To do this, you'd have a custom scorer that decorates the underlying scorer from, say, your boolean query: public float score(){ return myFunc(innerScorer.score()); } This is fine, but then you would have to do this as well: public int nextDoc(){ return innerScorer.nextDoc(); } and also: public int advance(int target){ return innerScorer.advance(target); } The difference here is that nextDoc and advance are called far more often than score. And you are introducing an
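The "write a simple program and see for yourself" suggestion in this message could look roughly like the sketch below. It is a naive timing loop, not a rigorous benchmark, and it makes no claim about which side of the inlining argument it supports: results depend on the JVM, its flags and the hardware. `IntCursor`, `CountingCursor`, `DelegatingCursor` and `DelegationBench` are hypothetical names invented for this example and have nothing to do with Lucene's classes.

```java
/** Hypothetical minimal iterator-like interface, used only for the timing loop. */
interface IntCursor {
    int next();
}

/** Direct implementation: returns 0, 1, 2, ... */
final class CountingCursor implements IntCursor {
    private int current = -1;

    public int next() { return ++current; }
}

/** Decorator that only forwards the call, like a wrapper scorer forwarding nextDoc(). */
final class DelegatingCursor implements IntCursor {
    private final IntCursor delegate;

    DelegatingCursor(IntCursor delegate) { this.delegate = delegate; }

    public int next() { return delegate.next(); }
}

final class DelegationBench {
    /** Pulls n values and sums them so the JIT cannot discard the loop. */
    static long drain(IntCursor cursor, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += cursor.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 50_000_000;
        // Several rounds: the first ones mostly measure JIT warm-up.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long direct = drain(new CountingCursor(), n);
            long t1 = System.nanoTime();
            long wrapped = drain(new DelegatingCursor(new CountingCursor()), n);
            long t2 = System.nanoTime();
            System.out.printf("round %d: direct %d ms, wrapped %d ms (sums %d / %d)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, direct, wrapped);
        }
    }
}
```

For anything beyond a quick look, a harness that controls warm-up and dead-code elimination would be a safer basis for conclusions than hand-rolled `System.nanoTime()` timing.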
Re: Proposal: Scorer api change
Wouldn't you get it as well with the proposed API? You would still be able to iterate over the docs and at that point call score with the docid. If you call score() along with iteration, you would still get the information, no? Making the scorer take a docid allows you to score any docid in the reader if the query wants it to. Wouldn't it make it more flexible? -John On Tue, Jun 8, 2010 at 10:54 AM, Earwin Burrfoot ear...@gmail.com wrote: To compute a score you have to see which of your subqueries did not match, which did, and what the docfreqs/positions are for them. When iterating, and calling score() only for the current doc, parts of this data (maybe even all of it, not sure) are already gathered for you. If you allow calling score(int doc) for an arbitrary docId, you'll have to redo this work. 2010/6/8 John Wang john.w...@gmail.com: Hi Earwin: I am not sure I understand here, e.g. what is the difference between: float myScoringCode(){ return computeMyScore(scorer.score()); } and float myScoringCode(){ return computeMyScore(scorer.score(scorer.getDocIdSetIterator().docID())); } In the case of BQ, when you get a hit, would you still be able to call subscorer.score(hit)? Why is the point of iteration important for BQ? Please elaborate. Thanks -John On Tue, Jun 8, 2010 at 10:10 AM, Earwin Burrfoot ear...@gmail.com wrote: The problem with your proposal is that, currently, Lucene uses the current iteration state to compute the score. I.e. it already knows which of the SHOULD BQ clauses matched for the current doc, so it's easier to calculate the score. If you change the API to allow scoring arbitrary documents (even those that didn't match the query at all), you're opening a can of worms :) As an alternative, you can try looking at the MG4J sources. As far as I understand, their scoring is decoupled from matching, just like you (and I bet many more people) want. The matcher is separate, and the scoring entity accepts the current matcher state instead of a document id, so you get the best of both worlds. 
On Tue, Jun 8, 2010 at 21:01, John Wang john.w...@gmail.com wrote: re: But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score() Nothing. It is just inefficient to pay the method call overhead just to override score. re: If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. That is what I am doing. I am just proposing the change (see my first email) as an improvement. re: Scorer is itself an iterator Yes, that is the current definition. The point of the proposal is to make this change. -John On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera ser...@gmail.com wrote: Well ... I don't know the reason either and always thought Scorer and Similarity are confusing. But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score()? And what would the returned DISI do when nextDoc is called, if not delegate to its subs? If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. But perhaps others think differently? Shai On Tuesday, June 8, 2010, John Wang john.w...@gmail.com wrote: Hi Shai: I am not sure I understand how changing Similarity would solve this problem, wouldn't you need the reader? As for PayloadTermQuery, payload is not always the most efficient way of storing such data, especially when number of terms numdocs. (I am not sure accessing the payload when you iterate is a good idea, but that is another discussion) Yes, what I described is exactly a simple CustomScoreQuery for a special use-case. The problem is also in CustomScoreQuery, where nextDoc and advance are calling the sub-scorers as a wrapper. This can be avoided if the Scorer returns an iterator instead. Separating scoring and doc iteration is a good idea anyway. I don't know why they were combined originally. 
Thanks -John On Tue, Jun 8, 2010 at 8:47 AM, Shai Erera ser...@gmail.com wrote: So wouldn't it make sense to add some method to Similarity? Which receives the doc Id in question maybe ... just thinking here. Factoring Scorer like you propose would create 3 objects for scoring/iterating: Scorer (which really becomes an iterator), Similarity and CustomScoreFunction ... Maybe you can use CustomScoreQuery? or PayloadTermQuery? depends how you compute your age decay function (where you pull the data about the age of the document). Shai On Tue, Jun 8, 2010 at 6:41 PM, John Wang john.w...@gmail.com wrote: Hi Shai: Similarity in many
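Earwin's alternative in this message, a scoring entity that receives the current matcher state rather than a bare document id, can be sketched as follows. This is not MG4J's API and not Lucene's: `OrMatcher`, `StateScorer`, `WeightedCoordScorer` and the weighting formula are hypothetical, chosen only to show that the "which SHOULD clauses matched" information is already available at iteration time and does not need to be recomputed.

```java
/** Hypothetical OR-matcher over sorted, duplicate-free doc-id lists, one per SHOULD clause. */
final class OrMatcher {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[][] postings;
    private final int[] pos;         // read position in each clause's list
    private final boolean[] matched; // which clauses matched the current doc
    private int doc = -1;

    OrMatcher(int[][] postings) {
        this.postings = postings;
        this.pos = new int[postings.length];
        this.matched = new boolean[postings.length];
    }

    int nextDoc() {
        int min = NO_MORE_DOCS;
        for (int i = 0; i < postings.length; i++) {
            // Move clauses that sat on the previous doc forward by one.
            if (pos[i] < postings[i].length && postings[i][pos[i]] == doc) {
                pos[i]++;
            }
            if (pos[i] < postings[i].length) {
                min = Math.min(min, postings[i][pos[i]]);
            }
        }
        // Record the per-clause match state as a by-product of iteration.
        for (int i = 0; i < postings.length; i++) {
            matched[i] = pos[i] < postings[i].length && postings[i][pos[i]] == min;
        }
        doc = min;
        return doc;
    }

    int docID() { return doc; }
    int clauseCount() { return postings.length; }
    boolean clauseMatched(int clause) { return matched[clause]; }
}

/** Hypothetical scoring entity: takes the matcher state, not a doc id. */
interface StateScorer {
    float score(OrMatcher state);
}

/** Sums the weights of the matched clauses, scaled by the fraction of clauses matched. */
final class WeightedCoordScorer implements StateScorer {
    private final float[] weights;

    WeightedCoordScorer(float[] weights) { this.weights = weights; }

    public float score(OrMatcher state) {
        float sum = 0f;
        int hits = 0;
        for (int i = 0; i < state.clauseCount(); i++) {
            if (state.clauseMatched(i)) {
                sum += weights[i];
                hits++;
            }
        }
        return sum * hits / state.clauseCount();
    }
}

final class StateScoringDemo {
    public static void main(String[] args) {
        OrMatcher matcher = new OrMatcher(new int[][] {{1, 3}, {3, 5}});
        StateScorer scorer = new WeightedCoordScorer(new float[] {1f, 2f});
        while (matcher.nextDoc() != OrMatcher.NO_MORE_DOCS) {
            System.out.println(matcher.docID() + " -> " + scorer.score(matcher));
        }
    }
}
```

The scorer can only be asked about the document the matcher is currently positioned on, which rules out the "score an arbitrary doc" case by construction.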
Re: Proposal: Scorer api change
Some people don't do IO while searching at all. When you're over a certain qps/index size threshold, you need fewer nodes to keep all your index (or its hot parts) in memory than to keep the combined IO subsystem throughput high enough to satisfy disk-based search demands. 2010/6/9 Doron Cohen cdor...@gmail.com: I too tend to ignore the overhead of delegated calls, especially compared to all the other IO ops and computations done by the stack of scorers, but accepting that you cannot ignore it, could you achieve the same goal by sub-classing the top query, where you subclass its weight to return a sub-class of its scorer which would only override score() but not the other methods, and in score() would apply that, e.g., decay logic? This way no delegation is required for the other methods. A disadvantage of this is that you would need to subclass like this every kind of top-level query that might come up in your app - so not sure if this is really acceptable in your case. Another disadvantage is that this is much more complicated code to write. Doron 2010/6/8 John Wang john.w...@gmail.com Wouldn't you get it as well with the proposed API? You would still be able to iterate over the docs and at that point call score with the docid. If you call score() along with iteration, you would still get the information, no? Making the scorer take a docid allows you to score any docid in the reader if the query wants it to. Wouldn't it make it more flexible? -John On Tue, Jun 8, 2010 at 10:54 AM, Earwin Burrfoot ear...@gmail.com wrote: To compute a score you have to see which of your subqueries did not match, which did, and what the docfreqs/positions are for them. When iterating, and calling score() only for the current doc, parts of this data (maybe even all of it, not sure) are already gathered for you. If you allow calling score(int doc) for an arbitrary docId, you'll have to redo this work. 2010/6/8 John Wang john.w...@gmail.com: Hi Earwin: I am not sure I understand here, e.g. 
what is the difference between: float myScoringCode(){ return computeMyScore(scorer.score()); } and float myScoringCode(){ return computeMyScore(scorer.score(scorer.getDocIdSetIterator().docID())); } In the case of BQ, when you get a hit, would you still be able to call subscorer.score(hit)? Why is the point of iteration important for BQ? Please elaborate. Thanks -John On Tue, Jun 8, 2010 at 10:10 AM, Earwin Burrfoot ear...@gmail.com wrote: The problem with your proposal is that, currently, Lucene uses the current iteration state to compute the score. I.e. it already knows which of the SHOULD BQ clauses matched for the current doc, so it's easier to calculate the score. If you change the API to allow scoring arbitrary documents (even those that didn't match the query at all), you're opening a can of worms :) As an alternative, you can try looking at the MG4J sources. As far as I understand, their scoring is decoupled from matching, just like you (and I bet many more people) want. The matcher is separate, and the scoring entity accepts the current matcher state instead of a document id, so you get the best of both worlds. On Tue, Jun 8, 2010 at 21:01, John Wang john.w...@gmail.com wrote: re: But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score() Nothing. It is just inefficient to pay the method call overhead just to override score. re: If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. That is what I am doing. I am just proposing the change (see my first email) as an improvement. re: Scorer is itself an iterator Yes, that is the current definition. The point of the proposal is to make this change. -John On Tue, Jun 8, 2010 at 9:45 AM, Shai Erera ser...@gmail.com wrote: Well … I don't know the reason either and always thought Scorer and Similarity are confusing. But Scorer is itself an iterator, so what prevents you from calling nextDoc and advance on it without score()? 
And what would the returned DISI do when nextDoc is called, if not delegate to its subs? If I were in your shoes, I'd simply provide a Query wrapper. If CSQ is not good enough I'd just develop my own. But perhaps others think differently? Shai On Tuesday, June 8, 2010, John Wang john.w...@gmail.com wrote: Hi Shai: I am not sure I understand how changing Similarity would solve this problem, wouldn't you need the reader? As for PayloadTermQuery, payload is not always the most efficient way of storing such data, especially when number of terms numdocs. (I am not sure accessing the payload when you iterate is a good idea, but that is another discussion) Yes, what I
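Doron's suggestion in this message, subclassing the scorer so that only score() is overridden, can be sketched at the scorer level as below. The Query and Weight subclasses he mentions are left out, and every type here (`BaseScorer`, `TermLikeScorer`, `DecayingTermLikeScorer`, the half-life decay) is a hypothetical stand-in rather than a Lucene class.

```java
/** Hypothetical stand-in for a scorer that is also its own iterator. */
abstract class BaseScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    abstract int docID();
    abstract int nextDoc();
    abstract int advance(int target);
    abstract float score();
}

/** Stand-in for the scorer of some concrete top-level query. */
class TermLikeScorer extends BaseScorer {
    private final int[] docs;     // sorted ascending
    private final float[] scores; // parallel to docs
    private int pos = -1;

    TermLikeScorer(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    int docID() {
        if (pos < 0) return -1;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    int nextDoc() { pos++; return docID(); }

    int advance(int target) {
        int doc;
        do { doc = nextDoc(); } while (doc < target);
        return doc;
    }

    /** Only valid while positioned on a matching doc. */
    float score() { return scores[pos]; }
}

/** Overrides score() only: nextDoc() and advance() are inherited, not delegated. */
final class DecayingTermLikeScorer extends TermLikeScorer {
    private final float[] ageInDays; // indexed by doc id (assumption)
    private final float halfLifeDays;

    DecayingTermLikeScorer(int[] docs, float[] scores, float[] ageInDays, float halfLifeDays) {
        super(docs, scores);
        this.ageInDays = ageInDays;
        this.halfLifeDays = halfLifeDays;
    }

    @Override
    float score() {
        double decay = Math.pow(0.5, ageInDays[docID()] / halfLifeDays);
        return (float) (super.score() * decay);
    }
}

final class SubclassDemo {
    public static void main(String[] args) {
        BaseScorer scorer = new DecayingTermLikeScorer(
                new int[] {1, 4, 7}, new float[] {2f, 4f, 8f},
                new float[] {0, 0, 0, 0, 10, 0, 0, 20}, 10f);
        while (scorer.nextDoc() != BaseScorer.NO_MORE_DOCS) {
            System.out.println(scorer.docID() + " -> " + scorer.score());
        }
    }
}
```

The drawback Doron names shows up directly: `DecayingTermLikeScorer` is tied to `TermLikeScorer`, so every other kind of top-level scorer would need its own decaying subclass.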
Hudson build is back to normal : Lucene-3.x #36
See http://hudson.zones.apache.org/hudson/job/Lucene-3.x/36/