Re: Does anyone have tips on managing cached filters?
On Wed, Nov 28, 2012 at 12:27 AM, Trejkaz wrote:

> On Wed, Nov 28, 2012 at 2:09 AM, Robert Muir wrote:
> >
> > I don't understand how a filter could become invalid even though the
> > reader has not changed.
>
> I did state two ways in my last email, but just to reiterate:
>
> (1): The filter reflects a query constructed from lines in a text
> file. If some other application modifies the text file, that filter is
> now invalid.
>
> (2): The filter reflects the results of an SQL query against a
> separate database. If someone inserts a new value into that table,
> then that filter is now invalid.
>
> Case 1 occurs for things like word lists. Case 2 occurs for things
> like tags. Neither of these would ever be possible to implement purely
> using Lucene, so it is a fact of life that they will become invalid
> for reasons other than the reader changing.

My point is really that Lucene (especially clearly in 4.0) assumes
IndexReaders are immutable points in time. I don't think it makes sense
for us to provide filter caching or anything similar on any other basis,
because this is a key simplification to the design. If you depart from
it by scoring or filtering on mutable data outside the inverted index,
things are likely to get complicated.
Re: Does anyone have tips on managing cached filters?
On Wed, Nov 28, 2012 at 2:09 AM, Robert Muir wrote:

> I don't understand how a filter could become invalid even though the
> reader has not changed.

I did state two ways in my last email, but just to reiterate:

(1): The filter reflects a query constructed from lines in a text
file. If some other application modifies the text file, that filter is
now invalid.

(2): The filter reflects the results of an SQL query against a
separate database. If someone inserts a new value into that table,
then that filter is now invalid.

Case 1 occurs for things like word lists. Case 2 occurs for things
like tags. Neither of these would ever be possible to implement purely
using Lucene, so it is a fact of life that they will become invalid
for reasons other than the reader changing.

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
native, versioned XML-DBMS (that is full text search in versioned document collections)
Hello,

as posted some time ago, I'm working on a native, versioned XML-DBMS [1].
I'd like to provide a full-text index, and I recently read about
customized Codecs which can be plugged in. Usually data (for instance
XML nodes) is stored on RecordPages. I'm still not sure whether it is
possible, and whether it makes sense, to implement a PostingsFormat and
possibly a Directory.

What I want to achieve is to be able to use my infrastructure for
transaction-safe versioning. That is, I need some kind of record for
each of the different types (I think fields, terms, documents and term
positions), with a simple record ID to retrieve the record from disk and
a marker for which kind of record it is. Beyond that, all I need is a
serialization/deserialization mechanism for each record type. Probably
I can simply reuse the default serialization/deserialization routines.

Furthermore, I'm not sure whether it would be worthwhile to provide a
B+-tree implementation which always clusters, for instance, the fields,
then the terms, then the documents and the term positions. I don't know
what index structure Lucene uses by default, but I think it must be
something that performs well with any kind of disk (reading/writing
blocks of data).

Any hints and suggestions would be welcome.

kind regards,
Johannes

[1] https://github.com/JohannesLichtenberger/sirix
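
The record layout described above (a record ID, a kind marker, and a per-kind serialized body) could look roughly like the following. This is only a sketch under assumptions: the names RecordKind and IndexRecord are hypothetical, the four kinds are taken from the message, and nothing here is part of Lucene's Codec API or of the sirix code base.

```java
import java.nio.ByteBuffer;

/** Hypothetical record kinds; the four names follow the message above. */
enum RecordKind { FIELD, TERM, DOCUMENT, POSITION }

/** Hypothetical index record: an ID, a kind tag, and an opaque serialized body. */
final class IndexRecord {
    final long id;
    final RecordKind kind;
    final byte[] body;

    IndexRecord(long id, RecordKind kind, byte[] body) {
        this.id = id;
        this.kind = kind;
        this.body = body;
    }

    /** Layout: 8-byte id, 1-byte kind ordinal, 4-byte body length, body bytes. */
    byte[] serialize() {
        return ByteBuffer.allocate(Long.BYTES + 1 + Integer.BYTES + body.length)
                .putLong(id)
                .put((byte) kind.ordinal())
                .putInt(body.length)
                .put(body)
                .array();
    }

    /** Reads back a record written by serialize(). */
    static IndexRecord deserialize(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        long id = in.getLong();
        RecordKind kind = RecordKind.values()[in.get()];
        byte[] body = new byte[in.getInt()];
        in.get(body);
        return new IndexRecord(id, kind, body);
    }
}
```

With a layout like this, the page layer only needs the ID and the kind tag; what goes into the body of each kind is left open, as it is in the message.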
Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Flexible indexing is the ability to make your own codec, which
controls the reading and writing of all index parts (postings, stored
fields, term vectors, deleted docs, etc.).

So, for example, if you want to store some postings as a bit set instead
of the block format that's the default coming up in 4.1, that's easy
to do.

But what is less easy (as I described below) is changing what is
actually stored in the postings, e.g. adding a new per-position attribute.

The original goal was to allow arbitrary attributes beyond the known
docs/freqs/positions/offsets that Lucene supports today, so that you
could easily make new application-dependent per-term, per-doc,
per-position things, pull them from the analyzer, save them to the
index, and access them from an IndexReader / query. But while some APIs
do expose this, it's not very well explored yet (e.g., you'd have to make
a custom indexing chain to get the attributes "through" IndexWriter
down to your codec).

It would be great to make progress making this easier, so ideas are
very welcome :)

Mike McCandless

http://blog.mikemccandless.com

On Tue, Nov 27, 2012 at 3:37 PM, Wu, Stephen T., Ph.D. wrote:
> Following up on a previous question...
> What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to
> easily make new postings formats/codecs -- but a response below says that
> would be "tricky"?
>
> stephen
>
>
> On 11/27/12 11:48 AM, "David Causse" wrote:
>
>> Hi,
>>
>> We use payloads, but we can't use the whole Lucene API.
>> For example, we use them to run relation queries such as:
>>
>> @quote(@speaker(obama) @discourse(health))
>>
>> This searches for all documents that contain a quote by Obama talking
>> about health.
>> We encode linguistic information (standoff annotations) inside payloads
>> and use a custom search API to query the index.
>> I didn't find a convenient way to attach my code to the Lucene
>> Query/Scorer/Weight API. As with SpanQuery, you have to rewrite the whole
>> Query stack.
>> In short, if you want to go with Payloads that do more than boost a
>> term, chances are that you'll need to rewrite a big part of the query
>> stack.
>>
>>
>> On 27/11/2012 16:59, Wu, Stephen T., Ph.D. wrote:
>>> I think we're looking at doing something related. I haven't explored the
>>> Enums and don't know how to make a postings codec... But what is "flexible
>>> indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
>>>
>>> We're trying to incorporate attributes onto terms/spans in indexes. We'd
>>> also like to try out some interesting ways to score things that go beyond
>>> just tokens.
>>>
>>> We were considering using Attributes instead of Payloads, because it seems
>>> like using Payloads ties you to a particular kind of scoring -- just a
>>> weight on a token. Can Payloads be used for more general scoring functions?
>>> E.g., considering a span of text alongside multiple Payloads?
>>>
>>> Does it make sense to move outside of Payloads here?
>>>
>>> Thanks!
>>>
>>> stephen
>>>
>>>
>>> On 11/19/12 8:14 AM, "Michael McCandless" wrote:
>>>
>>>> A new postings format would be tricky because you have new attributes
>>>> you want to index.
>>>>
>>>> The DocsAndPositionsEnum does have an attributes source, but this is
>>>> not well explored, and there are known problems (they can't be easily
>>>> merged in the composite reader case).
>>>>
>>>> So that's why I suggested packing your information into a payload ...
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy wrote:
>>>>> thx, mike.
>>>>> about the 3rd question, "encode them all into the payload" is better than
>>>>> "a new postings format with the codec" ??
>>>>> I mean replace the original posting item (position, startOffset, endOffset,
>>>>> payload) with my own inverted item such as
>>>>> class TestPostingItem
>>>>> {
>>>>> int termId;
>>>>> long startOffset;
>>>>> long endOffset;
>>>>> float score;
>>>>> int segId;
>>>>> long timeStamp;
>>>>> }
>>>>> ?
What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Following up on a previous question...
What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to
easily make new postings formats/codecs -- but a response below says that
would be "tricky"?

stephen


On 11/27/12 11:48 AM, "David Causse" wrote:

> Hi,
>
> We use payloads, but we can't use the whole Lucene API.
> For example, we use them to run relation queries such as:
>
> @quote(@speaker(obama) @discourse(health))
>
> This searches for all documents that contain a quote by Obama talking
> about health.
> We encode linguistic information (standoff annotations) inside payloads
> and use a custom search API to query the index.
> I didn't find a convenient way to attach my code to the Lucene
> Query/Scorer/Weight API. As with SpanQuery, you have to rewrite the whole
> Query stack.
> In short, if you want to go with Payloads that do more than boost a
> term, chances are that you'll need to rewrite a big part of the query
> stack.
>
>
> On 27/11/2012 16:59, Wu, Stephen T., Ph.D. wrote:
>> I think we're looking at doing something related. I haven't explored the
>> Enums and don't know how to make a postings codec... But what is "flexible
>> indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
>>
>> We're trying to incorporate attributes onto terms/spans in indexes. We'd
>> also like to try out some interesting ways to score things that go beyond
>> just tokens.
>>
>> We were considering using Attributes instead of Payloads, because it seems
>> like using Payloads ties you to a particular kind of scoring -- just a
>> weight on a token. Can Payloads be used for more general scoring functions?
>> E.g., considering a span of text alongside multiple Payloads?
>>
>> Does it make sense to move outside of Payloads here?
>>
>> Thanks!
>>
>> stephen
>>
>>
>> On 11/19/12 8:14 AM, "Michael McCandless" wrote:
>>
>>> A new postings format would be tricky because you have new attributes
>>> you want to index.
>>>
>>> The DocsAndPositionsEnum does have an attributes source, but this is
>>> not well explored, and there are known problems (they can't be easily
>>> merged in the composite reader case).
>>>
>>> So that's why I suggested packing your information into a payload ...
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy wrote:
>>>> thx, mike.
>>>> about the 3rd question, "encode them all into the payload" is better than
>>>> "a new postings format with the codec" ??
>>>> I mean replace the original posting item (position, startOffset, endOffset,
>>>> payload) with my own inverted item such as
>>>> class TestPostingItem
>>>> {
>>>> int termId;
>>>> long startOffset;
>>>> long endOffset;
>>>> float score;
>>>> int segId;
>>>> long timeStamp;
>>>> }
>>>> ?
Re: info on how lucene conducts a search?
As you can tell from the title, Lucene in Action is more about using
Lucene than about how it works internally, but yes, it is good and is
worth buying. If you're worried about how up to date it is, keep a copy
of the release notes and migration guides for later versions to hand.

--
Ian.

On Tue, Nov 27, 2012 at 4:19 PM, geeky2 wrote:
> hello,
>
> thanks for the info.
>
> as you suggested - i did do a general search and found this slide
> presentation - which had some good general info. i am not sure what the
> source of this preso is, how qualified the author is (although he/she
> seems very good) or how current the information is.
>
> http://www.slideshare.net/nitin_stephens/lucene-basics#btnNext
>
> i have been working with solr for over a year - but feel like i am missing
> the larger picture and want to know more.
>
> is the lucene in action book good and worth buying? it looks like it covers
> lucene 3.0 but may be 2 years old now.
>
> thx
> mark
Re: handling different scores related to queries
Call the IndexSearcher#explain method to get the technical details on
how any query is scored. Call Explanation#toString to get the English
description of the scoring.

Or, using Solr, add the &debugQuery=true parameter to your query request
and look at the "explain" section for the scoring calculations.

Some of these complex queries are "constant score" for performance
reasons.

-- Jack Krupansky

-----Original Message-----
From: sri krishna
Sent: Tuesday, November 27, 2012 12:38 PM
To: java-user
Subject: handling different scores related to queries

For a search string hello*~, how is the scoring calculated?

The formula given at
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/core/org/apache/lucene/search/Similarity.html
does not take edit distance (Levenshtein distance) or prefix-term
factors into account.

Does Lucene add up the scores obtained from each type of query included,
i.e. for the above query, actual score = default scoring + 1/(edit
distance) + prefix match score? If so, there is no normalization between
the scores. If not, what approach does Lucene follow, from separating
each query based on identifiers like ~ (edit distance) and * (prefix
query) through to the actual scoring?
Re: How does lucene handle the wildcard and fuzzy queries ?
The proper answer to all of these questions is the same and very simple:
if you want "internal" details, read the source code first. If you then
have specific questions, fine, ask specific questions - but only after
you've checked the code first.

Also, questions or issues related to "internals" aren't appropriate on
"user" lists.

-- Jack Krupansky

-----Original Message-----
From: sri krishna
Sent: Tuesday, November 27, 2012 12:36 PM
To: java-user@lucene.apache.org
Subject: How does lucene handle the wildcard and fuzzy queries ?

How does Lucene handle prefix (wildcard) queries and fuzzy queries
internally?

Lucene stores data in the form of an inverted index in segments, i.e.
term -> doc IDs. How does it search for a word in the term list
efficiently? And how does it handle the advanced queries on the same
inverted index?

Thanks
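
For readers who want a picture before diving into the source: conceptually, a prefix query can be answered by finding the contiguous range of matching terms in a sorted term dictionary and then merging their doc ID lists. The sketch below shows only that idea with a java.util.TreeMap; ToyTermIndex and its methods are hypothetical names, and Lucene's real term dictionary and multi-term query rewriting are considerably more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: sorted terms, each mapped to a sorted set of doc IDs. */
final class ToyTermIndex {
    private final TreeMap<String, TreeSet<Integer>> postings = new TreeMap<>();

    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
    }

    /** Terms starting with the prefix: one contiguous range of the sorted dictionary. */
    List<String> expandPrefix(String prefix) {
        // Assumes no indexed term contains U+FFFF, so every term with this
        // prefix sorts at or after prefix and before prefix + U+FFFF.
        SortedMap<String, TreeSet<Integer>> range =
                postings.subMap(prefix, prefix + Character.MAX_VALUE);
        return new ArrayList<>(range.keySet());
    }

    /** Union of the doc IDs of every term matching the prefix. */
    TreeSet<Integer> prefixQuery(String prefix) {
        TreeSet<Integer> docs = new TreeSet<>();
        for (String term : expandPrefix(prefix)) {
            docs.addAll(postings.get(term));
        }
        return docs;
    }
}
```

The point of the sorted dictionary is that locating the first matching term is a logarithmic lookup rather than a scan over every term, and only the matching terms' postings are read.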
Re: what is the offsets and payload in DocsAndPositionsEnum for ??
Hi,

We use payloads, but we can't use the whole Lucene API.
For example, we use them to run relation queries such as:

@quote(@speaker(obama) @discourse(health))

This searches for all documents that contain a quote by Obama talking
about health.
We encode linguistic information (standoff annotations) inside payloads
and use a custom search API to query the index.
I didn't find a convenient way to attach my code to the Lucene
Query/Scorer/Weight API. As with SpanQuery, you have to rewrite the whole
Query stack.
In short, if you want to go with Payloads that do more than boost a
term, chances are that you'll need to rewrite a big part of the query
stack.


On 27/11/2012 16:59, Wu, Stephen T., Ph.D. wrote:
> I think we're looking at doing something related. I haven't explored the
> Enums and don't know how to make a postings codec... But what is "flexible
> indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
>
> We're trying to incorporate attributes onto terms/spans in indexes. We'd
> also like to try out some interesting ways to score things that go beyond
> just tokens.
>
> We were considering using Attributes instead of Payloads, because it seems
> like using Payloads ties you to a particular kind of scoring -- just a
> weight on a token. Can Payloads be used for more general scoring functions?
> E.g., considering a span of text alongside multiple Payloads?
>
> Does it make sense to move outside of Payloads here?
>
> Thanks!
>
> stephen
>
>
> On 11/19/12 8:14 AM, "Michael McCandless" wrote:
>
>> A new postings format would be tricky because you have new attributes
>> you want to index.
>>
>> The DocsAndPositionsEnum does have an attributes source, but this is
>> not well explored, and there are known problems (they can't be easily
>> merged in the composite reader case).
>>
>> So that's why I suggested packing your information into a payload ...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy wrote:
>>> thx, mike.
>>> about the 3rd question, "encode them all into the payload" is better than
>>> "a new postings format with the codec" ??
>>> I mean replace the original posting item (position, startOffset, endOffset,
>>> payload) with my own inverted item such as
>>> class TestPostingItem
>>> {
>>> int termId;
>>> long startOffset;
>>> long endOffset;
>>> float score;
>>> int segId;
>>> long timeStamp;
>>> }
>>> ?

--
David Causse
Spotter
http://www.spotter.com/
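
The suggestion in this thread to pack extra per-position data into a payload amounts to encoding the fields into a fixed byte layout. The sketch below is an assumption-laden illustration: PositionInfo is a hypothetical class holding only the three fields of TestPostingItem that positions and offsets do not already cover, and the byte layout is an arbitrary choice. Attaching the bytes at index time and reading them back at search time goes through Lucene's payload APIs, which are not shown here.

```java
import java.nio.ByteBuffer;

/** Hypothetical per-position data, loosely following TestPostingItem above. */
final class PositionInfo {
    /** 4-byte float + 4-byte int + 8-byte long. */
    static final int ENCODED_LENGTH = Float.BYTES + Integer.BYTES + Long.BYTES;

    final float score;
    final int segId;
    final long timeStamp;

    PositionInfo(float score, int segId, long timeStamp) {
        this.score = score;
        this.segId = segId;
        this.timeStamp = timeStamp;
    }

    /** Encodes the fields, big-endian, in declaration order. */
    byte[] toPayload() {
        return ByteBuffer.allocate(ENCODED_LENGTH)
                .putFloat(score)
                .putInt(segId)
                .putLong(timeStamp)
                .array();
    }

    /** Decodes from a byte array slice; payload bytes may sit at an offset in a shared buffer. */
    static PositionInfo fromPayload(byte[] bytes, int offset) {
        ByteBuffer in = ByteBuffer.wrap(bytes, offset, ENCODED_LENGTH);
        float score = in.getFloat();
        int segId = in.getInt();
        long timeStamp = in.getLong();
        return new PositionInfo(score, segId, timeStamp);
    }
}
```

A fixed-width layout keeps decoding trivial; if index size matters, a variable-length encoding would be the usual next step, at the cost of a slightly more involved decoder.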
Re: info on how lucene conducts a search?
http://wiki.apache.org/lucene-java/LucenePapers

Many people have come to this list asking the same question, myself
included. Most answers are practical ones. But Lucene has so many
interesting ideas in it that it triggers everyone's academic curiosity,
quite apart from any practical results.

2012/11/27 geeky2

> Ian Lea wrote
> >
> > The question on cores might be better asked on the solr list, assuming
> > you are talking about Solr cores. But I bet the answer will be a
> > variant on either "it depends" or, my favourite, "whatever works for
> > you".
>
> yes - i am referring to solr cores.
>
> i was hoping to find a more academic explanation to a few of my questions.
> for example - is a lucene search done as a "full table scan" and therefore
> linear in performance or O(n)??
>
> knowing things like this - would help me make better core/index design
> decisions (along with other factors - of course).
>
> thx
> mark

--
Sincerely yours,
Apostolis Xekoukoulotakis
Re: info on how lucene conducts a search?
Ian Lea wrote
>
> The question on cores might be better asked on the solr list, assuming
> you are talking about Solr cores. But I bet the answer will be a
> variant on either "it depends" or, my favourite, "whatever works for
> you".

yes - i am referring to solr cores.

i was hoping to find a more academic explanation to a few of my questions.
for example - is a lucene search done as a "full table scan" and therefore
linear in performance or O(n)??

knowing things like this - would help me make better core/index design
decisions (along with other factors - of course).

thx
mark

--
View this message in context: http://lucene.472066.n3.nabble.com/info-on-how-lucene-conducsts-a-search-tp4022665p4022683.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
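
On the "full table scan" question, the general idea of an inverted index can be shown in a few lines: a term lookup goes straight to that term's list of doc IDs, and a two-term AND is a merge of two sorted lists, so the work tracks the number of matching postings rather than the number of documents. ToyInvertedIndex below is a hypothetical toy, not Lucene code, and it leaves out everything that makes the real thing interesting (scoring, segments, skip data, compression).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy inverted index: term -> ascending list of doc IDs. */
final class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexes whitespace-separated, lower-cased tokens and returns the new doc ID. */
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            List<Integer> list = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Doc IDs only grow, so lists stay sorted; skip repeats within one doc.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
        return docId;
    }

    /** Docs containing the term: a single dictionary lookup, no scan over documents. */
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    /** Docs containing both terms: a linear merge of the two sorted postings lists. */
    List<Integer> searchAnd(String a, String b) {
        List<Integer> left = search(a);
        List<Integer> right = search(b);
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int x = left.get(i);
            int y = right.get(j);
            if (x == y) {
                hits.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return hits;
    }
}
```

So a query for a rare term stays cheap however large the index grows, while a query for a very common term does work in proportion to how many documents contain it.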
Re: info on how lucene conducts a search?
hello,

thanks for the info.

as you suggested - i did do a general search and found this slide
presentation - which had some good general info. i am not sure what the
source of this preso is, how qualified the author is (although he/she
seems very good) or how current the information is.

http://www.slideshare.net/nitin_stephens/lucene-basics#btnNext

i have been working with solr for over a year - but feel like i am missing
the larger picture and want to know more.

is the lucene in action book good and worth buying? it looks like it covers
lucene 3.0 but may be 2 years old now.

thx
mark

--
View this message in context: http://lucene.472066.n3.nabble.com/info-on-how-lucene-conducsts-a-search-tp4022665p4022676.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: info on how lucene conducts a search?
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/package-summary.html#package_description
might help. Or Google something like "how does lucene work".

The question on cores might be better asked on the solr list, assuming
you are talking about Solr cores. But I bet the answer will be a
variant on either "it depends" or, my favourite, "whatever works for
you".

--
Ian.

On Tue, Nov 27, 2012 at 3:55 PM, geeky2 wrote:
> Hello all,
>
> can someone point me to info or docs on how a lucene search is conducted?
>
> i would like to have a better understanding of how this works in general -
> but also from a design perspective.
>
> for instance - a question that keeps coming up is, should we add content to
> a given core - or break it out in to another core - for performance reasons.
>
> thx
> mark
Re: what is the offsets and payload in DocsAndPositionsEnum for ??
I think we're looking at doing something related. I haven't explored the
Enums and don't know how to make a postings codec... But what is "flexible
indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

We're trying to incorporate attributes onto terms/spans in indexes. We'd
also like to try out some interesting ways to score things that go beyond
just tokens.

We were considering using Attributes instead of Payloads, because it seems
like using Payloads ties you to a particular kind of scoring -- just a
weight on a token. Can Payloads be used for more general scoring functions?
E.g., considering a span of text alongside multiple Payloads?

Does it make sense to move outside of Payloads here?

Thanks!

stephen


On 11/19/12 8:14 AM, "Michael McCandless" wrote:

> A new postings format would be tricky because you have new attributes
> you want to index.
>
> The DocsAndPositionsEnum does have an attributes source, but this is
> not well explored, and there are known problems (they can't be easily
> merged in the composite reader case).
>
> So that's why I suggested packing your information into a payload ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy wrote:
>> thx, mike.
>> about the 3rd question, "encode them all into the payload" is better than
>> "a new postings format with the codec" ??
>> I mean replace the original posting item (position, startOffset, endOffset,
>> payload) with my own inverted item such as
>> class TestPostingItem
>> {
>> int termId;
>> long startOffset;
>> long endOffset;
>> float score;
>> int segId;
>> long timeStamp;
>> }
>> ?
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/what-is-the-offsets-and-payload-in-DocsAndPositionsEnum-for-tp4020933p4020968.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
info on how lucene conducts a search?
Hello all,

can someone point me to info or docs on how a lucene search is conducted?

i would like to have a better understanding of how this works in general -
but also from a design perspective.

for instance - a question that keeps coming up is, should we add content to
a given core - or break it out in to another core - for performance reasons.

thx
mark

--
View this message in context: http://lucene.472066.n3.nabble.com/info-on-how-lucene-conducsts-a-search-tp4022665.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: Does anyone have tips on managing cached filters?
On Tue, Nov 27, 2012 at 6:17 AM, Trejkaz wrote:

> Ah, yeah... I should have been clearer on what I meant there.
>
> If you want to make a filter which relies on data that isn't in the
> index, there is no mechanism for invalidation. One example of it is if
> you have a filter which essentially constructs a query based on the
> contents of a text file (like a word list.) Another example is with
> tagging, with the tags stored in an external database.

I don't understand how a filter could become invalid even though the
reader has not changed. If this is the case in your design, then you
have much bigger problems.
Re: Does anyone have tips on managing cached filters?
On Tue, Nov 27, 2012 at 9:31 AM, Robert Muir wrote:
> On Thu, Nov 22, 2012 at 11:10 PM, Trejkaz wrote:
>
>> As for actually doing the invalidation, CachingWrapperFilter itself
>> doesn't appear to have any mechanism for invalidation at all, so I
>> imagine I will be building a variation of it with additional methods
>> to invalidate parts of the cache.
>
> Actually it does, it uses a WeakHashMap keyed on either the segment
> (core+deletes) or just the segment's core.

Ah, yeah... I should have been clearer on what I meant there.

If you want to make a filter which relies on data that isn't in the
index, there is no mechanism for invalidation. One example of it is if
you have a filter which essentially constructs a query based on the
contents of a text file (like a word list.) Another example is with
tagging, with the tags stored in an external database.

At the moment we use a separate level of filter cache which asks the
contained filter whether it's still OK to use (if the timestamp on the
file changes, it gets ejected from the cache.) I suspect the same
cache is useful anyway, as it also holds onto the filter instances so
that they don't get collected too soon (filters can come out of our
query parser, so the caller can't conveniently hold onto the instances
in all cases. Sometimes they do two similar queries which happen to
call the same filter, so caching the entire resulting query doesn't
help either.)

An interesting, somewhat-related issue is that for some filters, we
can't keep the contents of the file itself in memory due to size
limits, so we have to read it on the fly. When there are multiple
segments, the file gets read multiple times. So it's a rare case where
computing the filter across all readers might actually come out faster
than computing it per-segment...

TX
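
The "separate level of filter cache" described above, which ejects an entry when the backing file's timestamp changes, can be sketched independently of Lucene. Everything in the block below is an assumption for illustration: StampedCache is a hypothetical name, the stamp supplier stands in for something like a file's last-modified time or a database version counter, and the cached value stands in for a filter instance. It is not the poster's implementation and not CachingWrapperFilter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Caches one value per key and rebuilds it when an external version stamp changes. */
final class StampedCache<K, V> {
    private static final class Entry<T> {
        final long stamp;
        final T value;

        Entry(long stamp, T value) {
            this.stamp = stamp;
            this.value = value;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final LongSupplier currentStamp; // e.g. a file's last-modified time (assumption)
    private final Function<K, V> loader;     // builds the value, e.g. a filter (assumption)

    StampedCache(LongSupplier currentStamp, Function<K, V> loader) {
        this.currentStamp = currentStamp;
        this.loader = loader;
    }

    /** Returns the cached value, reloading it first if the external stamp has moved. */
    synchronized V get(K key) {
        long now = currentStamp.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached == null || cached.stamp != now) {
            cached = new Entry<>(now, loader.apply(key));
            entries.put(key, cached);
        }
        return cached.value;
    }
}
```

This holds strong references on purpose, matching the remark that the outer cache also keeps filter instances from being collected too soon; per-segment results would still be cached one level down, keyed on the reader.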
Re: sort by field and score
What are you getting for the scores? If it's NaN I think you'll need
to use a TopFieldCollector. See for example
http://www.gossamer-threads.com/lists/lucene/java-user/86309

--
Ian.

On Tue, Nov 27, 2012 at 3:51 AM, Andy Yu wrote:
> Hi All,
>
> Now I want to sort by a field and by relevance.
> For example:
>
> SortField sortField[] = {new SortField("id", new
>     CustomComparatorSource(bitSet)), SortField.FIELD_SCORE};
> Sort sort = new Sort(sortField);
> TopDocs topDocs = indexSearcher.search(query, 10, sort);
>
> if (0 < topDocs.totalHits) {
>     for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>         System.out.println(indexSearcher.doc(scoreDoc.doc).get("id"));
>         System.out.println("score is " + scoreDoc.score);
>         System.out.println(indexSearcher.doc(scoreDoc.doc).get("name"));
>     }
> }
>
> I found that the search results are sorted only by
> [new SortField("id", new CustomComparatorSource(bitSet))];
> [SortField.FIELD_SCORE] does not work at all.
>
> PS: my lucene version is 3.6
>
> Does anybody know the reason or how to solve it?
>
> Thanks,
> Andy
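
Separately from the NaN question, it may help to recall how a multi-key sort behaves in general: the second key is consulted only when two hits compare equal on the first. The plain-Java sketch below illustrates that; Hit and HitSorting are hypothetical names and no Lucene classes are involved. Whether this explains the behaviour reported above depends on what CustomComparatorSource returns, which the thread does not show, so treat it as one possible reading only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical search hit with a sortable field value and a relevance score. */
final class Hit {
    final int id;
    final float score;

    Hit(int id, float score) {
        this.id = id;
        this.score = score;
    }
}

final class HitSorting {
    /** Primary key: id ascending. Secondary key: score descending, used only on equal ids. */
    static final Comparator<Hit> BY_ID_THEN_SCORE = (a, b) -> {
        int byId = Integer.compare(a.id, b.id);
        return byId != 0 ? byId : Float.compare(b.score, a.score);
    };

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_ID_THEN_SCORE);
        return copy;
    }
}
```

If the primary comparator never reports a tie, for instance because every document has a distinct "id", the score key has nothing to decide and the order will look as if it were ignored.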