Duplicate Hits
Is there a way to eliminate duplicate hits being returned from the index? Jerry Jalenak Senior Programmer / Analyst, Web Publishing LabOne, Inc. 10101 Renner Blvd. Lenexa, KS 66219 (913) 577-1496 [EMAIL PROTECTED] This transmission (and any information attached to it) may be confidential and is intended solely for the use of the individual or entity to which it is addressed. If you are not the intended recipient or the person responsible for delivering the transmission to the intended recipient, be advised that you have received this transmission in error and that any use, dissemination, forwarding, printing, or copying of this information is strictly prohibited. If you have received this transmission in error, please immediately notify LabOne at the following email address: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: Duplicate Hits
OK, OK. I should have seen that response coming 8-)

The documents I'm indexing are sent from a legacy system and can be sent multiple times, but I only want to keep a document if something has changed. If the indexed fields match exactly, I don't want to index the second (or third, fourth, etc.) document. If the indexed fields have changed, then I want to index the 'new' document and keep it.

Given Erik's response of 'don't put duplicate documents in the index', how can I accomplish this in the IndexWriter?

Jerry Jalenak

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 01, 2005 8:35 AM
To: Lucene Users List
Subject: Re: Duplicate Hits

On Feb 1, 2005, at 9:01 AM, Jerry Jalenak wrote:
> Is there a way to eliminate duplicate hits being returned from the index?

Sure, don't put duplicate documents in the index :)

Erik
RE: Duplicate Hits
Nice idea, John - one I hadn't considered. Once you have the checksum, do you 'check' in the index first before storing the second document? Or do you filter on the query side?

Jerry Jalenak

-----Original Message-----
From: John Haxby [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 01, 2005 9:06 AM
To: Lucene Users List
Subject: Re: Duplicate Hits

Jerry Jalenak wrote:
> Given Erik's response of 'don't put duplicate documents in the index', how can I accomplish this in the IndexWriter?

I was dealing with a similar requirement recently. I eventually decided on storing the MD5 checksum of the document as a keyword. It means reading it twice (once to calculate the checksum, once to index it), but it seems to do the trick.

jch
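John's checksum idea can be sketched in a few lines of Java. This is only an illustration, not code from the thread: the class name `DocumentChecksum`, the method `md5Hex`, and the sample document string are assumptions, and it uses nothing beyond the JDK's `java.security.MessageDigest`. The resulting hex string is what would be stored as the keyword field and looked up before indexing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: turns a document's bytes into a value usable as a keyword field.
class DocumentChecksum {

    /** Returns the MD5 digest of the given bytes as 32 lowercase hex characters. */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up document content, for illustration only.
        byte[] doc = "account=0011|lastname=MARTIN".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(doc));
    }
}
```

Hashing only the fields that get indexed, rather than the whole file, would match Jerry's rule that a document counts as a duplicate when the indexed fields match exactly.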
RE: Duplicate Hits
Just to make sure I understand: do you keep an IndexReader open at the same time you are running the IndexWriter? From what I can see in the JavaDocs, it looks like only IndexReader (or IndexSearcher) can peek into the index and see whether a document exists.

Thanks!

Jerry Jalenak

-----Original Message-----
From: John Haxby [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 01, 2005 9:39 AM
To: Lucene Users List
Subject: Re: Duplicate Hits

Jerry Jalenak wrote:
> Nice idea John - one I hadn't considered. Once you have the checksum, do you 'check' in the index first before storing the second document? Or do you filter on the query side?

I do a quick search for the MD5 checksum before indexing.

Although I suspect it is not applicable in your case, I also maintained a "last time something was indexed" timestamp alongside the index. I used this to drastically prune the number of documents that needed to be considered for indexing if I restarted; anything modified before then wasn't a candidate. Since the MD5 checksum provides the definitive (for a sufficiently loose definition of definitive) indication of whether a document is indexed, I didn't need to worry about ultra-fine granularity in the timestamp, and I didn't need to worry about it being committed to disk; it generally got committed to the magnetic stuff every few seconds or so.

It does help a lot, though, if documents have nice unique identifiers that you can use instead; then you can use the identifier and the last-modified time to decide whether or not to re-index.

jch
RE: Duplicate Hits
OK - but I'm dealing with indexing between 1.5 and 2 million documents, so I really don't want to 'batch' them up if I can avoid it. I also don't think I can keep an IndexReader open on the index at the same time I have an IndexWriter open. I may have to try to deal with this issue through some sort of filter on the query side, provided it doesn't impact performance too much.

Thanks.

Jerry Jalenak

-----Original Message-----
From: John Haxby [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 01, 2005 9:48 AM
To: Lucene Users List
Subject: Re: Duplicate Hits

Jerry Jalenak wrote:
> Just to make sure I understand: do you keep an IndexReader open at the same time you are running the IndexWriter? From what I can see in the JavaDocs, it looks like only IndexReader (or IndexSearcher) can peek into the index and see whether a document exists.

I slightly misled you: it wasn't Lucene that I was using at the time, and in that system the distinction between IndexReader and IndexWriter didn't exist. I'm just getting to grips with Lucene, really, but it would seem possible to use a similar scheme, especially if you batch up your documents for indexing: as they come in, check the MD5 checksum against what's already known and what's already queued, and then when the time comes to process the queue you know that what you've got needs to be indexed.

jch
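The batching scheme John describes (check each incoming checksum against what is already known and what is already queued) might look roughly like the sketch below. It is an assumption-laden illustration rather than anything posted to the list: `IndexingQueue`, `offer`, and `drain` are made-up names, the set of known checksums is assumed to have been loaded from the index beforehand, and documents are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory queue that drops documents whose checksum has been seen.
class IndexingQueue {
    private final Set<String> knownChecksums = new HashSet<>();   // already in the index
    private final Set<String> queuedChecksums = new HashSet<>();  // waiting in the current batch
    private final List<String> queuedDocuments = new ArrayList<>();

    IndexingQueue(Set<String> alreadyIndexed) {
        knownChecksums.addAll(alreadyIndexed);
    }

    /** Queues the document unless its checksum is already indexed or already queued. */
    boolean offer(String checksum, String document) {
        if (knownChecksums.contains(checksum) || !queuedChecksums.add(checksum)) {
            return false;
        }
        queuedDocuments.add(document);
        return true;
    }

    /** Returns the batch to index and remembers its checksums as known. */
    List<String> drain() {
        List<String> batch = new ArrayList<>(queuedDocuments);
        knownChecksums.addAll(queuedChecksums);
        queuedChecksums.clear();
        queuedDocuments.clear();
        return batch;
    }
}
```

Everything returned by `drain()` is new by construction, so the batch can be written with an IndexWriter without consulting a reader in the middle of the run.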
Index Layout Question
I am in the process of indexing about 1.5 million documents, and have started down the path of indexing these by month. Each month has between 100,000 and 200,000 documents. From a performance standpoint, is this the right approach? This allows me to use MultiSearcher (or ParallelMultiSearcher), but I'm not sure if the performance gains are really there. Would one monolithic index be better?

Thanks.

Jerry Jalenak
RE: Index Layout Question
That's good to know. I'm indexing on 11 fields (9 keyword, 2 text). The documents themselves are between 1K and 2K in size. Is there a point at which IndexSearcher performance begins to fall off (in terms of the number of index records)?

Jerry Jalenak

-----Original Message-----
From: Ian Soboroff [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 27, 2005 10:31 AM
To: Lucene Users List
Subject: Re: Index Layout Question

Jerry Jalenak [EMAIL PROTECTED] writes:
> I am in the process of indexing about 1.5 million documents, and have started down the path of indexing these by month. Each month has between 100,000 and 200,000 documents. From a performance standpoint, is this the right approach? This allows me to use MultiSearcher (or ParallelMultiSearcher), but I'm not sure if the performance gains are really there. Would one monolithic index be better?

Depends on your search infrastructure. Doug Cutting has sent out some basic optimization guidelines on this list which should be in the archives... simply, you need to think about how many CPUs and spindles are involved.

1.5m documents isn't a challenge for Lucene to index or search on a single machine with a monolithic index. I indexed about 1.6m web pages in 22 hours on a single machine with all data local, and search with a single IndexSearcher was instantaneous.

We've also done some testing with a larger collection (25m pages) and ParallelMultiSearchers on several machines, and likewise on a fast network haven't felt a slowdown, but we haven't actually benchmarked it.

Ian
[HOWTO] Setting BooleanQuery MaxClauseCount
Is there a way to set the maxClauseCount field of BooleanQuery when using QueryParser?

Jerry Jalenak
RE: [HOWTO] Setting BooleanQuery MaxClauseCount
Never mind.

<disclaimer>These types of questions are what occur when one is trying to do too many things at the same time.</disclaimer>

Jerry Jalenak

-----Original Message-----
From: Jerry Jalenak [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 26, 2005 10:19 AM
To: 'lucene-user@jakarta.apache.org'
Subject: [HOWTO] Setting BooleanQuery MaxClauseCount

> Is there a way to set the maxClauseCount field of BooleanQuery when using QueryParser?
RE: Filtering w/ Multiple Terms
I spent some time reading the Lucene in Action book this weekend (great job, btw), and came across the section on using custom filters. Since the data I need to filter my hit set with comes from a database, I thought it would be worth my effort this morning to write a custom filter that would handle the filtering for me. So, using the example from the book (page 210), I've coded an AccountFilter:

    public class AccountFilter extends Filter {
        public AccountFilter() {}

        public BitSet bits(IndexReader indexReader) throws IOException {
            System.out.println("Entering AccountFilter...");
            BitSet bitSet = new BitSet(indexReader.maxDoc());
            String[] reportingAccounts = new String[] {"0011", "4kfs"};
            int[] docs = new int[1];
            int[] freqs = new int[1];
            for (int i = 0; i < reportingAccounts.length; i++) {
                String reportingAccount = reportingAccounts[i];
                if (reportingAccount != null) {
                    TermDocs termDocs = indexReader.termDocs(new Term("account", reportingAccount));
                    int count = termDocs.read(docs, freqs);
                    if (count == 1) {
                        System.out.println("Setting bit on");
                        bitSet.set(docs[0]);
                    }
                }
            }
            System.out.println("Leaving AccountFilter...");
            return bitSet;
        }
    }

I see where the AccountFilter is setting the corresponding 'bits', but I end up without any 'hits':

    Entering AccountFilter...
    Entering AccountFilter...
    Entering AccountFilter...
    Setting bit on
    Setting bit on
    Setting bit on
    Setting bit on
    Setting bit on
    Leaving AccountFilter...
    Leaving AccountFilter...
    Leaving AccountFilter...
    ... Found 0 matching documents in 1000 ms

Can anyone tell me what I've done wrong?

Jerry Jalenak

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 21, 2005 8:15 AM
To: Lucene Users List
Subject: RE: Filtering w/ Multiple Terms

This: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.TooManyClauses.html ?

You can control that limit via http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.html#maxClauseCount

Otis

--- Jerry Jalenak [EMAIL PROTECTED] wrote:
> OK. But isn't there a limit on the number of BooleanQueries that can be combined with AND / OR / etc?
RE: Filtering w/ Multiple Terms
Paul / Erik -

I'm using the ParallelMultiSearcher to search three indexes concurrently - hence the three entries into AccountFilter. If I remove the filter from my query and simply enter the query on the command line, I get two hits back. In other words, I can enter this:

    smith AND (account:0011)

and get hits back. When I add the filter back in (which should take care of the account:0011 part of the query) and enter only smith as my query, I get 0 hits.

Jerry Jalenak

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 24, 2005 1:07 PM
To: Lucene Users List
Subject: Re: Filtering w/ Multiple Terms

On Jan 24, 2005, at 12:26 PM, Jerry Jalenak wrote:
> I spent some time reading the Lucene in Action book this weekend (great job, btw)

Thanks!

> public class AccountFilter extends Filter
>
> I see where the AccountFilter is setting the corresponding 'bits', but I end up without any 'hits':
>
> Entering AccountFilter... Entering AccountFilter... Entering AccountFilter...
> Setting bit on Setting bit on Setting bit on Setting bit on Setting bit on
> Leaving AccountFilter... Leaving AccountFilter... Leaving AccountFilter...
> ... Found 0 matching documents in 1000 ms
>
> Can anyone tell me what I've done wrong?

A filter constrains which documents will be consulted during a search, but the Query needs to match some documents that are turned on by the filter bits. I'm guessing that your Query did not match any of the documents you turned on.

Erik
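Erik's point, that a hit must both match the query and have its bit set by the filter, amounts to a set intersection. The sketch below shows that with plain `java.util.BitSet`; it does not use Lucene at all, and the class name, method name, and document numbers are invented for illustration.

```java
import java.util.BitSet;

// Hypothetical illustration: hits = documents matching the query AND allowed by the filter.
class FilterIntersection {

    /** Returns the documents that are both matched by the query and enabled by the filter. */
    static BitSet visibleHits(BitSet queryMatches, BitSet filterBits) {
        BitSet hits = (BitSet) queryMatches.clone();
        hits.and(filterBits);
        return hits;
    }

    public static void main(String[] args) {
        BitSet queryMatches = new BitSet();
        queryMatches.set(3);
        queryMatches.set(17);

        BitSet filterBits = new BitSet();
        filterBits.set(5); // the filter enabled a document, but not one the query matches

        // No overlap between the two sets, so nothing is returned.
        System.out.println(visibleHits(queryMatches, filterBits).cardinality());
    }
}
```

If a filter turns on too few bits, the query can match plenty of documents and the search will still come back empty, which is consistent with the symptom reported above.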
RE: Filtering w/ Multiple Terms
<sheepish-look-on-face/>

After re-reading the book (again), and the javadocs (again), it dawned on my little brain that I needed to have a doc and freq array *the size of maxDoc()* for the index reader. I also needed to iterate through the docs array and call bitSet.set for each entry in docs (that was valid, of course).

Everything is good now. Thanks!

Jerry Jalenak

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 24, 2005 1:27 PM
To: Lucene Users List
Subject: Re: Filtering w/ Multiple Terms

As Paul suggested, output the Lucene document numbers from your Hits, and also output which bit you're setting in your filter. Do those sets overlap?

Erik

On Jan 24, 2005, at 2:13 PM, Jerry Jalenak wrote:
> Paul / Erik - I'm using the ParallelMultiSearcher to search three indexes concurrently - hence the three entries into AccountFilter. If I remove the filter from my query and simply enter the query on the command line, I get two hits back. In other words, I can enter this: smith AND (account:0011) and get hits back. When I add the filter back in (which should take care of the account:0011 part of the query) and enter only smith as my query, I get 0 hits.
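The difference between the original filter (one-element buffers, a single read) and the working one (a bit set for every document returned) can be shown without Lucene. In the sketch below, `FakeTermDocs` is a hypothetical in-memory stand-in whose `read(docs, freqs)` is assumed to behave like a bulk read that fills the arrays and returns the number of entries written; it is not the real `TermDocs` class. Jerry's fix sized the arrays to the reader's maxDoc; looping until the read returns 0, as done here, is an alternative that works with a small fixed buffer.

```java
import java.util.BitSet;

class PostingsDemo {

    /** Hypothetical stand-in for a term's postings list; not a Lucene class. */
    static final class FakeTermDocs {
        private final int[] docIds;
        private int pos = 0;

        FakeTermDocs(int... docIds) {
            this.docIds = docIds;
        }

        /** Fills docs/freqs with up to docs.length entries; returns the count written (0 when exhausted). */
        int read(int[] docs, int[] freqs) {
            int n = Math.min(docs.length, docIds.length - pos);
            for (int i = 0; i < n; i++) {
                docs[i] = docIds[pos++];
                freqs[i] = 1;
            }
            return n;
        }
    }

    /** The pattern from the original post: only the first matching document gets a bit. */
    static BitSet firstDocOnly(FakeTermDocs termDocs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        int[] docs = new int[1];
        int[] freqs = new int[1];
        if (termDocs.read(docs, freqs) == 1) {
            bits.set(docs[0]);
        }
        return bits;
    }

    /** Keeps reading until the postings are exhausted, setting a bit for every document. */
    static BitSet allDocs(FakeTermDocs termDocs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        int[] docs = new int[32];
        int[] freqs = new int[32];
        int count;
        while ((count = termDocs.read(docs, freqs)) > 0) {
            for (int i = 0; i < count; i++) {
                bits.set(docs[i]);
            }
        }
        return bits;
    }
}
```

With four documents under one account, `firstDocOnly` enables a single bit while `allDocs` enables all four, which would explain how a filter can print "Setting bit on" and still leave the documents matching the query switched off.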
RE: Filtering w/ Multiple Terms
OK. But isn't there a limit on the number of BooleanQueries that can be combined with AND / OR / etc?

Jerry Jalenak

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 20, 2005 5:05 PM
To: Lucene Users List
Subject: Re: Filtering w/ Multiple Terms

On Jan 20, 2005, at 5:02 PM, Jerry Jalenak wrote:
> In looking at the examples for filtering of hits, it looks like I can only specify a single term; i.e.
>
>     Filter f = new QueryFilter(new TermQuery(new Term("acct", "acct1")));
>
> I need to specify more than one term in my filter. Short of using something like ChainFilter, how are others handling this?

You can make as complex a Query as you want for QueryFilter. If you want to filter on multiple terms, construct a BooleanQuery with nested TermQuerys, either in an AND or OR fashion.

Erik
Filtering w/ Multiple Terms
In looking at the examples for filtering of hits, it looks like I can only specify a single term; i.e.

    Filter f = new QueryFilter(new TermQuery(new Term("acct", "acct1")));

I need to specify more than one term in my filter. Short of using something like ChainFilter, how are others handling this?

Thanks!

Jerry Jalenak
[newbie] Confused about PrefixQuery
    ) + ", DOB = " + document.get("dob") + ", Collected = " + document.get("collected") + ", Created = " + document.get("created"));
                    //System.out.println(document.get("content"));
                }
            }
        }
    } catch (Exception e) {
        System.out.println(e.getClass() + " caught with message " + e.getMessage());
    }
    }
</snip>

When I run this using a criteria string of lastname:mar* I get back the following:

    Query: lastname:mar*
    Searching for: lastname:mar*
    ... Found 9 matching documents
    Hit 0: Specimen = 40062720, Account = 0001, Status = N, Name = LOIS MARTIN, SSN = 536628498, DOB = 19010101, Collected = 20050118, Created = 20050119
    Hit 1: Specimen = 38843845, Account = 4NEK, Status = N, Name = RENEE CAPPETTA, SSN = 585132901, DOB = 19010101, Collected = 20050117, Created = 20050119
    Hit 2: Specimen = 39894441, Account = 3384, Status = N, Name = LINDA CANTU, SSN = 453539817, DOB = 19010101, Collected = 20050118, Created = 20050119
    Hit 3: Specimen = 39894441, Account = 3384, Status = N, Name = LINDA CANTU, SSN = 453539817, DOB = 19010101, Collected = 20050118, Created = 20050119
    Hit 4: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119
    Hit 5: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119
    Hit 6: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119
    Hit 7: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119
    Hit 8: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119

I'm at a loss to explain why I'm getting hits 1 - 8 - the last names don't start with mar!
I suspect it is due to an incorrect use of Field.Keyword vs Field.Text in the indexer, but I can't seem to figure it out...

Thanks.

Jerry Jalenak
RE: [newbie] Confused about PrefixQuery
Erik,

Thanks for the reply. Some lists want all the info, some don't. Just thought I'd try to provide as much info as possible 8-)

That being said, where do I find Luke?

Jerry Jalenak

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 19, 2005 2:42 PM
To: Lucene Users List
Subject: Re: [newbie] Confused about PrefixQuery

On Jan 19, 2005, at 3:16 PM, Jerry Jalenak wrote:
> The text files have two control lines at the beginning of them - CC and AN.

That's quite a complex example to ask a user list to decipher. Simplifying the example, besides making it easier for us to understand, would likely shed light on the problem.

> Everything (I think) indexes correctly.

To be sure, try Luke out and see what got indexed exactly. You can also use Luke as an ad-hoc search tool rather than writing your own.

> When I search against this index, though, I get some weird results, especially when using an '*' at the end of my criteria.

The results you got definitely are weird given the query, and in my initial glance through your code I did not see the issue pop out. Luke will likely shed much more light on the matter.

Erik
RE: [newbie] Confused about PrefixQuery
Oops. Never mind. Stupid, stupid assumption on my part with the data. Thanks anyway.

Jerry Jalenak
RE: [newbie] Confused about PrefixQuery
Sorry. Thought Luke came bundled with Lucene, and I was just missing it.

Jerry Jalenak

-----Original Message-----
From: Erik Hatcher
Sent: Wednesday, January 19, 2005 3:28 PM
To: Lucene Users List
Subject: Re: [newbie] Confused about PrefixQuery

On Jan 19, 2005, at 4:12 PM, Jerry Jalenak wrote:
> Thanks for reply. Some lists want all the info, some don't. Just thought I'd try to provide as much info as possible 8-)

The info is good... I just push for simple examples :) By simplifying, often the problem becomes apparent and trivial.

> That being said, where do I find Luke?

Silly response, but go to Google, type in _luke lucene_ and press I'm feeling lucky :) But, since I already have the URL handy, here it is: http://www.getopt.org/luke/

Erik