Re: Database
Parminder Singh wrote:
> I've a CMS application that deploys metadata to a database. Is it possible to use Lucene to search this database instead of its (Lucene's) index? If you could tell me the steps that would be involved in doing this, it'd be a great help. I'm new to Lucene.

I've done this extensively. Basically, you create documents out of the database; in my case I also generated a URL for each document, which was returned to users after a query. This is one of those things where Lucene stands apart from what seemed to be the alternatives a few years ago (htdig was one thing I used): it doesn't have to spider a web site. If you have a dynamic web site (pages generated from db queries), indexing is also more efficient, because you don't have to parse HTML to extract what may or may not be the actual text. The db holds your exact content, so you index from the db, not from web pages.

> Thank You.
> Parminder Singh
>
> In war: resolution. In defeat: defiance. In victory: magnanimity. In peace: goodwill.
> - Sir Winston Leonard Spencer Churchill
>
> Disclaimer: This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message and are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited.
> Visit us at http://www.mahindrabt.com

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
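As a rough illustration of the "create documents out of the database" idea, here is a minimal sketch in plain Java. It deliberately does not use Lucene's API: rows are modelled as column-name/value maps (in practice they would come from a JDBC query), the `RowIndexSketch` class is a hypothetical stand-in for a real index, and the URL scheme in `urlFor` is an assumption made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Stand-in sketch, NOT Lucene's API: each database row becomes one "document"
// identified by a generated URL, and a tiny inverted index maps terms to URLs.
final class RowIndexSketch {
    // term -> URLs of the rows containing that term (sorted for stable output)
    private final Map<String, TreeSet<String>> postings = new TreeMap<>();

    // Hypothetical URL scheme; a real site would use its own.
    static String urlFor(String id) {
        return "/content/view?id=" + id;
    }

    // Index one row: every non-key column is lower-cased and split into terms.
    void addRow(Map<String, String> row) {
        String url = urlFor(row.get("id"));
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (column.getKey().equals("id") || column.getValue() == null) {
                continue; // the key column only identifies the row
            }
            String text = column.getValue().toLowerCase(Locale.ROOT);
            for (String token : text.split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, t -> new TreeSet<>()).add(url);
                }
            }
        }
    }

    // Return the URLs of all rows that contain the given term.
    List<String> search(String term) {
        TreeSet<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        RowIndexSketch index = new RowIndexSketch();
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "42");
        row.put("title", "Release notes");
        row.put("body", "Metadata deployed by the CMS");
        index.addRow(row);
        System.out.println(index.search("metadata"));
    }
}
```

With Lucene itself the same shape applies: one Document per row, one field per column, plus a stored field carrying the URL to hand back with each hit.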
Re: Iterating TermEnum backwards
Matt Quail wrote:
> Is there any way to iterate through a TermEnum backwards? Okay, I know that there isn't a way to do this via the TermEnum class, but is it implementable on top of the underlying Lucene datastore?

Not really. The best you can do is skip back to the previous indexed term in TermInfosReader.indexTerms and scan forward from there. You could try adding a method to that class like:

    final synchronized void seekBefore(Term term) throws IOException {
      int offset = getIndexOffset(term);
      seekEnum(offset > 0 ? offset - 1 : offset);
    }

Then you'd need to add stuff to MultiReader, SegmentReader and IndexReader to take advantage of this. It could get a little tricky, but it is possible. I'm not convinced this is your best route.

> My particular problem is this: I have an index of documents, each document has a date field (I'm using DateField). Most documents have a different date, so the number of unique dates is close to the number of documents.

Are you adding documents in date order? If so, then you could look at the date of the document numbered maxDoc() - N and scan forward from there. To be safe, you could start at maxDoc() - N*2 or something.

> I want to find the top N most recent dates, but I don't want to have to iterate through ALL of them first. NB: With DateField, the earlier dates are lexicographically smaller. (I also want to find the most recent N less than some date D.) I know I could invert my dates (something like MAX_LONG - date) to get the REVERSE order, but I want to be able to do least recent and most recent.

Why not have two date fields, one inverted and one not?

> PS: my current solution is to do a binary search between MIN and MAX, halving my search space until I find close to N matching documents.

That doesn't sound like a bad solution.

Doug
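The inverted-date suggestion can be sketched as follows. This is a plain-Java illustration under stated assumptions, not Lucene code: a `TreeSet<String>` stands in for the sorted term dictionary, timestamps are non-negative milliseconds, and the zero-padded decimal encoding in `invertedKey` is a made-up stand-in for whatever DateField actually writes. The point is only that once the key is `Long.MAX_VALUE - date`, a forward scan visits the most recent dates first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Sketch of the "second, inverted date field" idea. The TreeSet stands in for
// a sorted term dictionary that can only be scanned forwards.
final class InvertedDateSketch {
    // Fixed-width, zero-padded key: lexicographic order equals numeric order.
    // Assumes millis >= 0. Larger (more recent) dates get SMALLER keys.
    static String invertedKey(long millis) {
        return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - millis);
    }

    static long fromInvertedKey(String key) {
        return Long.MAX_VALUE - Long.parseLong(key);
    }

    // The n most recent dates strictly earlier than `before`, newest first,
    // found with a forward-only scan over the inverted keys.
    static List<Long> mostRecent(TreeSet<String> invertedTerms, long before, int n) {
        List<Long> out = new ArrayList<>();
        // Keys strictly greater than invertedKey(before) are dates < before.
        for (String key : invertedTerms.tailSet(invertedKey(before), false)) {
            if (out.size() == n) {
                break;
            }
            out.add(fromInvertedKey(key));
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        for (long date : new long[] {100L, 200L, 300L, 400L}) {
            terms.add(invertedKey(date));
        }
        System.out.println(mostRecent(terms, Long.MAX_VALUE, 2)); // newest overall
        System.out.println(mostRecent(terms, 300L, 2));           // newest before 300
    }
}
```

Keeping the ordinary date field alongside the inverted one covers the "least recent" case with the same forward scan, at the cost of one extra term per document.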
RE: Lucene scalability/clustering
Anson,

One way of doing it is to keep subsets of your indexes / data on different machines. Each machine indexes its own data. You implement a system that distributes queries to the various machines and merges the results back. How well it works depends entirely on your implementation of the distributed search. I believe there was also a discussion somewhere about implementing this using a MultiSearcher.

Cheers!
Jochen

-Original Message-
From: Anson Lau [mailto:[EMAIL PROTECTED]
Sent: Sunday, February 22, 2004 2:17 PM
To: 'Lucene Users List'
Subject: RE: Lucene scalability/clustering

Further on this topic - has anyone tried implementing a distributed search with Lucene? How does it work and does it work well?

Anson

-Original Message-
From: Hamish Carpenter [mailto:[EMAIL PROTECTED]
Sent: Monday, February 23, 2004 5:24 AM
To: Lucene Users List
Subject: Re: Lucene scalability/clustering

Hi All,

I'm Hamish Carpenter, who contributed the benchmarks with the comment about the IndexSearcherCache. Using this solved our issues with too many open files under Linux.

The original IndexSearcherCache email is here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg01967.html

See here for a copy of the above message and a download link: http://www.geocities.com/haytona/lucene/

The mailing list doesn't like attachments. The source is 10K in size.

HTH
Hamish Carpenter.

[EMAIL PROTECTED] wrote:
> BTW, where can I get Peter Halacsy's IndexSearcherCache?
RE: Lucene scalability/clustering
I tend to think of scaling in two dimensions: scaling by volume of users and scaling by volume of data. The former is addressed through replicated indexes and the latter through segmented indexes. Distribute replicated segments across multiple boxes and create a broker which:

a) Determines which segments to query
b) Load balances query requests across the replicated servers for each segment
c) Merges responses

Make sure your communications are batched to avoid too much fine-grained chatter. This is the basis of a scalable architecture.

Cheers
Mark
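A broker along these lines might be sketched as below. Everything here is hypothetical scaffolding rather than a Lucene API: `SegmentServer` stands for whatever remote call reaches one replica, `Hit` is a bare id/score pair, and the sketch assumes scores from different segments are directly comparable, which a real system would have to ensure. For simplicity step (a) is reduced to "query every segment".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a search broker over replicated index segments.
final class BrokerSketch {
    // Minimal search hit: a document id plus its score.
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Hypothetical handle on one replica of one segment (e.g. a remote call).
    interface SegmentServer {
        List<Hit> search(String query, int n);
    }

    // segments.get(i) holds the interchangeable replicas of segment i.
    private final List<List<SegmentServer>> segments;
    private final AtomicInteger turn = new AtomicInteger();

    BrokerSketch(List<List<SegmentServer>> segments) {
        this.segments = segments;
    }

    // Query one replica per segment (round-robin) and merge the top n hits.
    List<Hit> search(String query, int n) {
        int t = turn.getAndIncrement();
        List<Hit> merged = new ArrayList<>();
        for (List<SegmentServer> replicas : segments) {
            // (b) load balancing: rotate through the replicas of this segment
            SegmentServer server = replicas.get(Math.floorMod(t, replicas.size()));
            // one batched request per segment: ask for the top n in a single call
            merged.addAll(server.search(query, n));
        }
        // (c) merge: highest score first, keep only the best n overall
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }

    public static void main(String[] args) {
        SegmentServer a = (q, n) -> List.of(new Hit("a-1", 0.9f), new Hit("a-2", 0.2f));
        SegmentServer b = (q, n) -> List.of(new Hit("b-1", 0.5f));
        BrokerSketch broker = new BrokerSketch(List.of(List.of(a), List.of(b)));
        for (Hit hit : broker.search("lucene", 2)) {
            System.out.println(hit.docId + " " + hit.score);
        }
    }
}
```

Asking each segment for its own top n in one call is what keeps the traffic coarse-grained: the broker never needs more than n hits from any segment to produce a correct global top n.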
Re: Iterating TermEnum backwards
>> I know I could invert my dates (something like MAX_LONG - date) to get the REVERSE order, but I want to be able to do least recent and most recent.
>
> Why not have two date fields, one inverted and one not?
>
>> PS: my current solution is to do a binary search between MIN and MAX, halving my search space until I find close to N matching documents.
>
> That doesn't sound like a bad solution.

Cool, thanks for all your suggestions. I'm getting adequate performance from my binary search now, and if it really becomes a performance problem, I'll just index an inverted version of the date (we all have diskspace to spare!).

=Matt
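For readers curious what the binary search over the date range could look like, here is a minimal sketch. It is an assumption-laden illustration, not Matt's code: `countSince` is a hypothetical callback standing for "run a range query from this cutoff to MAX and count the hits", dates are non-negative longs, and the search runs to the exact cutoff rather than stopping once the count is merely close to N.

```java
import java.util.Arrays;
import java.util.function.LongToIntFunction;

// Sketch: binary search for the date cutoff that yields the N most recent documents.
final class DateCutoffSearch {
    // Returns the largest cutoff in [min, max] for which countSince(cutoff) >= n,
    // or min when fewer than n documents exist. Assumes 0 <= min <= max and that
    // countSince never increases as the cutoff grows.
    static long findCutoff(long min, long max, int n, LongToIntFunction countSince) {
        long lo = min;
        long hi = max;
        while (lo < hi) {
            // Upper midpoint, written so that it cannot overflow and always exceeds lo.
            long mid = lo + (hi - lo) / 2 + (hi - lo) % 2;
            if (countSince.applyAsInt(mid) >= n) {
                lo = mid;       // still at least n matches: the cutoff can move later
            } else {
                hi = mid - 1;   // too few matches: the cutoff must move earlier
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] dates = {100L, 200L, 300L, 400L, 500L};
        // Stand-in for a range query: count documents dated at or after the cutoff.
        LongToIntFunction countSince =
                cutoff -> (int) Arrays.stream(dates).filter(d -> d >= cutoff).count();
        System.out.println(findCutoff(0L, 1000L, 2, countSince));
    }
}
```

Each probe costs one range query, and the number of probes grows with the logarithm of the date range, not with the number of documents.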
Re: Database
Thanks Byron. That's how I've implemented it too: each row is a document and each column in the row is a field.

Thank You.
Parminder Singh
Mahindra-British Telecom Limited
Sharda Center, Erandwane
Pune 411 004. India.
Ph: 91-20-4018100 (Ext: 1847)
Mob: 91-9850053787

In war: resolution. In defeat: defiance. In victory: magnanimity. In peace: goodwill.
- Sir Winston Leonard Spencer Churchill

- Original Message -
From: Saltysiak, Byron [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Thursday, February 26, 2004 9:08 PM
Subject: RE: Database

I have integrated with a database by creating Document objects based on rows from the database and then creating indexes as normal. That was rather easy to implement. Let me know if there is an easier or more direct way to use Lucene with the database.

- Byron Saltysiak

-Original Message-
From: Parminder Singh [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 25, 2004 11:56 PM
To: [EMAIL PROTECTED]
Subject: Database

I've a CMS application that deploys metadata to a database. Is it possible to use Lucene to search this database instead of its (Lucene's) index? If you could tell me the steps that would be involved in doing this, it'd be a great help. I'm new to Lucene.

Thank You.
Parminder Singh
RE: Database
I have integrated with a database by creating Document objects based on rows from the database and then creating indexes as normal. That was rather easy to implement. Let me know if there is an easier or more direct way to use Lucene with the database.

- Byron Saltysiak

-Original Message-
From: Parminder Singh [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 25, 2004 11:56 PM
To: [EMAIL PROTECTED]
Subject: Database

I've a CMS application that deploys metadata to a database. Is it possible to use Lucene to search this database instead of its (Lucene's) index? If you could tell me the steps that would be involved in doing this, it'd be a great help. I'm new to Lucene.

Thank You.
Parminder Singh