Re: Database

2004-02-26 Thread David Spencer
Parminder Singh wrote:

I've a CMS application that deploys metadata to a database. Is it possible to use Lucene to search this database instead of its (Lucene's) index? If you could tell me the steps that would be involved in doing this, it'd be a great help. I'm new to Lucene.
 

I've done this extensively. Basically you create documents out of the 
database and in my case I generated a URL for each doc which would be 
fed to users after a query.  This is one of those things where Lucene 
stands apart from what seemed to be the alternatives a few years ago 
(htdig was one thing I used) -- it doesn't have to spider a web site, 
and if you have a dynamic web site (pages generated from db queries) 
then the indexing is in a way more efficient as you don't have to parse 
html to extract what may or may not be the actual text - the db will 
have your exact content so you index off the db, not from web pages.
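
A minimal sketch of the row-to-document step described above, in plain Java with no Lucene dependency. The class name, the BASE_URL value and the "url" field name are made up for illustration; in a real indexer each map entry would become a field on a Lucene Document before it is handed to the IndexWriter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RowToDocument {
    // Hypothetical base URL; the real application decides how rows map to pages.
    static final String BASE_URL = "http://example.com/article?id=";

    /** Turns one database row (column name -> value) into a document-like field map. */
    static Map<String, String> toDocument(Map<String, String> row, String idColumn) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (column.getValue() != null) {          // skip NULL columns
                doc.put(column.getKey(), column.getValue());
            }
        }
        // The URL is what gets shown to the user when this document matches a query.
        doc.put("url", BASE_URL + row.get(idColumn));
        return doc;
    }
}
```

The point of the sketch is only that the text comes straight from the columns, so nothing has to be fetched or parsed as HTML.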

Thank You.

Parminder Singh



In war: resolution. In defeat: defiance. In victory: magnanimity. In peace: goodwill. - Sir Winston Leonard Spencer Churchill

*
Disclaimer
This message (including any attachments) contains 
confidential information intended for a specific 
individual and purpose, and is protected by law. 
If you are not the intended recipient, you should 
delete this message and are hereby notified that 
any disclosure, copying, or distribution of this
message, or the taking of any action based on it, 
is strictly prohibited.

*
Visit us at http://www.mahindrabt.com
 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Iterating TermEnum backwards

2004-02-26 Thread Doug Cutting
Matt Quail wrote:
> Is there any way to iterate through a TermEnum backwards? Okay, I know
> that there isn't a way to do this via the TermEnum class, but is it
> implementable on top of the underlying Lucene datastore?

Not really.  The best you can do is skip back to the previous indexed 
term in TermInfosReader.indexTerms, and scan forward from there.  You 
could try adding a method to that class like:

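  // Seek to the index entry before the one getIndexOffset() selects
  // for this term (or stay at the first entry if there is none before).
  final synchronized void seekBefore(Term term) throws IOException {
    int offset = getIndexOffset(term);
    seekEnum(offset > 0 ? offset - 1 : offset);
  }
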
Then you'd need to add stuff to MultiReader, SegmentReader and 
IndexReader, to take advantage of this.  It could get a little tricky, 
but it is possible.  I'm not convinced this is your best route.

> My particular problem is this:
>
> I have an index of documents, each document has a date field (I'm
> using DateField). Most documents have a different date, so the number of
> unique dates is close to the number of documents.

Are you adding documents in date order?  If so, then you could look at 
the date of the document numbered maxDoc() - N and scan forward from 
there.  To be safe, you could start at maxDoc() - N*2 or something.

> I want to find the top N most recent dates, but I don't want to have to
> iterate through ALL of them first. NB: With DateField, the earlier dates
> are lexicographically smaller. (I also want to find the most recent N
> less than some date D).
> I know I could invert my dates (something like MAX_LONG - date) to get
> the REVERSE order, but I want to be able to do least recent and most
> recent.

Why not have two date fields, one inverted and one not?
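
A rough sketch of the two-field idea, assuming non-negative millisecond timestamps. It does not reproduce DateField's own encoding; the class and method names are invented for illustration, and the zero-padded decimal format is just one way to make string order follow numeric order.

```java
import java.util.Locale;

final class DateKeys {
    /** Key that sorts oldest-first under lexicographic comparison. */
    static String ascendingKey(long millis) {
        return pad(millis);
    }

    /** Key that sorts newest-first: larger timestamps map to smaller strings. */
    static String descendingKey(long millis) {
        return pad(Long.MAX_VALUE - millis);
    }

    // Long.MAX_VALUE has 19 decimal digits, so padding to 19 keeps
    // string order identical to numeric order for non-negative values.
    private static String pad(long nonNegative) {
        if (nonNegative < 0) {
            throw new IllegalArgumentException("timestamp out of range");
        }
        return String.format(Locale.ROOT, "%019d", nonNegative);
    }
}
```

Indexing both keys as separate fields lets a forward term scan start from either end of the date range.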

> PS: my current solution is to do a binary search between MIN and MAX,
> halving my search space until I find close to N matching documents.

That doesn't sound like a bad solution.
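
For reference, the binary search might look roughly like this. This is a sketch under assumptions, not Matt's actual code: countAtOrAfter stands in for whatever counts documents dated on or after a cutoff (for example, the hit count of a range query), and min and max are assumed to be non-negative.

```java
import java.util.function.LongUnaryOperator;

final class RecentCutoff {
    /**
     * Returns the largest cutoff in [min, max] for which at least n documents
     * have a date >= cutoff, or min if fewer than n documents exist.
     * countAtOrAfter must never increase as its argument grows.
     */
    static long findCutoff(long min, long max, long n, LongUnaryOperator countAtOrAfter) {
        long lo = min;
        long hi = max;
        while (lo < hi) {
            // Probe strictly above lo so that the range shrinks on every pass.
            long mid = lo + (hi - lo) / 2 + 1;
            if (countAtOrAfter.applyAsLong(mid) >= n) {
                lo = mid;       // still enough documents: try a later cutoff
            } else {
                hi = mid - 1;   // too few documents: try an earlier cutoff
            }
        }
        return lo;
    }
}
```

Each probe costs one count, so the number of counts grows with the logarithm of the date range rather than with the number of distinct dates.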

Doug



RE: Lucene scalability/clustering

2004-02-26 Thread Jochen Frey
Anson,

One way of doing it is having subsets of your indexes / data on
different machines. Each machine indexes its own data. You implement a
system that distributes queries to the various machines and merges the
results back.
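
The merge step could be sketched like this in plain Java. The Hit type and its fields are hypothetical, and the sketch assumes scores coming back from different machines are directly comparable, which may not hold if each index computes its own term statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ResultMerger {
    /** One hit as reported by a machine; the field names are illustrative. */
    static final class Hit {
        final String machine;
        final int docId;
        final float score;

        Hit(String machine, int docId, float score) {
            this.machine = machine;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Combines the per-machine hit lists and keeps the k best by score. */
    static List<Hit> mergeTopK(List<List<Hit>> perMachine, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perMachine) {
            all.addAll(hits);
        }
        // Highest score first.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Each machine only needs to return its own top k hits, since nothing below that rank can reach the merged top k.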

Whether it works well depends entirely on your implementation of the
distributed search.

I believe there was a discussion about implementing this using a
MultiSearcher somewhere as well.

Cheers!
Jochen


-Original Message-
From: Anson Lau [mailto:[EMAIL PROTECTED] 
Sent: Sunday, February 22, 2004 2:17 PM
To: 'Lucene Users List'
Subject: RE: Lucene scalability/clustering


Further on this topic - has anyone tried implementing a distributed
search with Lucene?  How does it work and does it work well?


Anson


-Original Message-
From: Hamish Carpenter [mailto:[EMAIL PROTECTED]
Sent: Monday, February 23, 2004 5:24 AM
To: Lucene Users List
Subject: Re: Lucene scalability/clustering

Hi All,

I'm Hamish Carpenter who contributed the benchmarks with the comment
about the IndexSearcherCache.  Using this solved our issues with too
many open files under Linux.

The original IndexSearcherCache email is here:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg01967.html

See here for a copy of the above message and a download link:
http://www.geocities.com/haytona/lucene/
The mailing list doesn't like attachments.  The source is 10K in size.

HTH

Hamish Carpenter.

[EMAIL PROTECTED] wrote:
  BTW, where can I get Peter Halacsy's IndexSearcherCache?




RE: Lucene scalability/clustering

2004-02-26 Thread markharw00d
I tend to think of scaling in two dimensions: scaling by volumes of users and scaling 
by volumes of data. The former is addressed through replicated indexes 
and the latter by segmented indexes. 
Distribute replicated segments across multiple boxes and create a broker which:
a) Determines which segments to query
b) Load balances query requests across the replicated servers for each segment
c) Merges responses

Make sure your communications are batched to avoid too much fine-grained chatter.

This is the basis of a scalable architecture.
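
As an illustration of step b) only, here is a small round-robin picker. The class name, the segment names and the server names are all invented; choosing the segments and merging the responses are not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class SegmentBroker {
    // segment name -> servers holding a replica of that segment
    private final Map<String, List<String>> replicasBySegment;
    // segment name -> number of requests routed so far
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    SegmentBroker(Map<String, List<String>> replicasBySegment) {
        this.replicasBySegment = replicasBySegment;
    }

    /** Picks the next replica for a segment in round-robin order. */
    String nextReplica(String segment) {
        List<String> replicas = replicasBySegment.get(segment);
        if (replicas == null || replicas.isEmpty()) {
            throw new IllegalArgumentException("unknown segment: " + segment);
        }
        int turn = counters.computeIfAbsent(segment, s -> new AtomicInteger()).getAndIncrement();
        // floorMod keeps the index valid even after the counter wraps around.
        return replicas.get(Math.floorMod(turn, replicas.size()));
    }
}
```

Round-robin is the simplest policy; a real broker might also weigh server load or skip replicas that are down.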

Cheers
Mark





Re: Iterating TermEnum backwards

2004-02-26 Thread Matt Quail
>> I know I could invert my dates (something like MAX_LONG - date) to get
>> the REVERSE order, but I want to be able to do least recent and most
>> recent.
>
> Why not have two date fields, one inverted and one not?
>
>> PS: my current solution is to do a binary search between MIN and MAX,
>> halving my search space until I find close to N matching documents.
>
> That doesn't sound like a bad solution.


Cool, thanks for all your suggestions. I'm getting adequate performance 
from my binary search now, and if it really becomes a performance 
problem, I'll just index an inverted version of the date (we all have 
diskspace to spare!).

=Matt



Re: Database

2004-02-26 Thread Parminder Singh
Thanks Byron. That's how I've implemented it too: each row is a document
and each column in the row is a field.

Thank You.

Parminder Singh
Mahindra-British Telecom Limited
Sharda Center, Erandwane
Pune 411 004. India.
Ph: 91-20-4018100 (Ext: 1847)
Mob: 91-9850053787

- Original Message -
From: Saltysiak, Byron [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Thursday, February 26, 2004 9:08 PM
Subject: RE: Database

 I have integrated with a database by creating Document objects based on
rows from the database and then creating indexes as normal. That was rather
easy to implement.

 Let me know if there is an easier or more direct way to use Lucene with
the database.


 - Byron Saltysiak

 -Original Message-
 From: Parminder Singh [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, February 25, 2004 11:56 PM
 To: [EMAIL PROTECTED]
 Subject: Database


 I've a CMS application that deploys metadata to a database. Is it possible
to use Lucene to search this database instead of its (Lucene's) index? If
you could tell me the steps that would be involved in doing this, it'd be
a great help. I'm new to Lucene.

 Thank You.

 Parminder Singh




RE: Database

2004-02-26 Thread Saltysiak, Byron
I have integrated with a database by creating Document objects based on rows from the 
database and then creating indexes as normal. That was rather easy to implement.

Let me know if there is an easier or more direct way to use Lucene with the database.


- Byron Saltysiak

-Original Message-
From: Parminder Singh [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 25, 2004 11:56 PM
To: [EMAIL PROTECTED]
Subject: Database


I've a CMS application that deploys metadata to a database. Is it possible to use 
Lucene to search this database instead of its (Lucene's) index? If you could tell me 
the steps that would be involved in doing this, it'd be a great help. I'm new to Lucene. 

Thank You.

Parminder Singh


