Re: Opinions: Using Lucene as a thin database
On Tuesday 14 December 2004 20:13, Monsur Hossain wrote:

> My concern is that this just shifts the scaling issue to Lucene, and I
> haven't found much info on how to scale Lucene vertically.

You can easily use MultiSearcher to search over several indices. If you want the distribution to be more transparent, have a look at Nutch.

Regards
 Daniel

--
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: Opinions: Using Lucene as a thin database
Well, one could always partition an index, distribute pieces of it horizontally across multiple 'search servers', and use the built-in RMI-based and parallel search features. Nutch uses something similar for search scaling.

Otis

--- Monsur Hossain <[EMAIL PROTECTED]> wrote:

> > My concern is that this just shifts the scaling issue to
> > Lucene, and I haven't found much info on how to scale Lucene
> > vertically.
>
> By "vertically", of course, I meant "horizontally". Basically scaling
> it across servers as one might do with a relational database.
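The partition-and-merge idea described above can be illustrated without any Lucene code. What follows is a minimal sketch, not Lucene's or Nutch's actual mechanism: the class name `ShardedLookup`, the modulo routing rule, and the in-memory maps standing in for per-server indices are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: each "shard" stands in for an index on one search server.
final class ShardedLookup {
    // keyword -> ids of the documents on this shard that carry the keyword
    private final List<Map<String, List<Integer>>> shards = new ArrayList<>();

    ShardedLookup(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Route each document to exactly one shard (assumed rule: id modulo shard count).
    void add(int docId, String keyword) {
        Map<String, List<Integer>> shard = shards.get(Math.floorMod(docId, shards.size()));
        shard.computeIfAbsent(keyword, k -> new ArrayList<>()).add(docId);
    }

    // Ask every shard for its matches, then merge the partial results into one sorted list.
    List<Integer> search(String keyword) {
        List<Integer> merged = new ArrayList<>();
        for (Map<String, List<Integer>> shard : shards) {
            merged.addAll(shard.getOrDefault(keyword, Collections.emptyList()));
        }
        Collections.sort(merged);
        return merged;
    }
}
```

In a real deployment the loop in `search` would be a remote call per server, ideally issued in parallel; the merge step is the part that stays the same.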
RE: Opinions: Using Lucene as a thin database
You can see a Flickr-like tag (lookup) system at my Simpy site ( http://www.simpy.com ). It uses Lucene as the backend for lookups, but still uses an RDBMS as the primary storage. I find that keeping the RDBMS and the Lucene indices in sync is a bit of a pain and error-prone, so a _thin_ storage layer with simple requirements will be okay with just using Lucene, while applications with more complex domain models will quickly run into limitations (a "wrong tool for the job" type of problem).

Otis

--- Monsur Hossain <[EMAIL PROTECTED]> wrote:

> I think this is a great idea, and one that I've been mulling over to
> implement keyword lookups (similar to Flickr.com's tag system). I
> believe the advantage over a relational database comes from Lucene's
> inverted index, which is highly optimized for this kind of lookup.
>
> My concern is that this just shifts the scaling issue to Lucene, and I
> haven't found much info on how to scale Lucene vertically.
RE: Opinions: Using Lucene as a thin database
I think this is a great idea, and one that I've been mulling over to implement keyword lookups (similar to Flickr.com's tag system). I believe the advantage over a relational database comes from Lucene's inverted index, which is highly optimized for this kind of lookup.

My concern is that this just shifts the scaling issue to Lucene, and I haven't found much info on how to scale Lucene vertically.

> -----Original Message-----
> From: Kevin L. Cobb [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 14, 2004 9:40 AM
> To: [EMAIL PROTECTED]
> Subject: Opinions: Using Lucene as a thin database
>
> I use Lucene as a legitimate search engine, which is cool.
> But I am also using it as a simple database. I build an
> index with a couple of keyword fields that allows me to
> retrieve values based on exact matches in those fields. This
> is all I need to do, so it works just fine for my needs. I
> also love the speed. The index is small enough that it is
> wicked fast. Was wondering if anyone out there was doing the
> same, or if there are any dissenting opinions on using Lucene
> for this purpose.
Re: Opinions: Using Lucene as a thin database
: select * from MY_TABLE where MY_NUMERIC_FIELD > 80
:
: as far as I know you have only the range query so you will have to say
:
: my_numeric_field:[80 TO ??]
: but this would not work in the above-mentioned example or am I missing something?

RangeQuery allows you to specify an open-ended range. You can tell the QueryParser to leave your range open-ended using the keyword "null", i.e.

  my_numeric_field:[80 TO null]

-Hoss
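One caveat worth checking before relying on a range like the one above: if the field's values are indexed as plain strings and the range bounds are compared in lexicographic order (an assumption about the index setup, not something stated in this thread), then "100" sorts before "80" and a numeric-looking range returns the wrong documents. A common workaround is to zero-pad numbers to a fixed width at both index and query time. The sketch below only demonstrates the string-ordering effect in plain Java; `PaddedRange`, `WIDTH`, and `atLeast` are hypothetical names and no Lucene API is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating why string-ordered ranges need padded numbers.
final class PaddedRange {
    // Assumed fixed width: enough digits for any non-negative int.
    static final int WIDTH = 10;

    // Encode a non-negative number so that string order equals numeric order.
    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    // Mimics an inclusive, open-ended range: keep every term that is
    // lexicographically greater than or equal to the lower bound.
    static List<String> atLeast(List<String> terms, String lower) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.compareTo(lower) >= 0) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With unpadded terms "9", "80", "100", "1000", `atLeast(terms, "80")` keeps "9" and drops "100" and "1000"; with every value passed through `pad` first, the same call keeps exactly 80, 100, and 1000.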
RE: Opinions: Using Lucene as a thin database
> My concern is that this just shifts the scaling issue to
> Lucene, and I haven't found much info on how to scale Lucene
> vertically.

By "vertically", of course, I meant "horizontally". Basically scaling it across servers as one might do with a relational database.
Re: Opinions: Using Lucene as a thin database
On Dec 14, 2004, at 15:40, Kevin L. Cobb wrote:

> Was wondering if anyone out there was doing the same, or if there are
> any dissenting opinions on using Lucene for this purpose.

ZOE [1] [2] takes the same approach and uses Lucene as a relational engine of sorts. However, for both practical and ideological reasons, it does not store any raw data in the Lucene indices themselves but instead uses JDBM [3] for that purpose.

All things considered, update issues aside, Lucene turns out to be a very flexible "thin database".

Cheers,
PA.

[1] http://zoe.nu/
[2] http://cvs.sourceforge.net/viewcvs.py/zoe/ZOE/Frameworks/SZObject/
[3] http://jdbm.sourceforge.net/
Re: Opinions: Using Lucene as a thin database
Hmm. So far all our fields are just strings, but I would guess you should be able to use Integer.MAX_VALUE or something similar as the upper bound. Or there might be a better way of doing it.

Praveen

----- Original Message -----
From: "Akmal Sarhan" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, December 14, 2004 10:23 AM
Subject: Re: Opinions: Using Lucene as a thin database

That sounds very interesting, but how do you handle queries like

select * from MY_TABLE where MY_NUMERIC_FIELD > 80

As far as I know you have only the range query, so you will have to say

my_numeric_field:[80 TO ??]

but this would not work in the above-mentioned example, or am I missing something?

regards

Akmal
Re: Opinions: Using Lucene as a thin database
That sounds very interesting, but how do you handle queries like

select * from MY_TABLE where MY_NUMERIC_FIELD > 80

As far as I know you have only the range query, so you will have to say

my_numeric_field:[80 TO ??]

but this would not work in the above-mentioned example, or am I missing something?

regards

Akmal

On Tue, 14.12.2004 at 16:07, Praveen Peddi wrote:

> We also use Lucene for a similar purpose, except that we index and store quite
> a few fields. In fact, I also update partial documents as people suggested. I
> store all the indexed fields so I don't have to build the whole document
> again while updating a partial document. The reason we do this is speed.
> I found that a Lucene search over millions of objects is 4 to 5 times
> faster than our Oracle queries (of course this might be due to our pitiful
> database design :) ). It works great so far. The only caveat that we had
> until now was incremental updates. But now I am implementing real-time
> updates so that the data in the Lucene index is almost always in sync with the
> data in the database. So now, our search does not go to the database at all.
>
> Praveen
Re: Opinions: Using Lucene as a thin database
How big do you expect it to get, and how often do you expect to update it? We've been using Lucene for about 1 M records (19 fields each) with incremental updates every 10 minutes. The performance during updates wasn't wonderful, so it took some seriously intense code to sort that out. As you mentioned, it comes down to what you need the thin DB for. Lucene is a wonderful search engine, but if I were looking for a fast and dirty relational DB, MySQL wins hands down. Put them both together and you've really got something.

My 2 cents

Nader Henein

Kevin L. Cobb wrote:

> I use Lucene as a legitimate search engine, which is cool. But I am also
> using it as a simple database. I build an index with a couple of
> keyword fields that allows me to retrieve values based on exact matches
> in those fields. This is all I need to do, so it works just fine for my
> needs. I also love the speed. The index is small enough that it is
> wicked fast. Was wondering if anyone out there was doing the same, or if
> there are any dissenting opinions on using Lucene for this purpose.
Re: Opinions: Using Lucene as a thin database
We also use Lucene for a similar purpose, except that we index and store quite a few fields. In fact, I also update partial documents as people suggested. I store all the indexed fields so I don't have to build the whole document again while updating a partial document. The reason we do this is speed. I found that a Lucene search over millions of objects is 4 to 5 times faster than our Oracle queries (of course this might be due to our pitiful database design :) ). It works great so far. The only caveat that we had until now was incremental updates. But now I am implementing real-time updates so that the data in the Lucene index is almost always in sync with the data in the database. So now, our search does not go to the database at all.

Praveen

----- Original Message -----
From: "Kevin L. Cobb" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, December 14, 2004 9:40 AM
Subject: Opinions: Using Lucene as a thin database

I use Lucene as a legitimate search engine, which is cool. But I am also using it as a simple database. I build an index with a couple of keyword fields that allows me to retrieve values based on exact matches in those fields. This is all I need to do, so it works just fine for my needs. I also love the speed. The index is small enough that it is wicked fast. Was wondering if anyone out there was doing the same, or if there are any dissenting opinions on using Lucene for this purpose.
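The partial-update approach described above (store every indexed field, then rebuild the document from the stored values when one field changes) amounts to a delete followed by a re-add. The following is a toy in-memory model of that idea, assuming each document has a unique id; `StoredFieldStore` and its methods are hypothetical names for illustration and are not Lucene API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an index whose documents keep all of their fields stored.
final class StoredFieldStore {
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    void add(String id, Map<String, String> fields) {
        docs.put(id, new HashMap<>(fields));
    }

    // "Update" one field: remove the old document, rebuild it from its
    // stored fields with the single change applied, and add it back.
    void updateField(String id, String name, String value) {
        Map<String, String> old = docs.remove(id);
        if (old == null) {
            throw new NoSuchElementException("no document with id " + id);
        }
        Map<String, String> rebuilt = new HashMap<>(old);
        rebuilt.put(name, value);
        docs.put(id, rebuilt);
    }

    // Read-only view of a document's stored fields, or null if the id is unknown.
    Map<String, String> get(String id) {
        Map<String, String> fields = docs.get(id);
        return fields == null ? null : Collections.unmodifiableMap(fields);
    }
}
```

The point of storing every field is visible in `updateField`: the caller supplies only the changed value, and the rest of the document is recovered from what was stored rather than from the original data source.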
Re: Opinions: Using Lucene as a thin database
On Dec 14, 2004, at 9:40 AM, Kevin L. Cobb wrote:

> I use Lucene as a legitimate search engine, which is cool. But I am also
> using it as a simple database. I build an index with a couple of
> keyword fields that allows me to retrieve values based on exact matches
> in those fields. This is all I need to do, so it works just fine for my
> needs. I also love the speed. The index is small enough that it is
> wicked fast. Was wondering if anyone out there was doing the same, or if
> there are any dissenting opinions on using Lucene for this purpose.

I use Lucene as the complete data storage for my blog at http://www.blogscene.org/erik - all HTTP requests map to a Lucene query (based on the path and an optional query parameter). I've been lame and have never put any caching in there.

I'm about to start a new project that really needs a relational database under the covers, but I'm cringing at the headaches involved compared to the joys of using Lucene.

Erik
RE: Opinions: Using Lucene as a thin database
I don't have the requirement to do range-type selects, i.e. the only operator I need is equals:

Select * from MY_TABLE where MY_NUMERIC_FIELD = 80

The searchable fields in my model are always of type KEYWORD. I believe this forces the match to be exact, so thinking about it in anything other than "equals" terms would, I believe, be a mistake.

In any case, I believe that using Lucene as a "thin DB" implies that the requirements for your database selects are fairly simple and straightforward.

KLCobb

-----Original Message-----
From: Akmal Sarhan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 10:24 AM
To: Lucene Users List
Subject: Re: Opinions: Using Lucene as a thin database

That sounds very interesting, but how do you handle queries like

select * from MY_TABLE where MY_NUMERIC_FIELD > 80

As far as I know you have only the range query, so you will have to say

my_numeric_field:[80 TO ??]

but this would not work in the above-mentioned example, or am I missing something?

regards

Akmal
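The exact-match behaviour described above comes from indexing the whole field value as a single term instead of breaking it into words. The sketch below contrasts the two in plain Java. It is an illustration only: `KeywordVsText` is a hypothetical class, and the lowercase-and-split rule is an assumed stand-in for a real analyzer, not the behaviour of any particular Lucene analyzer.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical contrast between an untokenized "keyword" field and a tokenized text field.
final class KeywordVsText {
    // Keyword-style: the entire value is the one and only term.
    static Set<String> keywordTerms(String value) {
        return Collections.singleton(value);
    }

    // Text-style (assumed analysis): lowercase, then split on non-word characters.
    static Set<String> textTerms(String value) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

For the value "New York", the keyword form matches only the exact string "New York", while the tokenized form matches "new" or "york" but not the full original string. That is why a keyword field behaves like an equality test on a database column.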
Opinions: Using Lucene as a thin database
I use Lucene as a legitimate search engine, which is cool. But I am also using it as a simple database. I build an index with a couple of keyword fields that allows me to retrieve values based on exact matches in those fields. This is all I need to do, so it works just fine for my needs. I also love the speed. The index is small enough that it is wicked fast. Was wondering if anyone out there was doing the same, or if there are any dissenting opinions on using Lucene for this purpose.