"Don't connect "database" (i.e. SQL, transactions, etc.) and Lucene. Connect data storage with simple, fast lookup and Lucene." Thanks, Grant for the clarification. I see now.
Nagesh

On Tue, Jul 29, 2008 at 7:55 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> Don't connect "database" (i.e. SQL, transactions, etc.) and Lucene.
> Connect data storage with simple, fast lookup and Lucene.
>
> One field is the key (i.e. the filename); the other field is a binary,
> stored Field containing the contents of the file. Of course, there are
> other ways of slicing and dicing, such that one can search (in the fuzzy
> sense) the content and the key by adding tokenization, etc. This is the
> more traditional model for Lucene.
>
> Also, have a look at Apache Jackrabbit. It is a content repository that is
> implemented with Lucene.
>
> -Grant
>
> On Jul 29, 2008, at 10:02 AM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) wrote:
>
>> Hi Ian,
>> Yes, I see that we are discussing an "option" here.
>>
>> But, as I said before (the three parts to search-based solution), I do not
>> know (but, would like to know) how Lucene (java only - not Nutch, Solr,
>> etc.) can be used as a datastore.
>>
>> Basically, I am not able to connect "database" and Lucene java. :)
>>
>> Nagesh
>>
>> On Tue, Jul 29, 2008 at 6:51 PM, Ian Lea <[EMAIL PROTECTED]> wrote:
>>
>>> I don't think that anyone in this thread has said "should", just
>>> "could" - it is a valid option (IMHO). Personally, I use it as a
>>> store for lucene related data because I know and like and trust it, it
>>> is already there for this project so no need to introduce another
>>> software dependency, and because it is blindingly fast.
>>>
>>> --
>>> Ian.
>>>
>>> On Tue, Jul 29, 2008 at 1:43 PM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
>>> <[EMAIL PROTECTED]> wrote:
>>>
>>>> The way I see it, search solutions (on whatever scale) have three
>>>> components - data aggregation, indexing/searching and presentation
>>>> of results. I thought, Lucene did the second part only.
>>>>
>>>> So, I do not quite follow, why should Lucene be used for datastore?
>>>>
>>>> Nagesh
>>>>
>>>> On Tue, Jul 29, 2008 at 6:01 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> I think the answer is it can be done and probably quite well. I also
>>>>> think it's informative that Nutch does not use Lucene for this function,
>>>>> as I understand it, but that shouldn't stop you either. You might also
>>>>> have a look at Apache Jackrabbit, which uses Lucene underneath as a
>>>>> content repository.
>>>>>
>>>>> -Grant
>>>>>
>>>>> On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I am also interested in this. I want to archive the content of the
>>>>>> document using Lucene.
>>>>>>
>>>>>> Is it a good idea to use Lucene as storage engine?
>>>>>>
>>>>>> Regards
>>>>>> Ganesh
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Ian Lea" <[EMAIL PROTECTED]>
>>>>>> To: <java-user@lucene.apache.org>
>>>>>> Sent: Tuesday, July 29, 2008 2:18 PM
>>>>>> Subject: Re: Using lucene as a database... good idea or bad idea?
>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>> I think it's a great idea, and do exactly this to store 5 million+
>>>>>>> documents with info that it takes way too long to get out of our
>>>>>>> Oracle database (think days). Not as many docs as you are talking
>>>>>>> about, and less data for each doc, but I wouldn't have any concerns
>>>>>>> about scaling. There are certainly lucene indexes out there bigger
>>>>>>> than what you propose. You can compress the stored data to save some
>>>>>>> space. Run times for optimization might get interesting but see
>>>>>>> recent threads for suggestions on that. And since you are not too
>>>>>>> concerned about performance you may not need to optimize much, or
>>>>>>> even at all.
>>>>>>>
>>>>>>> Of course you need to remember that this is not a DBMS solution in
>>>>>>> the sense of transactions, recovery, etc. but I'm sure you are
>>>>>>> already aware of that.
>>>>>>>
>>>>>>> --
>>>>>>> Ian.
>>>>>>>
>>>>>>> On Tue, Jul 29, 2008 at 2:53 AM, John Evans <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I have successfully used Lucene in the "traditional" way to provide
>>>>>>>> full-text search for various websites. Now I am tasked with developing
>>>>>>>> a data-store to back a web crawler. The crawler can be configured to
>>>>>>>> retrieve arbitrary fields from arbitrary pages, so the result is that
>>>>>>>> each document may have a random assortment of fields. It seems like
>>>>>>>> Lucene may be a natural fit for this scenario since you can obviously
>>>>>>>> add arbitrary fields to each document and you can store the actual
>>>>>>>> data in the database. I've done some research to make sure that it
>>>>>>>> would meet all of our individual requirements (that we can iterate
>>>>>>>> over documents, update (delete/replace) documents, etc.) and
>>>>>>>> everything looks good. I've also seen a couple of references around
>>>>>>>> the net to other people trying similar things... however, I know it's
>>>>>>>> not meant to be used this way, so I thought I would post here and ask
>>>>>>>> for guidance? Has anyone done something similar? Is there any
>>>>>>>> specific reason to think this is a bad idea?
>>>>>>>>
>>>>>>>> The one thing that I am least certain about is how well it will scale.
>>>>>>>> We may reach the point where we have tens of millions of documents and
>>>>>>>> a high percentage of those documents may be relatively large (10k-50k
>>>>>>>> each).
>>>>>>>> We actually would NOT be expecting/needing Lucene's normal extremely
>>>>>>>> fast text search times for this, but we would need reasonable times
>>>>>>>> for adding new documents to the index, retrieving documents by ID (for
>>>>>>>> iterating over all documents), optimizing the index after a series of
>>>>>>>> changes, etc.
>>>>>>>>
>>>>>>>> Any advice/input/theories anyone can contribute would be greatly
>>>>>>>> appreciated.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -
>>>>>>>> John
>>>>>
>>>>> --------------------------
>>>>> Grant Ingersoll
>>>>> http://www.lucidimagination.com
>>>>>
>>>>> Lucene Helpful Hints:
>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
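
For anyone finding this thread later: Ian's remark that "you can compress the stored data to save some space" can be sketched in plain Java. This is only an illustration, assuming you compress the page bytes yourself before putting them into the binary, stored field that Grant describes, and decompress them after reading the field back. The class and method names (StoredContentCodec, compress, decompress) are made up for this example and are not part of Lucene; the code uses nothing but java.util.zip.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not a Lucene class): shrinks the bytes that would go
// into a binary stored field, and restores them after retrieval.
final class StoredContentCodec {

    // Compress raw content with zlib/DEFLATE.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Reverse of compress(); rejects truncated or corrupt input.
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Made-up, repetitive page content standing in for a crawled document.
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            page.append("<p>Using lucene as a database... good idea or bad idea?</p>\n");
        }
        byte[] raw = page.toString().getBytes(StandardCharsets.UTF_8);

        byte[] packed = compress(raw);      // this is what you would store
        byte[] restored = decompress(packed); // this is what you get back

        System.out.println("raw bytes:    " + raw.length);
        System.out.println("stored bytes: " + packed.length);
        System.out.println("round trip intact: " + Arrays.equals(raw, restored));
    }
}
```

How much space this saves depends entirely on the content. For documents in the 10k-50k range John mentions, it is worth measuring on a sample of real pages before committing to it, since the cost is extra CPU on every add and every retrieval.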