The way I see it, search solutions (on whatever scale) have three components:
data aggregation, indexing/searching, and presentation of results. I
thought Lucene did only the second part.

So I do not quite follow: why should Lucene be used as a datastore?

Nagesh

On Tue, Jul 29, 2008 at 6:01 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> I think the answer is it can be done and probably quite well.  I also think
> it's informative that Nutch does not use Lucene for this function, as I
> understand it, but that shouldn't stop you either.  You might also have a
> look at Apache Jackrabbit, which uses Lucene underneath as a content
> repository.
>
> -Grant
>
>
> On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote:
>
>> Hello all,
>>
>> I am also interested in this. I want to archive the content of the
>> document using Lucene.
>>
>> Is it a good idea to use Lucene as a storage engine?
>>
>> Regards
>> Ganesh
>>
>> ----- Original Message ----- From: "Ian Lea" <[EMAIL PROTECTED]>
>> To: <java-user@lucene.apache.org>
>> Sent: Tuesday, July 29, 2008 2:18 PM
>> Subject: Re: Using lucene as a database... good idea or bad idea?
>>
>>
>>> John
>>>
>>>
>>> I think it's a great idea, and do exactly this to store 5 million+
>>> documents with info that it takes way too long to get out of our
>>> Oracle database (think days).  Not as many docs as you are talking
>>> about, and less data for each doc, but I wouldn't have any concerns
>>> about scaling.  There are certainly Lucene indexes out there bigger
>>> than what you propose.  You can compress the stored data to save some
>>> space.  Run times for optimization might get interesting but see
>>> recent threads for suggestions on that.  And since you are not too
>>> concerned about performance you may not need to optimize much, or even
>>> at all.
>>>
>>> Of course you need to remember that this is not a DBMS solution in the
>>> sense of transactions, recovery, etc. but I'm sure you are already
>>> aware of that.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Tue, Jul 29, 2008 at 2:53 AM, John Evans <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have successfully used Lucene in the "traditional" way to provide
>>>> full-text search for various websites.  Now I am tasked with developing
>>>> a data-store to back a web crawler.  The crawler can be configured to
>>>> retrieve arbitrary fields from arbitrary pages, so the result is that
>>>> each document may have a random assortment of fields.  It seems like
>>>> Lucene may be a natural fit for this scenario since you can obviously
>>>> add arbitrary fields to each document and you can store the actual data
>>>> in the database.  I've done some research to make sure that it would
>>>> meet all of our individual requirements (that we can iterate over
>>>> documents, update (delete/replace) documents, etc.) and everything looks
>>>> good.  I've also seen a couple of references around the net to other
>>>> people trying similar things... however, I know it's not meant to be
>>>> used this way, so I thought I would post here and ask for guidance.
>>>> Has anyone done something similar?  Is there any specific reason to
>>>> think this is a bad idea?
>>>>
>>>> The one thing that I am least certain about is how well it will scale.
>>>> We may reach the point where we have tens of millions of documents and
>>>> a high percentage of those documents may be relatively large (10k-50k
>>>> each).  We actually would NOT be expecting/needing Lucene's normal
>>>> extremely fast text search times for this, but we would need reasonable
>>>> times for adding new documents to the index, retrieving documents by ID
>>>> (for iterating over all documents), optimizing the index after a series
>>>> of changes, etc.
>>>>
>>>> Any advice/input/theories anyone can contribute would be greatly
>>>> appreciated.
>>>>
>>>> Thanks,
>>>> -
>>>> John
>>>>
>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
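
Ian's remark in the quoted thread that "you can compress the stored data to save some space" can be illustrated with a small sketch. This is only one way to do it by hand, using the JDK's DEFLATE classes and no Lucene API at all. The class name StoredBodyCodec and its two methods are hypothetical, made up for this illustration; the assumption is that the returned bytes would go into a binary stored field and be decompressed again after the document is fetched. Lucene versions differ in what compression they offer or apply themselves, so check your version before doing this manually.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not part of Lucene): compresses a document body
// before it is stored, and restores it after retrieval.
final class StoredBodyCodec {

    // Encode the text as UTF-8 and compress it with DEFLATE.
    static byte[] compress(String body) {
        byte[] input = body.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Reverse of compress(): inflate the bytes and decode them as UTF-8.
    static String decompress(byte[] stored) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and not finished means the input is incomplete.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or corrupt input");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end(); // release native resources
        }
    }
}
```

Typical use would be to call StoredBodyCodec.compress(pageText) when building the document and StoredBodyCodec.decompress(bytes) after loading it. Large, repetitive bodies such as the 10k-50k crawled pages John mentions are where this is most likely to pay off; very short values can come out larger than the original.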