Re: GData Server - Lucene storage

Otis Gospodnetic Fri, 02 Jun 2006 12:15:31 -0700

Simon,

I took a quick look at the UML PDF.  It seems to me that various *Services are 
overly complicated.  Since you can have only 1 thread modifying the Lucene 
index, perhaps you should go the same route as IndexModifier (I never used it, 
but it looks like people are using it to manage write/delete/search 
concurrency).  So perhaps all you need are IndexStorageService and 
SearchService for the searchable Lucene index(es), and a DataStorageService for 
storing and reading data from the BDB store or whatever you end up using.
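
To illustrate the single-writer idea, here is a minimal sketch in plain Java.  
It does not use IndexModifier or any Lucene API; the class name 
SingleWriterIndexService and the in-memory list standing in for the index are 
assumptions made up for this example.  All modifications are funneled through 
one worker thread, so a write and a delete can never run at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: every index modification runs on one thread. */
class SingleWriterIndexService {
    // One worker thread, so adds and deletes are executed strictly one by one.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Stand-in for the Lucene index; only ever touched from the writer thread.
    private final List<String> index = new ArrayList<>();

    Future<?> add(String entryId) {
        return writer.submit(() -> { index.add(entryId); });
    }

    Future<?> delete(String entryId) {
        return writer.submit(() -> { index.remove(entryId); });
    }

    /** Also queued on the writer thread, so it sees all earlier modifications. */
    Future<List<String>> snapshot() {
        Callable<List<String>> task = () -> new ArrayList<>(index);
        return writer.submit(task);
    }

    void shutdown() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Callers on any thread just submit requests; the ordering and locking concerns 
stay inside the one service instead of being spread across several *Services.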


Regarding the naming of StorageCache - this confused me at first.  Seeing 
"cache" makes me think "previously retrieved/found data stored in a cache for 
faster subsequent requests/searches".  But from what I can tell, that is not 
what StorageCache is about.  It looks like StorageCache is really a buffer of 
entries that are scheduled to be written to or deleted from the index+storage.  
If that's so, I would consider renaming this "StorageBuffer" or some such.
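
A rough sketch of what I mean by a buffer, in plain Java.  This is only my 
reading of the diagram, not the actual design: the Action enum, the method 
names, and the "latest action per entry ID wins" rule are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a buffer of pending index+storage operations. */
class StorageBuffer {
    enum Action { WRITE, DELETE }

    // Latest pending action per entry ID, kept in the order in which
    // each entry ID was first scheduled.
    private final Map<String, Action> pending = new LinkedHashMap<>();

    synchronized void scheduleWrite(String entryId) {
        pending.put(entryId, Action.WRITE);
    }

    synchronized void scheduleDelete(String entryId) {
        pending.put(entryId, Action.DELETE);
    }

    /** Hands over everything scheduled so far and empties the buffer. */
    synchronized Map<String, Action> drain() {
        Map<String, Action> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```

Nothing in here is looked up to speed up later reads, which is why "buffer" 
describes it better than "cache".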

Otis

----- Original Message ----
From: Simon Willnauer <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, June 1, 2006 7:37:44 PM
Subject: GData Server - Lucene storage

Hello folks,
as I'm the only developer on the project due to the SummerOfCode
program it is quite a tough task to discuss all the architecture with
you on the mailing list. For this reason I decided to create UML
diagrams to discuss the main components. I will not attach the UML to
the mails but rather upload it to a server so you can download and study
it.
Well, the next thing I have to implement is a storage to store the
entries in. I will provide 2 kinds of storage (lucene and BerkeleyDB
based). The first will be a lucene index to store the entries
identified by the entry ID and feed ID stored in the index as a
Keyword (used to be Field.Keyword). The underlying lucene storage
will only be used to store the entries compressed. Which feed entries
to retrieve from the lucene storage will be based on results of the
indexing/search component as every client request to a gdata server is
a query to the index. So the results of the search are entry ids and a
corresponding feed. These entries will be retrieved from the storage
and sent back to the client. The storage component also provides
delete / update and insert functionality (wouldn't be a storage
without these).
The biggest problem with the lucene storage is to achieve a
transactional state. Imagine the following scenario:
An update request comes in. -> the entry to update will be added to
the lucene writer, which writes the update. But another delete request
has locked the index and an IOException will be thrown. So the update
request will queue the entry and retry to obtain the lock. No
problem so far. But if the index writer can not open the index due to
some other error (the index could not be found) the exception will
also be an IOException. Is there any way to figure out whether the
IOException is caused by a lock, which would be alright, or by
some other serious reason?
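
To make the scenario concrete, here is a sketch of the retry logic I have in
mind, in plain Java without any lucene classes. It assumes the lock case
could be told apart by a dedicated IOException subclass; IndexLockedException,
IndexUpdate and RetryingUpdater are hypothetical names invented for this
example and are not part of lucene.

```java
import java.io.IOException;

/** Hypothetical marker for "the index is locked"; not a lucene class. */
class IndexLockedException extends IOException {
    IndexLockedException(String msg) { super(msg); }
}

/** Hypothetical callback: "open the writer and apply the update". */
interface IndexUpdate {
    void apply() throws IOException;
}

class RetryingUpdater {
    /**
     * Retries only while the index is locked. Any other IOException
     * (for example, index not found) is not caught and propagates at once.
     * Returns the number of attempts that were needed.
     */
    static int runWithRetry(IndexUpdate update, int maxAttempts, long waitMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                update.apply();
                return attempt;
            } catch (IndexLockedException locked) {
                if (attempt >= maxAttempts) {
                    throw locked; // give up: still locked after all attempts
                }
                Thread.sleep(waitMillis); // wait, then try to get the lock again
            }
        }
    }
}
```

Whether the lock case can actually be distinguished like this from the
exception that is thrown today is exactly the open question.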

I added some comments on the UML to describe the arch. to you in more
detail. So please download the file and have a look at it.

http://www.javawithchopsticks.de/webaccess/lucenestorage.pdf

I will appreciate all your comments!!

regards Simon

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




