Re: progress on database decoupling

Joakim Erdfelt Mon, 01 Dec 2008 07:40:25 -0800

One thing I think Brett failed to mention is that this decoupling is
just a step towards making the database an optional component via
the plugin system being worked on by James.


The database is just moving from being a core component to being an
optional component.

- Joakim

On Mon, Dec 1, 2008 at 7:25 AM, Brett Porter <[EMAIL PROTECTED]> wrote:
>
> On 01/12/2008, at 7:17 PM, Brett Porter wrote:
>
>> There is one particular reference to a thread at the bottom of the wiki
>> page linked below, but the main reference thread would be the target
>> architecture one [1] (I'm not sure why Markmail has stopped detecting
>> threads though...).
>>
>> It is not so much to remove, but decouple so that it will run with basic
>> functionality without the database.
>>
>> That theme is probably scattered, so I can summarise:
>> - Derby takes quite a lot of memory, which is a potential hindrance to
>> running your own instance
>> - the performance of populating the database has been poor on a large
>> repository
>
> just to attempt to quantify this, the preliminary results are (37938
> artifacts):
> - current scan: 10 Minutes 54 Seconds (update database, including generating
> checksums)
> - alternate scan: 35 seconds (not generating checksums), 2 Minutes 55
> Seconds (generating checksums)
>
> Not highly scientific - and once fleshed out the metadata writing might
> increase marginally - but I think the magnitude of difference is clear :)
>
> We can also get a decent percentage win just by deferring all of the bits
> that need to read the entire file contents (checksums, jarinfo) to a later
> time, and generate it all at once if possible.
>
>>
>> - harder to diagnose problems when the database is not in a consistent
>> state
>> - we don't particularly take advantage of the "robustness, reliability and
>> scalability" of the database as it effectively acts as a cache for the local
>> storage, doesn't handle concurrent servers, etc.
>>
>> More importantly, there are a number of things about the current design
>> (not necessarily the database) that are a barrier to contribution IMO. Some
>> parts are quite tightly coupled, and the database code is mixed in to the
>> model. There is a mix of using paths and artifact references which causes a
>> lot of back and forward conversions, and some Maven concepts are baked in
>> that don't make sense for other repository types. The over-reliance on
>> scanning, which is a hangover from the very first code I checked in, is
>> biting us worst of all, I think.
>>
>> I hope this all makes sense :)
>>
>> Cheers,
>> Brett
>>
>> [1] http://markmail.org/message/6o6byzjsccgzgkmr
>>
>>
>> On 01/12/2008, at 2:24 PM, Martin Cooper wrote:
>>
>>> Hey Brett,
>>>
>>> Do you have a handy link to the previous discussions you mention? I'm
>>> curious as to why someone would elect to give up the robustness,
>>> reliability
>>> and scalability of a database, since I would have counted those as assets
>>> rather than something to work to remove.
>>>
>>> Thanks!
>>>
>>> --
>>> Martin Cooper
>>>
>
> --
> Brett Porter
> [EMAIL PROTECTED]
> http://blogs.exist.com/bporter/
>
>
