One thing I think Brett failed to mention is that this decoupling is just a step towards making the database an optional component via the plugin system being worked on by James.
The database is just moving from being a core component to being an optional component.

- Joakim

On Mon, Dec 1, 2008 at 7:25 AM, Brett Porter <[EMAIL PROTECTED]> wrote:
>
> On 01/12/2008, at 7:17 PM, Brett Porter wrote:
>
>> There is one particular reference to a thread at the bottom of the wiki
>> page linked below, but the main reference thread would be the target
>> architecture one [1] (I'm not sure why Markmail has stopped detecting
>> threads though...).
>>
>> It is not so much to remove, but decouple so that it will run with basic
>> functionality without the database.
>>
>> That theme is probably scattered, so I can summarise:
>> - Derby takes quite a lot of memory, which is a potential hindrance to
>> running your own instance
>> - the performance of populating the database has been poor on a large
>> repository
>
> Just to attempt to quantify this, the preliminary results are (37938
> artifacts):
> - current scan: 10 Minutes 54 Seconds (update database, including generating
> checksums)
> - alternate scan: 35 seconds (not generating checksums), 2 Minutes 55
> Seconds (generating checksums)
>
> Not highly scientific - and once fleshed out the metadata writing might
> increase marginally - but I think the magnitude of difference is clear :)
>
> We can also get a decent percentage win just by deferring all of the bits
> that need to read the entire file contents (checksums, jarinfo) to a later
> time, and generate it all at once if possible.
>
>>
>> - harder to diagnose problems when the database is not in a consistent
>> state
>> - we don't particularly take advantage of the "robustness, reliability and
>> scalability" of the database as it effectively acts as a cache for the local
>> storage, doesn't handle concurrent servers, etc.
>>
>> More importantly, there are a number of things about the current design
>> (not necessarily the database) that are a barrier to contribution IMO. Some
>> parts are quite tightly coupled, and the database code is mixed into the
>> model. There is a mix of using paths and artifact references, which causes a
>> lot of back-and-forth conversions, and some Maven concepts are baked in
>> that don't make sense for other repository types. The over-reliance on
>> scanning, which is a hangover from the very first code I checked in, is
>> biting us worst of all I think.
>>
>> I hope this all makes sense :)
>>
>> Cheers,
>> Brett
>>
>> [1] http://markmail.org/message/6o6byzjsccgzgkmr
>>
>>
>> On 01/12/2008, at 2:24 PM, Martin Cooper wrote:
>>
>>> Hey Brett,
>>>
>>> Do you have a handy link to the previous discussions you mention? I'm
>>> curious as to why someone would elect to give up the robustness,
>>> reliability and scalability of a database, since I would have counted
>>> those as assets rather than something to work to remove.
>>>
>>> Thanks!
>>>
>>> --
>>> Martin Cooper
>>>
>
> --
> Brett Porter
> [EMAIL PROTECTED]
> http://blogs.exist.com/bporter/
