On 01/12/2008, at 7:17 PM, Brett Porter wrote:
There is one particular reference to a thread at the bottom of the
wiki page linked below, but the main reference thread would be the
target architecture one [1] (I'm not sure why Markmail has stopped
detecting threads though...).
The aim is not so much to remove the database as to decouple it, so
that the application will run with basic functionality without it.
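One way to picture that decoupling is to have the rest of the code talk to a small metadata interface instead of the database. This is a minimal sketch under that assumption: `ArtifactMetadataStore`, `InMemoryMetadataStore` and `RepositoryBrowser` are hypothetical names, not existing code, and a database-backed implementation could sit behind the same interface but is not shown.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical interface: callers depend on this, never on the database directly. */
interface ArtifactMetadataStore {
    void put(String path, String checksum);
    Optional<String> checksum(String path);
}

/** Database-free default: keeps metadata in memory, enough for basic functionality. */
class InMemoryMetadataStore implements ArtifactMetadataStore {
    private final Map<String, String> checksums = new ConcurrentHashMap<>();

    public void put(String path, String checksum) {
        checksums.put(path, checksum);
    }

    public Optional<String> checksum(String path) {
        return Optional.ofNullable(checksums.get(path));
    }
}

/** Hypothetical consumer that works with whichever store it is given. */
class RepositoryBrowser {
    private final ArtifactMetadataStore store;

    RepositoryBrowser(ArtifactMetadataStore store) {
        this.store = store;
    }

    /** Degrades gracefully when metadata has not been computed yet. */
    String describe(String path) {
        return path + " " + store.checksum(path).orElse("(checksum not yet computed)");
    }
}
```

The design choice is simply that missing metadata is treated as "not yet available" rather than as an error, so browsing still works before (or without) any indexing.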
That theme is probably scattered across several threads, so I'll summarise:
- Derby takes quite a lot of memory, which is a potential hindrance
to running your own instance
- the performance of populating the database has been poor on a
large repository
Just to attempt to quantify this, the preliminary results are (37938
artifacts):
- current scan: 10 minutes 54 seconds (updating the database, including
generating checksums)
- alternate scan: 35 seconds without generating checksums, 2 minutes
55 seconds when generating checksums
It's not highly scientific, and once the alternate scan is fleshed out,
writing the metadata might increase its time marginally, but I think
the magnitude of the difference is clear :)
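To make the magnitude concrete, the quoted timings can be turned into speedups and per-artifact costs. This is a minimal Java sketch; `ScanTimings` and its methods are hypothetical names, and the only inputs are the three preliminary timings and the 37938-artifact count quoted above.

```java
/** Hypothetical helper: converts the quoted scan timings into speedups and per-artifact costs. */
class ScanTimings {
    static final int ARTIFACTS = 37938; // repository size quoted above

    /** Converts an "M minutes S seconds" timing into seconds. */
    static int toSeconds(int minutes, int seconds) {
        return minutes * 60 + seconds;
    }

    /** How many times faster the alternate scan is than the current one. */
    static double speedup(int currentSeconds, int alternateSeconds) {
        return (double) currentSeconds / alternateSeconds;
    }

    /** Average wall-clock cost per artifact, in milliseconds. */
    static double msPerArtifact(int totalSeconds) {
        return totalSeconds * 1000.0 / ARTIFACTS;
    }

    public static void main(String[] args) {
        int current = toSeconds(10, 54);        // database update, including checksums
        int altNoChecksums = toSeconds(0, 35);  // alternate scan, no checksums
        int altChecksums = toSeconds(2, 55);    // alternate scan, with checksums
        System.out.printf("without checksums: %.1fx faster (%.2f ms/artifact)%n",
                speedup(current, altNoChecksums), msPerArtifact(altNoChecksums));
        System.out.printf("with checksums: %.1fx faster (%.2f ms/artifact)%n",
                speedup(current, altChecksums), msPerArtifact(altChecksums));
        System.out.printf("current scan: %.2f ms/artifact%n", msPerArtifact(current));
    }
}
```

With 654 s for the current scan against 35 s and 175 s for the alternate one, that works out to roughly 18.7x faster without checksums and 3.7x faster with them, or about 17.2 ms per artifact for the current scan.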
We can also get a decent percentage win just by deferring all of the
bits that need to read the entire file contents (checksums, jarinfo)
to a later time, and generating them all at once where possible.
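The deferral idea above can be sketched as two phases, assuming a plain directory tree of artifact files: a cheap pass that only lists files, then one batch pass that reads each file once to compute its SHA-1. `DeferredScan` and its methods are hypothetical names rather than existing code; a real scan would presumably also extract the jar information in that second pass (not shown).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical two-phase scan: cheap listing first, content reading deferred to one batch. */
class DeferredScan {

    /** Phase 1: find regular files without opening them (no checksums, no jar inspection). */
    static List<Path> listArtifacts(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Phase 2: read each queued file exactly once and compute its SHA-1. */
    static Map<Path, String> checksumAll(List<Path> queued) {
        Map<Path, String> result = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        for (Path file : queued) {
            MessageDigest sha1 = newSha1();
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    sha1.update(buffer, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Zero-padded 40-character hex string for the 160-bit digest.
            result.put(file, String.format("%040x", new BigInteger(1, sha1.digest())));
        }
        return result;
    }

    private static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-1.
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }
}
```

Keeping phase 1 free of content reads is what lets the listing finish quickly; phase 2 can then run later, in the background, or only for files that changed.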
- harder to diagnose problems when the database is not in a
consistent state
- we don't particularly take advantage of the "robustness,
reliability and scalability" of the database, as it effectively acts
as a cache for the local storage, doesn't handle concurrent servers,
etc.
More importantly, there are a number of things about the current
design (not necessarily the database) that are a barrier to
contribution IMO. Some parts are quite tightly coupled, and the
database code is mixed in to the model. There is a mix of using
paths and artifact references which causes a lot of back and forward
conversions, and some Maven concepts are baked in that don't make
sense for other repository types. The over-reliance on scanning
which is a hang over from the very first code I checked in is biting
us worst of all I think.
I hope this all makes sense :)
Cheers,
Brett
[1] http://markmail.org/message/6o6byzjsccgzgkmr
On 01/12/2008, at 2:24 PM, Martin Cooper wrote:
Hey Brett,
Do you have a handy link to the previous discussions you mention? I'm
curious as to why someone would elect to give up the robustness,
reliability and scalability of a database, since I would have counted
those as assets rather than something to work to remove.
Thanks!
--
Martin Cooper
--
Brett Porter
http://blogs.exist.com/bporter/