On 01/12/2008, at 7:17 PM, Brett Porter wrote:

There is one particular reference to a thread at the bottom of the wiki page linked below, but the main reference thread would be the target architecture one [1] (I'm not sure why Markmail has stopped detecting threads though...).

The aim is not so much to remove the database as to decouple it, so that the basic functionality runs without the database.

That theme is probably scattered, so I can summarise:
- derby takes quite a lot of memory, which is a potential hindrance to running your own instance
- the performance of populating the database has been poor on a large repository

Just to attempt to quantify this, the preliminary results (37938 artifacts) are:
- current scan: 10 minutes 54 seconds (update database, including generating checksums)
- alternate scan: 35 seconds (not generating checksums), 2 minutes 55 seconds (generating checksums)

Not highly scientific - and once fleshed out the metadata writing might increase marginally - but I think the magnitude of difference is clear :)

We can also get a decent percentage win just by deferring all of the bits that need to read the entire file contents (checksums, jarinfo) to a later time, and generate it all at once if possible.
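A minimal sketch of that deferral idea, not the actual scanner code: the first pass only walks the tree and records paths, and a separate later pass reads each file once to compute its checksum. The class and method names here are hypothetical, and SHA-1 is an assumption chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical illustration of a two-phase scan; names are not from the real code base.
final class DeferredScanSketch {

    // Phase 1: walk the tree and record regular files only.
    // No file contents are read here, so this pass stays cheap.
    static List<Path> scan(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> found = new ArrayList<>();
            paths.filter(Files::isRegularFile).sorted().forEach(found::add);
            return found;
        }
    }

    // Phase 2: run later, in one batch. Each file is read exactly once
    // and its SHA-1 (an assumed algorithm) is computed while streaming.
    static Map<Path, String> checksums(List<Path> files)
            throws IOException, NoSuchAlgorithmException {
        Map<Path, String> result = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        for (Path file : files) {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            try (InputStream in = new DigestInputStream(Files.newInputStream(file), sha1)) {
                while (in.read(buffer) != -1) {
                    // Reading through the stream updates the digest.
                }
            }
            result.put(file, toHex(sha1.digest()));
        }
        return result;
    }

    // Render digest bytes as lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Anything else that needs the whole file (jarinfo, for example) could hang off the same second pass, so the contents are only read once.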


- harder to diagnose problems when the database is not in a consistent state
- we don't particularly take advantage of the "robustness, reliability and scalability" of the database as it effectively acts as a cache for the local storage, doesn't handle concurrent servers, etc.

More importantly, there are a number of things about the current design (not necessarily the database) that are a barrier to contribution IMO. Some parts are quite tightly coupled, and the database code is mixed into the model. There is a mix of paths and artifact references, which causes a lot of back-and-forth conversions, and some Maven concepts are baked in that don't make sense for other repository types. The over-reliance on scanning, which is a hangover from the very first code I checked in, is biting us worst of all I think.

I hope this all makes sense :)

Cheers,
Brett

[1] http://markmail.org/message/6o6byzjsccgzgkmr


On 01/12/2008, at 2:24 PM, Martin Cooper wrote:

Hey Brett,

Do you have a handy link to the previous discussions you mention? I'm
curious as to why someone would elect to give up the robustness, reliability and scalability of a database, since I would have counted those as assets
rather than something to work to remove.

Thanks!

--
Martin Cooper


--
Brett Porter
[EMAIL PROTECTED]
http://blogs.exist.com/bporter/
