Re: Initial roadmap discussion

Raffaele P. Guidi Sun, 09 Oct 2011 03:02:22 -0700

PS: as you may have noticed, I've put the proposal on a wiki page
http://wiki.apache.org/directmemory/ProposedRoadmap for ease of use and
readability.


Ciao,
     R

On Sun, Oct 9, 2011 at 10:44 AM, Raffaele P. Guidi <[email protected]> wrote:

> Gentlemen, welcome and thank you for joining in (and for the opportunity, for
> me and the project, to join the ASF, which is great). I wrote some notes
> about the current state of the project and some hypotheses about future
> developments that I would like to discuss with you all. These are the
> items I would like to discuss (sorry for being a bit lengthy):
>
>    - *Design choices*
>    - *New features*
>    - *Integration with other products*
>    - *Build, Test and Continuous Integration strategy*
>    - *Miscellanea*
>
> *Design choices*
> *I recently rewrote DM entirely to simplify it. It used to have three
> layers (heap, off-heap, file/NoSQL) and automatically pushed items
> forward/backward in the chain according to their usage. This turned out to be
> overly complicated and mostly inefficient at runtime (probably mostly because
> of my poor implementation). The singleton facade is proving simple and
> effective, and it reflects well the nature of direct memory, which cannot
> really be freed. But this needs a strategy for feature and behaviour
> composability.*
>
>    - *Singleton approach* (largely inspired by Play!) - is it good?
>    - *Feature and behaviour composability* (DI and feature injection? A
>    plugin system? OSGi?) - *let's just keep things simple and developer
>    friendly*
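One way to reconcile the singleton facade with composability, without a plugin framework, is plain strategy injection: the facade stays a single static entry point while the storage behaviour sits behind a small interface. This is a minimal sketch; `Storage`, `HeapStorage` and `CacheFacade` are hypothetical illustration types, not DirectMemory's actual classes.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical strategy interface: each storage layer (heap, off-heap, file)
// would provide its own implementation.
interface Storage {
    void put(String key, byte[] value);
    byte[] get(String key);
}

// Hypothetical heap-only implementation, used here as the default strategy.
final class HeapStorage implements Storage {
    private final Map<String, byte[]> map = new ConcurrentHashMap<>();

    public void put(String key, byte[] value) { map.put(key, value); }

    public byte[] get(String key) { return map.get(key); }
}

// Hypothetical singleton facade: one static entry point whose behaviour is
// chosen by injecting a Storage, instead of a plugin system.
final class CacheFacade {
    private static volatile Storage storage = new HeapStorage();

    private CacheFacade() { }

    // Swap the storage strategy, e.g. once at application startup.
    static void configure(Storage s) { storage = Objects.requireNonNull(s); }

    static void put(String key, byte[] value) { storage.put(key, value); }

    static byte[] get(String key) { return storage.get(key); }
}
```

Usage would be `CacheFacade.configure(new HeapStorage())` at startup, then `CacheFacade.put(...)` and `CacheFacade.get(...)`. A DI container or an OSGi service listener could call `configure` later, so starting this simple does not rule out the heavier options.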
>
> *New features*
> Adding simple heap cache features would spread usage among those who think
> they will EVENTUALLY need a huge off-heap one (I believe that's the vast
> majority of our potential "customers"). The same goes for file and distributed
> ones. Having all three would qualify DM as an Enterprise Ready (please
> notice the capitalization ;) cache.
>
>    - *Heap storage* - Guava already fits the requirement, of course. We
>    could use the heap as a "queue" to speed up inserts and serialize later,
>    and/or keep the most frequently used items on the heap for speed. It's more a
>    design choice than a technical one
>    - *File storage* - this would be easy to achieve with the same "index"
>    strategy as the off-heap one (I believe JCS does the same)
>    - *Lateral storage* (distributed or replicated) - A possible way to do
>    this: *Hazelcast* for map distribution and *Apache Thrift* for inter-node
>    communication (node A needs an item stored in node B and then asks for
>    it). I'm not sure Hazelcast would perform as well as Guava with
>    multi-million item maps; it has to be thoroughly tested for performance and
>    memory consumption. Should Hazelcast not meet the performance requirement, we
>    should find an alternative way to distribute/replicate the map across nodes.
>    *JGroups* with multicasting would be perfect, but it's LGPL (well, JCS
>    uses it), and of course a custom, maybe Thrift-based, distribution
>    mechanism could be written ad hoc
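For the heap-storage option of keeping the most frequently used items on the heap, here is a minimal sketch assuming a hypothetical `TieredStore` and a simplified, hypothetical `Pointer` (an offset and length into one direct `ByteBuffer`): an LRU map on the heap sits in front of an append-only off-heap area indexed by key. The LRU part relies on standard JDK behaviour, `LinkedHashMap` in access order with `removeEldestEntry`.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified pointer: where a serialized value lives off-heap.
final class Pointer {
    final int offset;
    final int length;

    Pointer(int offset, int length) { this.offset = offset; this.length = length; }
}

// Hypothetical two-tier store: a small LRU map on the heap in front of an
// append-only direct buffer indexed by key -> Pointer. Not thread-safe.
final class TieredStore {
    private final ByteBuffer offHeap;
    private final Map<String, Pointer> index = new HashMap<>();
    private final Map<String, byte[]> heap;

    TieredStore(int offHeapBytes, final int heapEntries) {
        this.offHeap = ByteBuffer.allocateDirect(offHeapBytes);
        // accessOrder=true makes LinkedHashMap an LRU map; the least recently
        // used entry is evicted once more than heapEntries are held.
        this.heap = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > heapEntries;
            }
        };
    }

    void put(String key, byte[] value) {
        if (value.length > offHeap.remaining()) {
            throw new IllegalStateException("off-heap buffer full");
        }
        int offset = offHeap.position();
        offHeap.put(value);   // copy into direct memory; old space is never reused
        index.put(key, new Pointer(offset, value.length));
        heap.put(key, value); // keep a hot copy on the heap
    }

    byte[] get(String key) {
        byte[] hot = heap.get(key); // a hit also refreshes the LRU order
        if (hot != null) return hot;
        Pointer p = index.get(key);
        if (p == null) return null;
        byte[] out = new byte[p.length];
        ByteBuffer view = offHeap.duplicate(); // independent position/limit
        view.position(p.offset);
        view.get(out);
        heap.put(key, out);   // promote back to the heap tier
        return out;
    }

    int heapSize() { return heap.size(); }
}
```

Space taken by overwritten keys is never reclaimed here, which mirrors the point that direct memory cannot really be freed; a real allocator would need slab reuse or compaction.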
>
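The file-storage bullet suggests reusing the off-heap "index" strategy. A sketch under that assumption: values are appended to a file through `FileChannel` positional writes and reads, and an in-memory map keeps key to (offset, length). `FileStore` and `Slot` are hypothetical names; durability (e.g. `FileChannel.force`) and recovery of the index after a restart are left out.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical file-backed store: append-only data file plus an in-memory
// index of key -> (offset, length). Not thread-safe.
final class FileStore implements AutoCloseable {
    private static final class Slot {
        final long offset;
        final int length;

        Slot(long offset, int length) { this.offset = offset; this.length = length; }
    }

    private final FileChannel channel;
    private final Map<String, Slot> index = new HashMap<>();
    private long end = 0; // next free position in the file

    FileStore(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
    }

    void put(String key, byte[] value) throws IOException {
        ByteBuffer src = ByteBuffer.wrap(value);
        long offset = end;
        while (src.hasRemaining()) {
            end += channel.write(src, end); // positional write, may be partial
        }
        index.put(key, new Slot(offset, value.length));
    }

    byte[] get(String key) throws IOException {
        Slot s = index.get(key);
        if (s == null) return null;
        ByteBuffer dst = ByteBuffer.allocate(s.length);
        long pos = s.offset;
        while (dst.hasRemaining()) {
            int n = channel.read(dst, pos); // positional read, may be partial
            if (n < 0) throw new IOException("unexpected end of file");
            pos += n;
        }
        return dst.array();
    }

    @Override
    public void close() throws IOException { channel.close(); }
}
```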
> *Integration with other products*
> Providing plugins, integration or just support with/for other
> technologies/products would of course spread adoption. These are the first
> few that pop into my mind at the moment:
>
>    - *Apache Cayenne integration* - do I need to say why? ;)
>    - *Play! Framework integration* - because I simply love Play! and use
>    it in other side projects whenever I need a web/mobile front-end
>    - *Memcached* (like) *integration* - DirectMemory can be seen as an
>    embedded memcached, and adopting its protocol would make it a good fit for
>    replacing memcached in distributed scenarios, especially when it's used by
>    Java applications
>    - *Scala, Clojure and other JVM languages* integration - emerging
>    technologies that deserve attention. If I had 48-hour days I would use
>    the other 24 to improve my Scala skills and rewrite DirectMemory with it
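If DirectMemory spoke memcached's protocol, existing clients could talk to it unchanged. A minimal sketch, assuming only a small subset of the memcached text protocol (`set` and single/multi-key `get`) on top of an in-memory map; `MemcachedTextHandler` is hypothetical, `exptime` and `noreply` are ignored, and `delete`, CAS and the binary protocol are not covered.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical handler for a tiny subset of the memcached text protocol:
//   set <key> <flags> <exptime> <bytes>\r\n<data>\r\n  -> STORED\r\n
//   get <key> [<key> ...]\r\n  -> VALUE <key> <flags> <bytes>\r\n<data>\r\n ... END\r\n
// Any other command is answered with ERROR\r\n.
final class MemcachedTextHandler {
    private static final class Entry {
        final int flags;
        final byte[] data;

        Entry(int flags, byte[] data) { this.flags = flags; this.data = data; }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();

    // Processes commands until the input ends, writing replies to out.
    void serve(InputStream in, OutputStream out) throws IOException {
        String line;
        while ((line = readLine(in)) != null) {
            String[] t = line.trim().split(" +");
            if (t[0].equals("set") && t.length >= 5) {
                int flags = Integer.parseInt(t[2]);
                byte[] data = readFully(in, Integer.parseInt(t[4]));
                readLine(in); // consume the \r\n after the data block
                store.put(t[1], new Entry(flags, data));
                write(out, "STORED\r\n");
            } else if (t[0].equals("get") && t.length >= 2) {
                for (int i = 1; i < t.length; i++) {
                    Entry e = store.get(t[i]);
                    if (e == null) continue; // misses are simply omitted
                    write(out, "VALUE " + t[i] + " " + e.flags + " " + e.data.length + "\r\n");
                    out.write(e.data);
                    write(out, "\r\n");
                }
                write(out, "END\r\n");
            } else {
                write(out, "ERROR\r\n");
            }
        }
        out.flush();
    }

    // Reads one line terminated by \n, dropping \r; returns null at end of input.
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') buf.write(b);
        }
        if (b == -1 && buf.size() == 0) return null;
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }

    private static byte[] readFully(InputStream in, int len) throws IOException {
        byte[] data = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(data, off, len - off);
            if (n < 0) throw new IOException("truncated data block");
            off += n;
        }
        return data;
    }

    private static void write(OutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.US_ASCII));
    }
}
```

A socket server would call `serve(socket.getInputStream(), socket.getOutputStream())` per connection; replacing the map with an off-heap store is where DirectMemory would plug in.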
>
> *Miscellanea*
> There are, of course, a lot of things that are not essential but could be
> investigated:
>
>    - *HugeArrayList, FastMap*, etc. - DirectMemory currently uses Guava
>    for the Map and an ArrayList (I know it's not thread-safe, but thread
>    safety may not really be required) for the Pointer index. Evaluating other
>    fast, low-memory-footprint Map and List implementations could bring
>    performance improvements
>    - *Reliability improvements* - DirectMemory is fast partly because it
>    sacrifices reliability - is that always a good trade-off? Could we provide
>    configuration or pluggable implementations for different usage scenarios,
>    at least for the MemoryManager? Or even transactionality?
>    - *Would Hadoop need off-heap caching?* (this is a good one)
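On low-memory-footprint indexes: one option, an assumption rather than what DirectMemory does today, is to avoid one object per pointer by packing each (offset, length) pair into a single `long` inside a growable primitive array. `PackedPointerList` is a hypothetical name, and offsets are limited to 32 bits in this sketch.

```java
import java.util.Arrays;

// Hypothetical compact list of (offset, length) pointers. Each pointer is packed
// into one long (high 32 bits = offset, low 32 bits = length), so the index costs
// 8 bytes per entry instead of an object plus a reference. Not thread-safe.
final class PackedPointerList {
    private long[] data = new long[16];
    private int size = 0;

    static long pack(int offset, int length) {
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }

    static int offsetOf(long packed) { return (int) (packed >>> 32); }

    static int lengthOf(long packed) { return (int) packed; }

    // Appends a pointer and returns its index.
    int add(int offset, int length) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // amortized O(1) growth
        }
        data[size] = pack(offset, length);
        return size++;
    }

    int offset(int i) { checkIndex(i); return offsetOf(data[i]); }

    int length(int i) { checkIndex(i); return lengthOf(data[i]); }

    int size() { return size; }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
    }
}
```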
>
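For the reliability trade-off, here is a sketch of what a pluggable policy could look like: a hypothetical `Integrity` enum with a `FAST` mode that stores raw bytes and a `CHECKED` mode that prefixes each entry with a `java.util.zip.CRC32` checksum, so corruption is detected on read at the cost of 8 extra bytes and some CPU per entry. It only detects damage; it does not provide transactionality.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical pluggable integrity policy applied to each stored entry.
enum Integrity {
    // Stores the bytes as they are: fastest, no corruption detection.
    FAST {
        byte[] encode(byte[] value) { return value; }

        byte[] decode(byte[] stored) { return stored; }
    },
    // Prefixes the bytes with their CRC32 and verifies it on every read.
    CHECKED {
        byte[] encode(byte[] value) {
            ByteBuffer buf = ByteBuffer.allocate(8 + value.length);
            buf.putLong(crc(value)).put(value);
            return buf.array();
        }

        byte[] decode(byte[] stored) {
            ByteBuffer buf = ByteBuffer.wrap(stored);
            long expected = buf.getLong();
            byte[] value = new byte[buf.remaining()];
            buf.get(value);
            if (crc(value) != expected) {
                throw new IllegalStateException("checksum mismatch: entry is corrupted");
            }
            return value;
        }
    };

    abstract byte[] encode(byte[] value);

    abstract byte[] decode(byte[] stored);

    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }
}
```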
> *Build, Test and Continuous Integration strategy*
> *The key point for DM is testing performance with large amounts of
> memory, where the minimum should be more than the 2GB typically used by
> web applications - the more the better.*
>
>    - *Testing infrastructure* - I currently use an Amazon machine with
>    16+GB RAM (which costs ~$1 per hour); it's a bit tedious and time-consuming
>    to start up and deploy on (it would require some scripting), and of course
>    continuous performance testing is too expensive - alternatives?
>    - *Branching strategy* - I don't like feature branches, as I believe
>    feature composability should not be done at the SCM level (and SVN is
>    probably a bit too slow for them), and I don't believe in using only release
>    branches either. I don't know whether there's an Apache standard, but I usually
>    work with *spike* branches (where a spike is more than a single feature and
>    less than a whole release) and then publish on release branches, tagging for
>    events (production, distribution, etc.). Does that sound good to you?
>    - *Binary packaging and demo applications* - I used to provide a binary
>    distribution and a simple web application to test against, but it was simply
>    too much effort for me alone
>    - *OSGi bundling* - it costs very little and can be quite useful
>    - *Maven repository* - I've applied for a Sonatype repo registration
>    but simply didn't have enough time to complete it, so I'm using a GitHub
>    folder as a repository. I guess artifacts would naturally go into the Apache
>    repos from now on, right?
>    - *Testing and certification* on different JVMs and OSes (Sun, OpenJDK,
>    IBM; Windows, Linux, AIX? Solaris?)
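For cheaper performance runs and the JVM/OS certification matrix, here is a sketch of a tiny, dependency-free smoke benchmark. It writes and reads fixed-size records in a direct buffer and tags the result with the JVM vendor/version and OS name/arch taken from standard system properties. `OffHeapBench` is hypothetical; it has no warm-up or repetitions, so it is only good for rough comparisons between machines.

```java
import java.nio.ByteBuffer;

// Hypothetical micro-benchmark: write fixed-size records into a direct buffer,
// read them back, and report timings together with the JVM/OS they ran on.
final class OffHeapBench {
    static final class Result {
        final String environment;
        final int records;
        final long writeNanos;
        final long readNanos;
        final long checksum; // summed during reads so the JIT cannot drop the loop

        Result(String environment, int records, long writeNanos, long readNanos, long checksum) {
            this.environment = environment;
            this.records = records;
            this.writeNanos = writeNanos;
            this.readNanos = readNanos;
            this.checksum = checksum;
        }

        @Override
        public String toString() {
            return String.format("%s: %d records, write %.1f ms, read %.1f ms",
                    environment, records, writeNanos / 1e6, readNanos / 1e6);
        }
    }

    // JVM vendor/version and OS name/arch, for the certification matrix.
    static String environment() {
        return System.getProperty("java.vendor") + " " + System.getProperty("java.version")
                + " / " + System.getProperty("os.name") + " " + System.getProperty("os.arch");
    }

    static Result run(int recordSize, int records) {
        if (recordSize < 1 || records < 0) throw new IllegalArgumentException("bad sizes");
        // multiplyExact throws instead of silently overflowing the int capacity
        ByteBuffer buf = ByteBuffer.allocateDirect(Math.multiplyExact(recordSize, records));
        byte[] out = new byte[recordSize];
        long t0 = System.nanoTime();
        for (int i = 0; i < records; i++) {
            out[0] = (byte) i; // tag each record so the read loop has work to check
            buf.put(out);
        }
        long t1 = System.nanoTime();
        buf.flip();
        byte[] in = new byte[recordSize];
        long checksum = 0;
        while (buf.hasRemaining()) {
            buf.get(in);
            checksum += in[0];
        }
        long t2 = System.nanoTime();
        return new Result(environment(), records, t1 - t0, t2 - t1, checksum);
    }
}
```

A single `ByteBuffer` is capped at `Integer.MAX_VALUE` bytes, so testing well beyond 2GB needs several buffers, and on HotSpot the total amount of direct memory is bounded by `-XX:MaxDirectMemorySize`.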
>
> *Roadmap*
> I would say that intensive performance testing and certification would make
> a solid 0.7 GA release; including heap and file storage would make a pretty
> good 1.0 (and distributed storage would make it incredible!)
>
> Looking forward to your feedback.
>
> Cheers,
>      Raffaele
>
