>   The use case is defined very loosely: a Content Repository
> System that is horizontally scalable on commodity hardware,
> with data preferably stored in a way that parallel processing can
> easily take place. This means the data should at least be easily
> exportable to an HDFS where MapReduce processing of the data may take
> place efficiently.

Those sound like some great goals that should inform Fedora's
architecture nicely.

>   Our current idea is to use Fedora on the backend of the system,
> therefore I'm quite interested in developing around the HighlevelStore
> ideas you guys were thinking about implementing.

Once 3.5 is out the door, we should bring this up at one of the Tuesday
developer meetings.  A few weeks ago, there was a suggestion that we
should define the essence of what Fedora 4.0 is (HighlevelStorage
interface?  REST API cleanup?  Architectural changes such as removing
FieldSearch from the core?  Removing classic authn, leaving only FeSL?
etc.), and produce a "quick" alpha or two.  With some tweaks, we could
very well decide that we want the proofs of concept to be the basis for
the initial HighlevelStorage implementation.


>   Also we have to take the different sizes of digital objects
> into account, since storage of small files is inefficient in HDFS and
> big files are inefficient in HBase, but the system should be designed
> so that it works as well with terabyte-sized media files as with small
> text objects. So we're thinking about deciding based on the object's
> size where to put it: in HDFS, in an HBase table, or in Hadoop
> archives, sequence files or map files.

I see.  That was one of the initial motivations for HighlevelStorage:
create a better separation of concerns so that all decisions for where
and how to store content would be made below the "fedora logic" layer.
For example, it is possible to use Akubra (the blob store used in
Fedora) to direct blobs to different stores based on some sort of
attribute, but that would be difficult and/or limited within the
current architecture.
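
As a rough illustration of the size-based routing being discussed, here
is a minimal sketch.  It is not Akubra or Fedora code: the names Blob
and choose_store and the 10 MiB threshold are all made up for the
example, and a real cut-off would depend on things like the HDFS block
size and how HBase handles large cells.

```python
from dataclasses import dataclass

# Assumption for illustration only: objects up to this size are treated
# as "small" and sent to HBase; everything larger goes to HDFS.
SMALL_OBJECT_MAX_BYTES = 10 * 1024 * 1024  # 10 MiB, an arbitrary choice


@dataclass(frozen=True)
class Blob:
    """Hypothetical stand-in for a stored datastream or object."""
    blob_id: str
    size_bytes: int


def choose_store(blob: Blob) -> str:
    """Pick a backing store name from the blob's size alone."""
    if blob.size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if blob.size_bytes <= SMALL_OBJECT_MAX_BYTES:
        return "hbase"  # small objects: avoid many tiny files in HDFS
    return "hdfs"       # large objects: avoid oversized cells in HBase
```

The point is only that the decision lives in one small function below
the "fedora logic" layer, so the threshold or the set of stores could
change without touching anything above it.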

>   The HBaseFieldSearch I implemented is nothing but a simple PoC, which
> operated on the HBase table structure which I wrote about last time.
>   But this is in no way how we think it should look in the end, and I
> think it's much more likely that we will operate on some kind of index,
> probably some Lucene index or even a whole Solr server.

That makes sense.  We have spoken on this list about (a) moving
FieldSearch outside of Fedora core and (b) using an external gsearch
instance for searching and indexing the former FieldSearch content, if
desired.  gsearch is (or can be) backed by a Lucene index.

> Hope that cleared up a bit what I'm trying to achieve, but please feel
> free to ask me any questions...

Will do, thanks

  -Aaron



_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
