Hi Frank!

Without a fairly significant rewrite of key bits of Fedora, implementing persistence on HBase+HDFS would currently be pretty difficult. Supporting that kind of big change is what the High Level Storage effort is all about. So far, in our development discussions, we have been talking about High Level Storage as a 4.0 thing, which is probably at least a year away.

To be honest, the High Level Storage effort has moved much more slowly than most of us would have liked (nobody's fault; most of us just have higher priorities we're busy working on), and I think we all agree that some real prototyping and experimentation is needed at this point to move the work forward. So I think it's great that you're digging in and experimenting with HBase+HDFS. I hope some of your findings can help to inform the High Level Storage effort down the road (whatever that becomes).

> 1.) From what i've seen in the fedora code, having fedora use HBase
> instead of a relational DB, would encompass implementations for:
> - org.fcrepo.server.management.PIDGenerator
> - org.fcrepo.server.storage.DOManager
> - org.fcrepo.server.storage.lowlevel.PathRegistry
> - org.fcrepo.server.utilities.rebuild.Rebuilder
> Is this correct or am i missing some classes/interfaces here?

I don't think a PathRegistry is really necessary, as that's an implementation detail of the legacy llstore implementation. If you're using an akubra-based llstore plugin, I don't think that class should be in use at all.

A couple of missing classes that come to mind here are the ResourceIndex and FieldSearch modules. By design, these are not critical to the operation of Fedora as a service; in fact, risearch is explicitly optional. However, parts of the REST API as currently defined won't work if you don't have a FieldSearch replacement in place, in particular /fedora/objects?(search criteria).

I think, longer-term, both of these components really belong outside the core repository service. So if I take a long view of what you're doing, I see absolutely no problem with ignoring them for now.

--

As a related issue, I know a lot of folks have been thinking lately about what it would take to make Fedora scale horizontally. There are many possible approaches that could be taken: some more traditional Java clustering approaches that allow for shared state (e.g. Terracotta), and more "web-scale" approaches that involve minimal shared state and minimal points of failure. HBase+HDFS falls into the latter category.

I have actually been wondering about Apache Cassandra lately as a possible solution. As you may know, Cassandra is really not designed for dealing with very large files, but it is truly a "shared nothing" persistence solution that does not have a single point of failure. HBase also looked promising to me, but I noticed that it does have at least one SPOF by design (the HDFS NameNode). Cassandra also appeared to have a slightly larger community around it at the moment.

Did you also consider Cassandra in your effort? I'm curious what your evaluation criteria were if you did.

Thanks,
Chris
