Comments inline:

On Thu, May 30, 2013 at 09:42:29PM +0000, Edison Su wrote:
> > -----Original Message-----
> > From: John Burwell [mailto:jburw...@basho.com]
> > Sent: Thursday, May 30, 2013 7:43 AM
> > To: dev@cloudstack.apache.org
> > Subject: Re: [MERGE]object_store branch into master
> >
> > It feels like we have jumped to a solution without completely understanding
> > the scope of the problem and the associated assumptions. We have a
> > community of hypervisor experts who we should consult to ensure we have
> > the best solution. As such, I recommend mailing the list with the specific
> > hypervisors and functions that you have been unable to interface to storage
> > that does not present a filesystem. I do not recall seeing such a discussion
> > on the list previously.
>
> If people use zone-wide primary storage, like Ceph or SolidFire, then suddenly
> there is no need for NFS cache storage: the zone-wide storage can be treated
> as both primary and secondary storage, with S3 as the backup storage. It's a
> simple but powerful solution. Why can't we just add code to support these
> exciting new solutions? It's hard to do on the master branch, which is why Min
> and I worked hard to refactor the code and remove the NFS secondary storage
> dependency from the management server as much as possible. As we all know,
> NFS secondary storage is not scalable, no matter how fancy an aging policy or
> how advanced a capacity planner you have.
>
> And that's one reason I don't care that much about the issue with NFS cache
> storage. Couldn't we put our energy into cloud-style storage solutions instead
> of the un-scalable storage?
Per your comment about you and Min working hard on this: nobody is saying that
you didn't. This isn't personal (or shouldn't be). These are questions that are
part of a consensus-based approach to development.

> > As I understand the goals of this enhancement, we will support additional
> > secondary storage types and remove the assumption that secondary storage
> > will always be NFS or have a filesystem. As such, when a non-NFS type of
> > secondary storage is employed, NFS is no longer the repository of record
> > for this data. We can always exceed available space in the repository of
> > record, and the failure scenarios are relatively well understood (4.1.0) --
> > operations will fail quickly and obviously. However, as a transitory
> > staging storage mechanism (4.2.0), the expectation of the user is that the
> > NFS storage will not be as reliable or large. If the only solution we can
> > provide for this problem is to recommend an NFS "cache" that is equal to
> > the size of the object store itself, then we have made little to no
> > progress addressing our users'
> No, it's not true. Admins can add multiple NFS cache storages if they want;
> there is no requirement that the NFS storage be the same size as the object
> store. I can't be that stupid. It's the same thing we are doing on the master
> branch: the admin knows that one NFS secondary storage is not enough, so they
> can add multiple NFS secondary storages. And on the master branch there is no
> capacity planner for NFS secondary storage; the code just randomly chooses
> one of the NFS secondary storages, even if one of them is full. Yes, NFS
> secondary storage on master can fill up, and there is no way to age anything
> out.
>
> The current object_store branch has the same behavior: the admin can add
> multiple NFS cache storages, and there is no capacity planner. But if the NFS
> cache storage is full, the admin can simply remove the DB entries related to
> the cached objects and clean up the NFS cache storage, and then suddenly
> everything just works.
>
> From an implementation point of view, I don't think there is any difference.

It's an expectation issue. Operators expect to be able to manage their storage
capacity. So the question is, for the NFS "cache", how do they plan size
requirements and manage that capacity?

> > needs. Fundamentally, the role of NFS is different in 4.2.0 than in 4.1.0.
> > Therefore, I disagree with the assertion that the issue is present in
> > 4.1.0.
> The role of NFS can change, but both share the same problem: no capacity
> planner, no aging-out policy.

Secondary storage capacity management is much easier to grok for operators. I
would bet that almost 100% of the time, their usage grows on a particular
slope, allowing them to plan and allocate more when needed. For the NFS
"cache", the lifecycle of objects stored in that location, especially cleanup
routines, is going to be critical to the healthy operation of that environment.
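To make that concrete, here is a minimal sketch of the kind of capacity-aware
cache-store selection being discussed. CacheStore and CacheStorePicker are
invented names for illustration only, not classes from either branch:

    import java.util.List;

    /**
     * Hypothetical sketch only: CacheStore and its accessors are invented
     * names, not interfaces from the object_store branch.
     */
    interface CacheStore {
        String getUri();              // e.g. nfs://host/export
        long getCapacityBytes();      // total size of the export
        long getUsedBytes();          // space currently consumed by cached objects
    }

    final class CacheStorePicker {
        /**
         * Pick the cache store with the most free space that can still hold
         * the object, instead of picking one at random. Returns null if
         * nothing fits, so the caller can fail fast or trigger cleanup first.
         */
        CacheStore pick(List<CacheStore> stores, long objectSizeBytes) {
            CacheStore best = null;
            long bestFree = -1;
            for (CacheStore store : stores) {
                long free = store.getCapacityBytes() - store.getUsedBytes();
                if (free >= objectSizeBytes && free > bestFree) {
                    best = store;
                    bestFree = free;
                }
            }
            return best;
        }
    }

The point where pick() returns null would also be the natural hook for an
aging policy -- evicting cached objects that are already safely in the object
store -- rather than asking the admin to delete DB entries by hand.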
> > An additional risk in the object_store implementation is that we lead a
> > user to believe their data has been stored in reliable storage (e.g., S3,
> > Riak CS, etc.) when it may not. I saw no provision in the object_store to
> > retry transfers if
> I don't know from which code you get this kind of conclusion. Could you point
> it out in the code? AFAIK, the object can only be either stored in S3 or not
> stored in S3; I don't know how the object can be in a wrong state.
>
> > the object_store transfer fails or becomes unavailable. In 4.0.0/4.1.0, if
> > we can't connect to S3 or Swift, a background process continuously retries
> > the upload until successful.
> Here is the interesting situation: how does the mgmt server or the admin know
> that the background process pushed the objects into S3 successfully? There is
> no guarantee the background process will succeed, and there is no status
> tracking for it, right?
>
> What I am doing on the object_store branch is this: if pushing an object into
> S3 fails, then the whole backup process fails, and the admin or user needs to
> send another API request to push the object into S3. This guarantees that the
> operation will either succeed or fail, instead of ending in the unknown state
> we have on the master branch.

That's the right approach IMO (at least it's correct, per the current model of
operations either working or not).

> > Finally, I see this issue as a design issue rather than a bug. I don't
> > think we should
> Again, I don't think it's a design issue. As I said above, it's a bug, and
> both the master branch and object_store have the same bug. It can be fixed,
> and it is easier to fix on object_store than on master. And it's not an
> important issue compared to supporting cloud-style storage solutions.

Can we discuss fixing it in the object_store branch then?

> > Given the different use of NFS in the object_store branch vs. current, I
> > don't see the comparison in this case. In the current implementation, when
> > we exhaust space, we are truly out of resource. However, in the
> > object_store branch, we have no provision to remove stale data, and we may
> > report no space available when there is plenty of space available in the
> > underlying object store. In this scenario, the NFS "cache" becomes an
> > artificial limiter on the capacity of the system. I do not understand how
> > we have this problem in current since the object store is only a backup of
> > secondary storage -- not secondary storage itself.
> As I said before, no matter what the role of the NFS storage is, it shares
> the same issue: the NFS storage can run out of capacity, and there is no
> capacity planner and no aging policy.

But as I noted above, the operator's planning process will be quite difficult.

> > It is my estimate that robust error handling will require design changes
> > (e.g. introduction of a resource reservation mechanism, introduction of
> > additional exception classes, enhancement of interfaces to provide more
> > context regarding client intentions, etc.) yielding significant code
> > impact. These changes need to be undertaken in a holistic manner with
> > minimum risk to master. Fundamentally, we should not be merging code to
> > master with known significant issues. When it goes to master, we should be
> > saying, "To the best of my knowledge and developer testing, there are no
> > blocker or critical issues." In my opinion, omission of robust error
> > handling does not meet that standard.
> To be realistic, on the mgmt server there is only one class that depends on
> cache storage, and only one interface needs to be implemented to solve the
> issue. Why would we need a redesign?

Right, let's look at how to deal with it cleanly within that implementation
(although I suspect that the changes will leak out of that class).
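If it really is a single interface, a reservation-style seam along the lines
John hints at (resource reservation, clearer exceptions) might look roughly
like the sketch below. These are invented names, not the branch's actual
types; it only shows where capacity checks, explicit success/failure, and an
aging policy could hang:

    import java.util.concurrent.TimeUnit;

    /**
     * Hypothetical shape of the single cache-storage seam under discussion.
     * None of these names come from the object_store branch.
     */
    interface CacheStorageManager {
        /** Reserve cache space before staging an object; throws if no cache store can fit it. */
        CacheReservation reserve(long sizeBytes) throws InsufficientCacheCapacityException;

        /** Release the reservation once the object is in the object store (or the transfer failed). */
        void release(CacheReservation reservation);

        /** Evict cached copies already safely in the object store and idle longer than maxIdle. */
        long evictIdleObjects(long maxIdle, TimeUnit unit);
    }

    final class CacheReservation {
        final String storeUri;   // which NFS cache store the space was reserved on
        final long sizeBytes;    // how much space was reserved

        CacheReservation(String storeUri, long sizeBytes) {
            this.storeUri = storeUri;
            this.sizeBytes = sizeBytes;
        }
    }

    class InsufficientCacheCapacityException extends Exception {
        InsufficientCacheCapacityException(String message) {
            super(message);
        }
    }

A reserve()/release() pair would also make the fail-fast behavior Edison
describes explicit: a backup either gets cache capacity and completes, or
fails immediately with a clear error the admin can act on.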