Edison/Chip,

Please see my comments in-line.

Thanks,
-John

On May 31, 2013, at 4:04 PM, Chip Childers <chip.child...@sungard.com> wrote:

> Comments inline:
> 
> On Thu, May 30, 2013 at 09:42:29PM +0000, Edison Su wrote:
>> 
>> 
>>> -----Original Message-----
>>> From: John Burwell [mailto:jburw...@basho.com]
>>> Sent: Thursday, May 30, 2013 7:43 AM
>>> To: dev@cloudstack.apache.org
>>> Subject: Re: [MERGE]object_store branch into master
>>> 
>>> It feels like we have jumped to a solution without completely understanding
>>> the scope of the problem and the associated assumptions.  We have a
>>> community of hypervisor experts who we should consult to ensure we have
>>> the best solution.  As such, I recommend mailing the list with the specific
>>> hypervisors and functions that you have been unable to interface to storage
>>> that does not present a filesystem.  I do not recall seeing such a 
>>> discussion on
>>> the list previously.
>> 
>> If people use zone-wide primary storage, like Ceph or SolidFire, then 
>> suddenly there is no need for NFS cache storage, as zone-wide storage can 
>> be treated as both primary and secondary storage, with S3 as the backup 
>> storage. It's a simple but powerful solution.
>> Why can't we just add code to support these exciting new solutions? It's 
>> hard to do on the master branch, which is why Min and I worked hard to 
>> refactor the code and remove the NFS secondary storage dependency from the 
>> management server as much as possible. As we all know, NFS secondary 
>> storage is not scalable, no matter how fancy an aging policy or how 
>> advanced a capacity planner you have.
>> 
>> And that's one of the reasons I don't care that much about the issue with 
>> NFS cache storage. Couldn't we put our energy into cloud-style storage 
>> solutions instead of into un-scalable storage?
> 
> Per your comment about you and Min working hard on this: nobody is
> saying that you didn't.  This isn't personal (or shouldn't be).  These
> are questions that are part of a consensus-based approach to
> development.
> 
>>> As I understand the goals of this enhancement, we will support additional
>>> secondary storage types and remove the assumption that secondary
>>> storage will always be NFS or have a filesystem.  As such, when a non-NFS
>>> type of secondary storage is employed, NFS is no longer the repository of
>>> record for this data.  We can always exceed available space in the 
>>> repository
>>> of record, and the failure scenarios are relatively well understood (4.1.0) 
>>> --
>>> operations will fail quickly and obviously.  However, as a transitory 
>>> staging
>>> storage mechanism (4.2.0), the expectation of the user is that the NFS 
>>> storage will
>>> not be as reliable or large.  If the only solution we can provide for this
>>> problem is to recommend an NFS "cache" that is equal to the size of the
>>> object store itself, then we have made little to no progress addressing our users'
>> 
>> No, that's not true.  Admins can add multiple NFS cache storages if they 
>> want; there is no requirement that the NFS storage be the same size as the 
>> object store, I can't be that stupid.
>> It's the same thing we are doing on the master branch: the admin knows that 
>> one NFS secondary storage is not enough, so they can add multiple NFS 
>> secondary storages. And on the master branch,
>> there is no capacity planner for NFS secondary storage; the code just 
>> randomly chooses one of the NFS secondary storages, even if one of them is 
>> full. Yes, NFS secondary storage on master can fill up, and there is no way 
>> to age anything out.
>> 
>> The current object_store branch has the same behavior: the admin can add 
>> multiple NFS cache storages, and there is no capacity planner. If an NFS 
>> cache storage is full, the admin can simply remove the DB entries related 
>> to the cached objects and clean up the NFS cache storage, and then 
>> everything just works. 
>> 
>> From an implementation point of view, I don't think there is any difference. 
> 
> It's an expectation issue.  Operators expect to be able to manage their
> storage capacity.  So the question is, for the NFS "Cache", how do they
> plan size requirements and manage that capacity?

The driver for employing an object store is to reduce the cost per GB of 
storage while maintaining reliability and availability.  Requiring NFS reduces, 
if not eliminates, this benefit because system architectures must ensure that 
the NFS "cache" (staging area) has sufficient capacity and reliability to hold 
data until it can be transferred to object storage.  How does adding multiple 
staging areas decrease complexity and cost?  As implemented, the NFS "cache" is 
unbounded, meaning that an operator would need an NFS "cache" as large as the 
object storage to avoid data loss and/or operational failures.
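
To make my concern concrete, here is a rough, hypothetical sketch of the kind 
of bounded staging check I would expect before an object is placed in the NFS 
"cache".  None of these class or method names exist in either branch; this is 
for illustration only, not a proposed implementation.

    import java.util.List;

    // Hypothetical sketch only -- these types do not exist in either branch.
    class StagingStore {
        long capacityBytes;
        long usedBytes;
        long reservedBytes;

        long freeBytes() {
            return capacityBytes - usedBytes - reservedBytes;
        }
    }

    class StagingAllocator {
        // Reserve space on a staging (NFS "cache") store before a transfer
        // begins, or fail fast so we never accept data we cannot stage.
        StagingStore reserve(List<StagingStore> stores, long sizeInBytes) {
            for (StagingStore store : stores) {
                if (store.freeBytes() >= sizeInBytes) {
                    // Released when the transfer completes or fails, so
                    // concurrent transfers cannot oversubscribe the store.
                    store.reservedBytes += sizeInBytes;
                    return store;
                }
            }
            throw new IllegalStateException("No staging store has " + sizeInBytes
                    + " bytes free; cleanup or additional capacity is required");
        }
    }

Without something along these lines, the first signal an operator gets is a 
failed transfer rather than a refused reservation.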

> 
>> 
>> 
>>> needs.  Fundamentally, the role of NFS is different in 4.2.0 than in 4.1.0.
>>> Therefore, I disagree with the assertion that this issue is present in 4.1.0.
>> 
>> The role of NFS may have changed, but both share the same problems: no 
>> capacity planner and no aging-out policy. 
>> 
> 
> Secondary storage capacity management is much easier to grok for
> operators.  I would bet that almost 100% of the time, their usage grows
> on a particular slope, allowing them to plan and allocate more when
> needed.
> 
> For the NFS "cache", the lifecycle of objects stored in that location,
> especially the cleanup routines, is going to be critical to the healthy
> operation of that environment.

+1. 

> 
>>> 
>>> An additional risk in the object_store implementation is that we lead a user
>>> to believe their data has been stored in reliable storage (e.g. S3, Riak 
>>> CS, etc.)
>>> when it may not have been.  I saw no provision in the object_store branch 
>>> to retry transfers if
>> 
>> I don't know from which code you drew this conclusion. Could you point it 
>> out in the code?
>> AFAIK, an object is either stored in S3 or not stored in S3; I don't see 
>> how an object can end up in a wrong state.
>> 
>>> the object_store transfer fails or becomes unavailable.  In 4.0.0/4.1.0, if 
>>> we
>>> can't connect to S3 or Swift, a background process continuously retries the
>>> upload until successful.
>> 
>> Here is the interesting situation: how does the mgmt server or the admin 
>> know that the background process pushed the objects into S3 successfully? 
>> There is no guarantee that the background process will succeed, and there 
>> is no status tracking for this background process, right?
>> 
>> What I am doing on the object_store branch is this: if pushing an object 
>> into S3 fails, then the whole backup process fails, and the admin or user 
>> needs to send another API request to push the object into S3. This 
>> guarantees that the operation either succeeds or fails, instead of sitting 
>> in an unknown state as it does on the master branch. 
>> 
> 
> That's the right approach IMO (at least it's correct, per the current
> model of operations either working or not).

As I previously stated, this functionality is a step back from the current 
Swift and S3 implementations present in 4.1.0.  I also think it is an 
unreasonable burden to place on an operator to check that every possible 
transfer succeeded and then issue a retry of the copy.
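
For context, the 4.1.0 behavior I am referring to is, roughly, a background 
worker that keeps retrying failed uploads.  The sketch below is hypothetical 
(the names are mine, not the actual 4.1.0 code); it is only meant to show the 
operator-facing difference between automatic retry and fail-and-reissue.

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Hypothetical sketch of a background retry loop, not the 4.1.0 code.
    class UploadRetryWorker implements Runnable {
        private final Queue<String> pendingUploads = new ConcurrentLinkedQueue<>();

        void enqueue(String objectPath) {
            pendingUploads.add(objectPath);
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                String objectPath = pendingUploads.poll();
                if (objectPath == null) {
                    sleepQuietly(60_000);             // nothing pending; check again later
                    continue;
                }
                if (!tryUpload(objectPath)) {
                    pendingUploads.add(objectPath);   // re-queue; retried until it succeeds
                }
            }
        }

        private boolean tryUpload(String objectPath) {
            return false;                             // placeholder for the real S3/Swift transfer
        }

        private void sleepQuietly(long millis) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

With this model the operator never has to notice a failed transfer; with the 
object_store model they must detect the failure and reissue the API call 
themselves.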

I am also curious about the phrase "backup".  My understanding of this branch's 
goals was to support object stores as native secondary storage.  4.1.0 already 
supports backing up secondary storage to Swift and S3.  Is your vision for 
object_store that object stores can be used as native secondary storage?

> 
>>> 
>>> Finally, I see this issue as a design issue rather than a bug.  I don't 
>>> think we should
>> 
>> Again, I don't think it's a design issue; as I said above, it's a bug, and 
>> both the master branch and object_store have the same bug. It can be fixed, 
>> and it is easier to fix on object_store than on the master branch. And it's 
>> not an important issue compared with supporting a cloud-style storage 
>> solution.
>> 
> 
> Can we discuss fixing it in the object_store branch then?

Could you please define what you mean by a cloud-style storage solution?  

> 
>>> Given the different use of NFS in the object_store branch vs. current, I 
>>> don't
>>> see the comparison in this case.  In the current implementation, when we
>>> exhaust space, we are truly out of resources.  However, in the object_store
>>> branch, we have no provision to remove stale data and we may report no
>>> space available when there is plenty of space available in the underlying
>>> object store.  In this scenario, the NFS "cache" becomes an artificial 
>>> limiter on
>>> the capacity of the system.  I do not understand how we have this problem in
>>> current since the object store is only a backup of secondary store -- not
>>> secondary storage itself.
>> 
>> As I said before, no matter what the role of the NFS storage is, it has the 
>> same issues: in both cases the NFS storage can run out of capacity, and 
>> there is no capacity planner and no aging policy. 
>> 
> 
> But as I note above, the operator's planning process will be quite
> difficult.

Also, as I previously noted, the exhaustion has a completely different cause.  
In 4.1, I am truly out of secondary storage.  As Chip mentioned, it is 
straightforward to plan for space requirements.  In object_store, I likely 
have not exhausted secondary storage space, but have filled the cache.  Since 
most operators will want as little NFS space as possible in this scenario, my 
educated guess is that we will see exhaustion of the cache far more frequently.

> 
>>> It is my estimate that robust error handling will require design changes 
>>> (e.g. introduction of a resource reservation mechanism, introduction of 
>>> additional exception classes, enhancement of interfaces to provide more 
>>> context regarding client intentions, etc.) yielding significant code 
>>> impact.  These changes need to be undertaken in a holistic manner with 
>>> minimum risk to master.  Fundamentally, we should not be merging code to 
>>> master with
>>> known significant issues.  When it goes to master, we should be saying, "To
>>> the best of my knowledge and developer testing, there are no blocker or
>>> critical issues."  In my opinion, omission of robust error handling does not
>>> meet that standard.
>> 
>> To be realistic, on the mgmt server there is only one class that depends 
>> on cache storage, and only one interface needs to be implemented to solve 
>> the issue, so why do we need a redesign?
> 
> Right, let's look at how to deal with it cleanly within that
> implementation (although I suspect that the changes will leak out of
> that class).
> 

The lack of error handling extends beyond the cache.  The entire branch needs 
to be evaluated for exception handling.
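
To make the earlier point about design changes more concrete, here is a rough, 
hypothetical sketch of what I mean by additional exception classes and 
interfaces that carry more context about the caller's intentions.  None of 
these names exist in the branch; this illustrates the direction, not a 
proposed API.

    // Hypothetical sketch only: distinguish failure causes so callers can
    // apply different retry/cleanup policies for transient, capacity, and
    // permanent failures instead of treating every error the same way.
    class ObjectStoreUnavailableException extends Exception {
        ObjectStoreUnavailableException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    class StagingCapacityExceededException extends Exception {
        StagingCapacityExceededException(String msg) {
            super(msg);
        }
    }

    // A context-aware transfer interface: the caller states the object size
    // and whether the source can be re-read, so the implementation can
    // reserve staging space and decide whether a retry is even possible.
    interface ObjectTransferService {
        void copyToObjectStore(String sourcePath, long sizeInBytes,
                boolean sourceIsRereadable)
                throws ObjectStoreUnavailableException,
                       StagingCapacityExceededException;
    }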
