> -----Original Message-----
> From: John Burwell [mailto:jburw...@basho.com]
> Sent: Tuesday, May 28, 2013 8:43 AM
> To: dev@cloudstack.apache.org
> Subject: Re: [MERGE]object_store branch into master
> 
> All,
> 
> I have gone through a large chunk of this patch, and published my review
> thus far (https://reviews.apache.org/r/11277/). TL;DR is that this patch has a
> number of significant issues which can be summarized as follows:


Thanks for taking the time to review this patch.

> 
> 1. While it appears that the requirement of NFS for secondary storage has
> largely been removed, it has basically been if-blocked out instead of pushed
> down as a detail choice in the physical layer.  Rather than exploiting
> polymorphism to vary behavior through a set of higher level abstractions, the
> orchestration performs instanceof NFSTO checks. The concern is that future
> evolution of the secondary storage layer will still have dependencies on NFS.

As long as the management server doesn't explicitly depend on NFS secondary storage 
during orchestration, and NFS storage can be used based on different configurations, 
the resource-side "NFSTO checks" are not much of a concern to me at the current 
stage. We can add a "polymorphic high-level abstraction" at the resource side, as you 
suggested before, in the future. The current code change is already huge; adding a 
new level of abstraction would make it even bigger. But adding this kind of 
"high-level abstraction" is no longer hard, as all the storage-related commands only 
deal with DataTO/DataStoreTO. Right now there are no read/write data operations on 
these *TOs; one possible way to add a "high-level abstraction" would be to add 
read/write data operations in each hypervisor resource based on the DataTO and 
DataStoreTO (see the sketch below).
My point is that refactoring the management server code is much, much harder than 
refactoring the resource code. If we can get the management server right, we can move 
on to refactoring the resource code in the future.
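To make that concrete, here is a minimal sketch of such a read/write abstraction. The 
type and method names below are hypothetical (they are not existing CloudStack 
classes); the point is only that each store type carries its own data access, so the 
hypervisor resource never needs an instanceof-NFSTO style check:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Stand-in for a transfer object like DataTO; only what the sketch needs.
interface DataObjectTO {
    String getPath();
}

// Per-store read/write behavior; one implementation per store type.
interface DataStoreAccess {
    InputStream read(DataObjectTO obj) throws IOException;
    void write(DataObjectTO obj, InputStream content) throws IOException;
}

// NFS-specific details (mount point handling, file copy) stay inside this class
// instead of leaking into the orchestration or resource code.
class NfsStoreAccess implements DataStoreAccess {
    private final String mountPoint;

    NfsStoreAccess(String mountPoint) {
        this.mountPoint = mountPoint;
    }

    @Override
    public InputStream read(DataObjectTO obj) throws IOException {
        return Files.newInputStream(Paths.get(mountPoint, obj.getPath()));
    }

    @Override
    public void write(DataObjectTO obj, InputStream content) throws IOException {
        Files.copy(content, Paths.get(mountPoint, obj.getPath()),
                StandardCopyOption.REPLACE_EXISTING);
    }
}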

> 
> 2. In some scenarios, NFS is still a requirement for secondary storage.  In
> particular, Xen users will still have to maintain an "NFS cache".  Given the
> degradation of capability, I think it would be helpful to put a few more
> eyeballs on the Xen implementation to determine if we could avoid using an
> intermediate file system.

NFS will not disappear soon for some hypervisors and some types of storage. But as 
long as the management server doesn't have this requirement, people can add new types 
of hypervisors (e.g. Hyper-V) and new types of storage (Ceph, SolidFire, etc.) which 
don't need NFS. The optimization for a particular hypervisor can be done by other 
people who are familiar with a certain type of storage or a certain hypervisor.

> 3. I have the following concerns regarding potential race conditions and
> resource exhaustion in the NFS cache implementation.
> 
>     - The current random allocator algorithm does not account for the amount of
> space that will be required for the operation (i.e. checking to ensure that the
> cache it picks has enough space to hold the object being downloaded), nor does it
> reserve the space.  Given the long (in compute time) running nature of these
> processes, the available space in a cache could be exhausted by a number of
> concurrently transferring templates and/or snapshots.  By reserving space before
> the transfer, the allocator would be able to account for both pending operations
> and the current contents of the cache.

1. This is the current behavior of the 4.1 and master secondary storage code; we 
didn't introduce a new issue here.
2. Adding a new cache storage allocator here is much easier than adding it on the 
master branch (a rough sketch of a reserve-before-transfer allocator follows below).
3. I treat it as a bug, and plan to fix it after the merge.
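As an illustration only, here is a minimal sketch of what a reserve-before-transfer 
allocator could look like. All names are hypothetical and none of this is existing 
code; it just shows the idea of checking free space minus pending reservations and 
recording a reservation before the copy starts:

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical view of a cache store's capacity.
class CacheStore {
    final String id;
    final long capacityBytes;
    volatile long usedBytes;
    CacheStore(String id, long capacityBytes, long usedBytes) {
        this.id = id; this.capacityBytes = capacityBytes; this.usedBytes = usedBytes;
    }
}

class ReservingCacheAllocator {
    // Bytes reserved per store for transfers that are still in flight.
    private final ConcurrentMap<String, Long> reserved = new ConcurrentHashMap<>();

    synchronized CacheStore allocate(List<CacheStore> stores, long objectSize) {
        for (CacheStore s : stores) {
            long pending = reserved.getOrDefault(s.id, 0L);
            if (s.capacityBytes - s.usedBytes - pending >= objectSize) {
                reserved.merge(s.id, objectSize, Long::sum); // reserve before transfer
                return s;
            }
        }
        return null; // no store has enough headroom; caller must fail or retry
    }

    synchronized void release(CacheStore s, long objectSize) {
        reserved.merge(s.id, -objectSize, Long::sum); // called when the transfer ends
    }
}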

>     - There appears to be no mechanism to age out the contents of the cache.
> Therefore, as implemented, it will grow unbounded.  The only workaround
> for this problem would be to have an NFS cache whose size equals that of
> the object store.


Admins can add multiple NFS cache storages into one zone, and this is the current 
behavior on the master branch. Again, we can add an aging policy later (a rough 
sketch below), as cache storage has its own pom project and its own implementation.
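Purely as an illustration of the kind of aging policy that could be added later 
(hypothetical names, not existing code), an LRU-style index over cache entries is one 
simple option:

import java.util.LinkedHashMap;
import java.util.Map;

// Evict the least recently used template/snapshot once the cache grows past a
// configured number of entries; a size-in-bytes bound would work the same way.
class LruCacheIndex<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheIndex(int maxEntries) {
        super(16, 0.75f, true);      // access-order, so reads refresh recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // hook where the actual file delete would go
    }
}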


>     - The mechanism lacks robust error handling or retry logic.  In 
> particular, the
> behavior if/when a cache exhausts available space appears non-deterministic.

Again, the master branch has the same issue; fixing it here will be easier than 
fixing it on the master branch.

>     - Generally, I see little consideration for alternative/exception flows.  
> For
> example, what happens if I attempt to use a template/iso/snapshot in transit
> to the object store to/from the cache?  Since these files can be very large
> (multiple GBs), we have to assume some transfer latency in all interactions.

The management server mainly maintains the state of each object 
(volume/snapshot/template). We added a state machine for all the objects, and for all 
the objects on each storage, which is unavailable on the master branch. Take copying 
a template from the object store to the cache as an example: the state of the 
template entry on both the object storage and the cache storage will be changed to 
"Copying"; if the copy fails or times out, the state of the template entry on the 
object storage will be changed back to "Ready", while the state of the template entry 
on the cache storage will be changed to "Failed".
As long as the management server maintains the state correctly, the user/admin can 
issue the same operation again through the UI/API if an operation fails. That's one 
of the reasons there is no retry logic in the storage subsystem code. Another reason 
is that adding retry logic on top of an async API would be a little bit difficult.
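To sketch the flow just described (class names are hypothetical, not the actual 
framework code), the copy-to-cache path boils down to something like this:

// States of an object entry on a particular store.
enum ObjectState { ALLOCATED, COPYING, READY, FAILED }

class ObjectInStoreEntry {
    private ObjectState state = ObjectState.ALLOCATED;
    ObjectState getState() { return state; }
    void transitTo(ObjectState next) { state = next; }
}

class CopyToCacheFlow {
    void copy(ObjectInStoreEntry onObjectStore, ObjectInStoreEntry onCacheStore) {
        onObjectStore.transitTo(ObjectState.COPYING);
        onCacheStore.transitTo(ObjectState.COPYING);
        try {
            doCopy();                                   // long-running data transfer
            onObjectStore.transitTo(ObjectState.READY);
            onCacheStore.transitTo(ObjectState.READY);
        } catch (Exception e) {                         // failure or timeout
            onObjectStore.transitTo(ObjectState.READY); // source entry is still usable
            onCacheStore.transitTo(ObjectState.FAILED); // user can reissue the copy
        }
    }

    private void doCopy() { /* actual transfer omitted */ }
}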

> 
> 4. The Image Cache abstraction is too tightly coupled to the core
> orchestration mechanism.  I would recommend implementing it as a

The only place coupled with the image cache abstraction is the DataMotionStrategy; in 
particular, AncientDataMotionStrategy->needCacheStorage, and 
AncientDataMotionStrategy will call StorageCacheManager->createCacheObject if cache 
storage is needed.

> DataStore proxy that is applied as necessary.  For example, the current Xen
> hypervisor implementation must interact with a file system.  It seems most
> appropriate that the Xen hypervisor implementation would apply the proxy
> to its secondary storage DataStore instances.  Therefore, the storage engine

The issue I struggled with for a while is how and where to configure cache storage. 
Take S3 as an example: the implementation of the S3 provider should not need to care 
about cache storage at all; the S3 implementation should only care about how to deal 
with data stored in S3. S3 is a general-purpose storage, so the same implementation 
can be used by different hypervisors and in different situations. For example, if 
both Ceph and S3 are used by KVM, then definitely the NFS cache is not needed, while 
for other cluster-wide primary storages supported by KVM, the NFS cache is needed.
How and where do we present this kind of configuration, such that neither the primary 
storage provider nor the object storage provider implementation needs to be aware of 
it? My current solution is to write the decision in the code: the DataMotionStrategy 
is the place to decide whether to use cache storage or not. In the default 
implementation, all copy operations from S3 to primary storage need cache storage; if 
people add a new storage and want to remove the cache storage for a certain reason, 
they add a new DataMotionStrategy.
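As a simplified illustration of that decision (this is not the actual 
AncientDataMotionStrategy code; the names below are made up), the default strategy's 
choice could be expressed roughly like this:

enum StoreRole { IMAGE, PRIMARY, CACHE }

// Minimal view of a store for the purpose of the decision.
interface StoreInfo {
    StoreRole getRole();
    boolean supportsDirectDownload();  // e.g. a primary store that can read from S3 directly
}

class DefaultDataMotionStrategy {
    // Ask for a cache store only when copying from an object/image store to a
    // primary store that cannot consume the object directly; a provider that
    // never needs the cache would register its own strategy instead.
    boolean needCacheStorage(StoreInfo src, StoreInfo dest) {
        return src.getRole() == StoreRole.IMAGE
                && dest.getRole() == StoreRole.PRIMARY
                && !dest.supportsDirectDownload();
    }
}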

I am not sure how the DataStore proxy would solve the issue I mentioned above; I 
would be glad to hear your solution.

> should only provide the facility not attempt to determine when it is needed.
> Additionally, a proxy model would greatly reduce the size and complexity of
> the various classes (e.g. AncientDataMotionStrategy).

The current AncientDataMotionStrategy is only about 400 LOC now; I removed some of 
the duplicated code. But I still want to know the details of your solution.

> 
> 5. While I have only reviewed roughly 50-60% of the patch thus far, I see 
> little
> to no additional unit tests for the new code added.  Integration tests have

If the code is changing rapidly, then writing/maintaining unit test cases takes a 
huge effort; we focused on the integration tests and the Marvin tests at this stage.

> been expanded, but many of the tests simply execute the service methods
> and do not verify the form of the operation results. There are also a
> tremendous number of ignored exceptions that should likely fail these tests.
> Therefore, we could have tests passing due to an ignored exception that
> should be failing.

1. Thanks for taking a look at the integration tests; they were very convenient for 
us during the implementation and saved us a lot of time during testing.
2. Yes, the integration tests need to be improved as you suggested.

> 
> While I recognize the tremendous effort that has been expended to
> implement this capability and value of the feature, I see significant design
> and operational issues that must be addressed before it can be accepted into

I would say the object_store branch is simply better than the master branch: almost 
all the issues (related to cache storage) you pointed out already exist on the master 
branch, and fixing them on the master branch is harder than fixing them on the 
object_store branch. Frankly speaking, I don't want to read/understand/guess how it 
works on the master branch; there are a lot of hacks all over the place.
Of course, these issues are still issues and we need to fix them, but when to fix 
them is arguable. I prefer fixing them after the merge. Here is the reason:
If we agree that the issues you pointed out exist on master as well, do we want to 
fix them in 4.2 or not? If the answer is yes, then who will fix them, and how long 
will it take to fix them on master? If I offer to fix these issues after merging 
object_store into master, would that be acceptable? If the answer is no, then there 
is no technical reason not to merge object_store into master, as these issues were 
not introduced by the object_store branch.


> master.  In my opinion, the object_store represents a good first step, but it
> needs a few more review/refinement iterations before it will be ready for a
> master merge.
> 
> Thanks,
> -John
> 
> On May 28, 2013, at 10:12 AM, Nitin Mehta <nitin.me...@citrix.com> wrote:
> 
> > Agree with Wido. This would be a great feature to have in 4.2. Yes,
> > its a lot of change and probably needs more education from Edison and
> > Min maybe through code walkthroughs, documentation, IRC meetings but I
> > am +1 for this to make it to 4.2 and would go as far to say that I
> > would even volunteer for any bug fixes required.
> >
> > I would say its not too bad to merge it now as most of the features
> > for
> > 4.2 are merged by now and not a lot of them would be blocked because
> > of this. Yes, the master would be unstable but it would be even if we
> > merge it post cutting 4.2 branch. I would rather see this coming in
> > 4.2 than wait for another 6 months or so for it. Yes, this is an
> > architectural change and we are learning as a community to time these kind
> of changes.
> > We should also try and raise alarms for these changes much early when
> > the FS was proposed rather than when its done, probably a learning for
> > all of us :)
> >
> > Thanks,
> > -Nitin
> >
> > On 28/05/13 4:23 PM, "Wido den Hollander" <w...@widodh.nl> wrote:
> >
> >>
> >>
> >> On 05/23/2013 06:35 PM, Chip Childers wrote:
> >>> On Wed, May 22, 2013 at 09:25:10PM +0000, Edison Su wrote:
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Chip Childers [mailto:chip.child...@sungard.com]
> >>>>> Sent: Wednesday, May 22, 2013 1:26 PM
> >>>>> To: dev@cloudstack.apache.org
> >>>>> Subject: Re: [MERGE]object_store branch into master
> >>>>>
> >>>>> On Wed, May 22, 2013 at 08:15:41PM +0000, Animesh Chaturvedi
> wrote:
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Chip Childers [mailto:chip.child...@sungard.com]
> >>>>>>> Sent: Wednesday, May 22, 2013 12:08 PM
> >>>>>>> To: dev@cloudstack.apache.org
> >>>>>>> Subject: Re: [MERGE]object_store branch into master
> >>>>>>>
> >>>>>>> On Wed, May 22, 2013 at 07:00:51PM +0000, Animesh Chaturvedi
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: John Burwell [mailto:jburw...@basho.com]
> >>>>>>>>> Sent: Tuesday, May 21, 2013 8:51 AM
> >>>>>>>>> To: dev@cloudstack.apache.org
> >>>>>>>>> Subject: Re: [MERGE]object_store branch into master
> >>>>>>>>>
> >>>>>>>>> Edison,
> >>>>>>>>>
> >>>>>>>>> Thanks, I will start going through it today.  Based on other
> >>>>>>>>> $dayjob responsibilities, it may take me a couple of days.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> -John
> >>>>>>>> [Animesh>] John we are just a few days away  from 4.2 feature
> >>>>>>>> freeze, can
> >>>>>>> you provide your comments by Friday 5/24.   I would like all feature
> >>>>> threads
> >>>>>>> to be resolved sooner so that we don't have last minute rush.
> >>>>>>>
> >>>>>>> I'm just going to comment on this, but not take it much further...
> >>>>>>> this type of change is an "architectural" change.  We had
> >>>>>>> previously discussed (on several
> >>>>>>> threads) that the appropriate time for this sort of thing to hit
> >>>>>>> master was
> >>>>>>> *early* in the release cycle.  Any reason that that consensus
> >>>>>>> doesn't apply here?
> >>>>>> [Animesh>] Yes it is an architectural change and discussion on
> >>>>>> this started a
> >>>>> few weeks back already, Min and Edison wanted to get it in sooner
> >>>>> by
> >>>>> 4/30
> >>>>> but it took longer than anticipated in  preparing for merge and
> >>>>> testing on feature branch.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> You're not following me I think.  See this thread on the Javelin
> >>>>> merge:
> >>>>>
> >>>>> http://markmail.org/message/e6peml5ddkqa6jp4
> >>>>>
> >>>>> We have discussed that our preference is for architectural changes
> >>>>> to hit master shortly after a feature branch is cut.  Why are we
> >>>>> not doing that here?
> >>>>
> >>>> This kind of refactor takes time, a lot of time. I think I worked
> >>>> on the merge of primary storage refactor into master and bug fixes
> >>>> during
> >>>>
> >>>> March (http://comments.gmane.org/gmane.comp.apache.cloudstack.devel/14469),
> >>>> then started to work on the secondary storage refactor in
> >>>> April(http://markmail.org/message/cspb6xweeupfvpit). Min and I
> >>>> finished the coding at end of April, then tested for two weeks,
> >>>> send out the merge request at middle of May.
> >>>> With the refactor, the  storage code will be much cleaner, and the
> >>>> performance of S3 will be improved, and integration with other
> >>>> storage vendor will be much easier, and the quality is ok(33 bugs
> >>>> fired, only 5
> >>>> left:
> >>>>
> >>>> https://issues.apache.org/jira/issues/?jql=text%20~%20%22Object_Store_Refactor%22).
> >>>> Anyway, it's up to the community to decide, merge
> >>>> it or not, we already tried our best to get it done ASAP.
> >>>>
> >>>>
> >>>
> >>> I'm absolutely not questioning the time and effort here.  I know
> >>> that you have been working hard, and that testing is happening!
> >>>
> >>> I'm only asking if we, as a community, want to follow the practice
> >>> of bringing changes like this in early or late in a cycle.  I
> >>> thought we had agreed on doing it early.
> >>>
> >>
> >> So I tried reviewing the code, but I have to say that it is a lot of
> >> code. Reviewing such a large piece of code isn't easy.
> >>
> >> Now, let me be honest, I'd love to see this in 4.2 since it would
> >> make the Ceph integration a lot better. We can get rid of NFS as
> >> Secondary Storage and use Ceph as the only storage for CS.
> >>
> >> Yes, it might need some work after this branch has been merged, but I
> >> do agree that it's a lot of work to maintain a branch next to master.
> >> Even with smaller fixes you have to do a lot to keep up.
> >>
> >> Imho a feature freeze is a feature freeze. It's set for May 31st and
> >> afterwards we start ironing the bugs out, but no new merges from
> >> other branches.
> >>
> >> We will need the full support from Edison and Co to help iron out
> >> these bugs. Maybe something will be broken after the merge and that
> >> should be fixed asap then.
> >>
> >> Again, my opinion in this is a bit coloured, but I think this will be
> >> a great addition to CloudStack, it would make 4.2 a killer release.
> >>
> >> Wido
> >
