Members present: jburwell, iswc, ke4qqq, edison_cs, topcloud, widodh

----------------
Meeting summary:
----------------

1. Preface

IRC log follows:

# 1. Preface #
19:03:44 [widodh]: and after those are on the table we can discuss them
19:04:14 [jburwell]: item 1 for me is the storage driver model
19:04:30 [jburwell]: which edison and I have been discussing for a bit
19:04:52 [edison_cs]: yah, let's continue the discuss
19:04:54 [widodh]: Yes, I noticed that and I have some questions about that (for later)
19:05:09 [widodh]: item 1 for me is the management server having direct access to all APIs
19:05:22 [widodh]: do we have anything else?
19:05:29 [edison_cs]: I have one
19:05:44 [edison_cs]: the current status of what am I doing, and the plan for 4.1
19:06:22 [edison_cs]: any new topic?
19:06:44 [iswc]: I'm here as a spectator, I have no input so just ignore me
19:07:07 [edison_cs]: ok, then let's move on
19:07:15 [edison_cs]: who will be the first?
19:07:24 [widodh]: You are? Let us know what you are doing
19:07:29 [edison_cs]: ok
19:07:29 [jburwell]: I think widodh's item 1 and my item 1 are intwined
19:08:07 [edison_cs]: I am writting code based on http://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0
19:08:07 [jburwell]: because the driver model we have been discussing is designed to be execution local agnostic
19:08:31 [edison_cs]: finished the orchestration part of code
19:08:44 [edison_cs]: on javelin branch
19:08:54 [jburwell]: do you have the storage subsystem wired into cloudstack?
19:08:59 [jburwell]: at any level ..
19:09:02 [edison_cs]: basically, it can call driver to create/copy volume/snapshot/template etc
19:09:08 [edison_cs]: not yet
19:09:14 [edison_cs]: my plan
19:09:37 [edison_cs]: is plugged into mgt server at end of this week
19:09:52 [edison_cs]: there will be no change for current storage code
19:10:07 [jburwell]: then why merge it in for 4.1.0?
19:10:29 [edison_cs]: so people can start to write driver for primary storage
19:10:52 [topcloud]: edison: is the plan still to treat the current cloudstack code as one big datastore provider?
19:10:52 [jburwell]: I would suggest merging in any new storage subsystem post-4.1.0
19:10:54 [topcloud]: and merge it in this way?
19:10:59 [edison_cs]: old code and new code will be co-exist for a while
19:11:14 [widodh]: what would the benefit be for doing that in 4.1? Won't people be writing against master anyway?
19:11:14 [jburwell]: as a first action in then next release lifecycle
19:11:15 [ke4qqq]: but it won't be useful in 4.1 - so why not merge it into master after the 4.1 branch
19:11:22 [edison_cs]: topcloud: yes, current storagemanager will be the default data store provider
19:11:44 [jburwell]: ke4qqq +1
19:11:59 [edison_cs]: ke4qqq: zone-wide storage will need the new code
19:12:02 [topcloud]: edison is saying it will be wired.
19:12:14 [topcloud]: by this weekend.
19:12:22 [jburwell]: as I understand it, it will be wired into the mgmt server
19:12:29 [jburwell]: but not actually doing anything
19:12:53 [jburwell]: so why take on such a large commit late in the development cycle?
19:13:17 [jburwell]: the difference in merge dates at this point would be early next week
19:13:22 [edison_cs]: jburwell: the exisiting storage code, will be the default data storage provider
19:13:24 [jburwell]: as opposed to late next week
19:13:46 [jburwell]: this feels like a big, big change to take on days before a code freeze
19:14:07 [jburwell]: does it add any functionality for 4.1.0?
19:14:22 [edison_cs]: to add zone-wide storage
19:14:37 [edison_cs]: and possbile to add new storage driver
19:14:37 [ke4qqq]: is the zone-wide storage stuff done?
19:14:44 [widodh]: Ok, so that's the win for 4.1? zone-wide storage
19:14:45 [edison_cs]: after the merge
19:14:52 [edison_cs]: it will be doable
19:15:07 [jburwell]: it seems unlikely new storage drivers would be added in time for 4.1.0 code freeze
19:15:14 [ke4qqq]: but no new features will hit 4.1 after branch - so new drivers in 4.1 doesn't help
19:15:37 [jburwell]: as I have said, I think we need to rework the driver model as well
19:15:38 [edison_cs]: widodh: yes, that's the plan, after merge, I will add zone-wide storage support
19:15:44 [jburwell]: to separate logical and physical operations
19:16:07 [jburwell]: being a week out from the freeze date, this feels like a lot to take on
19:16:31 [topcloud]: why don't we talk about that first.
19:16:44 [jburwell]: topcloud which .. the driver model?
19:16:44 [topcloud]: the driver model rework.
19:17:22 [jburwell]: topcloud so I have two hobby horses I have been flogged wrt to storage drivers
19:17:39 [topcloud]: flog away
19:17:52 [jburwell]: 1) remove all implicit and explicit assumptions regarding a storage device's support of an underlying filesystem
19:18:09 [jburwell]: 2) separate logical and physical storage operations to consolidate domain logic and simplify implementation
19:18:22 [topcloud]: +1 on 1) is there anything specific that you saw that led you to this?
19:18:37 [topcloud]: +1 on 2 as well.
19:18:52 [jburwell]: when implementing the S3 capabilities
19:19:07 [jburwell]: you run into File as the implicit coin of the realm
19:19:14 [topcloud]: i mean these two points are the reason why we're doing storage refactor.
19:19:24 [widodh]: jburwell: Yes, on #1, the problem with RBD is that it are "virtual devices". So never assume a block device is also a kernel block device
19:19:29 [jburwell]: basically, storage needs to speak in terms of URIs and Input/OutputStreams
19:19:38 [widodh]: Same goes when we implement Sheepdog (which will happen)
19:20:09 [jburwell]: basically, CloudStack needs to speak to a storage driver with a logical URI such as "/template/2/200" for a template id 200 on account id 2
19:20:22 [edison_cs]: In the new storage code, all the objects(volume/template/snapshot) are just URI
19:20:22 [jburwell]: the driver maps that to a physical location
19:20:29 [topcloud]: so here's our take on this.
19:20:44 [jburwell]: topcloud when you say our who do u mean?
19:20:52 [topcloud]: edison and mine.
19:21:16 [widodh]: topcloud: I'm bad at IRC nicknames :) What's your real name?
19:21:23 [topcloud]: alex huang
19:21:37 [topcloud]: i should change my name.
19:21:37 [topcloud]: or nicname.
19:21:46 [widodh]: Ah, hi Alex!
19:21:59 [topcloud]: anyways....the way we looked at this is as follows:
19:22:14 [topcloud]: there's three parts to any storage provisioning.
19:22:29 [topcloud]: - orchestration: that's cloudstack orchestration code.
19:23:00 [topcloud]: - provisioning: that's logic on provisioning different storage pieces (volume, snapshot, etc)
19:23:14 [topcloud]: - the actual moving of the bytes:
19:23:23 [topcloud]: what do you think?
19:23:44 [jburwell]: topcloud I think along similar lines but with slightly different terms
19:23:47 [jburwell]: so for a quick normalization
19:23:53 [topcloud]: like to hear your thoughts on that.
19:24:01 [jburwell]: I see two types of operations logical and physical
19:24:22 [jburwell]: logical operations cover orchestration and provisioning and should be common across all devices
19:24:29 [topcloud]: physical is the actual bye movement?
19:24:39 [jburwell]: physical operations involve the movement of bytes from one point to another
19:24:52 [topcloud]: cool...don't think we're that different.
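
Illustration (not part of the IRC log): the mapping jburwell describes at 19:20, where CloudStack hands a driver a logical URI such as "/template/2/200" and the driver resolves it to a physical location, could be sketched roughly as follows. This is a minimal sketch under stated assumptions: `LogicalPathMapper`, `NfsPathMapper`, the `nfs://foo-server/secondary` root, and the resulting path layout are all hypothetical and do not come from the CloudStack code base.

```java
import java.net.URI;

// Hypothetical interface: turns a logical path into a device-specific location.
interface LogicalPathMapper {
    URI toPhysical(String logicalPath);
}

// Hypothetical mapper for an NFS-style store; the directory layout is an assumption.
final class NfsPathMapper implements LogicalPathMapper {
    private final URI root;

    NfsPathMapper(URI root) {
        this.root = root;
    }

    @Override
    public URI toPhysical(String logicalPath) {
        // "/template/2/200" splits into ["", "template", "2", "200"].
        String[] parts = logicalPath.split("/");
        if (parts.length != 4 || !parts[0].isEmpty()) {
            throw new IllegalArgumentException("expected /<type>/<account>/<id>: " + logicalPath);
        }
        String type = parts[1];
        long accountId = Long.parseLong(parts[2]);
        long objectId = Long.parseLong(parts[3]);
        // Only the mapper knows the physical layout; callers see logical paths only.
        return URI.create(root + "/" + type + "/" + accountId + "/" + objectId);
    }
}
```

An object-store mapper could implement the same interface and return a bucket/key style URI instead, which is the sense in which the logical layer would carry no filesystem assumptions.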
19:24:54 [jburwell]: nope
19:24:59 [widodh]: Nope, me neither
19:25:01 [jburwell]: I think we are on the same page there
19:25:14 [jburwell]: from there I see another dimension
19:25:14 [edison_cs]: I just finished the "logical operations cover orchestration and provisioning and should be common across all devices" part
19:25:14 [jburwell]: when we get the physical layer
19:25:17 [topcloud]: for logical, we divide between orchestration and provisioning so cloudstack and the storage vendors can agree on the right division.
19:25:37 [jburwell]: a physical device has capabilities
19:25:59 [jburwell]: and at the logical layer, we have policies that constrain how the capabilities of a device can be used in cloudstack
19:25:59 [topcloud]: cool...so let's talk about where these three parts fit. i think cloudstack is obvious right....so forget that.
19:26:07 [jburwell]: so, for example, S3 is an object store
19:26:22 [jburwell]: actually, NFS is a better example of this
19:26:22 [edison_cs]: all the operations on snapshot/volume are generalized into 3 or 5 api calls: create/delete/copy etc
19:26:23 [jburwell]: NFS is a file system type of store
19:26:29 [jburwell]: right ..
19:26:31 [jburwell]: it can store anything ..
19:26:40 [jburwell]: images, isos, templates, volumes, etc
19:26:59 [jburwell]: however, we have designated at that hfs://foo-server/primary is only used for volume storage
19:27:07 [edison_cs]: the orchestration doesn't care about iamges/isos/ or whatever, they will share the same code
19:27:07 [topcloud]: so i would like logical part to rise above that.
19:27:15 [edison_cs]: to do create/delete/copy
19:27:17 [topcloud]: that it is not tied to NFS or S3....
19:27:30 [jburwell]: topcloud agreed
19:27:52 [jburwell]: from my view, the snapshot service is passed a target storage device
19:27:59 [topcloud]: so in the storage refactor, do you see anything that has that tie in?
19:28:07 [jburwell]: for which to create the snapshot
19:28:52 [jburwell]: in this case, I see three abstractions
19:29:22 [jburwell]: StorageDevice, StorageDeviceDriver, and SnapshotStorageDevice (extends StorageDevice)
19:30:24 [topcloud]: ok..i think i see slightly different....and it has to do with where we think each piece fit in.
19:30:37 [jburwell]: StorageDevice provides the logical tie between an instance StorageDeviceDriver, configuration information, and the policy for its use
19:30:44 [edison_cs]: jburwell: I think all the diagrams in https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0, are talking about upper level(orchestration and logical operations)
19:30:53 [edison_cs]: it's not at the physical level
19:31:09 [jburwell]: the storagedevicedriver is provided by a "vendor" to actually interact with the physical device
19:31:44 [jburwell]: finally, a SnapshotStorageDevice provides some addition operation to perform on device snapshots for those device that support it
19:31:52 [edison_cs]: seems the driver is a little bit confused in diagramgs
19:32:22 [topcloud]: jburwell: i think i understand what you're getting at.
19:32:44 [edison_cs]: the driver in the diagrams is a hook, the vendor can plugin his own code into cloudstack mgt server, which will change how the storage provision work
19:33:07 [jburwell]: topcloud so, one or more StorageDevice instances are passed into each logical service operation
19:33:29 [jburwell]: am I making sense?
19:33:37 [topcloud]: I think the difference is that we don't think cloudstack management server does not need to understand how the physical operation is performed.
19:33:52 [topcloud]: strike the does not
19:34:22 [topcloud]: I think like edison says driver was a poor choice of name.
19:34:23 [topcloud]: for his design.
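
Illustration (not part of the IRC log): the three abstractions jburwell names between 19:29 and 19:31 might look something like the interfaces below. Only the type names `StorageDevice`, `StorageDeviceDriver`, and `SnapshotStorageDevice` come from the discussion; every method, the `StoragePolicy` enum, and its values are assumptions made purely to show how a device could tie a vendor driver to configuration and a usage policy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

// Vendor-supplied piece that talks to the physical device.
// Method names and signatures are assumptions, not CloudStack APIs.
interface StorageDeviceDriver {
    InputStream openInput(URI physicalLocation) throws IOException;
    OutputStream openOutput(URI physicalLocation) throws IOException;
}

// Hypothetical policy: what CloudStack allows a device to be used for,
// regardless of what the device is physically capable of storing.
enum StoragePolicy {
    VOLUMES_ONLY,
    TEMPLATES_AND_ISOS,
    ANY
}

// Logical tie between a driver instance, configuration, and policy.
interface StorageDevice {
    StorageDeviceDriver driver();
    URI root();              // configuration, e.g. the device's base location
    StoragePolicy policy();  // constrains how the device's capabilities are used
}

// Extra operations for devices that support snapshots.
interface SnapshotStorageDevice extends StorageDevice {
    URI createSnapshot(URI volume) throws IOException;
}
```

Under this reading, a logical service operation would receive one or more `StorageDevice` instances and would never need to know how the driver performs the physical work.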
19:34:29 [jburwell]: topcloud I completely agree that it does not need to know how
19:34:37 [jburwell]: however, it can drive a storage device to do something
19:35:07 [edison_cs]: but it's at the logical level, at at the physcial level
19:35:07 [jburwell]: for example, I ask a storage device for InputStream from a URI
19:35:29 [jburwell]: and then copy from that InputStream into an OutputStream provided by another storage for a second URI
19:35:44 [jburwell]: CloudStack has no idea whatsoever how those read and write operations are occuring
19:35:55 [jburwell]: that is encapsulated in the InputStream and OutputStream instances
19:36:01 [topcloud]: yup...then why should cloudstack even know about streams?
19:36:07 [topcloud]: that's my point.
19:36:10 [jburwell]: but cloudstack does know that I need a template/2/200 to create VM
19:36:14 [topcloud]: it should just know uri.
19:36:29 [jburwell]: because a lot duplicated logic can be eliminated
19:36:37 [jburwell]: and a URI without the context of the stream is useless
19:36:44 [topcloud]: the stream is only necessary at where the actual physical provisioning is taking place.
19:36:59 [widodh]: Wouldn't you otherwise also be pulling that data through your management server?
19:37:07 [topcloud]: no no.
19:37:22 [jburwell]: widodh topcloud yes, the stream is only necessary where execution occurs
19:37:37 [jburwell]: which is why storage device drivers are stateless
19:37:44 [topcloud]: cloudstack has always separate ms from where data flows.
19:37:44 [jburwell]: and create streams on demand
19:38:01 [jburwell]: so, if a StorageDevice instance is serialized down to the SSVM
19:38:07 [jburwell]: and then SSVM starts using it
19:38:14 [topcloud]: so in this case, cloudstack talks in uri and then the provider figures out how to translate a uri to a data stream.
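
Illustration (not part of the IRC log): the copy jburwell describes at 19:35, asking one store for an InputStream for a URI and writing it to an OutputStream that another store provides for a second URI, can be sketched as below. Everything here is hypothetical: `StreamSource`, `StreamSink`, `InMemoryStore`, and `DataMotion` are stand-in names, and the in-memory store exists only so the example runs without a real device. The streams are opened lazily inside `copy`, so the same code could run on the SSVM, a host, or elsewhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical read side of a store: opens a stream for a URI on demand.
interface StreamSource {
    InputStream open(URI uri) throws IOException;
}

// Hypothetical write side of a store: creates a stream for a URI on demand.
interface StreamSink {
    OutputStream create(URI uri) throws IOException;
}

// Stand-in for a real device; keeps objects in a map keyed by URI.
final class InMemoryStore implements StreamSource, StreamSink {
    private final Map<URI, byte[]> blobs = new ConcurrentHashMap<>();

    @Override
    public InputStream open(URI uri) throws IOException {
        byte[] data = blobs.get(uri);
        if (data == null) {
            throw new FileNotFoundException(uri.toString());
        }
        return new ByteArrayInputStream(data);
    }

    @Override
    public OutputStream create(URI uri) {
        // The object becomes visible in the store when the stream is closed.
        return new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                blobs.put(uri, toByteArray());
            }
        };
    }
}

final class DataMotion {
    // Copies src on one store to dst on another; the caller never learns how
    // either side reads or writes. InputStream.transferTo requires Java 9+.
    static long copy(StreamSource from, URI src, StreamSink to, URI dst) throws IOException {
        try (InputStream in = from.open(src); OutputStream out = to.create(dst)) {
            return in.transferTo(out);
        }
    }
}
```

The point of contention in the log is where code like `copy` should live: jburwell argues for sharing it in the logical layer, while topcloud prefers that the management server pass only URIs to a data motion service.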
19:38:14 [widodh]: get it, inside the SSVM
19:38:15 [jburwell]: the stream will not be created until the SSVM asks for it
19:38:29 [topcloud]: and it might be done inside ssvm or inside dom0 of the host.
19:38:37 [jburwell]: topcloud exactly
19:38:51 [jburwell]: the driver doesn't know or care where its located
19:38:52 [jburwell]: when asked for a stream, it creates i
19:38:53 [jburwell]: it
19:39:17 [jburwell]: whether it is on the SSVM, mgmt server, or some other daemon created in a future version of CS
19:39:23 [topcloud]: i think we are thinking about the same thing but i still don't understand why the stream needs to be presented to cs management server.
19:40:01 [topcloud]: to me uri is presented to management server. and then management server presents it to a data motion service and that motion service understand how to extract stream from the uri.
19:40:32 [jburwell]: because I believe a lot of logic can pulled out of the physical layer into the logical layer
19:40:37 [jburwell]: to gain consistent algorithms
19:40:47 [jburwell]: and greatly simplify device driver implementation
19:41:14 [topcloud]: so you want to push that into cloudstack's orchestration?
19:41:14 [edison_cs]: jburwell: at the logical level, all the volumes/* are all have the same base interface: dataobject
19:41:39 [edison_cs]: so mgt server, doesn't know any difference between volumes/snapshot/template during the orchestartion
19:41:52 [jburwell]: take downloading a template from a http resource as an example
19:42:09 [jburwell]: we connect to the http resource and get an output stream from the request
19:42:24 [topcloud]: define we?
19:42:29 [jburwell]: that template is going to an storage device
19:42:45 [jburwell]: we is CS
19:42:52 [topcloud]: ok.
19:42:54 [jburwell]: the SSVM daemon
19:43:01 [jburwell]: downloading a template
19:43:02 [topcloud]: ah....
19:43:16 [topcloud]: so when we talk about CS for us, we're not considering ssvm.
19:43:22 [jburwell]: then is simply a matter grabbing an output stream from the storage device
19:43:23 [jburwell]: and copying between the two
19:43:38 [jburwell]: this logic really could be executed anywher
19:43:39 [jburwell]: e
19:44:07 [topcloud]: ok...i think i see a difference in understanding here.
19:44:14 [jburwell]: but in this case, template download occurs on the SSVM
19:44:24 [topcloud]: in terms of ssvm we agree with what you're saying.
19:45:14 [topcloud]: in the refactoring, what we've changed is only the logical part on the management server.
19:45:23 [topcloud]: it did not involve the ssvm portion.
19:45:40 [topcloud]: we believe the actual provisioning can be done anywhere, including ssvm but not necessarily always ssvm.
19:45:45 [jburwell]: in the management server, my notion of the storage device bridges logical to physical
19:46:07 [jburwell]: in my view, to be clear, I don't see storage devices as caring where they live and execute
19:46:09 [jburwell]: they simply do as they are told
19:46:14 [topcloud]: agreed.
19:46:38 [jburwell]: I hate to do this
19:46:40 [topcloud]: how about we talk about this in terms of the interfaces introduced in the refactor.
19:46:47 [jburwell]: but an emergency has come up
19:46:52 [jburwell]: and I need to get home rickey tick
19:46:59 [topcloud]: oops.
19:46:59 [topcloud]: hope everything is okay.
19:47:07 [jburwell]: so do I
19:47:09 [widodh]: Yes, go home
19:47:14 [jburwell]: can adjourn to tomorrow afternoon
19:47:14 [jburwell]: ?
19:47:15 [widodh]: more important
19:47:16 [topcloud]: let us know when you can do it.
19:47:25 [topcloud]: sure...go take care of your emergency first.
19:47:44 [jburwell]: we will pick up at topcloud's request to discuss in terms of the interfaces in the refactor
19:47:59 [topcloud]: widodh: have what we talked about resolved your concerns?
19:48:07 [widodh]: Ok! Thanks jburwell. Good luck!
19:48:14 [topcloud]: jburwell: sure...thanks.
19:48:14 [widodh]: topcloud: Yes. I have to be honest that I didn't look at the code good enough
19:48:15 [jburwell]: widodh thanks
