[vdsm] cpopen version inconsistencies
Yaniv synced the github version with the code that was released. 1.3 is now tagged. https://github.com/ficoos/cpopen/tree/1.3.0 ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] Modeling graphics framebuffer device in VDSM
I remember there was a discussion about this. https://lists.fedorahosted.org/pipermail/vdsm-devel/2013-November/002727.html I don't remember what came of it in the end, though. - Original Message - From: Frantisek Kobzik fkob...@redhat.com To: vdsm-devel@lists.fedorahosted.org Sent: Friday, March 28, 2014 3:06:17 PM Subject: [vdsm] Modeling graphics framebuffer device in VDSM Dear VDSM devels, I've been working on refactoring graphics devices in the engine and VDSM for some time now and I'd like to know your opinion on it. The aim of this refactoring is to model the graphics framebuffer (SPICE, VNC) as a device in the engine and VDSM. This is quite natural, since libvirt treats graphics as a device and we have some kind of device infrastructure in both projects. Another advantage (and actually the main reason for the refactoring) is simplified support for multiple graphics framebuffers on a single VM. Currently, passing information about graphics from the engine to VDSM is done via the 'display' param in conf. In the other direction, VDSM informs the engine about graphics parameters ('displayPort', 'displaySecurePort', 'displayIp' and 'displayNetwork') in conf as well. What I'd like to achieve is to encapsulate all this information in the specParams of the new graphics device and use specParams as the place for transferring data about the graphics device between the engine and vdsm. What do you think? The draft patch is here: http://gerrit.ovirt.org/#/c/23555/ (it's currently marked with '-1' but it sheds some light on what the solution looks like, so feel free to take a look). Thanks, Franta. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
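For illustration, the proposed encapsulation could make a graphics device look roughly like the sketch below. The key names are taken from the parameters listed in the mail; the overall shape is an assumption for illustration, not the schema from the draft patch.

    # Hypothetical sketch only -- the exact device dict is defined by the
    # draft patch, not by this example. It just groups the display parameters
    # named above under specParams instead of spreading them through conf.
    graphics_device = {
        'type': 'graphics',
        'device': 'spice',                 # or 'vnc'; one entry per framebuffer
        'specParams': {
            'displayNetwork': 'ovirtmgmt', # illustrative value
            'displayIp': '0',
            'displayPort': '5902',
            'displaySecurePort': '5903',
        },
    }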
Re: [vdsm] thread pool implementation
The thing that worries me the most is stuck threads. I hate them! Could we move to a multiple-libvirt-connections scheme, where if a call takes too long we just close the connection? I know that the call is still running in libvirt, but then it's their problem and not my problem. That way the thread pool doesn't need to handle this use case, making it much simpler, because apart from the problem of libvirt calls getting stuck we just need a run-of-the-mill thread pool solution. - Original Message - From: Francesco Romani from...@redhat.com To: vdsm-devel vdsm-devel@lists.fedorahosted.org Cc: Saggi Mizrahi smizr...@redhat.com, Yaniv Bronheim ybron...@redhat.com Sent: Tuesday, March 25, 2014 1:55:36 PM Subject: thread pool implementation Hello, in order to reduce the number of sampling threads, we'd like to move from one thread per VM to a thread pool. The strongest requirement we have is to be able to detect if a worker is not responding, and if so to detach it from the pool and kill it as soon as possible; then a new worker should be made available. This is because in sampling we are going to call libvirt, and libvirt calls can block or, even worse, get stuck (I'm looking at you, virDomainGetBlockInfo - http://libvirt.org/html/libvirt-libvirt.html#virDomainGetBlockInfo ) So, we need a thread pool implementation :) What is the best way forward? I see a few options: * we have a thread pool already in storage. Should we move it outside storage to lib/ and extend it? * there is a thread pool hidden inside the multiprocessing module! (see http://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.dummy) Should we switch to this, at least for sampling? * Python 3.2+ has concurrent.futures, which has a nice API and can use a thread pool executor. See http://docs.python.org/3.3/library/concurrent.futures.html#module-concurrent.futures There is a backport for python 2.6/2.7 also: https://pypi.python.org/pypi/futures Maybe this is the most forward-compatible way? * Add an(other) thread pool? I don't really have any preference provided the requirement above is satisfied. Thoughts? Especially Infra people's feedback would be appreciated. -- Francesco Romani Red Hat Engineering Virtualization R&D Phone: 8261328 IRC: fromani ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
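To make the requirement concrete, here is a minimal sketch (not VDSM code, not the storage thread pool, not concurrent.futures) of a pool that notices a call exceeding a deadline, abandons that worker and spawns a replacement. Note that CPython cannot actually kill a stuck thread; the stuck worker is only detached, which is exactly why closing the libvirt connection, as suggested above, is attractive.

    # Minimal sketch under stated assumptions -- only the shape of
    # "detach and replace on timeout", names are illustrative.
    import threading
    import Queue   # 'queue' on Python 3

    class ReplacingThreadPool(object):
        def __init__(self, workers=4):
            self._tasks = Queue.Queue()
            for _ in range(workers):
                self._spawn()

        def _spawn(self):
            t = threading.Thread(target=self._worker)
            t.daemon = True
            t.start()

        def _worker(self):
            while True:
                func, args, done = self._tasks.get()
                func(*args)            # e.g. a libvirt call; may block forever
                done.set()

        def submit(self, func, *args):
            done = threading.Event()
            self._tasks.put((func, args, done))
            return done

        def wait(self, done, timeout):
            # If the task did not finish in time, assume its worker is stuck:
            # spawn a replacement and let the caller give up on the result.
            # (A queued-but-not-started task would be misdetected here; a real
            # implementation would track per-worker start times instead.)
            done.wait(timeout)
            if not done.is_set():
                self._spawn()
                return False
            return True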
Re: [vdsm] VDSM profiling results, round 1
pthread.py:129(wait) 1230.640 1377.992 +147.28 (BAD) The threadpool would just get stuck on wait() if there are no tasks, since Queues use Conditions internally. This might explain why the average wait time is so long. - Original Message - From: Francesco Romani from...@redhat.com To: vdsm-devel vdsm-devel@lists.fedorahosted.org Sent: Wednesday, March 19, 2014 10:33:51 AM Subject: [vdsm] VDSM profiling results, round 1 (sending again WITHOUT the attachments) Hi everyone, I'd like to share the first round of profiling results for VDSM and my next steps. Summary: - experimented with a couple of profiling approaches and found a good one - benchmarked http://gerrit.ovirt.org/#/c/25678/ : it is beneficial and was merged - found a few low-hanging fruits which seem quite safe to merge and beneficial to *all* flows - started engagement with infra (see other thread) to have common and polished performance tools - the test roadmap is shaping up, wiki/ML will be updated in the coming days Please read through for a more detailed discussion. Every comment is welcome. Disclaimer: long mail, lots of content; please point out if something is missing, not clear enough, or deserves more discussion. +++ == First round results == The first round of profiling was a follow-up of what I showed during the VDSM gathering. The results file contains a full profile ordered by descending time. In a nutshell: parallel start of 32 tiny VMs using the engine REST API and a single hypervisor host. The VMs are tiny just because I want to stuff as many VMs as I can into my mini-dell (16 GB RAM, 4 cores + HT). It is worth pointing out a few differences with respect to the *profile* (NOT the graphs) I showed during the gathering: - profile data is now collected using the profile decorator (see http://www.ovirt.org/Profiling_Vdsm) just around Vm._startUnderlyingVm. The gathering profile was obtained using the yappi application-wide profiler (see https://code.google.com/p/yappi/) and 40 VMs. * why yappi? I thought an application-wide profiler gathers more information and lets us have a better picture. I actually still think that, but I faced some yappi misbehaviour which I want to fix later; a function-level profile is so far easier to collect (just grab the data dumped to file). * why 40 VMs? I started with 64 but exhausted my storage backing store :) Will add more storage space in the coming days; for the moment I stepped back to 32. It is worth noting that while on one hand the numbers change a bit (if you remember the old profile data and the scary 80 secs wasted on namedtuple), on the other hand the suspects are the same and the relative positions are roughly the same. So I believe our initial findings (namedtuple patch) and the plan are still valid. == how it was done == I am still focusing just on the Monday morning scenario (mass start of many VMs at the same time). Each run consisted of a parallel start of 32 VMs as described in the result data. VDSM was restarted between runs; the engine was *NOT* restarted between runs. Individual profiles were gathered after all the runs and the profile was extracted from their aggregation. Profile dumps are available to everyone, just drop me a note and I'll put the tarball somewhere. Please find the profile data attached in txt format.
For easier consumption, they are also available on pastebin: baseline: http://paste.fedoraproject.org/86318/ namedtuple fix: http://paste.fedoraproject.org/86378/ pickle fix: http://paste.fedoraproject.org/86600/ (see below) == hotspots == The baseline profile data highlights five major areas and hotspots: 1. internal concurrency (possible patch: http://gerrit.ovirt.org/#/c/25857/ - see below) 2. libvirt 3. XML processing (initial patch: http://gerrit.ovirt.org/#/c/17694/) 4. namedtuple (patch: http://gerrit.ovirt.org/#/c/25678/ - fixed, merged) 5. pickling (patch: http://gerrit.ovirt.org/#/c/25860/ - see below) #4 is beneficial in the ISCSI path and it was already merged. #1 shows some potential but it needs to be carefully evaluated to avoid performance regressions in different scenarios (e.g. bigger machines than mine :)) #2 is basically outside of our control but it needs to be watched. #3 and #5 are beneficial for all flows and scenarios and are safe to merge. #5 is almost a no-brainer IMO. == Note about the third profile == When profiling the cPickle patch http://paste.fedoraproject.org/86600/ the tests actually turned out *slower* with respect to the second profile with just the namedtuple patch. The hotspots seem to be around concurrency and libvirt: location profile2(s) profile3(s) diff(s) pthread.py:129(wait) 1230.640 1377.992 +147.28 (BAD)
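For reference, a minimal cProfile-based decorator of the kind mentioned above might look like the sketch below. This is not the decorator from the Profiling_Vdsm wiki page, just an illustration of the approach: dump one stats file per profiled call and aggregate the dumps with pstats afterwards.

    # Illustrative sketch only; file naming and aggregation are assumptions.
    import cProfile
    import functools
    import time

    def profile(basename):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                prof = cProfile.Profile()
                try:
                    return prof.runcall(func, *args, **kwargs)
                finally:
                    prof.dump_stats('%s-%d.prof' % (basename,
                                                    int(time.time() * 1000)))
            return wrapper
        return decorator

    # Aggregating several dumps into one report, e.g. after a 32-VM run:
    #   import pstats
    #   stats = pstats.Stats('vmstart-1.prof', 'vmstart-2.prof')
    #   stats.sort_stats('cumulative').print_stats(30)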
Re: [vdsm] Profiling and benchmarking VDSM
Thank you for taking the initiative. Just reminding you that the test framework is owned by infra, so don't forget to put Yaniv and me in the CC for all future correspondence regarding this feature, as I will be the one responsible for the final approval. Ignore http://www.ovirt.org/Vdsm_Developers#Performance_and_scalability Also, we don't want to do it per test, since it's meaningless for most tests: they only run through the code once. I started investigating how we want to solve this issue in the past and this is what I came up with. What we need to do is create a decorator that wraps the test with cProfile. We also want to create a generator that uses configuration from nose.

def BenchmarkIter():
    start = time.time()
    i = 0
    while i < MIN_ITERATIONS or (time.time() - start) < MIN_TIME_RUNNING:
        yield i
        i += 1

So that writing a benchmark is just:

@benchmark([min_iter[, min_time_running]])
def testSomething(self):
    something()

That way we are sure we have a statistically significant sample for all tests. There will need to be a plugin created for nose that skips @benchmark if benchmarks are not turned on and can generate output for the Jenkins performance plugin[1]. That way we can run it every night; the benchmarks will be slow to run, since they will intentionally take a few seconds each and try to hammer the CPU/disk, so people would probably not run the entire suite themselves. [1] https://wiki.jenkins-ci.org/display/JENKINS/Performance+Plugin - Original Message - From: ybronhei ybron...@redhat.com To: Francesco Romani from...@redhat.com, vdsm-devel vdsm-devel@lists.fedorahosted.org Sent: Monday, March 17, 2014 1:57:34 PM Subject: Re: [vdsm] Profiling and benchmarking VDSM On 03/17/2014 01:03 PM, Francesco Romani wrote: - Original Message - From: Francesco Romani from...@redhat.com To: Antoni Segura Puimedon asegu...@redhat.com Cc: vdsm-devel vdsm-devel@lists.fedorahosted.org Sent: Monday, March 17, 2014 10:32:40 AM Subject: Re: [vdsm] Profiling and benchmarking VDSM next immediate steps will be - have a summary page to collect all performance/profiling/benchmarking pages Links added at the bottom of the VDSM developer page: http://www.ovirt.org/Vdsm_Developers see item #15 http://www.ovirt.org/Vdsm_Developers#Performance_and_scalability - document and detail the scenarios the way you described (which I like); the benchmark templates will be attached/documented on this page Started to sketch our Monday Morning test scenario here http://www.ovirt.org/VDSM_benchmarks (yes, it looks quite ugly, no attached template yet. Will add). I'll wait a few hours to let things cool down a bit and see if something is missing, then start with the benchmarks using the new, proper definitions and a more structured approach like the one documented on the wiki. http://gerrit.ovirt.org/#/c/25678/ is the first in queue. Can we add the profiling decorator on each nose test function and share a results link with each push to gerrit? The issue is that it collects profiling only for one function in a file; we need somehow to integrate all the outputs. The nose tests might be good for checking the profiling status; they should cover most of the flows (especially if we'll enforce adding unit tests for each new change). -- Yaniv Bronhaim. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
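A fleshed-out sketch of the decorator described above, with the comparison operators restored from the flattened pseudocode; the nose plugin, the default thresholds and the cProfile wiring are still assumptions, not something decided on this thread.

    # Sketch only: defaults and the way min_iter/min_time_running are
    # passed are illustrative.
    import functools
    import time

    MIN_ITERATIONS = 100
    MIN_TIME_RUNNING = 5.0     # seconds

    def benchmark_iter(min_iterations=MIN_ITERATIONS,
                       min_time_running=MIN_TIME_RUNNING):
        start = time.time()
        i = 0
        while i < min_iterations or (time.time() - start) < min_time_running:
            yield i
            i += 1

    def benchmark(min_iter=MIN_ITERATIONS, min_time_running=MIN_TIME_RUNNING):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self):
                # a nose plugin would skip this entirely when benchmarks are
                # off and wrap the loop with cProfile to feed the Jenkins
                # performance plugin
                for _ in benchmark_iter(min_iter, min_time_running):
                    func(self)
            return wrapper
        return decorator

    # Usage, as proposed:
    # @benchmark(min_iter=1000, min_time_running=10.0)
    # def testSomething(self):
    #     something()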
Re: [vdsm] Profiling and benchmarking VDSM
- Original Message - From: Francesco Romani from...@redhat.com To: vdsm-devel vdsm-devel@lists.fedorahosted.org Cc: ybronhei ybron...@redhat.com, Saggi Mizrahi smizr...@redhat.com Sent: Tuesday, March 18, 2014 12:47:55 PM Subject: Re: [vdsm] Profiling and benchmarking VDSM - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Francesco Romani from...@redhat.com Cc: vdsm-devel vdsm-devel@lists.fedorahosted.org, ybronhei ybron...@redhat.com Sent: Tuesday, March 18, 2014 10:18:16 AM Subject: Re: [vdsm] Profiling and benchmarking VDSM Thank you for taking the initiative. Just reminding you that the test framework is owned by infra so don't forget to put Yaniv and I in the CC for all future correspondence regarding this feature. As I will be the one responsible for the final approval. Yes, of course I will. At the moment I'm using unofficial/out of tree decorators and support code just because I just started the exploration and the work. In the meantime, we can and should discuss the better/long term/official approach to measure performance and benchmark things. Ignore http://www.ovirt.org/Vdsm_Developers#Performance_and_scalability Not sure I understood correctly. You mean I should drop my additions to the Vdsm_Developers page? Don't drop it, just don't have it as a priority over actual work. I'd much rather have benchmarks and no WIKI than the other way around. :) Also we don't want to do it per test since it's meaningless for most tests since they only run through the code once. I started investigating how we want to solve this issue in the past and this is what I can up with. What we need to do is create a decorator that wraps the test with cProfile. We also want to create a generator that using configuration from nose. def BenchmarkIter(): start = time.time() i = 0 while i MIN_ITERATIONS or (time.time() - start) MIN_TIME_RUNNING: yield i i += 1 So that writing a benchmark is just: @benchmark([min_iter[, min_time_running]]) def testSomething(self): something() That way we are sure we have a statistically significant sample for all tests. Agreed There will need to be a plugin created for nose that skips @benchmark if benchmarks are not turned on and can generate output for the Jenkins performance plugin[1]. That way we can run it every night as the benchmarks will be slow to run since they will intentionally take a few seconds each and try and hammer the CPU\disk so people would probably not run the entire suite themselves. [1] https://wiki.jenkins-ci.org/display/JENKINS/Performance+Plugin This looks very nice. Thanks and bests, -- Francesco Romani RedHat Engineering Virtualization R D Phone: 8261328 IRC: fromani ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] The new GIL in python 3.2+
It's a very interesting read; I ask everyone from infra to read it, and recommend others put aside a few minutes and read it too. To give the VDSM POV: even though it makes things faster, it still doesn't solve the IO issues, since they are caused by a mix of kernel issues (D state) and Python library implementations, namely not releasing the GIL either by mistake, for optimization's sake, or because an underlying C implementation just isn't thread safe. It is an interesting read about how locking policy affects speed even with a single Lock. It's good to remember this was done without affecting how people write Python code at all. Thanks, Francesco, for sending it. - Original Message - From: Francesco Romani from...@redhat.com To: vdsm-devel vdsm-devel@lists.fedorahosted.org Sent: Thursday, March 13, 2014 10:33:34 AM Subject: [vdsm] The new GIL in python 3.2+ Hi everyone, I found some time ago this very good presentation about the improvements made in python 3.2+ to the GIL (which unfortunately is still there... I think we need pypy to get rid of it): http://www.dabeaz.com/python/NewGIL.pdf -- Francesco Romani Red Hat Engineering Virtualization R&D Phone: 8261328 IRC: fromani ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] suggested patch for python-pthreading
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: Yaniv Bronheim ybron...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Saggi Mizrahi smizr...@redhat.com Sent: Tuesday, February 4, 2014 12:20:52 PM Subject: Re: suggested patch for python-pthreading On Tue, Feb 04, 2014 at 04:04:37AM -0500, Yaniv Bronheim wrote: according to coredumps we found in the scope of the bug [1] we opened [2] that suggested to override python's implementation of thread.allocate_lock in each coredump we saw few threads stuck with the bt: #16 0x7fcb69288c93 in PyEval_CallObjectWithKeywords (func=0x2527820, arg=0x7fcb6972f050, kw=value optimized out) at Python/ceval.c:3663 #17 0x7fcb692ba7ba in t_bootstrap (boot_raw=0x250a820) at Modules/threadmodule.c:428 #18 0x7fcb68fa3851 in start_thread (arg=0x7fcb1bfff700) at pthread_create.c:301 #19 0x7fcb6866694d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 in pystack the threads were stuck in /usr/lib64/python2.6/threading.py (513): __bootstrap_inner in bootstrap_inner we use thread.allocate_lock which python-pthreading does not override. we suggest the following commit: From 9d89e9be1a379b3d93b23dd54a381b9ca0973ebc Mon Sep 17 00:00:00 2001 From: Yaniv Bronhaim ybron...@redhat.com Date: Mon, 3 Feb 2014 19:24:30 +0200 Subject: [PATCH] Mocking thread.allocate_lock with Lock imp Signed-off-by: Yaniv Bronhaim ybron...@redhat.com --- pthreading.py | 4 1 file changed, 4 insertions(+) diff --git a/pthreading.py b/pthreading.py index 916ca7f..96df42c 100644 --- a/pthreading.py +++ b/pthreading.py @@ -132,6 +132,10 @@ def monkey_patch(): Thus, Queue and SocketServer can easily enjoy them. +import thread + +thread.allocate_lock = Lock + import threading threading.Condition = Condition -- 1.8.3.1 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1022036 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1060749 It makes sense to use pthreading.Lock for thread.allocate_lock instead of the standard threading.Lock CPU hog. However, I do not understand its relevance to the deadlock sited above: pthreading.Lock fixes performance issues, but not correctness issues, of threading.Lock. Would you explain, in the commit message of the pthreading patch, why you believe that the implementation of thread.allocate_lock() is buggy? Do you know if the bug is fixed in Python 3? Regards, Dan. We actually don't have concrete proof as we can't reproduce the bug so we can't test this. We are shooting in the dark hoping something hits. We assume it's there since all of our cordumps have a thread stuck acquiring the limbo lock. Since mixing lock implementations is probably a bad idea we assume that overriding this a thing we should do anyway we thought we'll give it a go. If VDSM gets stuck again we will have another coredump that we could compare to the others. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
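Rendered as code, the proposed change would leave pthreading.monkey_patch() looking roughly like the sketch below. It is reconstructed only from the hunk quoted above (with pthreading's own Lock and Condition assumed to be in scope); the rest of the real function is elided, not reproduced.

    # Approximate shape after the patch -- not the actual upstream file.
    def monkey_patch():
        """
        Thus, Queue and SocketServer can easily enjoy them.
        """
        import thread
        thread.allocate_lock = Lock      # the line added by the patch, so the
                                         # lock used by threading's
                                         # __bootstrap_inner is a pthreading
                                         # Lock as well

        import threading
        threading.Condition = Condition  # pre-existing line from the hunk
        # ... further assignments in the real monkey_patch() are not shown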
Re: [vdsm] API.py | gerrit.ovirt Code Review
- Original Message - From: Doron Fediuck dfedi...@redhat.com To: Vinzenz Feenstra eviliss...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org Sent: Friday, October 18, 2013 10:17:55 AM Subject: Re: [vdsm] API.py | gerrit.ovirt Code Review - Original Message - From: Vinzenz Feenstra eviliss...@redhat.com To: vdsm-devel@lists.fedorahosted.org Sent: Thursday, October 17, 2013 1:22:48 PM Subject: Re: [vdsm] API.py | gerrit.ovirt Code Review On 10/17/2013 08:43 AM, Doron Fediuck wrote: http://gerrit.ovirt.org/#/c/20126/4/vdsm/API.py Dan, just a general design question. The above will report the HA score to the engine. I suspect that in the next versions we'll extend the HA integration for other operations, such as shutting down HA. So going forward I think we'll need something like vdsm/momIF.py to stabilize this integration. What do you think? I think if you already know that you'll be extending this, it'd be nicer to already start adding this to a new module where you can keep everything together related to this. Rather than extending bits all over the place and having everywhere these conditional imports. In general we want to get rid of API.py in favor of subsystem specific classes. So removing things from API.py and moving them to another files is the recommended course of action. -- Regards, Vinzenz Feenstra | Senior Software Engineer RedHat Engineering Virtualization R D Phone: +420 532 294 625 IRC: vfeenstr or evilissimo Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com The idea is to keep what we no have, and when need to extend we'll replace the import with an interface the same way mom has. The only question here is design wise, if such an interface will make sense. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
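A purely hypothetical sketch of the kind of subsystem module being discussed (playing the role vdsm/momIF.py plays for MOM); the module name, class, methods and the HA client it wraps are all invented for illustration, not taken from the gerrit change.

    # Hypothetical haIF-style wrapper; nothing here is real VDSM or HA-agent API.
    class HaClientIF(object):
        def __init__(self, log):
            self.log = log
            self._client = None   # lazy connection to the (hypothetical) HA agent

        def _connect(self):
            if self._client is None:
                # local import so hosts without the HA packages still load
                from hypothetical_ha_agent import client   # invented module
                self._client = client.HAClient()
            return self._client

        def getHaScore(self):
            """Return the host's HA score, or 0 if the agent is unreachable."""
            try:
                return self._connect().get_local_score()   # invented call
            except Exception:
                self.log.exception("failed to retrieve HA score")
                return 0

API.py (or its successor) would then hold one instance and delegate to it, instead of carrying conditional imports itself.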
Re: [vdsm] vdsm sync meeting - October 7th 2013
- Original Message - From: Oved Ourfalli ov...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: Dan Kenigsberg dan...@redhat.com, dc...@redhat.com, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, October 8, 2013 11:42:23 AM Subject: Re: [vdsm] vdsm sync meeting - October 7th 2013 - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Dan Kenigsberg dan...@redhat.com Cc: dc...@redhat.com, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, October 7, 2013 5:42:54 PM Subject: Re: [vdsm] vdsm sync meeting - October 7th 2013 - Original Message - From: Dan Kenigsberg dan...@redhat.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org, dc...@redhat.com Sent: Monday, October 7, 2013 5:25:22 PM Subject: [vdsm] vdsm sync meeting - October 7th 2013 We had an unpleasant talk, hampered by statics and disconnection on danken's side. Beyond the noises I've managed to recognize Yaniv, Toni, Douglas, Danken, Ayal, Timothy, Yeela and Mooli. We've managed to discuss: - vdsm-4.13.0 is tagged, with a know selinux issue on el6. Expect a new seliux-policy solving it any time soon. - All bugfixes should be backported to ovirt-3.3, so that we have a stable and comfortable vdsm in ovirt-3.3.1. Risky changes and new features should remain in master IMO. - We incorporated a glusterfs requirement breaking rpm installaiton for people. We should avoid that by posters notifying reviewers more prominently and by having http://jenkins.ovirt.org/job/vdsm_install_rpm_sanity_gerrit/ run on every patch that touches vdsm.spec.in. David, could you make the adjustment to the job? - We discussed feature negotiation: Toni and Dan liked the idea of having vdsm expose a feature flags, to make it easier on Engine to check if a certain feature is supported. Ayal argues that this is useful only for capabilities that depend on existence on lower level components. Sees little value in fine feature granularity on vdsm side - versions is enough. Versions might not be enough here, as some features might be supported by VDSM version X, but not when it is installed under operating system Y. IMO, VDSM should reflect that when reporting the features. So the disputed question is only how many feature flags we should have, and when to set them: statically or based on negotiation with kernel/libvirt/gluster/what not. I already voiced my reservation over the entire concept of feature flags. Proposing we only move to specific introspective verbs maintained in the subsystem. Have vdsm.getAvailableStorageDomainTypes() ['gluster'] instead of vdsm.getFeatures() ['storagetype/gluster'] It allows for much higher level of flexibility as the aforementioned verb can also return other information about the domain type: For example returning each domain type with parameter information: {'nfs': {'connect_params': [ {'name': 'timeout', 'type': 'int', 'range': [0, 99], 'desc': 'Sets the timeout', So even parameters can potentially be introspected. IMO it is great to have a verb per domain (e.g. network, storage, virt, etc.), as it allows getting deeper information about features. However, it does not conflict with having a single general getFeatures verb. Such a verb can be useful cases in which you don't really need more information, for example in establishing a feature negotiation between the engine and VDSM. No one is talking about feature negotiation. It's feature reporting. And all I'm saying is that having a verb reporting unrelated things in unrelated formats is usually a bad idea. 
How would features be represented strings? fqdn? objects of different types? If it's a string how would the user know how features depends on each other. How granular should this be? How do we change granularity in the future? We must have verbs with clear scope. Anyone can tell what GetServerConnectionTypes() needs to return. We know what it's granularity is. We know how it relates to other things. We know what flows need to check it and how it might effect them. I have no idea what getFeatures() even means. If you find out that a specific feature is supported, and you would like to get more details, such as parameter information, you would query specifically for that. - Unified network persistence patches are being merged into master - Timothy is working on fixing http://jenkins.ovirt.org/job/vdsm_verify_error_codes/lastBuild/console (hopefully by introducing the new error codes to Engine) I was dropped from the call, so please append with stuff that I've missed. Sorry for the noise! Dan. ___ vdsm-devel mailing
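Sketching the two shapes being argued about may make the trade-off clearer. The verb name and the parameter-description format come from the examples earlier in the thread; neither function is an implemented VDSM API.

    # Scoped, introspective verb: the scope and return format are well
    # defined, and each entry can carry richer, self-describing detail.
    def getAvailableStorageDomainTypes():
        return {
            'gluster': {'connect_params': []},   # illustrative
            'nfs': {'connect_params': [
                {'name': 'timeout', 'type': 'int', 'range': [0, 99],
                 'desc': 'Sets the timeout'},
            ]},
        }

    # Catch-all reporting verb: one flat list of strings whose naming scheme,
    # granularity and interdependencies the caller has to know out of band.
    def getFeatures():
        return ['storagetype/gluster', 'storagetype/nfs']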
Re: [vdsm] vdsm sync meeting - October 7th 2013
- Original Message - From: Oved Ourfalli ov...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: dc...@redhat.com, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, October 8, 2013 4:15:22 PM Subject: Re: [vdsm] vdsm sync meeting - October 7th 2013 - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Oved Ourfalli ov...@redhat.com Cc: dc...@redhat.com, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, October 8, 2013 4:08:12 PM Subject: Re: [vdsm] vdsm sync meeting - October 7th 2013 - Original Message - From: Oved Ourfalli ov...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: Dan Kenigsberg dan...@redhat.com, dc...@redhat.com, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, October 8, 2013 11:42:23 AM Subject: Re: [vdsm] vdsm sync meeting - October 7th 2013 - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Dan Kenigsberg dan...@redhat.com Cc: dc...@redhat.com, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, October 7, 2013 5:42:54 PM Subject: Re: [vdsm] vdsm sync meeting - October 7th 2013 - Original Message - From: Dan Kenigsberg dan...@redhat.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org, dc...@redhat.com Sent: Monday, October 7, 2013 5:25:22 PM Subject: [vdsm] vdsm sync meeting - October 7th 2013 We had an unpleasant talk, hampered by statics and disconnection on danken's side. Beyond the noises I've managed to recognize Yaniv, Toni, Douglas, Danken, Ayal, Timothy, Yeela and Mooli. We've managed to discuss: - vdsm-4.13.0 is tagged, with a know selinux issue on el6. Expect a new seliux-policy solving it any time soon. - All bugfixes should be backported to ovirt-3.3, so that we have a stable and comfortable vdsm in ovirt-3.3.1. Risky changes and new features should remain in master IMO. - We incorporated a glusterfs requirement breaking rpm installaiton for people. We should avoid that by posters notifying reviewers more prominently and by having http://jenkins.ovirt.org/job/vdsm_install_rpm_sanity_gerrit/ run on every patch that touches vdsm.spec.in. David, could you make the adjustment to the job? - We discussed feature negotiation: Toni and Dan liked the idea of having vdsm expose a feature flags, to make it easier on Engine to check if a certain feature is supported. Ayal argues that this is useful only for capabilities that depend on existence on lower level components. Sees little value in fine feature granularity on vdsm side - versions is enough. Versions might not be enough here, as some features might be supported by VDSM version X, but not when it is installed under operating system Y. IMO, VDSM should reflect that when reporting the features. So the disputed question is only how many feature flags we should have, and when to set them: statically or based on negotiation with kernel/libvirt/gluster/what not. I already voiced my reservation over the entire concept of feature flags. Proposing we only move to specific introspective verbs maintained in the subsystem. Have vdsm.getAvailableStorageDomainTypes() ['gluster'] instead of vdsm.getFeatures() ['storagetype/gluster'] It allows for much higher level of flexibility as the aforementioned verb can also return other information about the domain type: For example returning each domain type with parameter information: {'nfs': {'connect_params': [ {'name': 'timeout', 'type': 'int', 'range': [0, 99], 'desc': 'Sets the timeout', So even parameters can potentially be introspected. 
IMO it is great to have a verb per domain (e.g. network, storage, virt, etc.), as it allows getting deeper information about features. However, it does not conflict with having a single general getFeatures verb. Such a verb can be useful cases in which you don't really need more information, for example in establishing a feature negotiation between the engine and VDSM. No one is talking about feature negotiation. It's feature reporting. You're right. My bad. Feature reporting is the right terminology here. And all I'm saying is that having a verb reporting unrelated things in unrelated formats is usually a bad idea. How would features be represented strings? fqdn? objects of different types? If it's a string how would the user know how features depends on each other. How granular should this be? How do we change granularity
Re: [vdsm] vdsm sync meeting - October 7th 2013
- Original Message - From: Dan Kenigsberg dan...@redhat.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org, dc...@redhat.com Sent: Monday, October 7, 2013 5:25:22 PM Subject: [vdsm] vdsm sync meeting - October 7th 2013 We had an unpleasant talk, hampered by static and disconnections on danken's side. Beyond the noise I've managed to recognize Yaniv, Toni, Douglas, Danken, Ayal, Timothy, Yeela and Mooli. We've managed to discuss: - vdsm-4.13.0 is tagged, with a known selinux issue on el6. Expect a new selinux-policy solving it any time soon. - All bugfixes should be backported to ovirt-3.3, so that we have a stable and comfortable vdsm in ovirt-3.3.1. Risky changes and new features should remain in master IMO. - We incorporated a glusterfs requirement breaking rpm installation for people. We should avoid that by posters notifying reviewers more prominently and by having http://jenkins.ovirt.org/job/vdsm_install_rpm_sanity_gerrit/ run on every patch that touches vdsm.spec.in. David, could you make the adjustment to the job? - We discussed feature negotiation: Toni and Dan liked the idea of having vdsm expose feature flags, to make it easier on Engine to check if a certain feature is supported. Ayal argues that this is useful only for capabilities that depend on the existence of lower-level components, and sees little value in fine feature granularity on the vdsm side - versions are enough. So the disputed question is only how many feature flags we should have, and when to set them: statically or based on negotiation with kernel/libvirt/gluster/what not. I already voiced my reservation over the entire concept of feature flags. I propose we only move to specific introspective verbs maintained in the subsystem. Have vdsm.getAvailableStorageDomainTypes() -> ['gluster'] instead of vdsm.getFeatures() -> ['storagetype/gluster'] It allows for a much higher level of flexibility, as the aforementioned verb can also return other information about the domain type: For example returning each domain type with parameter information: {'nfs': {'connect_params': [ {'name': 'timeout', 'type': 'int', 'range': [0, 99], 'desc': 'Sets the timeout', So even parameters can potentially be introspected. - Unified network persistence patches are being merged into master - Timothy is working on fixing http://jenkins.ovirt.org/job/vdsm_verify_error_codes/lastBuild/console (hopefully by introducing the new error codes to Engine) I was dropped from the call, so please append with stuff that I've missed. Sorry for the noise! Dan. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] [Engine-devel] Adding vdsm_api support for gluster vdsm verbs
This is very Gluster specific but I guess it's OK until I get some time to make things a bit more generic over there. - Original Message - From: Aravinda avish...@redhat.com To: vdsm-devel@lists.fedorahosted.org, engine-de...@ovirt.org, Saggi Mizrahi smizr...@redhat.com, a...@us.ibm.com Cc: Dan Kenigsberg dan...@redhat.com, Sahina Bose sab...@redhat.com Sent: Wednesday, April 17, 2013 3:49:13 PM Subject: Re: [Engine-devel] [vdsm] Adding vdsm_api support for gluster vdsm verbs [Adding Saggi, Adam, Dan, Sahina] On 04/16/2013 02:13 PM, Aravinda wrote: [Adding engine-devel] On 04/16/2013 02:10 PM, Aravinda wrote: vdsm/gluster is vdsm plugin for gluster related functionality. These functionalities are available only when vdsm-gluster package is installed. So the schema JSON of vdsm-gluster cannot be added to the same file(vdsm_api/vdsmapi-schema.json) Looks like vdsm_api is not providing plugin support. This patch adds functionality to vdsm_api to read vdsmapi-gluster-schema.json if available. But with this approach we need to edit the core vdsmapi.py file. http://gerrit.ovirt.org/#/c/13921/ Alternate approach: We can have vdsm_api/plugins or vdsm_api/schema directory inside vdsm_api, so that we can modify vdsmapi.py to read all schema files from that dir. When vdsm-gluster package installed, it copies vdsmapi-gluster-schema.json into schema directory. -- regards Aravinda On 04/15/2013 04:14 PM, Aravinda wrote: Hi, We are trying to add json rpc support for vdsm gluster verbs. I submitted a patch to read gluster verbs schema from vdsm/gluster directory. http://gerrit.ovirt.org/#/c/13921/ Let me know if the approach is fine. -- regards Aravinda ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ Engine-devel mailing list engine-de...@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
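The alternate approach described above (a schema directory that vdsmapi.py scans, into which vdsm-gluster would drop its file) could look roughly like this sketch; the directory path and function name are assumptions, not the code in gerrit change 13921.

    # Illustrative sketch only.
    import glob
    import os

    SCHEMA_DIR = '/usr/share/vdsm/schema'   # assumed install location

    def load_schemas(schema_dir=SCHEMA_DIR):
        """Concatenate every *schema.json found in schema_dir, e.g. the core
        vdsmapi-schema.json plus vdsmapi-gluster-schema.json when the
        vdsm-gluster package is installed."""
        parts = []
        for path in sorted(glob.glob(os.path.join(schema_dir, '*schema.json'))):
            with open(path) as f:
                parts.append(f.read())
        return '\n'.join(parts)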
Re: [vdsm] [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization
I am completely against this. It make the return value differ according to input which is a big no no when talking about type safe APIs. The only reason we have this problem is because there is this thing against making multiple calls. Just split it up. getVmRuntimeStats() - transient things like mem and cpu% getVmInformation() - (semi)static things like disk\networking layout etc. Each updated at different intervals. - Original Message - From: Vinzenz Feenstra vfeen...@redhat.com To: vdsm-devel@lists.fedorahosted.org, engine-de...@ovirt.org Sent: Thursday, March 7, 2013 6:25:54 AM Subject: [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization Please find the prettier version on the wiki: http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval Proposal VDSM - Engine Data Statistics Retrieval VDSM = Engine data retrieval optimization Motivation: Currently the RHEVM engine is polling the a lot of data from VDSM every 15 seconds. This should be optimized and the amount of data requested should be more specific. For each VM the data currently contains much more information than actually needed which blows up the size of the XML content quite big. We could optimize this by splitting the reply on the getVmStats based on the request of the engine into sections. For this reason Omer Frenkel and me have split up the data into parts based on their usage. This data can and usually does change during the lifetime of the VM. Rarely Changed: This data is change not very frequent and it should be enough to update this only once in a while. Most commonly this data changes after changes made in the UI or after a migration of the VM to another Host. Status = Running acpiEnable = true vmType = kvm guestName = W864GUESTAGENTT displayType = qxl guestOs = Win 8 kvmEnable = true # this should be constant and never changed pauseCode = NOERR monitorResponse = 0 session = Locked # unused netIfaces = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC', 'inet6': ['fe80::490c:92bb:bbcc:9f87'], 'inet': ['10.34.60.148'], 'hw': '00:1a:4a:22:3c:db'}] appsList = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3', 'RHEV-Serial64 3.2.3', 'RHEV-Network64 3.2.2', 'RHEV-Network64 3.2.3', 'RHEV-Block64 3.2.3', 'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2', 'RHEV-Agent64 3.2.2', 'RHEV-USB 3.2.3', 'RHEV-Block64 3.2.2', 'RHEV-Serial64 3.2.2'] pid = 11314 guestIPs = 10.34.60.148 # duplicated info displayIp = 0 displayPort = 5902 displaySecurePort = 5903 username = user@W864GUESTAGENTT clientIp = lastLogin = 1361976900.67 Often Changed: This data is changed quite often however it is not necessary to update this data every 15 seconds. As this is cumulative data and reflects the current status, and it does not need to be snapshotted every 15 seconds to retrieve statistics. The data can be retrieved in much more generous time slices. (e.g. Every 5 minutes) network = {'vnet1': {'macAddr': '00:1a:4a:22:3c:db', 'rxDropped': '0', 'txDropped': '0', 'rxErrors': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '100', 'name': 'vnet1'}} disksUsage = [{'path': 'c:\\', 'total': '64055406592', 'fs': 'NTFS', 'used': '19223846912'}, {'path': 'd:\\', 'total': '3490912256', 'fs': 'UDF', 'used': '3490912256'}] timeOffset = 14422 elapsedTime = 68591 hash = 2335461227228498964 statsAge = 0.09 # unused Often Changed but unused This data does not seem to be used in the engine at all. It is not even used in the data warehouse. 
memoryStats = {'swap_out': '0', 'majflt': '0', 'mem_free': '1466884', 'swap_in': '0', 'pageflt': '0', 'mem_total': '2096736', 'mem_unused': '1466884'} balloonInfo = {'balloon_max': 2097152, 'balloon_cur': 2097152} disks = {'vda': {'readLatency': '0', 'apparentsize': '64424509440', 'writeLatency': '1754496','imageID': '28abb923-7b89-4638-84f8-1700f0b76482', 'flushLatency': '156549', 'readRate': '0.00', 'truesize': '18855059456', 'writeRate': '952.05'}, 'hdc': {'readLatency': '0', 'apparentsize': '0', 'writeLatency': '0', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '0', 'writeRate': '0.00'}} Very frequent uppdates needed by webadmin portal: This data is mostly needed for the webadmin portal and might be required to be updated quite often. An exception here is the statsAge field, which seems to be unused by the Engine. This data could be requested every 15 seconds to keep things as they are now. cpuSys = 2.32 cpuUser = 1.34 memUsage = 30 Proposed Solution for VDSM Engine: We will introduce new optional parameters to getVmStats, getAllVmStats and list to allow a finer grained specification of data which should be included. Parameter: statsType = string (getVmStats, getAllVmStats only) Allowed values: * full (default to keep backwards compatibility) * app-list (Just send the application list) * rare (include everything from rarely changed to
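A minimal sketch of the split proposed at the top of this message (two typed verbs instead of one statsType parameter); the signatures and exact field groupings are illustrative, based on the categories in the proposal, not a proposed VDSM schema.

    def getVmRuntimeStats(vmId):
        # transient values, cheap enough to poll every 15 seconds
        return {'cpuSys': 2.32, 'cpuUser': 1.34, 'memUsage': 30}

    def getVmInformation(vmId):
        # (semi)static data, refreshed rarely (UI change, migration, ...)
        return {
            'status': 'Running',
            'vmType': 'kvm',
            'displayType': 'qxl',
            'netIfaces': [],   # full interface layout would go here
            'appsList': [],    # guest-agent application list
        }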
Re: [vdsm] [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization
- Original Message - From: Ayal Baron aba...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org, Vinzenz Feenstra vfeen...@redhat.com Sent: Wednesday, March 13, 2013 5:39:24 PM Subject: Re: [vdsm] [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization - Original Message - I am completely against this. It make the return value differ according to input which is a big no no when talking about type safe APIs. The only reason we have this problem is because there is this thing against making multiple calls. Just split it up. getVmRuntimeStats() - transient things like mem and cpu% getVmInformation() - (semi)static things like disk\networking layout etc. Each updated at different intervals. +1 on splitting the data up into 2 separate API calls. You could potentially add a checksum (md5, or any other way) of the static data to getVmRuntimeStats and not bother even with polling the VmInformation if this hasn't changed. Then you could poll as often as you'd like the stats and immediately see if you also need to retrieve VmInfo or not (you rarely would). +1 To Ayal's suggestion except that instead of the engine hashing the data VDSM sends the key which is opaque to the engine. This can be a local timestap or a generation number. But, we might want to consider that when we add events polling becomes (much) less frequent so maybe it'll be an overkill. - Original Message - From: Vinzenz Feenstra vfeen...@redhat.com To: vdsm-devel@lists.fedorahosted.org, engine-de...@ovirt.org Sent: Thursday, March 7, 2013 6:25:54 AM Subject: [Engine-devel] Proposal VDSM = Engine Data Statistics Retrieval Optimization Please find the prettier version on the wiki: http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval Proposal VDSM - Engine Data Statistics Retrieval VDSM = Engine data retrieval optimization Motivation: Currently the RHEVM engine is polling the a lot of data from VDSM every 15 seconds. This should be optimized and the amount of data requested should be more specific. For each VM the data currently contains much more information than actually needed which blows up the size of the XML content quite big. We could optimize this by splitting the reply on the getVmStats based on the request of the engine into sections. For this reason Omer Frenkel and me have split up the data into parts based on their usage. This data can and usually does change during the lifetime of the VM. Rarely Changed: This data is change not very frequent and it should be enough to update this only once in a while. Most commonly this data changes after changes made in the UI or after a migration of the VM to another Host. 
Status = Running acpiEnable = true vmType = kvm guestName = W864GUESTAGENTT displayType = qxl guestOs = Win 8 kvmEnable = true # this should be constant and never changed pauseCode = NOERR monitorResponse = 0 session = Locked # unused netIfaces = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC', 'inet6': ['fe80::490c:92bb:bbcc:9f87'], 'inet': ['10.34.60.148'], 'hw': '00:1a:4a:22:3c:db'}] appsList = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3', 'RHEV-Serial64 3.2.3', 'RHEV-Network64 3.2.2', 'RHEV-Network64 3.2.3', 'RHEV-Block64 3.2.3', 'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2', 'RHEV-Agent64 3.2.2', 'RHEV-USB 3.2.3', 'RHEV-Block64 3.2.2', 'RHEV-Serial64 3.2.2'] pid = 11314 guestIPs = 10.34.60.148 # duplicated info displayIp = 0 displayPort = 5902 displaySecurePort = 5903 username = user@W864GUESTAGENTT clientIp = lastLogin = 1361976900.67 Often Changed: This data is changed quite often however it is not necessary to update this data every 15 seconds. As this is cumulative data and reflects the current status, and it does not need to be snapshotted every 15 seconds to retrieve statistics. The data can be retrieved in much more generous time slices. (e.g. Every 5 minutes) network = {'vnet1': {'macAddr': '00:1a:4a:22:3c:db', 'rxDropped': '0', 'txDropped': '0', 'rxErrors': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '100', 'name': 'vnet1'}} disksUsage = [{'path': 'c:\\', 'total': '64055406592', 'fs': 'NTFS', 'used': '19223846912'}, {'path': 'd:\\', 'total': '3490912256', 'fs': 'UDF', 'used': '3490912256'}] timeOffset = 14422 elapsedTime = 68591 hash = 2335461227228498964 statsAge = 0.09 # unused Often Changed but unused This data does not seem to be used in the engine at all. It is not even used in the data warehouse. memoryStats = {'swap_out': '0', 'majflt': '0', 'mem_free': '1466884', 'swap_in': '0', 'pageflt': '0', 'mem_total': '2096736', 'mem_unused': '1466884
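The checksum/generation idea discussed above could be sketched as follows: the runtime call carries an opaque token that VDSM changes whenever the static data changes, so the engine only issues the second call when needed. The field name and pseudo-flow are illustrative.

    # Sketch only; 'infoGeneration' is an invented field name.
    _info_generation = 0   # VDSM bumps this whenever the VM's static data changes

    def getVmRuntimeStats(vmId):
        return {'cpuUser': 1.34, 'memUsage': 30,
                'infoGeneration': _info_generation}

    # Engine-side pseudo-flow:
    #   stats = vdsm.getVmRuntimeStats(vm_id)
    #   if stats['infoGeneration'] != cached_generation:
    #       info = vdsm.getVmInformation(vm_id)
    #       cached_generation = stats['infoGeneration']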
Re: [vdsm] VDSM Repository Reorganization
- Original Message - From: Federico Simoncelli fsimo...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Dan Kenigsberg dan...@redhat.com, Vinzenz Feenstra vfeen...@redhat.com, Ayal Baron aba...@redhat.com, Adam Litke a...@us.ibm.com Sent: Tuesday, February 19, 2013 11:27:59 AM Subject: Re: VDSM Repository Reorganization - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Adam Litke a...@us.ibm.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Dan Kenigsberg dan...@redhat.com, Vinzenz Feenstra vfeen...@redhat.com, Ayal Baron aba...@redhat.com, Federico Simoncelli fsimo...@redhat.com Sent: Monday, February 18, 2013 8:50:30 PM Subject: Re: VDSM Repository Reorganization - Original Message - From: Adam Litke a...@us.ibm.com To: Federico Simoncelli fsimo...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Dan Kenigsberg dan...@redhat.com, Saggi Mizrahi smizr...@redhat.com, Vinzenz Feenstra vfeen...@redhat.com, Ayal Baron aba...@redhat.com Sent: Tuesday, February 12, 2013 3:08:09 PM Subject: Re: VDSM Repository Reorganization On Mon, Feb 11, 2013 at 12:17:39PM -0500, Federico Simoncelli wrote: It is some time now that we are discussing an eventual repository reorganization for vdsm. In fact I'm sure that we all experienced at least once the discomfort of having several modules scattered around the tree. The main goal of the reorganization would be to place the modules in their proper location so that they can be used (imported) without any special change (or hack) even when the code is executed inside the development repository (e.g. tests). Recently there has been an initial proposal about moving some of these modules: http://gerrit.ovirt.org/#/c/11858/ That spawned an interesting discussion that must involve the entire community; in fact before starting any work we should try to converge on a decision for the final repository structure in order to minimize the discomfort for the contributors that will be forced to rebase their pending gerrit patches. Even if the full reorganization won't happen in a short time I think we should plan the entire structure now and then eventually move only few relevant modules to their final location. To start the discussion I'm attaching here a sketch of the vdsm repository structure that I envision: . |-- client | |-- [...] | `-- vdsClient.py |-- common | |-- [...] | |-- betterPopen | | `-- [...] | `-- vdsm | |-- [...] | `-- config.py |-- contrib | |-- [...] | |-- nfs-check.py | `-- sos |-- daemon | |-- [...] | |-- supervdsm.py | `-- vdsmd `-- tool |-- [...] `-- vdsm-tool The schema file vdsmapi-schema.json (as well as the python module to parse it) are needed by the server and clients. Initially I'd think it should be installed in 'common', but a client does not need things like betterPopen. Any recommendation on where the schema/API definition should live? Well they both should have the file but when installed both should have their own version of the file depending on the version installed of the client or the server. This is so that vdsm-cli doesn't depend on vdsm or vice-versa. You can't have them share the file since if one is installed with a version of the schema where the schema syntax changed the client\server will fail to parse the schema. I'm not sure what's the purpose of having different versions of the client/server on the same machine. 
The software repository is one and it should provide both (as they're built from the same source). This is the standard way of delivering client/server applications in all the distributions. We can change that but we must have a good reason. There isn't really a reason. But, as I said, you don't want them to depend on each other or have the schema in it's own rpm. This means that you have to distribute them separately. I also want to allow to update the client on a host without updating the server. This is because you may want to have a script that works across the cluster but not update all the hosts. Now, even though you will use only old methods, the schema itself might become unparsable by old implementations. As for development, I think the least bad solution is to put it in contrib with symlinks that have relative paths. |--daemon | |-- [...] | `-- vdsmapi-schema.json - ../contrib/vdsmapi-schema.json |--client | |-- [...] | `-- vdsmapi-schema.json - ../contrib/vdsmapi-schema.json |--contrib | |-- [...] | `-- vdsmapi
Re: [vdsm] VDSM Repository Reorganization
- Original Message - From: Adam Litke a...@us.ibm.com To: Federico Simoncelli fsimo...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Dan Kenigsberg dan...@redhat.com, Saggi Mizrahi smizr...@redhat.com, Vinzenz Feenstra vfeen...@redhat.com, Ayal Baron aba...@redhat.com Sent: Tuesday, February 12, 2013 3:08:09 PM Subject: Re: VDSM Repository Reorganization On Mon, Feb 11, 2013 at 12:17:39PM -0500, Federico Simoncelli wrote: It is some time now that we are discussing an eventual repository reorganization for vdsm. In fact I'm sure that we all experienced at least once the discomfort of having several modules scattered around the tree. The main goal of the reorganization would be to place the modules in their proper location so that they can be used (imported) without any special change (or hack) even when the code is executed inside the development repository (e.g. tests). Recently there has been an initial proposal about moving some of these modules: http://gerrit.ovirt.org/#/c/11858/ That spawned an interesting discussion that must involve the entire community; in fact before starting any work we should try to converge on a decision for the final repository structure in order to minimize the discomfort for the contributors that will be forced to rebase their pending gerrit patches. Even if the full reorganization won't happen in a short time I think we should plan the entire structure now and then eventually move only few relevant modules to their final location. To start the discussion I'm attaching here a sketch of the vdsm repository structure that I envision: . |-- client | |-- [...] | `-- vdsClient.py |-- common | |-- [...] | |-- betterPopen | | `-- [...] | `-- vdsm | |-- [...] | `-- config.py |-- contrib | |-- [...] | |-- nfs-check.py | `-- sos |-- daemon | |-- [...] | |-- supervdsm.py | `-- vdsmd `-- tool |-- [...] `-- vdsm-tool The schema file vdsmapi-schema.json (as well as the python module to parse it) are needed by the server and clients. Initially I'd think it should be installed in 'common', but a client does not need things like betterPopen. Any recommendation on where the schema/API definition should live? Well they both should have the file but when installed both should have their own version of the file depending on the version installed of the client or the server. This is so that vdsm-cli doesn't depend on vdsm or vice-versa. You can't have them share the file since if one is installed with a version of the schema where the schema syntax changed the client\server will fail to parse the schema. As for development, I think the least bad solution is to put it in contrib with symlinks that have relative paths. |--daemon | |-- [...] | `-- vdsmapi-schema.json - ../contrib/vdsmapi-schema.json |--client | |-- [...] | `-- vdsmapi-schema.json - ../contrib/vdsmapi-schema.json |--contrib | |-- [...] | `-- vdsmapi-schema.json : . Git knows how to handle symlinks and symlinks are relative to the location of the symlink. We could also just select the daemon or the client folder and put the real file there and a symlink in the other but I feel it's like choosing which one of your children is the main user of a schema file. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] [Engine-devel] RFC: New Storage API
- Original Message - From: Adam Litke a...@us.ibm.com To: Shu Ming shum...@linux.vnet.ibm.com Cc: engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, January 22, 2013 2:20:19 PM Subject: Re: [vdsm] [Engine-devel] RFC: New Storage API On Tue, Jan 22, 2013 at 11:36:57PM +0800, Shu Ming wrote: 2013-1-15 5:34, Ayal Baron: image and volume are overused everywhere and it would be extremely confusing to have multiple meanings to the same terms in the same system (we have image today which means virtual disk and volume which means a part of a virtual disk). Personally I don't like the distinction between image and volume done in ec2/openstack/etc seeing as they're treated as different types of entities there while the only real difference is mutability (images are read-only, volumes are read-write). To move to the industry terminology we would need to first change all references we have today to image and volume in the system (I would say also in ovirt-engine side) to align with the new meaning. Despite my personal dislike of the terms, I definitely see the value in converging on the same terminology as the rest of the industry but to do so would be an arduous task which is out of scope of this discussion imo (patches welcome though ;) Another distinction between Openstack and oVirt is how the Nova/ovirt-engine look upon storage systems. In Openstack, a stand alone storage service(Cinder) exports the raw storage block device to Nova. On the other hand, in oVirt, storage system is highly bounded with the cluster scheduling system which integrates storage sub-system, VM dispatching sub-system, ISO image sub systems. This combination make all of the sub-system integrated in a whole which is easy to deploy, but it make the sub-system more opaque and not harder to reuse and maintain. This new storage API proposal give us an opportunity to distinct these sub-systems as new components which export better, loose-coupling APIs to VDSM. A very good point and an important goal in my opinion. I'd like to see ovirt-engine become more of a GUI for configuring the storage component (like it does for Gluster) rather than the centralized manager of storage. The clustered storage should be able to take care of itself as long as the peer hosts can negotiate the SDM role. It would be cool if someone could actually dedicate a non-virtualization host where its only job is to handle SDM operations. Such a host could choose to only deploy the standalone HSM service and not the complete vdsm package. OpenStack and oVirt have different architectures and goals. Even though they are both marketed as IaaS solutions they are designed for different purposes. OpenStack is designed around the idea of simplifying the *development* and *integration* of IaaS subsystems through standardization of interfaces. If you design a system that requires access to some type of infrastructural resource you can develop against the OpenStack API for that specific resource and you can consume different underlying implementations of the subsystem. Alternatively if you are creating a new subsystem implementations one of your exposed APIs can be the appropriate OpenStack API. In short, they are a group of loosely coupled services meant to be used replicated and distributed in a cluster. Everyone can create they own implementations of the APIs. oVirt is designed around the idea of simplifying the *management* of said infrastructure. 
The ovirt-engine is the cluster manager and VDSM is the host-manager. Every host in the cluster has a host manager installed on it (VDSM) and some (currently only 1) might have the cluster-manager (ovirt-engine), and they are the effective brain. oVirt ideally only has managing entities. VDSM APIs delegate to other subsystems tasks that are in its scope; the subsystems have their own APIs. For VMs you have libvirt, for networking you have the Linux management tools and maybe netcf; for policy we now have MOM. For iscsi we have iscsiadm, etc. The only odd one out is the image provisioning subsystem, which I will get into, don't worry. This means, if you didn't already gather, that no host managed by oVirt can exist without either VDSM or the ovirt-engine living on it. That being said, I am a huge proponent of making all subsystems optional, meaning you can have a VDSM that doesn't have the libvirt or networking glue bits and just has storage and gluster. To put it simply, no host without a *managing* entity on it. But, as you all have pointed out, VDSM is redundant. There is no reason why the engine can't just directly ask libvirt to do things. There is no reason why we can't make a general iscsi management API and expose it on its own, independent from other services. VDSM is a frankensteinesque abomination of misplaced BL and pass-through APIs. This is why everyone are
Re: [vdsm] remote serial console via HTTP streaming handler
Good to see my suggestion didn't fall on deaf ears. - Original Message - From: Zhou Zheng Sheng zhshz...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Adam Litke a...@us.ibm.com, Ayal Baron aba...@redhat.com, Barak Azulay bazu...@redhat.com, Dan Kenigsberg dan...@redhat.com Sent: Tuesday, January 15, 2013 4:30:03 AM Subject: Re: remote serial console via HTTP streaming handler on 01/08/2013 04:10, Saggi Mizrahi wrote: The solution is somewhat elegant (and only ~150 LOC). That being said I still have some 2 major problems with it: The simpler one is that it uses HTTP in a very non-standard manner, this can be easily solved by using websockets[2]. This is very close to what the patch already does and will make it follow some sort of a standard. This will also enable console on the Web UI to expose this on newer browsers. Using WebSocket is a good idea. I have a look at its standard (http://tools.ietf.org/html/rfc6455). The framing and the security model is not trivial to implement (compared to that existing patch which enables HTTP to forward PTY octet stream in full duplex). Luckily there are some open-source WebSocket implementations. The second and the real reason I didn't put it just as a comment on the patch is that that using HTTP and POST %PATH to have only one listening socket for all VMs is completely different from the way we do VNC or SPICE. This means it kind of bypasses ticketing and any other mechanism we want to put on VM interfaces. I think HTTP digest authentication may be implemented in the current PTY forwarding patch to enable ticketing. The thing is, I really like it. I was suggesting that we extend this idiom to use for SPICE and VNC and tunneling it through a single http\websocket listener. So instead of making this work with the current methods make this the way to go. Using headers like: GET /VM/VM_ID/control HTTP/1.1 Host: server.example.com Upgrade: websocket Ticket: TICKET Connection: Upgrade Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw== Sec-WebSocket-Protocol: [pty, vnc, spice] Sec-WebSocket-Version: 13 Origin:http://example.com In the Spice official site, I see a demo project spice-html5 uses a WebSocket-Spice gateway to get the data. The Spice is tunneled in WebSocket between the client and the gateway. This is good for javascript running in browsers. If VDSM support tunneling the PTY, VNC and Spice in WebSocket, writing a viewers in browsers maybe easier. A WebSocket proxy can also be implemented to support migration with PTY. The PTY data stream is VDSM=proxy=client. When migrating, VDSM sends this event to proxy via AMQP, then shuts down the current WebSocket connection. The proxy can keep the connection with the client. After migration, another VDSM sends this event to proxy via AMQP, then the proxy establish the WebSocket connection with VDSM and continue the forwarding. We can also connect two guests' serial port by forwarding the data stream via this proxy back and forth with support for migration as explained above. Furthermore, the proxy can exposes the data stream in various plug-in protocols such as SOCKS, HTTP, SSH, telnet to various client. For example the proxy provide SOCKS support, then we can use socat as a SOCKS client to connect to guest serial port and pipe the data to FD 0 and 1 to a process running in the host. Also, I don't think it's such a problem to have the client change servers usually even if websockets are invovled. 
It just means that the client needs to be aware of the possibility of an extra layer. -- Thanks and best regards! Zhou Zheng Sheng / 周征晟 E-mail: zhshz...@linux.vnet.ibm.com Telephone: 86-10-82454397 ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
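A rough sketch of the relay the proxy would do (VDSM => proxy => client), just to make the data flow concrete. Everything here is hypothetical: plain TCP sockets stand in for the WebSocket endpoints, and there is no framing, ticketing or AMQP signalling, all of which the real proxy would need.

import select

def pump(client_sock, vdsm_sock):
    """Copy bytes in both directions until either side closes.

    On migration the proxy would keep client_sock open, drop vdsm_sock,
    and call pump() again with a socket to the destination host.
    """
    peer = {client_sock: vdsm_sock, vdsm_sock: client_sock}
    while True:
        readable, _, _ = select.select(list(peer), [], [])
        for sock in readable:
            data = sock.recv(4096)
            if not data:  # one side hung up
                return
            peer[sock].sendall(data)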
Re: [vdsm] RFC: New Storage API
- Original Message - From: Itamar Heim ih...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Monday, January 14, 2013 6:18:13 AM Subject: Re: [vdsm] RFC: New Storage API On 12/04/2012 11:52 PM, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. 
removeImage(repositoryId, imageId, options={}): repositoryId - The ID of a connected repo where the image to delete resides imageId - The id of the image you wish to delete. getImageStatus(repositoryId, imageId) repositoryId - The ID of a connected repo where the image to check resides imageId - The id of the image you wish to check. All operations return once the operation has been committed to disk, NOT when the operation actually completes. This is done so that: - operations come to a stable state as quickly as possible. - In cases where there is an SDM, only a small portion of the operation actually needs to be performed on the SDM host. - No matter how many times the operation fails and on how many hosts, you can always resume the operation and choose when to do it. - You can stop an operation at any time and remove the resulting object, making a distinction between 'stop because the host is overloaded' and 'I don't want that image'. This means that after calling any operation that creates a new image the user must then call getImageStatus() to check what the status of the image is. The status of the image can be either optimized, degraded, or broken. Optimized means
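To make the calling convention concrete, here is a hedged sketch of driving the verbs above from a client. The vdsm object, the repoFormat string and the connection parameters are assumptions, not a finished interface; the polling loop follows the rule above that calls return once committed and the image itself is then checked with getImageStatus().

import time

def provision_disk(vdsm, repo_id, size_bytes):
    # repoId is a transient, caller-chosen handle (per the draft above).
    vdsm.connectStorageRepository(repoId=repo_id,
                                  repoFormat="nfs-3.4",  # assumption
                                  connectionParameters={"path": "server:/export"})
    disk_id = vdsm.createVirtualDisk(targetRepoId=repo_id,
                                     size=size_bytes,
                                     userData={"name": "my-disk"})
    # The call returned once the operation was committed, not completed,
    # so poll the image until it settles (assuming "degraded" covers an
    # operation that is still in flight).
    while True:
        status = vdsm.getImageStatus(repo_id, disk_id)
        if status != "degraded":
            return disk_id, status
        time.sleep(2)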
Re: [vdsm] API Documentation Since tag
- Original Message - From: Adam Litke a...@us.ibm.com To: Vinzenz Feenstra vfeen...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org Sent: Friday, January 11, 2013 9:03:19 AM Subject: Re: [vdsm] API Documentation Since tag On Fri, Jan 11, 2013 at 10:19:45AM +0100, Vinzenz Feenstra wrote: Hi everyone, We are currently documenting the API in vdsmapi-schema.json I noticed that we have there documented when a certain element newly is introduced using the 'Since' tag. However I also noticed that we are not documenting when a field was newly added, nor do we update the 'since' tag. We should start documenting in what version we've introduced a field. A suggestion by saggi was to add to the comment for example: @since: 4.10.3 What is your point of view on this? I do think it's a good idea to add this information. How about supporting multiple Since lines in the comment like the following made up example: ## # @FenceNodePowerStatus: # # Indicates the power state of a remote host. # # @on:The remote host is powered on # # @off: The remote host is powered off # # @unknown: The power status is not known # # @sentient: The host is alive and powered by its own metabolism # # Since: 4.10.0 - @FenceNodePowerStatus # Since: 10.2.0 - @sentient ## I don't like the fact that both lines don't point to the same type of token. I also don't like that it's a repeat of the type names and field names. I prefer Vinzenz original suggestion (on IRC) of moving the Since token up and then have it be a state. It also makes discerning what entities you can use up to a certain version easier if you make sure to keep them sorted. We can do this because the order of the fields and availability is undetermined (unlike real structs). ## # @FenceNodePowerStatus: # # Indicates the power state of a remote host. # # Since: 4.10.0 # # @on:The remote host is powered on # # @off: The remote host is powered off # # @unknown: The power status is not known # # Since: 10.2.0 # # @sentient: The host is alive and powered by its own metabolism # ## The problem though is that it makes since a property of the fields and not of the struct. This isn't that much of a problem as we can assume the earliest version is the time when the struct was introduced. Remember that any patch to change the schema format will require changes to process-schema.py as well. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
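Since the preferred proposal makes 'Since:' a stateful marker inside the comment block, whatever replaces process-schema.py only needs to carry a little state while scanning. A purely illustrative sketch, not the real parser:

def collect_since(comment_lines, default_since="4.10.0"):
    """Map each @name in a schema comment block to the version it appeared
    in, using the most recent 'Since:' line seen so far; default_since
    covers entries listed before any Since: line."""
    current = default_since
    since = {}
    for raw in comment_lines:
        text = raw.lstrip("# ").strip()
        if text.startswith("Since:"):
            current = text.split(":", 1)[1].strip()
        elif text.startswith("@"):
            name = text.split(":", 1)[0].lstrip("@")
            since[name] = current
    return since

Run over the FenceNodePowerStatus example above, it would map 'sentient' to 10.2.0 and the earlier fields to 4.10.0.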
[vdsm] remote serial console via HTTP streaming handler
I remember that there was a discussion about it but I don't remember it ever converging. In any case there is a patch upstream [1] that merits discussion outside the scope of the patch and reviewers. The solution is somewhat elegant (and only ~150 LOC). That being said I still have some 2 major problems with it: The simpler one is that it uses HTTP in a very non-standard manner, this can be easily solved by using websockets[2]. This is very close to what the patch already does and will make it follow some sort of a standard. This will also enable console on the Web UI to expose this on newer browsers. The second and the real reason I didn't put it just as a comment on the patch is that that using HTTP and POST %PATH to have only one listening socket for all VMs is completely different from the way we do VNC or SPICE. This means it kind of bypasses ticketing and any other mechanism we want to put on VM interfaces. The thing is, I really like it. I was suggesting that we extend this idiom to use for SPICE and VNC and tunneling it through a single http\websocket listener. So instead of making this work with the current methods make this the way to go. Using headers like: GET /VM/VM_ID/control HTTP/1.1 Host: server.example.com Upgrade: websocket Ticket: TICKET Connection: Upgrade Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw== Sec-WebSocket-Protocol: [pty, vnc, spice] Sec-WebSocket-Version: 13 Origin: http://example.com I admit I have no idea if migrating SPICE would like being tunneled but I guess there is no practical reason why that would be a problem. [1] http://gerrit.ovirt.org/#/c/10381 [2] http://en.wikipedia.org/wiki/WebSocket ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
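For what it's worth, the handshake sketched above can be poked at with nothing but the standard library. A hedged sketch: the /VM/<id>/control path and Ticket header come from this thread, while the port and the single 'pty' subprotocol are assumptions, and real code would still have to validate Sec-WebSocket-Accept and speak framed WebSocket afterwards.

import base64
import os
import socket

def open_console(host, vm_id, ticket, port=54321):
    """Send a WebSocket upgrade request for a VM's serial console."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = ("GET /VM/%s/control HTTP/1.1\r\n"
               "Host: %s\r\n"
               "Upgrade: websocket\r\n"
               "Connection: Upgrade\r\n"
               "Ticket: %s\r\n"
               "Sec-WebSocket-Key: %s\r\n"
               "Sec-WebSocket-Protocol: pty\r\n"
               "Sec-WebSocket-Version: 13\r\n"
               "\r\n") % (vm_id, host, ticket, key)
    sock = socket.create_connection((host, port))
    sock.sendall(request.encode("ascii"))
    return sock, sock.recv(4096)  # caller checks for "101 Switching Protocols"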
Re: [vdsm] Managing async tasks
- Original Message - From: Adam Litke a...@us.ibm.com To: vdsm-devel@lists.fedorahosted.org Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org Sent: Monday, December 17, 2012 12:00:49 PM Subject: Managing async tasks On today's vdsm call we had a lively discussion around how asynchronous operations should be handled in the future. In an effort to include more people in the discussion and to better capture the resulting conversation I would like to continue that discussion here on the mailing list. A lot of ideas were thrown around about how 'tasks' should be handled in the future. There are a lot of ways that it can be done. To determine how we should implement it, it's probably best if we start with a set of requirements. If we can first agree on these, it should be easy to find a solution that meets them. I'll take a stab at identifying a first set of POSSIBLE requirements: - Standardized method for determining the result of an operation This is a big one for me because it directly affects the consumability of the API. If each verb has different semantics for discovering whether it has completed successfully, then the API will be nearly impossible to use easily. Since there is no way to assure if of some tasks completed successfully or failed, especially around the murky waters of storage, I say this requirement should be removed. At least not in the context of a task. Sorry. That's my list :) Hopefully others will be willing to add other requirements for consideration. From my understanding, task recovery (stop, abort, rollback, etc) will not be generally supported and should not be a requirement. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] Managing async tasks
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org Sent: Monday, December 17, 2012 2:16:25 PM Subject: Re: Managing async tasks On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote: - Original Message - From: Adam Litke a...@us.ibm.com To: vdsm-devel@lists.fedorahosted.org Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org Sent: Monday, December 17, 2012 12:00:49 PM Subject: Managing async tasks On today's vdsm call we had a lively discussion around how asynchronous operations should be handled in the future. In an effort to include more people in the discussion and to better capture the resulting conversation I would like to continue that discussion here on the mailing list. A lot of ideas were thrown around about how 'tasks' should be handled in the future. There are a lot of ways that it can be done. To determine how we should implement it, it's probably best if we start with a set of requirements. If we can first agree on these, it should be easy to find a solution that meets them. I'll take a stab at identifying a first set of POSSIBLE requirements: - Standardized method for determining the result of an operation This is a big one for me because it directly affects the consumability of the API. If each verb has different semantics for discovering whether it has completed successfully, then the API will be nearly impossible to use easily. Since there is no way to assure if of some tasks completed successfully or failed, especially around the murky waters of storage, I say this requirement should be removed. At least not in the context of a task. I don't agree. Please feel free to convince me with some exampled. If we cannot provide feedback to a user as to whether their request has been satisfied or not, then we have some bigger problems to solve. If VDSM sends a write command to a storage server, and the connection hangs up before the ACK has returned. The operation has been committed but VDSM has no way of knowing if that happened as far as VDSM is concerned it got an ETIMEO or EIO. This is the same problem that the engine has with VDSM. If VDSM creates an image\VM\network\repo but the connection hangs up before the response can be sent back as far as the engine is concerned the operation times out. This is an inherent issue with clustering. This is why I want to move away from tasks being *the* trackable objects. Tasks should be short. As short as possible. Run VM should just persist the VM information on the VDSM host and return. The rest of the tracking should be done using the VM ID. Create image should return once VDSM persisted the information about the request on the repository and created the metadata files. Tracking should be done on the repo or the imageId. Sorry. That's my list :) Hopefully others will be willing to add other requirements for consideration. From my understanding, task recovery (stop, abort, rollback, etc) will not be generally supported and should not be a requirement. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] Managing async tasks
This is an addendum to my previous email. - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Adam Litke a...@us.ibm.com Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org Sent: Monday, December 17, 2012 2:52:06 PM Subject: Re: Managing async tasks - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org Sent: Monday, December 17, 2012 2:16:25 PM Subject: Re: Managing async tasks On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote: - Original Message - From: Adam Litke a...@us.ibm.com To: vdsm-devel@lists.fedorahosted.org Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-de...@ovirt.org Sent: Monday, December 17, 2012 12:00:49 PM Subject: Managing async tasks On today's vdsm call we had a lively discussion around how asynchronous operations should be handled in the future. In an effort to include more people in the discussion and to better capture the resulting conversation I would like to continue that discussion here on the mailing list. A lot of ideas were thrown around about how 'tasks' should be handled in the future. There are a lot of ways that it can be done. To determine how we should implement it, it's probably best if we start with a set of requirements. If we can first agree on these, it should be easy to find a solution that meets them. I'll take a stab at identifying a first set of POSSIBLE requirements: - Standardized method for determining the result of an operation This is a big one for me because it directly affects the consumability of the API. If each verb has different semantics for discovering whether it has completed successfully, then the API will be nearly impossible to use easily. Since there is no way to assure if of some tasks completed successfully or failed, especially around the murky waters of storage, I say this requirement should be removed. At least not in the context of a task. I don't agree. Please feel free to convince me with some exampled. If we cannot provide feedback to a user as to whether their request has been satisfied or not, then we have some bigger problems to solve. If VDSM sends a write command to a storage server, and the connection hangs up before the ACK has returned. The operation has been committed but VDSM has no way of knowing if that happened as far as VDSM is concerned it got an ETIMEO or EIO. This is the same problem that the engine has with VDSM. If VDSM creates an image\VM\network\repo but the connection hangs up before the response can be sent back as far as the engine is concerned the operation times out. This is an inherent issue with clustering. This is why I want to move away from tasks being *the* trackable objects. Tasks should be short. As short as possible. Run VM should just persist the VM information on the VDSM host and return. The rest of the tracking should be done using the VM ID. Create image should return once VDSM persisted the information about the request on the repository and created the metadata files. Tracking should be done on the repo or the imageId. The thing is that I know how long a VM object should live (or an Image object). So tracking it is straight forward. 
How long a task should live is very problematic and quite context specific. It depends on what the task is. I think it's quite confusing from an API standpoint to have every task have a different scope, id requirement and life-cycle. VDSM has two types of APIs: CRUD objects (VM, Image, Repository, Bridge, Storage Connections) and general transient methods (getBiosInfo(), getDeviceList()). The latter are quite simple to manage. They don't need any special handling. If you lost a getBiosInfo() call you just send another one, no harm done. The same is even true with things that change the host, like getDeviceList(). What we are really arguing about is fitting the CRUD objects into some generic task-oriented scheme. I'm saying it's a waste of time, as you can quite easily have flows to recover from each operation. Create - check if the object exists. Read - read again. Update - either update again, or read and then update if the update didn't commit the first time. Delete - check that the object doesn't exist. Each of the objects we CRUD has a different life-cycle and different ownership semantics. Danken raised the point that creation has
[vdsm] [Draft]Task Management API
Dan rightly suggested I'd be more specific about what the task system is instead of what the task system isn't. The problem is that I'm not completely sure how it's going to work. It also depends on the events mechanism. This is my current working draft: TaskInfo: id string methodName string kwargs json-object (string keys variant values) *filtered to remove sensitive information getRunningTasks(filter string, filterType enum{glob, regexp}) Returns a list of TaskInfo of all tasks that their id's match the filter That's it, not even stopTask() As explained, I would like to offload handling to the subsystems. In order to make things easier for the clients every subsystem can choose a filed of the object to be of type OperationInfo. This is a generic structure that the user has a generic way to track all tasks on all subsystem with a report interface. The extraData field is for subsystem specific data. This is where the storage subsystem would put, for example, imageState (broken, degraded, optimized) data. OperationInfo: operationDescription string - something out of an agreed enum of strings vaguely describing the operation at hand for example Copying, Merging, Deleting, Configuring, Stopped, Paused, They must be known to the client so it can in turn translate it in the UI. The also have to remain relatively vague as they are part of the interface meaning that new values will break old clients so they have to be reusable. stageDescription - Similar to operation description in case you want more granularity, optional. stage (int, int) - (5, 10) means 5 out of 10. 1 out of 1 implies the UI to not display stage widgets. percentage - 0-100, -1 means unknown. lastError - (code, message) the same errors that can return for regular calls extraData - json-object For example creatVM will return once the object is created in VDSM. getVmInfo() would return, amongst other things, the operation info. For the case of preparing for launch it will be: {Creating, configuring, (2, 4), 40, (0, ), {state=preparing for launch}} In the case of VM paused on EIO: {Paused, Paused, (1, 1), -1, (123, Error writing to disks), {state=paused}} Migration is a tricky one, it will be reported as a task while it's in progress but all the information is available on the image operationInfo. In the case of Migration: {Migration, Configuring, (1, 3), -1, (0, ), {status=Migrating}} For StorageConnection this is somewhat already the case but in simplified version. If you want to ask about any other operation I'd be more then happy to write my suggestion for it. Subsystems have complete freedom about how to set up the API. For Storage you have Fixes() to start\stop operations. Gluster is pretty autonomous once operations have been started. Since operations return as soon as they are registered (persisted) or fail to register, it makes synchronous programming a bit clunky. vdsm.pauseVm(vmId) doesn't return when the VM is paused but when VDSM committed it will try to pause it. This means you will have to poll in order to see if the operation finished. For gluster, as an example, this is the only way we can check that the operation finished. For stuff we have a bit more control over vdsm will fire events using json-rpc notifications sent to the clients. The will be in the form of: {method: alert, params: { alertName: subsystem(.objectType)?.object.(subobject., ...), operationInfo, OperationInfo} } The user can register to recive events using a glob or a regexp. registering to vdsm.VM.* pop every time any VM has changed stage. 
This means that whenever the task finishes, fails or gains significant progress and VDSM is there to track it, an event will be sent to the client. This means that the general flow is:

# Register operation
vmID = "best_vm"
host.VM.pauseVM(vmID)
while True:
    opInfo = None
    try:
        event = host.waitForEvent("vdsm.VM.best_vm", timeout=10)
        opInfo = event.opInfo
    except VdsmDisconnectionError:
        host.waitForReconnect()
        host.vm.getVmInfo(vmID)  # Double check that we didn't miss the event
        continue
    except Timeout:
        # This is a long operation, poll to see that we didn't miss any event
        # but more commonly, update percentage in the UI to show progress.
        vmInfo = host.vm.getVmInfo(vmID)
        opInfo = vmInfo.operationInfo
    if opInfo.stage.number != opInfo.stage.total:
        # Operation in progress
        updateUI(opInfo)
    else:
        # Operation completed
        # Check that the state is what we expected it to be.
        if opInfo.extraData.state == "paused":
            return SUCCESS
        else:
            return
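For reference, the OperationInfo and alert payloads described above would look roughly like this on the wire; the field spellings follow the draft but are still assumptions at this stage.

import json

operation_info = {
    "operationDescription": "Migration",
    "stageDescription": "Configuring",
    "stage": [1, 3],
    "percentage": -1,
    "lastError": [0, ""],
    "extraData": {"status": "Migrating"},
}

notification = {
    "method": "alert",
    "params": {
        "alertName": "vdsm.VM.best_vm",
        "operationInfo": operation_info,
    },
}

print(json.dumps(notification, indent=2))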
Re: [vdsm] blame and shame
I kind of like the fact that I will not be blamed for all the stuff I broke. :( - Original Message - From: Antoni Segura Puimedon asegu...@redhat.com To: vdsm-devel@lists.fedorahosted.org Sent: Thursday, December 13, 2012 10:34:52 AM Subject: [vdsm] blame and shame Hi list! Since I'm doing lately and I plan to continue to do patches to improve pep8 compliance for the whole vdsm codebase, and a lot of that is E126, E127 and E128, that deal with whitespaces, I have added to my ~/.gitconfig [alias] bl = blame -w Which ignores whitespaces for the blame on the lines. This way, my name will not be shown next to code I don't know about ;-) Of course, it would be great if git blame where to be extended with pydiff so all the pep8 changes would be ignored for blaming purposes... But I'll leave that to someone else ;-) Best, Toni ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] Request for consideration during the API revamp
Since I assume vdsClient will use libvdsm. It should have all the constants defined. I do like Adam's suggestion about making vdsClient auto-generated as well. vdsClient is currently very annoying to maintain. I would also like to propose changing the name of the executable to vdsm_cli. It would make it easier to distribute both tools as vdsClient will still be needed communicate with old VDSMs. Also capital letters in executable names is not very Unixy. - Original Message - From: Adam Litke a...@us.ibm.com To: Vinzenz Feenstra vfeen...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org Sent: Wednesday, December 12, 2012 10:40:19 AM Subject: Re: [vdsm] Request for consideration during the API revamp On Wed, Dec 12, 2012 at 02:01:31PM +0100, Vinzenz Feenstra wrote: Hi, When there is the attempt to enhance/change the current API, I would ask you to consider to think also about the vdsClient use case. I haven't read anything regarding that so far and therefore I just want you to think about it as well. My expectation is that the vdsClient will continue to use the RPC interfaces, however since it is part of the VDSM project I think it would be a good idea if there is a way for both vdsmd and vdsClient to share constants used for the API. That in turn also should simplify the maintenance of vdsClient. Currently I see the constants used by both being defined on both sides and I am pretty sure that this could be improved. See this as just a thought on the whole redesign talk, but I would like to see this kind of use cases to be covered. :-) Yes, this is an excellent suggestion. One thing I am thinking about doing is generating a new python file with the enums defined in the schema. This could be included by all server-side code and by clients such as vdsClient. If we decide to add constants to the schema file, we could also place these into the same generated python file. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
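A hedged sketch of the 'generate a python file with the enums defined in the schema' idea from Adam's reply. The input format here is a simplified stand-in (a list of dicts with 'enum' and 'data' keys), not the real vdsmapi-schema.json grammar:

def emit_enum_module(enums):
    """Return python source defining one constants class per enum, meant to
    be imported by both the daemon and the command line client."""
    lines = ["# Generated file - do not edit.", ""]
    for enum in enums:
        lines.append("class %s(object):" % enum["enum"])
        for value in enum["data"]:
            lines.append("    %s = %r" % (value.upper().replace("-", "_"), value))
        lines.append("")
    return "\n".join(lines)

print(emit_enum_module([{"enum": "FenceNodePowerStatus",
                         "data": ["on", "off", "unknown"]}]))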
Re: [vdsm] Host bios information
- Original Message - From: Ayal Baron aba...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Shu Ming shum...@linux.vnet.ibm.com Sent: Thursday, December 13, 2012 11:30:56 AM Subject: Re: [vdsm] Host bios information - Original Message - I think that for the new current XML-RPC API it's OK to add it to the getVdsCaps() verb. For the new API I suggest moving it to it's own API. The smaller the APIs the easier they are to deprecate and support. I quite doubt the fields in getBiosInfo() will change half as frequently as whatever getVdsCaps() returns. I also kind of want to throw away getVdsCaps() and split it to better named better encapsulated methods. Ack. I just don't understand why not start right now? Any new patch should improve things at least a little. We know getVdsCaps() is wrong so let's put the bios info (and anything in getVdsCaps that makes sense to put with it if relevant) in a separate call. Adding a call in engine to this new method should be a no brainer, I don't think that is a good reason for not doing things properly in vdsm, even if we're talking about the current API. Well, from what I know the current overhead per call is too large to mandate a lot of calls. At least that is what I've been told. If that is not an issue, do it in the XML-RPC API too. Also, in the json-rpc base model, calls are not only cheaper, you also have batch calls. This means you can send multiple requests as one message and have VDSM send you the responses as one message once all tasks completed. This makes splitting aggregated methods to smaller methods painless and with minimal overhead. - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: ybronhei ybron...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Thursday, December 13, 2012 11:04:09 AM Subject: Re: [vdsm] Host bios information After a quick review of the wiki page, it was stated that dmidecode gave too much informations. Only five fields will be displayed in the hardware tab, Manufactory, Version, Family, UUID and serial number. For Family, it is mean the CPU core's family. And it confuses me a bit with the CPU name and CPU type fields in general tab. I think we should chose the best one to characterizethe CPU type. ybronhei: Today in the Api we display general information about the host that vdsm export by getCapabilities Api. We decided to add bios information as part of the information that is displayed in UI under host's general sub-tab. To summaries the feature - We'll modify General tab to Software Information and add another tab for Hardware Information which will include all the bios data that we'll decide to gather from the host and display. Following this feature page: http://www.ovirt.org/Features/Design/HostBiosInfo for more details. All the parameters that can be displayed are mentioned in the wiki. I would greatly appreciate your comments and questions. Thanks. -- --- 舒明 Shu Ming Open Virtualization Engineerning; CSTL, IBM Corp. 
Tel: 86-10-82451626 Tieline: 9051626 E-mail: shum...@cn.ibm.com or shum...@linux.vnet.ibm.com Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District, Beijing 100193, PRC ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
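The 'batch calls' point above is just the JSON-RPC 2.0 batch form: several request objects in one array, answered by a single array of responses matched by id. A small hedged sketch of the payload; the method names are made up to illustrate splitting getVdsCaps():

import json

batch = [
    {"jsonrpc": "2.0", "id": "req-1", "method": "Host.getBiosInfo", "params": {}},
    {"jsonrpc": "2.0", "id": "req-2", "method": "Host.getCpuInfo", "params": {}},
    {"jsonrpc": "2.0", "id": "req-3", "method": "Host.getNetworkInfo", "params": {}},
]
# One message on the wire; VDSM would answer with one JSON array of
# responses, each carrying the "id" of the request it belongs to.
print(json.dumps(batch))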
Re: [vdsm] RFC: New Storage API
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Deepak C Shetty deepa...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, December 10, 2012 1:49:31 PM Subject: Re: [vdsm] RFC: New Storage API On Fri, Dec 07, 2012 at 02:53:41PM -0500, Saggi Mizrahi wrote: snip 1) Can you provide more info on why there is a exception for 'lvm based block domain'. Its not coming out clearly. File based domains are responsible for syncing up object manipulation (creation\deletion) The backend is responsible for making sure it all works either by having a single writer (NFS) or having it's own locking mechanism (gluster). In our LVM based domains VDSM is responsible for basic object manipulation. The current design uses an approach where there is a single host responsible for object creation\deleteion it is the SRM\SDM\SPM\S?M. If we ever find a way to make it fully clustered without a big hit in performance the S?M requirement will be removed form that type of domain. I would like to see us maintain a LOCALFS domain as well. For this, we would also need SRM, correct? No, why? -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] RFC: New Storage API
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Shu Ming shum...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, December 10, 2012 4:47:46 PM Subject: Re: [vdsm] RFC: New Storage API On Mon, Dec 10, 2012 at 03:36:23PM -0500, Saggi Mizrahi wrote: Statements like this make me start to worry about your userData concept. It's a sign of a bad API if the user needs to invent a custom metadata scheme for itself. This reminds me of the abomination that is the 'custom' property in the vm definition today. In one sentence: If VDSM doesn't care about it, VDSM doesn't manage it. userData being a void* is quite common and I don't understand why you would thing it's a sign of a bad API. Further more, giving the user choice about how to represent it's own metadata and what fields it want to keep seems reasonable to me. Especially given the fact that VDSM never reads it. The reason we are pulling away from the current system of VDSM understanding the extra data is that it makes that data tied to VDSMs on disk format. VDSM on disk format has to be very stable because of clusters with multiple VDSM versions. Further more, since this is actually manager data it has to be tied to the manager backward compatibility lifetime as well. Having it be opaque to VDSM ties it to only one, simpler, support lifetime instead of two. I guess you are implying that it will make it problematic for multiple users to read userData left by another user because the formats might not be compatible. The solution is that all parties interested in using VDSM storage agree on format, and common fields, and supportability, and all the other things that choosing a supporting *something* entails. This is, however, out of the scope of VDSM. When the time comes I think how the userData blob is actually parsed and what fields it keeps should be discussed on ovirt-devel or engine-devel. The crux of the issue is that VDSM manages only what it cares about and the user can't modify directly. This is done because everything we expose we commit to. If you want any information persisted like: - Human readable name (in whatever encoding) - Is this a template or a snapshot - What user owns this image You can just put it in the userData. VDSM is not going to impose what encoding you use. It's not going to decide if you represent your users as IDs or names or ldap queries or Public Keys. It's not going to decide if you have explicit templates or not. It's not going to decide if you care what is the logical image chain. It's not going to decide anything that is out of it's scope. No format is future proof, no selection of fields will be good for any situation. I'd much rather it be someone else's problem when any of them need to be changed. They have currently been VDSMs problem and it has been hell to maintain. In general, I actually agree with most of this. What I want to avoid is pushing things that should actually be a part of the API into this userData blob. We do want to keep the API as simple as possible to give vdsm flexibility. If, over time, we find that users are always using userData to work around something missing in the API, this could be a really good sign that the API needs extension. I was actually contemplating about this for quite a while. 
If while you create an image the reply is lost or, VDSM is unable to know if the operation was committed or not, the user will have no way of knowing what thew new image ID is. To solve this it is recommended that the manager puts some sort of task related information in the user data. If the operation ever finishes in an an ambiguous state the user just reads the userData from any images it doesn't know or is unsure about their state. This is a flow that every client will have to have. So why not just add that to the API? Because I don't want to impose how this information gets generated, what is the content of that data or how unique it has to be. Since VDSM doesn't use it for anything, I don't feel like I need to figure this out. I am all for simplicity, but simplicity is kind of an abstract concept. Having it be a blob is in some aspects the simplest thing you can do. Just saying that I have a field, put whatever in it is simple to convey but does requires more work on the user's side to figure out what to do with it. All that being said, I do think that the format, fields and how to use them should be defined so different users can communicate and synchronize. It's also important that you don't reinvent the wheel for every flow in every client. I'm just saying that it's not in the scope of VDSM. It should be done as a standard that all users of VDSM agree too conform to. It's the same way that a file
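A hedged sketch of the recovery flow described above: the manager stamps its own token into userData at create time and, if the reply is lost, scans images it does not recognise for that token. The image-listing and userData-reading verbs are assumptions made up for the illustration; only createVirtualDisk and the opaque userData blob come from the draft.

import uuid

def create_disk_recoverable(vdsm, repo_id, size_bytes):
    token = str(uuid.uuid4())            # manager-generated correlation id
    user_data = {"createToken": token}   # opaque to VDSM, meaningful to us
    try:
        return vdsm.createVirtualDisk(targetRepoId=repo_id, size=size_bytes,
                                      userData=user_data)
    except TimeoutError:
        # Reply lost: the image may or may not exist, so look for the token.
        for image_id in vdsm.getImageList(repo_id):          # assumed verb
            data = vdsm.getImageUserData(repo_id, image_id)  # assumed verb
            if data.get("createToken") == token:
                return image_id
        raise  # never committed; safe to simply retry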
Re: [vdsm] RFC: New Storage API
- Original Message - From: Deepak C Shetty deepa...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Shu Ming shum...@linux.vnet.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org, Deepak C Shetty deepa...@linux.vnet.ibm.com Sent: Friday, December 7, 2012 12:23:15 AM Subject: Re: [vdsm] RFC: New Storage API On 12/06/2012 10:22 PM, Saggi Mizrahi wrote: - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Thursday, December 6, 2012 11:02:02 AM Subject: Re: [vdsm] RFC: New Storage API Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. So what happens when user mistakenly gives a repoID that is in use before.. there should be something in the return value that specifies the error and/or reason for error so that user can try with a new/diff repoID ? Asi I said, connect fails if the repoId is in use ATM. disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior IIUC, i can use options to do storage offloads ? For eg. I can create a LUN that represents this VD on my storage array based on the 'options' parameter ? Is this the intended way to use 'options' ? No, this has nothing to do with offloads. 
If by offloads you mean having other VDSM hosts to the heavy lifting then this is what the option autoFix=False and the fix mechanism is for. If you are talking about advanced scsi features (ie. write same) they will be used automatically whenever possible. In any case, how we manage LUNs (if they are even used) is an implementation detail. returns the id of the new VD I think we will also need a function to check if a a VirtualDisk is based on a specific snapshot. Like: isSnapshotOf(virtualDiskId, baseSnapshotID): No, the design is that volume dependencies are an implementation detail. There is no reason for you to know that an image is physically a snapshot of another. Logical snapshots, template information, and any other information can be set by the user by using the userData field available for every image. createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything
Re: [vdsm] RFC: New Storage API
- Original Message - From: Tony Asleson tasle...@redhat.com To: vdsm-devel@lists.fedorahosted.org Sent: Wednesday, December 5, 2012 4:48:34 PM Subject: Re: [vdsm] RFC: New Storage API On 12/04/2012 03:52 PM, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD I'm guessing there will be a way to find out how much space is available for a specified repo before you try to create a virtual disk on it? This is in the repo API which is not really detailed here. In any case, due to the nature of storage, you can never tell how much space an image is going to actually take. You have over-committing, thin provisioning, sparse volumes, native snapshots, compression, de-dupe, soft raid (btfs\zfs), check-summing, metadata backups, metadata per-operation (btrfs), and more. VDSM might also leave the image in degraded mode if there is no room to complete the action. If you want to create an image you should just give it a whirl, also you should always leave certain % percentage free. createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. 
options - options to modify VDSMs default behavior returns the id of the new Snapshot copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={}) targetRepoId - The ID of a connected repo where the new image will be created imageId - The image you wish to copy baseImageId - if specified, the new image will contain only the diff between image and Id. If None the new image will contain all the bits of image Id. This can be used to copy partial parts of images for export. userData - optional data that will be attached to the new image, could be anything that the user desires. options - options to modify VDSMs default behavior return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However the user should not assume that and always use the value returned from the method. Can the target repo id be itself? The case where a user wants to make a copy of a virtual disk in the same repo. A caller could snapshot the virtual disk and then create a virtual disk from the snapshot, but if the target repo could be the same as source repo then they could use this call as long as the returned ID was different. Does imageId IO need to be quiesced before calling this or will that be handled in the implementation (eg. snapshot first)? Copy of an image is possible to the same repo. Copy of a sanpshot to the same repo will not work, there is also no reason
Re: [vdsm] RFC: New Storage API
- Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Thursday, December 6, 2012 11:02:02 AM Subject: Re: [vdsm] RFC: New Storage API Saggi, Thanks for sharing your thought and I get some comments below. Saggi Mizrahi: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. Where does repoID come from? I think repoID doesn't exist before connectStorageRepository() return. Isn't repoID a return value of connectStorageRepository()? No, repoIDs are no longer part of the domain, they are just a transient handle. The user can put whatever it wants there as long as it isn't already taken by another currently connected domain. disconnectStorageRepository(self, repoId) In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): targetRepoId - ID of a connected repo where the disk will be created size - The size of the image you wish to create baseSnapshotId - the ID of the snapshot you want the base the new virtual disk on userData - optional data that will be attached to the new VD, could be anything that the user desires. options - options to modify VDSMs default behavior returns the id of the new VD I think we will also need a function to check if a a VirtualDisk is based on a specific snapshot. Like: isSnapshotOf(virtualDiskId, baseSnapshotID): No, the design is that volume dependencies are an implementation detail. There is no reason for you to know that an image is physically a snapshot of another. Logical snapshots, template information, and any other information can be set by the user by using the userData field available for every image. createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}): targetRepoId - The ID of a connected repo where the new sanpshot will be created and the original image exists as well. size - The size of the image you wish to create baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot userData - optional data that will be attached to the new Snapshot, could be anything that the user desires. 
options - options to modify VDSMs default behavior Does this function mean that we can copy the image from one repository to another repository? Does it cover the semantics of storage migration, storage backup, storage incremental backup? Yes, the main purpose is copying to another repo, and you can even do incremental backups. Also the following flow: 1. Run a VM using imageA 2. Write to disk 3. Stop VM 4. Copy imageA to repoB 5. Run a VM using imageA again 6. Write to disk 7. Stop VM 8. Copy imageA again, basing it off imageA_copy1 on repoB, creating a diff on repoB without snapshotting the original image. return the Id of the new image
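Expressed against the draft API, the eight step flow above would look roughly like the following between VM runs; the signatures mirror the RFC text but the ID handling is an assumption.

def incremental_backup(vdsm, repo_b, image_a, previous_copy=None):
    """Copy imageA into the backup repo; pass the previous copy's ID to get
    only the bits written since that copy (steps 4 and 8 above)."""
    return vdsm.copyImage(targetRepoId=repo_b,
                          imageId=image_a,
                          baseImageId=previous_copy,  # None => full copy
                          userData={"backup": "incremental" if previous_copy
                                    else "full"})

# After the first run/stop:   copy1 = incremental_backup(vdsm, "repoB", image_a)
# After the second run/stop:  copy2 = incremental_backup(vdsm, "repoB", image_a, copy1)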
Re: [vdsm] VDSM tasks, the future
I'm sorry but your email client messed up the formatting and I can't figure out what are you comments. Could you please use text only emails. - Original Message - From: ybronhei ybron...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: Adam Litke a...@us.ibm.com, engine-devel engine-de...@ovirt.org, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Wednesday, December 5, 2012 8:37:23 AM Subject: Re: [vdsm] VDSM tasks, the future On 12/05/2012 12:20 AM, Saggi Mizrahi wrote: As the only subsystem to use asynchronous tasks until now is the storage subsystem I suggest going over how I suggest tackling task creation, task stop, task remove and task recovery. Other subsystem can create similar mechanisms depending on their needs. There is no way of avoiding it, different types of tasks need different ways of tracking\recovering from them. network should always auto-recover because it can't get a please fix command if the network is down. Storage on the other hand should never start operations on it's own because it might take up valuable resources from the host. Tasks that need to be tracked on a single host, 2 hosts, or the entire cluster need to have their own APIs. VM configuration never persist across reboots, networking sometimes persists and storage always persists. This means that recovery procedures (from the managers point of view) need to be vastly different. Add policy, resource allocation, and error flows you see that VDSM doesn't have nearly as much information to deal with the tasks. - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org , engine-devel engine-de...@ovirt.org , Ayal Baron aba...@redhat.com , Barak Azulay bazu...@redhat.com , Shireesh Anjal san...@redhat.com Sent: Tuesday, December 4, 2012 3:50:28 PM Subject: Re: VDSM tasks, the future On Tue, Dec 04, 2012 at 10:35:01AM -0500, Saggi Mizrahi wrote: Because I started hinting about how VDSM tasks are going to look going forward I thought it's better I'll just write everything in an email so we can talk about it in context. This is not set in stone and I'm still debating things myself but it's very close to being done. Don't debate them yourself, debate them here! Even better, propose your idea in schema form to show how a command might work exactly. I don't like throwing ideas in the air It can be much easier to understand the flow of a task in vdsm and outside vdsm by a small schema, mainly for the each task's states. To define the flow of a task you can separate between type of tasks (network, storage, vms, or else), we should have task's states that clarify if the task can be recovered or not, can be canceled or not and inc.. Canceling\Aborting\Reverting states should be more clarified and not every state can lead to all types of states. I tries to figure how task flow works today in vdsm, and this is what I've got - http://wiki.ovirt.org/Vdsm_tasks - Everything is asynchronous. The nature of message based communication is that you can't have synchronous operations. This is not really debatable because it's just how TCP\AMQP\messaging works. Can you show how a traditionally synchronous command might work? Let's take Host.getVmList as an example. The same as it works today, it's all a matter of how you wrap the transport layer. You will send a json-rpc request and wait for a response with the same id. As for the bindings, there are a lot of way we can tackle that. 
Always wait for the response and simulate synchronous behavior. Make every method return an object to track the task.
task = host.getVmList()
if not task.wait(1):
    task.cancel()
else:
    res = task.result()
It looks like a traditional timeout.. why not split blocking actions and non-blocking actions? A non-blocking action would supply a callback function to return to if the task fails or succeeds, for example: createAsyncTask(host.getVmList, params, timeout=30, callbackGetVmList) Instead of using the dispatcher? Do you want to keep the dispatcher concept?
Have it both ways (it's auto generated anyway) and have
list = host.getVmList()
task = host.getVmList_async()
Have high level and low level interfaces:
host = Host()
host.connect("tcp://host:3233")
req = host.sendRequest(123213, getVmList, [])
if not req.wait(1): ...
shost = SynchHost(host)
shost.getVmList() # Actually wraps a request object
ahost = AsyncHost(host)
task = ahost.getVmList() # Actually wraps a request object
- Task IDs will be decided by the caller. This is how json-rpc works and also makes sense because now the engine can track the task without needing to have a stage where we give it the task ID back. IDs are reusable as long as no one else is using them at the time so they can be used
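A rough sketch of what such a request/task wrapper could look like on the client side; Task, _complete and call_sync are invented names for illustration, not part of any existing binding:

    import threading

    class Task(object):
        # Hypothetical client-side wrapper around one outstanding request.
        def __init__(self):
            self._done = threading.Event()
            self._result = None

        def _complete(self, result):
            # would be called by the transport layer when the response arrives
            self._result = result
            self._done.set()

        def wait(self, timeout=None):
            return self._done.wait(timeout)

        def result(self):
            return self._result

        def cancel(self):
            # would send an abort request; left empty in this sketch
            pass

    def call_sync(send_async, timeout=None):
        # simulate a synchronous call on top of the asynchronous primitive
        task = send_async()
        if not task.wait(timeout):
            task.cancel()
            raise RuntimeError('request timed out')
        return task.result()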
[vdsm] VDSM tasks, the future
Because I started hinting about how VDSM tasks are going to look going forward I thought it's better to just write everything in an email so we can talk about it in context. This is not set in stone and I'm still debating things myself but it's very close to being done.
- Everything is asynchronous. The nature of message based communication is that you can't have synchronous operations. This is not really debatable because it's just how TCP\AMQP\messaging works.
- Task IDs will be decided by the caller. This is how json-rpc works and also makes sense because now the engine can track the task without needing to have a stage where we give it the task ID back. IDs are reusable as long as no one else is using them at the time, so they can be used for synchronizing operations between clients (making sure a command is only executed once on a specific host without locking).
- Tasks are transient. If VDSM restarts it forgets all the task information. There are 2 ways to have persistent tasks: 1. The task creates an object that you can continue to work on in VDSM. The new storage does that by the fact that copyImage() returns once the target volume has been created but before the data has been fully copied. From that moment on the state of the copy can be queried from any host using getImageStatus() and the specific copy operation can be queried with getTaskStatus() on the host performing it. After VDSM crashes, depending on policy, either VDSM will create a new task to continue the copy or someone else will send a command to continue the operation and that will be a new task. 2. VDSM tasks just start other operations trackable not through the task interface. For example Gluster: gluster.startVolumeRebalance() will return once it has been registered with Gluster. gluster.getOperationStatuses() will return the state of the operation from any host. Each call is a task in itself.
- No task tags. They are silly and the caller can mangle whatever it wants into the task ID if it really wants to tag tasks.
- No explicit recovery stage. VDSM will be crash-only; there should be efforts to make everything crash-safe. If that is problematic, as in the case of networking, VDSM will recover on start without having a task for it.
- No cleanTask: Tasks can be started by any number of hosts, which means that there is no way to own all tasks. There could be cases where VDSM starts tasks on its own and thus they have no owner at all. The caller needs to continually track the state of VDSM. We will have broadcast events to mitigate polling.
- No revert. Impossible to implement safely.
- No SPM\HSM tasks. SPM\SDM is no longer necessary for all domain types (only for some types). What used to be SPM tasks, or tasks that persist and can be restarted on other hosts, is talked about in the previous bullet points.
___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
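Concretely, caller-decided IDs just mean the engine fills in the json-rpc 'id' field itself and never waits to be told a task ID. A small sketch (the method name and the ID scheme are illustrative, not an agreed convention):

    import json
    import uuid

    def make_request(method, params, request_id=None):
        # The caller chooses the id; json-rpc only asks that it be unique among
        # the client's outstanding requests, so it can be reused afterwards.
        return json.dumps({
            'jsonrpc': '2.0',
            'id': request_id or str(uuid.uuid4()),
            'method': method,
            'params': params,
        })

    # e.g. the engine can pick a deterministic id so a command is only executed
    # once on a specific host without extra locking (illustrative):
    req = make_request('Image.copy',
                       {'targetRepoId': 'repoB', 'imageId': 'imageA'},
                       request_id='engine-42:copy-imageA-to-repoB')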
[vdsm] RFC: New Storage API
I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate on how the API looks and how you use it.
First major change is in terminology: there is no longer a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for newcomers with a libvirt background. One other change is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed.
connectStorageRepository(repoId, repoFormat, connectionParameters={}):
repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster.
repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2).
connectionParameters - This is format specific and will be used to tell VDSM how to connect to the repo.
disconnectStorageRepository(self, repoId):
In the new API there are only images, some images are mutable and some are not. Mutable images are also called VirtualDisks; immutable images are also called Snapshots. There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations:
createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}):
targetRepoId - ID of a connected repo where the disk will be created
size - The size of the image you wish to create
baseSnapshotId - the ID of the snapshot you want to base the new virtual disk on
userData - optional data that will be attached to the new VD, could be anything that the user desires.
options - options to modify VDSM's default behavior
returns the id of the new VD
createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}):
targetRepoId - The ID of a connected repo where the new snapshot will be created and the original image exists as well.
size - The size of the image you wish to create
baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot
userData - optional data that will be attached to the new Snapshot, could be anything that the user desires.
options - options to modify VDSM's default behavior
returns the id of the new Snapshot
copyImage(targetRepoId, imageId, baseImageId=None, userData={}, options={})
targetRepoId - The ID of a connected repo where the new image will be created
imageId - The image you wish to copy
baseImageId - if specified, the new image will contain only the diff between imageId and baseImageId. If None, the new image will contain all the bits of imageId. This can be used to copy partial images for export.
userData - optional data that will be attached to the new image, could be anything that the user desires.
options - options to modify VDSM's default behavior
return the Id of the new image. In case of copying an immutable image the ID will be identical to the original image as they contain the same data. However, the user should not assume that and should always use the value returned from the method.
removeImage(repositoryId, imageId, options={}):
repositoryId - The ID of a connected repo where the image to delete resides
imageId - The id of the image you wish to delete.
getImageStatus(repositoryId, imageId)
repositoryId - The ID of a connected repo where the image to check resides
imageId - The id of the image you wish to check.
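To make the shape of the API concrete, here is a sketch of a client driving these verbs, assuming a hypothetical RPC proxy object repo_api and made-up connection parameters:

    def provision_test_disk(repo_api):
        # 'repo_api' is an assumed proxy exposing the verbs described above.
        repo = 'my-nfs-repo'   # transient, caller-chosen repoId
        repo_api.connectStorageRepository(
            repoId=repo,
            repoFormat='nfs-3.4',
            connectionParameters={'server': 'filer.example.com',
                                  'export': '/vol/vms'})
        disk_id = repo_api.createVirtualDisk(
            targetRepoId=repo,
            size=20 * 1024 ** 3,        # 20 GiB
            baseSnapshotId=None,        # no base -> brand new empty disk
            userData={'vm': 'test-vm'},
            options={})
        snap_id = repo_api.createSnapshot(
            targetRepoId=repo,
            baseVirtualDiskId=disk_id,
            userData={'reason': 'before upgrade'},
            options={})
        return disk_id, snap_id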
All operations return once the operation has been committed to disk NOT when the operation actually completes. This is done so that:
- operations come to a stable state as quickly as possible.
- In cases where there is an SDM, only a small portion of the operation actually needs to be performed on the SDM host.
- No matter how many times the operation fails and on how many hosts, you can always resume the operation and choose when to do it.
- You can stop an operation at any time and remove the resulting object, making a distinction between "stop because the host is overloaded" and "I don't want that image".
This means that after calling any operation that creates a new image the user must then call getImageStatus() to check what is the status of the image. The status of the image can be either optimized, degraded, or broken. Optimized means that the image is available and you can run VMs off it. Degraded means that the image is available and will run VMs, but there might be a better way VDSM can represent the underlying data. Broken means that the image can't be used at the moment, probably because not all the data has been set up on the volume. Apart from that VDSM will also return the last persisted status information which will contain: hostID - the last host to try and optimize or fix the image; stage - X/Y (eg. 1/10) the last persisted stage of the fix; percent_complete - -1 or 0-100, the
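Since every creating verb returns as soon as the operation is committed, a caller ends up polling getImageStatus(). A hedged sketch of such a loop, assuming the returned structure exposes the state under a 'status' key (that field name is an assumption, the state names come from the description above):

    import time

    def wait_until_usable(repo_api, repo_id, image_id, interval=5.0):
        # Poll until the image is at least degraded (i.e. usable for running VMs).
        while True:
            status = repo_api.getImageStatus(repo_id, image_id)
            state = status.get('status')           # assumed field name
            if state in ('optimized', 'degraded'):
                return status
            if state != 'broken':
                raise RuntimeError('unexpected image state: %r' % state)
            time.sleep(interval)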
Re: [vdsm] RFC: New Storage API
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, engine-devel engine-de...@ovirt.org Sent: Tuesday, December 4, 2012 6:08:25 PM Subject: Re: [vdsm] RFC: New Storage API Thanks for sharing this. It's nice to have something a little more concrete to think about. Just a few comments and questions inline to get some discussion flowing. On Tue, Dec 04, 2012 at 04:52:40PM -0500, Saggi Mizrahi wrote: I've been throwing a lot of bits out about the new storage API and I think it's time to talk a bit. I will purposefully try and keep implementation details away and concentrate about how the API looks and how you use it. First major change is in terminology, there is no long a storage domain but a storage repository. This change is done because so many things are already called domain in the system and this will make things less confusing for new-commers with a libvirt background. One other changes is that repositories no longer have a UUID. The UUID was only used in the pool members manifest and is no longer needed. connectStorageRepository(repoId, repoFormat, connectionParameters={}): We should probably add an options/flags parameter for extension of all new APIs. Usually I agree but connectionParameters is already generic enough :) repoId - is a transient name that will be used to refer to the connected domain, it is not persisted and doesn't have to be the same across the cluster. repoFormat - Similar to what used to be type (eg. localfs-1.0, nfs-3.4, clvm-1.2). connectionParameters - This is format specific and will used to tell VDSM how to connect to the repo. disconnectStorageRepository(self, repoId): I assume 'self' is a mistake here. Just want to clarify given all of the recent talk about instances vs. namespaces. Yea, it's just pasted from my code In the new API there are only images, some images are mutable and some are not. mutable images are also called VirtualDisks immutable images are also called Snapshots By mutable you mean writable right? Or does the word mutable imply more than that? It's a semantic distinction due to implementation details, in general terms, yes. There are no explicit templates, you can create as many images as you want from any snapshot. There are 4 major image operations: createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}): Is userdata a 'StringMap'? currently it's a json object. We could limit it to a string map and trust the client to parse types. We can have it be a string\blob and trust the user to serialize the data. It's pass-through object either way. I will reopen the argument about an options dict vs a flags parameter. I oppose the dict for expansion because I think it causes APIs to devolve into a mess where lots of arbitrary and not well thought out overrides are packed into the dict over time. A flags argument (in json and python it can be an enum array) limits us to really switching flags on and off instead of passing arbitrary data. We already have strategy that we know we want to have several options. Other stuff that have been suggested is to be able to override the img format (qcow2\qed) The way I envision it is having an class opts = CommandOptions() that you add opts.addStringOption(key, value) opts.addIntOption(key, 3) opt.addBoolOption(key, True) I know you could just as well have strategy_space_flag and strategy_performance_flag and fail the operation if they both exist. 
Since it is a matter of personal taste I think it should be decided by a vote.
targetRepoId - ID of a connected repo where the disk will be created
size - The size of the image you wish to create
baseSnapshotId - the ID of the snapshot you want to base the new virtual disk on
userData - optional data that will be attached to the new VD, could be anything that the user desires.
options - options to modify VDSM's default behavior
returns the id of the new VD
createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}):
targetRepoId - The ID of a connected repo where the new snapshot will be created and the original image exists as well.
size - The size of the image you wish to create
Why is this needed? Doesn't the size of a snapshot have to be equal to its base image?
Oops, another copy\paste error, you can see this arg doesn't exist in the method signature. My proofreading does need more work.
baseVirtualDisk - the ID of a mutable image (Virtual Disk) you want to snapshot
Can you snapshot a snapshot? In that case, this parameter should be called baseImage.
You can't snapshot a snapshot, it makes no sense as it can't change and you will get the same object.
userData - optional data
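For reference, a small sketch of the CommandOptions idea floated above (typed setters packing into a plain dict); this is illustrative only, not an agreed interface:

    class CommandOptions(object):
        # Illustrative sketch of the typed-options idea discussed above.
        def __init__(self):
            self._opts = {}

        def addStringOption(self, key, value):
            self._opts[key] = str(value)

        def addIntOption(self, key, value):
            self._opts[key] = int(value)

        def addBoolOption(self, key, value):
            self._opts[key] = bool(value)

        def toDict(self):
            return dict(self._opts)

    opts = CommandOptions()
    opts.addStringOption('strategy', 'space')
    opts.addBoolOption('sparse', True)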
Re: [vdsm] [RFC]about the implement of text-based console
Sorry, it's probably the fact that I don't have enough time to go into the code but I still don't get what you are trying to do. Having it in HTTP and XML-RPC is a bad idea but I imagine that the theoretical solution doesn't depend on any of them. Could you just show some pseudo code of a client using the stream? - Original Message - From: Zhou Zheng Sheng zhshz...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com, Adam Litke a...@us.ibm.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Friday, November 30, 2012 10:12:19 PM Subject: Re: [vdsm] [RFC]about the implement of text-based console Hi all, in this mail I further explain how solution 5 (console streaming API) works, and propose a virtual HTTP server live inside existing XMLRPC server with a request router. You can have a look at http://gerrit.ovirt.org/9605 on 11/28/2012 01:09, Adam Litke wrote: One issue that was raised is console buffering. What happens if a client does not call getConsoleReadStream() fast enough? Will characters be dropped? This could create a reliability problem and would make scripting against this interface risky at best. on 11/28/2012 01:45, Saggi Mizrahi wrote: I don't really understand 5. What does those methods return the virtio dev path? As I know, HTTP supports persistent connection and data streaming, this is popular for AJAX applications and live video broadcasting servers. The client sends one GET request to server, and server returns a data stream, then the client reads the stream continuously. XMLRPC and REST calls relies on HTTP, so I was considering getConsoleReadStream() can utilize streaming feature in HTTP, and VDSM just forwards the console data when it is called. Unfortunately I can not find out how XMLRPC and REST supports data streaming, because XML and JSON do not support containing a continuous stream object. It seems that to get the continuous stream data, the client must call getConsoleReadStream() again and again. I think it's expensive to call getConsoleReadStream() very frequently to get the data, and it may cause a notable delay, which is not acceptable for interactive console. I am thinking of providing console stream through HTTP(s) directly. A virtual server can forward the data from guest serial console by traditional HTTP streaming method (GET /consoleStream/vmid HTTP/1.0), and can forward the input data from the user by POST method as well(POST /consoleStream/vmid HTTP/1.0), or forward input and output stream at the same time in a POST request. This virtual server can be further extended to serve downloading guest crash core dump, and we can provide flexible authentication policies in this server. The auth for HTTP requests can be different from the XMLRPC request. The normal XMLRPC requests are always POST / HTTP/1.0 or POST /RPC2 HTTP/1.0. So this virtual server can live inside the existing XMLRPC server, just with a request router. I read the code implementing the XMLRPC binding and find that implementing a request router is not very complex. We can multiplex the port 54321, and route the raw HTTP request to the virtual server while normal XMLRPC request still goes to XMLRPC handler. 
This means it can serve XMLRPC request as vdsClient -s localhost getVdsCaps at the same time it can serve a wget client as wget --no-check-certificate \ --certificate=/etc/pki/vdsm/certs/vdsmcert.pem \ --private-key=/etc/pki/vdsm/keys/vdsmkey.pem \ --ca-certificate=/etc/pki/vdsm/certs/cacert.pem \ https://localhost:54321/console/vmid I try to implement a simple request router at http://gerrit.ovirt.org/9605 If interested, you can have a look it. It can pass the recently add VDSM functional tests, and can serve wget requests at the same time. If we do not like this idea, I think only the solution of extending spice will fulfill our requirements. There are obvious problems in other solutions. - Original Message - From: Zhou Zheng Sheng zhshz...@linux.vnet.ibm.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, November 27, 2012 4:22:20 AM Subject: Re: [vdsm] [RFC]about the implement of text-based console Hi all, For now in there is no agreement on the remote guest console solution, so I decide to do some investigation continue the discussion. Our goal VM serial console remote access in CLI mode. That means the client runs without X environment. Do you mean like running qemu with -curses? I mean like virsh console -- Thanks and best regards! Zhou Zheng Sheng / 周征晟 E-mail: zhshz...@linux.vnet.ibm.com Telephone: 86-10-82454397 ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
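The request router itself does not have to be complicated. A toy sketch of the idea (this is not the gerrit patch, purely illustrative, using Python 2's BaseHTTPServer): XML-RPC POSTs to '/' or '/RPC2' would be handed to the existing dispatcher, while GET /console/<vmid> is answered with a raw byte stream.

    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

    class RouterHandler(BaseHTTPRequestHandler):
        # Toy sketch: route console requests away from the XML-RPC handler.

        def do_POST(self):
            if self.path in ('/', '/RPC2'):
                self.send_response(200)   # would hand off to the XML-RPC dispatcher
                self.end_headers()
            else:
                self.send_error(404)

        def do_GET(self):
            if self.path.startswith('/console/'):
                vmid = self.path[len('/console/'):]
                self.send_response(200)
                self.send_header('Content-Type', 'application/octet-stream')
                self.end_headers()
                # the real server would copy bytes from the guest's serial
                # console chardev into self.wfile until the client disconnects
                self.wfile.write('console stream for %s\n' % vmid)
            else:
                self.send_error(404)

    # HTTPServer(('0.0.0.0', 54321), RouterHandler).serve_forever()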
Re: [vdsm] [VDSM][RFC] hsm service standalone
HSM is not a package, it's an application. Currently it and the rest of VDSM share the same process but they use RPC to communicate. This is done so that one day we can actually have them run as different processes. HSM is not something you import, it's a daemon you communicate with. - Original Message - From: Dan Kenigsberg dan...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: Sheldon shao...@linux.vnet.ibm.com, a...@linux.vnet.ibm.com, vdsm-devel@lists.fedorahosted.org, Zheng Sheng ZS Zhou zhshz...@cn.ibm.com Sent: Monday, December 3, 2012 12:01:28 PM Subject: Re: [vdsm] [VDSM][RFC] hsm service standalone On Mon, Dec 03, 2012 at 11:35:44AM -0500, Saggi Mizrahi wrote: There are a bunch of preconditions to having HSM pulled out. One simple thing is that someone needs to go through storage/misc.py and utils.py and move all the code out to logical packages. There also needs to be a bit of a rearrangement of the code files so they can import the reusable code properly. I am also still very much against putting core VDSM into site-packages. Would you elaborate on your position? Do you mind the wrong implications this may give about API stability? ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] RFD: API: Identifying vdsm objects in the next-gen API
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: engine-de...@linode01.ovirt.org, Dan Kenigsberg dan...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Ayal Baron aba...@redhat.com, vdsm-devel@lists.fedorahosted.org Sent: Monday, December 3, 2012 3:30:21 PM Subject: Re: RFD: API: Identifying vdsm objects in the next-gen API On Thu, Nov 29, 2012 at 05:59:09PM -0500, Saggi Mizrahi wrote: - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: engine-de...@linode01.ovirt.org, Dan Kenigsberg dan...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Ayal Baron aba...@redhat.com, vdsm-devel@lists.fedorahosted.org Sent: Thursday, November 29, 2012 5:22:43 PM Subject: Re: RFD: API: Identifying vdsm objects in the next-gen API On Thu, Nov 29, 2012 at 04:52:14PM -0500, Saggi Mizrahi wrote: They are not future proof as the paradigm is completely different. Storage domain IDs are not static any more (and are not guaranteed to be unique or the same across the cluster. Image IDs represent the ID of the projected data and not the actual unique path. Just as an example, to run a VM you give a list of domains that might contain the needed images in the chain and the image ID of the tip. The paradigm is changed to and most calls get non synchronous number of images and domains. Further more, the APIs themselves are completely different. So future proofing is not really an issue. I don't understand this at all. Perhaps we could all use some education on the architecture of the planned architectural changes. If I can pass an arbitrary list of domainIDs that _might_ contain the data, why wouldn't I just pass all of them every time? In that case, why are they even required since vdsm would have to search anyway? It's for optimization mostly, the engine usually has a good idea of where stuff are, having it give hints to VDSM can speed up the search process. also, then engines knows how transient some storage pieces are. If you have a domain that is only there for backup or owned by another manager sharing the host, you don't want you VMs using the disks that are on that storage effectively preventing it from being removed (though we do have plans to have qemu switch base snapshots at runtime for just that). This is not a clean design. If the search is slow, then maybe we need to improve caching internally. Making a client cache a bunch of internal IDs to pass around sounds like a complete layering violation to me. You can't cache this, if the same template exists on an 2 different NFS domains only the engine has enough information to know which you should use. We only have the engine give us thing information when starting a VM or merging\copying an image that resides on multiple domains. It is also completely optional. I didn't like it either. As to making the current API a bit simpler. As I said, making them opaque is problematic as currently the engine is responsible for creating the IDs. As I mentioned in my last post, engine still can specify the ID's when the object is first created. From that point forward the ID never changes so it can be baked into the identifier. Where will this identifier be persisted? Further more, some calls require you to play with these (making a template instead of a snapshot). Also, the full chain and topology needs to be completely visible to the engine. Please provide a specific example of how you play with the IDs. 
I can guess where you are going, but I don't want to divert the thread. The relationship between volumes and images is deceptive at the moment. IMG is the chain and volume is a member, IMGUUID is only used to for verification and to detect when we hit a template going up the chain. When you do operation on images assumptions are being guaranteed about the resulting IDs. When you copy an image, you assume to know all the new IDs as they remain the same. With your method I can't tell what the new opaque result is going to be. Preview mode (another abomination being deprecated) relies on the disconnect between imgUUID and volUUID. Live migration currently moves a lot of the responsibility to the engine. No client should need to know about all of these internal details. I understand that's the way it is today, and that's one of the main reasons that the API is a complete pain to use. You are correct but this is how this API was designed you can't get away from that. These things, as you said, are problematic. But this is the way things are today. We are changing them. Any intermediary step is needlessly problematic
[vdsm] object instancing in the new VDSM API
Currently the suggested scheme treats everything as instances and objects have methods. This puts instancing as the responsibility of the API bindings. I suggest changing it to the way json-rpc was designed, with namespaces and methods. For example, instead of the API being:
vm = host.getVMsList()[0]
vm.getInfo()
the API should be:
vmID = host.getVMsList()[0]
api.VMsManager.getVMInfo(vmID)
and it should be up to the caller to decide how to wrap everything in objects. The problem with the API bindings controlling the instancing is that:
1) We have to *have* and *pass* an implicit api obj which is problematic to maintain. For example, you have to have the api object as a member of the instance for the method calls to work. This means that you can't recreate or pool API objects easily. You effectively need to add a move method to move the object to another API object to use it on a different host.
2) Because the objects are opaque it might be hard to know what fields of the instance to persist to get the same object.
3) It breaks the distinction between by-value and by-reference objects.
4) Any serious user will make their own instance classes that conform to their design and flow, so they don't really add any convenience to anything apart from tests. You will create your own VM object, and because it's in the manager scope it will be the same instance across all hosts. Instead of being able to pass the same ID to any host (as the vmID remains the same) you will have to create an instance object to use, either before every call for simplicity or cached for each host for performance benefits.
5) It makes us pass a weird __obj__ parameter to each call that symbolizes self and makes it hard for a user that chooses to use their own bindings to understand what it does.
6) It's syntactic sugar at best that adds needless limitations to how a user can play with the IDs and the API.
I personally think there is a reason why json-rpc defines namespaces and methods and forgoes instances. It's simpler (for the implementation), more flexible, and it gives the user more choice. Trying to hack that in will just cause needless complications IMHO. IDs are just strings, no need to complicate them. By-Value objects should still be defined and instantiated by the bindings because, unlike IDs, we need to make sure all the fields exist and are of the correct type. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
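A short sketch of what the namespace-plus-ID style means for a caller; the verb names follow the example above and are otherwise illustrative:

    def list_vms(api_host):
        # namespaced verbs + plain string IDs: nothing is wrapped in instances,
        # so there is no hidden api object to carry around or "move" between hosts
        for vm_id in api_host.VMsManager.getVMsList():
            info = api_host.VMsManager.getVMInfo(vm_id)
            print('%s: %s' % (vm_id, info.get('status')))

    # The very same ID strings stay valid against another host's API object,
    # e.g. after migration:  other_api.VMsManager.getVMInfo(vm_id)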
Re: [vdsm] object instancing in the new VDSM API
So from what I gather the only thing that is bothering you is that storage operations require a lot of IDs. I get that, I hate that too. It doesn't change the point that it was designed that way. Even if you deem some use cases irrelevant it wouldn't change the fact that this is how people use it now. And because we are going to throw it away as soon as we can there is no reason to shape our API around that. So from what I gather we agree on instancing.
--- From this moment on I'm going to try my best to explain how VDSM storage currently works. It is filled with misdirection and bad design. I hope that after that you will understand why you can't pack all the IDs together.
Let's start with the storage pool. Because it was simpler to have all metadata-changing operations run on the same host, someone needed to find a way to make cross-domain operations work on the same host. The solution was to bind them all to a single entity called the storage pool and have a single lock. The point was to have a host be able to connect to multiple pools at a time. Due to bad code (that could have easily not been so bad) the multiple pools feature was never implemented. Because the single lock to rule them all doesn't really work when you want to secure domains, we had to add more locks, making the pool concept obsolete. This means that you can trust VDSM to only be connected to a single pool at a time, which means that if you want to change anything you can just remove the pool arg.
Let's go to volumes and images. Contrary to its name, imgUUID does not represent an image. It's actually a tag given to part of a chain. This is commonly used to differentiate between parts of the chain responsible for VM images and templates. Due to bad code a lot of the possible combinations are not supported, but that is the intention. imgUUID being a tag means that it serves 3 purposes depending on the verb that uses it:
1) In some verbs it is used as a useless sanity check to make sure the volume is tagged with this sdUUID. This I imagine was done because someone didn't fully comprehend how and why you do sanity checks. This means that in some verbs you can just remove it (if you are actually changing anything).
2) In some verbs it's meant to distinguish the volume from its original chain (creating a template). At that point it's actually being invented by the caller.
3) In operations that act on the whole chain, if volUUID is there it is for the same useless sanity check and can be removed.
What you need to get out of this is that most of the time you can use fewer IDs just by removing useless imgUUID or volUUID args. Furthermore, you need to understand that they are not hierarchical. imgUUID is a tag on the volume, similar to the user of a file.
As for domain IDs, they are needed because the caller can choose to reuse imgUUIDs and volUUIDs on different domains, and some flows actually depend on that. To make things simpler, some verbs should be split up so that how you specify the target volID doesn't affect the actual command. This means that copyImage() and createTemplate() should be split into:
copyImage(dstDomain, srcDomain, imgUUID)
createTemplate(dstDomain, dstImgUUID, srcDomain, srcImgUUID)
That being said, I'm personally still against an intermediate storage API because of engine adoption problems. But if you want to fix the current interface, packing the IDs into a single ID wouldn't work and is logically wrong. What you need to do is remove redundant arguments and split up verbs that do more than one thing.
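As plain stubs, the split proposed above might look like this (the signatures are taken from the message, the bodies are placeholders, not VDSM code):

    def copyImage(dstDomain, srcDomain, imgUUID):
        # Copy a whole image chain between domains; the IDs stay the same.
        raise NotImplementedError

    def createTemplate(dstDomain, dstImgUUID, srcDomain, srcImgUUID):
        # Create a template, letting the caller name the new image tag.
        raise NotImplementedError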
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: vdsm-devel vdsm-de...@fedorahosted.org, Ayal Baron aba...@redhat.com, Barak Azulay bazu...@redhat.com, ybronhei ybron...@redhat.com Sent: Monday, December 3, 2012 5:46:31 PM Subject: Re: object instancing in the new VDSM API On Mon, Dec 03, 2012 at 04:34:28PM -0500, Saggi Mizrahi wrote: Currently the suggested scheme treats everything as instances and object have methods. This puts instancing as the responsibility of the API bindings. I suggest changing it to the way json was designed with namespaces and methods. For example instead for the api being: vm = host.getVMsList()[0] vm.getInfo() the API should be: vmID = host.getVMsList()[0] api.VMsManager.getVMInfo(vmID) And it should be up to decide how to wrap everything in objects. For VMs, your example looks nice, but for today's Volumes it's not so nice. To properly identify a Volume, we must pass the storage pool id, storage domain id, image id, and volume id. If we are working with two Volumes, we would need 8 parameters unless we optimize for context and assume that the storage pool uuid is the same for both volumes, etc. The problem with that optimization is that we require clients to understand internal implementation details. How should the StorageDomain.getVolumes API
Re: [vdsm] RFD: API: Identifying vdsm objects in the next-gen API
This is all only valid for the current storage API the new one doesn't have pools or volumes. Only domains and images. Also, images and domains are more loosely coupled and make this method problematic. That being said, if we do choose to make the current storage API officially supported I do agree that it looks a bit simpler but for the price of forcing the user to construct these objects before sending the request. I know for a fact that the engine will just create these objects on the fly because they use their own objects to group things logically. This means adding more work instead of removing it. Most clients will do that anyway as they will use their own DAL to store these relationships. - Original Message - From: Adam Litke a...@us.ibm.com To: vdsm-devel@lists.fedorahosted.org Cc: engine-de...@linode01.ovirt.org, Dan Kenigsberg dan...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Saggi Mizrahi smizr...@redhat.com, Ayal Baron aba...@redhat.com Sent: Thursday, November 29, 2012 12:19:06 PM Subject: RFD: API: Identifying vdsm objects in the next-gen API Today in vdsm, every object (StoragePool, StorageDomain, VM, Volume, etc) is identified by a single UUID. On the surface, it seems like this is enough info to properly identify a resource but in practice it's not. For example, when you look at the API's dealing with Volumes, almost all of them require an sdUUID, spUUID, and imgUUID in order to provide proper context for the operation. Needing to provide these extra UUIDs is a burden on the API user because knowing which values to pass requires internal knowledge of the API. For example, the spUUID parameter is almost always just the connected storage pool. Since we know there can currently be only one connected pool, the value is known. I would like to move away from needing to understand all of these relationships from the end user perspective by encapsulating the extra context into new object identifier types as follows: StoragePoolIdentifier: { 'storagepoolID': 'UUID' } StorageDomainIdentifier: { 'storagepoolID*': 'UUID', 'storagedomainID': 'UUID' } ImageIdentifier: { 'storagepoolID*': 'UUID', 'storagedomainID': 'UUID', 'imageID': 'UUID' } VolumeIdentifier: { 'storagepoolID*': 'UUID', 'storagedomainID': 'UUID', 'imageID': 'UUID', 'volumeID': 'UUID' } TaskIdentifier: { 'taskID': 'UUID' } VMIdentifier: { 'vmID': 'UUID' } In the new API, anytime a reference to an object is required, one of the above structures must be passed in place of today's single UUID. In many cases, this will allow us to reduce the number of parameters to the function since the needed contextual parameters (spUUID, etc) will be part of the object's identifier. Similarly, any time the API returns an object reference it would return a *Identifier instead of a bare UUID. These identifier types are basically opaque blobs to the API users and are only ever generated by vdsm itself. Because of this, we can change the internal structure of the identifier to require new information or (before freezing the API) remove fields that no longer make sense. I would greatly appreciate your comments on this proposal. If it seems reasonable, I will revamp the current schema to make the necessary changes and provide the Bridge patch functions to convert between the current implementation and the new schema. 
--- sample schema patch --- commit 48f6b0f0a111dd0b372d211a4e566ce87f375cee Author: Adam Litke a...@us.ibm.com Date: Tue Nov 27 14:14:06 2012 -0600 schema: Introduce class identifier types When calling API methods that belong to a particular class, a class instance must be indicated by passing a set of identifiers in the request. The location of these parameters within the request is: 'params' - '__obj__'. Since this set of identifiers must be used together to correctly instantiate an object, it makes sense to define these as proper types within the API. Then, functions that return an object (or list of objects) can refer to the correct type. Signed-off-by: Adam Litke a...@us.ibm.com diff --git a/vdsm_api/vdsmapi-schema.json b/vdsm_api/vdsmapi-schema.json index 0418e6e..7e2e851 100644 --- a/vdsm_api/vdsmapi-schema.json +++ b/vdsm_api/vdsmapi-schema.json @@ -937,7 +937,7 @@ # Since: 4.10.0 ## {'command': {'class': 'Host', 'name': 'getConnectedStoragePools'}, - 'returns': ['StoragePool']} + 'returns': ['StoragePoolIdentifier']} ## # @BlockDeviceType: @@ -1572,7 +1572,7 @@ {'command': {'class': 'Host', 'name': 'getStorageDomains'}, 'data': {'*storagepoolID': 'UUID', '*domainClass': 'StorageDomainImageClass', '*storageType': 'StorageDomainType', '*remotePath': 'str'}, - 'returns': ['StorageDomain']} + 'returns': ['StorageDomainIdentifier
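For illustration only, here is what a request carrying one of these identifiers might look like on the wire under this proposal; the field names come from the identifier types above, while the method name and the UUID placeholders are made up:

    import json

    # Under the proposal the engine would normally get these identifiers back
    # from vdsm rather than build them by hand; this only shows the wire shape.
    volume_identifier = {
        'storagepoolID':   'pool-uuid-placeholder',
        'storagedomainID': 'domain-uuid-placeholder',
        'imageID':         'image-uuid-placeholder',
        'volumeID':        'volume-uuid-placeholder',
    }

    request = {
        'jsonrpc': '2.0',
        'id': 1,
        'method': 'Volume.getInfo',                 # illustrative method name
        'params': {'__obj__': volume_identifier},   # identifiers go under __obj__
    }
    print(json.dumps(request, indent=2))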
Re: [vdsm] RFD: API: Identifying vdsm objects in the next-gen API
They are not future proof as the paradigm is completely different. Storage domain IDs are not static any more (and are not guaranteed to be unique or the same across the cluster. Image IDs represent the ID of the projected data and not the actual unique path. Just as an example, to run a VM you give a list of domains that might contain the needed images in the chain and the image ID of the tip. The paradigm is changed to and most calls get non synchronous number of images and domains. Further more, the APIs themselves are completely different. So future proofing is not really an issue. As to making the current API a bit simpler. As I said, making them opaque is problematic as currently the engine is responsible for creating the IDs. Further more, some calls require you to play with these (making a template instead of a snapshot). Also, the full chain and topology needs to be completely visible to the engine. These things, as you said, are problematic. But this is the way things are today. As for task IDs. Currently task IDs are only used for storage and they get persisted to disk. This is WRONG and is not the case with the new storage API. Because we moved to an asynchronous message based protocol (json-rpc over TCP\AMQP) there is no need to generate a task ID. it is built in to json-rpc. json-rpc specifies that the IDs have to be unique for a client as long as the request is still active. This is good enough as internally we can have a verb for a client to query it's own running tasks and a verb to query other host tasks by mangling in the client before the ID. Because the protocol is asynchronous all calls are asynchronous by nature well. Tasks will no longer be persisted or expected to be persisted. It's the callers responsibility to query the state and see if the operation succeeded or failed if the caller or VDSM died in the middle of the call. The current cleanTask() system can't be used when more then one client is using VDSM and will not be used for anything other then legacy storage. AFAIK Apart from storage all objects IDs are constructed with a single ID, name or alias. VMs, storageConnections, network interfaces. So it's not a real issue. I agree that in the future we should keep the idiom of pass configuration once, name it, and keep using the name to reference the object. - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: engine-de...@linode01.ovirt.org, Dan Kenigsberg dan...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Ayal Baron aba...@redhat.com, vdsm-devel@lists.fedorahosted.org Sent: Thursday, November 29, 2012 4:18:40 PM Subject: Re: RFD: API: Identifying vdsm objects in the next-gen API On Thu, Nov 29, 2012 at 02:16:42PM -0500, Saggi Mizrahi wrote: This is all only valid for the current storage API the new one doesn't have pools or volumes. Only domains and images. Also, images and domains are more loosely coupled and make this method problematic. I am looking for an incremental way to bridge the differences. It's been 2 years and we still don't have the revamped storage API so I am planning on what we have being around for awhile :) I think that defining object identifiers as opaque structured types is also future proof. In the future an Image-ng object we can drop 'storagepoolID' from the identifier and, if it makes sense, remove the hard association with a storageDomain as well. 
The point behind this refactoring is to give us the option of coupling multiple UUID's (or other data) to form a single, opaque identifier. That being said, if we do choose to make the current storage API officially supported I do agree that it looks a bit simpler but for the price of forcing the user to construct these objects before sending the request. I know for a fact that the engine will just create these objects on the fly because they use their own objects to group things logically. This means adding more work instead of removing it. Most clients will do that anyway as they will use their own DAL to store these relationships. Thanks for bringing up some of these points. All deserve attention so I will address each one individually: The current API does not yet make an official statement of support for anything. I want to model the current storage API so that the node level API can have the same level of functionality as is currently supported. I am all for removing deprecated functions and redesigning in-place for a reasonable amount of time going forward. In a perfect world, libvdsm-1.0 would release with no mention of storage pools at all. If properly designed, the end-user (including engine) would never be constructing these objects itself. Object identifiers are essentially opaque structures. In order to make this possible, we need to make sure that the API provides all of the functions needed to lookup
Re: [vdsm] RFD: API: Identifying vdsm objects in the next-gen API
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: engine-de...@linode01.ovirt.org, Dan Kenigsberg dan...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Ayal Baron aba...@redhat.com, vdsm-devel@lists.fedorahosted.org Sent: Thursday, November 29, 2012 5:22:43 PM Subject: Re: RFD: API: Identifying vdsm objects in the next-gen API On Thu, Nov 29, 2012 at 04:52:14PM -0500, Saggi Mizrahi wrote: They are not future proof as the paradigm is completely different. Storage domain IDs are not static any more (and are not guaranteed to be unique or the same across the cluster. Image IDs represent the ID of the projected data and not the actual unique path. Just as an example, to run a VM you give a list of domains that might contain the needed images in the chain and the image ID of the tip. The paradigm is changed to and most calls get non synchronous number of images and domains. Further more, the APIs themselves are completely different. So future proofing is not really an issue. I don't understand this at all. Perhaps we could all use some education on the architecture of the planned architectural changes. If I can pass an arbitrary list of domainIDs that _might_ contain the data, why wouldn't I just pass all of them every time? In that case, why are they even required since vdsm would have to search anyway? It's for optimization mostly, the engine usually has a good idea of where stuff are, having it give hints to VDSM can speed up the search process. also, then engines knows how transient some storage pieces are. If you have a domain that is only there for backup or owned by another manager sharing the host, you don't want you VMs using the disks that are on that storage effectively preventing it from being removed (though we do have plans to have qemu switch base snapshots at runtime for just that). As to making the current API a bit simpler. As I said, making them opaque is problematic as currently the engine is responsible for creating the IDs. As I mentioned in my last post, engine still can specify the ID's when the object is first created. From that point forward the ID never changes so it can be baked into the identifier. Where will this identifier be persisted? Further more, some calls require you to play with these (making a template instead of a snapshot). Also, the full chain and topology needs to be completely visible to the engine. Please provide a specific example of how you play with the IDs. I can guess where you are going, but I don't want to divert the thread. The relationship between volumes and images is deceptive at the moment. IMG is the chain and volume is a member, IMGUUID is only used to for verification and to detect when we hit a template going up the chain. When you do operation on images assumptions are being guaranteed about the resulting IDs. When you copy an image, you assume to know all the new IDs as they remain the same. With your method I can't tell what the new opaque result is going to be. Preview mode (another abomination being deprecated) relies on the disconnect between imgUUID and volUUID. Live migration currently moves a lot of the responsibility to the engine. These things, as you said, are problematic. But this is the way things are today. We are changing them. Any intermediary step is needlessly problematic for existing clients. Work is already in progress for fixing the API properly, making some calls a bit nicer isn't an excuse to start making more compatibility code in the engine. As for task IDs. 
Currently task IDs are only used for storage and they get persisted to disk. This is WRONG and is not the case with the new storage API. Because we moved to an asynchronous message based protocol (json-rpc over TCP\AMQP) there is no need to generate a task ID. it is built in to json-rpc. json-rpc specifies that the IDs have to be unique for a client as long as the request is still active. This is good enough as internally we can have a verb for a client to query it's own running tasks and a verb to query other host tasks by mangling in the client before the ID. Because the protocol is So this would rely on the client keeping the connection open and as soon as it disconnects it would lose the ability to query tasks from before the connection went down? I don't know if it's a good idea to conflate message ID's with task ID's. While the protocol can operate asynchronously, some calls have synchronous semantics and others have asynchronous semantics. I would expect sync calls to return their data immediately and async calls to return immediately with either: an error code, or an 'operation started' message and associated ID for querying the status of the operation. Upon reflection I agree that having the request ID unique per client
Re: [vdsm] MTU setting according to ifcfg files.
I suggest we don't have a default. If you don't specify an MTU it will use whatever is already configured. There is no way to go back to the defaults only to set a new value. The engine can assume 1500 (in case of ethernet devices) is the recommended value. - Original Message - From: Simon Grinberg si...@redhat.com To: Igor Lvovsky ilvov...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Wednesday, November 28, 2012 9:53:48 AM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Igor Lvovsky ilvov...@redhat.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Cc: Simon Grinberg si...@redhat.com Sent: Wednesday, November 28, 2012 2:58:52 PM Subject: [vdsm] MTU setting according to ifcfg files. Hi, I am working on one of the vdsm bugs that we have and I found that initscripts (initscripts-9.03.34-1.el6.x86_64) behaviour doesn't fits our needs. So, I would like to raise this issue in the list. The issue is MTU setting according to ifcfg files. I'll try to describe the flow below. 1. I started with ifcfg file for the interface without MTU keyword at all and the proper interface (let say eth0) had the *default* MTU=1500 (according to /sys/class/net/eth0/mtu). 2. I created a bridge with MTU=9000 on top of this interface. Everything went OK. After I wrote MTU=9000 on ifcfg-eth0 and ifdown/ifup it, eth0 got the proper MTU. 3. Now, I removed the bridge and deleted MTU keyword from the ifcfg-eth0. But after ifup/ifdown the actual MTU of the eth0 stayed 9000. The only way to change it back to 1500 (or something else) is explicitly set MTU in ifcfg file. According to Bill Nottingham it is intentional behaviour. If so, we have a problem in vdsm, because we never set MTU value until user ask it explicitly. Actually you are, You where asked for MTU 9000 on the network, As implementation specif you had to do this all the way down the chain Now it's only reasonable that when you cancel the 9000 request then you'll do what is necessary to rollback the changes. It's pity that ifcfg-files don't have the option to set MTU='default', but as you can read this default before you change, then please keep it somewhere and revert to that. It means that if we have interface with MTU=9000 on it just because once there was a bridge with such MTU attached to it and now we want to attach regular bridge with *default* MTU=1500 we have a problem. The only thing we can do to avoid this it's set explicitly MTU=1500 in interface's ifcfg file. IMHO it's a bit ugly, but it looks like we have no choice. As usual comments more than welcome... Regards, Igor Lvovsky ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
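A small sketch of the workaround implied above: since dropping MTU= from an ifcfg file does not reset the device, the only reliable move is to write the wanted value explicitly. The helper names, paths and the 1500 fallback are assumptions for illustration, not actual VDSM code:

    def read_current_mtu(dev):
        with open('/sys/class/net/%s/mtu' % dev) as f:
            return int(f.read().strip())

    def set_ifcfg_mtu(dev, mtu, ifcfg_dir='/etc/sysconfig/network-scripts'):
        # Rewrite the ifcfg file with an explicit MTU= line, replacing any old one.
        path = '%s/ifcfg-%s' % (ifcfg_dir, dev)
        with open(path) as f:
            lines = [line for line in f if not line.startswith('MTU=')]
        lines.append('MTU=%d\n' % mtu)
        with open(path, 'w') as f:
            f.writelines(lines)

    # e.g. when the jumbo-frame network is removed, fall back explicitly:
    # set_ifcfg_mtu('eth0', 1500)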
Re: [vdsm] MTU setting according to ifcfg files.
I don't want to keep the last configured MTU. It's problematic. Having a stack is even worse. VDSM should try not to persist anything if possible. Also, reverting to the last MTU is raceful and has weird corner cases. Best to just assume default it 1500 (Like all major OSs do). But since it's not really a default I would call it a recommended setting. - Original Message - From: Igor Lvovsky ilvov...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Simon Grinberg si...@redhat.com Sent: Wednesday, November 28, 2012 11:10:27 AM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Simon Grinberg si...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Igor Lvovsky ilvov...@redhat.com Sent: Wednesday, November 28, 2012 5:30:17 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. I suggest we don't have a default. If you don't specify an MTU it will use whatever is already configured. There is no way to go back to the defaults only to set a new value. The engine can assume 1500 (in case of ethernet devices) is the recommended value. This is not related to engine. You are right that the actually MTU will the last configured one, but this is exactly a problem. As I already mentioned, if you will add another bridge without custom MTU its users (VMs) can assume that the MTU is 1500 - Original Message - From: Simon Grinberg si...@redhat.com To: Igor Lvovsky ilvov...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Wednesday, November 28, 2012 9:53:48 AM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Igor Lvovsky ilvov...@redhat.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Cc: Simon Grinberg si...@redhat.com Sent: Wednesday, November 28, 2012 2:58:52 PM Subject: [vdsm] MTU setting according to ifcfg files. Hi, I am working on one of the vdsm bugs that we have and I found that initscripts (initscripts-9.03.34-1.el6.x86_64) behaviour doesn't fits our needs. So, I would like to raise this issue in the list. The issue is MTU setting according to ifcfg files. I'll try to describe the flow below. 1. I started with ifcfg file for the interface without MTU keyword at all and the proper interface (let say eth0) had the *default* MTU=1500 (according to /sys/class/net/eth0/mtu). 2. I created a bridge with MTU=9000 on top of this interface. Everything went OK. After I wrote MTU=9000 on ifcfg-eth0 and ifdown/ifup it, eth0 got the proper MTU. 3. Now, I removed the bridge and deleted MTU keyword from the ifcfg-eth0. But after ifup/ifdown the actual MTU of the eth0 stayed 9000. The only way to change it back to 1500 (or something else) is explicitly set MTU in ifcfg file. According to Bill Nottingham it is intentional behaviour. If so, we have a problem in vdsm, because we never set MTU value until user ask it explicitly. Actually you are, You where asked for MTU 9000 on the network, As implementation specif you had to do this all the way down the chain Now it's only reasonable that when you cancel the 9000 request then you'll do what is necessary to rollback the changes. It's pity that ifcfg-files don't have the option to set MTU='default', but as you can read this default before you change, then please keep it somewhere and revert to that. 
It means that if we have interface with MTU=9000 on it just because once there was a bridge with such MTU attached to it and now we want to attach regular bridge with *default* MTU=1500 we have a problem. The only thing we can do to avoid this it's set explicitly MTU=1500 in interface's ifcfg file. IMHO it's a bit ugly, but it looks like we have no choice. As usual comments more than welcome... Regards, Igor Lvovsky ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] MTU setting according to ifcfg files.
OK, I think I need to explain myself better, MTU sizes under 1500 are not interesting as they are only really valid for slow networks which will not be able to support virt workloads anyway. 1500 is internet MTU and is the recommended size when communicating with the outside world. MTU is just a size that has to be agreed upon by all participants in the chain. There is no inherent default MTU but default is technically 1500. Reverting to previous value makes no sense unless you are just testing something out. For that case the engine can remember the current MTU and set it back. To sum up, I suggest ignoring any previously set value like we would ignore it if VDSM had set it. It makes no sense to keep it because the semantic of setting the MTU is to override the current configuration. As a side note, having verb to test max MTU for a path might be a good idea to give the engine\user a way to recommend a value to the user. - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Igor Lvovsky ilvov...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Simon Grinberg si...@redhat.com Sent: Wednesday, November 28, 2012 11:23:52 AM Subject: Re: [vdsm] MTU setting according to ifcfg files. I don't want to keep the last configured MTU. It's problematic. Having a stack is even worse. VDSM should try not to persist anything if possible. Also, reverting to the last MTU is raceful and has weird corner cases. Best to just assume default it 1500 (Like all major OSs do). But since it's not really a default I would call it a recommended setting. - Original Message - From: Igor Lvovsky ilvov...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Simon Grinberg si...@redhat.com Sent: Wednesday, November 28, 2012 11:10:27 AM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Simon Grinberg si...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Igor Lvovsky ilvov...@redhat.com Sent: Wednesday, November 28, 2012 5:30:17 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. I suggest we don't have a default. If you don't specify an MTU it will use whatever is already configured. There is no way to go back to the defaults only to set a new value. The engine can assume 1500 (in case of ethernet devices) is the recommended value. This is not related to engine. You are right that the actually MTU will the last configured one, but this is exactly a problem. As I already mentioned, if you will add another bridge without custom MTU its users (VMs) can assume that the MTU is 1500 - Original Message - From: Simon Grinberg si...@redhat.com To: Igor Lvovsky ilvov...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Wednesday, November 28, 2012 9:53:48 AM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Igor Lvovsky ilvov...@redhat.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Cc: Simon Grinberg si...@redhat.com Sent: Wednesday, November 28, 2012 2:58:52 PM Subject: [vdsm] MTU setting according to ifcfg files. Hi, I am working on one of the vdsm bugs that we have and I found that initscripts (initscripts-9.03.34-1.el6.x86_64) behaviour doesn't fits our needs. So, I would like to raise this issue in the list. The issue is MTU setting according to ifcfg files. I'll try to describe the flow below. 1. 
I started with an ifcfg file for the interface without any MTU keyword, and the interface (let's say eth0) had the *default* MTU=1500 (according to /sys/class/net/eth0/mtu). 2. I created a bridge with MTU=9000 on top of this interface. Everything went OK. After I wrote MTU=9000 into ifcfg-eth0 and did ifdown/ifup, eth0 got the proper MTU. 3. Now I removed the bridge and deleted the MTU keyword from ifcfg-eth0. But after ifup/ifdown the actual MTU of eth0 stayed 9000. The only way to change it back to 1500 (or anything else) is to set the MTU explicitly in the ifcfg file. According to Bill Nottingham this is intentional behaviour. If so, we have a problem in vdsm, because we never set an MTU value unless the user asks for it explicitly. Actually you do: you were asked for MTU 9000 on the network, and as an implementation specific you had to apply it all the way down the chain. Now it's only reasonable that when you cancel the 9000 request you'll do what is necessary to roll back the changes. It's pity
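The max-MTU probe verb floated in the first message is straightforward to sketch. The following is only an illustration of the idea, not an existing VDSM verb: it leans on Linux iputils ping with the don't-fragment flag, and the candidate list and the 28-byte header allowance are assumptions.

    # Hedged sketch: find the largest MTU that crosses a path without
    # fragmentation, using iputils ping with path-MTU discovery forced on.
    # The function name and candidate values are hypothetical, not a VDSM API.
    import os
    import subprocess

    CANDIDATE_MTUS = (9000, 4000, 1500, 1280)
    IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header

    def probe_max_mtu(host, candidates=CANDIDATE_MTUS):
        with open(os.devnull, "w") as null:
            for mtu in candidates:
                payload = mtu - IP_ICMP_OVERHEAD
                rc = subprocess.call(
                    ["ping", "-M", "do", "-c", "1", "-W", "2",
                     "-s", str(payload), host],
                    stdout=null, stderr=null)
                if rc == 0:
                    return mtu  # largest candidate that survived unfragmented
        return None

The engine could run something like this toward the relevant peer and present the result as the recommended value discussed above.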
Re: [vdsm] MTU setting according to ifcfg files.
- Original Message - From: Simon Grinberg si...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Barak Azulay bazu...@redhat.com, Igor Lvovsky ilvov...@redhat.com Sent: Wednesday, November 28, 2012 12:03:03 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Igor Lvovsky ilvov...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Simon Grinberg si...@redhat.com, Barak Azulay bazu...@redhat.com Sent: Wednesday, November 28, 2012 6:49:22 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. OK, I think I need to explain myself better, MTU sizes under 1500 are not interesting as they are only really valid for slow networks which will not be able to support virt workloads anyway. 1500 is internet MTU and is the recommended size when communicating with the outside world. MTU is just a size that has to be agreed upon by all participants in the chain. There is no inherent default MTU but default is technically 1500. Reverting to previous value makes no sense unless you are just testing something out. Yes it does, There are networks out there that do use MTU 1500 as weird as it sounds, It not weird at all, this is why MTU settings exist. But setting a low MTU will not break the network but will just have some performance degredation. this usually the admin does initial settings on the management network and then when you set don't touch all works well. An example is when you have storage and management on the same network. Now consider the scenario that for some VMs the user wants to limit to the 'normal/recommended defaults' so in this case he will have to set in the logical network property to MTU=1500. when VDSM sets this chain it supposedly won't touch the interface MTU since it's already bigger (if it does it's a bug). Now the user has one more logical network of VMs with 9000 since he also have VMs using shared storage on this network. All works well till now. But what about when removing the 9000 network? Will VDSM 'remember' that it did not touch the interface MTU in the first place, or will it try to set it to this recommended MTU?. It's a question of ownership. Because it's simpler I suggest we assume ownership and always set the maximum needed (also lowering if to high). The engine can query the MTU and make weird decision according. Like setting the current as default or as a saved value or whatever. This flow obviously needs user input so VSDM is not the place to put the decision making. I have no idea :) For that case the engine can remember the current MTU and set it back. To sum up, I suggest ignoring any previously set value like we would ignore it if VDSM had set it. It makes no sense to keep it because the semantic of setting the MTU is to override the current configuration. As a side note, having verb to test max MTU for a path might be a good idea to give the engine\user a way to recommend a value to the user. That is better but not perfect :) - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Igor Lvovsky ilvov...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Simon Grinberg si...@redhat.com Sent: Wednesday, November 28, 2012 11:23:52 AM Subject: Re: [vdsm] MTU setting according to ifcfg files. I don't want to keep the last configured MTU. It's problematic. Having a stack is even worse. VDSM should try not to persist anything if possible. 
Also, reverting to the last MTU is raceful and has weird corner cases. Best to just assume default it 1500 (Like all major OSs do). But since it's not really a default I would call it a recommended setting. - Original Message - From: Igor Lvovsky ilvov...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Simon Grinberg si...@redhat.com Sent: Wednesday, November 28, 2012 11:10:27 AM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Simon Grinberg si...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Igor Lvovsky ilvov...@redhat.com Sent: Wednesday, November 28, 2012 5:30:17 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. I suggest we don't have a default. If you don't specify an MTU it will use whatever is already configured. There is no way to go back to the defaults only to set a new value. The engine can assume 1500 (in case of ethernet devices) is the recommended value
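The "assume ownership and always set the maximum needed" policy from the message above can be expressed compactly. A rough sketch, assuming the per-network MTU requests for a NIC are known; the helper names and the 1500 fallback are illustrative.

    # Hedged sketch of the "VDSM owns the MTU" policy: the effective MTU of an
    # interface is simply the maximum requested by the networks configured on
    # top of it, falling back to the recommended 1500 when nothing asks for more.
    DEFAULT_MTU = 1500

    def required_mtu(networks_on_nic):
        """networks_on_nic: iterable of per-network MTU requests (ints or None)."""
        requested = [mtu for mtu in networks_on_nic if mtu]
        return max(requested) if requested else DEFAULT_MTU

    def current_mtu(nic):
        with open("/sys/class/net/%s/mtu" % nic) as f:
            return int(f.read())

    # Adding or removing a network just recomputes the target; no history is kept.
    # e.g. removing the 9000 network from [9000, 1500] drops the NIC back to 1500.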
Re: [vdsm] MTU setting according to ifcfg files.
- Original Message - From: Alon Bar-Lev alo...@redhat.com To: Simon Grinberg si...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Saggi Mizrahi smizr...@redhat.com, lpeer Livnat Peer lp...@redhat.com Sent: Wednesday, November 28, 2012 12:49:10 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Simon Grinberg si...@redhat.com To: Saggi Mizrahi smizr...@redhat.com, lpeer Livnat Peer lp...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Wednesday, November 28, 2012 7:37:48 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Simon Grinberg si...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Wednesday, November 28, 2012 7:15:35 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Simon Grinberg si...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Barak Azulay bazu...@redhat.com, Igor Lvovsky ilvov...@redhat.com Sent: Wednesday, November 28, 2012 12:03:03 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Igor Lvovsky ilvov...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Simon Grinberg si...@redhat.com, Barak Azulay bazu...@redhat.com Sent: Wednesday, November 28, 2012 6:49:22 PM Subject: Re: [vdsm] MTU setting according to ifcfg files. OK, I think I need to explain myself better, MTU sizes under 1500 are not interesting as they are only really valid for slow networks which will not be able to support virt workloads anyway. 1500 is internet MTU and is the recommended size when communicating with the outside world. MTU is just a size that has to be agreed upon by all participants in the chain. There is no inherent default MTU but default is technically 1500. Reverting to previous value makes no sense unless you are just testing something out. Yes it does, There are networks out there that do use MTU 1500 as weird as it sounds, It not weird at all, this is why MTU settings exist. But setting a low MTU will not break the network but will just have some performance degredation. this usually the admin does initial settings on the management network and then when you set don't touch all works well. An example is when you have storage and management on the same network. Now consider the scenario that for some VMs the user wants to limit to the 'normal/recommended defaults' so in this case he will have to set in the logical network property to MTU=1500. when VDSM sets this chain it supposedly won't touch the interface MTU since it's already bigger (if it does it's a bug). Now the user has one more logical network of VMs with 9000 since he also have VMs using shared storage on this network. All works well till now. But what about when removing the 9000 network? Will VDSM 'remember' that it did not touch the interface MTU in the first place, or will it try to set it to this recommended MTU?. It's a question of ownership. Because it's simpler I suggest we assume ownership and always set the maximum needed (also lowering if to high). The engine can query the MTU and make weird decision according. Like setting the current as default or as a saved value or whatever. This flow obviously needs user input so VSDM is not the place to put the decision making. 
I tend to agree, it's an ownership thing. The engine should not allow a mixed configuration of 'default vs. override' on the same interface. If the user wishes to start playing with MTUs he needs to do it carefully and across the board. VDSM should not bother with the issue at all, certainly not play a guessing game. Livnat, your $0.02? This is exactly the reason why we should either define a completely stateless slave host and apply the whole configuration, including what you call 'defaults'. Completely stateless is problematic because if the engine is down or unavailable and VDSM happens to restart, you can't use any of your resources. The way forward is currently to get rid of most of the configuration in vdsm.conf and only keep things that are necessary for communication with the engine (e.g. core dump on\off, management interface\port, SSL on\off). Other VDSM configuration should have an API introduced to set it, and that will be persisted but only configurable
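The suggestion to replace most of vdsm.conf with persisted, API-settable configuration could look roughly like this. The store location and verb names below are invented for illustration.

    # Hedged sketch of a persisted-settings API replacing static vdsm.conf
    # entries. Path and verb names are hypothetical.
    import json
    import os

    _STORE = "/var/lib/vdsm/settings.json"   # illustrative location

    def setConfigValue(key, value):
        data = {}
        if os.path.exists(_STORE):
            with open(_STORE) as f:
                data = json.load(f)
        data[key] = value
        tmp = _STORE + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.rename(tmp, _STORE)   # atomic replace, a crash never leaves half a file

    def getConfigValue(key, default=None):
        if not os.path.exists(_STORE):
            return default
        with open(_STORE) as f:
            return json.load(f).get(key, default)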
Re: [vdsm] [RFC]about the implement of text-based console
The best solution would of course be 3 (Or something similar that keeps the terminal state inside the VM memory so that migration works). Tunelling screen can do that but it requires having screen (or something similar) installed on the guest which is hard to do. But I think the more practical solution is 2 as it has semantics similar to VNC. Running a real ssh (ie. 1) is problematic because we have less control over the daemon and there are more vectors the user can try and use to break out of the sandbox. Further more, setting up sandboxes is a bit problematic ATM. I don't really understand 5. What does those methods return the virtio dev path? - Original Message - From: Zhou Zheng Sheng zhshz...@linux.vnet.ibm.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, November 27, 2012 4:22:20 AM Subject: Re: [vdsm] [RFC]about the implement of text-based console Hi all, For now in there is no agreement on the remote guest console solution, so I decide to do some investigation continue the discussion. Our goal VM serial console remote access in CLI mode. That means the client runs without X environment. Do you mean like running qemu with -curses? There are several proposals. 1. Sandboxed sshd VDSM runs a new host sshd instance in virtual machine/sandbox and redirects the virtio console to it. 2. Third-party sshd VDSM runs third-party sshd library/implementation and redirects virtio console to it. 3. Spice Extend spice to support console and implement a client to be run without GUI environment 4. oVirt shell - Engine - libvirt The user connects to Engine via oVirt CLI, then issues a serial-console command, then Engine locates the host and connect to the guest console. Currently there is a workaround, it invokes virsh -c qemu+tls://host/qemu console vmid from Engine side. 5. VDSM console streaming API VDSM exposes getConsoleReadStream() and getConsoleWriteStream() via XMLRPC binding. Then implement the related client in vdsClient and Engine Detailed discussion 1. Sandboxes Solution 1 and 2 allow users connect to console using their favorite ssh client. The login name is vmid, the password is set by setVmTicket() call of VDSM. The connection will be lost during migration. This is similar to VNC in oVirt. I take a look at several sandbox technologies, including libvirt-sandbox, lxc and selinux. a) libvirt-sandbox boots a VM using host kernel and initramfs, then passthru the host file system to the VM in read only mode. We can also add extra binding to the guest file system. It's very easy to use. To run shell in a VM, one can just issues virt-sandbox -c qemu:///session /bin/sh Then the VM will be ready in several seconds. However it will trigger some selinux violations. Currently there is no official support for selinux policy configuration from this project. In the project page this is put in the todo list. b) lxc utilize Linux container to run a process in sandbox. It needs to be configured properly. I find in the package lxc-templates there is an example configuration file for running sshd in lxc. c) sandbox command in the package policycoreutils-python makes use of selinux to run a process in sandbox, but there is no official or example policy files for sshd. In a word, for sandbox technologies, we have to configure the policies/file system binding/network carefully and test the compatibility with popular sshd implementations (openssh-server). When those sshd upgrade, the policy must be upgraded by us at the same time. 
Since the policies are not maintained by who implements sshd, this is a burden for us. Work to do Write and maintain the policies. Find ways for auth callback and redirecting data to openssh-server. pros Re-use existing pieces and technologies (host sshd, sandbox). User friendly, they can use existing ssh clients. cons Connection is lost in migration, this is not a big problem because 1) VNC connection share the same problem, 2) the user can reconnect manually. It's not easy to maintain the sandbox policies/file system binding/network for compatibility with sshd. 2. Third-party sshd implementations Almost the same as solution 1 but with better flexibility. VDSM can import a third-party sshd library and let that library deal with auth and transport. VDSM just have to implement the data forwarding. Many people consider this is insecure but I think the ticket solution for VNC is even not as secure as this. Currently most of us only trust openssh-server and think the quality of third-party sshd is low. I searched for a while and found twisted.conch from the popular twisted project. I'm not familiar with twisted.conch, but I still put it in this mail to collect opinions from potential twisted.conch experts. In a word, I prefer sandbox technologies to third-party sshd
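Option 5 in the quoted proposal (getConsoleReadStream()/getConsoleWriteStream()) ultimately reduces to copying bytes between the user's terminal and the guest's virtio console. A client-side sketch, assuming the two streams are exposed to the client as a single connected socket; how that socket is obtained is left out.

    # Hedged sketch of a client for the proposed console streaming API:
    # pump bytes between the local terminal and a socket that carries the
    # guest's virtio console. Connection setup is intentionally omitted.
    import os
    import select

    def pump_console(sock):
        stdin_fd, stdout_fd = 0, 1
        while True:
            readable, _, _ = select.select([sock.fileno(), stdin_fd], [], [])
            if sock.fileno() in readable:
                data = sock.recv(4096)
                if not data:
                    break                      # guest side closed the console
                os.write(stdout_fd, data)
            if stdin_fd in readable:
                data = os.read(stdin_fd, 4096)
                if not data:
                    break                      # local EOF ends the session
                sock.sendall(data)

Because the terminal state lives in the guest's virtio console rather than in the transport, a reconnect after migration would simply reattach the pump, which matches the VNC-like semantics of option 2.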
[vdsm] When Zombies Attack
I'm starting to see more and more flows run a process and then leave a thread waiting for it just to prevent a zombie attack. This is wasteful, even more so since this is usually done for processes that might get stuck on IO and take a while to come back. To solve this I implemented this little tidbit: http://gerrit.ovirt.org/#/c/8937/ And you can see it being used here: http://gerrit.ovirt.org/#/c/8907/ specifically: http://gerrit.ovirt.org/#/c/8907/5/vdsm/storage/remoteFileHandler.py That being said, I also want to suggest adding autoreaping to AsyncProc.__del__() with a warning printed to the log notifying about the (maybe) unintentional process leak. Comments and suggestions for improvement are most welcome. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
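As an illustration of the auto-reaping idea (not the patch linked above), a single SIGCHLD handler can collect every exited child instead of parking one waiting thread per process:

    # Hedged sketch of zombie auto-reaping: one SIGCHLD handler reaps all
    # exited children, so no per-process waiter thread is needed.
    import errno
    import logging
    import os
    import signal

    def _reap_children(signum, frame):
        while True:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except OSError as e:
                if e.errno == errno.ECHILD:
                    return          # no children left at all
                raise
            if pid == 0:
                return              # remaining children are still running
            logging.warning("reaped possibly leaked child %d (status %d)",
                            pid, status)

    signal.signal(signal.SIGCHLD, _reap_children)

A real implementation still has to coordinate with code that waits on specific pids itself (e.g. Popen.wait()), which is part of what the patch above deals with.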
[vdsm] git-review
I've recently encountered more and more people not using the git-review tool and manually pushing their changes to Gerrit using raw git commands. Even though there is nothing wrong with doing things the hard way, I prefer not to use an overly complicated, error-prone way to interact with Gerrit. Last I checked, the version of git-review in Fedora was broken, but I suggest using pip anyway as it is always synced with the master branch. Also, please use topics. Either use a BZ# or a topic codename (e.g. live_migration, vdsm_api, nfs4_support) so people can skim the review list for topics they might want to review. Be careful: it automatically uses the current branch name as the topic (unless you use -t), so if you give your branches funny names (I know I do) don't forget to manually specify a topic. More information: http://wiki.openstack.org/GerritWorkflow https://github.com/openstack-ci/git-review ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] new API verb: getVersionInfo()
currently getVdsCaps() does a lot of unrelated things most of them have no relation to capabilites. This was done because of HTTP overhead. Instead of calling multiple commands we will call one that does everything. I agree with the suggestion that getVdsCaps() will actually return the capabilities. Capabilities being: - storage core version supported domain formats - VM core version and supported host capabilites. - network core and capabilities. - etc... These all should be mostly static and set at boot. As to the query API. I personally dislike the idea of a bag API. Now that we are moving away from HTTP, call overhead is no longer an issue so we can have multiple verbs and call them sequentially. In actuality we already do. Internally getVdsCaps() just aggregates other APIs. This makes return values of the method easier to handle and makes changing the results of an API call not affect users that don't care about that change. This also has better performance as storage APIs tend to slow the response and sending multiple commands would mean that you can get the Network stats even though the storage server is down. - Original Message - From: Dan Kenigsberg dan...@redhat.com To: Adam Litke a...@us.ibm.com Cc: vdsm-devel@lists.fedorahosted.org, Michal Skrivanek mskri...@redhat.com Sent: Thursday, October 18, 2012 4:38:16 AM Subject: Re: [vdsm] new API verb: getVersionInfo() On Wed, Oct 17, 2012 at 10:07:43AM -0500, Adam Litke wrote: Thanks for posting your idea on the list here. I like the idea of a more fine-grained version query API. getVdsCapabilities has become too much of a catch-all and I agree that something lighter is useful. I do think vdsm will want to add a real capabilities mechanism and it could probably go here as well. As we work to make the vdsm API evolve in a stable, compatible manner, capabilities/feature-bits will come into play. Since you're proposing a structure return value, we can easily add the capabilities field to it in a future release, but it might make sense to have it there now to reduce client-side complexity of figuring out if the return value has a capabilities item. To avoid the bloat that we have with the current getVdsCapabilities API, I propose a simple format for the new capabilities: {'enum': 'Capabilities', 'data': ['StorageDomain_30', 'StorageDomain_22', 'Sanlock', ...]} and then add the following to the return type for your new API: 'capabilities': ['Capabilities'] This is essentially an expandable bitmask of features where a feature is present by its presense in the 'capabilities' array. This will be extensible by simply adding new capabilities to the enum as we find them to be necessary. Thoughts on this? The reason I am bringing it up now is it would be nice to restrict the pain of migrating to this new version API to just one time. I fully agree - that's what I've ment in my http://gerrit.ovirt.org/#/c/8431/4/vdsm_api/vdsmapi-schema.json comment on a bag of capability flags. On Wed, Oct 17, 2012 at 01:37:08PM +0200, Peter V. Saveliev wrote: … New verb proposal: getVersionInfo() Background Right now VDSM has only one possibility to discover the peer VDSM version — it is to call getVdsCapabilities(). All would be nice, but the verb does a lot of stuff, including disk I/O (rpm data query). It is a serious issue for high-loaded hosts, that can even trigger call timeout. 
Rationale Working in an environment with multiple VDSM versions, it is inevitable to fall in a simple choice: * always operate with one API, described once and forever * use different protocol versions. It is a common practice to reserve something in a protocol, that will represent the protocol version. Any protocols w/o version info sooner or later face the need to guess a version, that is much worse. On the other hand, involving rpm queries and CPU topology calculation into the protocol version discovery is an overkill. So the simplest way is to reserve a new verb for it. Usecases It can be used in the future in *any* VDSM communication that can expose version difference. Implementation Obviously, the usage of a new verb in the current release, e.g. RHEV-3.1 can be done only in try/catch way, 'cause RHEV-3.0 does not support it. But to be able to use it in RHEV-3.2, we should already have it in 3.1. Even if we will not use it yet, the future usecases are pretty straightforward. So pls comment it: http://gerrit.ovirt.org/#/c/8431/ -- Peter V. Saveliev ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel -- Adam Litke
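Laid out in the schema style quoted above, the proposal might look as follows. Only the 'capabilities' member comes from the thread; the other members and the sample values are illustrative assumptions, not the final schema.

    # Hedged sketch of the capability flags plus a getVersionInfo return type,
    # in the vdsmapi-schema.json style quoted in the thread.
    CAPABILITIES_ENUM = {
        'enum': 'Capabilities',
        'data': ['StorageDomain_30', 'StorageDomain_22', 'Sanlock'],
    }

    VERSION_INFO_TYPE = {
        'type': 'VersionInfo',                  # member names are illustrative
        'data': {
            'software_version': 'str',
            'software_revision': 'str',
            'supported_api_versions': ['str'],
            'capabilities': ['Capabilities'],   # expandable feature "bitmask"
        },
    }

    # What a host might actually answer (illustrative values only):
    sample_response = {
        'software_version': '4.10.0',
        'supported_api_versions': ['3.0', '3.1'],
        'capabilities': ['StorageDomain_30', 'Sanlock'],
    }

A feature is present simply by its membership in the 'capabilities' array, so adding a flag later never breaks existing callers.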
Re: [vdsm] new API verb: getVersionInfo()
I don't see how pyinotify is even related to storage stats. It doesn't work with NFS and is a bit flaky when it comes to VFSs like proc or dev. Also it doesn't check liveness or latency so the events don't really give us anything useful. The data is being taken from cache. I assume there is a prepare call there that makes everything slower. This will only be fixed with new style domains that don't have a built in sdUUID. - Original Message - From: Vinzenz Feenstra vfeen...@redhat.com To: Itamar Heim ih...@redhat.com Cc: Saggi Mizrahi smizr...@redhat.com, Michal Skrivanek mskri...@redhat.com, vdsm-devel@lists.fedorahosted.org Sent: Thursday, October 18, 2012 3:15:47 PM Subject: Re: [vdsm] new API verb: getVersionInfo() On 10/18/2012 08:34 PM, Itamar Heim wrote: On 10/18/2012 06:03 PM, Saggi Mizrahi wrote: currently getVdsCaps() does a lot of unrelated things most of them have no relation to capabilites. This was done because of HTTP overhead. Instead of calling multiple commands we will call one that does everything. I agree with the suggestion that getVdsCaps() will actually return the capabilities. Capabilities being: - storage core version supported domain formats - VM core version and supported host capabilites. - network core and capabilities. - etc... These all should be mostly static and set at boot. As to the query API. I personally dislike the idea of a bag API. Now that we are moving away from HTTP, call overhead is no longer an issue so we can have multiple verbs and call them sequentially. In actuality we already do. Internally getVdsCaps() just aggregates other APIs. This makes return values of the method easier to handle and makes changing the results of an API call not affect users that don't care about that change. This also has better performance as storage APIs tend to slow the response and sending multiple commands would mean that you can get the Network stats even though the storage server is down. i thought getVdsCaps return the storage results from cache, which is refreshed by another thread, to make sure getVdsCaps has no latency. Well this is what it should do but it still doesn't do it. At least from what I have seen so far. I am currently working on a PoC implementation for caching packages and having so pyinotify based trigger for refreshing the cache. I plan to really cache everything and we'll have a background thread running for updating the cached data on changes. I will be sending the proposed solution for it to the list. So we can discuss it into more details. -- Regards, Vinzenz Feenstra Senior Software Engineer IRC: vfeenstr or evilissimo ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
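The caching Vinzenz describes, getVdsCaps answering from a cache that a background thread keeps fresh, can be sketched in a few lines; the class name and refresh interval are made up for the example.

    # Hedged sketch: callers read a cached value, a daemon thread refreshes it,
    # so a slow storage query never adds latency to the API response.
    import threading
    import time

    class CapsCache(object):
        def __init__(self, collect, interval=60):
            self._collect = collect          # the expensive gathering function
            self._interval = interval
            self._lock = threading.Lock()
            self._value = collect()          # prime the cache once at startup
            t = threading.Thread(target=self._refresher)
            t.daemon = True
            t.start()

        def _refresher(self):
            while True:
                time.sleep(self._interval)
                fresh = self._collect()      # may be slow; callers never wait on it
                with self._lock:
                    self._value = fresh

        def get(self):
            with self._lock:
                return self._value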
Re: [vdsm] API: Supporting internal/testing interfaces
Never expose such things through the API. I know that it is currently impossible to test the mailbox \ lvextend flow without a full blown VDSM running because of bad design but this doesn't imply we should expose testing interface through the main public API. - Original Message - From: Adam Litke a...@us.ibm.com To: vdsm-devel@lists.fedorahosted.org Cc: Dan Kenigsberg dan...@redhat.com, fsimo...@redhat.com, Saggi Mizrahi smizr...@redhat.com Sent: Wednesday, October 3, 2012 3:09:48 PM Subject: API: Supporting internal/testing interfaces Hi, A recent patch: http://gerrit.ovirt.org/#/c/8286/1 has brought up an important issue regarding the vdsm API and I would like to open up a discussion about how we should expose testing/internal interfaces in the next-generation vdsm API. The above change exposes an internal HSM verb 'sendExtendMsg' via the xmlrpc interface. There is no doubt that this is useful for testing and debugging the storage mailbox functionality. Until now, all new APIs were required to be documented in the vdsm api schema so that they can be properly exported to end users. But we don't really want end users to consume this particular API. How should we handle this? I see a few options: 1) Don't document the API and omit it from the schema. This is the patch's current approach. I do not favor this approach because eventually the xmlrpc server will be going away and then we will lose the ability to use this new debugging API. We need to decide how to support debugging interfaces going forward. 2) Expose it in the schema as a debugging API. This can be done by extending the symbol's dictionary with {'debug': True}. Initially, the API documentation and code generators can simply skip over these symbols. Later on, we could generate an independent libvdsm-debug.so library that includes these debugging APIs. Thoughts? -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
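Option 2 from the quoted mail, tagging schema symbols with {'debug': True} so generators can skip or segregate them, is cheap to prototype. The symbol layout below is illustrative and not the exact vdsmapi-schema.json syntax.

    # Hedged sketch of option 2: internal verbs carry a 'debug' flag and the
    # code/documentation generators filter on it.
    SCHEMA = [
        {'command': 'StoragePool_sendExtendMsg', 'debug': True},
        {'command': 'Host_getVdsCapabilities'},
    ]

    def symbols_for(target, schema=SCHEMA):
        """'libvdsm' skips debug verbs; 'libvdsm-debug' keeps everything."""
        include_debug = (target == 'libvdsm-debug')
        return [s for s in schema if include_debug or not s.get('debug', False)]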
Re: [vdsm] API: Supporting internal/testing interfaces
My personal preference is using the VDSM debug hook to inject code to a running VDSM and dynamically add whatever you want. This means the code is part of the test and not VDSM. We used to use it (before the code rotted away) to add to VDSM the startCoverage() and endCoverage() verbs for tests. Another option is having the code in an optional RPM (similar to how debug hook is loaded only if it's installed) I might also accept unpythonic things like conditional compilation Asking people nicely not to use a method that might corrupt their data-center doesn't always work with good people not to mention bad ones. You could also just fix the design :) - Original Message - From: Federico Simoncelli fsimo...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: Dan Kenigsberg dan...@redhat.com, vdsm-devel@lists.fedorahosted.org, Adam Litke a...@us.ibm.com Sent: Wednesday, October 3, 2012 9:39:44 PM Subject: Re: API: Supporting internal/testing interfaces - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Adam Litke a...@us.ibm.com Cc: Dan Kenigsberg dan...@redhat.com, fsimo...@redhat.com, vdsm-devel@lists.fedorahosted.org Sent: Wednesday, October 3, 2012 9:27:02 PM Subject: Re: API: Supporting internal/testing interfaces Never expose such things through the API. I know that it is currently impossible to test the mailbox \ lvextend flow without a full blown VDSM running because of bad design but this doesn't imply we should expose testing interface through the main public API. Ok, given that in the future we'll have a proper design, what is the short term alternative to efficiently test the mailbox? You also completely dismissed Adam's proposal to ship these in a separate libvdsm-debug.so library. -- Federico - Original Message - From: Adam Litke a...@us.ibm.com To: vdsm-devel@lists.fedorahosted.org Cc: Dan Kenigsberg dan...@redhat.com, fsimo...@redhat.com, Saggi Mizrahi smizr...@redhat.com Sent: Wednesday, October 3, 2012 3:09:48 PM Subject: API: Supporting internal/testing interfaces Hi, A recent patch: http://gerrit.ovirt.org/#/c/8286/1 has brought up an important issue regarding the vdsm API and I would like to open up a discussion about how we should expose testing/internal interfaces in the next-generation vdsm API. The above change exposes an internal HSM verb 'sendExtendMsg' via the xmlrpc interface. There is no doubt that this is useful for testing and debugging the storage mailbox functionality. Until now, all new APIs were required to be documented in the vdsm api schema so that they can be properly exported to end users. But we don't really want end users to consume this particular API. How should we handle this? I see a few options: 1) Don't document the API and omit it from the schema. This is the patch's current approach. I do not favor this approach because eventually the xmlrpc server will be going away and then we will lose the ability to use this new debugging API. We need to decide how to support debugging interfaces going forward. 2) Expose it in the schema as a debugging API. This can be done by extending the symbol's dictionary with {'debug': True}. Initially, the API documentation and code generators can simply skip over these symbols. Later on, we could generate an independent libvdsm-debug.so library that includes these debugging APIs. Thoughts? ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
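The optional-RPM alternative mentioned above boils down to registering extra verbs only when a separate package happens to be installed. A minimal sketch; the module name is invented for illustration.

    # Hedged sketch of shipping test-only verbs in an optional package: the
    # bindings only expose them if the extra module is importable.
    # "vdsm_debug_verbs" is a hypothetical module name.
    def register_debug_verbs(bindings):
        try:
            import vdsm_debug_verbs
        except ImportError:
            return                      # debug package not installed, nothing exposed
        vdsm_debug_verbs.register(bindings)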
[vdsm] sanlock issues
If you are trying to run sanlock on fedora and you get this error:

Sep 23 11:26:56 dhcp-XX-XX.tlv.redhat.com sanlock[7083]: 2012-09-23 11:26:56+0200 37014 [7083]: wdmd connect failed for watchdog handling

You need to do this:

# unload softdog if it's running
rmmod softdog
# Check if there are residual watchdog files under /dev and remove them
rm /dev/watchdog*
# reload the softdog module
modprobe softdog
# make sure the file is named /dev/watchdog
mv /dev/watchdog? /dev/watchdog
# set the proper selinux context
restorecon /dev/watchdog
# restart wdmd
systemctl restart wdmd.service
# restart sanlock
systemctl restart sanlock.service
# Profit!
fortune

___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
[vdsm] [RFC] Implied UUIDs in API
Hi, a lot of the IDs that get passed around in the API are UUIDs. The point is that as long as you are not the entity generating the UUIDs, the fact that these are UUIDs has no real significance to you. I suggest removing the validation of UUIDs from the receiving end. There is no real reason to make sure these are real UUIDs; it's another restriction we can remove from the interface, simplifying both the code and the interface. Just to be clear, I'm not saying that we should stop using UUIDs. For example, VDSM will keep generating task IDs as UUIDs, but the documentation will state that they could be *any* string value. If for some reason we choose to change the format of task IDs, there will be no need to change the interface. The same goes for VM IDs. Currently the engine uses UUIDs, but there is no reason for VDSM to enforce this and prevent the engine from ever changing it in the future and using other string values. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
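The relaxed contract, IDs opaque to the receiver while the generator may still use UUIDs, fits in a few lines; the function names are illustrative.

    # Hedged sketch of the relaxed check: the receiving side only insists an ID
    # is a non-empty string, while VDSM itself may keep generating UUIDs.
    import uuid

    def validate_id(value):
        if not isinstance(value, str) or not value:
            raise ValueError("ID must be a non-empty string")
        return value          # no uuid.UUID(value) round-trip, the format stays opaque

    def new_task_id():
        return str(uuid.uuid4())   # the generating side can still choose UUIDs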
Re: [vdsm] About vdsm rest api
Hi, the rest API is going to move to a 2nd tier API very soon. There is also a pretty big API restructuring going on so we could supply a supported and documented API. Is there a reason you are not using the ovirt-engine REST-API? - Original Message - From: Paolo Tonin paolo.to...@gmail.com To: vdsm-devel@lists.fedorahosted.org Sent: Thursday, August 16, 2012 10:00:41 PM Subject: [vdsm] About vdsm rest api Hi all there! Is there any documentation about implemented http rest api in the current version of VDSM (4.10) I would to use vdsm without oVirt packages, actually i'm using # rpm -qa vdsm* vdsm-xmlrpc-4.10.0-0.42.12.el6.noarch vdsm-4.10.0-0.42.12.el6.x86_64 vdsm-rest-4.10.0-0.42.12.el6.noarch vdsm-bootstrap-4.10.0-0.42.12.el6.noarch vdsm-python-4.10.0-0.42.12.el6.x86_64 vdsm-reg-4.10.0-0.42.12.el6.noarch vdsm-cli-4.10.0-0.42.12.el6.noarch Thanks a lot ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] About vdsm rest api
reposting to list --- I know it might seem a bit of a cumbersome process. But installation is relatively simple nowadays and the API is a lot more stable and safe at the moment. Because VDSM doesn't keep any information on the host you will end up having to develop your own solutions to problems already solved by the ovirt-engine. You only need to use the VDSM API if you write your own management solution. But seeing as the ovirt-engine is open source there should be no reason not to just write the feature you need and push it upstream. - Original Message - From: Paolo Tonin paolo.to...@gmail.com To: Saggi Mizrahi smizr...@redhat.com Sent: Monday, August 20, 2012 11:33:50 AM Subject: Re: [vdsm] About vdsm rest api Yes, because i don't want to install oVirt engine (and subsequely entire oVirt web interface and DB) 2012/8/20 Saggi Mizrahi smizr...@redhat.com: Hi, the rest API is going to move to a 2nd tier API very soon. There is also a pretty big API restructuring going on so we could supply a supported and documented API. Is there a reason you are not using the ovirt-engine REST-API? - Original Message - From: Paolo Tonin paolo.to...@gmail.com To: vdsm-devel@lists.fedorahosted.org Sent: Thursday, August 16, 2012 10:00:41 PM Subject: [vdsm] About vdsm rest api Hi all there! Is there any documentation about implemented http rest api in the current version of VDSM (4.10) I would to use vdsm without oVirt packages, actually i'm using # rpm -qa vdsm* vdsm-xmlrpc-4.10.0-0.42.12.el6.noarch vdsm-4.10.0-0.42.12.el6.x86_64 vdsm-rest-4.10.0-0.42.12.el6.noarch vdsm-bootstrap-4.10.0-0.42.12.el6.noarch vdsm-python-4.10.0-0.42.12.el6.x86_64 vdsm-reg-4.10.0-0.42.12.el6.noarch vdsm-cli-4.10.0-0.42.12.el6.noarch Thanks a lot ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
[vdsm] Please Review
I have a bunch of patches going stale adding minor improvements: I would like to get reviews so they get pushed in. I know they contain code paths that are unused at the moment. But adding death signal to certain copy operations or using the permutation feature for testing could prove useful for other people while I'm working on my own patches. Everything that isn't WIP is ready to get pushed in. http://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:repo_engine,n,z ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] How should we handle aborted tasks? via engine, vdsClient or both?
- Original Message - From: Lee Yarwood lyarw...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org Sent: Wednesday, July 18, 2012 10:53:45 AM Subject: Re: [vdsm] How should we handle aborted tasks? via engine, vdsClient or both? On 07/18/2012 03:13 PM, Saggi Mizrahi wrote: We purposefully removed the ability to stop and aborted task from outside VDSM. It is one of the many features VDSM had (and still has) that could corrupt you data center if abused. Understood, however we also lack the ability to manually recover a task so is it just a case of waiting for VDSM to forcibly remove the aborted tasks itself? Yes, Task recovery isn't really that robust. We are working on a different approach for tasks that is more cluster aware. On a related note, this is the time that the 1st rule of VDSM didn't apply! This is one hell of a milestone! I would ask what the 1st rule of VDSM is but I fear I might wake up in a basement. Lee -- Lee Yarwood Software Maintenance Engineer Red Hat UK Ltd 200 Fowler Avenue IQ Farnborough, Farnborough, Hants GU14 7JP Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham (USA), Brendan Lane (Ireland), Matt Parson(USA), Charlie Peters (USA) GPG fingerprint : A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] Verify the storage data integrity after some storage operations with test cases
Actually setting up isos and installing an OS is an overkill IMHO. Using libguestfs seems simpler as it has python bindings. What you could do is: 1. use libguest fs to format a file system on an image 2. Put files on said file system with libguestfs 3. Snapshot 4. run fsck with libguestfs 5. rinse 6. repeast If you don't trust fsck to detect all issues you can use libguestfs to get an md5sum of the raw drive and make sure that after a snapshot it stays the same. - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, July 16, 2012 10:28:25 PM Subject: [vdsm] Verify the storage data integrity after some storage operations with test cases Hi, To verify the storage data integrity after some storage operations like snapshot, merging by VDSM. Here are the test cases I am pondering. I would like to know your feedback about these thoughts. 1) An customized ISO image with the agent required prepared for bringing up a VM in VDSM 2) The test case will inform VDSM to create a VM from the customized ISO image 3) The test case will install an IO application to the VM 3) The test case communicate with the VDSM to inform the IO application in the VM to write some data intentionally. 4) The test case sends the commands to VDSM do some storage operation like disk snapshot, volume merging, etc. Say snapshot operation here for an example. 5) VDSM then tell the test case the result of the operation like the name of the snapshot. 6) Test case can read the snapshot made to verify the snapshot with the data written in 3). Note: currently, there is no tool to read the snapshot image directly. We can restart the VM with the snapshot as the active disk and tell the IO application in the VM to read the data writen before for test case. And test case can compare the data read with the data it informs the application in 3). 7) If the two data matches, the storage operation succeed or it fails. In order to write such a test case, these VDSM features will be required. 1) VDSM can create a VM from a specific ISO image (Almost works) 2) Test case can install an IO application to the VM by VDSM (by ovirt-agent?) 3) Test case must have some protocols with the IO application in VM for passing the command to the VM and returning the result from the VM to the test case(by ovirt-agent?). 4) The IO application can be seen as an test agent. We may extend the existing agent like ovirt-agent as the IO application. -- Shu Ming shum...@linux.vnet.ibm.com IBM China Systems and Technology Laboratory ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
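The libguestfs recipe above maps almost one-to-one onto the guestfs Python bindings. A sketch under the assumption of a qcow2 test image; the paths are illustrative, and the device checksum stands in for a full fsck-based check.

    # Hedged sketch of the libguestfs-based verification suggested above:
    # format an image, write known data, have VDSM snapshot it out of band,
    # then check that the snapshot still carries the same content.
    import guestfs

    def fill_image(path):
        # Create a filesystem with known content and return its checksum.
        g = guestfs.GuestFS()
        g.add_drive_opts(path, format="qcow2", readonly=0)
        g.launch()
        g.mkfs("ext4", "/dev/sda")
        g.mount("/dev/sda", "/")
        g.write("/marker.txt", "known test payload")
        g.umount_all()
        csum = g.checksum_device("md5", "/dev/sda")
        g.sync()
        g.close()
        return csum

    def verify_snapshot(path, expected):
        # Open the snapshot read-only and make sure the data survived.
        g = guestfs.GuestFS()
        g.add_drive_opts(path, format="qcow2", readonly=1)
        g.launch()
        ok = (g.checksum_device("md5", "/dev/sda") == expected)
        g.close()
        return ok

    # csum = fill_image("/var/tmp/base.qcow2")
    # ... ask VDSM to snapshot the volume, then:
    # assert verify_snapshot("/var/tmp/snapshot.qcow2", csum)

No guest OS, agent or ISO is needed, which is the whole appeal of this approach over booting a VM for the check.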
[vdsm] A few notes about lists in Makefiles
Hi, I would just like to push a few notes to people modifying autoconf\automake lists. Please make sure the lists are sorted. Sorted lists are easier to skim and modify. Also, unsorted lists are known to make Federico sad, and we all want to keep him happy because he is a pretty swell guy and the one we actually have to thank for the amazing build system. Also please make sure to add the $(NULL) item, so that when auto-sorting you don't need to check whether you need to add\remove a backslash:

VARIABLE = \
	A \
	B \
	C \
	$(NULL)

If you are using vim you can just mark all the lines and run :!sort to sort them. Also, when adding a file to the PEP_WHITELIST, check if you can just mark the entire directory instead of the individual file. Remember, cleanliness is next to godliness. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
[vdsm] Repo Engine Initial code drop
http://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:repo_engine,n,z Everything that isn't marked as WIP is ready for same hard-core review and commit action. The WIP parts are a bit rough around the edges expect: * Typos * spelling errors * grammar errors * Functions that are remnants of attempts which proved problematic * Lack of code reuse * Old docstrings that are no longer correct * Dragons A few notes: * I call the new storage domains image repositories because I think it creates less confusion and ambiguity. * VirtualDisks are writable entities you can run a VM off, snapshots a read only entities you can make Virtual Disks from. Images are a name for both disks and snapshots. * Only localfs is somewhat supported * The Image manipulation code is working and you can create images and snapshots to your hearts delight. It might even work!. * The check process detects all tree issues but there are only fixes for orphaned tags and volumes meaning you will be able to clean whole tree. * The APIs are not final * Documentation is sparse I'm trying to make all this code separate from the regular VDSM core so we can push it in even though it's not perfect and slowly build up from that. The biggest problem with integration is not having the blockdev feature in qemu and libvirt. This means that running more the one VM which use the same snapshot might corrupt the qcow file. https://bugzilla.redhat.com/show_bug.cgi?id=750801 https://bugzilla.redhat.com/show_bug.cgi?id=760547 If anyone wants to help find me on #VDSM @ freenode and we'll coordinate efforts. My current TODO list: 1. XML-RPC API integration --- Could be pushed in from this point on, as an experimental API 2. nfs repo engine (will introduce sanlock to the mix) 3. clustered-lvm repo engine (Will introduce SRM) 4. Tasks --- I expect to have the Most of the API finalized here 5. Fix operations (merging, conversions) 6. Live snapshot 7. Live Merge 8. Live Storage Migration 9. Profit? ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] [RFC] An alternative way to provide a supported interface -- libvdsm
I'm sorry, but I don't really understand the drawing - Original Message - From: Shu Ming shum...@linux.vnet.ibm.com To: Adam Litke a...@us.ibm.com Cc: vdsm-devel@lists.fedorahosted.org Sent: Wednesday, July 11, 2012 10:24:49 AM Subject: Re: [vdsm] [RFC] An alternative way to provide a supported interface -- libvdsm Adam, Maybe, I don't fully understand your proposal. Here is my understanding of libvdsm in the picture. Please check the following link for the picture. http://www.ovirt.org/wiki/File:Libvdsm.JPG http://www.ovirt.org/wiki/File:Libvdsm.JPG On 2012-7-9 21:56, Adam Litke wrote: On Fri, Jul 06, 2012 at 03:53:08PM +0300, Itamar Heim wrote: On 07/06/2012 01:15 AM, Robert Middleswarth wrote: On 07/05/2012 04:45 PM, Adam Litke wrote: On Thu, Jul 05, 2012 at 03:47:42PM -0400, Saggi Mizrahi wrote: - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Anthony Liguori anth...@codemonkey.ws, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Thursday, July 5, 2012 2:34:50 PM Subject: Re: [RFC] An alternative way to provide a supported interface -- libvdsm On Wed, Jun 27, 2012 at 02:50:02PM -0400, Saggi Mizrahi wrote: The idea of having a supported C API was something I was thinking about doing (But I'd rather use gobject introspection and not schema generation) But the problem is not having a C API is using the current XML RPC API as it's base I want to disect this a bit to find out exactly where there might be agreement and disagreement. C API is a good thing to implement - Agreed. I also want to use gobject introspection but I don't agree that using glib precludes the use of a formalized schema. My proposal is that we write a schema definition and generate the glib C code from that schema. I agree that the _current_ xmlrpc API makes a pretty bad base from which to start a supportable API. XMLRPC is a perfectly reasonable remote/wire protocol and I think we should continue using it as a base for the next generation API. Using a schema will ensure that the new API is well-structured. There major problems with XML-RPC (and to some extent with REST as well) are high call overhead and no two way communication (push events). Basing on XML-RPC means that we will never be able to solve these issues. I am not sure I am ready to conceed that XML-RPC is too slow for our needs. Can you provide some more detail around this point and possibly suggest an alternative that has even lower overhead without sacrificing the ubiquity and usability of XML-RPC? As far as the two-way communication point, what are the options besides AMQP/ZeroMQ? Aren't these even worse from an overhead perspective than XML-RPC? Regarding two-way communication: you can write AMQP brokers based on the C API and run one on each vdsm host. Assuming the C API supports events, what else would you need? I personally think that using something like AMQP for inter-node communication and engine - node would be optimal. With a rest interface that just send messages though something like AMQP. I would also not dismiss AMQP so soon we want a bug with more than a single listener at engine side (engine, history db, maybe event correlation service). collectd as a means for statistics already supports it as well. I'm for having REST as well, but not sure as main one for a consumer like ovirt engine. I agree that a message bus could be a very useful model of communication between ovirt-engine components and multiple vdsm instances. 
But the complexities and dependencies of AMQP do not make it suitable for use as a low-level API. AMQP will repel new adopters. Why not establish a libvdsm that is more minimalist and can be easily used by everyone? Then AMQP brokers can be built on top of the stable API with ease. All AMQP should require of the low-level API are standard function calls and an events mechanism. Thanks Robert The current XML-RPC API contains a lot of decencies and inefficiencies and we would like to retire it as soon as we possibly can. Engine would like us to move to a message based API and 3rd parties want something simple like REST so it looks like no one actually wants to use XML-RPC. Not even us. I am proposing that AMQP brokers and REST APIs could be written against the public API. In fact, they need not even live in the vdsm tree anymore if that is what we choose. Core vdsm would only be responsible for providing libvdsm and whatever language bindings we want to support. If we take the libvdsm route, the only reason to even have a REST bridge is only to support OSes other then Linux which is something I'm not sure we care about at the moment. That might be true regarding the current in-tree
Re: [vdsm] [RFC] An alternative way to provide a supported interface -- libvdsm
I don't think AMQP is a good low level supported protocol as it's a very complex protocol to set up and support. Also brokers are known to have their differences in standard implementation which means supporting them all is a mess. It looks like the most accepted route is the libvirt route of having a c library abstracting away client server communication and having more advanced consumers build protocol specific bridges that may have different support standards. On a more personal note, I think brokerless messaging is the way to go in ovirt because, unlike traditional clustering, worker nodes are not interchangeable so direct communication is the way to go, rendering brokers pretty much useless. - Original Message - From: Adam Litke a...@us.ibm.com To: Itamar Heim ih...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org Sent: Monday, July 9, 2012 9:56:17 AM Subject: Re: [vdsm] [RFC] An alternative way to provide a supported interface -- libvdsm On Fri, Jul 06, 2012 at 03:53:08PM +0300, Itamar Heim wrote: On 07/06/2012 01:15 AM, Robert Middleswarth wrote: On 07/05/2012 04:45 PM, Adam Litke wrote: On Thu, Jul 05, 2012 at 03:47:42PM -0400, Saggi Mizrahi wrote: - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Anthony Liguori anth...@codemonkey.ws, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Thursday, July 5, 2012 2:34:50 PM Subject: Re: [RFC] An alternative way to provide a supported interface -- libvdsm On Wed, Jun 27, 2012 at 02:50:02PM -0400, Saggi Mizrahi wrote: The idea of having a supported C API was something I was thinking about doing (But I'd rather use gobject introspection and not schema generation) But the problem is not having a C API is using the current XML RPC API as it's base I want to disect this a bit to find out exactly where there might be agreement and disagreement. C API is a good thing to implement - Agreed. I also want to use gobject introspection but I don't agree that using glib precludes the use of a formalized schema. My proposal is that we write a schema definition and generate the glib C code from that schema. I agree that the _current_ xmlrpc API makes a pretty bad base from which to start a supportable API. XMLRPC is a perfectly reasonable remote/wire protocol and I think we should continue using it as a base for the next generation API. Using a schema will ensure that the new API is well-structured. There major problems with XML-RPC (and to some extent with REST as well) are high call overhead and no two way communication (push events). Basing on XML-RPC means that we will never be able to solve these issues. I am not sure I am ready to conceed that XML-RPC is too slow for our needs. Can you provide some more detail around this point and possibly suggest an alternative that has even lower overhead without sacrificing the ubiquity and usability of XML-RPC? As far as the two-way communication point, what are the options besides AMQP/ZeroMQ? Aren't these even worse from an overhead perspective than XML-RPC? Regarding two-way communication: you can write AMQP brokers based on the C API and run one on each vdsm host. Assuming the C API supports events, what else would you need? I personally think that using something like AMQP for inter-node communication and engine - node would be optimal. With a rest interface that just send messages though something like AMQP. 
I would also not dismiss AMQP so soon we want a bug with more than a single listener at engine side (engine, history db, maybe event correlation service). collectd as a means for statistics already supports it as well. I'm for having REST as well, but not sure as main one for a consumer like ovirt engine. I agree that a message bus could be a very useful model of communication between ovirt-engine components and multiple vdsm instances. But the complexities and dependencies of AMQP do not make it suitable for use as a low-level API. AMQP will repel new adopters. Why not establish a libvdsm that is more minimalist and can be easily used by everyone? Then AMQP brokers can be built on top of the stable API with ease. All AMQP should require of the low-level API are standard function calls and an events mechanism. Thanks Robert The current XML-RPC API contains a lot of decencies and inefficiencies and we would like to retire it as soon as we possibly can. Engine would like us to move to a message based API and 3rd parties want something simple like REST so it looks like no one actually wants to use XML-RPC. Not even us. I am proposing that AMQP brokers and REST APIs could be written against the public API. In fact, they need not even live in the vdsm tree anymore if that is what we choose
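To picture the out-of-tree bridge Adam describes, built on top of the proposed library: everything below is hypothetical, including the libvdsm module, its call names and its event API, since no such binding exists yet.

    # Hedged sketch of an out-of-tree event bridge built on a hypothetical
    # libvdsm Python binding: subscribe to host events and republish them on
    # whatever message bus the deployment chose. Every name here is invented.
    import json

    import libvdsm          # hypothetical binding generated from the schema

    def run_bridge(publish):
        """publish(topic, payload) is supplied by the bus client in use."""
        conn = libvdsm.connect("localhost")            # hypothetical call

        def on_event(name, params):
            publish("vdsm.events." + name, json.dumps(params))

        conn.subscribe_events(on_event)                # hypothetical call
        conn.run_forever()                             # hypothetical event loop

The point is only that the broker (or a REST frontend) consumes the same stable surface as every other client, so it can live, version and ship independently of core VDSM.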
Re: [vdsm] [RFC] An alternative way to provide a supported interface -- libvdsm
- Original Message - From: Itamar Heim ih...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: Adam Litke a...@us.ibm.com, vdsm-devel@lists.fedorahosted.org Sent: Monday, July 9, 2012 11:03:43 AM Subject: Re: [vdsm] [RFC] An alternative way to provide a supported interface -- libvdsm On 07/09/2012 05:56 PM, Saggi Mizrahi wrote: I don't think AMQP is a good low level supported protocol as it's a very complex protocol to set up and support. Also brokers are known to have their differences in standard implementation which means supporting them all is a mess. It looks like the most accepted route is the libvirt route of having a c library abstracting away client server communication and having more advanced consumers build protocol specific bridges that may have different support standards. On a more personal note, I think brokerless messaging is the way to go in ovirt because, unlike traditional clustering, worker nodes are not interchangeable so direct communication is the way to go, rendering brokers pretty much useless. but doesn't brokerless messaging lose the multiple consumers which a bus provides? All consumers can connect to the host and *some* events can be broadcast to all connected clients. The real question is whether you want to depend on AMQP's routing \ message storing. Also, you may find it preferable to have a centralized host (single point of failure) to get all events from all hosts, for the price of some clients (I assume read-only clients) not needing to know the locations of all worker nodes. But IMHO we already have something like that, it's called the ovirt-engine, and it could send aggregated events about the cluster (maybe with some extra enginy data). The question is what mandating a broker gives us that an AMQP bridge wouldn't. The only thing I can think of is that VDSM can assume unmoderated vdsm-to-vdsm communication bypassing the engine. This means that VDSM can have some clustered behavior that requires no engine intervention. Furthermore, the engine can send a request and let the nodes decide among themselves who performs the operation. Essentially:

[ engine ]          [ engine ]
  |      |    VS        |
[vdsm] [vdsm]       [ broker ]
                      |      |
                   [vdsm] [vdsm]

*All links are two way links

This has dire consequences on API usability and supportability, so we need to converge on that. There needs to be a good reason why the aforementioned logic code can't sit on another ovirt-specific entity (let's call it ovirt-dynamo) that uses VDSM's supported API but whose own APIs (or more likely messaging algorithms) are unsupported.
[ engine ]
     |
[ broker ]
  |       |
[vdsm]-[dynamo] : [dynamo]-[vdsm]
     Host A     :     Host B

*All links are two way links

- Original Message - From: Adam Litke a...@us.ibm.com To: Itamar Heim ih...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org Sent: Monday, July 9, 2012 9:56:17 AM Subject: Re: [vdsm] [RFC] An alternative way to provide a supported interface -- libvdsm On Fri, Jul 06, 2012 at 03:53:08PM +0300, Itamar Heim wrote: On 07/06/2012 01:15 AM, Robert Middleswarth wrote: On 07/05/2012 04:45 PM, Adam Litke wrote: On Thu, Jul 05, 2012 at 03:47:42PM -0400, Saggi Mizrahi wrote: - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Anthony Liguori anth...@codemonkey.ws, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Thursday, July 5, 2012 2:34:50 PM Subject: Re: [RFC] An alternative way to provide a supported interface -- libvdsm On Wed, Jun 27, 2012 at 02:50:02PM -0400, Saggi Mizrahi wrote: The idea of having a supported C API was something I was thinking about doing (But I'd rather use gobject introspection and not schema generation) But the problem is not having a C API is using the current XML RPC API as it's base I want to disect this a bit to find out exactly where there might be agreement and disagreement. C API is a good thing to implement - Agreed. I also want to use gobject introspection but I don't agree that using glib precludes the use of a formalized schema. My proposal is that we write a schema definition and generate the glib C code from that schema. I agree that the _current_ xmlrpc API makes a pretty bad base from which to start a supportable API. XMLRPC is a perfectly reasonable remote/wire protocol and I think we should continue using it as a base for the next generation API. Using a schema will ensure that the new API is well-structured. There major problems with XML-RPC (and to some extent with REST as well) are high call overhead and no two way communication (push
Re: [vdsm] [RFC] An alternative way to provide a supported interface -- libvdsm
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Anthony Liguori anth...@codemonkey.ws, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Thursday, July 5, 2012 4:45:08 PM Subject: Re: [RFC] An alternative way to provide a supported interface -- libvdsm On Thu, Jul 05, 2012 at 03:47:42PM -0400, Saggi Mizrahi wrote: - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Anthony Liguori anth...@codemonkey.ws, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Thursday, July 5, 2012 2:34:50 PM Subject: Re: [RFC] An alternative way to provide a supported interface -- libvdsm On Wed, Jun 27, 2012 at 02:50:02PM -0400, Saggi Mizrahi wrote: The idea of having a supported C API was something I was thinking about doing (But I'd rather use gobject introspection and not schema generation) But the problem is not having a C API is using the current XML RPC API as it's base I want to disect this a bit to find out exactly where there might be agreement and disagreement. C API is a good thing to implement - Agreed. I also want to use gobject introspection but I don't agree that using glib precludes the use of a formalized schema. My proposal is that we write a schema definition and generate the glib C code from that schema. I agree that the _current_ xmlrpc API makes a pretty bad base from which to start a supportable API. XMLRPC is a perfectly reasonable remote/wire protocol and I think we should continue using it as a base for the next generation API. Using a schema will ensure that the new API is well-structured. There major problems with XML-RPC (and to some extent with REST as well) are high call overhead and no two way communication (push events). Basing on XML-RPC means that we will never be able to solve these issues. I am not sure I am ready to conceed that XML-RPC is too slow for our needs. Can you provide some more detail around this point and possibly suggest an alternative that has even lower overhead without sacrificing the ubiquity and usability of XML-RPC? As far as the two-way communication point, what are the options besides AMQP/ZeroMQ? Aren't these even worse from an overhead perspective than XML-RPC? Regarding two-way communication: you can write AMQP brokers based on the C API and run one on each vdsm host. Assuming the C API supports events, what else would you need? If we plan to go with the libvdsm route the only transports I think are appropriate are either raw sockets (like libvirt) or ZMQ (just to take advantage of it managing connection and message encapsulation but it might be an overkill). Other then that ZMQ\AMQP\REST\XML-RPC bridges are not really a priority for me as engine will not be using any of the bridges. The current XML-RPC API contains a lot of decencies and inefficiencies and we would like to retire it as soon as we possibly can. Engine would like us to move to a message based API and 3rd parties want something simple like REST so it looks like no one actually wants to use XML-RPC. Not even us. I am proposing that AMQP brokers and REST APIs could be written against the public API. In fact, they need not even live in the vdsm tree anymore if that is what we choose. Core vdsm would only be responsible for providing libvdsm and whatever language bindings we want to support. If we take the libvdsm route, the only reason to even have a REST bridge is only to support OSes other then Linux which is something I'm not sure we care about at the moment. 
That might be true regarding the current in-tree implementation. However, I can almost guarantee that someone wanting to write a web GUI on top of standalone vdsm would want a REST API to talk to. But libvdsm makes this use case of no concern to the core vdsm developers. I do think that having C supportability in our API is a good idea, but the current API should not be used as the base. Let's _start_ with a schema document that describes today's API and then clean it up. I think that will work better than starting from scratch. Once my schema is written I will post it and we can 'patch' it as a community until we arrive at a 1.0 version we are all happy with. +1 Ok. Redoubling my efforts to get this done. Describing the output of list(True) takes awhile :) -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
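[Editorial note] To make the schema idea above concrete, here is a rough sketch of what a single entry in such a schema document might look like, loosely modeled on QEMU's QAPI style. The verb name, field names and the '*'-marks-optional convention are illustrative assumptions, not the actual schema Adam is writing.

# One hypothetical schema entry for an existing verb, expressed as plain data
VM_CREATE = {
    'command': 'VM.create',
    'data': {
        'vmId': 'UUID',            # required
        'memSize': 'uint',         # in MB, required
        '*display': 'str',         # '*' marks an optional parameter
        '*drives': ['DriveSpec'],  # list of a complex type defined elsewhere
    },
    'returns': 'VmDefinition',
    'since': '4.10.0',
}

A generator could then emit the XML-RPC marshalling code, the documentation and the language bindings from entries like this one.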
Re: [vdsm] [RFC] An alternative way to provide a supported interface -- libvdsm
The idea of having a supported C API was something I was thinking about doing (But I'd rather use gobject introspection and not schema generation) But the problem is not having a C API is using the current XML RPC API as it's base The current XML-RPC API contains a lot of decencies and inefficiencies and we would like to retire it as soon as we possibly can. Engine would like us to move to a message based API and 3rd parties want something simple like REST so it looks like no one actually wants to use XML-RPC. Not even us. I do think that having C supportability in our API is a good idea, but the current API should not be used as the base. - Original Message - From: Anthony Liguori anth...@codemonkey.ws To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Cc: Adam Litke a...@us.ibm.com, Saggi Mizrahi smizr...@redhat.com Sent: Monday, June 25, 2012 10:18:33 AM Subject: [RFC] An alternative way to provide a supported interface -- libvdsm Hi, I've been reading through the API threads here and considering the options. To be honest, I worry a lot about the scope of these discussions and that there's a tremendous amount of work before we have a useful end result. I wonder if we can solve this problem by adding another layer of abstraction... As Adam is currently building a schema for VDSM's XML-RPC, we could use the QAPI code generators to build a libvdsm that provided a programmatic C interface for the XML-RPC interface. It would take some tweaking, but this could be made a supportable C interface. The rules for having a supportable C interface are basically: 1) Never change function signatures 2) Never remove functions 3) Always allocate structures in the library and/or pad 4) Only add to structures, never remove or reorder 5) Provide flags that default to zero to indicate that fields/features are not present. 6) Always zero-initialize structures Having a libvdsm would allow the transport to change over time w/o affecting end-users. There are lots of good tools for documenting C APIs and dealing with versioning of C APIs. While we can start out with a schema-generated API, over time, we can implement libvdsm in an open-coded fashion allowing old APIs to be reimplemented in terms of new APIs. From a compatibility perspective, libvdsm would be fully backwards compatible with old versions of VDSM (so it would keep XML-RPC support forever) but may require new versions of libvdsm to talk to new versions of VDSM. That would allow for APIs to be deprecated within VDSM without breaking old clients. I think this would be an incremental approach to building a supportable API today while still giving the flexibility to make changes in the long term. And it should be fairly easy to generate a JNI binding and also port ovirt-engine to use an interface like this (since it already uses the XML-RPC API). Regards, Anthony Liguori ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
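[Editorial note] For illustration, this is roughly how a client could consume such a libvdsm from Python via ctypes once it exists. The library soname and every function below are hypothetical; the point is only that the stable-ABI rules above (fixed signatures, functions never removed, flags defaulting to zero) keep a binding like this valid across releases.

import ctypes

_lib = ctypes.CDLL("libvdsm.so.0")   # hypothetical soname

# Rules 1 and 2: this prototype is never changed or removed, so the binding
# keeps working against newer versions of the library.
_lib.vdsm_connect.argtypes = [ctypes.c_char_p, ctypes.c_uint32]   # uri, flags
_lib.vdsm_connect.restype = ctypes.c_void_p

def connect(uri, flags=0):
    # Rule 5: flags default to zero, meaning "no optional features requested".
    handle = _lib.vdsm_connect(uri.encode("utf-8"), flags)
    if not handle:
        raise RuntimeError("vdsm_connect() failed")
    return handle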
Re: [vdsm] [virt-node] RFC: API Supportability
I tired to sum everything in the wiki page [1] Please review the page and see if there is something I missed or that you don't agree with. - Original Message - From: Adam Litke a...@us.ibm.com To: Dan Kenigsberg dan...@redhat.com Cc: Saggi Mizrahi smizr...@redhat.com, VDSM Project Development vdsm-devel@lists.fedorahosted.org, Daniel Veillard veill...@redhat.com, Anthony Liguori aligu...@redhat.com Sent: Thursday, June 21, 2012 10:41:36 AM Subject: Re: [vdsm] [virt-node] RFC: API Supportability On Thu, Jun 21, 2012 at 01:20:40PM +0300, Dan Kenigsberg wrote: On Wed, Jun 20, 2012 at 10:42:16AM -0500, Adam Litke wrote: On Tue, Jun 19, 2012 at 10:17:28AM -0400, Saggi Mizrahi wrote: I've opened a wiki page [1] for the stable API and extracted some of the TODO points so we don't forget. Everyone can feel free to add more stuff. [1] http://ovirt.org/wiki/VDSM_Stable_API_Plan Rest of the comments inline - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Barak Azulay bazu...@redhat.com, Itamar Heim ih...@redhat.com, Ayal Baron aba...@redhat.com, Anthony Liguori aligu...@redhat.com Sent: Monday, June 18, 2012 12:23:10 PM Subject: Re: [virt-node] RFC: API Supportability On Mon, Jun 18, 2012 at 11:02:25AM -0400, Saggi Mizrahi wrote: The first thing we need to decide is API supportabiliy. I'll list the questions that need to be answered. The decision made here will have great effect on transport selection (espscially API change process and versioning) so try and think about this without going to specfic technicalities (eg. X can't be done on REST). Thanks for sending this out. I will take a crack at these questions... I would like to pose an additional question to be answered: - Should API parameter and return value constraints be formally defined? If so, how? Think of this as defining an API schema. For example: When creating a VM, which parameters are required/optional? What are the valid formats for specifying a VM disk? What are all of the possible task states? Has to be part of response to the call that retrieves the state. This will allow us to change the states in a BC manner. I am not sure I agree. I think it should be a part of the schema but not transmitted along with each API response involving a task. This would increase traffic and make responses unnecessarily verbose. Is there a maximum length for the storage domain description? I totally agree, how depends on the transport of choice but in any case I think the definition should be done in a declarative manner (XML\JSON) using concrete types (important for binding with C\Java) and have some *code to enforce* that the input is correct. This will prevent clients from not adhering to the schema exploiting python's relative lax approach to types. We already had issues with the engine wrongly sending numbers as strings and having this break internally because of some change in the python code made it not handle the conversion very well. Our schema should fully define a set of simple types and complex types. Each defined simple type will have an internal validation function to verify conformity of a given input. Complex types consist of nested lists and dicts of simple types. They are validated first by validating members as simple types and then checking for missing and/or extra data. When designing a dependable API, we should not desert our agility. 
ovirt-Engine has enjoyed the possibility of saying hey, we want another field reported in getVdsStats and presto, here it was. Complex types should be easily extendible (with a proper update of the API minor version, or a capabilities set). +1 -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
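[Editorial note] A minimal sketch of the validation approach described in this thread (simple types with validation functions, complex types checked for missing and extra members). All names are illustrative; list-valued members are left out to keep it short.

class SchemaError(Exception):
    pass

SIMPLE_TYPES = {
    'UUID': lambda v: isinstance(v, str) and len(v) == 36,
    'uint': lambda v: isinstance(v, int) and v >= 0,
    'str': lambda v: isinstance(v, str),
}

def validate(value, typedef):
    if isinstance(typedef, str):                # simple type: run its checker
        if not SIMPLE_TYPES[typedef](value):
            raise SchemaError('%r is not a valid %s' % (value, typedef))
        return
    # complex type: a dict of members, '*' prefix marks optional members
    required = set(k for k in typedef if not k.startswith('*'))
    known = set(k.lstrip('*') for k in typedef)
    missing = required - set(value)
    extra = set(value) - known
    if missing or extra:
        raise SchemaError('missing %s, unexpected %s' % (missing, extra))
    for key, val in value.items():
        member = typedef.get(key, typedef.get('*' + key))
        validate(val, member)

Enforcing this on entry would catch things like numbers sent as strings before they reach the internal code.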
Re: [vdsm] [virt-node] RFC: API Supportability
I've opened a wiki page [1] for the stable API and extracted some of the TODO points so we don't forget. Everyone can feel free to add more stuff. [1] http://ovirt.org/wiki/VDSM_Stable_API_Plan Rest of the comments inline - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Barak Azulay bazu...@redhat.com, Itamar Heim ih...@redhat.com, Ayal Baron aba...@redhat.com, Anthony Liguori aligu...@redhat.com Sent: Monday, June 18, 2012 12:23:10 PM Subject: Re: [virt-node] RFC: API Supportability On Mon, Jun 18, 2012 at 11:02:25AM -0400, Saggi Mizrahi wrote: The first thing we need to decide is API supportabiliy. I'll list the questions that need to be answered. The decision made here will have great effect on transport selection (espscially API change process and versioning) so try and think about this without going to specfic technicalities (eg. X can't be done on REST). Thanks for sending this out. I will take a crack at these questions... I would like to pose an additional question to be answered: - Should API parameter and return value constraints be formally defined? If so, how? Think of this as defining an API schema. For example: When creating a VM, which parameters are required/optional? What are the valid formats for specifying a VM disk? What are all of the possible task states? Has to be part of response to the call that retrieves the state. This will allow us to change the states in a BC manner. Is there a maximum length for the storage domain description? I totally agree, how depends on the transport of choice but in any case I think the definition should be done in a declarative manner (XML\JSON) using concrete types (important for binding with C\Java) and have some *code to enforce* that the input is correct. This will prevent clients from not adhering to the schema exploiting python's relative lax approach to types. We already had issues with the engine wrongly sending numbers as strings and having this break internally because of some change in the python code made it not handle the conversion very well. New API acceptance process == - What is the process to suggest new API calls? New APIs should be proposed on a mailing list. This allows everyone to participate in the conversation and preserves the comments. If the API is simple, a patch the provides a concrete example of implementation is recommended. Once the API design is agreed upon, patches can be submitted via the standard method (gerrit) to implement a new experimental API based on the design. The submitter of the patches should reply to the design discussion thread to notify participants of the available code. +1 - Who can ack such a change? API changes should be subject to wider approval than a simple change to an internal component. I believe that the +1,-1 system works well here and we should seek approvals from all participants in the design discussion if possible. I will add that you need a +1 from at least 2 maintainers for an API change. Also someone has to test that the change did not break old clients. - Does someone have veto rights? Anyone can NACK an API design. Same rules as for normal code. +1 - Are there experimental APIs? Yes! Dave Allan has mentioned that from his experience in libvirt, it would be very nice to have experimental APIs that can be improved before being baked into the supportable API. I definitely agree. In fact, all new APIs should go through a period of being experimental. 
Experimental API functions should begin with '_'. Once deemed stable, the '_' can be removed. I don't like this specific mangling scheme but I do agree that we need experimental calls. I also think that you need a mechanism to turn them on similar to `import __future__` in python so that you make sure API user knows it's experimental. API deprecation process === - Do we allow deprecation? I would like to allow deprecation because it grants us an avenue to clean up the API from time to time. That being said, I am not aware of a clean way to do this without breaking old clients too badly. At a minimum, an API would need to be deprecated for at least 2 years before it can be removed. How will this decision influence the initial API design? Are there features that we can build into an API that can ease the burden of deprecation on API consumers? Deprecation is tricky. We also need a mechanism for a client to know that his version of the API no longer exists so it can check for that at host connection and fail if the client is to old. To do that we could either have API group versions and expose which versions are supported in full. We could also take the opengl route of querying for call. But this might
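[Editorial note] A sketch of one possible opt-in mechanism for experimental verbs, analogous to the `from __future__ import ...` idea mentioned above. The decorator, the registry and the verb name are hypothetical.

EXPERIMENTAL_OPT_IN = set()   # features the client explicitly enabled

class ExperimentalAPIError(Exception):
    pass

def experimental(name):
    def wrap(func):
        def wrapper(*args, **kwargs):
            if name not in EXPERIMENTAL_OPT_IN:
                raise ExperimentalAPIError(
                    '%s is experimental; enable it explicitly first' % name)
            return func(*args, **kwargs)
        return wrapper
    return wrap

@experimental('repoEngines')
def getRepoEngineCapabilities():
    pass   # body omitted; only the opt-in gate matters here

This makes the client state, in code, that it knows the call may change or disappear.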
Re: [vdsm] [virt-node] VDSM as a general purpose virt host manager
- Original Message - From: Ryan Harper ry...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Ryan Harper ry...@us.ibm.com, VDSM Project Development vdsm-devel@lists.fedorahosted.org, Anthony Liguori aligu...@redhat.com Sent: Tuesday, June 19, 2012 9:30:08 AM Subject: Re: [vdsm] [virt-node] VDSM as a general purpose virt host manager * Saggi Mizrahi smizr...@redhat.com [2012-06-18 16:09]: Ryan, thanks for commenting. Sadly I feel that your points, though important, are a bit of a digression from the main discussion. Internal architectural changes to VDSM are out of the scope as this should be done on a very tight schedual. I don't think I was suggesting internal architectural changes. I may not yet be familiar enough with to code to understand that modifying the exist API will result in architectural changes. I do worry about what we expect to accomplish here if we have a tight schedule and also include the idea of general purpose virt host manager. Maybe your opening was too wide for the specific purpose you were intending (your numbered list). If you're strictly focused on something around Fedora18 timeline wise, I would agree that there isn't much runway to make big changes. With that in mind, I'd say we need to add a topic to your list: 5. API versioning and deprecation This is part of the supportability discussion. Please join in if you have something to add. The supportability email was sent to the list as well. I believe you've got a number of questions in this space on your other thread so I'll move over there. This is going to be a critical dicussion on how we move forward. Seeing as this is a pretty good list of things that need to be done\discussed in VDSM anyway. I took the liberty of putting them in a wiki page [1] so we don't forget and others can add\comment on the ideas. Thanks. In any case you can feel free to raise those issues on the list separately. Specifically, 3rd party plugins might be very topical with the undergoing gluster integration work. [1] http://www.ovirt.org/wiki/VDSM_Potential_Features - Original Message - From: Ryan Harper ry...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Anthony Liguori aligu...@redhat.com Sent: Monday, June 18, 2012 3:43:42 PM Subject: Re: [vdsm] [virt-node] VDSM as a general purpose virt host manager * Saggi Mizrahi smizr...@redhat.com [2012-06-18 10:05]: I would like to put on to the table for descussion the growing need for a way to more easily reuse of the functionality of VDSM in order to service projects other than Ovirt-Engine. Originally VDSM was created as a proprietary agent for the sole purpose of serving the then proprietary version of what is known as ovirt-engine. Red Hat, after acquiring the technology, pressed on with it's commitment to open source ideals and released the code. But just releasing code into the wild doesn't build a community or makes a project successful. Further more when building open source software you should aspire to build reusable components instead of monolithic stacks. Saggi, Thanks for sending this out. I've been trying to pull together some thoughts on what else is needed for vdsm as a community. I know that for some time downstream has been the driving force for all of the work and now with a community there are challenges in finding our own way. 
While we certainly don't want to make downstream efforts harder, I think we need to develop and support our own vision for what vdsm can be come, some what independent of downstream and other exploiters. Revisiting the API is definitely a much needed endeavor and I think adding some use-cases or sample applications would be useful in demonstrating whether or not we're evolving the API into something easier to use for applications beyond engine. We would like to expose a stable, documented, well supported API. This gives us a chance to rethink the VDSM API from the ground up. There is already work in progress of making the internal logic of VDSM separate enough from the API layer so we could continue feature development and bug fixing while designing the API of the future. In order to achieve this though we need to do several things: 1. Declare API supportability guidelines 2. Decide on an API transport (e.g. REST, ZMQ, AMQP) 3. Make the API easily consumable (e.g. proper docs, example code, extending the API, etc) 4. Implement the API itself I agree with the list, but I'd like to work on the redesign discussion so that we're not doing all
[vdsm] My VDSM development workflow
People have been asking me about my code\test cycle, so I decided to post a small writeup on the list. Always test on the latest stable Fedora and RHEL; put yum upgrade -y as a nightly cron command! My development storage is a FreeNAS VM (but I will be moving to f17+lio when I find the time to configure everything and they implement CHAP auth). For things that have to be tested with a full blown VDSM install I have a script* that pulls from my git HEAD and installs it on the host. I don't use rsync because the timestamps confuse Make and might cause my local fedora files to be packaged by mistake. I also delete the RPMs and clean install new ones each time. It takes longer and invokes a libvirt reconfigure each time, but it catches a lot of elusive errors that QE often misses. Always use yum! The rpm command is much less robust. This script is meant to work on an EL\Fedora host that just has git and multipath installed, and a cloned repo in which your development machine is the origin. Also note I explicitly clean /usr/share/vdsm because locally changed files don't have the same MD5 hash and might not be removed\replaced by yum. It takes quite a while but it just makes writing unit tests that much more appealing :) Make sure you have sudo rights to the appropriate commands. I hope people find this helpful.

---
#!/bin/bash

# Get git root
PROJ_GIT_DIR=$(git rev-parse --git-dir | xargs dirname | xargs readlink -f)
pushd $PROJ_GIT_DIR

# Make sure autotools and other basic dependencies are installed
sudo yum install -y automake autoconf gcc rpm-build pyflakes

# Fetch remote head
git co HEAD~ > /dev/null
git fetch -f origin HEAD:testhead
git co testhead

# Remove old RPMs
rm -rvf ~/rpmbuild

# Build
./autogen.sh --system
./configure

# Install build requirements
grep BuildRequires vdsm.spec | awk '{print $2}' | \
    xargs sudo yum install -y
make clean
make rpm || exit 1

# Stop VDSM
sudo /sbin/service vdsmd stop

# Clean RPMs
rpm -qa | grep vdsm | xargs sudo yum remove -y

# Clean any local edits
sudo rm -rf /usr/share/vdsm

# Install new RPMs
ls ~/rpmbuild/RPMS/*/*.rpm | grep -v faqemu | grep -v hook | grep -v reg | \
    grep -v bootstrap | xargs sudo yum localinstall --nogpgcheck -y
popd

# Start VDSM
sudo /sbin/service vdsmd start

# VDSM has a long-standing issue with reporting OK on start when it's actually
# down
service vdsmd status
___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] [virt-node] VDSM as a general purpose virt host manager
- Original Message - From: Deepak C Shetty deepa...@linux.vnet.ibm.com To: Ryan Harper ry...@us.ibm.com Cc: Saggi Mizrahi smizr...@redhat.com, Anthony Liguori aligu...@redhat.com, VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, June 19, 2012 10:58:47 AM Subject: Re: [vdsm] [virt-node] VDSM as a general purpose virt host manager On 06/19/2012 01:13 AM, Ryan Harper wrote: * Saggi Mizrahismizr...@redhat.com [2012-06-18 10:05]: I would like to put on to the table for descussion the growing need for a way to more easily reuse of the functionality of VDSM in order to service projects other than Ovirt-Engine. Originally VDSM was created as a proprietary agent for the sole purpose of serving the then proprietary version of what is known as ovirt-engine. Red Hat, after acquiring the technology, pressed on with it's commitment to open source ideals and released the code. But just releasing code into the wild doesn't build a community or makes a project successful. Further more when building open source software you should aspire to build reusable components instead of monolithic stacks. Saggi, Thanks for sending this out. I've been trying to pull together some thoughts on what else is needed for vdsm as a community. I know that for some time downstream has been the driving force for all of the work and now with a community there are challenges in finding our own way. While we certainly don't want to make downstream efforts harder, I think we need to develop and support our own vision for what vdsm can be come, some what independent of downstream and other exploiters. Revisiting the API is definitely a much needed endeavor and I think adding some use-cases or sample applications would be useful in demonstrating whether or not we're evolving the API into something easier to use for applications beyond engine. We would like to expose a stable, documented, well supported API. This gives us a chance to rethink the VDSM API from the ground up. There is already work in progress of making the internal logic of VDSM separate enough from the API layer so we could continue feature development and bug fixing while designing the API of the future. In order to achieve this though we need to do several things: 1. Declare API supportability guidelines 2. Decide on an API transport (e.g. REST, ZMQ, AMQP) 3. Make the API easily consumable (e.g. proper docs, example code, extending the API, etc) 4. Implement the API itself I agree with the list, but I'd like to work on the redesign discussion so that we're not doing all of 1-4 around the existing API that's engine-focused. I'm over due for posting a feature page on vdsm standalone mode, and I have some other thoughts on various uses. Some other paths of thought for use-cases I've been mulling over: - Simplifying using QEMU/KVM - consuming qemu via command line - can we manage/support developers launching qemu directly - consuming qemu via libvirt - can we integrate with systems that are already using libvirt - Addressing issues with libvirt - are there kvm specific features we can exploit that libvirt doesn't? - Scale-up/fail-over - can we support a single vdsm node, but allow for building up clusters/groups without bringing in something like ovirt-engine - can we look at decentralized fail-over for reliability without a central mgmt server? - pluggability - can we support an API that allows for third-party plugins to support new features or changes in implementation? Pluggability feature would be nice. 
Even nicer would be the ability to introspect and figure whats supported by VDSM. For eg: It would be nice to query what plugins/capabilities are supported and accordingly the client can take a decision and/or call the appropriate APIs w/o worrying about ENOTSUPP kind of error. It does becomes blur when we talk about Repository Engines... that was also targetted to provide pluggaibility in managing Images.. how will that co-exist with API level pluggability ? IIUC, StorageProvisioning (via libstoragemgmt) can be one such optional support that can fit as a plug-in nicely, right ? You will have have an introspective verb to get supported storage engines. Without the engine the hosts will not be able to log in to an image repo but it will not be an API level error. You will get UnsupportedRepoFormatError or something similar no matter which version of VDSM you use. The error is part of the interface and engines will expose their format and parameter in some way. - kvm tool integration
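[Editorial note] A small sketch of what that introspection could look like from a client's point of view: ask the host what it supports up front instead of probing for UnsupportedRepoFormatError-style failures later. The getCapabilities verb and the 'repoEngines' key are assumptions made up for the example.

def ensure_repo_engine(conn, engine):
    caps = conn.getCapabilities()              # hypothetical introspection verb
    engines = caps.get('repoEngines', [])
    if engine not in engines:
        raise RuntimeError('host does not support repo engine %r, only %r'
                           % (engine, engines))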
[vdsm] [virt-node] RFC: API Supportability
The first thing we need to decide is API supportability. I'll list the questions that need to be answered. The decision made here will have a great effect on transport selection (especially the API change process and versioning), so try and think about this without going into specific technicalities (e.g. X can't be done on REST).

New API acceptance process
==
- What is the process to suggest new API calls?
- Who can ack such a change?
- Does someone have veto rights?
- Are there experimental APIs?

API deprecation process
===
- Do we allow deprecation?
- When can an API call be deprecated?
- Who can ack such a change?
- Does someone have veto rights?

API change process
==
- Can calls be modified, or can no symbol ever repeat in a different form?
- When can an API call be deprecated?
- Who can ack such a change?
- Does someone have veto rights?

API versioning
==
- Is the API versioned as a whole, is it per subsystem (storage, networking, etc.), or is each call versioned by itself?
- What happens when an old client connects to a newer server?
- What happens when a new client connects to an older server?
- How will versioning be expressed in the bindings?
- Do we restrict newer clients from using old APIs when talking with a new server?
___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
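[Editorial note] One way the versioning questions above could be answered in practice is a handshake: the server reports the API versions it speaks and the client refuses to proceed if there is no overlap. The sketch below is purely illustrative; no such verb exists today.

SERVER_SUPPORTED = {'min': (4, 9), 'max': (4, 10)}   # hypothetical server side

def negotiate(client_min, client_max):
    low = max(client_min, SERVER_SUPPORTED['min'])
    high = min(client_max, SERVER_SUPPORTED['max'])
    if low > high:
        raise RuntimeError(
            'no common API version: client %s-%s, server %s-%s'
            % (client_min, client_max,
               SERVER_SUPPORTED['min'], SERVER_SUPPORTED['max']))
    return high   # speak the newest version both sides understand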
[vdsm] [virt-node] VDSM as a general purpose virt host manager
I would like to put on to the table for descussion the growing need for a way to more easily reuse of the functionality of VDSM in order to service projects other than Ovirt-Engine. Originally VDSM was created as a proprietary agent for the sole purpose of serving the then proprietary version of what is known as ovirt-engine. Red Hat, after acquiring the technology, pressed on with it's commitment to open source ideals and released the code. But just releasing code into the wild doesn't build a community or makes a project successful. Further more when building open source software you should aspire to build reusable components instead of monolithic stacks. We would like to expose a stable, documented, well supported API. This gives us a chance to rethink the VDSM API from the ground up. There is already work in progress of making the internal logic of VDSM separate enough from the API layer so we could continue feature development and bug fixing while designing the API of the future. In order to achieve this though we need to do several things: 1. Declare API supportability guidelines 2. Decide on an API transport (e.g. REST, ZMQ, AMQP) 3. Make the API easily consumable (e.g. proper docs, example code, extending the API, etc) 4. Implement the API itself All of these are dependent on one another and the permutations are endless. This is why I think we should try and work on each one separately. All discussions will be done openly on the mailing list and until the final version comes out nothing is set in stone. If you think you have anything to contribute to this process, please do so either by commenting on the discussions or by sending code/docs/whatever patches. Once the API solidifies it will be quite difficult to change fundamental things, so speak now or forever hold your peace. Note that this is just an introductory email. There will be a quick follow up email to kick start the discussions. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] vdsm vs selinux
Do you have an AVC denial in the audit log? What does it say? (Please run sealert -a FILE and put the resolved text along with the original AVC denail) Are you using NFS\localfs\SAN? What are the credentials and contexts of the files in question? Have you recently turned selinux on\off? Did you upgrade the OS or selinux policy? What is the libvirt version? - Original Message - From: Laszlo Hornyak lhorn...@redhat.com To: vdsm-devel@lists.fedorahosted.org Sent: Monday, June 18, 2012 11:13:37 AM Subject: [vdsm] vdsm vs selinux hi, I am running the latest VDSM (built from git repo) on rhel 6.2 and looks like it has some issues with selinux. setenforce 0 solves the problem, but is there a proper solution under way? Traceback (most recent call last): File /usr/share/vdsm/vm.py, line 570, in _startUnderlyingVm self._run() File /usr/share/vdsm/libvirtvm.py, line 1364, in _run self._connection.createXML(domxml, flags), File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, line 82, in wrapper ret = f(*args, **kwargs) File /usr/lib64/python2.6/site-packages/libvirt.py, line 2490, in createXML if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self) libvirtError: internal error Process exited while reading console log output: char device redirected to /dev/pts/2 qemu-kvm: -drive file=/rhev/data-center/8c369da4-b3a0-11e1-9db0-273609afe0b1/efef4a96-16b1-4f14-a252-f33c7a8ce52b/images/40d2cc3a-9e9c-4224-af6f-2450efc883ca/e84617c5-8073-46de-85bd-2497235a5ba2,if=none,id=drive-virtio-disk0,format=raw,serial=40d2cc3a-9e9c-4224-af6f-2450efc883ca,cache=none,werror=stop,rerror=stop,aio=threads: could not open disk image /rhev/data-center/8c369da4-b3a0-11e1-9db0-273609afe0b1/efef4a96-16b1-4f14-a252-f33c7a8ce52b/images/40d2cc3a-9e9c-4224-af6f-2450efc883ca/e84617c5-8073-46de-85bd-2497235a5ba2: Permission denied Thank you, Laszlo ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] [virt-node] VDSM as a general purpose virt host manager
To the question of What blocks us using the current VDSM API?. The main issue is supportability, This is also why it's the first point of discussion. The current API has no supportability guidelines and there is no way we could support it for the long run. Further more the current API, apart from being outdated is highly engine-specific. A lot of the decisions are related to VDSM having to slow down development to accommodate the slower pace of movement of the giant the is the ovirt-engine. This means, for example having confusing verbs and argument names (eg. destroy, and iscsi portals). Having redundant steps in the setup of things (eg. storage domain creation). Arbitrary limitations (eg. storage pools, iso\export domains), etc. In order to give a well supported API, we need to think about what we expose and how we expose it. Every verb should be thoroughly examined. This was not the case when the original API was created because, as I already noted, it was built to a case where the supported API is at the Engine level and not the VDSM level. This made creating\removing verbs a lot quicker and less expensive, accepting things we knew were not ideal with the knowledge we can change them the next version. This cannot be the case with a supported public API. - Original Message - From: Deepak C Shetty deepa...@linux.vnet.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Anthony Liguori aligu...@redhat.com Sent: Monday, June 18, 2012 1:35:21 PM Subject: Re: [vdsm] [virt-node] VDSM as a general purpose virt host manager On 06/18/2012 08:32 PM, Saggi Mizrahi wrote: I would like to put on to the table for descussion the growing need for a way to more easily reuse of the functionality of VDSM in order to service projects other than Ovirt-Engine. Originally VDSM was created as a proprietary agent for the sole purpose of serving the then proprietary version of what is known as ovirt-engine. Red Hat, after acquiring the technology, pressed on with it's commitment to open source ideals and released the code. But just releasing code into the wild doesn't build a community or makes a project successful. Further more when building open source software you should aspire to build reusable components instead of monolithic stacks. Can you list issues that block tools (other than ovirt-engine) in using VDSM ? That will help provide more clarity and scope of work described here. I understand the lack of REST API, which is where Adam's work comes in. With REST API support for vdsm, other tools can integrate upwardly with VDSM and exploit it. What else ? How does the current API layer design/implementation inhibit tools other than ovirt-engine to use VDSM ? We would like to expose a stable, documented, well supported API. This gives us a chance to rethink the VDSM API from the ground up. There is already work in progress of making the internal logic of VDSM separate enough from the API layer so we could continue feature development and bug fixing while designing the API of the future. In order to achieve this though we need to do several things: 1. Declare API supportability guidelines 2. Decide on an API transport (e.g. REST, ZMQ, AMQP) 3. Make the API easily consumable (e.g. proper docs, example code, extending the API, etc) 4. Implement the API itself All of these are dependent on one another and the permutations are endless. This is why I think we should try and work on each one separately. 
All discussions will be done openly on the mailing list and until the final version comes out nothing is set in stone. If you think you have anything to contribute to this process, please do so either by commenting on the discussions or by sending code/docs/whatever patches. Once the API solidifies it will be quite difficult to change fundamental things, so speak now or forever hold your peace. Note that this is just an introductory email. There will be a quick follow up email to kick start the discussions. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] [Engine-devel] RFC: Writeup on VDSM-libstoragemgmt integration
First of all I'd like to suggest not using the LSM acronym as it can also mean live-storage-migration and maybe other things. Secondly I would like to avoid talking about what needs to be changed in VDSM before we figure out what exactly we want to accomplish. Also, there is no mention on credentials in any part of the process. How does VDSM or the host get access to actually modify the storage array? Who holds the creds for that and how? How does the user set this up? In the array as domain case. How are the luns being mapped to initiators. What about setting discovery credentials. In the array set up case. How will the hosts be represented in regards to credentials? How will the different schemes and capabilities in regard to authentication methods will be expressed. Rest of the comments inline - Original Message - From: Deepak C Shetty deepa...@linux.vnet.ibm.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Cc: libstoragemgmt-de...@lists.sourceforge.net, engine-de...@ovirt.org Sent: Wednesday, May 30, 2012 5:38:46 AM Subject: [Engine-devel] RFC: Writeup on VDSM-libstoragemgmt integration Hello All, I have a draft write-up on the VDSM-libstoragemgmt integration. I wanted to run this thru' the mailing list(s) to help tune and crystallize it, before putting it on the ovirt wiki. I have run this once thru Ayal and Tony, so have some of their comments incorporated. I still have few doubts/questions, which I have posted below with lines ending with '?' Comments / Suggestions are welcome appreciated. thanx, deepak [Ccing engine-devel and libstoragemgmt lists as this stuff is relevant to them too] -- 1) Background: VDSM provides high level API for node virtualization management. It acts in response to the requests sent by oVirt Engine, which uses VDSM to do all node virtualization related tasks, including but not limited to storage management. libstoragemgmt aims to provide vendor agnostic API for managing external storage array. It should help system administrators utilizing open source solutions have a way to programmatically manage their storage hardware in a vendor neutral way. It also aims to facilitate management automation, ease of use and take advantage of storage vendor supported features which improve storage performance and space utilization. Home Page: http://sourceforge.net/apps/trac/libstoragemgmt/ libstoragemgmt (LSM) today supports C and python plugins for talking to external storage array using SMI-S as well as native interfaces (eg: netapp plugin ) Plan is to grow the SMI-S interface as needed over time and add more vendor specific plugins for exploiting features not possible via SMI-S or have better alternatives than using SMI-S. For eg: Many of the copy offload features require to use vendor specific commands, which justifies the need for a vendor specific plugin. 2) Goals: 2a) Ability to plugin external storage array into oVirt/VDSM virtualization stack, in a vendor neutral way. 2b) Ability to list features/capabilities and other statistical info of the array 2c) Ability to utilize the storage array offload capabilities from oVirt/VDSM. 3) Details: LSM will sit as a new repository engine in VDSM. VDSM Repository Engine WIP @ http://gerrit.ovirt.org/#change,192 Current plan is to have LSM co-exist with VDSM on the virtualization nodes. *Note : 'storage' used below is generic. It can be a file/nfs-export for NAS targets and LUN/logical-drive for SAN targets. VDSM can use LSM and do the following... 
- Provision storage - Consume storage 3.1) Provisioning Storage using LSM Typically this will be done by a Storage administrator. oVirt/VDSM should provide storage admin the - ability to list the different storage arrays along with their types (NAS/SAN), capabilities, free/used space. - ability to provision storage using any of the array capabilities (eg: thin provisioned lun or new NFS export ) - ability to manage the provisioned storage (eg: resize/delete storage) Once the storage is provisioned by the storage admin, VDSM will have to refresh the host(s) for them to be able to see the newly provisioned storage. [SM] What about the clustered case, The management or the mailbox will have to be involved. Pros\Cons? Is there a capability for the storage to announce a change in topology? Can libstoragemgmt consume it? Does it even make sense? 3.1.1) Potential flows: Mgmt - vdsm - lsm: create LUN + LUN Mapping / Zoning / whatever is needed to make LUN available to list of hosts passed by mgmt Mgmt - vdsm: getDeviceList (refreshes host and gets list of devices) Repeat above for all relevant hosts (depending on list passed earlier, mostly relevant
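[Editorial note] A hedged sketch of the provisioning flow outlined above, written against hypothetical VDSM verbs; it does not reflect the real libstoragemgmt Python API, whose calls may differ.

def provision_lun_for_hosts(vdsm, array_id, size_gb, host_ids):
    # Mgmt -> vdsm -> lsm: create the LUN and map it to the relevant hosts
    lun = vdsm.createExternalLun(array_id, size_gb * 2 ** 30,
                                 thin=True, mapToHosts=host_ids)
    # Mgmt -> vdsm: refresh each host so the new device becomes visible
    for host in host_ids:
        devices = vdsm.getDeviceList(host)
        if lun['guid'] not in [d['GUID'] for d in devices]:
            raise RuntimeError('host %s cannot see the new LUN' % host)
    return lun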
Re: [vdsm] [virt-node] VDSM as a general purpose virt host manager
The decision to declare the current API as supported or not, or opening ourselves to more then one API transport is directly related to how we decide to handle deprecation (if any), API versioning and forward\backward compatibility. If we discover we clear path to evolve the API or support multiple transports and adhere to the (soon to be) agreed upon supportability guidelines we might choose the easy way of supporting the current API. This is why deciding how we are going to support things is the first step in the process. As a side note, having the XML-RPC operational for a version or two until the engine starts to use the new API is a non issue IMHO. - Original Message - From: Anthony Liguori anth...@codemonkey.ws To: Saggi Mizrahi smizr...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org, Daniel P. Berrange berra...@redhat.com, Daniel Veillard veill...@redhat.com Sent: Monday, June 18, 2012 4:14:15 PM Subject: Re: [vdsm] [virt-node] VDSM as a general purpose virt host manager On 06/18/2012 10:02 AM, Saggi Mizrahi wrote: I would like to put on to the table for descussion the growing need for a way to more easily reuse of the functionality of VDSM in order to service projects other than Ovirt-Engine. Originally VDSM was created as a proprietary agent for the sole purpose of serving the then proprietary version of what is known as ovirt-engine. Red Hat, after acquiring the technology, pressed on with it's commitment to open source ideals and released the code. But just releasing code into the wild doesn't build a community or makes a project successful. Further more when building open source software you should aspire to build reusable components instead of monolithic stacks. We would like to expose a stable, documented, well supported API. This gives us a chance to rethink the VDSM API from the ground up. There is already work in progress of making the internal logic of VDSM separate enough from the API layer so we could continue feature development and bug fixing while designing the API of the future. In order to achieve this though we need to do several things: 1. Declare API supportability guidelines Adding danpb and DV as I think they can provide good advice here. Practically speaking, I think the most important thing to do is clearly declare what's supported and not supported in more detail than you probably want to. Realistically, you have to just support whatever you have. I don't know that designing a supportable interface can be really successful unless you start with that tomorrow. So basically, unless you plan on removing the XML-RPC interface in the next release, you should plan on supporting it forever... 2. Decide on an API transport (e.g. REST, ZMQ, AMQP) We spent so much time trying to find the best transport in QEMU with the resulting being something I'm ultimately unhappy with. The best decision we've made recently on this front is to move to a schema-based RPC mechanism where the transport code is all autogenerated. Python has an advantage in that it supports introspection although a disadvantage in that it's easy to end up with an ad-hoc interface by relying on passing around dictionaries. 3. Make the API easily consumable (e.g. proper docs, example code, extending the API, etc) Documentation is by far the most important thing IMHO. I actually think that simply taking the existing XML-RPC interface and adding documentation ought to be the first step even.. 4. 
Implement the API itself I think the biggest risk in an effort like this is letting perfect become the enemy of good. If the goal is to open VDSM up to other applications, you can start today but just documenting what you have with plans to deprecate and improve later. Honestly, worrying about XML-RPC vs. REST vs. AMQP is likely going to result in a lot of bike shedding and grand plans. Regards, Anthony Liguori All of these are dependent on one another and the permutations are endless. This is why I think we should try and work on each one separately. All discussions will be done openly on the mailing list and until the final version comes out nothing is set in stone. If you think you have anything to contribute to this process, please do so either by commenting on the discussions or by sending code/docs/whatever patches. Once the API solidifies it will be quite difficult to change fundamental things, so speak now or forever hold your peace. Note that this is just an introductory email. There will be a quick follow up email to kick start the discussions. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] pep8 questions
I think this is the correct formatting:

self.__putMetadata({NONE: "#" * (sd.METASIZE - 10)}, metaid)

cls.log.warn("Could not get size for vol %s/%s using optimized methods",
             sdobj.sdUUID, volUUID, exc_info=True)

- Original Message - From: Deepak C Shetty deepa...@linux.vnet.ibm.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, June 5, 2012 2:19:04 PM Subject: [vdsm] pep8 questions Hi, I was looking at resolving pep8 issues in vdsm/storage/blockVolume.py. Haven't been able to resolve the below. Pointers appreciated.

vdsm/storage/blockVolume.py:99:55: E225 missing whitespace around operator
vdsm/storage/blockVolume.py:148:28: E201 whitespace after '{'
vdsm/storage/blockVolume.py:207:28: E701 multiple statements on one line (colon)

line 99: cls.log.warn(Could not get size for vol %s/%s using optimized - Googling, I found some links indicating this pep8 warning is incorrect.

line 148: cls.__putMetadata({ NONE: "#" * (sd.METASIZE-10) }, metaid) It gives some other error if I remove the whitespace after {

lines 206-207: raise se.VolumeCannotGetParent("blockVolume can't get parent %s for volume %s: %s" % (srcVolUUID, volUUID, str(e))) I split this line to overcome the 80-column error, but I am unable to decipher what this error means?

thanx, deepak ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
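[Editorial note] For reference, one pep8-clean way to wrap the long raise quoted above (lines 206-207 of blockVolume.py) is to break after the opening parenthesis and keep the format string on its own line. This is only a formatting suggestion for the fragment quoted in the question; the exception type, message and variables come from the original code:

raise se.VolumeCannotGetParent(
    "blockVolume can't get parent %s for volume %s: %s" %
    (srcVolUUID, volUUID, str(e)))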
Re: [vdsm] VDSM API/clientIF instance design issue
If you don't want to add it in a parameter then you already suspect that you are doing something wrong. Using a singleton instead of passing a parameter doesn't make the dependency not there. It's just obscures it. I might not fully understand what you want to do but I think what you want is to have MOM expect a certain interface. Then have an adapter class bridging the two interfaces. The pass the wrapped CIF to MOM. - Original Message - From: Mark Wu wu...@linux.vnet.ibm.com To: vdsm-devel@lists.fedorahosted.org Cc: Dan Kenigsberg dan...@redhat.com, Saggi Mizrahi smizr...@redhat.com, Adam Litke a...@us.ibm.com, Ryan Harper ry...@us.ibm.com Sent: Wednesday, May 30, 2012 10:49:29 AM Subject: VDSM API/clientIF instance design issue Hi Guys, Recently, I has been working on integrate MOM into VDSM. MOM needs to use VDSM API to interact with it. But currently, it requires the instance of clientIF to use vdsm API. Passing clientIF to MOM is not a good choice since it's a vdsm internal object. So I try to remove the parameter 'cif' from the interface definition and change to access the globally unique clientIF instance in API.py. To get the instance of clientIF, I add a decorator to clientIF to change it into singleton. Actually, clientIF has been working as a global single instance already. We just don't have an interface to get it and so passing it as parameter instead. I think using singleton to get the instance of clientIF is more clean. Dan and Saggi already gave some comments in http://gerrit.ovirt.org/#change,4839 Thanks for the reviewing! But I think we need more discussion on it, so I post it here because gerrit is not the appropriate to discuss a design issue. Thanks ! Mark. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
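[Editorial note] A sketch of the adapter idea suggested above: MOM programs against a small interface of its own, and a thin wrapper maps that interface onto clientIF without exposing it. The class and method names here are hypothetical.

class MomHypervisorInterface(object):
    """What MOM expects from the hypervisor side."""
    def getVmList(self):
        raise NotImplementedError

    def setVmBalloonTarget(self, vmId, target):
        raise NotImplementedError

class ClientIFAdapter(MomHypervisorInterface):
    """Bridges MOM's interface onto VDSM's internal clientIF object."""
    def __init__(self, cif):
        self._cif = cif                         # the dependency stays explicit

    def getVmList(self):
        return list(self._cif.vmContainer)      # assumed internal attribute

    def setVmBalloonTarget(self, vmId, target):
        self._cif.vmContainer[vmId].setBalloonTarget(target)

# mom = MOM(ClientIFAdapter(cif))  # passed in, not looked up via a singleton

This keeps clientIF out of MOM's code while making the dependency visible instead of hiding it behind a global.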
[vdsm] pep8 check in vim
Now that we have started moving to conform with pep8, you would probably like to be able to easily check your code. If you use vim you could use this vim script: http://www.vim.org/scripts/script.php?script_id=2914 If you are not using vim, follow these 3 simple steps: 1. Switch to vim 2. Install the script 3. Profit ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
[vdsm] Test that broke upstream
Upstream will not currently build because of the following patch:

commit 1d4f220616ca6fc014bbdfef7b826a16ed608ddf
Author: y kaplan ykap...@redhat.com
Date: Thu Apr 5 18:02:40 2012 +0300

    Added guestIFTests

    Change-Id: I8b5138296c098826f149c26d38fd2bfce8794fe4
    Reviewed-on: http://gerrit.ovirt.org/3379
    Reviewed-by: Dan Kenigsberg dan...@redhat.com
    Tested-by: Dan Kenigsberg dan...@redhat.com

It imports utils before setting up constants. The reason it worked for the committer is that they had VDSM installed on the host, so it could import vdsm from site-packages. Please keep this in mind when writing tests, and try to test on a clean host. We are in the process of having Jenkins plug in to Gerrit to do this for us before we actually commit and break the build. But until then please double check. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
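[Editorial note] A sketch of the kind of fix implied here: make the in-tree modules importable (and constants configured) before any test module pulls in utils, so the suite does not silently fall back to a system-installed vdsm. The paths and module names below are illustrative, not the actual test layout.

import os
import sys

TEST_DIR = os.path.dirname(os.path.abspath(__file__))
# Put the in-tree code ahead of site-packages so a clean host works too.
sys.path.insert(0, os.path.join(TEST_DIR, '..', 'vdsm'))

import constants   # assumed: must be set up before utils is imported
import utils       # imported only after constants, on purpose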
Re: [vdsm] PEP8 in VDSM code
The reason I wanted a gerrit hook is to avoid putting a -1 until VDSM is clean of errors. It's supposed to be a transitional state. - Original Message - From: Itamar Heim ih...@redhat.com To: Ewoud Kohl van Wijngaarden ew...@kohlvanwijngaarden.nl Cc: vdsm-devel@lists.fedorahosted.org Sent: Monday, March 26, 2012 5:52:13 AM Subject: Re: [vdsm] PEP8 in VDSM code On 03/26/2012 11:26 AM, Ewoud Kohl van Wijngaarden wrote: On Mon, Mar 26, 2012 at 04:57:24AM -0400, Ayal Baron wrote: I'd rather avoid gerrit hooks if possible to use a jenkins job to validate this to keep the gerrit deployment as simple to maintain/upgrade as possible. But that's the wrong place to be doing it. Jenkins periodically polls for changes and then runs a job and posts the results somewhere (who would get the email?) Here the committer would immediately know that there is a problem with the patch and reviewers also immediately know not to accept it. I think what Itamar is getting at is that from gerrit you can trigger jenkins jobs which give a -1 if it fails. If jenkins checks for pep8 you've solved the feedback issue without creating custom a gerrit hook. It will also be more scalable since you can add pyflakes / pylint / ... in the same check. true. per ayal's question - patch owner and reviewers will get the email, like any other review. we need to keep the gerrit as simple as possible wrt maintenance. ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
[vdsm] PEP8 in VDSM code
I suggest making pep8 compliance a must for patch submission in VDSM. http://www.python.org/dev/peps/pep-0008/ Currently there are a few people policing these rules in reviews, but I suggest we make it automatic. Unless someone objects I will put up a gerrit hook that complains about pep8 violations. It will not mark -1s until all (or at least most of) the source code has been converted, because people might get complaints about code they did not modify in their patch. If you're happy and you know it, +1! ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
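[Editorial note] In the same spirit as the proposed hook, a local pre-commit check can run pep8 only on the Python files touched by a commit, so authors are not penalized for pre-existing violations elsewhere in the tree. This is an independent sketch, not the gerrit hook discussed above.

#!/usr/bin/env python
import subprocess
import sys

def changed_python_files():
    # Files added/copied/modified in the index for this commit
    out = subprocess.check_output(
        ['git', 'diff', '--cached', '--name-only', '--diff-filter=ACM'])
    return [f for f in out.decode().splitlines() if f.endswith('.py')]

def main():
    files = changed_python_files()
    if not files:
        return 0
    return subprocess.call(['pep8'] + files)   # nonzero exit blocks the commit

if __name__ == '__main__':
    sys.exit(main())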
[vdsm] Following unix pipes
I recently had several instances where I had to try and figure out who holds what end of a unix pipe. To make this operation a bit more streamlined I created a small script to follow a pipe. I think it will be useful for other people debugging VDSM especially bugs related to out of process helpers not closing FDs properly. To see all the exists of a pipe just input a known end of the pipe: stahlband PID FD $ stahlband 5758 5 PID: 5758 FD: 5 KIND: r PID: 5758 FD: 6 KIND: w PID: 5770 FD: 5 KIND: r PID: 5770 FD: 6 KIND: w The code is available on github: https://github.com/ficoos/stahlband ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
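[Editorial note] For readers wondering how a tool like this can work on Linux: one way is to walk /proc/<pid>/fd and report every (pid, fd) whose symlink points at the same pipe inode. The sketch below is an independent illustration of that idea, not the actual stahlband implementation, and it omits the read/write KIND column (which could be derived from /proc/<pid>/fdinfo flags).

import os
import re
import sys

def pipe_inode(pid, fd):
    try:
        target = os.readlink('/proc/%s/fd/%s' % (pid, fd))
    except OSError:
        return None
    m = re.match(r'pipe:\[(\d+)\]', target)
    return m.group(1) if m else None

def follow(pid, fd):
    inode = pipe_inode(pid, fd)
    if inode is None:
        raise SystemExit('%s/%s is not a pipe' % (pid, fd))
    for p in filter(str.isdigit, os.listdir('/proc')):
        try:
            fds = os.listdir('/proc/%s/fd' % p)
        except OSError:
            continue                  # process went away or no permission
        for f in fds:
            if pipe_inode(p, f) == inode:
                print('PID: %s FD: %s' % (p, f))

if __name__ == '__main__':
    follow(sys.argv[1], sys.argv[2])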
Re: [vdsm] Fedora Virtualization Test Day 2012-04-12
Good thing you linked to that wiki page. I learned a lot. I don't mind being there for the EST shift. - Original Message - From: Ayal Baron aba...@redhat.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Tuesday, February 28, 2012 4:39:04 PM Subject: [vdsm] Fedora Virtualization Test Day 2012-04-12 Hi all, $subject is a month and a half away. Any volunteers to driving vdsm testing forward for that day? https://fedoraproject.org/wiki/Test_Day:2012-04-12_Virtualization_Test_Day Regards, Ayal ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] flowID schema
Pinning down specific moments where stuff went horribly wrong is usually quite simple. The only reason I can't think of for someone to track a problem is if there was a bug in the storage subsystem that cause a corruption that only became apparent later on. But this can't benefit from flowID. I'm going to to day this again. I want to hear someone to give a case where this is useful. Don't say things like. It will allow me to do X or I know of a guy who spent 2 years doing X to VDSM\Engine logs I want an example like: The user complained about X. So I had to do Y to figure out what is wrong and it was a pain. The Only reason I can think of is someone trying to figure out what wrong knowing nothing about how VDSM\Engine works. While I just jump from place to place because I know what is going on, other people would just want to go step by step to get the complete picture. But even with that I don't see why you would want to keep going from the Engine down to VDSM and back out again. Other then that I just simply can't imagine a use case where I'd need to do the stuff you talk about. But again, I usually get my bug reports from QA and QE, and they are more skilled at bug reporting then the general public. Further more, WHAT IS A FLOW? When does it start? When does it end? create Image is a flow? is the connect to the domain included? isn't it all just part of a big flow to create a VM? Does the engine even track it as a flow? Flows depend on the debugger the problem and the scope. Throwing another useless ID in the pile will give you nothing. What you want is to be able to map sophisticated connections between resources and operations inside every component and between them. For instance, in VDSM. To track a resource locking issues you use the resourceID. It crosses flows. To debug connection issues you use the connection information (and soon, the reference ID). An it crosses flows as well. I'm not saying that debugging Ovirt is easy. I'm just saying that this is not the solution IMHO. Good anchors to resources like taskIDs, connection reference IDs, domain IDs and resource IDs give you the ability to track whatever you want. You just need better tools to cross reference them across log files so that the log tricks I *know* are implemented in a way that everyone can use them. This is when good tools come in to play. Think about it like a DB. Should something be a table with an index or should it just be a view. (more inline) - Original Message - From: Simon Grinberg si...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org, Dan Yasny dya...@redhat.com, Ayal Baron aba...@redhat.com Sent: Thursday, February 16, 2012 11:07:34 AM Subject: Re: [vdsm] flowID schema - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Simon Grinberg si...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org, Dan Yasny dya...@redhat.com, Ayal Baron aba...@redhat.com Sent: Thursday, February 16, 2012 5:18:26 PM Subject: Re: [vdsm] flowID schema You could just cross check flowID (that is printed in RHEV-M to all the call IDs in that flow, It's a simple enough tool to make). Also if you look at the way VDSM is heading you will see that just grepping for the flow ID will gradually give you a smaller and smaller picture of the actual flow. - The new connection management means that the actual connect is not a direct part of the flow anymore. - We plan to have a similar mechanism for domains. - The new image code will do a lot of things out of band. 
- There will be more flows using multiple hosts because of SDM and the increased safety gained by sanlock, and they will not share the flow ID in internal communication. - Engine actually polls all tasks in a single thread (with its own flow ID? No flowID?), so even the actual result for async tasks might have a different flowID in the VDSM log. OK, so there may be a terminology mismatch here: A flow as an end user would see it is everything that happened from the moment he clicked OK to the moment the operation failed or succeeded. The same goes for internal/auto-triggered actions in response to an event (storage failure etc.). It may be composed, as you noted above, of several connections, multiple tasks, etc. The flow ID as I see it is the linkage that makes sense of all of these. Everything that you've written above just convinces me that such an ID is a must. I understood that. The point is that the system is too complex to be able to tie certain operations to flows. If you connect in flowX but use a resource provided by that connection in flowY, and the connection breaks, you lose the resource. Since you pinned the flowID value to the connection object, it might be logged as flowX, while the actual problem happened in flowY. My point is that the logical leaps required to extract only relevant data to a flow
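A minimal sketch of the "better tools to cross-reference anchors across log files" idea from the message above. The log file names, the assumption that every line starts with a sortable timestamp, and the function names are all illustrative; this is not an existing VDSM or oVirt utility.

    import sys

    # Hypothetical log locations; the real files live elsewhere and the
    # engine and vdsm line formats differ.
    LOGS = ['engine.log', 'vdsm.log']

    def hits(anchor, path):
        """Yield (file, line) for every log line that mentions the anchor
        ID (taskID, resourceID, connection reference ID, ...)."""
        with open(path) as log:
            for line in log:
                if anchor in line:
                    yield path, line.rstrip()

    def cross_reference(anchor):
        """Collect matching lines from all logs and sort them, assuming
        each line begins with a sortable timestamp."""
        found = []
        for path in LOGS:
            found.extend(hits(anchor, path))
        return sorted(found, key=lambda pair: pair[1])

    if __name__ == '__main__':
        for path, line in cross_reference(sys.argv[1]):
            print('%s: %s' % (path, line))

The point of the sketch is that any good anchor (taskID, resourceID, connection reference) can be followed across components with a trivial tool, without adding a new ID to the protocol.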
Re: [vdsm] flowID schema
-1 I agree that for a messaging environment having a Message ID is a must, because you sometimes don't have a particular target, so when you get a response you need to know what this node is actually responding to. The message ID could be composed as FLOWIDMSGID so you can reuse the field. But that is all beside the point. I understand that someone might find it fun to go on following the entire flow in the Engine and in VDSM. But I would like to hear an actual use case where someone would have actually benefited from this. As I see it, having VDSM return the task ID with every response (and not just for async tasks) is a lot more useful and correct. A generic debugging scenario as I see it: 1. Something went wrong. 2. You go looking in the ENGINE log trying to figure out what happened. 3. You see that ENGINE got SomeError. 4. Check to see if this error makes sense, imagining that VDSM is always right and is a black box. 5. You did your digging and now you think that VDSM is at fault. 6. Go look for the call that failed. (If we returned the taskID it's pretty simple to find that call.) 7. Look around the call to check VDSM state. 8. Profit. There is never a point where you want to follow a whole flow call by call going back and forth, and even if you did, having the VDSM taskID is a better anchor than flowID. VDSM is built in a way that every call takes into account the current state only. Debugging it with an engine flow mindset is just wrong and distracting. I see it doing more harm than good by reinforcing bad debugging practices. - Original Message - From: Keith Robertson krobe...@redhat.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Thursday, February 9, 2012 1:34:43 PM Subject: Re: [vdsm] flowID schema On 02/09/2012 12:18 PM, Andrew Cathrow wrote: - Original Message - From: Ayal Baron aba...@redhat.com To: Dan Kenigsberg dan...@redhat.com Cc: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Monday, February 6, 2012 10:35:54 AM Subject: Re: [vdsm] flowID schema - Original Message - On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote: flowID makes no sense after the initial API call because of stuff like caching\threadpools\samplingtasks\resources\asyncTasks, so following a flow like that will not give you the entire picture while debugging. Also adding it now will make everything even more ugly. You know what, just imagine I wrote one of my long rambles about why I don't agree with doing this. I cannot imagine you writing anything like that. Really. I do not understand why you object to logging flowID at the API entry point. The question is, what problem is this really trying to solve, and is there a simpler and less obtrusive solution to that problem? Correlating logs between ovirt engine and potentially multiple vdsm nodes is a nightmare. It requires a lot of skill to follow a transaction through from the front end all the way to the node, and even multiple nodes (e.g. actions on the SPM, then actions on another node to run a VM). Having a way to correlate the logs and follow a single event/flow is vital. +1 Knowing what command caused a sequence of events in VDSM would be really helpful, particularly in a threaded environment. Further, wouldn't such an ID be helpful in an asynchronous request/response model? I'm not sure what the plans are for AMQP or even if there are plans, but I'd think that something like this would be crucial for an async response. So, if you implemented it you might be killing 2 birds with 1 stone.
FYI: If you want to see examples of other systems that use similar concepts, take a look at the correlation ID in JMS. Cheers, Keith ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
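A sketch of the "return the task ID with every response, sync or async" idea argued for above. The envelope layout ('status', 'taskID', 'result') and the helper name are assumptions for illustration, not the actual VDSM wire format.

    import uuid

    def run_logged(verb, *args):
        """Run an API verb under a fresh task ID and always return that ID
        in the response, on success and on failure alike.  A real server
        would also tag every log line it emits with task_id."""
        task_id = str(uuid.uuid4())
        try:
            result = verb(*args)
            return {'status': {'code': 0, 'message': 'OK'},
                    'taskID': task_id, 'result': result}
        except Exception as err:
            return {'status': {'code': 1, 'message': str(err)},
                    'taskID': task_id}

With the taskID in the caller's hands, step 6 of the debugging scenario above becomes a single grep of the vdsm log for that ID.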
Re: [vdsm] metadata
Domains contain different metadata depending on their version. The important keys represent: * Domain UUID * SPM lease information * Pool membership * original device block size (so if someone moves the domain between devices with different block sizes we know that the domain broke) * Domain human readable name A master domain also contains pool information: * All members of the storage pool * the version of the master MD * pool human readable name A lot of these keys will be deprecated in future domain versions, along with the entire concept of storage pools. So if you are planning on writing tools that read the domain MD, please be aware that the format\keys will change drastically in the coming months. - Original Message - From: wangxiaofan wangxiao...@opzoon.com To: vdsm-devel@lists.fedorahosted.org Sent: Thursday, February 2, 2012 1:58:06 AM Subject: [vdsm] metadata What is the metadata of a non-master data domain used for? And of a master data domain? ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
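For context, domain metadata of that era is kept as simple KEY=VALUE lines; a minimal sketch of reading such a blob follows. The key names in the sample are placeholders for the fields listed above, not necessarily the exact on-disk names.

    def parse_domain_metadata(blob):
        """Parse a KEY=VALUE metadata blob into a dict; lines without an
        '=' are skipped."""
        md = {}
        for line in blob.splitlines():
            if '=' not in line:
                continue
            key, value = line.split('=', 1)
            md[key.strip()] = value.strip()
        return md

    sample = """DOMAIN_UUID=00000000-0000-0000-0000-000000000000
    DESCRIPTION=my-domain
    POOL_UUID=00000000-0000-0000-0000-000000000001
    BLOCK_SIZE=512
    """
    print(parse_domain_metadata(sample)['DESCRIPTION'])   # -> my-domain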
Re: [vdsm] flowID schema
flowID makes no sense after the initial API call because of stuff like caching\threadpools\samplingtasks\resources\asyncTasks, so following a flow like that will not give you the entire picture while debugging. Also adding it now will make everything even more ugly. You know what, just imagine I wrote one of my long rambles about why I don't agree with doing this. As you plan on going ahead anyway, here is my suggestion on how to push this in. XMLRPC doesn't support named parameters, which means you can't just ad-hoc a new arg called flow-id onto all the API calls. For simplicity's sake let's assume they always pass the last arg as the flowID if it is a string that starts with __FLOWID__. What you do then is, in the dispatcher, take the last arg and put it on the task object. Have the logger print this value next to the threadID, even when the task is in prepare. You will have to make the clientIF calls use *another* dispatcher, but the same task thread pool, to have this supported for the clientIF verbs as well, but I think that should have been done anyway. - Original Message - From: Douglas Landgraf dougsl...@redhat.com To: VDSM Project Development vdsm-devel@lists.fedorahosted.org Sent: Thursday, February 2, 2012 12:00:44 AM Subject: [vdsm] flowID schema Hello, flowID is a schema that we will be including in the vdsm API so that oVirt Engine people can share the ID of the engine transaction with vdsm. With this in hand, we will add the ID of transactions to our log. I would like to know your opinion on how we could do that without breaking our API, like including new parameters in our calls. Should we add code at BindingXMLRPC.py - wrapper() to search for a 'flowID' key in functions which use a dict as parameter (like create)? [1] Or maybe change it at another level inside BindingXMLRPC? Ideas/Thoughts? [1] http://gerrit.ovirt.org/#patch,sidebyside,1221,3,vdsm/BindingXMLRPC.py Thanks! -- Cheers Douglas ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
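A minimal sketch of the "last positional arg starting with __FLOWID__" convention described above. The split_flow_id/dispatch names and the task object here are simplified stand-ins for the real BindingXMLRPC/dispatcher/task machinery, not actual VDSM code.

    FLOW_PREFIX = '__FLOWID__'

    def split_flow_id(args):
        """If the last positional arg is a '__FLOWID__...' marker, peel it
        off and return (real_args, flow_id); otherwise flow_id is None."""
        if args and isinstance(args[-1], str) and args[-1].startswith(FLOW_PREFIX):
            return args[:-1], args[-1][len(FLOW_PREFIX):]
        return args, None

    def dispatch(task, verb, *args):
        args, flow_id = split_flow_id(args)
        task.flowID = flow_id   # stashed on the task; the logger can print
                                # it next to the threadID, even in prepare
        return verb(*args)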
[vdsm] Libstorage and repository engines
I've been working on refactoring the storageDomain\images system in VDSM. Apart from facilitating various features I've also been trying to make adding new SD types easier and making the image manipulation bits consistent across domain implementations. Currently, in order to create a new domain type you have to create new StorageDomain, Image and Volume objects and implement all the logic to manipulate them. Apart from being cumbersome and redundant, it also makes mixed clusters very hard to do. One of the big changes I put in is separating the image manipulation from the actual storage work. Instead of each domain type implementing createImage and co, you have one class responsible for all the image manipulation in the cluster. All you have to do to facilitate a new storage type is to create a domain engine. A domain engine is a python class that implements a minimal interface: 1. It has to be able to create, resize and delete a slab (a slab being a block of writable storage like a lun\lv\file). 2. It has to be able to create and delete tags (tags are pointers to slabs). The above functions are very easy to implement and require very little complexity. All the heavy lifting (image manipulation, cleaning, transactions, atomic operations, etc.) is managed by the Image Manager, which just uses this unified interface to interact with the different storage types. In cases where a domain might have special non-standard features I introduce the concept of capabilities. A domain engine can declare support for certain capabilities (eg. native snapshotting) and implement additional interfaces. If the image manager sees that the domain implements a capability it will use it; if not, it will use a default implementation built from the default must-have verbs. This is similar to having drawLine as the primitive and building drawRect on top of it. This is done automatically and at runtime. I like to compare this to how OpenGL will use software rendering if a certain standard feature is not implemented by the card, so you might get a slower but still correct result. Now, libstorage is another way to abstract interactions and capabilities for different storage types and have a unified API for accessing them. Building a repo engine on top of libstorage is completely possible. But as you can see this creates a redundant layer of abstraction on the libstorage side. As I see it, if you just want to have your storage supported by ovirt, creating a repo engine is simpler, as you can use high level concepts, and I do plan to have engines run as their own processes so you could use whatever licence, language and storage server API you choose. Also, libstorage will have to keep its abstraction at a much lower level. This means exposing target specific flags and abilities. While this is good in concept, it will mean that the repo engine wrapping libstorage will have to juggle all those flags and calls instead of having a distinct class for each storage type with its own specific hacks in place. Just as a current example, we currently use the same engine for nfs3 and nfs4. This means that when we are running on nfs4 we are still doing all the hacks that are meant to circumvent issues with v3 being stateless. This is no longer relevant as v4 is stateful. And what about SAMBA? Or gluster? You have to have special hacks for both. What I'm saying is that if in the relatively simple world of NAS, where we have a proven abstraction (file access commands, POSIX), we can't find a way to create one class to rule them all,
how can we expect to have a sane solution for the crazy world of SAN? I'm not saying we shouldn't create an engine for libstorage, just that we should treat it like we treat sharefs: as a simple, generic, non-bulletproof\optimized implementation. Let the flaming commence! ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
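A sketch of what the domain engine interface and the capability fallback described in the message above could look like. Method names, the capability string, and the create_snapshot optional interface are illustrative assumptions, not the real VDSM classes.

    class DomainEngine(object):
        """Minimal repository engine: slabs (blocks of writable storage
        such as a lun\lv\file) and tags (pointers to slabs)."""

        capabilities = frozenset()   # e.g. frozenset(['NATIVE_SNAPSHOT'])

        def create_slab(self, size):
            raise NotImplementedError

        def resize_slab(self, slab, size):
            raise NotImplementedError

        def delete_slab(self, slab):
            raise NotImplementedError

        def create_tag(self, name, slab):
            raise NotImplementedError

        def delete_tag(self, name):
            raise NotImplementedError


    class ImageManager(object):
        """Owns all image manipulation; falls back to a generic path when
        the engine lacks a capability (the OpenGL software-rendering
        analogy from the message above)."""

        def __init__(self, engine):
            self.engine = engine

        def snapshot(self, image):
            if 'NATIVE_SNAPSHOT' in self.engine.capabilities:
                return self.engine.create_snapshot(image)  # optional interface
            # default implementation built only from the must-have verbs
            slab = self.engine.create_slab(image.size)
            self.engine.create_tag(image.name + '-snap', slab)
            return slab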
Re: [vdsm] about snapshot
A snapshot might have multiple images based on it in the case of templates and preview mode. We also plan to remove preview mode and make it so that every snapshot can have multiple images based on it. This design is only possible when using different qcow2 files. I might not be understanding Dan, as there are no plans to have libvirt do snapshotting (apart from live snapshots, and even in that case vdsm's storage backend will be the one creating the actual storage target for the image). - Original Message - From: Daniel P. Berrange berra...@redhat.com To: wangxiaofan wangxiao...@opzoon.com Cc: vdsm-devel@lists.fedorahosted.org Sent: Monday, January 30, 2012 9:10:38 AM Subject: Re: [vdsm] about snapshot On Mon, Jan 30, 2012 at 10:08:19PM +0800, wangxiaofan wrote: Hi there, Why does vdsm not use the snapshot APIs of libvirt, or qemu-img snapshot -c? The snapshot APIs are a fairly recent addition to libvirt, whose design was in fact influenced strongly by VDSM's requirements. IIUC, the intent is for VDSM to switch over to the libvirt APIs at some point in the not too distant future. Regards, Daniel -- |: http://berrange.com -o- | http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- |http://virt-manager.org :| |: http://autobuild.org -o- |http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- | http://live.gnome.org/gtk-vnc :| ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
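The "multiple images based on one snapshot" design works because each new image is a separate qcow2 overlay whose backing file is the snapshot volume. A minimal sketch follows; the paths are placeholders, and newer qemu-img versions may also want the backing format given explicitly (-F qcow2).

    import subprocess

    def create_overlay(backing, overlay):
        """Create a qcow2 overlay on top of `backing`.  Several overlays
        can share the same backing volume, which is what lets more than
        one image be based on a single snapshot."""
        subprocess.check_call(['qemu-img', 'create', '-f', 'qcow2',
                               '-b', backing, overlay])

    # Two independent images based on the same (read-only) snapshot volume:
    create_overlay('/path/to/snap-volume', '/path/to/image-a.qcow2')
    create_overlay('/path/to/snap-volume', '/path/to/image-b.qcow2')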
Re: [vdsm] [Engine-devel] [RFC] New Connection Management API
snip Again trying to sum up and address all comments. Clear all: == My opinion is still to not implement it. Even though it might generate a bit more traffic, premature optimization is bad and there are other ways we can improve VDSM command overhead without doing this. In any case this argument is redundant because my intention (as Litke pointed out) is to have a lean API. An API call is something you have to support across versions; this call implemented in the engine is something that no one has to support and can change\evolve easily. As a rule, if an API call C can be implemented by doing A + B, then C is redundant. List of connections as args: Sorry, I forgot to respond to that. I'm not as strongly opposed to the idea as to the other things you suggested. It'll just make implementing the persistence logic in VDSM significantly more complicated, as I will have to commit multiple connections' information to disk in an all-or-nothing mode. I can create a small sqlite db to do that, or do some directory tricks and exploit FS rename atomicity, but I'd rather not. The demands are not without basis. I would like to keep the code simple under the hood at the price of a few more calls. You would like to make fewer calls and keep the code simpler on your side. There isn't a real way to settle this. If anyone on the list has pros and cons for either way I'd be happy to hear them. If no compelling arguments arise I will let Ayal call this one. Transient connections: == The problem you are describing, as I understand it, is that VDSM did not respond, not that the API client did not respond. Again, this can happen for a number of reasons, and for most of them VDSM might not even be aware that there is actually a problem (network issues). This relates to the EOL policy. I agree we have to find a good way to define an automatic EOL for resources. I have made my suggestion. Out of the scope of the API. In the meantime cleaning stale connections is trivial and I made it clear in a previous email how to go about it in a simple, non-intrusive way. Clean hosts on host connect, and on every poll if you find connections that you don't like. This should keep things squeaky clean. - Original Message - From: Livnat Peer lp...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org, engine-de...@ovirt.org Sent: Thursday, January 26, 2012 5:22:42 AM Subject: Re: [Engine-devel] [RFC] New Connection Management API On 25/01/12 23:35, Saggi Mizrahi wrote: SNIP This mail was getting way too long. About the clear all verb. No. Just loop, find the connections YOU OWN and clean them. Even though you don't want to support multiple clients to the VDSM API, that doesn't mean the engine shouldn't behave like a proper citizen. It's the same reason why VDSM tries not to mess with system resources it didn't initiate. There is a big difference: VDSM living in hybrid mode with other workloads on the host is a valid use case; having more than one concurrent manager for VDSM is not. Generating a disconnect request for each connection does not seem like the right API to me. Again, think of the simple flow of moving a host from one data center to another: the engine needs to disconnect all storage domains (each domain can have a couple of connections associated with it). I am giving examples from the engine use cases as it is the main user of VDSM ATM, but I am sure this will be relevant to any other user of VDSM. As I see it the only point of conflict is the so-called non-persisted connections.
I will call them transient connections from now on. There are 2 use cases being discussed: 1. Wait until a connection is made; if it fails, don't retry, and automatically unmanage. 2. The caller of the API forgets or fails to unmanage a connection. Actually I was not discussing #2 at all. Your suggestion as I understand it: Transient connections are: - Connections that VDSM will only try to connect to once and will not reconnect to in case of disconnect. Yes. My problem with this definition is that it does not specify the end of life of the connection. Meaning it solves only use case 1. Since this is the only use case I had in mind, it is what I was looking for. If all is well, and it usually is, VDSM will not invoke a disconnect. So the caller would have to call unmanage at the end of the flow if the connection succeeded. Agree. Now, if you are already calling unmanage when the connection succeeded, you can just call it anyway. Not exactly; an example I gave earlier in the thread was that VDSM hangs or has some other error and the engine cannot initiate unmanage. Instead, let's assume the host is fenced (self-fence or external fence does not matter); in this scenario the engine will not issue unmanage
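A sketch of the "clean hosts on host connect, and on every poll" suggestion made above, as seen from the engine side. list_connections() and unmanage() are hypothetical client-side wrappers used for illustration, not the literal verbs of the proposed VDSM API.

    def sweep_connections(vdsm, wanted_ids):
        """Engine-side cleanup pass: ask the host what it is managing and
        unmanage anything this engine no longer wants (stale or forgotten
        connections, connections left behind after a fence, ...)."""
        for conn_id in vdsm.list_connections():
            if conn_id not in wanted_ids:
                vdsm.unmanage(conn_id)

    # Run on host connect and again on every monitoring poll.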
Re: [vdsm] [Engine-devel] [RFC] New Connection Management API
- Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Livnat Peer lp...@redhat.com, engine-de...@ovirt.org, vdsm-devel@lists.fedorahosted.org Sent: Thursday, January 26, 2012 1:58:40 PM Subject: Re: [vdsm] [Engine-devel] [RFC] New Connection Management API On Thu, Jan 26, 2012 at 10:00:57AM -0500, Saggi Mizrahi wrote: snip Again trying to sum up and address all comments. Clear all: == My opinion is still to not implement it. Even though it might generate a bit more traffic, premature optimization is bad and there are other ways we can improve VDSM command overhead without doing this. In any case this argument is redundant because my intention (as Litke pointed out) is to have a lean API. An API call is something you have to support across versions; this call implemented in the engine is something that no one has to support and can change\evolve easily. As a rule, if an API call C can be implemented by doing A + B, then C is redundant. List of connections as args: Sorry, I forgot to respond to that. I'm not as strongly opposed to the idea as to the other things you suggested. It'll just make implementing the persistence logic in VDSM significantly more complicated, as I will have to commit multiple connections' information to disk in an all-or-nothing mode. I can create a small sqlite db to do that, or do some directory tricks and exploit FS rename atomicity, but I'd rather not. I would be strongly opposed to introducing a sqlite database into vdsm just to enable a convenience mode for this API. Does the operation really need to be atomic? Why not just perform each connection sequentially and return a list of statuses? Is the only motivation for allowing a list of parameters to reduce the number of API calls (between engine and vdsm)? If so, the same argument Saggi makes above applies here. I try to have VDSM expose APIs that are simple to predict: a command can either succeed or fail. The problem is not actually validating the connections. The problem is that once I concluded that they are all OK, I need to persist to disk the information that will allow me to reconnect if VDSM happens to crash. If I naively save them one by one I could get into a state where only some of the connections persisted before the operation failed. So I have to somehow put all this in a transaction. I don't have to use sqlite. I could also put all the persistence information in a new dir for every call named UUID.tmp. Once I have written everything down I rename the directory to just UUID and fsync it. This is guaranteed by POSIX to be atomic. For unmanage, I move all the persistence information from the directories it sits in to a new dir named UUID, rename it to UUID.tmp, fsync it and then remove it. This all just looks like more trouble than it's worth to me. The demands are not without basis. I would like to keep the code simple under the hood at the price of a few more calls. You would like to make fewer calls and keep the code simpler on your side. There isn't a real way to settle this. If anyone on the list has pros and cons for either way I'd be happy to hear them. If no compelling arguments arise I will let Ayal call this one. Transient connections: == The problem you are describing, as I understand it, is that VDSM did not respond, not that the API client did not respond. Again, this can happen for a number of reasons, and for most of them VDSM might not even be aware that there is actually a problem (network issues). This relates to the EOL policy.
I agree we have to find a good way to define an automatic EOL for resources. I have made my suggestion. Out of the scope of the API. In the meantime cleaning stale connections is trivial and I made it clear in a previous email how to go about it in a simple, non-intrusive way. Clean hosts on host connect, and on every poll if you find connections that you don't like. This should keep things squeaky clean. - Original Message - From: Livnat Peer lp...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org, engine-de...@ovirt.org Sent: Thursday, January 26, 2012 5:22:42 AM Subject: Re: [Engine-devel] [RFC] New Connection Management API On 25/01/12 23:35, Saggi Mizrahi wrote: SNIP This mail was getting way too long. About the clear all verb. No. Just loop, find the connections YOU OWN and clean them. Even though you don't want to support multiple clients to the VDSM API, that doesn't mean the engine shouldn't behave like a proper citizen. It's the same reason why VDSM tries not to mess with system resources it didn't initiate. There is a big difference: VDSM living in hybrid mode with other workloads on the host
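A minimal sketch of the directory-rename trick described above for committing a whole batch of connection records atomically. Error handling is simplified and the on-disk layout (one JSON file per connection) is an assumption for illustration.

    import json
    import os

    def persist_batch(base_dir, batch_uuid, connections):
        """Write every record into <batch_uuid>.tmp, then rename the
        directory to <batch_uuid>.  rename() within one filesystem is
        atomic, so a crash leaves either no batch on disk or a complete
        one, never half of it."""
        tmp = os.path.join(base_dir, batch_uuid + '.tmp')
        final = os.path.join(base_dir, batch_uuid)
        os.mkdir(tmp)
        for name, info in connections.items():
            with open(os.path.join(tmp, name), 'w') as rec:
                json.dump(info, rec)
                rec.flush()
                os.fsync(rec.fileno())
        os.rename(tmp, final)
        dirfd = os.open(base_dir, os.O_RDONLY)   # make the rename durable too
        try:
            os.fsync(dirfd)
        finally:
            os.close(dirfd)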
Re: [vdsm] [Engine-devel] [RFC] New Connection Management API
- Original Message - From: Livnat Peer lp...@redhat.com To: Saggi Mizrahi smizr...@redhat.com Cc: vdsm-devel@lists.fedorahosted.org, engine-de...@ovirt.org Sent: Thursday, January 26, 2012 3:03:39 PM Subject: Re: [Engine-devel] [RFC] New Connection Management API On 26/01/12 17:00, Saggi Mizrahi wrote: snip Again trying to sum up and address all comments. Clear all: == My opinion is still to not implement it. Even though it might generate a bit more traffic, premature optimization is bad and there are other ways we can improve VDSM command overhead without doing this. In any case this argument is redundant because my intention (as Litke pointed out) is to have a lean API. An API call is something you have to support across versions; this call implemented in the engine is something that no one has to support and can change\evolve easily. As a rule, if an API call C can be implemented by doing A + B, then C is redundant. I disagree with the above statement; exposing a bulk of operations in a single API call is very common and not considered redundant. I agree that APIs with those kinds of calls exist, but it doesn't mean they are not redundant. re·dun·dant: adj. (of words or data) Able to be omitted without loss of meaning or function. This call can be omitted without loss of function. API calls are a commitment for generations. Wrapping this in the clients isn't. To quote myself: An API call is something you have to support across versions; this call implemented in the engine is something that no one has to support and can change\evolve easily. ~ Saggi Mizrahi, a few lines above. This API set will one day be considered stupid, obsolete and annoying. That's just how life is. We'll find better ways of solving these problems. When that moment comes I want to have as little functionality as possible that I have to keep maintaining. I doubt there is any way you can convince me otherwise. Put yourself in my position and think whether you would have made this sacrifice just to save someone a loop. To sum up, I will not add any API calls I don't absolutely have to. As to the number of calls, this is not relevant to the clear all verb; it is addressed by the point right below this sentence. List of connections as args: Sorry, I forgot to respond to that. I'm not as strongly opposed to the idea as to the other things you suggested. It'll just make implementing the persistence logic in VDSM significantly more complicated, as I will have to commit multiple connections' information to disk in an all-or-nothing mode. I can create a small sqlite db to do that, or do some directory tricks and exploit FS rename atomicity, but I'd rather not. The demands are not without basis. I would like to keep the code simple under the hood at the price of a few more calls. You would like to make fewer calls and keep the code simpler on your side. There isn't a real way to settle this. It is not about keeping the code simple (writing a loop is simple as well), it is about redundant round trips. As I said, I agree there is merit there. I think that round trips are a general issue, not specific to this call. My opinion is that communication with VDSM should just use HTTP pipelining (http://en.wikipedia.org/wiki/HTTP_pipelining). This will solve the problem globally instead of tacking it onto the API interface. I generally prefer simplicity of the API and the implementation, and correctness over performance. I laid out what the change entails, multiple ways of solving this, and my personal perspective.
Unless someone on the list objects to either solution, Ayal will have final say on this matter. He is more of a pragmatist than I (and doing what he says usually correlates with me getting my paycheck). If anyone on the list has pros and cons for either way I'd be happy to hear them. If no compelling arguments arise I will let Ayal call this one. Transient connections: == The problem you are describing, as I understand it, is that VDSM did not respond, not that the API client did not respond. Again, this can happen for a number of reasons, and for most of them VDSM might not even be aware that there is actually a problem (network issues). This relates to the EOL policy. I agree we have to find a good way to define an automatic EOL for resources. I have made my suggestion. Out of the scope of the API. In the meantime cleaning stale connections is trivial and I made it clear in a previous email how to go about it in a simple, non-intrusive way. Clean hosts on host connect, and on every poll if you find connections that you don't like. This should keep things squeaky clean. I have no additional input on this. The only real legitimate reservation you still have with the API is transient connections. As I said, if you can find a way
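A sketch of the round-trip-reduction idea raised above. True HTTP pipelining is awkward with the standard library, so this shows plain keep-alive reuse of one connection instead, and says so; the '/RPC2' path, headers, and function name are illustrative assumptions.

    import httplib   # http.client on Python 3

    def post_many(host, port, bodies):
        """Send several XML-RPC style POSTs over one persistent connection.
        This is request/response keep-alive rather than true pipelining,
        but it already removes the per-call connection setup cost that the
        round-trip concern is about."""
        conn = httplib.HTTPConnection(host, port)
        try:
            responses = []
            for body in bodies:
                conn.request('POST', '/RPC2', body,
                             {'Content-Type': 'text/xml'})
                responses.append(conn.getresponse().read())
            return responses
        finally:
            conn.close()

The design point being illustrated: batching can live in the transport or in the client, so the API itself can stay lean.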