Re: [vdsm] vdsm hangs in SamplingMethod after reinstall
On Thu, Feb 09, 2012 at 07:15:48PM -0500, Ayal Baron wrote:
>
> - Original Message -
> > Hi. I am running into a very annoying problem when working on vdsm
> > lately. My development process involves stopping vdsm, replacing
> > files, and restarting it. I do this pretty frequently. Sometimes,
> > after restarting vdsm the XMLRPC call getStorageDomainsList() hangs.
> > The following line is the last to print in the log:
> >
> > Thread-18::DEBUG::2012-02-09 17:11:46,793::misc::1017::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)
> >
> > The only solution I've been able to come up with is restarting my
> > machine. When stopping vdsm I search for any stale threads but I am
> > unable to find them. Do you know what else might be causing
> > DynamicBarrier.enter() to hang for a long period of time? Do the
> > threading primitives use some sort of temporary disk storage that
> > needs to be cleaned up? Thanks for the help!
>
> Try to add some logging in sdc.py:
>
> def refreshStorage(self):
>     >>> ADD LOG HERE

Yep, I have done this and I am not even getting into the refreshStorage
function. We actually hang in DynamicBarrier.enter(). I am going to add
some debugging to determine which locking operation gets stuck.

>     multipath.rescan()
>
> I have a feeling that your issue is not with SamplingMethod

--
Adam Litke
IBM Linux Technology Center

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/vdsm-devel
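The debugging Adam describes (finding which locking operation gets stuck) could look roughly like the sketch below. It is only an illustration under stated assumptions: it targets Python 3 (for the timeout argument to Lock.acquire), and TracedLock and the "locktrace" logger name are hypothetical, not part of vdsm. The internals of DynamicBarrier are not shown in this thread, so nothing here mirrors its real code.

```python
import logging
import threading

log = logging.getLogger("locktrace")  # hypothetical logger name


class TracedLock(object):
    """Hypothetical wrapper that logs each step of taking a lock."""

    def __init__(self, name, warn_after=5.0):
        self._name = name
        self._warn_after = warn_after
        self._lock = threading.Lock()

    def acquire(self):
        log.debug("%s: trying to acquire", self._name)
        waited = 0.0
        # Retry with a timeout so a stuck acquire leaves a trail in the
        # log instead of blocking silently.
        while not self._lock.acquire(timeout=self._warn_after):
            waited += self._warn_after
            log.warning("%s: still waiting after %.1fs", self._name, waited)
        log.debug("%s: acquired", self._name)

    def release(self):
        self._lock.release()
        log.debug("%s: released", self._name)

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.release()
        return False
```

Swapping a wrapper like this in for each lock involved, one at a time, would show the last "trying to acquire" line that never gets a matching "acquired" line.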
Re: [vdsm] vdsm hangs in SamplingMethod after reinstall
- Original Message -
> Hi. I am running into a very annoying problem when working on vdsm
> lately. My development process involves stopping vdsm, replacing
> files, and restarting it. I do this pretty frequently. Sometimes,
> after restarting vdsm the XMLRPC call getStorageDomainsList() hangs.
> The following line is the last to print in the log:
>
> Thread-18::DEBUG::2012-02-09 17:11:46,793::misc::1017::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)
>
> The only solution I've been able to come up with is restarting my
> machine. When stopping vdsm I search for any stale threads but I am
> unable to find them. Do you know what else might be causing
> DynamicBarrier.enter() to hang for a long period of time? Do the
> threading primitives use some sort of temporary disk storage that
> needs to be cleaned up? Thanks for the help!

Try to add some logging in sdc.py:

def refreshStorage(self):
    >>> ADD LOG HERE
    multipath.rescan()

I have a feeling that your issue is not with SamplingMethod
[vdsm] vdsm hangs in SamplingMethod after reinstall
Hi. I am running into a very annoying problem when working on vdsm
lately. My development process involves stopping vdsm, replacing files,
and restarting it. I do this pretty frequently. Sometimes, after
restarting vdsm the XMLRPC call getStorageDomainsList() hangs. The
following line is the last to print in the log:

Thread-18::DEBUG::2012-02-09 17:11:46,793::misc::1017::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)

The only solution I've been able to come up with is restarting my
machine. When stopping vdsm I search for any stale threads but I am
unable to find them. Do you know what else might be causing
DynamicBarrier.enter() to hang for a long period of time? Do the
threading primitives use some sort of temporary disk storage that needs
to be cleaned up? Thanks for the help!

--
Adam Litke
IBM Linux Technology Center
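On the question of where a thread is blocked: Python's threading.Lock and threading.Condition are in-process objects and keep no state on disk, so one generic way to investigate a hang like this is to dump the stack of every live thread from inside the process. The sketch below uses only the standard library; dump_thread_stacks is a hypothetical helper name and is not something vdsm provides.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text report with the current stack of every live thread."""
    # Map thread identifiers to readable names where possible.
    names = {t.ident: t.name for t in threading.enumerate()}
    report = []
    for ident, frame in sys._current_frames().items():
        report.append("Thread %s (%s):" % (names.get(ident, "unknown"), ident))
        report.extend(line.rstrip() for line in traceback.format_stack(frame))
    return "\n".join(report)
```

Writing the returned text to the log from a signal handler or a debug verb would show which call each thread is sitting in when the hang occurs.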
Re: [vdsm] flowID schema
- Original Message -
> > From: "Saggi Mizrahi"
> > To: "Keith Robertson"
> > Cc: "VDSM Project Development"
> > Sent: Thursday, February 9, 2012 2:24:44 PM
> > Subject: Re: [vdsm] flowID schema
> >
> > -1
> >
> > I agree that for a messaging environment having a Message ID is a
> > must, because you sometimes don't have a particular target, so when
> > you get a response you need to know what this node is actually
> > responding to.
> >
> > The message ID could be composed with so you can reuse the field.
> >
> > But that is all beside the point.
> >
> > I understand that someone might find it fun to go on following the
> > entire flow in the Engine and in VDSM. But I would like to hear an
> > actual use case where someone would have actually benefited from
> > this.
> > As I see it, having VDSM return the task ID with every response (and
> > not just for async tasks) is a lot more useful and correct.
>
> Actually, the only way to understand what happened in a certain flow
> is to follow it through, from the engine log where an action was
> initiated down to the hosts that did the execution. Everything RHEV
> does is a flow, and with no correlation between hosts executing parts
> of the same flow, troubleshooting turns into guesswork, because the
> only contact point left is time, which is useless when you're talking
> about vdsm - there are sometimes hundreds of log records in a single
> second, and not every host is in absolute sync with every other.

What are you talking about? You know exactly what operation the engine
ran at the vdsm level. If it's a task then you also have a task ID, which
is a UUID, so you don't need anything else. In addition, now that the
engine logs results, you can just grep for that instead of a flow ID and
land at the exact command, without having to figure out which of the 5
run in this flow is the relevant one.
If you could give a real example where this would be beneficial (i.e.
log excerpts, how you correlated them and how a flow ID would have eased
your job) that would be great.
Note that I've also discussed this with Yaniv from QE, who said they
don't really need it.

> > A generic debugging scenario as I see it.
> >
> > 1. Something went wrong
> > 2. You go looking in the ENGINE log trying to figure out what
> >    happened.
> > 3. You see that ENGINE got SomeError.
>
> OK, the rest are all downhill.
>
> 4. You follow the failure back to the start of the flow, then go with
>    the flow to the point where the engine exited to vdsm
> 5. Switch over to vdsm logs, make sure you have the timing right
>    (with no flow ID that's the only orientation after all)
> 6. Find the start of the vdsm-side flow, follow it to the failure,
>    pray the error makes sense.
>
> In many cases the answer is not in the vdsm failure traceback but
> somewhere in the middle of the flow, with no errors reported. This is
> why we need a way to easily follow things through. Moreover, the logs
> should be readable enough to make sense to a typical sysadmin, and not
> only to a RHEV expert.
>
> > 4. Check to see if this error makes sense imagining that VDSM is
> >    always right and is a black box.
> > 5. You did your digging and now you think that VDSM is at fault.
> > 6. Go look for the call that failed. (If we returned the taskID it's
> >    pretty simple to find that call).
> > 7. Look around the call to check VDSM state.
> > 8. Profit.
> >
> > There is never a point where you want to follow a whole flow call by
> > call going back and forth, and even if you did, having the VDSM
> > taskID is a better anchor than flowID.
>
> Not everything is a task; flow IDs would unify entire flows and make
> following them easy.
>
> > VDSM is built in a way that every call takes into account the
> > current state only. Debugging it with an engine flow mindset is just
> > wrong and distracting. I see it doing more harm than good by
> > reinforcing bad debugging practices.
> > Maybe you're right, though I can't see how from my experience so far, > but following the flows is the only thing that got cases resolved. > Not event IDs making every possible error, and not task IDs (though > these do have their uses) - slow and meticulous mapping of flows to > log records. > > > > > - Original Message - > > > From: "Keith Robertson" > > > To: "VDSM Project Development" > > > > > > Sent: Thursday, February 9, 2012 1:34:43 PM > > > Subject: Re: [vdsm] flowID schema > > > > > > On 02/09/2012 12:18 PM, Andrew Cathrow wrote: > > > > > > > > - Original Message - > > > >> From: "Ayal Baron" > > > >> To: "Dan Kenigsberg" > > > >> Cc: "VDSM Project > > > >> Development" > > > >> Sent: Monday, February 6, 2012 10:35:54 AM > > > >> Subject: Re: [vdsm] flowID schema > > > >> > > > >> > > > >> > > > >> - Original Message - > > > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi > > > >>> wrote: > > > flowID makes no sense after the initial API call as stuff >
Re: [vdsm] flowID schema
> From: "Saggi Mizrahi"
> To: "Keith Robertson"
> Cc: "VDSM Project Development"
> Sent: Thursday, February 9, 2012 2:24:44 PM
> Subject: Re: [vdsm] flowID schema
>
> -1
>
> I agree that for a messaging environment having a Message ID is a
> must, because you sometimes don't have a particular target, so when
> you get a response you need to know what this node is actually
> responding to.
>
> The message ID could be composed with so you can reuse the field.
>
> But that is all beside the point.
>
> I understand that someone might find it fun to go on following the
> entire flow in the Engine and in VDSM. But I would like to hear an
> actual use case where someone would have actually benefited from
> this.
> As I see it, having VDSM return the task ID with every response (and
> not just for async tasks) is a lot more useful and correct.

Actually, the only way to understand what happened in a certain flow is
to follow it through, from the engine log where an action was initiated
down to the hosts that did the execution. Everything RHEV does is a
flow, and with no correlation between hosts executing parts of the same
flow, troubleshooting turns into guesswork, because the only contact
point left is time, which is useless when you're talking about vdsm -
there are sometimes hundreds of log records in a single second, and not
every host is in absolute sync with every other.

> A generic debugging scenario as I see it.
>
> 1. Something went wrong
> 2. You go looking in the ENGINE log trying to figure out what
>    happened.
> 3. You see that ENGINE got SomeError.

OK, the rest are all downhill.

4. You follow the failure back to the start of the flow, then go with
   the flow to the point where the engine exited to vdsm
5. Switch over to vdsm logs, make sure you have the timing right
   (with no flow ID that's the only orientation after all)
6. Find the start of the vdsm-side flow, follow it to the failure,
   pray the error makes sense.

In many cases the answer is not in the vdsm failure traceback but
somewhere in the middle of the flow, with no errors reported. This is
why we need a way to easily follow things through. Moreover, the logs
should be readable enough to make sense to a typical sysadmin, and not
only to a RHEV expert.

> 4. Check to see if this error makes sense imagining that VDSM is
>    always right and is a black box.
> 5. You did your digging and now you think that VDSM is at fault.
> 6. Go look for the call that failed. (If we returned the taskID it's
>    pretty simple to find that call).
> 7. Look around the call to check VDSM state.
> 8. Profit.
>
> There is never a point where you want to follow a whole flow call by
> call going back and forth, and even if you did, having the VDSM
> taskID is a better anchor than flowID.

Not everything is a task; flow IDs would unify entire flows and make
following them easy.

> VDSM is built in a way that every call takes into account the
> current state only. Debugging it with an engine flow mindset is just
> wrong and distracting. I see it doing more harm than good by
> reinforcing bad debugging practices.

Maybe you're right, though I can't see how from my experience so far.
Following the flows is the only thing that got cases resolved. Not event
IDs marking every possible error, and not task IDs (though these do have
their uses) - slow and meticulous mapping of flows to log records.
> > - Original Message - > > From: "Keith Robertson" > > To: "VDSM Project Development" > > Sent: Thursday, February 9, 2012 1:34:43 PM > > Subject: Re: [vdsm] flowID schema > > > > On 02/09/2012 12:18 PM, Andrew Cathrow wrote: > > > > > > - Original Message - > > >> From: "Ayal Baron" > > >> To: "Dan Kenigsberg" > > >> Cc: "VDSM Project > > >> Development" > > >> Sent: Monday, February 6, 2012 10:35:54 AM > > >> Subject: Re: [vdsm] flowID schema > > >> > > >> > > >> > > >> - Original Message - > > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote: > > flowID makes no sense after the initial API call as stuff like > > cacheing\threadpools\samplingtasks\resources\asyncTasks so > > flowing > > a flow like that will not give you the entire picture while > > debugging. > > > > Also adding it now will make everything even more ugly. > > You know what, just imagine I wrote one of my long rambles > > about > > why I don't agree with doing this. > > >>> I cannot imagine you write anything like that. Really. I do not > > >>> understand why you object logging flowID on API entry point. > > >> The question is, what problem is this really trying to solve and > > >> is > > >> there a simpler and less obtrusive solution to that problem? > > > correlating logs between ovirt engine and potentially multiple > > > vdsm > > > nodes is a nightmare. It requires a lot skill to follow a > > > transaction through from the front end all the way to the node, > > > and even multiple nodes (eg actions o
Re: [vdsm] flowID schema
- Original Message -
>
> - Original Message -
> > From: "Saggi Mizrahi"
> > To: "Keith Robertson"
> > Cc: "VDSM Project Development"
> > Sent: Thursday, February 9, 2012 2:24:44 PM
> > Subject: Re: [vdsm] flowID schema
> >
> > -1
> >
> > I agree that for a messaging environment having a Message ID is a
> > must, because you sometimes don't have a particular target, so when
> > you get a response you need to know what this node is actually
> > responding to.
> >
> > The message ID could be composed with so you can reuse the field.
> >
> > But that is all beside the point.
> >
> > I understand that someone might find it fun to go on following the
> > entire flow in the Engine and in VDSM. But I would like to hear an
> > actual use case where someone would have actually benefited from
> > this.
> > As I see it, having VDSM return the task ID with every response (and
> > not just for async tasks) is a lot more useful and correct.
>
> I'd like to hear from some folks who've had to support rhev and do
> exactly this.
> Dan, Simon, care to chip in?

Agreed. However, I can personally say that I've debugged dozens of
cross-component issues, and having an anchor in the engine log (the
result from vdsm) was all I needed to find the relevant part in the
vdsm log.

> > A generic debugging scenario as I see it.
> >
> > 1. Something went wrong
> > 2. You go looking in the ENGINE log trying to figure out what
> >    happened.
> > 3. You see that ENGINE got SomeError.
> > 4. Check to see if this error makes sense imagining that VDSM is
> >    always right and is a black box.
> > 5. You did your digging and now you think that VDSM is at fault.
> > 6. Go look for the call that failed. (If we returned the taskID it's
> >    pretty simple to find that call).
> > 7. Look around the call to check VDSM state.
> > 8. Profit.
> > > > There is never a point where you want to follow a whole flow call > > by > > call going back and forth, and even if you did having the VDSM > > taskID is a better anchor then flowID. > > > > VDSM is built in a way that every call takes in to account the > > current state only. Debugging it with an engine flow mindset is > > just > > wrong and distracting. I see it doing more harm the good by > > reinforcing bad debugging practices. > > > > - Original Message - > > > From: "Keith Robertson" > > > To: "VDSM Project Development" > > > > > > Sent: Thursday, February 9, 2012 1:34:43 PM > > > Subject: Re: [vdsm] flowID schema > > > > > > On 02/09/2012 12:18 PM, Andrew Cathrow wrote: > > > > > > > > - Original Message - > > > >> From: "Ayal Baron" > > > >> To: "Dan Kenigsberg" > > > >> Cc: "VDSM Project > > > >> Development" > > > >> Sent: Monday, February 6, 2012 10:35:54 AM > > > >> Subject: Re: [vdsm] flowID schema > > > >> > > > >> > > > >> > > > >> - Original Message - > > > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi > > > >>> wrote: > > > flowID makes no sense after the initial API call as stuff > > > like > > > cacheing\threadpools\samplingtasks\resources\asyncTasks so > > > flowing > > > a flow like that will not give you the entire picture while > > > debugging. > > > > > > Also adding it now will make everything even more ugly. > > > You know what, just imagine I wrote one of my long rambles > > > about > > > why I don't agree with doing this. > > > >>> I cannot imagine you write anything like that. Really. I do > > > >>> not > > > >>> understand why you object logging flowID on API entry point. > > > >> The question is, what problem is this really trying to solve > > > >> and > > > >> is > > > >> there a simpler and less obtrusive solution to that problem? > > > > correlating logs between ovirt engine and potentially multiple > > > > vdsm > > > > nodes is a nightmare. 
It requires a lot skill to follow a > > > > transaction through from the front end all the way to the node, > > > > and even multiple nodes (eg actions on spm, then actions on > > > > other > > > > node to run a vm). > > > > Having a way to correlate the logs and follow a single > > > > event/flow > > > > is vital. > > > > > > > +1 > > > > > > Knowing what command caused a sequence of events in VDSM would be > > > really > > > helpful particularly in a threaded environment. Further, > > > wouldn't > > > such > > > an ID be helpful in an asynchronous request/response model? I'm > > > not > > > sure what the plans are for AMQP or even if there are plans, but > > > I'd > > > think that something like this would be crucial for an async > > > response. > > > So, if you implemented it you might be killing 2 birds with 1 > > > stone. > > > > > > FYI: If you want to see examples of other systems that use > > > similar > > > concepts, take a look at the correlation ID in JMS. > > > > > > Cheers, > > > Keith > > > > > > > > > >>> ___ > > > >>> vds
Re: [vdsm] flowID schema
- Original Message -
> From: "Saggi Mizrahi"
> To: "Keith Robertson"
> Cc: "VDSM Project Development"
> Sent: Thursday, February 9, 2012 2:24:44 PM
> Subject: Re: [vdsm] flowID schema
>
> -1
>
> I agree that for a messaging environment having a Message ID is a
> must, because you sometimes don't have a particular target, so when
> you get a response you need to know what this node is actually
> responding to.
>
> The message ID could be composed with so you can reuse the field.
>
> But that is all beside the point.
>
> I understand that someone might find it fun to go on following the
> entire flow in the Engine and in VDSM. But I would like to hear an
> actual use case where someone would have actually benefited from
> this.
> As I see it, having VDSM return the task ID with every response (and
> not just for async tasks) is a lot more useful and correct.

I'd like to hear from some folks who've had to support RHEV and do
exactly this.
Dan, Simon, care to chip in?

> A generic debugging scenario as I see it.
>
> 1. Something went wrong
> 2. You go looking in the ENGINE log trying to figure out what
>    happened.
> 3. You see that ENGINE got SomeError.
> 4. Check to see if this error makes sense imagining that VDSM is
>    always right and is a black box.
> 5. You did your digging and now you think that VDSM is at fault.
> 6. Go look for the call that failed. (If we returned the taskID it's
>    pretty simple to find that call).
> 7. Look around the call to check VDSM state.
> 8. Profit.
>
> There is never a point where you want to follow a whole flow call by
> call going back and forth, and even if you did, having the VDSM
> taskID is a better anchor than flowID.
>
> VDSM is built in a way that every call takes into account the
> current state only. Debugging it with an engine flow mindset is just
> wrong and distracting. I see it doing more harm than good by
> reinforcing bad debugging practices.
> > - Original Message - > > From: "Keith Robertson" > > To: "VDSM Project Development" > > Sent: Thursday, February 9, 2012 1:34:43 PM > > Subject: Re: [vdsm] flowID schema > > > > On 02/09/2012 12:18 PM, Andrew Cathrow wrote: > > > > > > - Original Message - > > >> From: "Ayal Baron" > > >> To: "Dan Kenigsberg" > > >> Cc: "VDSM Project > > >> Development" > > >> Sent: Monday, February 6, 2012 10:35:54 AM > > >> Subject: Re: [vdsm] flowID schema > > >> > > >> > > >> > > >> - Original Message - > > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote: > > flowID makes no sense after the initial API call as stuff like > > cacheing\threadpools\samplingtasks\resources\asyncTasks so > > flowing > > a flow like that will not give you the entire picture while > > debugging. > > > > Also adding it now will make everything even more ugly. > > You know what, just imagine I wrote one of my long rambles > > about > > why I don't agree with doing this. > > >>> I cannot imagine you write anything like that. Really. I do not > > >>> understand why you object logging flowID on API entry point. > > >> The question is, what problem is this really trying to solve and > > >> is > > >> there a simpler and less obtrusive solution to that problem? > > > correlating logs between ovirt engine and potentially multiple > > > vdsm > > > nodes is a nightmare. It requires a lot skill to follow a > > > transaction through from the front end all the way to the node, > > > and even multiple nodes (eg actions on spm, then actions on other > > > node to run a vm). > > > Having a way to correlate the logs and follow a single event/flow > > > is vital. > > > > > +1 > > > > Knowing what command caused a sequence of events in VDSM would be > > really > > helpful particularly in a threaded environment. Further, wouldn't > > such > > an ID be helpful in an asynchronous request/response model? 
I'm > > not > > sure what the plans are for AMQP or even if there are plans, but > > I'd > > think that something like this would be crucial for an async > > response. > > So, if you implemented it you might be killing 2 birds with 1 > > stone. > > > > FYI: If you want to see examples of other systems that use similar > > concepts, take a look at the correlation ID in JMS. > > > > Cheers, > > Keith > > > > > > >>> ___ > > >>> vdsm-devel mailing list > > >>> vdsm-devel@lists.fedorahosted.org > > >>> https://fedorahosted.org/mailman/listinfo/vdsm-devel > > >>> > > >> ___ > > >> vdsm-devel mailing list > > >> vdsm-devel@lists.fedorahosted.org > > >> https://fedorahosted.org/mailman/listinfo/vdsm-devel > > >> > > > ___ > > > vdsm-devel mailing list > > > vdsm-devel@lists.fedorahosted.org > > > https://fedorahosted.org/mailman/listinfo/vdsm-devel > > > > ___ > > vdsm-devel mailing li
Re: [vdsm] flowID schema
- Original Message -
> -1
>
> I agree that for a messaging environment having a Message ID is a
> must, because you sometimes don't have a particular target, so when
> you get a response you need to know what this node is actually
> responding to.
>
> The message ID could be composed with so you can reuse the field.
>
> But that is all beside the point.
>
> I understand that someone might find it fun to go on following the
> entire flow in the Engine and in VDSM. But I would like to hear an
> actual use case where someone would have actually benefited from
> this.
> As I see it, having VDSM return the task ID with every response (and
> not just for async tasks) is a lot more useful and correct.
>
> A generic debugging scenario as I see it.
>
> 1. Something went wrong
> 2. You go looking in the ENGINE log trying to figure out what
>    happened.
> 3. You see that ENGINE got SomeError.
> 4. Check to see if this error makes sense imagining that VDSM is
>    always right and is a black box.
> 5. You did your digging and now you think that VDSM is at fault.
> 6. Go look for the call that failed. (If we returned the taskID it's
>    pretty simple to find that call).
> 7. Look around the call to check VDSM state.
> 8. Profit.
>
> There is never a point where you want to follow a whole flow call by
> call going back and forth, and even if you did, having the VDSM
> taskID is a better anchor than flowID.
>
> VDSM is built in a way that every call takes into account the
> current state only. Debugging it with an engine flow mindset is just
> wrong and distracting. I see it doing more harm than good by
> reinforcing bad debugging practices.

I don't know about harm, but today the engine logs every call to vdsm
and every return value from it. This means that all the info needed to
follow a flow is already present in the engine log (which was not the
case previously), so I believe that the flow ID is redundant.
In addition, instead of focusing on how to track a flow between
components, we should focus on how to improve the engine log so that
users don't need to go to the hosts in the first place.
My biggest problem with it is that it changes each and every verb in the
API and also makes the log itself more verbose and less readable.

> - Original Message -
> > From: "Keith Robertson"
> > To: "VDSM Project Development"
> > Sent: Thursday, February 9, 2012 1:34:43 PM
> > Subject: Re: [vdsm] flowID schema
> >
> > On 02/09/2012 12:18 PM, Andrew Cathrow wrote:
> > >
> > > - Original Message -
> > >> From: "Ayal Baron"
> > >> To: "Dan Kenigsberg"
> > >> Cc: "VDSM Project Development"
> > >> Sent: Monday, February 6, 2012 10:35:54 AM
> > >> Subject: Re: [vdsm] flowID schema
> > >>
> > >> - Original Message -
> > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote:
> > >>>> flowID makes no sense after the initial API call as stuff like
> > >>>> caching\threadpools\samplingtasks\resources\asyncTasks so
> > >>>> flowing a flow like that will not give you the entire picture
> > >>>> while debugging.
> > >>>>
> > >>>> Also adding it now will make everything even more ugly.
> > >>>> You know what, just imagine I wrote one of my long rambles
> > >>>> about why I don't agree with doing this.
> > >>> I cannot imagine you write anything like that. Really. I do not
> > >>> understand why you object logging flowID on API entry point.
> > >> The question is, what problem is this really trying to solve and
> > >> is there a simpler and less obtrusive solution to that problem?
> > > correlating logs between ovirt engine and potentially multiple
> > > vdsm nodes is a nightmare. It requires a lot of skill to follow a
> > > transaction through from the front end all the way to the node,
> > > and even multiple nodes (eg actions on spm, then actions on other
> > > node to run a vm).
> > > Having a way to correlate the logs and follow a single event/flow
> > > is vital.
> > > > > +1 > > > > Knowing what command caused a sequence of events in VDSM would be > > really > > helpful particularly in a threaded environment. Further, wouldn't > > such > > an ID be helpful in an asynchronous request/response model? I'm > > not > > sure what the plans are for AMQP or even if there are plans, but > > I'd > > think that something like this would be crucial for an async > > response. > > So, if you implemented it you might be killing 2 birds with 1 > > stone. > > > > FYI: If you want to see examples of other systems that use similar > > concepts, take a look at the correlation ID in JMS. > > > > Cheers, > > Keith > > > > > > >>> ___ > > >>> vdsm-devel mailing list > > >>> vdsm-devel@lists.fedorahosted.org > > >>> https://fedorahosted.org/mailman/listinfo/vdsm-devel > > >>> > > >> ___ > > >> vdsm-devel mailing list > > >> vdsm-devel@lists.fedorahosted.or
Re: [vdsm] flowID schema
-1

I agree that for a messaging environment having a Message ID is a must,
because you sometimes don't have a particular target, so when you get a
response you need to know what this node is actually responding to.

The message ID could be composed with so you can reuse the field.

But that is all beside the point.

I understand that someone might find it fun to go on following the
entire flow in the Engine and in VDSM. But I would like to hear an
actual use case where someone would have actually benefited from this.
As I see it, having VDSM return the task ID with every response (and not
just for async tasks) is a lot more useful and correct.

A generic debugging scenario as I see it.

1. Something went wrong
2. You go looking in the ENGINE log trying to figure out what happened.
3. You see that ENGINE got SomeError.
4. Check to see if this error makes sense imagining that VDSM is always
   right and is a black box.
5. You did your digging and now you think that VDSM is at fault.
6. Go look for the call that failed. (If we returned the taskID it's
   pretty simple to find that call).
7. Look around the call to check VDSM state.
8. Profit.

There is never a point where you want to follow a whole flow call by
call going back and forth, and even if you did, having the VDSM taskID is
a better anchor than flowID.

VDSM is built in a way that every call takes into account the current
state only. Debugging it with an engine flow mindset is just wrong and
distracting. I see it doing more harm than good by reinforcing bad
debugging practices.

- Original Message - > From: "Keith Robertson" > To: "VDSM Project Development" > Sent: Thursday, February 9, 2012 1:34:43 PM > Subject: Re: [vdsm] flowID schema > > On 02/09/2012 12:18 PM, Andrew Cathrow wrote: > > > > - Original Message - > >> From: "Ayal Baron" > >> To: "Dan Kenigsberg" > >> Cc: "VDSM Project Development" > >> Sent: Monday, February 6, 2012 10:35:54 AM > >> Subject: Re: [vdsm] flowID schema > >> > >> > >> > >> - Original Message - > >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote: > flowID makes no sense after the initial API call as stuff like > cacheing\threadpools\samplingtasks\resources\asyncTasks so > flowing > a flow like that will not give you the entire picture while > debugging. > > Also adding it now will make everything even more ugly. > You know what, just imagine I wrote one of my long rambles about > why I don't agree with doing this. > >>> I cannot imagine you write anything like that. Really. I do not > >>> understand why you object logging flowID on API entry point. > >> The question is, what problem is this really trying to solve and > >> is > >> there a simpler and less obtrusive solution to that problem? > > correlating logs between ovirt engine and potentially multiple vdsm > > nodes is a nightmare. It requires a lot skill to follow a > > transaction through from the front end all the way to the node, > > and even multiple nodes (eg actions on spm, then actions on other > > node to run a vm). > > Having a way to correlate the logs and follow a single event/flow > > is vital. > > > +1 > > Knowing what command caused a sequence of events in VDSM would be > really > helpful particularly in a threaded environment. Further, wouldn't > such > an ID be helpful in an asynchronous request/response model? I'm not > sure what the plans are for AMQP or even if there are plans, but I'd > think that something like this would be crucial for an async > response. 
> So, if you implemented it you might be killing 2 birds with 1 stone. > > FYI: If you want to see examples of other systems that use similar > concepts, take a look at the correlation ID in JMS. > > Cheers, > Keith ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
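The taskID idea in the reply above (step 6 of the debugging scenario) can be sketched as follows. This is only an illustration under stated assumptions, not VDSM code: `call_with_task_id`, `find_call`, and the response layout are hypothetical names invented here, and the ID is simply a random UUID.

```python
import logging
import uuid

log = logging.getLogger("sketch.api")


def call_with_task_id(func, *args, **kwargs):
    """Run func and attach a freshly generated task ID to the response.

    Hypothetical wrapper: every response carries a taskID, not only
    responses for asynchronous tasks.
    """
    task_id = str(uuid.uuid4())
    log.debug("task %s: calling %s", task_id, func.__name__)
    try:
        result = func(*args, **kwargs)
        status = {"code": 0, "message": "OK"}
    except Exception as exc:
        # The same ID appears in the log and in the error response.
        log.exception("task %s: %s failed", task_id, func.__name__)
        result = None
        status = {"code": 1, "message": str(exc)}
    return {"taskID": task_id, "status": status, "result": result}


def find_call(log_lines, task_id):
    """Return the log lines that mention task_id."""
    return [line for line in log_lines if task_id in line]
```

With something along these lines, whoever sees SomeError on the engine side could take the taskID from the failed response and search the node's log for that one call, then look at the lines around it.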
Re: [vdsm] flowID schema
On 02/09/2012 12:18 PM, Andrew Cathrow wrote:

- Original Message - From: "Ayal Baron" To: "Dan Kenigsberg" Cc: "VDSM Project Development" Sent: Monday, February 6, 2012 10:35:54 AM Subject: Re: [vdsm] flowID schema

- Original Message - On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote:

flowID makes no sense after the initial API call as stuff like cacheing\threadpools\samplingtasks\resources\asyncTasks so flowing a flow like that will not give you the entire picture while debugging. Also adding it now will make everything even more ugly. You know what, just imagine I wrote one of my long rambles about why I don't agree with doing this.

I cannot imagine you write anything like that. Really. I do not understand why you object logging flowID on API entry point.

The question is, what problem is this really trying to solve and is there a simpler and less obtrusive solution to that problem?

correlating logs between ovirt engine and potentially multiple vdsm nodes is a nightmare. It requires a lot skill to follow a transaction through from the front end all the way to the node, and even multiple nodes (eg actions on spm, then actions on other node to run a vm). Having a way to correlate the logs and follow a single event/flow is vital.

+1

Knowing what command caused a sequence of events in VDSM would be really helpful, particularly in a threaded environment. Further, wouldn't such an ID be helpful in an asynchronous request/response model? I'm not sure what the plans are for AMQP, or even if there are plans, but I'd think that something like this would be crucial for an async response. So, if you implemented it you might be killing 2 birds with 1 stone.

FYI: If you want to see examples of other systems that use similar concepts, take a look at the correlation ID in JMS.
Cheers, Keith ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] flowID schema
On 02/09/2012 05:18 PM, Andrew Cathrow wrote: >> > The question is, what problem is this really trying to solve and is >> > there a simpler and less obtrusive solution to that problem? > correlating logs between ovirt engine and potentially multiple vdsm nodes is > a nightmare. It requires a lot skill to follow a transaction through from the > front end all the way to the node, and even multiple nodes (eg actions on > spm, then actions on other node to run a vm). > Having a way to correlate the logs and follow a single event/flow is vital. > +1 anything that allows us to easily follow flows between engine, vdsm and back again would be welcomed by those supporting it! Lee ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] flowID schema
- Original Message - > From: "Ayal Baron" > To: "Dan Kenigsberg" > Cc: "VDSM Project Development" > Sent: Monday, February 6, 2012 10:35:54 AM > Subject: Re: [vdsm] flowID schema > > > > - Original Message - > > On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote: > > > flowID makes no sense after the initial API call as stuff like > > > cacheing\threadpools\samplingtasks\resources\asyncTasks so > > > flowing > > > a flow like that will not give you the entire picture while > > > debugging. > > > > > > Also adding it now will make everything even more ugly. > > > You know what, just imagine I wrote one of my long rambles about > > > why I don't agree with doing this. > > > > I cannot imagine you write anything like that. Really. I do not > > understand why you object logging flowID on API entry point. > > The question is, what problem is this really trying to solve and is > there a simpler and less obtrusive solution to that problem?

Correlating logs between the oVirt engine and potentially multiple VDSM nodes is a nightmare. It requires a lot of skill to follow a transaction from the front end all the way to the node, and even across multiple nodes (e.g. actions on the SPM, then actions on another node to run a VM). Having a way to correlate the logs and follow a single event/flow is vital.

___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel
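For reference, logging a flowID at the API entry point, as discussed in this thread, might look roughly like the sketch below. It is an assumption-laden illustration rather than anything taken from VDSM: `FlowAdapter` and `api_entry` are hypothetical names, and the sketch relies only on the standard library's `logging.LoggerAdapter`.

```python
import logging


class FlowAdapter(logging.LoggerAdapter):
    """Prefix every message with the flow ID carried in `extra`."""

    def process(self, msg, kwargs):
        return "flowID=%s %s" % (self.extra["flowID"], msg), kwargs


def api_entry(logger, flow_id, verb):
    """Hypothetical API entry point: log the verb once, tagged with flow_id.

    The returned adapter can be used for further messages in the same
    call; work handed off to thread pools or cached samplers would not
    carry the ID unless it were passed along explicitly.
    """
    flow_log = FlowAdapter(logger, {"flowID": flow_id})
    flow_log.info("API call %s", verb)
    return flow_log
```

Searching the engine log and each node's log for the same `flowID=` value would then pull out the entry-point lines belonging to one engine-side action. As noted earlier in the thread, this only covers the initial API call, not everything that call later triggers.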
[vdsm] Adding gluster support
Hi, The following wiki page describes the approach I'm suggesting for adding Gluster support to oVirt in phases. http://www.ovirt.org/wiki/AddingGlusterSupportToOvirt Comments welcome. Thanks, Itamar ___ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/vdsm-devel