Re: [vdsm] vdsm hangs in SamplingMethod after reinstall

2012-02-09 Thread Adam Litke
On Thu, Feb 09, 2012 at 07:15:48PM -0500, Ayal Baron wrote:
> 
> 
> - Original Message -
> > Hi.  I am running into a very annoying problem when working on vdsm
> > lately.  My
> > development process involves stopping vdsm, replacing files, and
> > restarting it.
> > I do this pretty frequently.  Sometimes, after restarting vdsm the
> > XMLRPC call
> > getStorageDomainsList() hangs.  The following line is the last to
> > print in the
> > log:
> > 
> > Thread-18::DEBUG::2012-02-09
> > 17:11:46,793::misc::1017::SamplingMethod::(__call__) Trying to enter
> > sampling method (storage.sdc.refreshStorage)
> > 
> > The only solution I've been able to come up with is restarting my
> > machine.  When
> > stopping vdsm I search for any stale threads but I am unable to find
> > them.  Do
> > you know what else might be causing DynamicBarrier.enter() to hang
> > for a long
> > period of time?  Do the threading primitives use some sort of
> > temporary disk
> > storage that needs to be cleaned up?  Thanks for the help!
> 
> Try to add some logging in sdc.py:
> def refreshStorage(self):
> >>> ADD LOG HERE

Yep, I have done this, and I am not even getting into the refreshStorage function.
We actually hang in DynamicBarrier.enter().  I am going to add some debugging to
determine which locking operation gets stuck.
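
One way to see where a thread is blocked, short of restarting the machine, is to dump the stack of every live thread. What follows is only a sketch in plain Python, not vdsm code: dump_thread_stacks is a made-up helper name, and it relies on sys._current_frames(), which is specific to CPython.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text report with the current stack of every live thread."""
    # Map thread identifiers to names so the report is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    report = []
    for ident, frame in sys._current_frames().items():
        report.append("Thread %s (%s):" % (names.get(ident, "unknown"), ident))
        report.extend(line.rstrip() for line in traceback.format_stack(frame))
    return "\n".join(report)
```

Calling something like this from a debug hook while the XMLRPC call is hung should show which frame each thread is sitting in, which may point at the lock that never gets released.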

> multipath.rescan()
> 
> I have a feeling that your issue is not with SamplingMethod
> 
> > 
> > --
> > Adam Litke 
> > IBM Linux Technology Center
> > 
> > ___
> > vdsm-devel mailing list
> > vdsm-devel@lists.fedorahosted.org
> > https://fedorahosted.org/mailman/listinfo/vdsm-devel
> > 
> 

-- 
Adam Litke 
IBM Linux Technology Center

___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/vdsm-devel


Re: [vdsm] vdsm hangs in SamplingMethod after reinstall

2012-02-09 Thread Ayal Baron


- Original Message -
> Hi.  I am running into a very annoying problem when working on vdsm
> lately.  My
> development process involves stopping vdsm, replacing files, and
> restarting it.
> I do this pretty frequently.  Sometimes, after restarting vdsm the
> XMLRPC call
> getStorageDomainsList() hangs.  The following line is the last to
> print in the
> log:
> 
> Thread-18::DEBUG::2012-02-09
> 17:11:46,793::misc::1017::SamplingMethod::(__call__) Trying to enter
> sampling method (storage.sdc.refreshStorage)
> 
> The only solution I've been able to come up with is restarting my
> machine.  When
> stopping vdsm I search for any stale threads but I am unable to find
> them.  Do
> you know what else might be causing DynamicBarrier.enter() to hang
> for a long
> period of time?  Do the threading primitives use some sort of
> temporary disk
> storage that needs to be cleaned up?  Thanks for the help!

Try to add some logging in sdc.py:
def refreshStorage(self):
>>> ADD LOG HERE
multipath.rescan()

I have a feeling that your issue is not with SamplingMethod
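
A rough sketch of the kind of logging meant above, assuming nothing about the real sdc.py beyond the two lines quoted: log_calls, the logger name, and FakeMultipath are all invented for illustration, and the stand-in rescan() does not touch any storage.

```python
import functools
import logging

log = logging.getLogger("storage.sdc.debug")  # hypothetical logger name


def log_calls(func):
    """Log entry to and exit from func, and log any exception it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("entering %s", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            log.exception("%s raised", func.__name__)
            raise
        log.debug("leaving %s", func.__name__)
        return result
    return wrapper


class FakeMultipath:
    """Stand-in for the real multipath module, for illustration only."""
    def rescan(self):
        return "rescanned"


multipath = FakeMultipath()


@log_calls
def refreshStorage():
    # If "entering" is logged but "leaving" never is, the hang is in here.
    return multipath.rescan()
```

If neither message appears, the call never reached refreshStorage at all and the problem is earlier, for example in the code that guards entry to it.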

> 
> --
> Adam Litke 
> IBM Linux Technology Center
> 


[vdsm] vdsm hangs in SamplingMethod after reinstall

2012-02-09 Thread Adam Litke
Hi.  I am running into a very annoying problem when working on vdsm lately.  My
development process involves stopping vdsm, replacing files, and restarting it.
I do this pretty frequently.  Sometimes, after restarting vdsm the XMLRPC call
getStorageDomainsList() hangs.  The following line is the last to print in the
log:

Thread-18::DEBUG::2012-02-09 17:11:46,793::misc::1017::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)

The only solution I've been able to come up with is restarting my machine.  When
stopping vdsm I search for any stale threads but I am unable to find them.  Do
you know what else might be causing DynamicBarrier.enter() to hang for a long
period of time?  Do the threading primitives use some sort of temporary disk
storage that needs to be cleaned up?  Thanks for the help!

-- 
Adam Litke 
IBM Linux Technology Center



Re: [vdsm] flowID schema

2012-02-09 Thread Ayal Baron


- Original Message -
> > From: "Saggi Mizrahi" 
> > To: "Keith Robertson" 
> > Cc: "VDSM Project Development" 
> > Sent: Thursday, February 9, 2012 2:24:44 PM
> > Subject: Re: [vdsm] flowID schema
> >
> > -1
> >
> > I agree that for messaging environment having a Message ID is a
> > must
> > because you sometimes don't have a particular target so when you
> > get
> > a response you need to know what this node is actually responding
> > to.
> >
> > The message ID could be composed with  so you can
> > reuse the field.
> >
> > But that is all beside the point.
> >
> > I understand that someone might find it fun to go on following the
> > entire flow in the Engine and in VDSM. But I would like to hear an
> > actual use case where someone would have actually benefited from
> > this.
> > As I see it having VDSM return the task ID with every response (and
> > not just for async tasks) is a lot more useful and correct.
> 
> Actually, the only way to understand what happened in a certain flow
> is to follow it through. From the engine log where an action was
> initiated, down to the hosts that did the execution. Everything RHEV
> does is a flow, and with no correlation between hosts executing
> parts of the same flow, troubleshooting turns into guesswork,
> because the only contact point left is time, which is useless when
> you're talking about vdsm - there are sometimes hundreds of log
> records in a single second, and not every host is in absolute sync
> with every other.

What are you talking about? You know exactly what operation the engine ran at 
vdsm level. If it's a task then you also have a task ID, which is a UUID, so you 
don't need anything else.
In addition, now that the engine logs results, you can just grep for that instead 
of a flow ID and land at the exact command, without having to figure out which 
of the 5 run in this flow is the relevant one.

If you could give a real example where this would be beneficial (i.e. log 
excerpts, how you correlated them and how flow id would have eased your job) 
that would be great.
Note that I've also discussed this with Yaniv from QE, who said they don't 
really need it.
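
To make the grep-based lookup above concrete, here is a small sketch. The helper name lines_mentioning and every log record below are invented for illustration; real engine and vdsm logs are formatted differently.

```python
def lines_mentioning(lines, identifier):
    """Return the log lines that contain identifier (case-insensitive)."""
    wanted = identifier.lower()
    return [line for line in lines if wanted in line.lower()]


# Invented sample records, not real engine or vdsm output.
TASK_ID = "0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0"
engine_log = [
    "INFO engine: sent someVerb to host-1",
    "INFO engine: someVerb returned task " + TASK_ID,
    "INFO engine: sent otherVerb to host-2",
]
vdsm_log = [
    "Thread-18::DEBUG::Task " + TASK_ID + "::moving to state running",
    "Thread-19::DEBUG::unrelated sampling record",
    "Thread-18::ERROR::Task " + TASK_ID + "::aborting",
]

# Print every record, from either side, that mentions the task.
for line in lines_mentioning(engine_log + vdsm_log, TASK_ID):
    print(line)
```

This only works for calls that actually carry such an identifier in both logs, which is the assumption being made here.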

> 
> >
> > A generic debugging scenario as I see it.
> >
> > 1. Something went wrong
> > 2. You go looking in the ENGINE log trying to figure out what
> > happened.
> > 3. You see that ENGINE got SomeError.
> 
> ok, the rest are all downhill.
> 
> 4. You follow the failure back to the start of the flow, then go with
> the flow to the point where the engine exited to vdsm
> 5. switch over to vdsm logs, make sure you have the timing right
> (with no flow ID that's the only orientation after all)
> 6. find the start of the vdsm-side flow, follow it to the failure,
> pray the error makes sense.
> 
> In many cases the answer is not in the vdsm failure traceback but
> somewhere in the middle of the flow, with no errors reported, this
> is why we need a way to easily follow things through. Moreover, the
> logs should be readable enough to make sense to a typical sysadmin,
> and not a RHEV expert.
> 
> > 4. Check to see if this error makes sense imagining that VDSM is
> > always right and is a black box.
> > 5. You did your digging and now you think that VDSM is at fault.
> > 6. Go look for the call that failed. (If we returned the taskID
> > it's
> > pretty simple to find that call).
> > 7. Look around the call to check VDSM state.
> > 8. Profit.
> >
> > There is never a point where you want to follow a whole flow call
> > by
> > call going back and forth, and even if you did having the VDSM
> > taskID is a better anchor than flowID.
> 
> not everything is a task, flow IDs would unify entire flows, and make
> following them easy.
> 
> >
> > VDSM is built in a way that every call takes into account the
> > current state only. Debugging it with an engine flow mindset is
> > just
> > wrong and distracting. I see it doing more harm than good by
> > reinforcing bad debugging practices.
> 
> Maybe you're right, though I can't see how from my experience so far,
> but following the flows is the only thing that got cases resolved.
> Not event IDs making every possible error, and not task IDs (though
> these do have their uses) - slow and meticulous mapping of flows to
> log records.
> 

Re: [vdsm] flowID schema

2012-02-09 Thread Dan Yasny
> From: "Saggi Mizrahi" 
> To: "Keith Robertson" 
> Cc: "VDSM Project Development" 
> Sent: Thursday, February 9, 2012 2:24:44 PM
> Subject: Re: [vdsm] flowID schema
>
> -1
>
> I agree that for messaging environment having a Message ID is a must
> because you sometimes don't have a particular target so when you get
> a response you need to know what this node is actually responding
> to.
>
> The message ID could be composed with  so you can
> reuse the field.
>
> But that is all beside the point.
>
> I understand that someone might find it fun to go on following the
> entire flow in the Engine and in VDSM. But I would like to hear an
> actual use case where someone would have actually benefited from
> this.
> As I see it having VDSM return the task ID with every response (and
> not just for async tasks) is a lot more useful and correct.

Actually, the only way to understand what happened in a certain flow is to 
follow it through. From the engine log where an action was initiated, down to 
the hosts that did the execution. Everything RHEV does is a flow, and with no 
correlation between hosts executing parts of the same flow, troubleshooting 
turns into guesswork, because the only contact point left is time, which is 
useless when you're talking about vdsm - there are sometimes hundreds of log 
records in a single second, and not every host is in absolute sync with every 
other.

>
> A generic debugging scenario as I see it.
>
> 1. Something went wrong
> 2. You go looking in the ENGINE log trying to figure out what
> happened.
> 3. You see that ENGINE got SomeError.

ok, the rest are all downhill. 

4. You follow the failure back to the start of the flow, then go with the flow 
to the point where the engine exited to vdsm 
5. switch over to vdsm logs, make sure you have the timing right (with no flow 
ID that's the only orientation after all)
6. find the start of the vdsm-side flow, follow it to the failure, pray the 
error makes sense.

In many cases the answer is not in the vdsm failure traceback but somewhere in 
the middle of the flow, with no errors reported. This is why we need a way to 
easily follow things through. Moreover, the logs should be readable enough to 
make sense to a typical sysadmin, and not only to a RHEV expert.

> 4. Check to see if this error makes sense imagining that VDSM is
> always right and is a black box.
> 5. You did your digging and now you think that VDSM is at fault.
> 6. Go look for the call that failed. (If we returned the taskID it's
> pretty simple to find that call).
> 7. Look around the call to check VDSM state.
> 8. Profit.
>
> There is never a point where you want to follow a whole flow call by
> call going back and forth, and even if you did having the VDSM
> taskID is a better anchor than flowID.

not everything is a task, flow IDs would unify entire flows, and make following 
them easy.

>
> VDSM is built in a way that every call takes into account the
> current state only. Debugging it with an engine flow mindset is just
> wrong and distracting. I see it doing more harm than good by
> reinforcing bad debugging practices.

Maybe you're right, though I can't see how from my experience so far, but 
following the flows is the only thing that got cases resolved. Not event IDs 
making every possible error, and not task IDs (though these do have their uses) 
- slow and meticulous mapping of flows to log records.
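
As a rough illustration of what tagging log records with a flow ID could look like, here is a sketch using Python's standard logging module. It is not the proposed vdsm change: set_flow_id, FlowIdFilter, make_logger, and the log format are all assumptions made up for this example.

```python
import logging
import threading

_context = threading.local()  # holds the flow ID of the current thread


def set_flow_id(flow_id):
    """Remember the flow ID for the calling thread (e.g. at API entry)."""
    _context.flow_id = flow_id


class FlowIdFilter(logging.Filter):
    """Attach the current thread's flow ID to every record it sees."""
    def filter(self, record):
        record.flow_id = getattr(_context, "flow_id", "-")
        return True


def make_logger(name, handler):
    """Return a logger whose output includes the flow ID of each record."""
    handler.setFormatter(logging.Formatter(
        "%(threadName)s::%(levelname)s::flow=%(flow_id)s::%(message)s"))
    handler.addFilter(FlowIdFilter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger
```

Note that a thread-local like this does not follow work that is handed off to a thread pool or an async task, which is the limitation raised earlier in this thread about the ID losing its meaning after the initial API call.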


Re: [vdsm] flowID schema

2012-02-09 Thread Ayal Baron


- Original Message -
> 
> 
> - Original Message -
> > From: "Saggi Mizrahi" 
> > To: "Keith Robertson" 
> > Cc: "VDSM Project Development" 
> > Sent: Thursday, February 9, 2012 2:24:44 PM
> > Subject: Re: [vdsm] flowID schema
> > 
> > -1
> > 
> > I agree that for messaging environment having a Message ID is a
> > must
> > because you sometimes don't have a particular target so when you
> > get
> > a response you need to know what this node is actually responding
> > to.
> > 
> > The message ID could be composed with  so you can
> > reuse the field.
> > 
> > But that is all beside the point.
> > 
> > I understand that someone might find it fun to go on following the
> > entire flow in the Engine and in VDSM. But I would like to hear an
> > actual use case where someone would have actually benefited from
> > this.
> > As I see it having VDSM return the task ID with every response (and
> > not just for async tasks) is a lot more useful and correct.
> 
> I'd like to hear from some folks who've had to support rhev and do
> exactly this.
> Dan, Simon, care to chip in?

Agreed.  However, I can personally say that I've debugged dozens of issues 
across components, and having an anchor in the engine log (the result from vdsm) 
was all I needed to find the relevant part in the vdsm log.

> 
> 
> 
> > 
> > A generic debugging scenario as I see it.
> > 
> > 1. Something went wrong
> > 2. You go looking in the ENGINE log trying to figure out what
> > happened.
> > 3. You see that ENGINE got SomeError.
> > 4. Check to see if this error makes sense imagining that VDSM is
> > always right and is a black box.
> > 5. You did your digging and now you think that VDSM is at fault.
> > 6. Go look for the call that failed. (If we returned the taskID
> > it's
> > pretty simple to find that call).
> > 7. Look around the call to check VDSM state.
> > 8. Profit.
> > 
> > There is never a point where you want to follow a whole flow call
> > by
> > call going back and forth, and even if you did having the VDSM
> > taskID is a better anchor than flowID.
> > 
> > VDSM is built in a way that every call takes into account the
> > current state only. Debugging it with an engine flow mindset is
> > just
> > wrong and distracting. I see it doing more harm than good by
> > reinforcing bad debugging practices.
> > 

Re: [vdsm] flowID schema

2012-02-09 Thread Andrew Cathrow


- Original Message -
> From: "Saggi Mizrahi" 
> To: "Keith Robertson" 
> Cc: "VDSM Project Development" 
> Sent: Thursday, February 9, 2012 2:24:44 PM
> Subject: Re: [vdsm] flowID schema
> 
> -1
> 
> I agree that for messaging environment having a Message ID is a must
> because you sometimes don't have a particular target so when you get
> a response you need to know what this node is actually responding
> to.
> 
> The message ID could be composed with  so you can
> reuse the field.
> 
> But that is all beside the point.
> 
> I understand that someone might find it fun to go on following the
> entire flow in the Engine and in VDSM. But I would like to hear an
> actual use case where someone would have actually benefited from
> this.
> As I see it having VDSM return the task ID with every response (and
> not just for async tasks) is a lot more useful and correct.

I'd like to hear from some folks who've had to support rhev and do exactly this.
Dan, Simon, care to chip in?



> 
> A generic debugging scenario as I see it.
> 
> 1. Something went wrong
> 2. You go looking in the ENGINE log trying to figure out what
> happened.
> 3. You see that ENGINE got SomeError.
> 4. Check to see if this error makes sense imagining that VDSM is
> always right and is a black box.
> 5. You did your digging and now you think that VDSM is at fault.
> 6. Go look for the call that failed. (If we returned the taskID it's
> pretty simple to find that call).
> 7. Look around the call to check VDSM state.
> 8. Profit.
> 
> There is never a point where you want to follow a whole flow call by
> call going back and forth, and even if you did having the VDSM
> taskID is a better anchor than flowID.
> 
> VDSM is built in a way that every call takes into account the
> current state only. Debugging it with an engine flow mindset is just
> wrong and distracting. I see it doing more harm than good by
> reinforcing bad debugging practices.
> 

Re: [vdsm] flowID schema

2012-02-09 Thread Ayal Baron


- Original Message -
> -1
> 
> I agree that for messaging environment having a Message ID is a must
> because you sometimes don't have a particular target so when you get
> a response you need to know what this node is actually responding
> to.
> 
> The message ID could be composed with  so you can
> reuse the field.
> 
> But that is all beside the point.
> 
> I understand that someone might find it fun to go on following the
> entire flow in the Engine and in VDSM. But I would like to hear an
> actual use case where someone would have actually benefited from
> this.
> As I see it having VDSM return the task ID with every response (and
> not just for async tasks) is a lot more useful and correct.
> 
> A generic debugging scenario as I see it.
> 
> 1. Something went wrong
> 2. You go looking in the ENGINE log trying to figure out what
> happened.
> 3. You see that ENGINE got SomeError.
> 4. Check to see if this error makes sense imagining that VDSM is
> always right and is a black box.
> 5. You did your digging and now you think that VDSM is at fault.
> 6. Go look for the call that failed. (If we returned the taskID it's
> pretty simple to find that call).
> 7. Look around the call to check VDSM state.
> 8. Profit.
> 
> There is never a point where you want to follow a whole flow call by
> call going back and forth, and even if you did having the VDSM
> taskID is a better anchor than flowID.
> 
> VDSM is built in a way that every call takes into account the
> current state only. Debugging it with an engine flow mindset is just
> wrong and distracting. I see it doing more harm than good by
> reinforcing bad debugging practices.

I don't know about harm, but today the engine logs every call to vdsm and every 
return value from it.  This means that all the info needed to follow a flow 
is already present in the engine log (which was not the case previously), so I 
believe that the flow ID is redundant.
In addition, instead of focusing on how to track a flow between components, we 
should focus on how to improve the engine log so that the users don't need to 
go to the hosts in the first place.
My biggest problem with it is that it changes each and every verb in the API 
and makes the log itself also more verbose and less readable.

> 

Re: [vdsm] flowID schema

2012-02-09 Thread Saggi Mizrahi
-1

I agree that for messaging environment having a Message ID is a must because 
you sometimes don't have a particular target so when you get a response you 
need to know what this node is actually responding to.

The message ID could be composed with  so you can reuse the 
field.

But that is all beside the point.

I understand that someone might find it fun to go on following the entire flow 
in the Engine and in VDSM. But I would like to hear an actual use case where 
someone would have actually benefited from this.
As I see it having VDSM return the task ID with every response (and not just 
for async tasks) is a lot more useful and correct.

A generic debugging scenario as I see it.

1. Something went wrong
2. You go looking in the ENGINE log trying to figure out what happened.
3. You see that ENGINE got SomeError.
4. Check to see if this error makes sense imagining that VDSM is always right 
and is a black box.
5. You did your digging and now you think that VDSM is at fault.
6. Go look for the call that failed. (If we returned the taskID it's pretty 
simple to find that call).
7. Look around the call to check VDSM state.
8. Profit.

There is never a point where you want to follow a whole flow call by call going 
back and forth, and even if you did having the VDSM taskID is a better anchor 
than flowID.

VDSM is built in a way that every call takes into account the current state 
only. Debugging it with an engine flow mindset is just wrong and distracting. I 
see it doing more harm than good by reinforcing bad debugging practices.

- Original Message -
> From: "Keith Robertson" 
> To: "VDSM Project Development" 
> Sent: Thursday, February 9, 2012 1:34:43 PM
> Subject: Re: [vdsm] flowID schema
> 
> On 02/09/2012 12:18 PM, Andrew Cathrow wrote:
> >
> > - Original Message -
> >> From: "Ayal Baron"
> >> To: "Dan Kenigsberg"
> >> Cc: "VDSM Project Development"
> >> Sent: Monday, February 6, 2012 10:35:54 AM
> >> Subject: Re: [vdsm] flowID schema
> >>
> >>
> >>
> >> - Original Message -
> >>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote:
> >>>> flowID makes no sense after the initial API call as stuff like
> >>>> caching\threadpools\samplingtasks\resources\asyncTasks so
> >>>> following
> >>>> a flow like that will not give you the entire picture while
> >>>> debugging.
> >>>>
> >>>> Also adding it now will make everything even more ugly.
> >>>> You know what, just imagine I wrote one of my long rambles about
> >>>> why I don't agree with doing this.
> >>> I cannot imagine you write anything like that. Really. I do not
> >>> understand why you object logging flowID on API entry point.
> >> The question is, what problem is this really trying to solve and
> >> is
> >> there a simpler and less obtrusive solution to that problem?
> > correlating logs between ovirt engine and potentially multiple vdsm
> > nodes is a nightmare. It requires a lot of skill to follow a
> > transaction through from the front end all the way to the node,
> > and even multiple nodes (eg actions on spm, then actions on other
> > node to run a vm).
> > Having a way to correlate the logs and follow a single event/flow
> > is vital.
> >
> +1
> 
> Knowing what command caused a sequence of events in VDSM would be
> really
> helpful particularly in a threaded environment.  Further, wouldn't
> such
> an ID be helpful in an asynchronous request/response model?  I'm not
> sure what the plans are for AMQP or even if there are plans, but I'd
> think that something like this would be crucial for an async
> response.
> So, if you implemented it you might be killing 2 birds with 1 stone.
> 
> FYI: If you want to see examples of other systems that use similar
> concepts, take a look at the correlation ID in JMS.
> 
> Cheers,
> Keith
> 
> 


Re: [vdsm] flowID schema

2012-02-09 Thread Keith Robertson

On 02/09/2012 12:18 PM, Andrew Cathrow wrote:
>
> - Original Message -
>> From: "Ayal Baron"
>> To: "Dan Kenigsberg"
>> Cc: "VDSM Project Development"
>> Sent: Monday, February 6, 2012 10:35:54 AM
>> Subject: Re: [vdsm] flowID schema
>>
>> - Original Message -
>>> On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote:
>>>> flowID makes no sense after the initial API call as stuff like
>>>> cacheing\threadpools\samplingtasks\resources\asyncTasks so
>>>> flowing
>>>> a flow like that will not give you the entire picture while
>>>> debugging.
>>>>
>>>> Also adding it now will make everything even more ugly.
>>>> You know what, just imagine I wrote one of my long rambles about
>>>> why I don't agree with doing this.
>>> I cannot imagine you write anything like that. Really. I do not
>>> understand why you object logging flowID on API entry point.
>> The question is, what problem is this really trying to solve and is
>> there a simpler and less obtrusive solution to that problem?
> correlating logs between ovirt engine and potentially multiple vdsm nodes is a
> nightmare. It requires a lot skill to follow a transaction through from the
> front end all the way to the node, and even multiple nodes (eg actions on spm,
> then actions on other node to run a vm).
> Having a way to correlate the logs and follow a single event/flow is vital.


+1

Knowing what command caused a sequence of events in VDSM would be really 
helpful particularly in a threaded environment.  Further, wouldn't such 
an ID be helpful in an asynchronous request/response model?  I'm not 
sure what the plans are for AMQP or even if there are plans, but I'd 
think that something like this would be crucial for an async response.  
So, if you implemented it you might be killing 2 birds with 1 stone.


FYI: If you want to see examples of other systems that use similar 
concepts, take a look at the correlation ID in JMS.


Cheers,
Keith





Re: [vdsm] flowID schema

2012-02-09 Thread Lee Yarwood
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 02/09/2012 05:18 PM, Andrew Cathrow wrote:
>> > The question is, what problem is this really trying to solve and is
>> > there a simpler and less obtrusive solution to that problem?
> correlating logs between ovirt engine and potentially multiple vdsm nodes is 
> a nightmare. It requires a lot skill to follow a transaction through from the 
> front end all the way to the node, and even multiple nodes (eg actions on 
> spm, then actions on other node to run a vm).
> Having a way to correlate the logs and follow a single event/flow is vital.
> 

+1 anything that allows us to easily follow flows between engine, vdsm
and back again would be welcomed by those supporting it!

Lee
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJPNAiFAAoJELymbjP2ci12PFYH/09f/c7xlSMZ1QKJo6QHgBmC
fucAUiKDWFxpJ3zyOJsoJfCL6TmXdvdUAJQXhmy2Its3KGYdhKZLezqJHpIGM/AR
f694/O5a1jUYGJnxtQknr1H4wJar1ot0gIJEuiDxEM9haR4XTTYbhsyl8ApQ8wHP
EERHi6eTzLY0j3ohbDOqRSrGdsofTYQ653MNQXggfBa41R1KbOPkp8XIf4RJ7+0J
8e7xGqU3SwITDeJ9c7yj5bwWdpuzUGvwPZif8VeuR0RajjKHCagiexWbRoLmoGYi
dqc+QcxOL8zs1e9mOiB1Cgfc36bbCjGFBNYe8N3gpnod9NCQZ1f9FzbQ82+Dosg=
=CMZJ
-END PGP SIGNATURE-


Re: [vdsm] flowID schema

2012-02-09 Thread Andrew Cathrow


- Original Message -
> From: "Ayal Baron" 
> To: "Dan Kenigsberg" 
> Cc: "VDSM Project Development" 
> Sent: Monday, February 6, 2012 10:35:54 AM
> Subject: Re: [vdsm] flowID schema
> 
> 
> 
> - Original Message -
> > On Thu, Feb 02, 2012 at 10:32:49AM -0500, Saggi Mizrahi wrote:
> > > flowID makes no sense after the initial API call as stuff like
> > > cacheing\threadpools\samplingtasks\resources\asyncTasks so
> > > flowing
> > > a flow like that will not give you the entire picture while
> > > debugging.
> > > 
> > > Also adding it now will make everything even more ugly.
> > > You know what, just imagine I wrote one of my long rambles about
> > > why I don't agree with doing this.
> > 
> > I cannot imagine you write anything like that. Really. I do not
> > understand why you object logging flowID on API entry point.
> 
> The question is, what problem is this really trying to solve and is
> there a simpler and less obtrusive solution to that problem?

Correlating logs between ovirt engine and potentially multiple vdsm nodes is a 
nightmare. It requires a lot of skill to follow a transaction through from the 
front end all the way to the node, and even multiple nodes (e.g. actions on spm, 
then actions on another node to run a vm).
Having a way to correlate the logs and follow a single event/flow is vital.
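
To make the proposal concrete, here is a rough Python sketch of logging a
flowID once at the API entry point, as suggested earlier in the thread. It is
not VDSM code: the `log_flow_id` decorator, the "api" logger name, and the
`flowID` keyword argument are assumptions made for illustration, and
`getStorageDomainsList` here is only a stand-in for a real verb.

```python
import functools
import logging

log = logging.getLogger("api")  # hypothetical logger name


def log_flow_id(func):
    """Log the caller-supplied flowID when the API call is entered.

    The ID is consumed here and not passed further down, so nothing
    below the entry point needs to know about it.
    """
    @functools.wraps(func)
    def wrapper(*args, flowID=None, **kwargs):
        log.info("flowID=%s call=%s", flowID, func.__name__)
        return func(*args, **kwargs)
    return wrapper


@log_flow_id
def getStorageDomainsList():
    # Stand-in body; a real verb would do actual work here.
    return []
```

With something like this, searching each node's log for the same flowID string
would give the entry points that the engine triggered for one action, which
can then be followed locally by other means.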



[vdsm] Adding gluster support

2012-02-09 Thread Itamar Heim

Hi,

The following wiki page describes the approach I'm suggesting for adding 
gluster support to ovirt in phases.


http://www.ovirt.org/wiki/AddingGlusterSupportToOvirt

Comments welcome.

Thanks,
   Itamar