Re: [Lustre-discuss] Metadata storage in test script files

2012-05-08 Thread Roman Grigoryev
Hi,
On 05/08/2012 01:34 AM, Chris Gearing wrote:
> 
> 
> On Mon, May 7, 2012 at 7:33 PM, Nathan Rutman wrote:
> 
> 
> On May 4, 2012, at 7:46 AM, Chris Gearing wrote:
> 
> > Hi Roman,
> >
> > I think we may have rat-holed here and perhaps it's worth just
> > re-stating what I'm trying to achieve here.
> >
> > We have a need to be able to test in a more directed and targeted
> > manner, to be able to focus on a unit of code like lnet or an attribute
> > of capability like performance. However since starting work on the
> > Lustre test infrastructure it has become clear to me that knowledge
> > about the capability, functionality and purpose of individual tests is
> > very general and held in the heads of Lustre engineers. Because we are
> > talking about targeting tests we require knowledge about the capability,
> > functionality and purpose of the tests not the outcome of running the
> > tests, or to put it another way what the tests can do not what they
> > have done.
> >
> > One key fact about cataloguing the capabilities of the tests is that
> > for almost every imaginable case the capability of a test only changes
> > if the test itself changes, and so the rate of change of the data in
> > the catalogue is the same as, and actually much less than, the rate of
> > change of the test code itself. The only exception to this could be
> > that a test suddenly discovers a new bug which has to have a new ticket
> > attached to it, although this should be very rare if we manage our
> > development process properly.
> >
> > This requirement leads to the conclusion that we need to catalogue all
> > of the tests within the current test-framework and a catalogue equates
> > to a database, hence we need a database of the capability, functionality
> > and purpose of the individual tests. With this requirement in mind it
> > would be easy to create a database using something like mysql that could
> > be used by applications like the Lustre test system, but using an
> > approach like that would make the database very difficult to share and
> > will be even harder to attach the knowledge to the Lustre tree which is
> > where it belongs.
> >
> > So the question I want to solve is how to catalogue the capabilities of
> > the individual tests in a database, store that data as part of the
> > Lustre source and as a bonus make the data readable and even carefully
> > editable by people as well as machines. Now to focus on the last point I
> > do not think we should constrain ourselves to something that can be read
> > by machine using just bash, we do have access to structured languages and
> > should make use of that fact.
> >
> I think we all agree 100% on the above...
> 
> > The solution to all of this seemed to be to store the catalogue about
> > the tests as part of the tests themselves
> ... but not necessarily that conclusion.
>  
> 
> > , this provides for human and
> > machine accessibility, implicit version control and certainty that
> > whatever happens to the Lustre source the data goes with it. It is also
> > the case that by keeping the catalogue with the subject the maintenance
> > of the catalogue is more likely to occur than if the two are separate.
> 
> I agree with all those.  But there are some difficulties with this
> as well:
> 1. bash isn't a great language to encapsulate this metadata
> 
>  
> The thing to focus on, I think, is the data captured, not the format. The
> parser for yaml encapsulated in the source or anywhere else is a small
> amount of effort compared to capturing the data in the first place. If
> we capture the data and it's machine readable then changing the format
> is easy.
> 
> There are many advantages today to keeping the source and the metadata
> in the same place, one being that when reviewing new or updated tests
> the reviewers can, and will be encouraged by the locality to, ensure
> that the metadata matches the new or revised test. If the two are not
> together then they have very little chance of being kept in sync.

Also, I have more than one concern. You are suggesting putting into bash a
structure which has its own formal description. Who will check, and when,
that an embedded structure is correct? A formal structure must be checked
by tools, not by eye. For example, I use the Rx tools with a schema
definition for YAML. Extracting the YAML data and checking it separately
makes the tooling less comfortable to use.
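
As an illustration of the kind of tool-based check meant here, a minimal
sketch in Python (the field names follow the examples in this thread; the
hand-rolled schema below is a stand-in for a real Rx schema, not the Rx
API itself):

# check_metadata.py - validate one extracted metadata document against a
# minimal schema; a real setup would use an Rx (or similar) schema tool.
import sys
import yaml  # PyYAML

REQUIRED = {"Name": str, "Summary": str}
OPTIONAL = {"Components": list, "Prerequisites": list, "TicketIDs": list}

def check(doc):
    errors = []
    for field, ftype in REQUIRED.items():
        if field not in doc:
            errors.append("missing required field: %s" % field)
        elif not isinstance(doc[field], ftype):
            errors.append("%s is not a %s" % (field, ftype.__name__))
    for field, ftype in OPTIONAL.items():
        if field in doc and not isinstance(doc[field], ftype):
            errors.append("%s is not a %s" % (field, ftype.__name__))
    return errors

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for err in check(yaml.safe_load(f) or {}):
            print(err)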

To be honest, I don't see a big difference between using 2 files and one
file from the developer's point of view. This is more a question of
discipline than of comfort. The very same developer could ignore a
description which is placed nearby. (From my experience with tests liv

Re: [Lustre-discuss] Metadata storage in test script files

2012-05-07 Thread Chris Gearing
On Mon, May 7, 2012 at 7:33 PM, Nathan Rutman wrote:

>
> On May 4, 2012, at 7:46 AM, Chris Gearing wrote:
>
> > Hi Roman,
> >
> > I think we may have rat-holed here and perhaps it's worth just
> > re-stating what I'm trying to achieve here.
> >
> > We have a need to be able to test in a more directed and targeted
> > manner, to be able to focus on a unit of code like lnet or an attribute
> > of capability like performance. However since starting work on the
> > Lustre test infrastructure it has become clear to me that knowledge
> > about the capability, functionality and purpose of individual tests is
> > very general and held in the heads of Lustre engineers. Because we are
> > talking about targeting tests we require knowledge about the capability,
> > functionality and purpose of the tests not the outcome of running the
> > tests, or to put it another way what the tests can do not what they have
> > done.
> >
> > One key fact about cataloguing the capabilities of the tests is that
> > for almost every imaginable case the capability of a test only changes
> > if the test itself changes, and so the rate of change of the data in the
> > catalogue is the same as, and actually much less than, the rate of
> > change of the test code itself. The only exception to this could be that
> > a test suddenly discovers a new bug which has to have a new ticket
> > attached to it, although this should be very rare if we manage our
> > development process properly.
> >
> > This requirement leads to the conclusion that we need to catalogue all
> > of the tests within the current test-framework and a catalogue equates
> > to a database, hence we need a database of the capability, functionality
> > and purpose of the individual tests. With this requirement in mind it
> > would be easy to create a database using something like mysql that could
> > be used by applications like the Lustre test system, but using an
> > approach like that would make the database very difficult to share and
> > will be even harder to attach the knowledge to the Lustre tree which is
> > where it belongs.
> >
> > So the question I want to solve is how to catalogue the capabilities of
> > the individual tests in a database, store that data as part of the
> > Lustre source and as a bonus make the data readable and even carefully
> > editable by people as well as machines. Now to focus on the last point I
> > do not think we should constrain ourselves to something that can be read
> > by machine using just bash, we do have access to structured languages and
> > should make use of that fact.
> >
> I think we all agree 100% on the above...
>
> > The solution to all of this seemed to be to store the catalogue about
> > the tests as part of the tests themselves
> ... but not necessarily that conclusion.
>
>
> > , this provides for human and
> > machine accessibility, implicit version control and certainty that
> > whatever happens to the Lustre source the data goes with it. It is also
> > the case that by keeping the catalogue with the subject the maintenance
> > of the catalogue is more likely to occur than if the two are separate.
>
> I agree with all those.  But there are some difficulties with this as well:
> 1. bash isn't a great language to encapsulate this metadata
>

The thing to focus on, I think, is the data captured, not the format. The
parser for yaml encapsulated in the source or anywhere else is a small
amount of effort compared to capturing the data in the first place. If we
capture the data and it's machine readable then changing the format is
easy.

There are many advantages today to keeping the source and the metadata in
the same place, one being that when reviewing new or updated tests the
reviewers can, and will be encouraged by the locality to, ensure that the
metadata matches the new or revised test. If the two are not together then
they have very little chance of being kept in sync.
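
As a rough measure of how small that parser is, a sketch in Python (the
TEST_METADATA_BEGIN/END markers are an assumption for illustration; the
thread has not fixed a marker convention):

# extract_metadata.py - pull YAML metadata blocks out of a bash test
# script; the marker names are assumptions, not a fixed convention.
import re
import sys
import yaml  # PyYAML

BLOCK = re.compile(r"TEST_METADATA_BEGIN\n(.*?)TEST_METADATA_END", re.DOTALL)

def metadata_blocks(path):
    with open(path) as f:
        text = f.read()
    for match in BLOCK.finditer(text):
        yield yaml.safe_load(match.group(1))

if __name__ == "__main__":
    for doc in metadata_blocks(sys.argv[1]):
        print(doc.get("Name"), "-", doc.get("Summary"))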

> 2. this further locks us into the current test implementation - there's not
> much possibility to start writing tests in another language if we're
> parsing through looking for bash-formatted metadata. Sure, multiple parsers
> could be written...
>

I don't think it is a lock-in at all; the data is machine readable, and
moving to a new format, when and if we need it, will be easy. Let's focus
on capturing the data so we increase our knowledge; once we have the data
we can manipulate it however we want. Keeping the data and the metadata
together, in my opinion, increases the chance of capturing and updating
the data, given today's methods and tools.

> 3. difficulty changing md of groups of tests en masse - e.g. add "slow"
> keyword to a set of tests
>

The data can be read and written by machine, and the libraries/applications
to do this would be written. Referring back to the description of the
metadata, we would not be making sweeping changes to test metadata, because
the metadata should only change when the test changes [exceptions will
always apply but we should not optimize for exceptions].
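
Such a bulk edit is mechanical once the metadata is machine readable; a
sketch of the kind of library/application meant here (Python, reusing the
illustrative markers from the extraction sketch above; the Keywords field
is an assumption):

# add_keyword.py - add a keyword to every metadata block in the given
# test scripts; the markers and the Keywords field are illustrative.
import re
import sys
import yaml

BLOCK = re.compile(r"(TEST_METADATA_BEGIN\n)(.*?)(TEST_METADATA_END)", re.DOTALL)

def add_slow(match):
    doc = yaml.safe_load(match.group(2)) or {}
    keywords = doc.setdefault("Keywords", [])
    if "slow" not in keywords:
        keywords.append("slow")
    return match.group(1) + yaml.safe_dump(doc) + match.group(3)

for path in sys.argv[1:]:
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(BLOCK.sub(add_slow, text))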

Re: [Lustre-discuss] Metadata storage in test script files

2012-05-07 Thread Nathan Rutman

On May 4, 2012, at 7:46 AM, Chris Gearing wrote:

> Hi Roman,
> 
> I think we may have rat-holed here and perhaps it's worth just 
> re-stating what I'm trying to achieve here.
> 
> We have a need to be able to test in a more directed and targeted 
> manner, to be able to focus on a unit of code like lnet or an attribute 
> of capability like performance. However since starting work on the 
> Lustre test infrastructure it has become clear to me that knowledge 
> about the capability, functionality and purpose of individual tests is 
> very general and held in the heads of Lustre engineers. Because we are 
> talking about targeting tests we require knowledge about the capability, 
> functionality and purpose of the tests not the outcome of running the 
> tests, or to put it another way what the tests can do not what they have 
> done.
> 
> One key fact about cataloguing the capabilities of the tests is that
> for almost every imaginable case the capability of a test only changes
> if the test itself changes, and so the rate of change of the data in the
> catalogue is the same as, and actually much less than, the rate of
> change of the test code itself. The only exception to this could be that
> a test suddenly discovers a new bug which has to have a new ticket
> attached to it, although this should be very rare if we manage our
> development process properly.
> 
> This requirement leads to the conclusion that we need to catalogue all 
> of the tests within the current test-framework and a catalogue equates 
> to a database, hence we need a database of the capability, functionality 
> and purpose of the individual tests. With this requirement in mind it 
> would be easy to create a database using something like mysql that could 
> be used by applications like the Lustre test system, but using an 
> approach like that would make the database very difficult to share and 
> will be even harder to attach the knowledge to the Lustre tree which is 
> where it belongs.
> 
> So the question I want to solve is how to catalogue the capabilities of 
> the individual tests in a database, store that data as part of the 
> Lustre source and as a bonus make the data readable and even carefully 
> editable by people as well as machines. Now to focus on the last point I 
> do not think we should constrain ourselves to something that can be read 
> by machine using just bash, we do have access to structured languages and
> should make use of that fact.
> 
I think we all agree 100% on the above...

> The solution to all of this seemed to be to store the catalogue about 
> the tests as part of the tests themselves
... but not necessarily that conclusion.

> , this provides for human and
> machine accessibility, implicit version control and certainty that
> whatever happens to the Lustre source the data goes with it. It is also the case
> that by keeping the catalogue with the subject the maintenance of the 
> catalogue is more likely to occur than if the two are separate.

I agree with all those.  But there are some difficulties with this as well:
1. bash isn't a great language to encapsulate this metadata
2. this further locks us into the current test implementation - there's not much
possibility to start writing tests in another language if we're parsing through
looking for bash-formatted metadata. Sure, multiple parsers could be written...
3. difficulty changing md of groups of tests en masse - e.g. add "slow" keyword
to a set of tests
4. no inheritance of characteristics - each test must explicitly list every
piece of md.  This not only blows up the amount of md, it is also a source
of problems from typos, etc.
5. no automatic modification of characteristics.  In particular, one piece of 
md I would like to see is "maximum allowed test time" for each test.  Ideally, 
this could be measured and adjusted automatically based on historical and 
ongoing run data.  But it would be dangerous to allow automatic modification to 
the script itself.
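
For point 5, the automatic adjustment could live entirely outside the
script; a sketch (Python; the history file format and the 1.5x margin are
assumptions for illustration):

# suggest_timeouts.py - derive a "maximum allowed test time" for each
# test from historical run durations and write it to a side file,
# never back into the script itself.  1.5x the max is an arbitrary margin.
import sys
import yaml

def suggest(durations, margin=1.5):
    return int(max(durations) * margin)

with open(sys.argv[1]) as f:
    history = yaml.safe_load(f)      # {test_name: [seconds, ...]}
limits = {name: suggest(runs) for name, runs in history.items()}
with open(sys.argv[2], "w") as f:
    yaml.safe_dump(limits, f)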

To address those problems, I think a database-type approach is exactly right, 
or perhaps a YAML file with hierarchical inheritance.
To some degree, this is an "evolution vs. revolution" question, and I prefer to
come down on the revolution-enabling design, despite the problems you list.  
Basically, I believe the separated MD model allows for the replacement of 
test-framework, and this, to my mind, is the majority driver for adding the MD 
at all.
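
One possible shape for such a separated YAML file with hierarchical
inheritance, as a sketch (Python/PyYAML; the layout, field names and merge
rule are assumptions, not a settled design):

# resolve_md.py - per-test metadata inherits file-level defaults;
# explicit per-test values override them.  Layout is illustrative.
import yaml

CATALOGUE = yaml.safe_load("""
defaults:
  Components: [lustre-rsync]
  MaxTime: 300
tests:
  test_1a:
    Summary: basic replication
  test_1b:
    Summary: replication under load
    MaxTime: 900
""")

def resolve(name):
    md = dict(CATALOGUE["defaults"])
    md.update(CATALOGUE["tests"][name])
    return md

print(resolve("test_1b"))  # MaxTime 900 overrides the default 300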


> 
> My original use of the term test metadata is intended as a more modern 
> term for catalogue or the [test] library.
> 
> So to refresh everybody's mind, I'd like to suggest that we place test 
> metadata in the source code itself using the following format, where the 
> here doc is inserted into the code above the test function itself.
> 
> ===
> Name:
>   before_upgrade_create_data
> Summary:
>   Copies lustre source into a node specific directory and then

Re: [Lustre-discuss] Metadata storage in test script files

2012-05-04 Thread Chris Gearing
Hi Roman,

I think we may have rat-holed here and perhaps it's worth just 
re-stating what I'm trying to achieve here.

We have a need to be able to test in a more directed and targeted 
manner, to be able to focus on a unit of code like lnet or an attribute 
of capability like performance. However since starting work on the 
Lustre test infrastructure it has become clear to me that knowledge 
about the capability, functionality and purpose of individual tests is 
very general and held in the heads of Lustre engineers. Because we are 
talking about targeting tests we require knowledge about the capability, 
functionality and purpose of the tests not the outcome of running the 
tests, or to put it another way what the tests can do not what they have 
done.

One key fact about cataloguing the capabilities of the tests is that
for almost every imaginable case the capability of a test only changes
if the test itself changes, and so the rate of change of the data in the
catalogue is the same as, and actually much less than, the rate of
change of the test code itself. The only exception to this could be that
a test suddenly discovers a new bug which has to have a new ticket
attached to it, although this should be very rare if we manage our
development process properly.

This requirement leads to the conclusion that we need to catalogue all 
of the tests within the current test-framework and a catalogue equates 
to a database, hence we need a database of the capability, functionality 
and purpose of the individual tests. With this requirement in mind it 
would be easy to create a database using something like mysql that could 
be used by applications like the Lustre test system, but using an 
approach like that would make the database very difficult to share and 
will be even harder to attach the knowledge to the Lustre tree which is 
where it belongs.

So the question I want to solve is how to catalogue the capabilities of 
the individual tests in a database, store that data as part of the 
Lustre source and as a bonus make the data readable and even carefully 
editable by people as well as machines. Now to focus on the last point I 
do not think we should constrain ourselves to something that can be read 
by machine using just bash, we do have access to structured languages and
should make use of that fact.

The solution to all of this seemed to be to store the catalogue about 
the tests as part of the tests themselves, this provides for human and
machine accessibility, implicit version control and certainty that
whatever happens to the Lustre source the data goes with it. It is also the case
that by keeping the catalogue with the subject the maintenance of the 
catalogue is more likely to occur than if the two are separate.

My original use of the term test metadata is intended as a more modern 
term for catalogue or the [test] library.

So to refresh everybody's mind, I'd like to suggest that we place test 
metadata in the source code itself using the following format, where the 
here doc is inserted into the code above the test function itself.

===


Re: [Lustre-discuss] Metadata storage in test script files

2012-05-03 Thread Roman Grigoryev
Hi,

On 05/02/2012 11:01 PM, Andreas Dilger wrote:
> I'm chopping out most of the discussion, to try and focus on the core issues 
> here.
> 
> On 2012-05-02, at 10:35 AM, Roman Grigoryev wrote:
>> On 05/02/2012 01:25 PM, Chris wrote:
>>> I cannot say whether you should store this information with your results
>>> because I have no insight into your private testing practices.
>>
>> I just want to have the info not only in Maloo or other big systems but
>> in the default test harness. Developers can run tests by hand, and a
>> tester should also be able to execute them in a specific environment. If
>> we can provide some helpful info, I think that is good. A few kilobytes
>> is not as much as the logs, but it can help in some cases.
> 
> I don't think you two are in disagreement here.  We want the test 
> descriptions and other 
> metadata with the tests, open for any usage (human, test scripts, different 
> test harnesses, etc).

I absolutely agree. My point is just about form: machine usage needs a
formal description of the fields and tools to easily check them.

> 
>>> I don't think people should introduce dependencies either, but they have 
>>> and we have to deal with that fact. In your example
>>> if C is dependent on A and A is removed then C cannot be run.
>>
>> Maybe I'm incorrect, but fighting the dependencies looks more
>> important than adding descriptions.
> 
> For the short term.  However, finding dependencies is easily done through 
> simple mechanical steps (e.g. try to run each subtest
> independently).  Since the policy in the past was to make all tests 
> independent, I expect that not very many tests will actually
> have dependencies.

I am working on this task right now.

> 
> However, the main reason for having good descriptions of the tests is to gain 
> an understanding of what part of the
> code the tests are trying to exercise, what problem they were written to 
> verify, and what value they provide. 
> We cannot reasonably rewrite or modify tests safely if we don't have a good 
> understanding of what they are doing today.
> Also, this helps people running and debugging the tests and their failures 
> for the long term.

I absolutely agree with the common goal and with text descriptions for
humans. I just don't really see why test refactoring and test understanding
(creating summary descriptions) cannot be combined into one effort. (Also I
have a feeling that developers will find many errors when going through the
tests to write descriptions. I have some experience with similar tasks and
can say that a fresh look at old tests often finds problems.)

-- 
Thanks,
Roman




Re: [Lustre-discuss] Metadata storage in test script files

2012-05-02 Thread Andreas Dilger
I'm chopping out most of the discussion, to try and focus on the core issues 
here.

On 2012-05-02, at 10:35 AM, Roman Grigoryev wrote:
> On 05/02/2012 01:25 PM, Chris wrote:
>> I cannot say whether you should store this information with your results
>> because I have no insight into your private testing practices.
> 
> I just want to have the info not only in Maloo or other big systems but
> in the default test harness. Developers can run tests by hand, and a
> tester should also be able to execute them in a specific environment. If
> we can provide some helpful info, I think that is good. A few kilobytes
> is not as much as the logs, but it can help in some cases.

I don't think you two are in disagreement here.  We want the test descriptions 
and other metadata with the tests, open for any usage (human, test scripts, 
different test harnesses, etc).

>> I don't think people should introduce dependencies either, but they have and 
>> we have to deal with that fact. In your example if C is dependent on A and A 
>> is removed then C cannot be run.
> 
> Maybe I'm incorrect, but fighting the dependencies looks more
> important than adding descriptions.

For the short term.  However, finding dependencies is easily done through 
simple mechanical steps (e.g. try to run each subtest independently).  Since 
the policy in the past was to make all tests independent, I expect that not 
very many tests will actually have dependencies.
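
That mechanical step might look like the following sketch (Python; it
assumes subtests can be selected with the ONLY=<name> environment
variable, as the test scripts conventionally allow):

# find_deps.py - run each subtest on its own; one that passes in a full
# run but fails alone is a dependency suspect.  Assumes subtests can be
# selected with the ONLY=<name> environment variable.
import os
import subprocess
import sys

def passes_alone(script, name):
    env = dict(os.environ, ONLY=name)
    return subprocess.call(["bash", script], env=env) == 0

script = sys.argv[1]            # e.g. sanity.sh
for name in sys.argv[2:]:       # subtest names, e.g. 1a 1b 2
    if not passes_alone(script, name):
        print("%s fails when run alone: suspected dependency" % name)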

However, the main reason for having good descriptions of the tests is to gain 
an understanding of what part of the code the tests are trying to exercise, 
what problem they were written to verify, and what value they provide.  We 
cannot reasonably rewrite or modify tests safely if we don't have a good 
understanding of what they are doing today.  Also, this helps people running 
and debugging the tests and their failures for the long term.


Cheers, Andreas
--
Andreas Dilger   Whamcloud, Inc.
Principal Lustre Engineer   http://www.whamcloud.com/






Re: [Lustre-discuss] Metadata storage in test script files

2012-05-02 Thread Roman Grigoryev
Hi Chris,

On 05/02/2012 08:06 PM, Chris wrote:
> On 02/05/2012 16:44, Roman wrote:
>>
>>> I think this is something that needs to live outside the test metadata
>>> being described here.  The definition of "golden configuration" is
>>> hard to define, and depends heavily on factors that change from one
>>> environment to the next.
>> We could separate dynamic and static metadata. But it will be good if
>> both sets of data use one engine and storage type with just different
>> sources.
> 
> I think we all understand the static metadata and I believe that the
> data in my original examples is static data. This data relates to a
> version of the test scripts and so can live as part of the test script
> managed using the same git mechanisms.
> 
> Could you explain what you mean by dynamic data so that we can all
> understand exactly what you are suggesting we store.

As truly dynamic data I can imagine only tickets right now. And I'm not
sure how important it is to keep them in the test sources; I think an
umbrella for the old bugzilla, WC jira and maybe other bug sources is more
important.

But I can imagine a situation where we want to update the metadata in many
tests. For example, somebody finishes measuring test coverage and wants to
add it to the meta information.

> 
>> Also, I don't see a good way to use 'metadata inheritance' in shell
>> without adding pretty unclear shell code, so the switch to metadata
>> usage should happen in one moment, or the test framework will just
>> ignore it and the metadata becomes just static text for external
>> scripts.
> 
> I'm not sure if there is a place for inheritance in this particular
> situation but if there is then we need to be clear on one thing. There
> can be no implicit inheritance for these scripts. I.e. we can't have a
> single attribute at the top of a file that applies to all tests. The
> reason for this is that one major reason for having metadata is that
> we cause the data to be collected properly; each test needs to have the
> data explicitly captured. If a test does not have the data captured then
> we do not have any data - and no data is a fact (data) in itself. If a
> test inherits data from another test then that must have been explicitly
> set.
> 
> We cannot allow sweeping inheritance that allows us to imagine we have
> learnt something when actually we've just taken a short cut to give the
> impression of knowledge.

Yes, I mean inheritance from a "single attribute at the top of a file"
(with overriding if it is defined at a more detailed level). Why can't we
have a single attribute at the top that provides default values? Going
over all the tests manually is a very big task.

Back to your original definition: for example, all tests from lustre-rsync
should be in one component (maybe, as I understand it), so there is no big
reason to duplicate the components.

-- 
Thanks,
Roman


Re: [Lustre-discuss] Metadata storage in test script files

2012-05-02 Thread Roman Grigoryev
Hi,

On 05/02/2012 01:25 PM, Chris wrote:
> On 02/05/2012 04:23, Roman Grigoryev wrote:
>> Hi Chris,
>>
>> On 05/01/2012 08:17 PM, Chris wrote:
>>> The metadata can be used in a multitude of ways, for example we can
>>> create dynamic test sets based on
>>> the changes made or target area of testing. What we are doing here is
>>> creating an understanding of the
>>> tests that we have so that we can improve our processes and testing
>>> capabilities in the future.
>> I think that when we are defining a tool we should state its purpose.
>> For example, a good description and summary are not needed for creating
>> dynamic test sets. I think it is very important to say how we will use
>> it. For the continuation of this idea, please read below.
> The purpose is to enable us to develop and store knowledge/information
> about the tests; the information should be in a canonical form, objective
> and correct. If we do this then the whole community can make use of it
> as they see fit. I want to ensure that the initial set of stored
> variables describes the tests as completely as reasonably possible. The
> canonical description of each test is not affected by the usage to which
> the data is put.
> 
>>> The metadata does not go to the results. The metadata is a database in
>>> its own right, and should metadata about a test be required it would be
>>> accessed from the source (database) itself.
>> I think fields like title, summary and, possibly, description should be
>> present in the results too. They can be very helpful for quickly
>> understanding test results.
> They can be presented as part of the results, but I would not store them
> with the results; if for example Maloo presents the description, it will
> fetch it from the correct version of the source. We should not be making
> copies of data.

ok, good.

> 
> I cannot say whether you should store this information with your results
> because I have no insight into your private testing practices.

I just want to have the info not only in Maloo or other big systems but in
the default test harness. Developers can run tests by hand, and a tester
should also be able to execute them in a specific environment. If we can
provide some helpful info, I think that is good. A few kilobytes is not as
much as the logs, but it can help in some cases.

>>
 On 04/30/2012 08:50 PM, Chris wrote:
>>> ... snip ...
>>>
>>>
>>> As I said, we can mine this data at any time and in any way we want,
>>> and the purpose of this
>>> discussion is the data, not how we use it. But as an example something
>>> that dynamically built
>>> test sets would need to know prerequisites.
>>>
>>> The suffix of a,b,c could be used to generate prerequisite information
>>> but it is firstly inflexible, for example
>>> I bet 'b','c' and 'd' are often dependent on 'a' but not each other,
>>> secondly and more importantly we want a
>>> standard form for storing metadata because we want to introduce order
>>> and knowledge into the test
>>> scripts that we have today.
>> Why I asked about the way of usage: if we want to use this information
>> in scripts and in other automated ways, we must strictly specify the
>> logic of the items and provide a tool to check it.
>>
>> F.e. we will use it when building the test execution queue. We have a
>> chain like this: test C has prerequisite B, test B has prerequisite A.
>> Test A doesn't have a prerequisite. One fine day test A becomes
>> excluded. Is it possible to execute test C?
>> But if we will not use it in scripting there is no big logical problem.
>>
>> (My opinion: I don't like this situation and think that test
>> dependencies should be used only in very specific and rare cases.)
> I don't think people should introduce dependencies either, but they have
> and we have to deal with that fact. In your example if C is dependent on
> A and A is removed then C cannot be run.

Maybe I'm incorrect, but fighting the dependencies looks more
important than adding descriptions.

>>
 I suggest adding keywords (Components could be translated as keywords
 too) and a test type (stress, benchmark, load, functional, negative,
 etc.) for quick filtering. For example, SLOW could be transformed into a
 keyword.
>>> This seems like a reasonable idea although we need a name that describes
>>> what it is,
>>> we will need to define that set of possible words as we need to with the
>>> Components elements.
>> I mean that 'keywords' should be separated from components but could be
>> logically included. I think 'Components' is a special type of keyword.
> I don't think of Components as a keyword, I think of it as a factual
> piece of data and if we want to add the test purpose then we should call
> it that. The use of keywords in data is generally a typeless catch-all.
> All of this metadata should be clear and well defined which does not in
> my opinion allow scope for a keywords element.

I agree that Components aren't keywords.

> 
> I would suggest that we add a variable called Purposes which is an array
> containing a set of predefined elements like stress, benchmark, load and
> functional etc.

Re: [Lustre-discuss] Metadata storage in test script files

2012-05-02 Thread Chris
On 02/05/2012 16:44, Roman wrote:
>
>> I think this is something that needs to live outside the test metadata
>> being described here.  The definition of "golden configuration" is
>> hard to define, and depends heavily on factors that change from one
>> environment to the next.
> We could separate dynamic and static metadata. But it will be good if
> both sets of data use one engine and storage type with just different
> sources.

I think we all understand the static metadata and I believe that the 
data in my original examples is static data. This data relates to a 
version of the test scripts and so can live as part of the test script 
managed using the same git mechanisms.

Could you explain what you mean by dynamic data so that we can all 
understand exactly what you are suggesting we store.

> Also, I don't see a good way to use 'metadata inheritance' in shell
> without adding pretty unclear shell code, so the switch to metadata
> usage should happen in one moment, or the test framework will just
> ignore it and the metadata becomes just static text for external
> scripts.

I'm not sure if there is a place for inheritance in this particular
situation but if there is then we need to be clear on one thing. There
can be no implicit inheritance for these scripts. I.e. we can't have a
single attribute at the top of a file that applies to all tests. The
reason for this is that one major reason for having metadata is that
we cause the data to be collected properly; each test needs to have the
data explicitly captured. If a test does not have the data captured then
we do not have any data - and no data is a fact (data) in itself. If a
test inherits data from another test then that must have been explicitly
set.

We cannot allow sweeping inheritance that allows us to imagine we have 
learnt something when actually we've just taken a short cut to give the 
impression of knowledge.
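
A sketch of what explicitly set inheritance could look like, in contrast
to a file-wide default (Python; the Inherit field and data shapes are
illustrative assumptions):

# explicit_inherit.py - a test's metadata may name another test to
# inherit from; nothing is inherited unless explicitly requested.
import yaml

TESTS = yaml.safe_load("""
test_1a:
  Components: [lustre-rsync]
  Summary: basic replication
test_1b:
  Inherit: test_1a
  Summary: replication under load
""")

def resolve(name):
    md = dict(TESTS[name])
    parent = md.pop("Inherit", None)
    if parent is not None:       # explicit, per-test opt-in only
        merged = resolve(parent)
        merged.update(md)
        return merged
    return md

print(resolve("test_1b"))  # picks up Components from test_1a explicitly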

Chris


Re: [Lustre-discuss] Metadata storage in test script files

2012-05-02 Thread Roman Grigoryev
Hi Andreas,

On 05/02/2012 08:14 AM, Andreas Dilger wrote:
> On 2012-05-01, at 9:23 PM, Roman Grigoryev wrote:

 On 04/30/2012 08:50 PM, Chris wrote:
> Prerequisites: Pre-requisite tests that must be run before this test
> can be run. This is again an array which presumes a test may
> have multiple pre-requisites, but the data should not contain a
> chain of prerequisites, i.e. if A requires B and B requires C, the
> pre-requisite of A is B, not B & C.
 At which step do you want to check chains? And what is the logical basis
 for these prerequisites, apart from the case that current tests have
 hidden dependencies?
 I don't see any difference between one test whose body is built from
 tests a, b, c and this prerequisites definition.
 Could you please explain more why we need this field?
>>> As I said, we can mine this data at any time and in any way we want,
>>> and the purpose of this discussion is the data, not how we use it. But as
>>> an example something that dynamically built
>>> test sets would need to know prerequisites.
>>>
>>> The suffix of a,b,c could be used to generate prerequisite information
>>> but it is firstly inflexible, for example I bet 'b','c' and 'd' are
>>> often dependent on 'a' but not each other, secondly and more
>>> importantly we want a standard form for storing metadata because we
>>> want to introduce order and knowledge into the test
>>> scripts that we have today.
>>
>> Why I asked about the way of usage: if we want to use this information
>> in scripts and in other automated ways, we must strictly specify the
>> logic of the items and provide a tool to check it.
> 
> I think it is sufficient to have a well-structured repository of test
> metadata, and then multiple uses can be found for this data.  Even for
> human use, a good description of what the test is supposed to check,
> and why this test exists would be a good start.

I absolutely agree that good description, summary and other fields are
very important.
> 
> The test metadata format is extensible, so should we need more fields
> in the future it will be possible to add them.  I think the hardest
> work will be to get good text descriptions of the tests, not mechanical
> issues like dependencies and such.

I think this work will take pretty long, and I suggest requiring it only
for new and changed tests. In this case, the possibility of having some
kind of description inheritance is a good solution.

> 
>> F.e. we will use it when building the test execution queue. We have a
>> chain like this: test C has prerequisite B, test B has prerequisite A.
>> Test A doesn't have a prerequisite. One fine day test A becomes
>> excluded. Is it possible to execute test C?
>> But if we will not use it in scripting there is no big logical problem.
>>
>> (My opinion: I don't like this situation and think that test
>> dependencies should be used only in very specific and rare cases.)
>>
>>>
> TicketIDs: This is an array of ticket numbers that this test
> explicitly tests. In theory we should aim for the state where
> every ticket has a test associated with it, and in future we
> should be able to carry out a gap analysis.
>
 I suggest adding keywords (Components could be translated as keywords
 too) and a test type (stress, benchmark, load, functional, negative,
 etc.) for quick filtering. For example, SLOW could be transformed into a
 keyword.
>>> This seems like a reasonable idea although we need a name that describes 
>>> what it is, we will need to define that set of possible
>>> words as we need to with the Components elements.
>>
>> I mean that 'keywords' should be separated from components but could be
>> logically included. I think 'Components' is a special type of keyword.
>>
>>> What should this field be called - we should not reduce the value of
>>> this data by genericizing it into 'keywords'.
>>>
 Also,  I would like to mention, we have 3 different logical types of
 data:
 1) just human-readable descriptions
 2) filtering and targeting fields (Components, keywords if you agree with
 my suggestion)
 3) framework directives(Prerequisites)

> As time goes on we may well expand this compulsory list, but this is I
> believe a sensible starting place.
>
> Being part of the source this data will be subject to the same review
> process as any other change and so we cannot store dynamic data here,
> such as pass rates etc.
 What do you think - maybe it is a good idea to keep the metadata
 separately? This can be useful to simplify changing the data via script
 for mass modification, as well as for adding tickets, pass rate and
 execution time on 'gold' configurations?
>>> It would be easier to store the data separately and we could use Maloo
>>> but it's very important that this data becomes part of the Lustre
>>> 'source' so that everybody can benefit from it. Adding tickets is
>>> not a problem, as part of resolving an issue is to ensure that at
>>> least one test exercises the problem and proves it has been fixed; the
>>> fact that this assurance process requires active interaction by an
>>> engineer with the scripts is a positive.

Re: [Lustre-discuss] Metadata storage in test script files

2012-05-02 Thread Chris
On 02/05/2012 04:23, Roman Grigoryev wrote:
> Hi Chris,
>
> On 05/01/2012 08:17 PM, Chris wrote:
>> The metadata can be used in a multitude of ways, for example we can
>> create dynamic test sets based on
>> the changes made or target area of testing. What we are doing here is
>> creating an understanding of the
>> tests that we have so that we can improve our processes and testing
>> capabilities in the future.
> I think that when we are defining a tool we should state its purpose.
> For example, a good description and summary are not needed for creating
> dynamic test sets. I think it is very important to say how we will use
> it. For the continuation of this idea, please read below.
The purpose is to enable us to develop and store knowledge/information
about the tests; the information should be in a canonical form, objective
and correct. If we do this then the whole community can make use of it
as they see fit. I want to ensure that the initial set of stored
variables describes the tests as completely as reasonably possible. The
canonical description of each test is not affected by the usage to which
the data is put.

>> The metadata does not go to the results. The metadata is a database in
>> its own right, and should metadata about a test be required it would be
>> accessed from the source (database) itself.
> I think fields like title, summary and, possibly, description should be
> present in the results too. They can be very helpful for quickly
> understanding test results.
They can be presented as part of the results, but I would not store them
with the results; if for example Maloo presents the description, it will
fetch it from the correct version of the source. We should not be making
copies of data.

I cannot say whether you should store this information with your results
because I have no insight into your private testing practices.
>
>>> On 04/30/2012 08:50 PM, Chris wrote:
>> ... snip ...
>>
>>
>> As I said, we can mine this data at any time and in any way we want,
>> and the purpose of this
>> discussion is the data, not how we use it. But as an example something
>> that dynamically built
>> test sets would need to know prerequisites.
>>
>> The suffix of a,b,c could be used to generate prerequisite information
>> but it is firstly inflexible, for example
>> I bet 'b','c' and 'd' are often dependent on 'a' but not each other,
>> secondly and more importantly we want a
>> standard form for storing metadata because we want to introduce order
>> and knowledge into the test
>> scripts that we have today.
> Why I asked about the way of usage: if we want to use this information
> in scripts and in other automated ways, we must strictly specify the
> logic of the items and provide a tool to check it.
>
> F.e. we will use it when building the test execution queue. We have a
> chain like this: test C has prerequisite B, test B has prerequisite A.
> Test A doesn't have a prerequisite. One fine day test A becomes
> excluded. Is it possible to execute test C?
> But if we will not use it in scripting there is no big logical problem.
>
> (My opinion: I don't like this situation and think that test
> dependencies should be used only in very specific and rare cases.)
I don't think people should introduce dependencies either, but they have 
and we have to deal with that fact. In your example if C is dependent on 
A and A is removed then C cannot be run.
>
>>> I suggest adding keywords (Components could be translated as keywords
>>> too) and a test type (stress, benchmark, load, functional, negative,
>>> etc.) for quick filtering. For example, SLOW could be transformed into
>>> a keyword.
>> This seems like a reasonable idea although we need a name that describes
>> what it is,
>> we will need to define that set of possible words as we need to with the
>> Components elements.
> I mean that 'keywords' should be separated from components but could be
> logically included. I think 'Components' is a special type of keyword.
I don't think of Components as a keyword, I think of it as a factual 
piece of data and if we want to add the test purpose then we should call 
it that. The use of keywords in data is generally a typeless catch-all. 
All of this metadata should be clear and well defined which does not in 
my opinion allow scope for a keywords element.

I would suggest that we add a variable called Purposes which is an array 
containing a set of predefined elements like stress, benchmark, load and 
functional etc.

For example

Purposes:
   - stress
   - load

>> It would be easier to store the data separately and we could use Maloo
>> but it's very important
>> that this data becomes part of the Lustre 'source' so that everybody can
>> benefit from it. Adding
>> tickets is not a problem, as part of resolving an issue is to ensure
>> that at least one test exercises
>> the problem and proves it has been fixed; the fact that this assurance
>> process requires active
>> interaction by an engineer with the scripts is a positive.
>>
>> As for pass rate, execution time and gold configurations this
>> in

Re: [Lustre-discuss] Metadata storage in test script files

2012-05-01 Thread Andreas Dilger
On 2012-05-01, at 9:23 PM, Roman Grigoryev wrote:
> On 05/01/2012 08:17 PM, Chris wrote:
>> The metadata can be used in a multitude of ways, for example we can
>> create dynamic test sets based on
>> the changes made or target area of testing. What we are doing here is
>> creating an understanding of the
>> tests that we have so that we can improve our processes and testing
>> capabilities in the future.
> 
> I think that when we are defining a tool we should state its purpose.
> For example, a good description and summary are not needed for creating
> dynamic test sets. I think it is very important to say how we will use
> it. For the continuation of this idea, please read below.
> 
>> The metadata does not go to the results. The metadata is a database in
>> its own right, and should metadata about a test be required it would be
>> accessed from the source (database) itself.
> 
> I think fields like title, summary and, possibly, description should be
> present in the results too. They can be very helpful for quickly
> understanding test results.

I think what Chris was suggesting is the opposite of what you state here.  He 
was writing that the "test metadata" under discussion here is the static 
description of the test to be stored with the test itself.  Chris is 
specifically excluding any runtime data from being stored with the test, not 
(as you suggest) excluding the display of this description in the test results.

>>> On 04/30/2012 08:50 PM, Chris wrote:
 Prerequisites: Pre-requisite tests that must be run before this test
 can be run. This is again an array which presumes a test may
 have multiple pre-requisites, but the data should not contain a
 chain of prerequisites, i.e. if A requires B and B requires C, the
 pre-requisite of A is B, not B & C.
>>> At which step do you want to check chains? And what is the logical
>>> basis for these prerequisites, apart from the case that current tests
>>> have hidden dependencies?
>>> I don't see any difference between one test whose body is built from
>>> tests a, b, c and this prerequisites definition.
>>> Could you please explain more why we need this field?
>> As I said, we can mine this data at any time and in any way we want,
>> and the purpose of this discussion is the data, not how we use it. But as
>> an example something that dynamically built
>> test sets would need to know prerequisites.
>> 
>> The suffix of a,b,c could be used to generate prerequisite information
>> but it is firstly inflexible, for example I bet 'b','c' and 'd' are
>> often dependent on 'a' but not each other, secondly and more
>> importantly we want a standard form for storing metadata because we
>> want to introduce order and knowledge into the test
>> scripts that we have today.
> 
> Why I asked about the way of usage: if we want to use this information
> in scripts and in other automated ways, we must strictly specify the
> logic of the items and provide a tool to check it.

I think it is sufficient to have a well-structured repository of test
metadata, and then multiple uses can be found for this data.  Even for
human use, a good description of what the test is supposed to check,
and why this test exists would be a good start.

The test metadata format is extensible, so should we need more fields
in the future it will be possible to add them.  I think the hardest
work will be to get good text descriptions of the tests, not mechanical
issues like dependencies and such.

> F.e. we will use it when building the test execution queue. We have a
> chain like this: test C has prerequisite B, test B has prerequisite A.
> Test A doesn't have a prerequisite. One fine day test A becomes
> excluded. Is it possible to execute test C?
> But if we will not use it in scripting there is no big logical problem.
> 
> (My opinion: I don't like this situation and think that test
> dependencies should be used only in very specific and rare cases.)
> 
>> 
 TicketIDs: This is an array of ticket numbers that this test
 explicitly tests. In theory we should aim for the state where
 every ticket has a test associated with it, and in future we
 should be able to carry out a gap analysis.
 
>>> I suggest adding keywords (Components could be translated as keywords
>>> too) and a test type (stress, benchmark, load, functional, negative,
>>> etc.) for quick filtering. For example, SLOW could be transformed into
>>> a keyword.
>> This seems like a reasonable idea although we need a name that describes 
>> what it is, we will need to define that set of possible
>> words as we need to with the Components elements.
> 
> I mean that 'keywords' should be separated from components but could be
> logically included. I think 'Components' is a special type of keyword.
> 
>> What should this field be called - we should not reduce the value of
>> this data by genericizing it into 'keywords'.
>> 
>>> Also, I would like to mention, we have 3 different logical types of
>>> data:
>>> 1) just human-readable descriptions
>>> 2) filtering and targeting fields (Components, keywords if you agree
>>> with my suggestion)
>>> 3) framework directives (Prerequisites)

Re: [Lustre-discuss] Metadata storage in test script files

2012-05-01 Thread Roman Grigoryev
Hi Chris,

On 05/01/2012 08:17 PM, Chris wrote:
> On 30/04/2012 19:15, Roman Grigoryev wrote:
>> Hi Chris,
>> I'm glad to read further emails in this direction.
>> Please don't consider this as criticism; I just would like to get more
>> clarity: what is the goal of adding this metadata? Do you have plans to
>> use the metadata in other scripts? How? Does this metadata go into the
>> results?
>>
>> Also please see more my comments inline:
> The metadata can be used in a multitude of ways, for example we can
> create dynamic test sets based on
> the changes made or target area of testing. What we are doing here is
> creating an understanding of the
> tests that we have so that we can improve our processes and testing
> capabilities in the future.

I think that when we are defining a tool we should state its purpose.
For example, a good description and summary are not needed for creating
dynamic test sets. I think it is very important to say how we will use
it. For the continuation of this idea, please read below.

> 
> The metadata does not go to the results. The metadata is a database in
> its own right, and should metadata about a test be required it would be
> accessed from the source (database) itself.

I think fields like title, summary and, possibly, description should be
present in the results too. They can be very helpful for quickly
understanding test results.

> 
>> On 04/30/2012 08:50 PM, Chris wrote:
> ... snip ...
> 
>>> Prerequisites: Pre-requisite tests that must be run before this test
>>> can be run. This is again an array which presumes a test may have
>>> multiple pre-requisites, but the data should not contain a chain of
>>> prerequisites, i.e. if A requires B and B requires C, the pre-requisite
>>> of A is B, not B & C.
>> At which step do you want to check chains? And what is the logical
>> basis for these prerequisites, apart from the case that current tests
>> have hidden dependencies?
>> I don't see any difference between one test whose body is built from
>> tests a, b, c and this prerequisites definition.
>> Could you please explain more why we need this field?
> As I said, we can mine this data at any time and in any way we want,
> and the purpose of this
> discussion is the data, not how we use it. But as an example something
> that dynamically built
> test sets would need to know prerequisites.
> 
> The suffix of a,b,c could be used to generate prerequisite information
> but it is firstly inflexible, for example
> I bet 'b','c' and 'd' are often dependent on 'a' but not each other,
> secondly and more importantly we want a
> standard form for storing metadata because we want to introduce order
> and knowledge into the test
> scripts that we have today.

Why I asked about the way of usage: if we want to use this information in
scripts and in other automated ways, we must strictly specify the logic of
the items and provide a tool to check it.

F.e. we will use it when building the test execution queue. We have a
chain like this: test C has prerequisite B, test B has prerequisite A.
Test A doesn't have a prerequisite. One fine day test A becomes excluded.
Is it possible to execute test C?
But if we will not use it in scripting there is no big logical problem.

(My opinion: I don't like this situation and think that test
dependencies should be used only in very specific and rare cases.)
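
Roman's question can at least be answered mechanically once prerequisites
are machine readable; a sketch (Python; the data shapes are illustrative):

# prereq_check.py - test C cannot run if anything in its transitive
# prerequisite chain is excluded.  Data shapes are illustrative.
PREREQS = {"C": ["B"], "B": ["A"], "A": []}
EXCLUDED = {"A"}

def runnable(name, seen=frozenset()):
    if name in EXCLUDED:
        return False
    for dep in PREREQS.get(name, []):
        if dep in seen:              # guard against cycles
            continue
        if not runnable(dep, seen | {name}):
            return False
    return True

print(runnable("C"))  # False: A is excluded, so B and therefore C cannot run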

> 
>>> TicketIDs: This is an array of ticket numbers that this test
>>> explicitly tests. In theory we should aim for the state where every
>>> ticket has a test associated with it, and in future we should be able to
>>> carry out a gap analysis.
>>>
>> I suggest adding keywords (Components could be translated as keywords
>> too) and a test type (stress, benchmark, load, functional, negative,
>> etc.) for quick filtering. For example, SLOW could be transformed into a
>> keyword.
> This seems like a reasonable idea, although we need a name that describes
> what it is, and we will need to define the set of possible words, as we
> will with the Components elements.

I mean that 'keywords' should be kept separate from Components, although
Components could logically be included among them; I think 'Components'
is a special type of keyword.
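
For example (only a suggestion for the shape of the fields, not part of
the proposal), a test's block might then carry something like:

Components:
  - lnet
Keywords:
  - slow
  - functional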

> 
> What should this field be called? We should not reduce the value of this
> data by genericizing it into 'keywords'.
>> Also, I would like to mention that we have 3 different logical types of
>> data:
>> 1) just human-readable descriptions
>> 2) filtering and targeting fields (Components, and keywords if you agree
>> with my suggestion)
>> 3) framework directives (Prerequisites)
>>
>>> As time goes on we may well expand this compulsory list, but this is I
>>> believe a sensible starting place.
>>>
>>> Being part of the source this data will be subject to the same review
>>> process as any other change and so we cannot store dynamic data here,
>>> such as pass rates etc.
>> What do you think: maybe it is a good idea to keep the metadata separately?
>> This could be useful for simplifying mass modification of the data via
>> script, as well as for adding tickets, pass rates and execution times on
>> 'gold' configurations?

Re: [Lustre-discuss] Metadata storage in test script files

2012-05-01 Thread Chris
On 30/04/2012 19:15, Roman Grigoryev wrote:
> Hi Chris,
> I'm glad to read further emails in this direction.
> Please don't consider this as criticism; I would just like to get more
> clarity: what is the target of adding this metadata? Do you have plans to
> use the metadata in other scripts? How? Does this metadata go into the results?
>
> Also, please see my further comments inline:
The metadata can be used in a multitude of ways; for example, we can
create dynamic test sets based on the changes made or on the target area
of testing. What we are doing here is creating an understanding of the
tests that we have, so that we can improve our processes and testing
capabilities in the future.

The metadata does not go into the results. The metadata is a database in
its own right, and should metadata about a test be required, it would be
accessed from the source (the database) itself.
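
To make "accessed from the source itself" concrete, here is a minimal
sketch (assuming the YAML blocks have already been parsed into Python
dicts; the test names and entries are purely illustrative):

# Sketch only: treat the parsed in-source metadata as a small database
# and query it, e.g. for all tests that cover a given component.
catalogue = [
    {"Name": "before_upgrade_create_data", "Components": ["lnet", "recovery"]},
    {"Name": "another_test", "Components": ["mdt"]},
]

def tests_for_component(component):
    """Names of all tests whose metadata lists the given component."""
    return [t["Name"] for t in catalogue if component in t.get("Components", [])]

print(tests_for_component("lnet"))  # ['before_upgrade_create_data']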

> On 04/30/2012 08:50 PM, Chris wrote:
... snip ...

>> Prerequisites: Pre-requisite tests that must be run before this test
>> can be run. This is again an array, which presumes a test may have
>> multiple pre-requisites, but the data should not contain a chain of
>> prerequisites, i.e. if A requires B and B requires C, the pre-requisites
>> of A are B, not B & C.
> At which step do you want to check the chains? And what is the logical
> basis for these prerequisites, other than that current tests have hidden
> dependencies?
> I don't see any difference between one test whose body is built from
> tests a, b and c and this prerequisites definition.
> Could you please explain more about why we need this field?
As I said, we can mine this data any time and in any way that we want,
and the purpose of this discussion is the data, not how we use it. But as
an example, something that dynamically builds test sets would need to
know the prerequisites.

The suffixes a, b, c could be used to generate prerequisite information,
but firstly it is inflexible (for example, I bet 'b', 'c' and 'd' are
often dependent on 'a' but not on each other), and secondly, more
importantly, we want a standard form for storing metadata, because we
want to introduce order and knowledge into the test scripts that we have
today.

>> TicketIDs: This is an array of ticket numbers that this test
>> explicitly tests. In theory we should aim for the state where every
>> ticket has a test associated with it, and in future we should be able to
>> carry out a gap analysis.
>>
> I suggest adding keywords (Components could be translated into keywords
> too) and a test type (stress, benchmark, load, functional, negative, etc.)
> for quick filtering. For example, SLOW could be transformed into a keyword.
This seems like a reasonable idea, although we need a name that describes
what it is, and we will need to define the set of possible words, as we
will with the Components elements.

What should this field be called? We should not reduce the value of this
data by genericizing it into 'keywords'.
> Also, I would like to mention that we have 3 different logical types of data:
> 1) just human-readable descriptions
> 2) filtering and targeting fields (Components, and keywords if you agree
> with my suggestion)
> 3) framework directives (Prerequisites)
>
>> As time goes on we may well expand this compulsory list, but this is I
>> believe a sensible starting place.
>>
>> Being part of the source this data will be subject to the same review
>> process as any other change and so we cannot store dynamic data here,
>> such as pass rates etc.
> What do you think: maybe it is a good idea to keep the metadata separately?
> This could be useful for simplifying mass modification of the data via
> script, as well as for adding tickets, pass rates and execution times on
> 'gold' configurations?
It would be easier to store the data separately, and we could use Maloo,
but it's very important that this data becomes part of the Lustre 'source'
so that everybody can benefit from it. Adding tickets is not a problem, as
part of resolving an issue is to ensure that at least one test exercises
the problem and proves it has been fixed; the fact that this assurance
process requires an engineer to actively interact with the scripts is a
positive.

As for pass rates, execution times and gold configurations, this
information is just not one-dimensional enough to store in the source.

Chris



Re: [Lustre-discuss] Metadata storage in test script files

2012-04-30 Thread Roman Grigoryev
Hi Chris,
I'm glad to read further emails in this direction.
Please don't consider this as criticism; I would just like to get more
clarity: what is the target of adding this metadata? Do you have plans to
use the metadata in other scripts? How? Does this metadata go into the results?

Also, please see my further comments inline:

On 04/30/2012 08:50 PM, Chris wrote:
> Hi,
> 
> Further to previous discussions titled "your opinion about testing" I'd
> like to propose a metadata format for the test script files, and would
> obviously welcome people's input:
> 
> Effectively each test in the scripts is represented by a function and a 
> call to run_test, so we have
> 
> test_function() {
>  ...code
> }
> 
> run_test function "Description of the function"
> 
> I'd like to propose that above every function a here document is placed 
> that contains yaml v1.2 encoded data (yaml.org) with 2 characters for 
> the indent. The block will start with << TEST_METADATA and be terminated 
> with TEST_METADATA. We might want to place it in a comment block but 
> this is not really required. The block will also be wrapped at 80 
> characters for readability.
> 
> The compulsory elements of the data will be:
> Name: Name of the function; this ensures pairing between
> function and comments is not just file relative.
> Summary: Will often be the description after the run_test, but
> not always, as the tense will change.
> Description: A full description of the function; the more
> information here the better.
> Components: This is the component described in the commit message
> (http://wiki.whamcloud.com/display/PUB/Commit+Comments); to make this
> useful we will need to come up with a defined set of components that
> will need to be enforced in the commit message. The format of this entry
> will be a yaml array.
> Prerequisites: Pre-requisite tests that must be run before this test
> can be run. This is again an array, which presumes a test may have
> multiple pre-requisites, but the data should not contain a chain of
> prerequisites, i.e. if A requires B and B requires C, the pre-requisites
> of A are B, not B & C.

At which step do you want to check the chains? And what is the logical
basis for these prerequisites, other than that current tests have hidden
dependencies?
I don't see any difference between one test whose body is built from
tests a, b and c and this prerequisites definition.
Could you please explain more about why we need this field?

> TicketIDs: This is an array of ticket numbers that this test 
> explicitly tests. In theory we should aim for the state where every 
> ticket has a test associated with it, and in future we should be able to 
> carry out a gap analysis.
> 

I suggest adding keywords (Components could be translated into keywords
too) and a test type (stress, benchmark, load, functional, negative, etc.)
for quick filtering. For example, SLOW could be transformed into a keyword.

Also, I would like to mention that we have 3 different logical types of data:
1) just human-readable descriptions
2) filtering and targeting fields (Components, and keywords if you agree
with my suggestion)
3) framework directives (Prerequisites)

> As time goes on we may well expand this compulsory list, but this is I 
> believe a sensible starting place.
> 
> Being part of the source this data will be subject to the same review 
> process as any other change and so we cannot store dynamic data here, 
> such as pass rates etc.

What do you think: maybe it is a good idea to keep the metadata separately?
This could be useful for simplifying mass modification of the data via
script, as well as for adding tickets, pass rates and execution times on
'gold' configurations?

Thanks,
Roman

> 
> Do people think that additional data fields should be permitted on an
> ad hoc basis, or should a controlled list of permitted data elements be
> kept? I'm tempted to say that ad hoc additional fields should be allowed,
> although this could lead to name clashes if people are not careful.
> 
> Below is a simple example.
> 
> ===
> << TEST_METADATA
> Name:
>   before_upgrade_create_data
> Summary:
>   Copies lustre source into a node specific directory and then creates
>   a tarball using that directory
> Description:
>   This should be called prior to upgrading Lustre and creates a set of
>   data on the Lustre partition which can be accessed and checked after
>   the upgrade has taken place. Several methods are used, including
>   tar'ing directories so they can later be untar'ed and compared, along
>   with creating sha1's of stored data.
> Components:
>   - lnet
>   - recovery
> Prerequisites:
>   - before_upgrade_clear_filesystem
> TicketIDs:
>   - LU-123
>   - LU-432
> TEST_METADATA
> 
> test_before_upgrade_create_data() {
> ...
> }
> 
> run_test before_upgrade_create_data "Copying lustre source into a 
> directory $IOP_DIR1, creating and then using source to create a tarball

[Lustre-discuss] Metadata storage in test script files

2012-04-30 Thread Chris
Hi,

Further to previous discussions titled "your opinion about testing" I'd
like to propose a metadata format for the test script files, and would
obviously welcome people's input:

Effectively each test in the scripts is represented by a function and a 
call to run_test, so we have

test_function() {
 ...code
}

run_test function "Description of the function"

I'd like to propose that above every function a here document is placed 
that contains yaml v1.2 encoded data (yaml.org) with 2 characters for 
the indent. The block will start with << TEST_METADATA and be terminated 
with TEST_METADATA. We might want to place it in a comment block but 
this is not really required. The block will also be wrapped at 80 
characters for readability.
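
As a sketch of how small the harness-side parser could be (assuming
Python with the PyYAML module is available; this code is illustrative,
not part of the proposal):

# Sketch only: extract every TEST_METADATA here document from a test
# script and decode it with PyYAML.  Error handling omitted for brevity.
import re
import yaml  # PyYAML

BLOCK_RE = re.compile(r"<<\s*TEST_METADATA\n(.*?)\nTEST_METADATA", re.DOTALL)

def read_catalogue(path):
    """Return one metadata dict per TEST_METADATA block in the script."""
    with open(path) as script:
        text = script.read()
    return [yaml.safe_load(block) for block in BLOCK_RE.findall(text)]

# e.g. for record in read_catalogue("some-test-script.sh"):
#          print(record["Name"])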

The compulsory elements of the data will be:
Name: Name of the function; this ensures pairing between
function and comments is not just file relative.
Summary: Will often be the description after the run_test, but
not always, as the tense will change.
Description: A full description of the function; the more
information here the better.
Components: This is the component described in the commit message
(http://wiki.whamcloud.com/display/PUB/Commit+Comments); to make this
useful we will need to come up with a defined set of components that
will need to be enforced in the commit message. The format of this entry
will be a yaml array.
Prerequisites: Pre-requisite tests that must be run before this test
can be run. This is again an array, which presumes a test may have
multiple pre-requisites, but the data should not contain a chain of
prerequisites, i.e. if A requires B and B requires C, the pre-requisites
of A are B, not B & C.
TicketIDs: This is an array of ticket numbers that this test 
explicitly tests. In theory we should aim for the state where every 
ticket has a test associated with it, and in future we should be able to 
carry out a gap analysis.

As time goes on we may well expand this compulsory list, but this is I 
believe a sensible starting place.

Being part of the source this data will be subject to the same review 
process as any other change and so we cannot store dynamic data here, 
such as pass rates etc.

Do people think that additional data fields should be permitted on an
ad hoc basis, or should a controlled list of permitted data elements be
kept? I'm tempted to say that ad hoc additional fields should be allowed,
although this could lead to name clashes if people are not careful.
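
If we did keep a controlled list, enforcing it would be cheap; here is a
minimal sketch (assuming parsed metadata dicts as above; the permitted
extra field shown is invented):

# Sketch only: check one test's metadata against a controlled field list.
COMPULSORY = {"Name", "Summary", "Description", "Components",
              "Prerequisites", "TicketIDs"}
PERMITTED_EXTRAS = {"Keywords"}  # hypothetical ad hoc field

def check_fields(record):
    """Return (missing compulsory fields, unknown fields) for one test."""
    keys = set(record)
    return COMPULSORY - keys, keys - COMPULSORY - PERMITTED_EXTRAS

missing, unknown = check_fields({"Name": "t0", "Owner": "someone"})
print(missing)   # the compulsory fields this block forgot
print(unknown)   # {'Owner'} - not on the controlled list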

Below is a simple example.

===
<< TEST_METADATA
Name:
  before_upgrade_create_data
Summary:
  Copies lustre source into a node specific directory and then creates
  a tarball using that directory
Description:
  This should be called prior to upgrading Lustre and creates a set of
  data on the Lustre partition which can be accessed and checked after
  the upgrade has taken place. Several methods are used, including
  tar'ing directories so they can later be untar'ed and compared, along
  with creating sha1's of stored data.
Components:
  - lnet
  - recovery
Prerequisites:
  - before_upgrade_clear_filesystem
TicketIDs:
  - LU-123
  - LU-432
TEST_METADATA

test_before_upgrade_create_data() {
...
}

run_test before_upgrade_create_data "Copying lustre source into a
directory $IOP_DIR1, creating and then using source to create a tarball"