Re: [cellml-discussion] initial feedback on new repository metadata tools

2007-05-16 Thread Tommy Yu
David Nickerson wrote:
 Hi Tommy,
 
 Thought I'd just jot down some initial thoughts after following your 
 invitation to have a play with the new metadata tools on the model 
 repository :-) These are generally cosmetic suggestions more for your 
 consideration than any urgent requirement you need to act on...
 

Thank you for your input :). Some of the issues are the result of making this 
page in the short time between the time of your last email and my announcement, 
but I will address them in due time.

 The formatting of author/creator/modifier/etc names leaves a bit to be 
 desired. I'd suggest following a more standard pattern and removing the 
 '|' as the separation character. For example, something like:
   G1 O1 F1, G2 O2 F2,  G3 O3 F3; or
   F1, G1 O1, F2, G2 O2,  F3, G3 O3
 with F=Family, G=Given, O=other names where you drop O's where they are 
 not specified or add more in where needed. You'd imagine that then names 
 specified using the vCard:FN property would more naturally slot into 
 this kind of display.
 

Thank you for showing me how it should be displayed.  I just tossed that part 
up in about 5 minutes without giving much thought to it.  I did rush it, as you 
can see ;)
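
For reference, the two display patterns suggested above could be sketched roughly as follows. This is only an illustration: the PersonName class and the function names are hypothetical and are not part of the repository code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonName:
    """Hypothetical container mirroring the family/given/other name parts."""
    family: str
    given: str = ""
    other: List[str] = field(default_factory=list)


def given_first(name: PersonName) -> str:
    """Render as 'G O F', dropping any part that is not specified."""
    parts = [name.given, *name.other, name.family]
    return " ".join(p for p in parts if p)


def family_first(name: PersonName) -> str:
    """Render as 'F, G O', dropping any part that is not specified."""
    rest = " ".join(p for p in [name.given, *name.other] if p)
    return f"{name.family}, {rest}" if rest else name.family


def format_names(names, style=given_first) -> str:
    """Join the formatted names with ', ' rather than '|'."""
    return ", ".join(style(n) for n in names)
```

A name that only has a single display string (as with vCard:FN) could be passed through unchanged alongside these.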

 It would be good to make the PubMed ID field a link to the PubMed page 
 for the given reference.
 

That would be useful.
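
A minimal sketch of turning a stored PubMed ID into a link. The URL pattern is an assumption based on the present-day PubMed site, and the function names are hypothetical.

```python
def pubmed_url(pmid: str) -> str:
    """Return a link to the PubMed record for a numeric PubMed ID."""
    pmid = pmid.strip()
    if not pmid.isdigit():
        raise ValueError(f"not a PubMed ID: {pmid!r}")
    # Assumed URL pattern; adjust if the repository targets another mirror.
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def pubmed_link_html(pmid: str) -> str:
    """Wrap the ID in an HTML anchor for display in the metadata view."""
    return f'<a href="{pubmed_url(pmid)}">{pmid.strip()}</a>'
```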

 Similarly, when editing a citation it would be good to simply specify a 
 PubMed ID or DOI or something similar and have the fields populated from 
 that database.
 

The repository we have now originally only supported PubMed ID, and the editor 
only reads data from the CellML file.  If the database has an API to achieve 
that I would be interested in seeing how that could be done, but if not I will 
put the auto-population on the back burner.
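
One possible route for the auto-population is the NCBI E-utilities ESummary service. The sketch below only builds the request URL and maps an already-decoded JSON response onto editor fields. The response key names and the editor field names are assumptions, so check them against the E-utilities documentation before relying on them.

```python
from urllib.parse import urlencode

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmid: str) -> str:
    """Build an ESummary request URL for one PubMed ID (JSON output)."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return f"{ESUMMARY}?{query}"


def citation_fields(summary: dict, pmid: str) -> dict:
    """Map a decoded ESummary response onto hypothetical editor fields.

    The keys read here ("result", "title", "authors", "pubdate",
    "fulljournalname") are assumptions about the response layout.
    """
    record = summary["result"][pmid]
    return {
        "title": record.get("title", ""),
        "authors": [a.get("name", "") for a in record.get("authors", [])],
        "journal": record.get("fulljournalname", ""),
        "date": record.get("pubdate", ""),
    }
```

The user would still be free to correct the populated fields before saving.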

 While it's hard to judge until more metadata is presented to the user, I 
 suspect the collapsing pane view is not going to be particularly easy to 
 navigate around. But I guess we can wait and see once there are some 
 more complete examples to play with. Nesting similar properties would 
 probably be a good idea (modification history, as you pointed out).
 

If you imagine the citation and the modification history being in the same 
page, you can see the benefits.  Also, I could start by making an index at the 
top that links to the anchors in the headings of the various panels.  
Just an idea for now.

 In the previous framework, each piece of metadata presented to the user 
 provided a link to the corresponding CellML/MathML code in the view 
 cellml tab. It would be good to keep this functionality.
 

Yup, the only issue is that I don't yet have code to pull other cmeta:comment 
elements from the CellML file fully in place, but I don't imagine this being too 
hard to do.

 While I haven't followed through with any changes, I'm hoping that 
 modifying a model's name or curation level will force the editor to also 
 add modification metadata to the model? You could perhaps pre-populate 
 the modification fields based on the changes the user is making...
 

I am working on placing the modification entry form into the metadata editing 
form if the user is uploading a new model.  Curation level is not in the 
metadata specification yet, it only lives as an attribute on the repository 
(the edit page provides that).  As for pre-populating modification fields, how 
would my code know what changes the user made to the model between the last 
version and the one she is uploading?  There is no real version tracking code 
in the repository now, certainly no way to run diff between them.

 It would be good to have a consistent interface for editing a person's 
 name. Currently when editing metadata there is a different interface for 
 the file creator, comment authors, and citation authors. I think the 
 citation author interface is the best, although possibly with all three 
 fields having equal width. And while I guess we can force people into 
 using the full vCard:N structure when they are entering data using this 
 editor it probably gets a bit tricky to also support the abbreviated 
 vCard:FN property.
 

See below...

 In the metadata editor, there is really nothing special about the CellML 
 Model Metadata, it just happens to be about the cellml:model resource. 
 It would be nice to be able to use the same interface to enter the same 
 kinds of metadata about any resource in the model document. For example, 
 it is quite useful to cite a particular source(s) for a variable value 
 or a particular modification to an equation.
 

While it wouldn't be too difficult to extend the editor to include that, I 
would rather get the basics down first before I start making further fancy 
additions, and I don't want to make drastic changes before I get the 
foundation solid.  The features I have completed so far pretty much mimic what 
this PMR originally had, except using a proper library that abstracts 
operations away to classes for 

Re: [cellml-discussion] initial feedback on new repository metadata tools

2007-05-16 Thread David Nickerson
 While I haven't followed through with any changes, I'm hoping that 
 modifying a model's name or curation level will force the editor to also 
 add modification metadata to the model? You could perhaps pre-populate 
 the modification fields based on the changes the user is making...

 
 I am working on placing the modification entry form into the metadata editing 
 form if the user is uploading a new model.  Curation level is not in the 
 metadata specification yet, it only lives as an attribute on the repository 
 (the edit page provides that).  As for pre-populating modification fields, 
 how would my code know what changes the user made to the model between the 
 last version and the one she is uploading?  There is no real version tracking 
 code in the repository now, certainly no way to run diff between them.

sorry, I meant if the user is changing the model name *only* (i.e., 
version/variant/part changes), or curation level *only* (once that is in 
the metadata) then for these relatively simple operations it probably 
wouldn't be too hard to populate the modification entry based on the 
single operation being done and the user's cellml.org membership data. 
The user would probably be free to edit the pre-populated fields. Does 
that make it any clearer?
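
As a rough sketch of that idea, a single simple operation could be turned into a draft modification entry like this. The operation names, the dictionary keys and the function name are all hypothetical; the member name would come from the user's cellml.org account.

```python
from datetime import date
from typing import Optional


def prepopulate_modification(operation: str, old: str, new: str,
                             member_name: str,
                             when: Optional[date] = None) -> dict:
    """Draft a modification entry for one simple metadata change.

    `operation` is "rename" or "curation"; anything else returns an empty
    draft so the user fills in the form by hand.
    """
    templates = {
        "rename": "Model name changed from '{old}' to '{new}'.",
        "curation": "Curation level changed from '{old}' to '{new}'.",
    }
    if operation not in templates:
        return {}
    return {
        "modifier": member_name,
        "date": (when or date.today()).isoformat(),
        "comment": templates[operation].format(old=old, new=new),
    }
```

The returned draft is only a starting point; the form would show it in editable fields.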
___
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion


[cellml-discussion] removal of reaction components from models

2007-05-16 Thread James Lawson
Dear All,

This email is aimed at anyone who has comments, but we particularly want
to draw Matt's attention.

In the CellML meeting yesterday I brought up the issue of replacement of
reaction components with straight math. At present PCEnv isn't handling
reaction components well - models which use reaction components aren't
integrating, for one. There are also issues with math elements not being
picked up if they are under role elements. Andrew has written a script
to pull these math elements up a level so that they're a direct child of
the component, not the role element. The script also defines delta
variables as rate * stoichiometry. Running this script on the models
which contain reaction components has cleared up most of the errors with
undefined delta variables, so many of the models with reaction
components can now be loaded in PCEnv. The problem is that *none of
them* will integrate properly. I am making the assumption that this
effect is due to the reaction component, not the models, since it is so
widespread among many very different models.
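
To illustrate the kind of transformation described above (this is not Andrew's script, only a minimal sketch), the following moves every math element found under a role element up to the enclosing component. It assumes the CellML 1.0 namespace and a component > reaction > variable_ref > role nesting.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.0#"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def hoist_role_math(component: ET.Element) -> int:
    """Move every <math> child of a <role> up to the enclosing component.

    Returns the number of <math> elements moved.
    """
    role_tag = f"{{{CELLML_NS}}}role"
    math_tag = f"{{{MATHML_NS}}}math"
    moved = 0
    # Collect the roles first so the tree is not edited while being walked.
    for role in list(component.iter(role_tag)):
        for math in role.findall(math_tag):
            role.remove(math)
            component.append(math)
            moved += 1
    return moved
```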

I'm going to start rebuilding models without reaction components.
However, one of the primary issues around this is that the information
represented by the attributes defined in the reaction components is,
while not essential for computation of the model, definitely something
that we want to keep. For example, what species are reactants, products,
catalysts, activators, inhibitors (and what kind of inhibitor,) etc.
Ideally, these attributes would be recorded as metadata. We don't as yet
have this facility, however.

So the questions are:
1.) what to do with this data meanwhile
2.) how to redesign reaction descriptions using a combination of math
and metadata.

One of the reasons we're seeking your input Matt, is that ontologies
such as BioPAX could be really useful in providing a framework for how
we assign metadata to reactions, in a biological sense. Also, Sarala
mentioned in the meeting yesterday that she was using BioPAX to describe
reactions in her work.

Regarding the first question, some of the ideas that were suggested were:
a.) use commenting to describe the attributes of each reaction
b.) keep the files (well, we already do anyway,) that describe the
models in terms of reaction components and refer back to them later when
we have the facilities to enter metadata on reactions to the models
which have been rebuilt without the reaction components.

So if anyone has any comments on this, they'd be much appreciated.

James


Re: [cellml-discussion] removal of reaction components from models

2007-05-16 Thread Matt
Hi,

First off ... Hooray for the demise of the reaction element. From
your description, the script of Andrew's pulls apart the components
and adds in the formerly implicit rate*stoichiometry as I would
expect, though I would like stoichiometry to be represented as an
explicit variable so we can refer to it and not have to infer from
constants in the math. There may be some argument to more
decomposition of this into separate components, but that is work in
progress for Best Practices and Sarala's annotation work.
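
A small sketch of what an explicit stoichiometry variable buys: each delta is computed from a named value rather than from a constant buried in the math. The signed-stoichiometry convention used here (negative for consumed species) is an assumption made for illustration only.

```python
def delta_variables(rate: float, stoichiometry: dict) -> dict:
    """Return delta = rate * stoichiometry for each named species.

    `stoichiometry` maps a species name to its (assumed signed)
    stoichiometric coefficient, kept as an explicit, referable value.
    """
    return {species: rate * n for species, n in stoichiometry.items()}
```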

As for keeping some sane biology in the models that is lost by
removing reactions: yes, Sarala's work will end up being the practice
we want to take for this, but at the moment it is not proven and,
considering this is a public resource, let's stick to something already
specified for CellML metadata which in time can be automatically
migrated to (or simply complemented with) something in Sarala's
domain. Briefly, BioPAX addresses the role of an entity by way of its
place in an interaction process, whereas our reaction elements were
quite explicit about a role.

I would propose something very simple. Use the biological entity
metadata (see section 4.10  Biological Entity of
http://www.cellml.org/specifications/metadata/cellml_metadata_1.0#sec_general_metadata)
to refer to a prescribed role within our own controlled vocab that is
designed only for the purpose of maintaining this role data. The
cmeta:bio_entity can contain a collection of references which allow
for general concepts to be mapped to a CellML element as well as a
specific physical entity from say a protein database.

This means that each variable that had a role in the reaction element
would now need a cmeta id assigned to it and the respective rdf
written out for the bioentity data. The rdf:value of the identifier
would be the URI of the respective role in the role vocab.

It would make sense (and help the migration/complement of Sarala's work)
if the roles for modulators could follow those set out in BioPAX (see
pages 17-19 of http://www.biopax.org/release/biopax-level2-documentation.pdf).

So we would end up with simple URIs such as the following (which map
into terms within some ontological context in the imaginary document
http://cellml.org/vocabularies/2007/05/17/reactionmapping):


For the role of entities
http://cellml.org/vocabularies/2007/05/17/reactionmapping#reactant
http://cellml.org/vocabularies/2007/05/17/reactionmapping#product
http://cellml.org/vocabularies/2007/05/17/reactionmapping#catalyst

For the kinds of modulators
http://cellml.org/vocabularies/2007/05/17/reactionmapping#activation
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-allosteric
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-competitive
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-irreversable
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-noncompetitive
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-other
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-uncompetitive
http://cellml.org/vocabularies/2007/05/17/reactionmapping#activation-nonallosteric
http://cellml.org/vocabularies/2007/05/17/reactionmapping#activation-allosteric
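
As a rough illustration of how such URIs might be attached to a variable, the sketch below builds a term URI and wraps it in an RDF stub. The element nesting inside cmeta:bio_entity is an assumption and should be checked against section 4.10 of the metadata specification. The vocabulary document itself is imaginary, as noted above, and only a few of the modulator terms are listed.

```python
from xml.sax.saxutils import escape, quoteattr

ROLE_VOCAB = "http://cellml.org/vocabularies/2007/05/17/reactionmapping"
ROLES = {"reactant", "product", "catalyst"}
MODULATORS = {"activation", "inhibition", "inhibition-competitive",
              "inhibition-noncompetitive", "activation-allosteric"}


def term_uri(term: str) -> str:
    """Return the vocabulary URI for a known role or modulator term."""
    if term not in ROLES | MODULATORS:
        raise ValueError(f"unknown term: {term!r}")
    return f"{ROLE_VOCAB}#{term}"


def bio_entity_rdf(cmeta_id: str, term: str) -> str:
    """Return an RDF/XML stub tying a cmeta:id to a role URI.

    The nesting below is an assumed layout, not copied from the spec.
    """
    about = quoteattr("#" + cmeta_id)
    return (
        f"<rdf:Description rdf:about={about}>\n"
        "  <cmeta:bio_entity rdf:parseType=\"Resource\">\n"
        "    <cmeta:identifier rdf:parseType=\"Resource\">\n"
        f"      <rdf:value>{escape(term_uri(term))}</rdf:value>\n"
        "    </cmeta:identifier>\n"
        "  </cmeta:bio_entity>\n"
        "</rdf:Description>"
    )
```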

Sarala has a rule for generating the cmeta ids of variables and math,
so perhaps it is best to make up an xslt that takes a cellml model and
generates the cmeta ids for the elements and puts them back in to the
elements and perhaps even creates stub rdf description elements for
each of them.
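
The id-generation step could be sketched like this, using Python's ElementTree in place of the XSLT suggested above. Sarala's naming rule is not given in this thread, so the "component_variable" scheme below is purely a placeholder, and the sketch does not create the stub rdf description elements.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/cellml/1.0#"
CMETA_NS = "http://www.cellml.org/metadata/1.0#"
CMETA_ID = f"{{{CMETA_NS}}}id"


def assign_cmeta_ids(model: ET.Element) -> list:
    """Give every variable lacking a cmeta:id one; return the new ids."""
    used = {el.get(CMETA_ID) for el in model.iter() if el.get(CMETA_ID)}
    created = []
    for comp in model.iter(f"{{{CELLML_NS}}}component"):
        for var in comp.findall(f"{{{CELLML_NS}}}variable"):
            if var.get(CMETA_ID):
                continue
            # Placeholder rule: <component name>_<variable name>.
            base = f"{comp.get('name')}_{var.get('name')}"
            candidate, n = base, 1
            while candidate in used:
                n += 1
                candidate = f"{base}_{n}"
            var.set(CMETA_ID, candidate)
            used.add(candidate)
            created.append(candidate)
    return created
```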

If you send me a component that has been decomposed from a reaction
element using Andrew's script, then I'll add in the rdf metadata in
the way I was thinking and post it back here.

cheers
Matt



Re: [cellml-discussion] removal of reaction components from models

2007-05-16 Thread Andrew Miller
James Lawson wrote:
 Dear All,

 This email is aimed at anyone who has comments, but we particularly want
 to draw Matt's attention.

 In the CellML meeting yesterday I brought up the issue of replacement of
 reaction components with straight math. At present PCEnv isn't handling
 reaction components well - models which use reaction components aren't
 integrating, for one. There are also issues with math elements not being
 picked up if they are under role elements. Andrew has written a script
 to pull these math elements up a level so that they're a direct child of
 the component, not the role element. The script also defines delta
 variables as rate * stoichiometry. Running this script on the models
 which contain reaction components has cleared up most of the errors with
 undefined delta variables, so many of the models with reaction
 components can now be loaded in PCEnv. The problem is that *none of
 them* will integrate properly. I am making the assumption that this
 effect is due to the reaction component, not the models, since it is so
 widespread among many very different models.
   
Have you actually looked at what exactly is happening? Is this an 
overflow where quantities are getting so big they go to infinity because 
you have a loop which doesn't have any self-regulation?

I presume that the model that is being generated from the reactions by 
my script would be exactly identical to what you would enter if you 
recreated the model by hand (assuming that the entered rate laws are an 
accurate representation of the paper). If the rates were specified using 
one of the common formulae for enzyme rates, you would get product 
inhibition and these overflows wouldn't happen, but I think that perhaps 
the entered rate laws are too simple.

Obviously, getting rid of the reactions is the overall goal, but I don't 
see any reason why you would be better to re-create the models from 
scratch, rather than starting from the original model with reactions, 
running it through my script to generate equations for this, and then 
fixing any problems such as rate laws which don't represent what the 
authors of the paper actually used and/or result in species getting 
exponential increases to such high concentrations they overflow the 
floating point representation.
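
A toy illustration of the overflow being described, using made-up rate laws rather than anything from a repository model: an unregulated positive-feedback term overflows to infinity under fixed-step integration, while the same term damped by the product stays finite.

```python
import math


def euler(rate, p0=1.0, dt=1.0, steps=400):
    """Integrate dP/dt = rate(P) with fixed-step forward Euler."""
    p = p0
    for _ in range(steps):
        p += dt * rate(p)
    return p


def unregulated(p, k=10.0):
    """Autocatalytic production with no feedback: grows exponentially."""
    return k * p


def product_inhibited(p, k=10.0, ki=1.0):
    """Same production term damped by the product itself (toy rate law)."""
    ratio = p / ki
    return k * p / (1.0 + ratio * ratio)


if __name__ == "__main__":
    print(math.isinf(euler(unregulated)))           # overflows to inf
    print(math.isfinite(euler(product_inhibited)))  # stays bounded
```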

 I'm going to start rebuilding models without reaction components.
 However, one of the primary issues around this is that the information
 represented by the attributes defined in the reaction components is,
 while not essential for computation of the model, definitely something
 that we want to keep. For example, what species are reactants, products,
 catalysts, activators, inhibitors (and what kind of inhibitor,) etc.
 Ideally, these attributes would be recorded as metadata. We don't as yet
 have this facility, however.
   
My script could generate metadata as well if required. However, as Poul 
suggested at the CellML meeting, we need feedback from Matt on exactly what 
metadata we should generate and where that metadata should be stored 
(I think it should preferably be in the model, but Matt might have a 
different opinion).
 So the questions are:
 1.) what to do with this data meanwhile
 2.) how to redesign reaction descriptions using a combination of math
 and metadata.

 One of the reasons we're seeking your input Matt, is that ontologies
 such as BioPAX could be really useful in providing a framework for how
 we assign metadata to reactions, in a biological sense. Also, Sarala
 mentioned in the meeting yesterday that she was using BioPAX to describe
 reactions in her work.

 Regarding the first question, some of the ideas that were suggested were:
 a.) use commenting to describe the attributes of each reaction
 b.) keep the files (well, we already do anyway,) that describe the
 models in terms of reaction components and refer back to them later when
 we have the facilities to enter metadata on reactions to the models
 which have been rebuilt without the reaction components.

 So if anyone has any comments on this, they'd be much appreciated.

 James

   

Best regards,
Andrew Miller


Re: [cellml-discussion] Proposed backward-incompatible changes to the CCGS.

2007-05-16 Thread David Nickerson
excellent, thanks for the explanations Andrew - it all makes sense now :-)


Andre.

Andrew Miller wrote:
 David Nickerson wrote:
 Hi Andrew,

 I think the advantages offered by this transition outweigh the lack of 
 backwards compatibility and have no objections to your proposal, just a 
 few clarifications below...

   
 I am planning on completely updating the CCGS to make use of CeVAS, 
 MaLaES, and CUSES. Due to the large scale of the changes, I am taking 
 this opportunity to fix some design issues which currently limit what 
 CCGS can do. This inevitably means breaking the existing interface, and 
 any code that uses it (I could have tried to make the interfaces look 
 the same, but the code generated would still be different, due to the 
 changes discussed below).

 Major changes proposed:
 * The concept of variables and rates in CCGS is being replaced by a 
 concept of a computation target. A computation target is anything which 
 can be computed by CCGS, and includes variables and  rates (possibly 
 multiple times).
 
 I'm not sure I follow how variables are linked to a ComputationTarget. 
 You say that a given variable may be associated with multiple 
 computables - does this mean there are possibly multiple methods to 
 compute the same variable or will all the ComputationTargets for a 
 given variable resolve to the same computational steps?
   
 Hi Andre,
 
 'Computing a variable' no longer has much meaning, because there can be 
 more than one thing which you need to compute (multiple derivatives of 
 the same variable).
 
 For example, if you have an equation like d^3(x)/dt^3 = t in the CellML, 
 then what actually gets computed is:
 1. The rate d^3(x)/dt^3
 2. The rate d^2(x) / dt^2 (copied from the state variable for 1 above).
 3. The rate d(x)/dt (copied from the state variable for 2 above).
 
 All three rates are separate computation targets associated with 
 variable x, and all three get computed (there is no choice between 
 them). They do not 'resolve to the same computational step' in the sense 
 that they could be computed in different places in the generated code. 
 However, the order chosen will always ensure that rates or variables are 
 computed prior to being used to compute another rate or variable.
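
The example above can be sketched in code as a first-order system. This is a hand-written illustration, not CCGS output: the names are hypothetical, and each computation target is shown simply as a (variable, derivative degree) pair.

```python
# Each computation target pairs a variable with a derivative degree.
COMPUTATION_TARGETS = [("x", 3), ("x", 2), ("x", 1)]


def rates(t, state):
    """state = [x, dx/dt, d2x/dt2]; returns the rate of each entry."""
    x, dx, d2x = state
    d3x = t          # target ("x", 3): computed from the model equation
    return [dx,      # target ("x", 1): copied from a state variable
            d2x,     # target ("x", 2): copied from a state variable
            d3x]


def integrate(t_end=1.0, dt=1e-3):
    """Forward Euler from zero initial conditions; returns the final state."""
    state = [0.0, 0.0, 0.0]
    for i in range(round(t_end / dt)):
        r = rates(i * dt, state)
        state = [s + dt * ds for s, ds in zip(state, r)]
    return state
```

With zero initial conditions the exact solution is x(t) = t^4/24, so the result can be checked against 1/24 at t = 1.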
   
 * There will no longer be separate blocks of code which explicitly 
 compute rates and blocks which explicitly compute variables. Instead, 
 computation targets are computed, and as a side effect of this, any 
 other algebraic or rate variable may be computed as a pre-requisite. One 
 positive result of this is that it will be possible to use derivatives 
 like any other variable, so the same derivative can appear more than 
 once, and derivatives can even be solved by Newton-Raphson steps if 
 required. It means that it is possible to generate efficient code which evolves 
 the model without computing unnecessary variables until they are needed 
 for presentation purposes. This completely changes the structure of the 
 strings produced, causing backward-incompatibility.
 
 Could you clarify what you mean by presentation purposes?
   
 Perhaps 'presentation purposes' is not the best phrase, because 
 obviously it might be used for further non-CellML computations rather 
 than presentation to the user. The point is that the blocks of code 
 generated will be split up differently. One block of code will compute 
 how to initialise constants, another will compute the rates and the 
 minimum other variables needed to compute the rates, and the third will 
 compute all remaining variables. The third block does not need to be run 
 at every integration step unless you have a compelling reason to know 
 the values of the variables computed in it (e.g. to present them to the 
 user, or to allow for further use of these values). This makes it 
 possible to have lots of extra variables which just give you some more 
 useful interpretation of the model but don't actually affect the 
 evolution of the state variables, without worrying that they will slow 
 the algorithm when they aren't needed (the current code does have 
 something similar which was hacked in using pre-processor macros in the 
 generated code, to avoid breaking backwards compatibility, but that 
 solution is not sustainable especially for languages without 
 pre-processors, and the new split is much cleaner).
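
A hand-written sketch of the three-block split for a toy decay model (dx/dt = -k*x). It is not generated code: the function names and the extra "percent_remaining" variable are hypothetical and only show that the third block can be skipped on most steps.

```python
def init_constants():
    """Block 1: constants and initial state, run once."""
    constants = {"k": 0.5, "x0": 1.0}
    return constants, [constants["x0"]]


def compute_rates(t, state, c):
    """Block 2: rates plus only the variables the rates depend on."""
    return [-c["k"] * state[0]]


def compute_remaining(t, state, c):
    """Block 3: variables that never feed back into the state."""
    return {"t": t, "percent_remaining": 100.0 * state[0] / c["x0"]}


def run(steps=100, dt=0.01, output_every=25):
    """Forward Euler loop that runs block 3 only when output is wanted."""
    c, state = init_constants()
    outputs = []
    for i in range(steps):
        r = compute_rates(i * dt, state, c)
        state = [s + dt * ds for s, ds in zip(state, r)]
        if (i + 1) % output_every == 0:
            outputs.append(compute_remaining((i + 1) * dt, state, c))
    return state, outputs
```

Here block 3 runs 4 times over 100 integration steps, while block 2 runs on every step.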
   
 * The idea that a rate is a constant is contemplated, allowing for more 
 efficient computation.
 
 Is this looking at the math defining a rate and checking if it would 
 remain constant after all the other math has been processed?
   
 Yes. The current code allows for variables to be 'computed constants', 
 but not rates. The new code gives rates equivalent status to variables, 
 so they can also be computed only once if that is required.
 
 Some integrators might require that the rates array be backed up and 
 copied to support this (because in reality, there is more than one rates 
 array internal to