Re: [cellml-discussion] initial feedback on new repository metadata tools
David Nickerson wrote:
> Hi Tommy,
>
> Thought I'd just jot down some initial thoughts after following your invitation to have a play with the new metadata tools on the model repository :-) These are generally cosmetic suggestions, more for your consideration than any urgent requirement you need to act on...

Thank you for your input :). Some of the issues are the result of making this page in the short time between your last email and my announcement, but I will address them in due time.

> The formatting of author/creator/modifier/etc. names leaves a bit to be desired. I'd suggest following a more standard pattern and removing the '|' as the separation character. For example, something like:
>   G1 O1 F1, G2 O2 F2, G3 O3 F3; or
>   F1, G1 O1, F2, G2 O2, F3, G3 O3
> with F=Family, G=Given, O=Other names, where you drop the O's where they are not specified or add more in where needed. You'd imagine that names specified using the vCard:FN property would then slot more naturally into this kind of display.

Thank you for showing me how it should be displayed. I just tossed that part up in about 5 minutes without giving much thought to it. I did rush it, as you can see ;)

> It would be good to make the PubMed ID field a link to the PubMed page for the given reference.

That would be useful.

> Similarly, when editing a citation it would be good to simply specify a PubMed ID or DOI or something similar and have the fields populated from that database.

The repository we have now originally only supported PubMed ID, and the editor only reads data from the CellML file. If the database has an API to achieve that I would be interested in seeing how that could be done, but if not I will put the auto-population on the back burner.

> While it's hard to judge until more metadata is presented to the user, I suspect the collapsing pane view is not going to be particularly easy to navigate around. But I guess we can wait and see once there are some more complete examples to play with.

Nesting similar properties would probably be a good idea (modification history, as you pointed out). If you imagine the citation and the modification history being in the same page, you can see the benefits. Also, I could start by making an index at the top that links to the anchors in the headings of the various panels. Just an idea for now.

> In the previous framework, each piece of metadata presented to the user provided a link to the corresponding CellML/MathML code in the view cellml tab. It would be good to keep this functionality.

Yup, the only issue is that I don't have the code to pull other cmeta:comment elements from the CellML file fully in place yet, but I don't imagine this being too hard to do.

> While I haven't followed through with any changes, I'm hoping that modifying a model's name or curation level will force the editor to also add modification metadata to the model? You could perhaps pre-populate the modification fields based on the changes the user is making...

I am working on placing the modification entry form into the metadata editing form if the user is uploading a new model. Curation level is not in the metadata specification yet; it only lives as an attribute on the repository (the edit page provides that). As for pre-populating modification fields, how would my code know what changes the user made to the model between the last version and the one she is uploading? There is no real version tracking code in the repository now, and certainly no way to run a diff between them.

> It would be good to have a consistent interface for editing a person's name. Currently when editing metadata there is a different interface for the file creator, comment authors, and citation authors. I think the citation author interface is the best, although possibly with all three fields having equal width. And while I guess we can force people into using the full vCard:N structure when they are entering data using this editor, it probably gets a bit tricky to also support the abbreviated vCard:FN property.

See below...

> In the metadata editor, there is really nothing special about the CellML Model Metadata, it just happens to be about the cellml:model resource. It would be nice to be able to use the same interface to enter the same kinds of metadata about any resource in the model document. For example, it is quite useful to cite a particular source(s) for a variable value or a particular modification to an equation.

While it wouldn't be too difficult to extend the editor to include that, I would prefer to get the basics down before I start making further fancy additions, and I didn't want to start making drastic changes before I get the foundation solid. The features I have completed so far pretty much mimic what this PMR originally had, except using a proper library that abstracts operations away to classes for
Re: [cellml-discussion] initial feedback on new repository metadata tools
>> While I haven't followed through with any changes, I'm hoping that modifying a model's name or curation level will force the editor to also add modification metadata to the model? You could perhaps pre-populate the modification fields based on the changes the user is making...
>
> I am working on placing the modification entry form into the metadata editing form if the user is uploading a new model. Curation level is not in the metadata specification yet, it only lives as an attribute on the repository (the edit page provides that). As for pre-populating modification fields, how would my code know what changes the user made to the model between the last version and the one she is uploading? There is no real version tracking code in the repository now, certainly no way to run diff between them.

Sorry, I meant if the user is changing the model name *only* (i.e., version/variant/part changes), or curation level *only* (once that is in the metadata), then for these relatively simple operations it probably wouldn't be too hard to populate the modification entry based on the single operation being done and the user's cellml.org membership data. The user would probably be free to edit the pre-populated fields. Does that make it any clearer?

___
cellml-discussion mailing list
cellml-discussion@cellml.org
http://www.cellml.org/mailman/listinfo/cellml-discussion
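As an illustration of the name layout David suggested earlier in this thread (given, other, then family names, or family name first, with missing "other" names simply dropped), here is a minimal Python sketch. The `Person` type and both function names are hypothetical and are not part of the repository code; the separators follow David's two examples literally.

```python
from typing import NamedTuple, Sequence


class Person(NamedTuple):
    """Hypothetical container mirroring the vCard:N parts discussed above."""
    family: str
    given: str
    other: Sequence[str] = ()


def format_person(person: Person, family_first: bool = False) -> str:
    # Join given and other names, skipping any part that is empty.
    given_part = " ".join(p for p in [person.given, *person.other] if p)
    if family_first:
        # "F1, G1 O1" style
        return ", ".join(p for p in [person.family, given_part] if p)
    # "G1 O1 F1" style
    return " ".join(p for p in [given_part, person.family] if p)


def format_people(people: Sequence[Person], family_first: bool = False) -> str:
    # People are separated by ", " in both of the suggested layouts.
    return ", ".join(format_person(p, family_first) for p in people)


people = [Person("F1", "G1", ["O1"]), Person("F2", "G2")]
print(format_people(people))
print(format_people(people, family_first=True))
```

A name that is only available as a single vCard:FN string could be passed through unchanged in the given-first layout, which is presumably why David expects that layout to accommodate it more naturally.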
[cellml-discussion] removal of reaction components from models
Dear All,

This email is aimed at anyone who has comments, but we particularly want to draw Matt's attention.

In the CellML meeting yesterday I brought up the issue of replacement of reaction components with straight math. At present PCEnv isn't handling reaction components well - models which use reaction components aren't integrating, for one. There are also issues with math elements not being picked up if they are under role elements. Andrew has written a script to pull these math elements up a level so that they're a direct child of the component, not the role element. The script also defines delta variables as rate * stoichiometry. Running this script on the models which contain reaction components has cleared up most of the errors with undefined delta variables, so many of the models with reaction components can now be loaded in PCEnv. The problem is that *none of them* will integrate properly. I am making the assumption that this effect is due to the reaction component, not the models, since it is so widespread among many very different models.

I'm going to start rebuilding models without reaction components. However, one of the primary issues around this is that the information represented by the attributes defined in the reaction components is, while not essential for computation of the model, definitely something that we want to keep. For example, what species are reactants, products, catalysts, activators, inhibitors (and what kind of inhibitor), etc. Ideally, these attributes would be recorded as metadata. We don't as yet have this facility, however. So the questions are:

1.) what to do with this data meanwhile
2.) how to redesign reaction descriptions using a combination of math and metadata.

One of the reasons we're seeking your input, Matt, is that ontologies such as BioPAX could be really useful in providing a framework for how we assign metadata to reactions, in a biological sense. Also, Sarala mentioned in the meeting yesterday that she was using BioPAX to describe reactions in her work.

Regarding the first question, some of the ideas that were suggested were:

a.) use commenting to describe the attributes of each reaction
b.) keep the files (well, we already do anyway) that describe the models in terms of reaction components, and refer back to them later, when we have the facilities to enter metadata on reactions, to annotate the models which have been rebuilt without the reaction components.

So if anyone has any comments on this, they'd be much appreciated.

James
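To make the first step James describes more concrete (moving math elements from under role elements up to the enclosing component), here is a minimal Python sketch using only the standard library. This is not Andrew's script: the function name and the sample model are made up for illustration, the CellML 1.0 namespace is assumed, and the second step (defining delta variables as rate * stoichiometry) is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: CellML 1.0 and MathML.
CELLML_NS = "http://www.cellml.org/cellml/1.0#"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def hoist_role_math(root: ET.Element) -> int:
    """Move every <math> child of a <role> up to its enclosing <component>.

    Returns the number of math elements moved.
    """
    moved = 0
    for component in root.iter(f"{{{CELLML_NS}}}component"):
        # Snapshot the roles first, since the tree is modified in the loop.
        for role in list(component.iter(f"{{{CELLML_NS}}}role")):
            for math in list(role.findall(f"{{{MATHML_NS}}}math")):
                role.remove(math)
                component.append(math)
                moved += 1
    return moved


# Hypothetical, heavily trimmed model used only to exercise the function.
SAMPLE = """
<model xmlns="http://www.cellml.org/cellml/1.0#" name="demo">
  <component name="r1">
    <variable name="rate" units="dimensionless"/>
    <reaction>
      <variable_ref variable="rate">
        <role role="rate">
          <math xmlns="http://www.w3.org/1998/Math/MathML"/>
        </role>
      </variable_ref>
    </reaction>
  </component>
</model>
""".strip()

root = ET.fromstring(SAMPLE)
moved = hoist_role_math(root)
print(f"moved {moved} math element(s)")
```

Note that the role attributes (reactant, product, and so on) remain on the now-empty role elements in this sketch, which is exactly the information the rest of this thread is trying to preserve as metadata.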
Re: [cellml-discussion] removal of reaction components from models
Hi,

First off ... Hooray for the demise of the reaction element.

From your description, Andrew's script pulls apart the components and adds in the formerly implicit rate*stoichiometry as I would expect, though I would like stoichiometry to be represented as an explicit variable so we can refer to it and not have to infer it from constants in the math. There may be some argument for more decomposition of this into separate components, but that is work in progress for Best Practices and Sarala's annotation work.

As for keeping some sane biology in the models that is lost by removing reactions: yes, Sarala's work will end up being the practice we want to adopt for this, but at the moment it is not proven, and considering this is a public resource, let's stick to something already specified for CellML metadata, which in time can be automatically migrated to (or simply complemented with) something in Sarala's domain. Briefly, BioPAX addresses the role of an entity by way of its place in an interaction process, whereas our reaction elements were quite explicit about a role.

I would propose something very simple. Use the biological entity metadata (see section 4.10 Biological Entity of http://www.cellml.org/specifications/metadata/cellml_metadata_1.0#sec_general_metadata) to refer to a prescribed role within our own controlled vocab that is designed only for the purpose of maintaining this role data. The cmeta:bio_entity can contain a collection of references which allow general concepts to be mapped to a CellML element, as well as a specific physical entity from, say, a protein database. This means that each variable that had a role in the reaction element would now need a cmeta id assigned to it and the respective RDF written out for the bio_entity data. The rdf:value of the identifier would be the URI of the respective role in the role vocab.
It would make sense (and help the migration/complement of Sarala's work) if the roles for modulators could follow those set out in BioPAX (see pages 17-19 of http://www.biopax.org/release/biopax-level2-documentation.pdf). So we would end up with simple URIs such as the following (which map into terms within some ontological context in the imaginary document http://cellml.org/vocabularies/2007/05/17/reactionmapping):

For the role of entities
http://cellml.org/vocabularies/2007/05/17/reactionmapping#reactant
http://cellml.org/vocabularies/2007/05/17/reactionmapping#product
http://cellml.org/vocabularies/2007/05/17/reactionmapping#catalyst

For the kinds of modulators
http://cellml.org/vocabularies/2007/05/17/reactionmapping#activation
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-allosteric
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-competitive
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-irreversable
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-noncompetitive
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-other
http://cellml.org/vocabularies/2007/05/17/reactionmapping#inhibition-uncompetitive
http://cellml.org/vocabularies/2007/05/17/reactionmapping#activation-nonallosteric
http://cellml.org/vocabularies/2007/05/17/reactionmapping#activation-allosteric

Sarala has a rule for generating the cmeta ids of variables and math, so perhaps it is best to make up an XSLT that takes a CellML model, generates the cmeta ids for the elements, puts them back into the elements, and perhaps even creates stub RDF description elements for each of them.

If you send me a component that has been decomposed from a reaction element using Andrew's script, then I'll add in the RDF metadata in the way I was thinking and post it back here.
cheers
Matt
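For anyone scripting against the proposed vocabulary, a small helper that builds and validates the role URIs listed in Matt's message might look like the Python sketch below. The base document is, in Matt's own words, imaginary, and the function and set names are hypothetical; the term spellings are copied exactly from the list in the message (including "inhibition-irreversable").

```python
# Base of the (imaginary) vocabulary document named in the message.
BASE = "http://cellml.org/vocabularies/2007/05/17/reactionmapping"

# Terms copied verbatim from the proposed lists.
ENTITY_ROLES = {"reactant", "product", "catalyst"}
MODULATOR_KINDS = {
    "activation",
    "inhibition",
    "inhibition-allosteric",
    "inhibition-competitive",
    "inhibition-irreversable",
    "inhibition-noncompetitive",
    "inhibition-other",
    "inhibition-uncompetitive",
    "activation-nonallosteric",
    "activation-allosteric",
}


def role_uri(term: str) -> str:
    """Return the vocabulary URI for a known role or modulator term."""
    if term not in ENTITY_ROLES and term not in MODULATOR_KINDS:
        raise ValueError(f"unknown reaction-mapping term: {term!r}")
    return f"{BASE}#{term}"


print(role_uri("reactant"))
print(role_uri("inhibition-competitive"))
```

Rejecting unknown terms early keeps typos out of the generated rdf:value entries, which matters if the metadata is later migrated automatically.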
Re: [cellml-discussion] removal of reaction components from models
James Lawson wrote:
> Dear All,
>
> This email is aimed at anyone who has comments, but we particularly want to draw Matt's attention.
>
> In the CellML meeting yesterday I brought up the issue of replacement of reaction components with straight math. At present PCEnv isn't handling reaction components well - models which use reaction components aren't integrating, for one. There are also issues with math elements not being picked up if they are under role elements. Andrew has written a script to pull these math elements up a level so that they're a direct child of the component, not the role element. The script also defines delta variables as rate * stoichiometry. Running this script on the models which contain reaction components has cleared up most of the errors with undefined delta variables, so now many of the models with reaction components can now be loaded in PCEnv. The problem is that *none of them* will integrate properly. I am making the assumption that this effect is due to the reaction component, not the models, since it is so widespread among many very different models.

Have you actually looked at what exactly is happening? Is this an overflow where quantities are getting so big they go to infinity because you have a loop which doesn't have any self-regulation?

I presume that the model that is being generated from the reactions by my script would be exactly identical to what you would enter if you recreated the model by hand (assuming that the entered rate laws are an accurate representation of the paper). If the rates were specified using one of the common formulae for enzyme rates, you would get product inhibition and these overflows wouldn't happen, but I think that perhaps the entered rate laws are too simple.

Obviously, getting rid of the reactions is the overall goal, but I don't see any reason why it would be better to re-create the models from scratch rather than starting from the original model with reactions, running it through my script to generate equations for it, and then fixing any problems, such as rate laws which don't represent what the authors of the paper actually used and/or result in species increasing exponentially to such high concentrations that they overflow the floating point representation.

> I'm going to start rebuilding models without reaction components. However, one of the primary issues around this is that the information represented by the attributes defined in the reaction components is, while not essential for computation of the model, definitely something that we want to keep. For example, what species are reactants, products, catalysts, activators, inhibitors (and what kind of inhibitor,) etc. Ideally, these attributes would be recorded as metadata. We don't as yet have this facility, however.

My script could generate metadata as well if required. However, as Poul suggested at the CellML meeting, we need feedback from Matt on exactly what metadata we should generate and where that metadata should be stored (I think it preferably should be in the model, but Matt might have a different opinion).

> So the questions are:
> 1.) what to do with this data meanwhile
> 2.) how to redesign reaction descriptions using a combination of math and metadata.
>
> One of the reasons we're seeking your input Matt, is that ontologies such as BioPAX could be really useful in providing a framework for how we assign metadata to reactions, in a biological sense. Also, Sarala mentioned in the meeting yesterday that she was using BioPAX to describe reactions in her work.
>
> Regarding the first question, some of the ideas that were suggested were:
> a.) use commenting to describe the attributes of each reaction
> b.) keep the files (well, we already do anyway,) that describe the models in terms of reaction components and refer back to them later when we have the facilities to enter metadata on reactions to the models which have been rebuilt without the reaction components.
>
> So if anyone has any comments on this, they'd be much appreciated.
>
> James

Best regards,
Andrew Miller
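The overflow scenario Andrew asks about can be checked mechanically. The Python sketch below is a toy illustration only, not one of the repository models and not PCEnv's integrator: it steps a single species with forward Euler under a rate law with no self-regulation and under a self-limiting one (a logistic term stands in here for effects such as product inhibition), and reports the first step at which the value stops being finite. All names and parameter values are assumptions chosen for the demonstration.

```python
import math
from typing import Callable, Optional


def first_nonfinite_step(rate: Callable[[float], float],
                         x0: float = 1.0,
                         dt: float = 0.1,
                         max_steps: int = 20000) -> Optional[int]:
    """Forward-Euler a single state; return the first step where it is inf/nan."""
    x = x0
    for step in range(1, max_steps + 1):
        x = x + dt * rate(x)
        if not math.isfinite(x):
            return step
    return None  # stayed finite for the whole run


def unregulated(x: float, k: float = 1.0) -> float:
    # Production proportional to the amount present: exponential growth.
    return k * x


def self_limiting(x: float, k: float = 1.0, capacity: float = 100.0) -> float:
    # Same growth term, damped as x approaches a ceiling (illustrative only).
    return k * x * (1.0 - x / capacity)


print("unregulated overflows at step:", first_nonfinite_step(unregulated))
print("self-limiting overflows at step:", first_nonfinite_step(self_limiting))
```

Running the same kind of check on the state variables of a converted model would show whether the integration failures are overflows of this sort or something else.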
Re: [cellml-discussion] Proposed backward-incompatible changes to the CCGS.
excellent, thanks for the explanations Andrew - it all makes sense now :-)

Andre.

Andrew Miller wrote:
> David Nickerson wrote:
>> Hi Andrew,
>>
>> I think the advantages offered by this transition outweigh the lack of backwards compatibility and have no objections to your proposal, just a few clarifications below...
>>
>>> I am planning on completely updating the CCGS to make use of CeVAS, MaLaES, and CUSES. Due to the large scale of the changes, I am taking this opportunity to fix some design issues which currently limit what CCGS can do. This inevitably means breaking the existing interface, and any code that uses it (I could have tried to make the interfaces look the same, but the code generated would still be different, due to the changes discussed below).
>>>
>>> Major changes proposed:
>>> * The concept of variables and rates in CCGS is being replaced by a concept of a computation target. A computation target is anything which can be computed by CCGS, and includes variables and rates (possibly multiple times).
>>
>> I'm not sure I follow how variables are linked to a ComputationTarget. You say that a given variable may be associated with multiple computables - does this mean there are possibly multiple methods to compute the same variable, or will all the ComputationTargets for a given variable resolve to the same computational steps?
>
> Hi Andre,
>
> 'Computing a variable' no longer has much meaning, because there can be more than one thing which you need to compute (multiple derivatives of the same variable). For example, if you have an equation like d^3(x)/dt^3 = t in the CellML, then what actually gets computed is:
> 1. The rate d^3(x)/dt^3
> 2. The rate d^2(x)/dt^2 (copied from the state variable for 1 above).
> 3. The rate d(x)/dt (copied from the state variable for 2 above).
>
> All three rates are separate computation targets associated with variable x, and all three get computed (there is no choice between them).
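Written out as a first-order system, Andrew's example looks like the following. The names x_0, x_1 and x_2 are notation introduced here for the three state variables; they do not come from CCGS.

```latex
\begin{align*}
x_0 &= x, \qquad x_1 = \frac{dx}{dt}, \qquad x_2 = \frac{d^2x}{dt^2} \\[4pt]
\frac{dx_2}{dt} &= t   && \text{(target 1: } d^3x/dt^3 \text{, given by the model equation)} \\
\frac{dx_1}{dt} &= x_2 && \text{(target 2: } d^2x/dt^2 \text{, copied from the state for target 1)} \\
\frac{dx_0}{dt} &= x_1 && \text{(target 3: } dx/dt \text{, copied from the state for target 2)}
\end{align*}
```

Each right-hand side is one computation target, and all three are attached to the single CellML variable x.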
> They do not 'resolve to the same computational step' in the sense that they could be computed in different places in the generated code. However, the order chosen will always ensure that rates or variables are computed prior to being used to compute another rate or variable.
>
>>> * There will no longer be separate blocks of code which explicitly compute rates and blocks which explicitly compute variables. Instead, computation targets are computed, and as a side effect of this, any other algebraic or rate variable may be computed as a pre-requisite. One positive result of this is that it will be possible to use derivatives like any other variable, so the same derivative can appear more than once, and derivatives can even be solved by Newton-Raphson steps if required. It means that it is possible to generate efficient code which evolves the model without computing unnecessary variables until they are needed for presentation purposes. This completely changes the structure of the strings produced, causing backward-incompatibility.
>>
>> Could you clarify what you mean by presentation purposes?
>
> Perhaps 'presentation purposes' is not the best phrase, because obviously it might be used for further non-CellML computations rather than presentation to the user. The point is that the blocks of code generated will be split up differently. One block of code will compute how to initialise constants, another will compute the rates and the minimum other variables needed to compute the rates, and the third will compute all remaining variables. The third block does not need to be run at every integration step unless you have a compelling reason to know the values of the variables computed in it (e.g. to present them to the user, or to allow for further use of these values).
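The three-way split Andrew describes can be pictured with a hand-written Python sketch. This is not code produced by CCGS; the toy model (dx/dt = -k*x with k = a*b, plus an output-only variable y = x^2), the function names, and the forward-Euler loop are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def init_constants() -> Dict[str, float]:
    # Block 1: run once. k is a 'computed constant' derived from a and b.
    a, b = 2.0, 0.5
    return {"k": a * b}


def compute_rates(t: float, states: List[float],
                  constants: Dict[str, float]) -> List[float]:
    # Block 2: run at every integration step; only what the rates need.
    return [-constants["k"] * states[0]]


def compute_remaining(t: float, states: List[float],
                      constants: Dict[str, float]) -> Dict[str, float]:
    # Block 3: variables that do not feed back into the state evolution.
    return {"y": states[0] ** 2}


def run(x0: float, dt: float, n_steps: int,
        report_every: int) -> List[Tuple[float, float, float]]:
    constants = init_constants()
    states = [x0]
    t = 0.0
    reports = []
    for step in range(1, n_steps + 1):
        rates = compute_rates(t, states, constants)
        states = [s + dt * r for s, r in zip(states, rates)]
        t += dt
        if step % report_every == 0:
            # Block 3 is only evaluated when output is actually wanted.
            y = compute_remaining(t, states, constants)["y"]
            reports.append((t, states[0], y))
    return reports


reports = run(x0=1.0, dt=0.01, n_steps=100, report_every=50)
for t, x, y in reports:
    print(f"t={t:.2f} x={x:.4f} y={y:.4f}")
```

With this layout the cost of extra interpretive variables is paid only at reporting points, which is the benefit Andrew attributes to the new split.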
> This makes it possible to have lots of extra variables which just give you some more useful interpretation of the model but don't actually affect the evolution of the state variables, without worrying that they will slow the algorithm when they aren't needed (the current code does have something similar which was hacked in using pre-processor macros in the generated code, to avoid breaking backwards compatibility, but that solution is not sustainable, especially for languages without pre-processors, and the new split is much cleaner).
>
>>> * The idea that a rate is a constant is contemplated, allowing for more efficient computation.
>>
>> Is this looking at the math defining a rate and checking if it would remain constant after all the other math has been processed?
>
> Yes. The current code allows for variables to be 'computed constants', but not rates. The new code gives rates equivalent status to variables, so they can also be computed only once if that is required. Some integrators might require that the rates array be backed up and copied to support this (because in reality, there is more than one rates array internal to