Hi all, I have spent some time looking into how we might be able to represent stochastic models in CellML. This is something that would be useful to ensure we have properly contemplated for CellML 1.2. I have pasted in the notes I wrote on this below.
Please let me know if you have any suggestions, comments, or criticisms of the below document. At some point, this will obviously need to be transformed into a more robust proposal, but for now, I just want to make sure we keep the option open to use the CellML 1.2 core to represent stochastic models. Best regards, Andrew ----- Overall goal: Describe a framework which can be used in CellML 1.2 to represent a range of different systems which require stochastic differential equations to describe. Constraints: Do not want to describe the procedure for solving the model in core CellML, only the underlying mathematical / statistical model. Want to express the model in such a way that the procedure is computable from the model. Want there to be only one interpretation of the model. Want the representation to be abstract enough that it is meaningful for a number of different fields, and not just chemical equations in a well stirred vessel. Want the representation to work naturally when mixed with systems of ordinary differential equations. Use cases: Chemical reactions under the Chemical Master Equation model of Gillespie: We need to split these into separate species. This is a Poisson process, so there are simple ways to represent it. It is more efficient to represent models using Weiner processes when this there are large enough numbers of molecules to justify this but ODEs are not being used. However, Poisson and Weiner processes are both Levy Processes, that is, they have stationary independent increments, are zero at time zero, and are cadlag. This is not necessarily a good thing because some things we want to model might have memory of past events or a time dependence. How we can represent this in CellML: For the continuous case, integral equations for the increment in terms of built in processes like Weiner and Poisson processes (I don't believe there is a clean way to represent increments in MathML). Implementations will need to identify these and work out the distribution of the time until the next event (good implementations might be able to perform symbolic algebra to work this out, but most implementations would probably just recognise common cases like Poisson distributions with arbitrary parameters, and deal with expressions involving a Weiner process by sampling from the increment distribution in each time step), which could be put into a slightly generalised Gibson-Bruck type of framework where we store the time of the next event. None of this helps for non stationary independent processes, however. How could this be related to the standards: It has been proposed that CellML 1.2 have a core specification which describes the basic way of representing the mathematical structure in very general terms, and secondary specifications be used to narrow CellML down to specific subsets which can be implemented in their entirety by a actual software packages. Core CellML 1.2 should be general enough to represent concepts like probability distributions (MathML allows new operators to be defined, so we could create ones for our base types of stochastic process). The ODE IV secondary specification would not allow stochastic models, while we would have another alternative secondary specification which extended the ODE IV one to allow certain limited types of stochastic model (limited to types we know how to solve). In terms of interaction with the typing system, in a stochastic system we have both reals and real-valued random variables. Once we have one random variable in our system, this will propagate to everything else which is affected by it, so most of the model will technically be a random variable. However, we want to be able to easily mix stochastic models with existing ODE IV models to create hybrid models, so we don't really want the required datatype to change just to connect up a random variable to a non-random variable. For this reason, I don't think it is worthwhile to consider a random variable of a certain type a different datatype, and instead we would require tools to deduce this information if they require it. _______________________________________________ cellml-discussion mailing list cellml-discussion@cellml.org http://www.cellml.org/mailman/listinfo/cellml-discussion