Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-26 Thread Tommy Yu
David Nickerson wrote:
 Hi Tommy,
 
 That looks good - it's all starting to make sense to me now.
 
 I'm just wondering how your system would handle a case where two authors 
 independently encode the same published model. The first author to 
 upload their encoding would get ownership of the publication alias (if 
 I have the terminology right). Is there any way for the second author to 
 get a similar alias to their encoding of the model? This is starting to 
 sound like a version/variant theme, but it's probably a situation that 
 will crop up quite frequently...

I guess it depends on how those two model creators work.  If John and Mary work 
independently and create two different models describing the same item, each 
will need a separate project directory.  If John got the publication alias set 
up first, it would obviously point to his model, but now Mary comes along and 
wants to have a separate model up as well.  What could happen is this:

1) The publication alias is no longer an alias, but a directory holding aliases 
to users' models.
2) A new model directory is created, and John's and Mary's model directories 
are copied into it.

While the outcome is similar, 1) separates publications from models a lot more, 
and may reflect the situation where a paper describes multiple models, each 
created by a different person:

John, Mary and Ming create models a, b and c based on Doe's paper.  All three 
get approved, and repos://publication/doe_2007_1/ is created containing 

repos://publication/doe_2007_1/pathway_a -> repos://!rev/45/home/john/a
repos://publication/doe_2007_1/pathway_b -> repos://!rev/60/home/mary/b
repos://publication/doe_2007_1/pathway_c -> repos://!rev/54/home/ming/c

each created by its respective creator.  Each published model is treated 
separately; note the differing revision numbers.
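
A minimal sketch of option 1), assuming the publication entry is simply a 
mapping from alias names to revision-pinned paths.  The names Alias, 
publications and resolve are hypothetical and only illustrate the idea; the 
URIs are the ones from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alias:
    revision: int  # repository revision the alias is pinned to
    path: str      # path of the model inside the repository

    def target(self) -> str:
        # Mirrors the repos://!rev/<revision>/<path> form used above.
        return f"repos://!rev/{self.revision}/{self.path}"


# Option 1): the publication is a directory of aliases, not a single alias.
publications = {
    "doe_2007_1": {
        "pathway_a": Alias(45, "home/john/a"),
        "pathway_b": Alias(60, "home/mary/b"),
        "pathway_c": Alias(54, "home/ming/c"),
    }
}


def resolve(uri: str) -> str:
    """Resolve repos://publication/<pub>/<name> to its pinned target."""
    prefix = "repos://publication/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a publication URI: {uri}")
    pub, name = uri[len(prefix):].strip("/").split("/", 1)
    return publications[pub][name].target()


print(resolve("repos://publication/doe_2007_1/pathway_b"))
```

Because each alias carries its own revision, John committing further work to 
home/john/a would not change what pathway_a points to until the alias itself 
is updated.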

2) has the benefit of encouraging model creators to work together, groups the 
same models in one place, and may reflect this situation:

A publication that describes multiple models, with different people coding up 
each one, could have a shared, UUID-named workspace owned by the people working 
on it, with each model in its own directory.  The publication alias 
could be owned by the whole group that worked on the model.

I just flushed this out of my head; both of these suggestions may have very 
interesting consequences that are not noted here.

This was a very good question.

 
 This is a slightly different example from your example workflow and 
 could be viewed as John and Mary both having valid and correct but 
 different encodings of the doe_2007_1 paper. Actually, I just saw the 
 '_1' on the publication link - is that some kind of version/variant that 
 would be _2 for Mary in my example? I had been assuming the 2007_1 meant 
 January 2007.
 

It could conceivably mean the first paper John Doe published in 2007, or 
January; that hasn't been decided yet.

Thanks,
Tommy.

 
 Thanks,
 Andre.
 
 Tommy Yu wrote:
 Hi,

 I thought Andrew's ideas here are worth expanding, and I wrote a page based 
 on that.

 http://www.cellml.org/Members/tommy/BaseRepository

 Cheers,
 Tommy.
 
 ___
 cellml-discussion mailing list
 cellml-discussion@cellml.org
 http://www.cellml.org/mailman/listinfo/cellml-discussion



Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-26 Thread Matt
This is my view of where things should be heading:

The main impetus for this thread is moving the cellml.org site
forward. In this sense I would like to see a description of what it
currently does and what features have been informally slated.

Then I'd like to see a document that re-writes these out as use-cases
that don't depend on technology (but can certainly borrow ideas from
various technologies).

A large part of this is the cellml.org use-cases around the use of
metadata in the models.

While the underlying implementation of the repository is something to
discuss, I think that it is a red herring at the moment. The issues
seem to have more to do with various use-cases being difficult to represent
in the current style of model naming and the difficulty of reflecting
someone's local filesystem workflow/layout. I think there is too much
of a rush to solve the repository issue quickly based on these
idiosyncrasies of the cellml.org model naming problem.

Some(!!)  considerations:

- how is a modeler's local workspace organized? e.g. we have talked
about the possible need for a manifest file; the possibility of
metadata sitting separate from the model itself; etc. Is the idea of a
workspace appropriate? Would people have multiple workspaces, say one
for each model, or one workspace for all their models, or both?

- do people want to use a single central repository? Or should they be
able to work independently in their own instance of a repository and
perhaps at some point transfer their project to another one?

- there has been an assumption that the base unit stored in a
repository should be a cellml/xml model - why is this? check the
reasons why this is believed to be the way it should be.

- don't try to figure out the URI scheme right now - even in use
cases. The only attention to URI will be the behaviour it might exhibit
in the modeling process: for example, you want someone to be able to
move from tracking a volatile branch of a model in their imports to a
stable one (that's all you have to say, not what the URIs might look
like).

- don't attach specific technologies to the repository system until
the use-case space has been filled out

The evolution of the repository is no small task (it's actually
someone's PhD topic). So once there is a pretty certain idea of what
the repository may be, how does the current system in the plone
site sit with respect to this? Are there technologies that take us a
step closer that could be woven into the current product? etc.

What are the priorities for cellml.org?


Re: [cellml-discussion] Concerning the CellML Model Repository | version/variant metadata?

2007-06-24 Thread James Lawson
My $0.02 on this is (please forgive me if I get some of the technical
stuff mixed up):

The current naming scheme, as it translates to the web address, is:
author(s)_date_versionXX_variantXX

I think it should be author(s)_date_variantXX_versionXX instead, since
IMO, one should be thinking in terms of versions of variants, rather
than variants of versions.
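
A minimal sketch of the proposed reordering, assuming "date" is a four-digit
year and the XX fields are zero-padded numbers. The model names, the regular
expression and the function names are hypothetical and used only for
illustration.

```python
import re

# Accepts either field order after <authors>_<date>_.
NAME = re.compile(
    r"^(?P<authors>.+)_(?P<date>\d{4})_"
    r"(?:version(?P<ver_a>\d+)_variant(?P<var_a>\d+)"
    r"|variant(?P<var_b>\d+)_version(?P<ver_b>\d+))$"
)


def parse_name(name):
    """Split a model name in either field order into its parts."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised model name: {name}")
    return {
        "authors": match.group("authors"),
        "date": match.group("date"),
        "variant": int(match.group("var_a") or match.group("var_b")),
        "version": int(match.group("ver_a") or match.group("ver_b")),
    }


def variant_first(name):
    """Rewrite a name so that the variant precedes the version."""
    p = parse_name(name)
    return (f"{p['authors']}_{p['date']}"
            f"_variant{p['variant']:02d}_version{p['version']:02d}")


# Hypothetical names, for illustration only.
names = [
    "doe_2007_version02_variant01",
    "doe_2007_version01_variant02",
    "doe_2007_version01_variant01",
]
for renamed in sorted(variant_first(n) for n in names):
    print(renamed)
```

With the variant first, a plain alphabetical listing keeps all versions of
one variant next to each other, which is the "versions of variants" view.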

Also, I think it would help if there were some metadata that could
record what version and variant a cellml file is, and also some
'sub'metadata under variant to say what the variant represents, whether
it's a particular cell type or something else.

I realise that metadata isn't supposed to be added to a model for the
sake of a repository or for any non-generalised purpose, but I think
that version/variant metadata would be useful.
E.g. for 1.1 models, a simulator could pick this metadata up. So you
could bring up a window in which the software could tell you that, for
example, you are embedding this version of this markov model of an
L-type Ca++ channel, by such and such et al., into a variant 02 -
epicardial cell Pandit et al. cardiac cell model, etc. etc.
Another example would be working with CellML 1.1 models in an era where
we have a library of components that people can use. We might have a
GPCR component which has a large number of variants, and it would be
crucial for the simulation/editing programs like PCEnv to know, and be
able to tell the user, which version and variant of each component they
are using. People might want to swap in different variants to see how it
affects their model etc.

And of course this version/variant metadata would obviously be highly
useful (IMO) for the repository. Maybe subversion could automatically
write this metadata.

 What I'm really trying to say is that I think there is justification
for version/variant information to be stored in metadata as well as the
URI naming scheme, since, unless I'm missing something, there is useful
information (both for repositories and simulator software) that can't be
 stored in the URI.

James

 
 - Version/Variant
 It already clogged up the system.  There is no proper revision control 
 mechanism, what we have now is an ad-hoc emulated system.
 
 I don't think it has clogged the system I just think it has been
 improperly used both by authors and by the user interface. 

Ideally the users and authors shouldn't be presented the option to make
mistakes like this, should they? Most people, I would imagine, don't
care about the versions of a model unless they are actually working
on/with it.


This is no
 fault of the authors, there is simply a specification for versioning
 that is missing. The hope is that subversion applies well to this.




Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-22 Thread Matt
On 6/22/07, Tommy Yu [EMAIL PROTECTED] wrote:
 Matt wrote:
  Hi Tommy,
 
  Can you continue to update/fill out your document as well as begin
  associated proposals with information contained in the replies people
  are submitting. The goal of this process is a scoping document with
  associated content.
 

 It will be done when I am done refining all my thoughts about the threads 
 here, along with the other thoughts I already have but have not written down there.

  More comments below.
 

 Likewise.

  On 6/22/07, Tommy Yu [EMAIL PROTECTED] wrote:
  Matt wrote:
  Hi Tommy,
 
  I found the document seemed to be too far ahead of itself. I also
  didn't find any of the pros and cons very compelling because they
  don't address specific problems and those problems are not described.
 
  1) What are you actually trying to achieve? It would be useful to
  describe the parts of the current system that are giving you grief and
  look to give you more grief based on the use cases and any axes of
  scale.
 
  Starting with what I envisioned.
 
  Who is the repository catered for?
  1) People who would like to work on models, using it as a place to store 
  work-in-progress models.
  2) Reviewers to review models.
  3) Website users to browse models.
 
  1) What do the model builders want?
  - Their own workspace (home directory)
  - A place to let reviewers review their models
  - Also to publish their models
 
  First point is not addressed by what we have now.  Second and third points 
  are quite ad-hoc.  Also, version control is very ad-hoc right now.
 
  Each of these points needs to be filled out, e.g. what does it mean to
  have a workspace for a CellML modeller? What are the scenarios and
  workflows for reviewers of CellML models?
 

 Workspace is like a home directory.  Or are you comfortable with a flat 
 filesystem where files owned by different people are all over the place?  
 This is about organization according to what the model builders want.

I'm more comfortable with the latter; but exactly what that looks like
is difficult to know or perhaps ever to predict. Some work on a
manifest description and a best practice/hint would be a good start.


 Models are by default private to the owner, but s/he can expose it via the 
 layer that binds subversion and the database together which manages 
 permissions.

Try and stay away from specific underlying pieces at the moment. The
key is the description of the workflow states, transitions, and
actions.
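
A technology-neutral sketch of what such a description could look like,
written as a small transition table. The state names, action names and roles
are assumptions chosen for illustration (the thread only says that models
start private, that reviewers review them, and that they can be published).

```python
from enum import Enum


class State(Enum):
    PRIVATE = "private"
    IN_REVIEW = "in review"
    PUBLISHED = "published"


# (current state, action) -> (next state, role allowed to perform it)
TRANSITIONS = {
    (State.PRIVATE, "submit"): (State.IN_REVIEW, "owner"),
    (State.IN_REVIEW, "reject"): (State.PRIVATE, "reviewer"),
    (State.IN_REVIEW, "publish"): (State.PUBLISHED, "reviewer"),
}


def apply_action(state, action, role):
    """Return the next state, or raise if the action is not allowed."""
    try:
        next_state, required_role = TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"'{action}' is not valid in state {state.value}") from None
    if role != required_role:
        raise PermissionError(f"'{action}' requires role {required_role}")
    return next_state


state = State.PRIVATE
state = apply_action(state, "submit", "owner")
state = apply_action(state, "publish", "reviewer")
print(state.value)
```

Writing the table out like this keeps the discussion on states, transitions
and actions without committing to any particular backend.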

 Other modelers could import their colleagues' models (provided permissions are 
 given).

CellML import element kind of imports?


 Reviewers simply get access to a model: a URI to a specific revision of a 
 model (and associated files, at the model builder's option) will be generated 
 for the reviewer to use.  If the reviewer has rights, s/he can publish the 
 model to the public.

This should probably be model workspace, the concept of a single model
is a bit vague at the moment unless we define some rule that there
will always be a single top level model. I presume where we are
heading is that TTW, people will be accessing an index.html that
processes a manifest file and creates a pretty view of the workspace.


  2,3) Reviewers and website users
  - A centralized location to browse models.
  - They would like to see how models may relate to each other.
 
 
  How do models relate to each other? Relations between models come from
  all sorts of data within models, and within any associated metadata
  (so more than just our current cellml metadata specification). It
  would be useful to write out the details of the relationships that are
  important here as these pretty much form the basis of many of the
  queries that will need to be performed.

 It will be done.  I can see users wanting to know which component of a model 
 was imported by other models, and finding all other dependencies of a 
 particular model.  More will come.
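
 A minimal sketch of those two queries, assuming the import relationships have 
 already been extracted into simple records.  The model and component names 
 below are made up for illustration, as are the function names.

```python
from collections import defaultdict

# Hypothetical import records: (importing model, imported model, component).
IMPORTS = [
    ("cell_model", "sodium_channel", "INa"),
    ("cell_model", "calcium_channel", "ICaL"),
    ("calcium_channel", "markov_gate", "gate"),
    ("tissue_model", "cell_model", "membrane"),
]


def importers_of(model, component):
    """Models that import the given component of the given model."""
    return sorted(
        src for src, dst, comp in IMPORTS if dst == model and comp == component
    )


def dependencies_of(model):
    """All models reachable from `model` by following imports."""
    graph = defaultdict(set)
    for src, dst, _ in IMPORTS:
        graph[src].add(dst)
    seen = set()
    stack = [model]
    while stack:
        for dep in graph[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


print(importers_of("cell_model", "membrane"))
print(sorted(dependencies_of("tissue_model")))
```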

 
  First point is already addressed, but second point is definitely not 
  possible as the current repository does not support 1.1.
 
  Why does it not support CellML 1.1? i.e. what is the technology block
  here to extending the current system to support it?
 

 None, aside from the lack of a proper code versioning system in the backend.  
 With a few changes to the copy/paste code, CellML 1.1 will then be able to be 
 stored into the repository.  I could go ahead and do this, but it will only 
 further compound the issues we have now.  Okay, fine, refactor Model.py and 
 have new classes inherit from that, but we still lack certain key features, 
 such as a proper versioning backend.

So lack of support for CellML 1.1 is not a reason to rebuild the
system, but implementing CellML 1.1 support means pressure on other
ugly bits like dealing with import dependencies (perhaps uploaded as
separate files) and would mean more work for people to manually ensure
that versions used in import URIs are correct?


 Maybe as an experiment I could 

Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-22 Thread David Nickerson
 Do we really want to proxy remote repositories?  Can we start smaller for now 
 but keep that in mind?

I think this will be an essential feature of the model repository as we 
move forward. We are trying to present model authors with a common 
platform for the distribution and archiving of their models as they go 
through development and publication cycles. At some point we are going 
to want to provide some assurances to the community in terms of 
repository accessibility - things like uptime, backups, redundancy, etc.

There is also a big question mark over the implications of the current 
geographical location of the model repository. For example, how will 
access scale when you start having tens, if not hundreds, of users 
around the world interacting with the repository on a regular basis? If 
things run too slowly or access is a problem then people simply won't use 
the system.

So while it makes sense to start out with less grand plans, I think any 
plan on moving the repository forward has to take these issues into 
account and discuss how they would be addressed in any future repository 
implementation.


David.


Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-22 Thread David Nickerson
It might also be worth looking into what the folks over at 
http://www.biomodels.net/ are up to. They seem to have curation 
built into their repository, and maybe some other features worth looking 
into.

And if we're going to be starting from scratch, there might be some 
value in seeing how the biomodels repository could be extended to 
support CellML?

When you start seeing comments like "BioModels Database ranked first 
data resource for Systems Biology in Nature Biotechnology", it might be 
a hint that they're doing something right and we should maybe be working 
with them rather than independently.


David.


Tommy Yu wrote:
 Hi,
 
 I have written down some of my thoughts on how the model repository could be 
 put together.
 
 http://www.cellml.org/Members/tommy/repository_redesign.html
 
 It is still a pretty rough document.  The usage example section gives a rough 
 outline on what I see people might be doing with the repository and how this 
 design could address those issues, which I think will be of interest to 
 users.  It is not an exhaustive list, yet.
 
 I must also note the design outlined is quite a drastic departure from what 
 we have now (it will be yet another new repository).  However, it is more 
 true to the one envisioned before according to 
 http://www.cellml.org/wiki/CellMLModelRepositories, except I have an additional 
 layer that will assist in pulling content and drawing relationships between 
 models.
 
 Feel free to take it apart and/or build on top of it.
 
 Cheers,
 Tommy.

-- 
David Nickerson, PhD
Research Fellow
Division of Bioengineering
Faculty of Engineering
National University of Singapore
Email: [EMAIL PROTECTED]


Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-21 Thread David Nickerson
Hi Tommy,

Looks like a good starting point for some discussion. Just to help me 
think through some of the issues, is there any chance you could add a 
usage example illustrating how this system would deal with a model made 
from the combination of a bunch of papers (i.e., a single model where 
each component defines a new citation)? I'm guessing this would be done 
by adding each of the components as separate models and then importing 
them into a single model?

Another usage example that might be interesting to look at would be a 
model author adding a local CellML 1.1 model hierarchy to a remote 
repository and how all the import href's are handled in this case (i.e., 
imports throughout the model hierarchy might consist of a mix of 
relative, http, and file URLs).
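
As a first step towards that usage example, a minimal sketch that sorts 
import hrefs into the three kinds mentioned above using Python's standard 
urllib.parse. The hrefs and the classify_href name are hypothetical; how each 
kind should then be rewritten on upload is left open here.

```python
from urllib.parse import urlparse


def classify_href(href):
    """Classify an import href as 'relative', 'http', 'file' or 'other'."""
    scheme = urlparse(href).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "file":
        return "file"
    if scheme == "":
        return "relative"
    return "other"


# Hypothetical hrefs as they might appear in a local model hierarchy.
hrefs = [
    "../components/sodium_channel.cellml",
    "http://www.cellml.org/models/example.cellml",
    "file:///home/john/models/units.cellml",
]
for href in hrefs:
    print(classify_href(href), href)
```

Relative hrefs can travel with the hierarchy if its directory layout is 
preserved, whereas file URLs only make sense on the author's machine and 
would need attention before the model is usable from a remote repository.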

And another usage example might be the searching for models built using 
a specific set of data. It will hopefully become standard practice to 
annotate variable values with their source, where the source may be some 
data from a different article than the model's publication.


Thanks,
David.



Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-21 Thread Matt
Hi Tommy,

I found the document seemed to be too far ahead of itself. I also
didn't find any of the pros and cons very compelling because they
don't address specific problems and those problems are not described.

1) What are you actually trying to achieve? It would be useful to
describe the parts of the current system that are giving you grief and
look to give you more grief based on the use cases and any axes of
scale.

2) What are the use cases? An initial set should be extracted from the
current site. You have written out some, but they only cover a small
set of the site's functions, especially when it comes to relations
between models or workflow and curation states.

I understand some of the details that are causing you pain with the
current implementation, but I think the first part of this is to be
charitable to the current system and adequately describe the two
points above.

Before rethinking the implementation of this site I think the
following need to also be done:
- a specification for assigning a URI to these models (as would be
used by CellML 1.1 imports)
- a specification for how a manifest file is to be constructed, or
some set of rules for interpreting a directory structure of models,
especially in those cases where there are multiple local models used
in imports and we need to point to at least the top level model.
- a suggested solution to the bqs problem. Research existing standards.

Generally:

Relational databases are useful, but so is the combination of
ZCatalog and Sets. It really depends on the structure of the data and
the queries you want to perform. You should write out a reasonable set
of these in natural language to get the focus right. Maybe a proof of
concept using various mechanisms is required.

The frustration with metadata handling at the moment is a result of
some difficulties in the metadata specification for the metadata you
are using the most and also the use of a quite esoteric system:
4Suite's Versa RDF query interface. RDQL or SPARQL are better SQL-like
equivalents and certainly have a wide acceptance.

Subversion offers a nice philosophy of code management and the guess
is that this would apply well to the modeling process. It also offers
the potential for building URIs for versioned material - individual
files and whole changesets (which is something we are after). The
default webdav URI scheme may not be what we want, so it is also worth
looking at others; for example, the trac browser interface to a
subversion repository forms quite nice URIs.
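
A minimal sketch of the two kinds of URI mentioned above (an individual file
at a revision, and a whole changeset), assuming Trac's conventions of
/browser/<path>?rev=N and /changeset/N. The base address, the example path
and the function names are hypothetical.

```python
from urllib.parse import quote, urlencode

# Hypothetical address of a Trac instance fronting the repository.
BASE = "http://repository.example.org/trac"


def file_uri(path, rev):
    """URI for one file as it existed at a given revision."""
    query = urlencode({"rev": rev})
    return f"{BASE}/browser/{quote(path.lstrip('/'))}?{query}"


def changeset_uri(rev):
    """URI for everything that changed in a given revision."""
    return f"{BASE}/changeset/{rev}"


print(file_uri("home/john/a/model.cellml", 45))
print(changeset_uri(45))
```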

Workflow and security as defined and implemented by Zope/CMF/Plone is
a very nice model that should be reflected in our workflow and
security use-cases. We discussed a few weeks ago that if this
environment is going to provide the security layer, then there needs
to be a relationship between this and the subversion repository at
quite a detailed level.

cheers
Matt




Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-21 Thread Andrew Miller
Hi Tommy,

A few comments:
1) I am still not convinced that meta-data should not be versioned, 
simply because changes to metadata can be important changes to a model. 
In some cases, such as changes to simulation metadata, the changes might 
have a major impact on the final model.

I don't think it is a bad thing to have a one-way cache of metadata 
somewhere for technical / performance reasons (perhaps in a relational 
database), but I think that we should replicate data for each model 
(perhaps using a deep copy-on-write approach if this is really necessary 
to save disk space) rather than changing the metadata for existing 
models without changing the version.

Making changes to metadata require changes to the model will ensure that 
no one gets burned by referencing a particular version of a model, only 
to find that the metadata in that version has changed on them.

Your current unversioned, globally shared metadata approach probably 
also has security implications. For example, let's say that Alice submits 
a model which references a publication. Now suppose that Charlie wasn't 
an author of that paper, but he wants to add his name onto the list of 
authors. So he submits a completely different, bogus, model which 
includes metadata for the publication, and includes his name. When Bob 
downloads Alice's model from the repository, it would then include 
Charlie's name as one of the authors (assuming that the publication was 
referenced by PubMed ID or DOI or some sort of publication URI. 
Particular cases like the one I described might be able to be secured in 
an ad hoc fashion such as by checking that the authors are the same, but 
the general attack will still pervade this type of approach unless 
metadata is associated uniquely with a particular version of a 
particular model. If the assertions about the same subject cannot be 
identified between models in the database, then having data flow back 
from the relational database into the model does not carry any benefit 
at all).
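
The scenario can be reproduced in a few lines. This is only an illustrative 
sketch: the two stores, the function names and the publication identifier 
are hypothetical, and the real repository design may differ.

```python
# Store A: metadata shared globally, keyed by publication only.
shared = {}

# Store B: metadata tied to one version of one model.
scoped = {}

PUB = "pubmed:00000000"  # placeholder identifier, not a real record


def submit(model, version, pub_id, authors):
    """Record publication metadata from a submitted model in both stores."""
    shared[pub_id] = list(authors)                    # last writer wins
    scoped[(model, version, pub_id)] = list(authors)  # isolated per model


# Alice submits her model; Charlie then submits a bogus model that cites
# the same publication but adds his own name.
submit("alice_model", 1, PUB, ["Alice"])
submit("charlie_bogus", 1, PUB, ["Alice", "Charlie"])

# What Bob sees when he downloads Alice's model:
print("shared:", shared[PUB])
print("scoped:", scoped[("alice_model", 1, PUB)])
```

With the shared store, Bob sees Charlie listed as an author of Alice's 
publication; with metadata scoped to a model version, Alice's record is 
untouched by Charlie's submission.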

However, I do agree that there is a place for some metadata which can be 
changed without creating a new version (which probably is the type of 
metadata that you wouldn't include in the CellML file by default). 
Curation status and permissions would probably fit in this category, 
because although they may be associated with a particular version, they 
should not be immutable for a given version.
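
One way to picture this split, as a sketch under assumptions: metadata that 
lives in the model is frozen per version, so changing it yields a new 
version, while curation status and permissions sit in a separate mutable 
record attached to that version. All class, field and function names here 
are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelVersion:
    model: str
    version: int
    metadata: tuple  # immutable (key, value) pairs stored with the model


@dataclass
class Annotations:
    curation_status: str = "uncurated"
    readers: set = field(default_factory=set)


# Mutable annotations, attached to (but not part of) each version.
annotations = {}


def revise_metadata(old, **changes):
    """Changing in-model metadata produces a new version, never an edit."""
    merged = dict(old.metadata)
    merged.update(changes)
    return ModelVersion(old.model, old.version + 1, tuple(sorted(merged.items())))


v1 = ModelVersion("alice_model", 1, (("title", "Draft title"),))
v2 = revise_metadata(v1, title="Final title")

# Curation status can change without creating a new version.
annotations[v1] = Annotations()
annotations[v1].curation_status = "curated"

print(v1.version, dict(v1.metadata), annotations[v1].curation_status)
print(v2.version, dict(v2.metadata))
```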

2) I think that there should be a directory for each mathematical model 
(which may include several CellML model files, documentation, and so 
on), so that a particular version can be downloaded / checked out in its 
entirety (with some directory-level manifest describing how to run or 
view the model). This suggests that collisions between mathematical 
models should be prevented at this level, not at the file level. Under 
this scheme, Mary would find that at usage example 3, she couldn't use 
the same directory name as the one John already submitted.

3) I think the 'reference by citation' needs some expansion: I think 
people referencing models should have the choice to refer to:
 = a specific version for which no files will change at all.
 = the latest version which aims to reflect the letter of a publication 
(updates will only fix mistakes in the model which prevent it from 
corresponding to the printed paper).
 = the latest version which aims to reflect the results obtained by the 
author (updates can fix discrepancies or omissions from the paper that 
were in the author's original code, if the author didn't use CellML).
 = the latest derivative of the current model developed by the same 
author / group, even if it has not yet been peer-reviewed (subject to 
permissions constraints).
 = the latest derivative of the current model, but with all imports 
external to the model updated to the latest versions (even if this has 
not been reviewed by the author). This would be the most frequently 
updated version, because it could be automatically created without the 
model author being involved.
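
A rough sketch of how these choices could be resolved against a model's 
history. It assumes, purely for illustration, that the four "latest" options 
are nested (a version acceptable for the letter of the paper is also 
acceptable for every later option) and that each stored version is tagged 
with the strictest option it satisfies. The names Track, HISTORY and 
resolve, and the version history itself, are hypothetical.

```python
from enum import IntEnum


class Track(IntEnum):
    PAPER = 1              # reflects the letter of the publication
    AUTHOR_RESULTS = 2     # reflects the results the author obtained
    AUTHOR_DERIVATIVE = 3  # latest derivative by the same author / group
    AUTO_IMPORTS = 4       # derivative with external imports auto-updated


# Hypothetical history: (version number, strictest track it satisfies).
HISTORY = [
    (1, Track.PAPER),
    (2, Track.PAPER),
    (3, Track.AUTHOR_RESULTS),
    (4, Track.AUTHOR_DERIVATIVE),
    (5, Track.AUTO_IMPORTS),
]


def resolve(track=None, exact=None):
    """Return the version number a reference should point to."""
    if exact is not None:
        # A specific version: nothing about it will ever change.
        if exact not in {number for number, _ in HISTORY}:
            raise LookupError(f"no such version: {exact}")
        return exact
    if track is None:
        raise ValueError("give either a track or an exact version")
    candidates = [number for number, tag in HISTORY if tag <= track]
    if not candidates:
        raise LookupError(f"no version satisfies {track.name}")
    return max(candidates)


print(resolve(exact=1))
print(resolve(track=Track.PAPER))
print(resolve(track=Track.AUTO_IMPORTS))
```

Permission checks for unreviewed derivatives are left out of this sketch.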

It would also be possible to search for derivatives made 

Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-21 Thread Tommy Yu
Hi Andrew,

A couple notes:

 I don't think it is a bad thing to have a one-way cache of metadata 
 somewhere for technical / performance reasons (perhaps in a relational 
 database), but I think that we should replicate data for each model 
 (perhaps using a deep copy-on-write approach if this is really necessary 
 to save disk space) rather than changing the metadata for existing 
 models without changing the version.
 
 Making changes to metadata require changes to the model will ensure that 
 no one gets burned by referencing a particular version of a model, only 
 to find that the metadata in that version has changed on them.
 
 Your current unversioned, globally shared metadata approach probably 
 also has security implications. For example, lets say that Alice submits 

Understood, and I did call for metadata in the RDBMS to be more of a 
snapshot.  Metadata will still be versioned (by revision) in the Subversion 
repository.  The publishing of a model to the public could conceivably be done 
by someone other than the model creator.

Also, in the scenario outlined below, you are correct that a paper referenced 
by PubMed would be treated somewhat differently.  If Charlie were to publish a 
fake paper to the repository, it would result in a new reference anyway:

Alice - Paper title (original)
Alice, Charlie - Paper title (fake)

There is no way to stop users from entering bad data into the system if they 
are given admin rights.  Fortunately Charlie wouldn't have those, so he 
wouldn't be able to add a new author to Alice's paper; since he can publish a 
model, he could only create a new fake paper that he did not write.

On the other hand, if he decides to use the original publication name to publish 
his model and then change the reference there, he would still be prevented from 
doing that, but he has the option to create a new fake reference.  Again, there 
is no way to stop users from publishing bad data if they are given rights.  It 
is possible to limit where Charlie can publish his paper to (i.e. he publishes 
to reviewers only), and then there would be no visible damage.

 a model which references a publication. Now suppose that Charlie wasn't 
 an author of that paper, but he wants to add his name onto the list of 
 authors. So he submits a completely different, bogus, model which 
 includes metadata for the publication, and includes his name. When Bob 
 downloads Alice's model from the repository, it would then include 
 Charlie's name as one of the authors (assuming that the publication was 
 referenced by PubMed ID or DOI or some sort of publication URI. 
 Particular cases like the one I described might be able to be secured in 
 an ad hoc fashion such as by checking that the authors are the same, but 
 the general attack will still pervade this type of approach unless 
 metadata is associated uniquely with a particular version of a 
 particular model. If the assertions about the same subject cannot be 
 identified between models in the database, then having data flow back 
 from the relational database into the model does not carry any benefit 
 at all).
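
As an illustration of metadata being "associated uniquely with a particular version of a particular model", here is a minimal sketch in which every record is keyed by (path, revision). The class name, the paths, the revision numbers and the placeholder publication ID are all made up for the example; nothing here is taken from the repository design document.

```python
import copy
from typing import Dict, Tuple

# Hypothetical key: (model path in the repository, revision number).
MetadataKey = Tuple[str, int]


class VersionedMetadataStore:
    """Toy store where metadata belongs to exactly one model version."""

    def __init__(self) -> None:
        self._records: Dict[MetadataKey, dict] = {}

    def submit(self, path: str, revision: int, metadata: dict) -> None:
        key = (path, revision)
        if key in self._records:
            # A stored version is immutable; a change needs a new revision.
            raise ValueError(f"metadata for {path}@{revision} already exists")
        self._records[key] = copy.deepcopy(metadata)

    def lookup(self, path: str, revision: int) -> dict:
        # Return a copy so callers cannot modify the stored record.
        return copy.deepcopy(self._records[(path, revision)])


store = VersionedMetadataStore()
# "pmid:0000000" is a placeholder, not a real PubMed ID.
store.submit("home/alice/model", 10,
             {"publication": "pmid:0000000", "authors": ["Alice"]})
store.submit("home/charlie/bogus", 11,
             {"publication": "pmid:0000000", "authors": ["Alice", "Charlie"]})

# Charlie's record lives under his own key and never touches Alice's.
alice_authors = store.lookup("home/alice/model", 10)["authors"]
print(alice_authors)
```

Because nothing is shared by publication ID, Bob downloading Alice's model at revision 10 sees only what Alice committed with that revision.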
 
 However, I do agree that there is a place for some metadata which can be 
 changed without creating a new version (which probably is the type of 
 metadata that you wouldn't include in the CellML file by default). 
 Curation status and permissions would probably fit in this category, 
 because although they may be associated with a particular version, they 
 should not be immutable for a given version.
 
 2) I think that there should be a directory for each mathematical model 
 (which may include several CellML model files, documentation, and so 
 on), so that a particular version can be downloaded / checked out in its 
 entirety (with some directory-level manifest describing how to run or 
 view the model). This suggests that collisions between mathematical 
 models should be prevented at this level, not at the file level. Under 
 this scheme, Mary would find that at usage example 3, she couldn't use 
 the same directory name as the one John already submitted.
 
 3) I think the 'reference by citation' needs some expansion: I think 
 people referencing models should have the choice to refer to:
  = a specific version for which no files will change at all.
  = the latest version which aims to reflect the letter of a publication 
 (updates will only fix mistakes in the model which prevent it from 
 corresponding to the printed paper).
  = the latest version which aims to reflect the results obtained by the 
 author (updates can fix discrepancies or omissions from the paper that 
 were in the author's original code, if the author didn't use CellML).
  = the latest derivative of the current model developed by the same 
 author / group, even if it has not yet been peer-reviewed (subject to 
 permissions constraints).
  = the latest derivative of the current model, but with all imports 
 external to the model updated to the latest versions (even if 

Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-21 Thread Tommy Yu
David Nickerson wrote:
 Hi Tommy,
 
 looks like a good starting point for some discussion. Just to help me 
 think through some of the issues, is there any chance you could add a 
 usage example illustrating how this system would deal with a model made 
 from the combination of a bunch of papers (i.e., a single model where 
 each component defines a new citation). I'm guessing this would be done 
 by adding each of the components as separate models and then importing 
 them into a single model?
 

It depends on how the model is cited.

If the creator of the model that binds all the separate models together based 
his/her model on a published paper, that citation would be used.  If not, the 
combined model can only reside inside the user's directory, under a filename 
of their choice, and import the other models from there.

Yes, the creator of the model would have to import the components.

 Another usage example that might be interesting to look at would be a 
 model author adding a local CellML 1.1 model hierarchy to a remote 
 repository and how all the import href's are handled in this case (i.e., 
 imports throughout the model hierarchy might consist of a mix of 
 relative, http, and file URLs).
 

The model repository shouldn't be responsible for users importing from file:// 
and other URIs that do not resolve.  I will create detailed use cases for this, 
but in the case of http URIs, I can think of checking against a pre-approved 
list of hostnames that models can be imported from.
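
A rough sketch of what such a check on import hrefs could look like. The function name and the contents of the approved list are assumptions for illustration only; the thread does not say which hosts would be approved, and accepting https alongside http is also an assumption.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the actual approved hosts are not specified.
APPROVED_HOSTS = {"www.cellml.org"}


def import_href_allowed(href: str) -> bool:
    """Decide whether a CellML 1.1 import href may be stored as-is."""
    parts = urlparse(href)
    if parts.scheme == "":
        # No scheme: a relative reference that stays inside the repository,
        # unless it is a network-path reference such as //host/path.
        return not parts.netloc
    if parts.scheme in ("http", "https"):
        # urlparse lower-cases the hostname for us.
        return parts.hostname in APPROVED_HOSTS
    # file:// and every other scheme is rejected.
    return False


for href in ("../components/sodium.xml",
             "http://www.cellml.org/models/a.cellml",
             "http://example.org/a.cellml",
             "file:///home/john/a.cellml"):
    print(href, import_href_allowed(href))
```

Relative hrefs pass because they resolve against the model's own location in the repository; whether the target actually exists would be a separate check.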

 And another usage example might be the searching for models built using 
 a specific set of data. It will hopefully become standard practice to 
 annotate variable values with their source, where the source may be some 
 data from a different article than the model's publication.
 

That's using the metadata, right?  If the creator of the model does annotate 
components properly (e.g. giving some comment to cmeta:id of some component of 
some file) it will be searchable (provided that the creator publishes that 
model).
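
To make the cmeta:id point concrete, here is a small sketch that lists the elements of a model carrying a cmeta:id, i.e. the places metadata can attach to and a search index could pick up. The sample model and the function name are invented for the example, and this only finds the IDs; it does not read the RDF annotations themselves.

```python
import xml.etree.ElementTree as ET

# Attribute name for cmeta:id in ElementTree's {namespace}local form.
CMETA_ID = "{http://www.cellml.org/metadata/1.0#}id"

# Made-up model fragment, used only to exercise the function below.
SAMPLE = """<model xmlns="http://www.cellml.org/cellml/1.0#"
       xmlns:cmeta="http://www.cellml.org/metadata/1.0#"
       name="example" cmeta:id="example_model">
  <component name="membrane" cmeta:id="membrane">
    <variable name="V" units="dimensionless" cmeta:id="membrane_V"/>
    <variable name="t" units="dimensionless"/>
  </component>
</model>"""


def annotated_ids(cellml_text):
    """Return (element name, cmeta:id) pairs in document order."""
    root = ET.fromstring(cellml_text)
    return [(el.tag.split("}")[-1], el.get(CMETA_ID))
            for el in root.iter()
            if el.get(CMETA_ID) is not None]


print(annotated_ids(SAMPLE))
```

The variable "t" has no cmeta:id, so it does not appear in the result and nothing could be attached to it.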

Thanks for your input,
Tommy.

 
 Thanks,
 David.
 
 Tommy Yu wrote:
 Hi,

 I have written down some of my thoughts on how the model repository could be 
 put together.

 http://www.cellml.org/Members/tommy/repository_redesign.html

 It is still a pretty rough document.  The usage example section gives a 
 rough outline of what I see people might be doing with the repository and 
 how this design could address those issues, which I think will be of 
 interest to users.  It is not an exhaustive list, yet.

 I must also note the design outlined is quite a drastic departure from what 
 we have now (it will be yet another new repository).  However, it is more 
 true to the one envisioned before according to 
 http://www.cellml.org/wiki/CellMLModelRepositories, except I have an 
 additional layer that will assist in pulling content and drawing relationships 
 between models.

 Feel free to take it apart and/or build on top of it.

 Cheers,
 Tommy.
 ___
 cellml-discussion mailing list
 cellml-discussion@cellml.org
 http://www.cellml.org/mailman/listinfo/cellml-discussion
 



Re: [cellml-discussion] Concerning the CellML Model Repository

2007-06-21 Thread Matt
Hi Tommy,

Can you continue to update/fill out your document, and begin associated
proposals using the information contained in the replies people are
submitting? The goal of this process is a scoping document with
associated content.

More comments below.

On 6/22/07, Tommy Yu [EMAIL PROTECTED] wrote:
 Matt wrote:
  Hi Tommy,
 
  I found the document seemed to be too far ahead of itself. I also
  didn't find any of the pros and cons very compelling because they
  don't address specific problems and those problems are not described.
 
  1) What are you actually trying to achieve? It would be useful to
  describe the parts of the current system that are giving you grief and
  look to give you more grief based on the use cases and any axes of
  scale.
 

 Starting with what I envisioned.

 Who does the repository cater for?
 1) People who would like to work on models, using it as a place to store 
 work-in-progress models.
 2) Reviewers to review models.
 3) Website users to browse models.

 1) What do the model builders want?
 - Their own workspace (home directory)
 - A place to let reviewers review their models
 - Also to publish their models

 The first point is not addressed by what we have now.  The second and third 
 points are quite ad-hoc.  Also, version control is very ad-hoc right now.

Each of these points needs to be filled out, e.g. what does it mean to
have a workspace for a CellML modeller? What are the scenarios and
workflows for reviewers of CellML models?


 2,3) Reviewers and website users
 - A centralized location to browse models.
 - They would like to see how models may relate to each other.


How do models relate to each other? Relations between models come from
all sorts of data within models, and within any associated metadata
(so more than just our current cellml metadata specification). It
would be useful to write out the details of the relationships that are
important here as these pretty much form the basis of many of the
queries that will need to be performed.

 The first point is already addressed, but the second point is definitely not 
 possible as the current repository does not support CellML 1.1.

Why does it not support CellML 1.1? i.e. what is the technical obstacle
to extending the current system to support it?


 Issues:
 - Flat file system.
 Sure, using ZCatalog it is possible to emulate users' home directories and 
 the like, but it still does not get away from what we have now.

I don't understand this. What are you aiming for in a home space and
why doesn't the current system support it?

 - Version/Variant
 It has already clogged up the system.  There is no proper revision control 
 mechanism; what we have now is an ad-hoc emulated system.

I don't think it has clogged the system; I just think it has been
improperly used both by authors and by the user interface. This is no
fault of the authors; a specification for versioning is simply
missing. The hope is that Subversion applies well to this.

 - It's CellML Code, right?
 Why not put code in a real code management system, like Subversion?

Subversion works well for filesystems of code and text data and to
some extent binary data that we don't really need to query the
contents of. If this applies well for CellML modelling, then
subversion is probably a good match. Subversion will bring its own
complexities when we are dealing with applying security to file
objects, and security/publishing in general will get even more complex
if we are proxying remote repositories - which we talked about a few
weeks ago.

Generally, I think the concept of CellML modelling being laid out in a
filesystem, with Subversion versioning concepts applied to it, is good
but untested. For instance, take a reasonably complex model of Andre's
and work out how it will look on the filesystem and what Subversion
versioning would result in.

While in this thread, I don't believe metadata should be treated any
differently to model data. Adding special rules for versioning of some
data and not others is going to complicate the versioning process and
I can't see any compelling reason to do this. Remember that the
subversion system is versioning file objects which will contain both
metadata and cellml model data. What is important is how and where
metadata is stored. Perhaps metadata should be separated into its own
document sitting next to the model in the filesystem.

My inclination is that an implementation using Subversion plus some
Subversion hooks will be OK, but we haven't worked out details or done
any proof of concept for this - which should be agnostic to CellML
and focussed on how to apply zope+cmf security and workflows to data
objects stored in Subversion repositories.
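
As a starting point for such a proof of concept, here is a sketch of the decision logic a pre-commit hook might apply. It covers only the Subversion side, not the zope+cmf side. It assumes a home/<user>/ layout like the one used in Tommy's examples, and it assumes that the text comes from `svnlook changed -t TXN REPOS` with a four-character status prefix before each path and that the author comes from `svnlook author -t TXN REPOS`. The function names are hypothetical.

```python
def parse_changed(svnlook_output: str):
    """Yield (status, path) pairs from `svnlook changed` text.

    Assumes each line is a status field padded to four characters,
    followed by the path (e.g. "A   home/john/a/model.cellml").
    """
    for line in svnlook_output.splitlines():
        if line.strip():
            yield line[:4].strip(), line[4:]


def paths_outside_home(svnlook_output: str, author: str):
    """Return the changed paths that are not under the author's home."""
    home = f"home/{author}/"
    return [path for _, path in parse_changed(svnlook_output)
            if not path.startswith(home)]


# Made-up transaction: john touches his own model and one of mary's.
sample = "A   home/john/a/model.cellml\nU   home/mary/b/model.cellml\n"
rejected = paths_outside_home(sample, "john")
print(rejected)
```

A real hook would run the two svnlook commands, call paths_outside_home on the result, and exit with a non-zero status when the returned list is not empty so that Subversion rejects the commit.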


 - Zope has revision control
 Until someone packs the database.

Perhaps you should look at http://plone.org/products/plone/roadmap/8
(which is now completed and merged into Plone 3). There are some other
add on products - some listed in