Re: [cellml-discussion] Concerning the CellML Model Repository

Tommy Yu Thu, 21 Jun 2007 22:40:41 -0700

Matt wrote:
> Hi Tommy,
> 
> Can you continue to update/fill out your document as well as begin
> associated proposals with information contained in the replies people
> are submitting. The goal of this process is a scoping document with
> associated content.
>


It will be done once I have finished refining my thoughts on the threads here, 
along with the other thoughts I already have but have not yet written down there.

> More comments below.
> 

Likewise.

> On 6/22/07, Tommy Yu <[EMAIL PROTECTED]> wrote:
>> Matt wrote:
>>> Hi Tommy,
>>>
>>> I found the document seemed to be too far ahead of itself. I also
>>> didn't find any of the pros and cons very compelling because they
>>> don't address specific problems and those problems are not described.
>>>
>>> 1) What are you actually trying to achieve? It would be useful to
>>> describe the parts of the current system that are giving you grief and
>>> look to give you more grief based on the use cases and any axes of
>>> scale.
>>>
>> Starting with what I envisioned.
>>
>> Who is the repository catered for?
>> 1) People who would like to work on models, using it as a place to store 
>> work-in-progress models.
>> 2) Reviewers to review models.
>> 3) Website users to browse models.
>>
>> 1) What do the model builders want?
>> - Their own workspace (home directory)
>> - A place to let reviewers review their models
>> - Also to publish their models
>>
>> The first point is not addressed by what we have now.  The second and third 
>> points are handled in quite an ad-hoc way.  Version control is also very 
>> ad-hoc right now.
> 
> Each of these points needs to be filled out, e.g. what does it mean to
> have a workspace for a CellML modeller? What are the scenarios and
> workflows for reviewers of CellML models?
> 

A workspace is like a home directory.  Or are you comfortable with a flat 
filesystem where files owned by different people are scattered all over the 
place?  This is about organizing models according to what the model builders 
want.

Models are private to the owner by default, but s/he can expose them via the 
layer that binds Subversion and the database together, which manages 
permissions.  Other modelers could import their colleagues' models (provided 
permissions are given).

Reviewers simply get access to a model: a URI to a specific revision of a 
model (and associated files, at the model builder's option) will be generated, 
which the reviewer can use.  A reviewer who has the rights can publish the 
model to the public.
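
A minimal sketch of the idea above, not the actual design: models are private 
by default, the owner grants access, and a reviewer is handed a URI that pins 
one revision.  The class name, the host, and the /<owner>/<model>/r<revision> 
layout are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

BASE_URI = "http://models.example.org"  # hypothetical host, not a real service


@dataclass
class ModelEntry:
    """One model in an owner's workspace; private unless access is granted."""
    owner: str
    name: str
    readers: set = field(default_factory=set)  # users granted read access
    public: bool = False

    def can_read(self, user):
        # The owner always has access; others need a grant or publication.
        return self.public or user == self.owner or user in self.readers

    def grant(self, user):
        self.readers.add(user)

    def publish(self):
        self.public = True

    def revision_uri(self, revision):
        # Hypothetical URI layout: /<owner>/<model>/r<revision>
        return f"{BASE_URI}/{self.owner}/{self.name}/r{revision}"


entry = ModelEntry(owner="alice", name="example_model")
entry.grant("reviewer1")
print(entry.can_read("reviewer1"), entry.can_read("stranger"))
print(entry.revision_uri(42))
```

In the real thing the permission check would live in the layer between 
Subversion and the database rather than on the model object itself.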

>> 2,3) Reviewers and website users
>> - A centralized location to browse models.
>> - They would like to see how models may relate to each other.
>>
> 
> How do models relate to each other? Relations between models come from
> all sorts of data within models, and within any associated metadata
> (so more than just our current cellml metadata specification). It
> would be useful to write out the details of the relationships that are
> important here as these pretty much form the basis of many of the
> queries that will need to be performed.

It will be done.  I can see users wanting to know which component of a model 
was imported by other models, and wanting to find all the dependencies of a 
particular model.  More will come.
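
As a rough sketch of those two queries, assuming models are CellML 1.1 
documents whose <import xlink:href="..."> elements name the imported file: 
the file names and model contents below are invented, and a real repository 
would resolve hrefs against proper URIs rather than dictionary keys.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

CELLML = "http://www.cellml.org/cellml/1.1#"
XLINK = "http://www.w3.org/1999/xlink"
HEAD = f'xmlns="{CELLML}" xmlns:xlink="{XLINK}"'

# Hypothetical model documents, keyed by a made-up location.
MODELS = {
    "tissue.xml": f'<model {HEAD} name="tissue">'
                  '<import xlink:href="cell.xml">'
                  '<component name="c" component_ref="cell"/></import></model>',
    "cell.xml": f'<model {HEAD} name="cell">'
                '<import xlink:href="sodium.xml">'
                '<component name="na" component_ref="sodium_channel"/></import>'
                '<import xlink:href="units.xml">'
                '<units name="mV" units_ref="millivolt"/></import></model>',
    "sodium.xml": f'<model {HEAD} name="sodium">'
                  '<import xlink:href="units.xml">'
                  '<units name="mV" units_ref="millivolt"/></import></model>',
    "units.xml": f'<model {HEAD} name="units"/>',
}


def direct_imports(source):
    """Return the xlink:href of every import element in one model."""
    root = ET.fromstring(source)
    return [imp.get(f"{{{XLINK}}}href") for imp in root.iter(f"{{{CELLML}}}import")]


def build_graph(models):
    """Map each model to what it imports, and each model to its importers."""
    imports = {name: direct_imports(src) for name, src in models.items()}
    imported_by = defaultdict(set)
    for name, targets in imports.items():
        for target in targets:
            imported_by[target].add(name)
    return imports, imported_by


def all_dependencies(name, imports):
    """Collect direct and indirect imports of one model."""
    seen, stack = set(), list(imports.get(name, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(imports.get(dep, []))
    return seen


imports, imported_by = build_graph(MODELS)
print(sorted(imported_by["units.xml"]))
print(sorted(all_dependencies("tissue.xml", imports)))
```

Tracking which component was imported would mean recording the component_ref 
attributes as well, which this sketch leaves out.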

> 
>> First point is already addressed, but second point is definitely not 
>> possible as the current repository does not support 1.1.
> 
> Why does it not support CellML 1.1? i.e. what is the technology block
> here to extending the current system to support it?
> 

None, aside from the lack of a proper code versioning system in the backend.  
With a few changes to the copy/paste code, CellML 1.1 models could then be 
stored in the repository.  I could go ahead and do this, but it will only 
further compound the issues we have now.  Okay, fine, refactor Model.py and 
have new classes inherit from that, but we still lack certain key features, 
such as a proper versioning backend.

Maybe as an experiment I could drop versions/variants and see how feasible it 
is to implement certain desired features with just that.  However, I still 
think we need a proper storage backend as the foundation, and need to get that 
right (decided) before moving forward.

I don't want to build a mansion without a solid foundation.

>> Issues:
>> - Flat file system.
>> Sure, using ZCatalog it is possible to emulate users' home directories and 
>> the like, but it still does not get away from what we have now.
> 
> I don't understand this. What are you aiming for in a home space and
> why doesn't the current system support it?
> 

See above.  The current system can emulate it, and therefore support it, but 
again, it lacks a proper versioning backend.

>> - Version/Variant
>> It already clogged up the system.  There is no proper revision control 
>> mechanism, what we have now is an ad-hoc emulated system.
> 
> I don't think it has clogged the system I just think it has been
> improperly used both by authors and by the user interface. This is no
> fault of the authors, there is simply a specification for versioning
> that is missing. The hope is that subversion applies well to this.
> 

It would be nice if we just dropped it altogether and let the new system handle 
versioning automatically, rather than letting users assign their own version 
numbers; Subversion should take care of that.  Also, use standardized URIs.

>> - It's CellML Code, right?
>> Why not put code in a real code management system, like Subversion?
> 
> Subversion works well for filesystems of code and text data and to
> some extent binary data that we don't really need to query the
> contents of. If this applies well for CellML modelling, then
> subversion is probably a good match. Subversion will bring its own
> complexities when we are dealing with applying security to file
> objects, and security/publishing in general will get even more complex
> if we are proxying remote repositories - which we talked about a few
> weeks ago.
> 

Do we really want to proxy remote repositories?  Can we start smaller for now 
but keep that in mind?

> Generally, I think the concept of cellml modelling being laid out in a
> filesystem and subversion versioning concepts applied to it is good,
> but untested. For instance, take a reasonably complex model of Andre's
> and work out how it will look on the filesystem and  what subversion
> versioning would result in.
> 
> While in this thread, I don't believe metadata should be treated any
> differently to model data. Adding special rules for versioning of some
> data and not others is going to complicate the versioning process and
> I can't see any compelling reason to do this. Remember that the
> subversion system is versioning file objects which will contain both
> metadata and cellml model data. What is important is how and where
> metadata is stored. Perhaps metadata should be separated into its own
> document sitting next to the model in the filesystem.
> 

Perhaps.  Really, the metadata that gets stored in the RDBMS in my design is a 
snapshot of what is in the model.  The whole file is versioned, so there are no 
major issues there.  It is possible to have a log table of who changed what and 
when, and that would be versioned too, for finding who to blame when a model is 
marked as curated at level 3 when it is really a level 1 model.
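
A minimal sketch of such a log table, using SQLite only because it is easy to 
try out.  The table name, the columns, and the "curation_level" field are 
assumptions for illustration, not a settled schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE change_log (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        model      TEXT NOT NULL,
        revision   INTEGER NOT NULL,
        changed_by TEXT NOT NULL,
        field      TEXT NOT NULL,
        old_value  TEXT,
        new_value  TEXT,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")


def record_change(conn, model, revision, user, field, old, new):
    """Append one 'who changed what, when' entry; rows are never updated."""
    conn.execute(
        "INSERT INTO change_log "
        "(model, revision, changed_by, field, old_value, new_value) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (model, revision, user, field, old, new),
    )


def who_set(conn, model, field, value):
    """Return (user, revision) for the latest change that set field to value."""
    return conn.execute(
        "SELECT changed_by, revision FROM change_log "
        "WHERE model = ? AND field = ? AND new_value = ? "
        "ORDER BY id DESC LIMIT 1",
        (model, field, value),
    ).fetchone()


record_change(conn, "model_a", 10, "alice", "curation_level", None, "1")
record_change(conn, "model_a", 12, "bob", "curation_level", "1", "3")
print(who_set(conn, "model_a", "curation_level", "3"))
```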

> My inclination is that an implementation using subversion plus some
> subversion hooks will be ok, but we haven't worked out details or done
> any proof of concept for this - which should be agnostic to cellml
> and focussed on how to apply zope+cmf security and workflows to data
> objects stored in subversion repositories.
> 
> 
>> - Zope has revision control
>> Until someone packs the database.
> 
> Perhaps you should look at http://plone.org/products/plone/roadmap/8
> (which is now completed and merged into Plone 3). There are some other
> add on products - some listed in
> http://plone.org/products/by-category/versioning-staging
> 

Those are interesting.

> 
>> - Zope/Plone is also quite slow.
> 
> Really? How so?
> 
>> - Code we have now cannot get away from original design flaws.  Might as 
>> well start from scratch.
> 
> Refactoring may achieve the outcome better.

Not without a proper foundation.

> 
>> The major issue is, I cannot see how I can get the current repository to 
>> support CellML 1.1 models.  Sure, a new archetype can be written, and built 
>> with ZCatalog and the like.  I still find this method to be an ad-hoc 
>> approach, slapped together from semi-mismatching components to get it 
>> working, whereas the obvious solution of using a CMS with a database that 
>> points to the data would be much more elegant (with a front-end written to 
>> interface with that).
> 
> I don't know what elegant means here. A diagram of the current
> components might help for us to see what the current layout is like.
> Sure, when I look at the code the lines between components are
> blurred, but there is an architecture there that Carrey and Andre were
> working towards. It would be useful to see the landscape as it is now.
> 

http://www.cellml.org/wiki/CellMLModelRepositories

Wait, it does not match up with that document.  I may get around to that.

> 
>> Oh, how is it ad-hoc?  I still do not have this resolved, but there is no 
>> "not" query in ZCatalog.  There is a product called 'AdvancedQuery' that 
>> addresses that, but that is yet another dependency on yet another product 
>> to get something simple done.
> 
> What does query mean in the context of this project? Any call on the
> ZCatalog is definitely a query in technical terms. Is there something
> like an outer join in SQL that you want an analogy for in ZCatalog
> querying? Have you looked at creating custom indexes? I suspect the
> ZCatalog is more powerful than you think.
> 

What I meant is, ZCatalog does not do what I need it to do.  For instance, I 
cannot easily find models that do not have a particular keyword.  During my 
search for a solution I came across this:

http://www.dieter.handshake.de/pyprojects/zope/AdvancedQuery.html

It addresses the limitations of ZCatalog's search functions, but it is yet 
another add-on on top of more add-ons to get things done.
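
For comparison, here is a sketch of the same "models without a given keyword" 
query against a relational store.  It uses SQLite and an invented two-table 
schema; it shows the kind of query I mean and is not the repository schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE model_keyword (
        model_id INTEGER NOT NULL REFERENCES model(id),
        keyword  TEXT NOT NULL
    );
    INSERT INTO model VALUES (1, 'model_a');
    INSERT INTO model VALUES (2, 'model_b');
    INSERT INTO model VALUES (3, 'model_c');
    INSERT INTO model_keyword VALUES (1, 'cardiac');
    INSERT INTO model_keyword VALUES (1, 'electrophysiology');
    INSERT INTO model_keyword VALUES (2, 'signalling');
    INSERT INTO model_keyword VALUES (3, 'cardiac');
""")


def models_without_keyword(conn, keyword):
    """Return names of models that have no row for the given keyword."""
    rows = conn.execute(
        """
        SELECT m.name FROM model AS m
        WHERE NOT EXISTS (
            SELECT 1 FROM model_keyword AS k
            WHERE k.model_id = m.id AND k.keyword = ?
        )
        ORDER BY m.name
        """,
        (keyword,),
    )
    return [name for (name,) in rows]


print(models_without_keyword(conn, "cardiac"))
```

Models with no keywords at all are also returned, which is the behaviour I 
would want from a "not" query.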

>> There are more, but I will end it here.
> 
> Please don't. We need all of them.
> 
>>> 2) What are the use cases? An initial set should be extracted from the
>>> current site. You have written out some, but they only covered a small
>>> set of function of the site, especially when it comes to relations
>>> between models or workflow and curation states.
>> Feel free to list some specific examples I have omitted like Andre and 
>> Andrew did.  I do agree it is a small set, but I am starting from the basics 
>> and moving up from there.  It will get quite complicated.
> 
> Document what the current workflow for the site is at the moment. Then
> see where those can be cleaned up.
> 
>>> I understand some of the details that are causing you pain with the
>>> current implementation, but I think the first part of this is to be
>>> charitable to the current system and adequately describe the two
>>> points above.
>>>
>>> Before rethinking the implementation of this site I think the
>>> following need to also be done:
>>> - a specification for assigning a URI to these models (as would be
>>> used by CellML 1.1 imports)
>> I've outlined a few, but more details to come.
>>
>>> - a specification for how a manifest file is to be constructed, or
>>> some set of rules for interpreting a directory structure of models,
>>> especially in those cases where there are multiple local models used
>>> in imports and we need to point to at least the top level model.
>>> - a suggested solution to the bqs problem. Research existing standards.
>>>
>> I did consider that, and I think OpenURL may suit our needs fairly well.  It 
>> is already an established standard, it is about citations, it is widely 
>> supported (libraries and citation catalogs are using it), it seems to have 
>> everything bqs describes, and here's the spec:
>>
>> http://www.niso.org/standards/resources/Z39_88_2004.pdf
>>
>> However, it's in XML only, but near the bottom of page 23 of that file, I 
>> quote:
>>
>>> - To support new applications, communities could introduce new XML-based 
>>> ContextObject Formats constrained by other syntactic constraint languages 
>>> (DTD or RELAX NG, for example) or semantic constraint languages (RDFS or 
>>> OWL, for example).
>> Nothing is really stopping us from adapting that standard, aside from having 
>> to rewrite/regenerate all metadata we have now.
> 
> 
> I'm not familiar with this (yet). We need to write out where URL has
> meaning (for example in a URI of a model import, in being able to
> access versions or changesets, etc) and then weigh up the options.
> 

That was a response to address the bqs problem.  This is how OpenURL represents 
a citation in XML:

<ctx:referent>
  <ctx:metadata-by-val>
    <ctx:format>info:ofi/fmt:xml:xsd:journal</ctx:format>
    <ctx:metadata>
      <rft:journal xmlns:rft="info:ofi/fmt:xml:xsd:journal"
  xsi:schemaLocation="info:ofi/fmt:xml:xsd:journal
  http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:journal">
        <rft:authors>
          <rft:author>
            <rft:aulast>Bergelson</rft:aulast>
            <rft:auinit>J</rft:auinit>
          </rft:author>
        </rft:authors>
        <rft:atitle>Isolation of a common receptor for coxsackie B viruses
  and adenoviruses 2 and 5
        </rft:atitle>
        <rft:jtitle>Science</rft:jtitle>
        <rft:date>1997</rft:date>
        <rft:volume>275</rft:volume>
        <rft:spage>1320</rft:spage>
        <rft:epage>1323</rft:epage>
      </rft:journal>
    </ctx:metadata>
  </ctx:metadata-by-val>
</ctx:referent>
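
To give a feel for how little code it takes to read such a citation, here is a 
sketch using Python's standard XML library.  The fragment above omits its 
namespace declarations, so the ctx namespace URI used here is an assumption, 
and the xsi:schemaLocation attribute is left out to keep the sample 
self-contained.

```python
import xml.etree.ElementTree as ET

NS = {
    "ctx": "info:ofi/fmt:xml:xsd:ctx",      # assumed ContextObject namespace
    "rft": "info:ofi/fmt:xml:xsd:journal",  # taken from the example above
}

CITATION = """<ctx:referent xmlns:ctx="info:ofi/fmt:xml:xsd:ctx">
  <ctx:metadata-by-val>
    <ctx:format>info:ofi/fmt:xml:xsd:journal</ctx:format>
    <ctx:metadata>
      <rft:journal xmlns:rft="info:ofi/fmt:xml:xsd:journal">
        <rft:authors>
          <rft:author>
            <rft:aulast>Bergelson</rft:aulast>
            <rft:auinit>J</rft:auinit>
          </rft:author>
        </rft:authors>
        <rft:atitle>Isolation of a common receptor for coxsackie B viruses
  and adenoviruses 2 and 5
        </rft:atitle>
        <rft:jtitle>Science</rft:jtitle>
        <rft:date>1997</rft:date>
        <rft:volume>275</rft:volume>
        <rft:spage>1320</rft:spage>
        <rft:epage>1323</rft:epage>
      </rft:journal>
    </ctx:metadata>
  </ctx:metadata-by-val>
</ctx:referent>"""


def parse_citation(xml_text):
    """Pull the journal citation fields out of one ctx:referent element."""
    journal = ET.fromstring(xml_text).find(".//rft:journal", NS)

    def text(tag):
        node = journal.find(f"rft:{tag}", NS)
        # Collapse line breaks and indentation inside wrapped element text.
        return " ".join(node.text.split()) if node is not None and node.text else None

    authors = [
        (a.findtext("rft:aulast", namespaces=NS), a.findtext("rft:auinit", namespaces=NS))
        for a in journal.findall("rft:authors/rft:author", NS)
    ]
    return {
        "authors": authors,
        "title": text("atitle"),
        "journal": text("jtitle"),
        "year": text("date"),
        "volume": text("volume"),
        "pages": (text("spage"), text("epage")),
    }


citation = parse_citation(CITATION)
print(citation["authors"], citation["journal"], citation["year"])
```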

As for the URIs, it is still in progress.

>>> Generally:
>>>
>>> Relational databases are useful, but so are the combination of
>>> ZCatalog and Sets. It really depends on the structure of the data and
>>> the queries you want to perform. You should write out a reasonable set
>>> of these in natural language to get the focus right. Maybe a proof of
>>> concept using various mechanisms is required.
>>>
>> Will get to that.  I am at the research stage still, but I did have some 
>> preliminary schemas down.
>>
>>> The frustration with metadata handling at the moment is a result of
>>> some difficulties in the metadata specification for the metadata you
>>> are using the most and also the use of a quite esoteric system:
>>> 4Suite's Versa RDF query interface. RDQL or SPARQL are better SQL-like
>>> equivalents and certainly have a wide acceptance.
>>>
>> Versa itself is not so bad, but the intricacies and gotchas of 4Suite 
>> were quite unpleasant to deal with.  I must note I did not decide to use 
>> it; I merely inherited code that used it, so I am sort of stuck.  If I had 
>> a choice I would be using RDFlib and the SPARQL support it provides.  Yes, 
>> it has to do with frustration with both 4Suite and the metadata 
>> specification.
>>
>>> Subversion offers a nice philosophy of code management and the guess
>>> is that this would apply well to the modeling process. It also offers
>>> the potential for building URIs for versioned material - individual
>>> files and whole changesets (which is something we are after). The
>>> default webdav URI scheme may not be what we want, so it is also worth
>>> looking at others; for example, the trac browser interface to a
>>> subversion repository forms quite nice URIs.
>>>
>> Yes, I am doing research into those also.  The HTTP interface would be built 
>> on top of that also.
>>
>> It is my desire to use a _real_ code management backend to manage models so 
>> I don't have to start writing a versioning mechanism into the repository 
>> like what we have now.
>>
> 
> Don't forget to look at the plone 3 state I mentioned above. But since
> one of our use-cases is for someone to be able to work with data on
> their filesystem and submit it through the command-line or some
> file-browser tool, the subversion client process is already available
> and has various attractive features.
> 

Exactly, they can check in code from the tool of their choice.

>>> Workflow and security as defined and implemented by Zope/CMF/Plone is
>>> a very nice model that should be reflected in our workflow and
>>> security use-cases. We discussed a few weeks ago that if this
>>> environment is going to provide the security layer, then there needs
>>> to be a relationship between this and the subversion repository at
>>> quite a detailed level.
>>>
>> The Zope/CMF/Plone workflow and security will definitely be used, and will 
>> be mapped in some way into the model repository interface (the abstraction 
>> layer, as I called it).  I will give this more thought when I have the 
>> foundations down (i.e. the interface between subversion/code and the 
>> database of metadata, submitted models, etc).
> 
> This feels like an issue that should be advertised on plone/zope lists
> sooner rather than later; perhaps there are already some products out
> there to help or other people are thinking of it, or someone has
> thought of it and found compelling reasons not to do it. Either way, I
> think the appeal of this subject to the greater community would be
> quite high.
> 

Perhaps, but I will wait a bit on that until the design is more fleshed out.

Cheers,
Tommy.

> cheers
> Matt
> 
> 
>> Thank you for your thoughts,
>> Tommy.
>>
>>> cheers
>>> Matt
>>>
>>>
>>> On 6/21/07, Tommy Yu <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>
>>>> I have written down some of my thoughts on how the model repository could 
>>>> be put together.
>>>>
>>>> http://www.cellml.org/Members/tommy/repository_redesign.html
>>>>
>>>> It is still a pretty rough document.  The usage example section gives a 
>>>> rough outline of what I see people doing with the repository and 
>>>> how this design could address those issues, which I think will be of 
>>>> interest to users.  It is not an exhaustive list, yet.
>>>>
>>>> I must also note the design outlined is quite a drastic departure from 
>>>> what we have now (it will be yet another new repository).  However, it is 
>>>> more true to the one envisioned before according to 
>>>> http://www.cellml.org/wiki/CellMLModelRepositories, except I have an 
>>>> additional layer that will assist in pulling content and drawing 
>>>> relationships between models.
>>>>
>>>> Feel free to take it apart and/or build on top of it.
>>>>
>>>> Cheers,
>>>> Tommy.
>>>> _______________________________________________
>>>> cellml-discussion mailing list
>>>> cellml-discussion@cellml.org
>>>> http://www.cellml.org/mailman/listinfo/cellml-discussion
>>>>
