On Tue, Jun 4, 2013 at 3:20 PM, Jerven Bolleman <m...@jerven.eu> wrote:
>
> On Tue, Jun 4, 2013 at 3:08 PM, Michel Dumontier <michel.dumont...@gmail.com> wrote:
>
>> The point here is simple. If you provide a URI uniprot:1.2.3.4, I would
>> like to know that this is incorrect.
>>
>> m.
>>
> Yes, but the model needs to be good enough to tell you that. The model
> discussed yesterday with the data item identifier regex pattern is not
> strong enough to do so. The void uriRegexPattern might be good enough.
>
> :x a void:Dataset ;
>    void:uriRegexPattern "ec:[1-6].\d.\d.\d" , "uniprot:P\d{5}" .
>

In our registry, we have 4 prefixes for "ec":

ec, enzyme nomenclature, ec-code, enzyme classification

where "ec" is the (global) preferred prefix, and the others are cultivated
from various datasets.

So, in a regex, the prefix part is

(ec|enzyme nomenclature|ec\-code|enzyme classification)

and the identifier part matches

"\d+\.-\.-\.-|\d+\.\d+\.-\.-|\d+\.\d+\.\d+\.-|\d+\.\d+\.\d+\.(n)?\d+"

So, putting the prefix in use and the provided identifier together, we would
ask whether it matches

"(ec|enzyme nomenclature|ec\-code|enzyme classification)\:(\d+\.-\.-\.-|\d+\.\d+\.-\.-|\d+\.\d+\.\d+\.-|\d+\.\d+\.\d+\.(n)?\d+)"

We would also want to match fully qualified URIs in a similar manner.

> But I am thinking that we can have stronger validation patterns if we
> think a bit more.
> E.g. can we think of something that can prevent:
>
> uniprot:P12345 a up:Sequence .
> sequence:P12345 a up:Protein .
>
> And is a dataset description the right place for this validation data?
>

Yes.

m.

-- 
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com
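
For illustration only, the prefix-plus-identifier check described above could be sketched in Python roughly as follows. The prefix list and identifier pattern are the ones given in the message; the names EC_PREFIXES, EC_IDENTIFIER, build_curie_pattern and is_valid_ec_curie are hypothetical and do not correspond to any actual registry API. The sketch assumes the whole string must match, so it uses re.fullmatch.

```python
import re

# Prefix synonyms and identifier pattern as given in the message.
# The variable and function names are hypothetical, for illustration only.
EC_PREFIXES = ["ec", "enzyme nomenclature", "ec-code", "enzyme classification"]
EC_IDENTIFIER = (
    r"\d+\.-\.-\.-|\d+\.\d+\.-\.-|\d+\.\d+\.\d+\.-|\d+\.\d+\.\d+\.(n)?\d+"
)


def build_curie_pattern(prefixes, identifier_regex):
    """Combine the accepted prefixes and an identifier regex into one pattern."""
    # Escape each prefix so characters such as '-' are matched literally.
    prefix_alternatives = "|".join(re.escape(p) for p in prefixes)
    return re.compile(rf"(?:{prefix_alternatives}):(?:{identifier_regex})")


EC_CURIE = build_curie_pattern(EC_PREFIXES, EC_IDENTIFIER)


def is_valid_ec_curie(curie):
    """Return True only if the entire string is an accepted prefix:identifier."""
    return EC_CURIE.fullmatch(curie) is not None


if __name__ == "__main__":
    for candidate in ["ec:1.2.3.4", "ec-code:3.4.21.n5", "uniprot:1.2.3.4"]:
        print(candidate, is_valid_ec_curie(candidate))
```

With this sketch, uniprot:1.2.3.4 is rejected because "uniprot" is not among the accepted prefixes for this identifier pattern. Fully qualified URIs would need a separate pattern built the same way from the URI base rather than the short prefix.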