On Wednesday 20 May 2015 08:39:42, Sergio Fernández wrote:
> Hi,
>
> On Wed, May 20, 2015 at 12:52 AM, Hervé BOUTEMY <herve.bout...@free.fr>
> wrote:
> > > One question Hervé: are all RDF files at
> > > https://projects-new.apache.org/doap/ automatically generated, or
> > > copied from svn?
> >
> > these ones are automatically generated from committee-info.txt by
> > parsecommittees.py
> >
> > http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/import/parsecommittees.py
>
> Perfect. I'll take a look there.
>
> The first thing I'd like to change is the way RDF files are generated.
> Generating RDF/XML with an XML library could look like a good idea, but
> actually it's a very bad one: a valid XML file can still be invalid RDF/XML.
> So I'd like to switch to RDFLib to generate those files. I'll provide a patch
> for internal validation.

don't hesitate to commit directly: it is easily reverted if there is a serious
problem (which I don't expect: you're an Apache committer, aren't you?)

since I'm not an RDF expert at all, and a noob in Python, starting with an XML
library was the easiest way to get the work started, hoping someone like you
could improve it (the same way Daniel started the projects-new.a.o site and I
jumped in to improve it with my own ideas)
> One idea that could show how the semantics goes
> beyond the syntax is to generate those DOAP files in other serializations,
> such as JSON-LD or Turtle. Then we'll slowly move to a more correct way.
>
> Is that fine for you?

yes, great, thank you

> > what I still don't know is what handwritten data is expected in the PMC
> > data files, and if we really should try to generate such pmc.rdf files
> > instead of reading content from
> > http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/pmc_list.xml
>
> Well, the current files at
> http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data/
> contain only basic PMC information (name, homepage and chair), which I
> think we could get from LDAP or any other source.
>
> The key concept I'd like to introduce here is that the fewer manually
> generated files, the better for building a machine-driven infrastructure on
> top of them.

+1
the only question is: where do we get info that can't be discovered by scraping
an existing information source (like committee-info.txt or LDAP)?
But any extraction and reformatting of existing information has to be
automated = parsecommittees.py

when you execute the script, you'll see that it checks consistency between
the different information sources and displays warnings when a discrepancy is
found: I think I'll add (if you don't beat me to it) the check against
pmc_list.xml: the committee list at least should match, and if the committee
name from the PMC descriptor does not match what we have from
committee-info.txt, we should warn

Regards,

Hervé

> > I try to maintain ideas and comments in the "Work in Progress" section of
> > the about page: https://projects-new.apache.org/about.html
>
> Ah, great!
>
> Thanks.
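
A rough sketch of the extra check against pmc_list.xml described above, assuming both sources have already been parsed into `{committee_id: display_name}` mappings. The function name, the mapping shape and the warning texts are hypothetical; the real parsecommittees.py may structure its data and messages differently.

```python
def check_committees(info, descriptors):
    """Compare two {committee_id: display_name} mappings and return warnings.

    `info` stands for data read from committee-info.txt and `descriptors`
    for data read from pmc_list.xml (both parsed elsewhere).
    """
    warnings = []
    for cid in sorted(set(info) | set(descriptors)):
        if cid not in descriptors:
            warnings.append("WARN: %s missing from pmc_list.xml" % cid)
        elif cid not in info:
            warnings.append("WARN: %s missing from committee-info.txt" % cid)
        elif info[cid] != descriptors[cid]:
            warnings.append(
                "WARN: %s name mismatch: %r vs %r"
                % (cid, info[cid], descriptors[cid])
            )
    return warnings


if __name__ == "__main__":
    # Made-up sample data, only to show the three kinds of warning.
    from_info = {"alpha": "Apache Alpha", "beta": "Apache Beta", "gamma": "Apache Gamma"}
    from_xml = {"alpha": "Apache Alpha", "beta": "Beta", "delta": "Apache Delta"}
    for line in check_committees(from_info, from_xml):
        print(line)
```

The committee lists are compared first, and names are only compared for committees present in both sources, so a missing committee yields a single warning rather than two.
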