Re: projects-new.a.o updates

Hervé BOUTEMY Wed, 20 May 2015 22:48:08 -0700

On Wednesday 20 May 2015 08:39:42, Sergio Fernández wrote:
> Hi,
> 
> On Wed, May 20, 2015 at 12:52 AM, Hervé BOUTEMY <herve.bout...@free.fr> wrote:
> > > One question Hervé, are all RDF files at
> > > https://projects-new.apache.org/doap/ automatically generated or copied
> > > from svn?
> > 
> > these are automatically generated from committee-info.txt by
> > parsecommittees.py:
> > http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/import/parsecommittees.py
> Perfect. I'll take a look there.
> 
> The first thing I'd like to change is the way RDF files are generated.
> Generating RDF/XML with an XML library may look like a good idea, but it is
> actually a very bad one: a valid XML file can still be invalid RDF/XML. So
> I'd like to switch to RDFLib to generate those files. I'll provide a patch
> for internal validation.
Don't hesitate to commit directly; it is easily reverted if there is a serious
problem (which I don't expect: you're an Apache committer, aren't you?). Since
I'm not an RDF expert at all, and a noob in Python, starting with an XML library
was the easiest way to get the work going, hoping someone like you could improve
it (the way Daniel started the projects-new.a.o site and I jumped in to improve
it with my own ideas).


> One idea that could show how the semantics go
> beyond the syntax is to generate those DOAP files in other serializations,
> such as JSON-LD or Turtle. Then we'll slowly move toward a more correct approach.
> 
> Is that fine for you?
yes, great, thank you

> 
> > what I still don't know is what handwritten data is expected in the PMC data
> > files, and whether we really should try to generate such pmc.rdf files
> > instead of reading content from
> > http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/pmc_list.xml
> Well, the current files at
> http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data/
> contain only basic PMC information (name, homepage and chair), which I
> think we could get from LDAP or any other source.
> 
> The key concept I'd like to introduce here is that the fewer manually
> generated files, the better for building a machine-driven infrastructure on
> top of them.
+1
The only question is: where do we get info that can't be discovered by scraping
existing information sources (like committee-info.txt or LDAP)?
But any extraction and reformatting of existing information has to be
automated, in parsecommittees.py.

When you execute the script, you'll see that it checks consistency between
different information sources and displays warnings when a discrepancy is
found. I think I'll add (if you don't beat me to it) a check against
pmc_list.xml: at least the committee list should match, and if the committee
name from the PMC descriptor does not match what we have from
committee-info.txt, we should warn.

Regards,

Hervé

> 
> > I try to maintain ideas and comments in the "Work in Progress" section of the
> > about page: https://projects-new.apache.org/about.html
> 
> Ah, great!
> 
> Thanks.
