Re: projects-new.a.o updates
Hi,

On Wed, May 20, 2015 at 12:52 AM, Hervé BOUTEMY wrote:
> > One question Hervé, do all rdf files at
> > https://projects-new.apache.org/doap/ are automatically generated or
> > copied from svn?
>
> these ones are automatically generated from committee-info.txt by
> parsecommittees.py
>
> http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/import/parsecommittees.py

Perfect. I'll take a look there.

The first thing I'd like to change is the way RDF files are generated. Generating RDF/XML with an XML library could look like a good idea, but it's actually a very bad one: a valid XML file can still be invalid RDF/XML. So I'd like to switch to RDFLib to generate those files. I'll provide a patch for internal validation.

One idea that could show how the semantics goes beyond the syntax is to generate those DOAP files in other serializations, such as JSON-LD or Turtle.

Then we'll slowly move towards a more correct approach. Is that fine for you?

> what I still don't know is what is expected handwritten data in the PMC data
> files
> and if we really should try to generate such pmc.rdf files instead of
> reading content from
> http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/pmc_list.xml

Well, the current files at http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data/ contain only basic PMC information (name, homepage and chair), which I think we could get from LDAP or any other source. The key concept I'd like to introduce here is that the fewer manually generated files there are, the better for building a machine-driven infrastructure on top of them.

> I try to maintain ideas and comments in "Work in Progress" section of about
> page: https://projects-new.apache.org/about.html

Ah, great! Thanks.

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co
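The point about generating RDF from a data model rather than from an XML tree can be illustrated with a small, dependency-free sketch. It is not the proposed patch and does not use RDFLib, which already provides the graph model and the serializers and would make the hand-written parts below unnecessary. The PMC is described as a list of triples, the triples are validated, and only then is a syntax chosen (here N-Triples, the simplest one). The `asfext` namespace IRI, the `asfext:pmc` and `asfext:name` terms, the use of the pmc.rdf URL as subject, and all function names are assumptions made for illustration.

```python
# Sketch only: describe a PMC as RDF triples first, pick a syntax last.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
ASFEXT = "http://projects.apache.org/ns/asfext#"  # assumed namespace IRI


class Iri(str):
    """A string meant as an IRI rather than as a plain literal."""


def pmc_triples(committee_id, name, homepage):
    """Describe one PMC as a list of (subject, predicate, object) triples."""
    # Assumption: the pmc.rdf URL doubles as the subject IRI.
    subject = Iri("https://projects-new.apache.org/doap/%s/pmc.rdf" % committee_id)
    return [
        (subject, Iri(RDF_TYPE), Iri(ASFEXT + "pmc")),     # assumed class term
        (subject, Iri(ASFEXT + "name"), name),             # assumed property
        (subject, Iri(FOAF + "homepage"), Iri(homepage)),
    ]


def validate(triples):
    """Check the triples themselves, independently of any output syntax."""
    for subject, predicate, obj in triples:
        # This sketch has no blank nodes, so subjects must be IRIs too.
        if not isinstance(subject, Iri) or not isinstance(predicate, Iri):
            raise ValueError("subject and predicate must be IRIs: %r" % (subject,))
        for term in (subject, predicate, obj):
            # Characters that N-Triples does not allow inside <...>.
            if isinstance(term, Iri) and any(c in term for c in ' <>"{}|^`\\'):
                raise ValueError("illegal character in IRI: %r" % term)


def _term(value):
    """Render one term in N-Triples syntax."""
    if isinstance(value, Iri):
        return "<%s>" % value
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return '"%s"' % escaped


def to_ntriples(triples):
    """Serialize validated triples, one statement per line."""
    validate(triples)
    return "".join(
        "%s %s %s .\n" % tuple(_term(t) for t in triple) for triple in triples
    )


if __name__ == "__main__":
    print(to_ntriples(
        pmc_triples("apr", "Apache Portable Runtime", "http://apr.apache.org/")
    ))
```

Because validation works on the triples, the same check applies whichever serialization is emitted afterwards; an XML library can only guarantee that the output is well-formed XML.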
Re: projects-new.a.o updates
On Wednesday 20 May 2015 00:55:47, sebb wrote:
> On 19 May 2015 at 23:52, Hervé BOUTEMY wrote:
> > On Monday 18 May 2015 14:03:39, Sergio Fernández wrote:
> >> Hi guys,
> >>
> >> I have to admit I'm a bit lost with the development here; I do not have
> >> that much time, things change quite fast and discussions are a bit hard
> >> to follow. Hervé has done a great work; so some guidelines where I can
> >> contribute would help me a lot.
> >>
> >> One question Hervé, do all rdf files at
> >> https://projects-new.apache.org/doap/ are automatically generated or
> >> copied from svn?
> >
> > these ones are automatically generated from committee-info.txt by
> > parsecommittees.py
> >
> > http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/import/parsecommittees.py
> >
> > what I still don't know is what is expected handwritten data in the PMC
> > data files
>
> That is described here:
>
> http://projects.apache.org/docs/pmc.html

Yes, I found it and now understand. I tried to improve the explanations to address what kept me from understanding them before (even though I had read them multiple times), when I confused projects' DOAP files with committees' PMC descriptor files.

There are still confusing parts, IMHO:
- the PMC entry under the "DOAP Files" section
- the "PMC descriptors" section in http://projects.apache.org/guidelines.html

But it's probably not really the time to invest in projects.a.o, since I hope we'll be able to switch to projects-new.a.o.

I just added committee.html, separate from project.html, and made different links for PMCs and projects: https://projects-new.apache.org/projects.html?pmc

> > and if we really should try to generate such pmc.rdf files instead of
> > reading content from
> > http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/pmc_list.xml
>
> Most of the PMC data files are basic place-holders, but the syntax
> allows PMCs to create and maintain their own RDF files.
>
> Do we really want to prevent them doing so?
The only information that requires manual entry is the charter: everything else can be automated, which will be easier and more accurate.

> Maybe the solution would be to ignore the dummy references such as
>
> http://{tlp}.apache.org/"; />

Yes, this dummy reference is a problem since it's "magic".

> and instead generate the required info from other sources.
>
> However, I don't know whether there is a canonical source for deriving
> the internal name and home page of a PMC from its full name
> e.g.
> Apache Portable Runtime => apr & http://apr.apache.org/
> Apache HttpComponents => httpcomponents & http://hc.apache.org/
>
> There are other such examples where the conversion cannot readily be
> automated. [Except by maintaining a list of the exceptions somewhere]

I coded the few exceptions at the beginning of parsecommittees.py.

Now we just need to define where to maintain the charter info. The pmc.rdf files generated from committee-info.txt plus this charter will then be the most accurate source, and easy to find since each has a codified URL: http://projects-new.a.o/doap/{committee id}/pmc.rdf

Note that I can improve parsecommittees.py so that it only updates the chair and PMC members in the rdf files, without erasing the charter info.

> > I try to maintain ideas and comments in "Work in Progress" section of about
> > page: https://projects-new.apache.org/about.html
>
> Might be easier to use a Wiki page for that?
It's temporary.

Regards,

Hervé
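The mapping from a PMC's full name to its id and home page, with a short list of exceptions, and the codified pmc.rdf URL discussed in this message could be sketched as below. This is not the code of parsecommittees.py: the default rule (drop the "Apache " prefix, lowercase, remove spaces) and the names `EXCEPTIONS`, `committee_info` and `pmc_rdf_url` are assumptions for illustration. Only the two exception entries and the URL pattern come from the thread.

```python
# Sketch only: derive committee id, home page and pmc.rdf URL from a full name.
DOAP_BASE = "https://projects-new.apache.org/doap"

# Names whose id or home page cannot be derived by the default rule.
# The two entries are the examples given in the thread.
EXCEPTIONS = {
    "Apache Portable Runtime": ("apr", "http://apr.apache.org/"),
    "Apache HttpComponents": ("httpcomponents", "http://hc.apache.org/"),
}


def committee_info(full_name):
    """Return (committee id, home page) for a PMC full name."""
    if full_name in EXCEPTIONS:
        return EXCEPTIONS[full_name]
    # Assumed default rule: "Apache Foo Bar" -> "foobar".
    prefix = "Apache "
    short = full_name[len(prefix):] if full_name.startswith(prefix) else full_name
    committee_id = short.lower().replace(" ", "")
    return committee_id, "http://%s.apache.org/" % committee_id


def pmc_rdf_url(committee_id):
    """Build the codified location of a committee's pmc.rdf file."""
    return "%s/%s/pmc.rdf" % (DOAP_BASE, committee_id)


if __name__ == "__main__":
    for name in ("Apache Ant", "Apache Portable Runtime", "Apache HttpComponents"):
        committee_id, homepage = committee_info(name)
        print(name, "=>", committee_id, homepage, pmc_rdf_url(committee_id))
```

Keeping the exceptions in one table makes the "magic" explicit: any name not listed follows the default rule, so a wrong result points directly at a missing entry.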
Re: projects-new.a.o updates
On 19 May 2015 at 23:52, Hervé BOUTEMY wrote:
> On Monday 18 May 2015 14:03:39, Sergio Fernández wrote:
>> Hi guys,
>>
>> I have to admit I'm a bit lost with the development here; I do not have
>> that much time, things change quite fast and discussions are a bit hard to
>> follow. Hervé has done a great work; so some guidelines where I can
>> contribute would help me a lot.
>>
>> One question Hervé, do all rdf files at
>> https://projects-new.apache.org/doap/ are automatically generated or copied
>> from svn?
>
> these ones are automatically generated from committee-info.txt by
> parsecommittees.py
>
> http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/import/parsecommittees.py
>
> what I still don't know is what is expected handwritten data in the PMC data
> files

That is described here:

http://projects.apache.org/docs/pmc.html

> and if we really should try to generate such pmc.rdf files instead of reading
> content from
> http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/pmc_list.xml

Most of the PMC data files are basic place-holders, but the syntax
allows PMCs to create and maintain their own RDF files.

Do we really want to prevent them doing so?

Maybe the solution would be to ignore the dummy references such as

http://{tlp}.apache.org/"; />

and instead generate the required info from other sources.

However, I don't know whether there is a canonical source for deriving
the internal name and home page of a PMC from its full name
e.g.
Apache Portable Runtime => apr & http://apr.apache.org/
Apache HttpComponents => httpcomponents & http://hc.apache.org/

There are other such examples where the conversion cannot readily be
automated. [Except by maintaining a list of the exceptions somewhere]

> I try to maintain ideas and comments in "Work in Progress" section of about
> page: https://projects-new.apache.org/about.html

Might be easier to use a Wiki page for that?
Re: projects-new.a.o updates
On Monday 18 May 2015 14:03:39, Sergio Fernández wrote:
> Hi guys,
>
> I have to admit I'm a bit lost with the development here; I do not have
> that much time, things change quite fast and discussions are a bit hard to
> follow. Hervé has done a great work; so some guidelines where I can
> contribute would help me a lot.
>
> One question Hervé, do all rdf files at
> https://projects-new.apache.org/doap/ are automatically generated or copied
> from svn?

These are automatically generated from committee-info.txt by parsecommittees.py:
http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/import/parsecommittees.py

What I still don't know is what handwritten data is expected in the PMC data files, and whether we really should try to generate such pmc.rdf files instead of reading content from
http://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/pmc_list.xml

I try to maintain ideas and comments in the "Work in Progress" section of the about page: https://projects-new.apache.org/about.html

Regards,

Hervé

> Cheers,
>
> On Sat, May 16, 2015 at 4:42 PM, sebb wrote:
> > On 16 May 2015 at 08:22, Hervé BOUTEMY wrote:
> > > On Saturday 16 May 2015 00:36:03, sebb wrote:
> > >> On 15 May 2015 at 22:08, Hervé BOUTEMY wrote:
> > >> > On Friday 15 May 2015 14:02:52, sebb wrote:
> > >> >> On 14 May 2015 at 23:38, Hervé BOUTEMY wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > I seriously updated content:
> > >> >> > - *every* TLP is listed, even when no DOAP file has been written [1]
> > >> >> > - TLP project can be displayed, even without DOAP and provide link to
> > >> >> > every sub-project [2]
> > >> >> > - when a TLP has a "main sub-project" with its DOAP file, data from TLP
> > >> >> > and data from DOAP subproject are clearly separate [3]
> > >> >>
> > >> >> The URLs [1] [2] [3] use the same namespace for PMCs and projects as
> > >> >> well as generic queries.
> > >> >> This may cause name clashes in future - e.g. a PMC called "numbers"
> > >> >> would clash with the "numbers" view of the data.
> > >> >
> > >> > not exactly: [1] is project*s*.html while the 2 others are project.html
> > >> > so no clash between projects listing type and project/PMC
> > >>
> > >> Ah, OK, I'd not noticed the subtle difference.
> > >>
> > >> However there is still a potential name clash: the Ant PMC is not the
> > >> same as the Ant project produced by the Ant PMC.
> > >
> > > yes, even if Ant is one of the few committees that explicitly makes a
> > > difference between the committee and the project even if they share the
> > > same name
> > >
> > >> >> It would be better to use distinct namespaces for distinct types of
> > >> >> item.
> > >> >
> > >> > I don't think a clash between a PMC and a project can happen: if they
> > >> > have the same id, it should be TLP's PMC, isn't it?
> > >>
> > >> No, they are not the same thing.
> > >> A project is not a PMC, though they may have the same name.
> > >>
> > >> A PMC is a group of people;
> > >
> > > ok
> > > question: are committers a second group of people attached to a PMC?
> >
> > Not always.
> > The committers LDAP groups are basically used to grant permission to
> > access code repos.
> > Not all PMCs use them, for example Subversion (and more recently
> > Commons) allow any ASF committer (another LDAP group) write access to
> > their source code.
> >
> > Incubator committer groups are not defined in LDAP but have the same
> > purpose as the LDAP ones.
> >
> > > To me,
> > > that's the case, even if some projects have their own committers list
> > > (like incubator projects, or I suppose the lucene-* or hive-hcatalog or
> > > xmlgraphics-fop & xmlgraphics-batik LDAP groups representing projects
> > > that didn't write DOAP file)
> > > see http://people.apache.org/committers-by-project.html
> >
> > The lucene/hive/etc groups are historic and AIUI are deprecated
> > because of the overhead of maintenance etc.
> > It is much preferred to use social means to control who is "allowed"
> > to update code, as is done by Subversion and Commons.
> >
> > > I suppose we could display the difference when some projects have their
> > > own committers list that is different from the TLP's committers list
> >
> > Not sure the distinction is useful.
> > The current people site just displays the membership of the various
> > different groups; it is up to the reader to know what the group does.
> >
> > >> a project is a software artifact.
> > >
> > > ok,
> > > that's the classical way IT people talk, even if that's not the way
> > > business people talk: I think this is a cause for major misunderstandings
> > > between devs and business, but that's a larger problem than ASF's
> > > internals we're working on :)
> >
> > It's not just IT