Re: Re : Re: Re : Re: Project Visualization Tool...
The main problem I have with JSON is that AFAIK it does not support comments.

On 18 April 2015 at 19:03, herve.bout...@free.fr wrote:
Yes, I have no problem with json vs xml: the question is more about defining the schema, as DOAP did, and writing documentation so that projects know where to publish which information. Editing the current generated json just creates a new information source, without any documentation.
My point is: AFAIK, the purpose of the site is to display info in newer ways, so json generated from every existing piece of information is great, like any other format that would better suit some other visualization.
But if we're creating a new source of information that competes with an existing one, this has to be done with great care: documentation, explanation of how to migrate, and so on.
Of course the raw format is not an issue: no religion here on xml vs json vs yaml vs ...
Regards
Hervé

----- Original message -----
From: jan i j...@apache.org
To: dev@community.apache.org
Sent: Sat, 18 Apr 2015 11:44:54 +0200 (CEST)
Subject: Re: Re : Re: Project Visualization Tool...

On Saturday, April 18, 2015, herve.bout...@free.fr wrote:
It was said the new site would use native json instead of DOAP. But I'm not convinced at all, since DOAP is an invaluable source of info, documented, and so on.

json is also a documented standard that is in general better known, and I believe it has more tools supporting it.

So IMHO it would be better to generate json from DOAP. I disabled the json edit feature recently since it will cause problems.

Which problems? With a defined json it is simple to generate the DOAP file. I highly recommend staying with json and using it as the base for all our central data.
rgds
jan i

regards
Hervé

----- Original message -----
From: Shane Curcuru a...@shanecurcuru.org
To: dev@community.apache.org
Sent: Sat, 18 Apr 2015 06:43:37 +0200 (CEST)
Subject: Re: Project Visualization Tool...
We had a great session, and a lot of energy; hopefully we can make some progress.

One note: this needs to be a comdev PMC project, and we need to really plan the data part out if we want to be successful. Note that projects-new.a.o is the planned future replacement for projects.a.o - there are *significant* differences, so you need to look at the About page and the source repo. In particular, the new site uses its own new JSON generated sources which (I think) will no longer use the DOAPs.

In particular, Infra currently does *not* consider either the data gathering (i.e. populating the JSON behind the projects-new site) or the visualizations (current or ones we want to build) as core supported services. So whatever we build needs to be maintained by this PMC to start with.

Also, a link dump of useful related bits:

Old service, based on crappy cron jobs and DOAP files from projects:
https://projects.apache.org/

New service, soon to be infra supported, relying on JSON data generated by infra on a regular schedule:
https://projects-new.apache.org/

Useful PMC chair report helper, that surfaces a number of different statistics about your PMC(s), including mailing list stats, PMC/committer changes, some software releases, etc. etc. (Members have visibility to all PMCs):
https://reporter.apache.org

Rob Weir (AOO, Member) used to do some visualization stuff and might have code ideas:
http://www.robweir.com/blog/2013/05/mapping-apache.html

Ken Coar's old mailing list stats page:
https://people.apache.org/~coar/mlists.html

The AOO project wrote a mailing list visualizer for who talks to whom:
https://blogs.apache.org/OOo/entry/visualizing_the_aoo_dev_list

Some outside statistics FLOSSmole generated about Apache communities and lists:
http://flossmole.org/category/tags/apache

Random other interesting analytics: the Subversion project has the contribulyzer.

- Shane

--
Sent from My iPad, sorry for any misspellings.
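As a rough illustration of jan's point that the two formats can be derived from each other, here is a minimal Python sketch that pulls a few fields out of a DOAP file (RDF/XML) and emits JSON. It assumes the usual DOAP, RDF and asfext namespaces; the sample document is made up for the example, and the JSON field names are hypothetical, not the schema projects-new.a.o actually uses. The last helper shows sebb's objection in practice: Python's strict `json` parser rejects comments.

```python
import json
import xml.etree.ElementTree as ET

# Namespaces used by Apache DOAP files (RDF/XML serialization).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "doap": "http://usefulinc.com/ns/doap#",
    "asfext": "http://projects.apache.org/ns/asfext#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Illustrative sample only, not a real project's DOAP file.
SAMPLE_DOAP = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#"
         xmlns:asfext="http://projects.apache.org/ns/asfext#">
  <doap:Project rdf:about="http://example.apache.org/">
    <doap:name>Apache Example</doap:name>
    <doap:shortdesc>An example project</doap:shortdesc>
    <doap:programming-language>Java</doap:programming-language>
    <asfext:pmc rdf:resource="http://example.apache.org"/>
  </doap:Project>
</rdf:RDF>"""


def doap_to_dict(doap_xml: str) -> dict:
    """Extract a few DOAP fields into a dict (hypothetical JSON layout)."""
    root = ET.fromstring(doap_xml)
    project = root.find("doap:Project", NS)
    if project is None:
        raise ValueError("no doap:Project element found")
    pmc = project.find("asfext:pmc", NS)
    return {
        "name": project.findtext("doap:name", default="", namespaces=NS),
        "shortdesc": project.findtext("doap:shortdesc", default="", namespaces=NS),
        "languages": [e.text for e in project.findall("doap:programming-language", NS)],
        # Keep the raw rdf:resource value; resolving it is a separate step.
        "pmc": pmc.get(RDF_RESOURCE) if pmc is not None else None,
    }


def is_strict_json(text: str) -> bool:
    """True if the standard library accepts `text` as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(json.dumps(doap_to_dict(SAMPLE_DOAP), indent=2))
    print(is_strict_json('{"name": "Apache Example" /* a comment */}'))  # False
```

Going the other way (JSON to DOAP) is the same mapping in reverse; the hard part, as Hervé says below, is agreeing on and documenting the field list, not the serialization.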
Re: Project Visualization Tool...
On 16 May 2015 at 08:31, Hervé BOUTEMY herve.bout...@free.fr wrote:
On Saturday 16 May 2015 00:30:55, sebb wrote:
On 15 May 2015 at 23:28, Hervé BOUTEMY herve.bout...@free.fr wrote:
On Friday 15 May 2015 15:34:47, sebb wrote:

I think we really have a data model problem here regarding what a project's DOAP file is: sometimes a project is a PMC, sometimes a project is a deliverable, more like what is called a sub-project in projects-new.a.o.

That is not how I understand DOAPs. DOAP == Description Of A Project, i.e. some releasable artifact. A single PMC may have multiple projects, each with its own releases and repositories. These are modelled quite well in the DOAPs that PMCs have created.

+1

Information about the PMC which manages the projects is NOT stored in a DOAP; it is stored in a PMC data file. This is referenced from a DOAP using <asfext:pmc rdf:resource="URL"/>, where URL is either an actual URL of a PMC data file or a dummy URL, e.g. <asfext:pmc rdf:resource="http://pmcname.apache.org"/>, which leads to a file here:
https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data_files/pmcname.rdf

I'm not an RDF expert, but this Apache-specific algorithm to find the PMC RDF file seems strange: I understand it is coded in (and known from) the projects.a.o XSLT transformation.

Yes.

But this should be usable from any RDF tooling, no?

It's not currently usable except by using special processing. The problem is that the shorthand URL is used by all but about 4 of the PMCs, so it would be a major challenge to get this fixed. Some PMCs are quick to fix such issues; some may take weeks or months to fix even a simple error.
I think that people don't understand this PMC information RDF file (I didn't until our current discussion). But with good explanations and visualization help from projects-new.a.o, we can go much faster: I'm ready to try once we're clear :)

Another problem I see with these PMC data RDF files is that they don't seem to be really maintained: I doubt PMCs update their PMC data RDF files on each PMC Chair change.

Yes.

That's why I had the idea of generating/updating the chair when parsing committee-info.txt.

Fair enough, but that does not mean the code needs to create yet another RDF file.

+1, my intent was not to create a new one, but to replace it with generated info.

But other information manually written in current PMC data RDF files can't be found anywhere else, AFAIK.

Yes.

That's where it hurts: we need to mix handwritten with generated content... we need to be clear on the process.

Last problem: I personally really didn't understand this PMC data RDF file until now.

I don't know who understands it :)

IMHO, the magic algorithm to find the RDF file is a root cause.

The PMC data file is documented here: http://projects.apache.org/docs/pmc.html

Yeah, I read it several times before; I knew I was not confident with what I read, and now I know I completely misread it until now.

Does it need clarifying? If so, what is not clear? How could it be improved?

If you look at https://projects-new.apache.org/projects.html?pmc, typical cases are:
- Incubator: there is the Incubator project, displayed without a DOAP file since the Incubator has special source info, and many sub-projects which provide DOAP files
- Commons: there is no Commons DOAP file, so no TLP... one sub-project is chosen quasi-randomly... A Commons DOAP file, if it existed, would not release anything: it's a purely organizational project

There is an ambiguity here: project can mean an organisational entity and project can mean a releasable artifact.
There are different RDF files for the two meanings; only the artifact has an associated DOAP.

- Ant: there is an Ant DOAP file that represents the TLP and the main released artifact

No, it only links to the TLP = PMC data file; it does not represent the TLP. The Ant DOAP file only represents the Ant product.

OK, IIUC, I should rephrase https://projects-new.apache.org/project.html?ant :
1. "Top Level Project data:" to "Apache Committee data:"
2. "Project established:" to "Committee established:"

That does not seem necessary.

3. "Sub-projects (8):" to "Projects (8):", possibly bolding the TLP if one is the TLP

No - none of the projects are the TLP.

As said in the other thread, this assertion is confusing: "none of the projects are the Top Level Project"

On reflection, I think I was wrong about that. The TLP is the original project which the PMC was created to manage.

The TLP / PMC is not the same as any of its projects. Most PMCs happen to have the same name as one of their projects, but they are distinct entities. To take the Ant example, there needs to be an Ant PMC/TLP page and a separate Ant project page. These should be linked somehow.

and I should rename tlps.json to
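The shorthand lookup sebb describes can be sketched in a few lines of Python. This is a hypothetical helper, not the projects.a.o XSLT logic: it assumes that any `rdf:resource` value ending in `.rdf` is already a real data file URL, and that a bare `http://<pmcname>.apache.org` value maps to `data_files/<pmcname>.rdf` under the SVN location quoted in the thread.

```python
from urllib.parse import urlparse

# SVN location of PMC data files, as quoted in the thread.
DATA_FILES_BASE = (
    "https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/"
    "projects/data_files/"
)


def resolve_pmc_data_file(resource: str) -> str:
    """Map an asfext:pmc rdf:resource value to a PMC data file URL.

    Assumed heuristic: a value ending in ".rdf" is taken to be a real data
    file URL; a bare host <pmcname>.apache.org is the shorthand form and is
    rewritten to data_files/<pmcname>.rdf.
    """
    if resource.endswith(".rdf"):
        return resource
    host = urlparse(resource).hostname or ""
    suffix = ".apache.org"
    if not host.endswith(suffix):
        raise ValueError("not a recognised PMC reference: %r" % resource)
    pmc_name = host[: -len(suffix)]
    if not pmc_name or "." in pmc_name:
        raise ValueError("cannot derive a PMC name from %r" % resource)
    return DATA_FILES_BASE + pmc_name + ".rdf"
```

Generic RDF tooling would treat `http://ant.apache.org` as an opaque identifier, which is why this rewrite step has to live in special processing code.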
Re: Project Visualization Tool...
On Saturday 16 May 2015 00:30:55, sebb wrote:
On 15 May 2015 at 23:28, Hervé BOUTEMY herve.bout...@free.fr wrote:
On Friday 15 May 2015 15:34:47, sebb wrote:

I think we really have a data model problem here regarding what a project's DOAP file is: sometimes a project is a PMC, sometimes a project is a deliverable, more like what is called a sub-project in projects-new.a.o.

That is not how I understand DOAPs. DOAP == Description Of A Project, i.e. some releasable artifact. A single PMC may have multiple projects, each with its own releases and repositories. These are modelled quite well in the DOAPs that PMCs have created.

+1

Information about the PMC which manages the projects is NOT stored in a DOAP; it is stored in a PMC data file. This is referenced from a DOAP using <asfext:pmc rdf:resource="URL"/>, where URL is either an actual URL of a PMC data file or a dummy URL, e.g. <asfext:pmc rdf:resource="http://pmcname.apache.org"/>, which leads to a file here:
https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data_files/pmcname.rdf

I'm not an RDF expert, but this Apache-specific algorithm to find the PMC RDF file seems strange: I understand it is coded in (and known from) the projects.a.o XSLT transformation.

Yes.

But this should be usable from any RDF tooling, no?

It's not currently usable except by using special processing. The problem is that the shorthand URL is used by all but about 4 of the PMCs, so it would be a major challenge to get this fixed. Some PMCs are quick to fix such issues; some may take weeks or months to fix even a simple error.

I think that people don't understand this PMC information RDF file (I didn't until our current discussion). But with good explanations and visualization help from projects-new.a.o, we can go much faster: I'm ready to try once we're clear :)

Another problem I see with these PMC data RDF files is that they don't seem to be really maintained: I doubt PMCs update their PMC data RDF files on each PMC Chair change.

Yes.
That's why I had the idea of generating/updating the chair when parsing committee-info.txt.

Fair enough, but that does not mean the code needs to create yet another RDF file.

+1, my intent was not to create a new one, but to replace it with generated info.

But other information manually written in current PMC data RDF files can't be found anywhere else, AFAIK.

Yes.

That's where it hurts: we need to mix handwritten with generated content... we need to be clear on the process.

Last problem: I personally really didn't understand this PMC data RDF file until now.

I don't know who understands it :)

IMHO, the magic algorithm to find the RDF file is a root cause.

The PMC data file is documented here: http://projects.apache.org/docs/pmc.html

Yeah, I read it several times before; I knew I was not confident with what I read, and now I know I completely misread it until now.

If you look at https://projects-new.apache.org/projects.html?pmc, typical cases are:
- Incubator: there is the Incubator project, displayed without a DOAP file since the Incubator has special source info, and many sub-projects which provide DOAP files
- Commons: there is no Commons DOAP file, so no TLP... one sub-project is chosen quasi-randomly... A Commons DOAP file, if it existed, would not release anything: it's a purely organizational project

There is an ambiguity here: project can mean an organisational entity and project can mean a releasable artifact. There are different RDF files for the two meanings; only the artifact has an associated DOAP.

- Ant: there is an Ant DOAP file that represents the TLP and the main released artifact

No, it only links to the TLP = PMC data file; it does not represent the TLP. The Ant DOAP file only represents the Ant product.

OK, IIUC, I should rephrase https://projects-new.apache.org/project.html?ant :
1. "Top Level Project data:" to "Apache Committee data:"
2. "Project established:" to "Committee established:"

That does not seem necessary.

3. "Sub-projects (8):" to "Projects (8):", possibly bolding the TLP if one is the TLP

No - none of the projects are the TLP.

As said in the other thread, this assertion is confusing: "none of the projects are the Top Level Project"

The TLP / PMC is not the same as any of its projects. Most PMCs happen to have the same name as one of their projects, but they are distinct entities. To take the Ant example, there needs to be an Ant PMC/TLP page and a separate Ant project page. These should be linked somehow.

and I should rename tlps.json to committees.json (and update code accordingly)

No need.

Given this problem with a TLP not being a project, I think using "committee" or "PMC" would avoid confusion.

Then on https://projects-new.apache.org/ , do we really want to graph TLPs
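One way to "mix handwritten with generated content" is to let a small, explicit set of fields always come from the generated source and keep everything else from the handwritten file. The sketch below assumes a hypothetical dict representation of a PMC data file and treats only `chair` as generated (from committee-info.txt); the `charter` field in the example is likewise illustrative. It reports overridden values so that stale handwritten data is visible rather than silently replaced.

```python
from __future__ import annotations

# Hypothetical: fields derived from committee-info.txt that always win.
GENERATED_FIELDS = ("chair",)


def merge_pmc_data(handwritten: dict, generated: dict) -> tuple[dict, list[str]]:
    """Merge generated PMC fields over handwritten ones.

    Returns the merged record and a list of notes about handwritten values
    that disagreed with the generated ones and were replaced.
    """
    merged = dict(handwritten)  # shallow copy; the input is left untouched
    notes = []
    for key in GENERATED_FIELDS:
        if key not in generated:
            continue  # nothing generated for this field: keep handwritten value
        old = handwritten.get(key)
        if old is not None and old != generated[key]:
            notes.append(f"{key}: handwritten {old!r} replaced by {generated[key]!r}")
        merged[key] = generated[key]
    return merged, notes


if __name__ == "__main__":
    hand = {"chair": "Alice", "charter": "hand-written text"}
    gen = {"chair": "Bob"}
    print(merge_pmc_data(hand, gen))
```

Keeping the list of generated fields explicit is also a form of documentation: it states which source owns which piece of information.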
Re: Project Visualization Tool...
On 16 May 2015 at 00:30, sebb seb...@gmail.com wrote:
On 15 May 2015 at 23:28, Hervé BOUTEMY herve.bout...@free.fr wrote:
On Friday 15 May 2015 15:34:47, sebb wrote:

I think we really have a data model problem here regarding what a project's DOAP file is: sometimes a project is a PMC, sometimes a project is a deliverable, more like what is called a sub-project in projects-new.a.o.

That is not how I understand DOAPs. DOAP == Description Of A Project, i.e. some releasable artifact. A single PMC may have multiple projects, each with its own releases and repositories. These are modelled quite well in the DOAPs that PMCs have created.

+1

Information about the PMC which manages the projects is NOT stored in a DOAP; it is stored in a PMC data file. This is referenced from a DOAP using <asfext:pmc rdf:resource="URL"/>, where URL is either an actual URL of a PMC data file or a dummy URL, e.g. <asfext:pmc rdf:resource="http://pmcname.apache.org"/>, which leads to a file here:
https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data_files/pmcname.rdf

I'm not an RDF expert, but this Apache-specific algorithm to find the PMC RDF file seems strange: I understand it is coded in (and known from) the projects.a.o XSLT transformation.

Yes.

But this should be usable from any RDF tooling, no?

It's not currently usable except by using special processing. The problem is that the shorthand URL is used by all but about 4 of the PMCs, so it would be a major challenge to get this fixed. Some PMCs are quick to fix such issues; some may take weeks or months to fix even a simple error.

Another problem I see with these PMC data RDF files is that they don't seem to be really maintained: I doubt PMCs update their PMC data RDF files on each PMC Chair change.

Yes.

That's why I had the idea of generating/updating the chair when parsing committee-info.txt.

Fair enough, but that does not mean the code needs to create yet another RDF file.
But other information manually written in current PMC data RDF files can't be found anywhere else, AFAIK.

Yes.

Last problem: I personally really didn't understand this PMC data RDF file until now.

I don't know who understands it :)

IMHO, the magic algorithm to find the RDF file is a root cause.

The PMC data file is documented here: http://projects.apache.org/docs/pmc.html

If you look at https://projects-new.apache.org/projects.html?pmc, typical cases are:
- Incubator: there is the Incubator project, displayed without a DOAP file since the Incubator has special source info, and many sub-projects which provide DOAP files
- Commons: there is no Commons DOAP file, so no TLP... one sub-project is chosen quasi-randomly... A Commons DOAP file, if it existed, would not release anything: it's a purely organizational project

There is an ambiguity here: project can mean an organisational entity and project can mean a releasable artifact. There are different RDF files for the two meanings; only the artifact has an associated DOAP.

- Ant: there is an Ant DOAP file that represents the TLP and the main released artifact

No, it only links to the TLP = PMC data file; it does not represent the TLP. The Ant DOAP file only represents the Ant product.

OK, IIUC, I should rephrase https://projects-new.apache.org/project.html?ant :
1. "Top Level Project data:" to "Apache Committee data:"
2. "Project established:" to "Committee established:"

That does not seem necessary.

3. "Sub-projects (8):" to "Projects (8):", possibly bolding the TLP if one is the TLP

No - none of the projects are the TLP.

The TLP / PMC is not the same as any of its projects. Most PMCs happen to have the same name as one of their projects, but they are distinct entities. Note that the Creadur PMC does not have a Creadur project. To take the Ant example, there needs to be an Ant PMC/TLP page and a separate Ant project page. These should be linked somehow.
and I should rename tlps.json to committees.json (and update code accordingly)

No need.

Then on https://projects-new.apache.org/ , do we really want to graph the evolution of TLPs or of committees?

No idea.

I suppose Commons can be called a TLP, even if it does not have any main project that is the effective TLP.

Yes, Commons is a TLP/PMC. I don't think it's helpful to think of PMCs having a main project. PMCs have one or more projects; each project has a single PMC.

comdev is not really a TLP: it should probably not be listed in the projects list, but as a special committee not producing projects?

Well, it is responsible for this mailing list and is probably responsible for the projects.a.o website.

Is Labs a TLP? Or like comdev?

What does committee-info.txt say?

I suppose we can hard-code the list of committees that are not expected to have projects; the list should not change often: Labs and comdev seem to be the only 2 (which would extend the special committees from 5 to 7) and
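The distinction sebb draws (a committee is never itself a project; each project has exactly one PMC; a few committees are expected to have no projects) can be captured in a small data model. This is a hypothetical Python sketch, not the projects-new.a.o code; the identifiers `comdev` and `labs` for the hard-coded list are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Committees not expected to produce projects (hard-coded, as suggested in
# the thread). The identifiers are assumptions.
NO_PROJECT_COMMITTEES = {"comdev", "labs"}


@dataclass
class Project:
    """A releasable artifact, described by a DOAP file."""
    id: str
    name: str
    pmc: str  # id of the single committee that manages this project


@dataclass
class Committee:
    """A PMC/TLP: an organisational entity, never itself a project."""
    id: str
    name: str
    projects: list[Project] = field(default_factory=list)


def link(committees: dict[str, Committee], projects: list[Project]) -> list[str]:
    """Attach each project to its committee; return consistency problems."""
    problems = []
    for p in projects:
        c = committees.get(p.pmc)
        if c is None:
            problems.append(f"project {p.id}: unknown PMC {p.pmc!r}")
        else:
            c.projects.append(p)
    for c in committees.values():
        if not c.projects and c.id not in NO_PROJECT_COMMITTEES:
            problems.append(f"committee {c.id}: no projects")
    return problems
```

With this shape, the Ant committee page and the Ant project page are two separate objects linked through the project's `pmc` field, and a committee like Creadur simply has projects with different names.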
Re: Project Visualization Tool...
On 5 May 2015 at 07:38, Hervé BOUTEMY herve.bout...@free.fr wrote:
On Saturday 18 April 2015 10:55:00, Shane Curcuru wrote:

LOL, below. I highly recommend separating the model from the views, so that we can efficiently enable our volunteers' energy here to actually accomplish something valuable.

+1

So let's work on stuff to do that excites us, but remember to keep the technical problems focused on what this PMC believes we can truly create and maintain going forward. Don't worry about everything at once. Just focus on separate bits:

- Method to scrape source data from our various definitive or even not completely definitive but very close places (txt files, websites, LDAP)

- Model and data source that actually holds info about committer lists and project metadata. I'm betting Daniel's projects-new does this very well already.

+1, it's a perfect starting point: we just need to document it and continue to improve it. So I started by documenting the current information sources used for generating the projects-new.a.o json files: see https://projects-new.apache.org/json/foundation/ and http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/README.txt?view=markup

--
- Stable API to get at that model. Would be really nice if we did this just once, so that people working above here don't interfere with people working below here.
--

+1. Since there are multiple information sources for TLPs/PMCs/committers, I think I will consolidate them to avoid what's currently happening: projects.js (i.e. one visualization) contains a lot of code to consolidate the multiple information sources. If the consolidation is done server side, in the generation scripts, it will be easier to use for projects.js and any other tool wanting to do other future visualizations.

- Visualizations. There's lots of different stuff to do here, and I think it'd be super helpful if everyone just did something they want, and then show us the code.
+1

Sure, there's lots of what is important to focus on, but I for one would love to see real examples of all the cool visualization libraries out there, and I know a couple of folks already use some of them.

- UI additions for the projects-new/projects websites, which are featured at the top level of a.o. I.e., this is our projects directory, how can we better lead people who arrive there to what they want to know?

At the moment, I'm not trying to add any new UI, but to improve the consistency of displayed data, since the current state is not really consistent: some PMCs are not displayed, probably because they have not provided any DOAP file. But even without a DOAP file, we have a lot of data to display for a TLP, most of what we display for a TLP (i.e. a project that does not have any sub-project).

I think we really have a data model problem here regarding what a project's DOAP file is: sometimes a project is a PMC, sometimes a project is a deliverable, more like what is called a sub-project in projects-new.a.o.

That is not how I understand DOAPs. DOAP == Description Of A Project, i.e. some releasable artifact. A single PMC may have multiple projects, each with its own releases and repositories. These are modelled quite well in the DOAPs that PMCs have created.

Information about the PMC which manages the projects is NOT stored in a DOAP; it is stored in a PMC data file. This is referenced from a DOAP using <asfext:pmc rdf:resource="URL"/>, where URL is either an actual URL of a PMC data file or a dummy URL, e.g. <asfext:pmc rdf:resource="http://pmcname.apache.org"/>, which leads to a file here:
https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data_files/pmcname.rdf

If you look at https://projects-new.apache.org/projects.html?pmc, typical cases are:
- Incubator: there is the Incubator project, displayed without a DOAP file since the Incubator has special source info, and many sub-projects which provide DOAP files
- Commons: there is no Commons DOAP file, so no TLP... one sub-project is chosen quasi-randomly... A Commons DOAP file, if it existed, would not release anything: it's a purely organizational project

There is an ambiguity here: project can mean an organisational entity and project can mean a releasable artifact. There are different RDF files for the two meanings; only the artifact has an associated DOAP.

- Ant: there is an Ant DOAP file that represents the TLP and the main released artifact

No, it only links to the TLP = PMC data file; it does not represent the TLP. The Ant DOAP file only represents the Ant product.

I chose Commons, but it could have been HttpComponents or Logging Services, or Lucene (Lucene have been very clear that there is a Lucene core sub-project), Web Services, Axis, Xalan, Xerces, XML Graphics, Attic, Creadur, DB, jUDDI, Tcl.

I chose Ant, but it could have been Velocity, MINA, Directory, HTTP Server, MyFaces, Tomcat.

- (future) UI
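Hervé's plan to consolidate the information sources server side, in the generation scripts rather than in projects.js, could look roughly like the following Python sketch. The source names and field names are hypothetical; the only policy it encodes is "first listed source wins for each field", with the winning source recorded so that the generated JSON stays traceable back to its origin.

```python
import json


def consolidate(sources: dict) -> dict:
    """Merge per-committee records from several named sources.

    `sources` maps a source name to {committee_id: {field: value}}, in
    priority order (dicts keep insertion order). For each field, the first
    source that provides it wins; its name is recorded in "provenance".
    """
    out = {}
    for source_name, records in sources.items():
        for cid, record in records.items():
            entry = out.setdefault(cid, {"data": {}, "provenance": {}})
            for key, value in record.items():
                if key not in entry["data"]:
                    entry["data"][key] = value
                    entry["provenance"][key] = source_name
    return out


if __name__ == "__main__":
    sources = {
        "committee-info": {"ant": {"chair": "Alice"}},
        "pmc-rdf": {"ant": {"chair": "Bob", "homepage": "https://ant.apache.org/"}},
    }
    print(json.dumps(consolidate(sources), indent=2, sort_keys=True))
```

A visualization would then read one consolidated file instead of re-implementing the merge in the browser.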
Re: Project Visualization Tool...
On Saturday 18 April 2015 10:55:00, Shane Curcuru wrote:

LOL, below. I highly recommend separating the model from the views, so that we can efficiently enable our volunteers' energy here to actually accomplish something valuable.

+1

So let's work on stuff to do that excites us, but remember to keep the technical problems focused on what this PMC believes we can truly create and maintain going forward. Don't worry about everything at once. Just focus on separate bits:

- Method to scrape source data from our various definitive or even not completely definitive but very close places (txt files, websites, LDAP)

- Model and data source that actually holds info about committer lists and project metadata. I'm betting Daniel's projects-new does this very well already.

+1, it's a perfect starting point: we just need to document it and continue to improve it. So I started by documenting the current information sources used for generating the projects-new.a.o json files: see https://projects-new.apache.org/json/foundation/ and http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/README.txt?view=markup

--
- Stable API to get at that model. Would be really nice if we did this just once, so that people working above here don't interfere with people working below here.
--

+1. Since there are multiple information sources for TLPs/PMCs/committers, I think I will consolidate them to avoid what's currently happening: projects.js (i.e. one visualization) contains a lot of code to consolidate the multiple information sources. If the consolidation is done server side, in the generation scripts, it will be easier to use for projects.js and any other tool wanting to do other future visualizations.

- Visualizations. There's lots of different stuff to do here, and I think it'd be super helpful if everyone just did something they want, and then show us the code.
+1

Sure, there's lots of what is important to focus on, but I for one would love to see real examples of all the cool visualization libraries out there, and I know a couple of folks already use some of them.

- UI additions for the projects-new/projects websites, which are featured at the top level of a.o. I.e., this is our projects directory, how can we better lead people who arrive there to what they want to know?

At the moment, I'm not trying to add any new UI, but to improve the consistency of displayed data, since the current state is not really consistent: some PMCs are not displayed, probably because they have not provided any DOAP file. But even without a DOAP file, we have a lot of data to display for a TLP, most of what we display for a TLP (i.e. a project that does not have any sub-project).

I think we really have a data model problem here regarding what a project's DOAP file is: sometimes a project is a PMC, sometimes a project is a deliverable, more like what is called a sub-project in projects-new.a.o.

If you look at https://projects-new.apache.org/projects.html?pmc, typical cases are:
- Incubator: there is the Incubator project, displayed without a DOAP file since the Incubator has special source info, and many sub-projects which provide DOAP files
- Commons: there is no Commons DOAP file, so no TLP... one sub-project is chosen quasi-randomly... A Commons DOAP file, if it existed, would not release anything: it's a purely organizational project
- Ant: there is an Ant DOAP file that represents the TLP and the main released artifact

I chose Commons, but it could have been HttpComponents or Logging Services, or Lucene (Lucene have been very clear that there is a Lucene core sub-project), Web Services, Axis, Xalan, Xerces, XML Graphics, Attic, Creadur, DB, jUDDI, Tcl.

I chose Ant, but it could have been Velocity, MINA, Directory, HTTP Server, MyFaces, Tomcat.

- (future) UI additions for *other* places.
It would be awesome, for example, to provide a tiny scriptlet that any project could inject in their website that displays a "see also" menu. That would link to a specific URL on projects.a.o that would say "hey, you came from Cassandra, here are:
- other big data projects,
- other projects in Java,
- other projects with the same committers... etc."
as a service.

- Shane

I'll continue on this tonight. Any help appreciated.

Regards,
Hervé
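The "see also" service in Shane's last bullet boils down to a relatedness query over project metadata. A minimal Python sketch, assuming each project record carries hypothetical `categories`, `languages` and `committers` lists (the real JSON may name or structure these differently); a projects.a.o endpoint could serve the result as JSON for the scriptlet to render.

```python
def see_also(target: str, projects: dict, limit: int = 5) -> dict:
    """Group other projects by what they share with `target`.

    Returns {"category": [...], "language": [...], "committers": [...]},
    each a list of project ids in sorted order, truncated to `limit`.
    """
    me = projects[target]
    related = {"category": [], "language": [], "committers": []}
    for pid, info in sorted(projects.items()):
        if pid == target:
            continue
        if set(info["categories"]) & set(me["categories"]):
            related["category"].append(pid)
        if set(info["languages"]) & set(me["languages"]):
            related["language"].append(pid)
        if set(info["committers"]) & set(me["committers"]):
            related["committers"].append(pid)
    return {kind: ids[:limit] for kind, ids in related.items()}


if __name__ == "__main__":
    # Made-up sample data for illustration.
    projects = {
        "cassandra": {"categories": ["big-data", "database"], "languages": ["Java"], "committers": ["u1", "u2"]},
        "hadoop": {"categories": ["big-data"], "languages": ["Java"], "committers": ["u3"]},
        "httpd": {"categories": ["http"], "languages": ["C"], "committers": ["u2"]},
    }
    print(see_also("cassandra", projects))
```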
Re: Re : Re: Project Visualization Tool...
LOL, below. I highly recommend separating the model from the views, so that we can efficiently enable our volunteers' energy here to actually accomplish something valuable.

So let's work on stuff to do that excites us, but remember to keep the technical problems focused on what this PMC believes we can truly create and maintain going forward. Don't worry about everything at once. Just focus on separate bits:

- Method to scrape source data from our various definitive or even not completely definitive but very close places (txt files, websites, LDAP)

- Model and data source that actually holds info about committer lists and project metadata. I'm betting Daniel's projects-new does this very well already.

--
- Stable API to get at that model. Would be really nice if we did this just once, so that people working above here don't interfere with people working below here.
--

- Visualizations. There's lots of different stuff to do here, and I think it'd be super helpful if everyone just did something they want, and then show us the code. Sure, there's lots of what is important to focus on, but I for one would love to see real examples of all the cool visualization libraries out there, and I know a couple of folks already use some of them.

- UI additions for the projects-new/projects websites, which are featured at the top level of a.o. I.e., this is our projects directory, how can we better lead people who arrive there to what they want to know?

- (future) UI additions for *other* places. It would be awesome, for example, to provide a tiny scriptlet that any project could inject in their website that displays a "see also" menu. That would link to a specific URL on projects.a.o that would say "hey, you came from Cassandra, here are:
- other big data projects,
- other projects in Java,
- other projects with the same committers... etc."
as a service.
- Shane

On 4/18/15 5:44 AM, jan i wrote:
On Saturday, April 18, 2015, herve.bout...@free.fr wrote:
It was said the new site would use native json instead of DOAP. But I'm not convinced at all, since DOAP is an invaluable source of info, documented, and so on.

json is also a documented standard that is in general better known, and I believe it has more tools supporting it.

So IMHO it would be better to generate json from DOAP. I disabled the json edit feature recently since it will cause problems.

Which problems? With a defined json it is simple to generate the DOAP file. I highly recommend staying with json and using it as the base for all our central data.
rgds
jan i

regards
Hervé

----- Original message -----
From: Shane Curcuru a...@shanecurcuru.org
To: dev@community.apache.org
Sent: Sat, 18 Apr 2015 06:43:37 +0200 (CEST)
Subject: Re: Project Visualization Tool...

We had a great session, and a lot of energy; hopefully we can make some progress.

One note: this needs to be a comdev PMC project, and we need to really plan the data part out if we want to be successful. Note that projects-new.a.o is the planned future replacement for projects.a.o - there are *significant* differences, so you need to look at the About page and the source repo. In particular, the new site uses its own new JSON generated sources which (I think) will no longer use the DOAPs.

In particular, Infra currently does *not* consider either the data gathering (i.e. populating the JSON behind the projects-new site) or the visualizations (current or ones we want to build) as core supported services. So whatever we build needs to be maintained by this PMC to start with.
Also, a link dump of useful related bits:

Old service, based on crappy cron jobs and DOAP files from projects:
https://projects.apache.org/

New service, soon to be infra supported, relying on JSON data generated by infra on a regular schedule:
https://projects-new.apache.org/

Useful PMC chair report helper, that surfaces a number of different statistics about your PMC(s), including mailing list stats, PMC/committer changes, some software releases, etc. etc. (Members have visibility to all PMCs):
https://reporter.apache.org

Rob Weir (AOO, Member) used to do some visualization stuff and might have code ideas:
http://www.robweir.com/blog/2013/05/mapping-apache.html

Ken Coar's old mailing list stats page:
https://people.apache.org/~coar/mlists.html

The AOO project wrote a mailing list visualizer for who talks to whom:
https://blogs.apache.org/OOo/entry/visualizing_the_aoo_dev_list

Some outside statistics FLOSSmole generated about Apache communities and lists:
http://flossmole.org/category/tags/apache

Random other interesting analytics: the Subversion project has the contribulyzer.

- Shane
Re : Re: Re : Re: Project Visualization Tool...
Yes, I have no problem with JSON vs XML: the question is more about defining the schema, as DOAP did, and writing documentation so that projects know where to publish which information. Editing the current generated JSON just creates a new information source, without any documentation.

My point is: AFAIK, the purpose of the site is to display info in newer ways, so JSON generated from every existing piece of information is great, like any other format that would better suit some other visualization. But if we're creating a new source of information that competes with an existing one, this has to be done with great care: documentation, an explanation of how to migrate, and so on.

Of course the raw format is not an issue: no religion here on XML vs JSON vs YAML vs ...

Regards,
Hervé

----- Original message -----
From: jan i j...@apache.org
To: dev@community.apache.org
Sent: Sat, 18 Apr 2015 11:44:54 +0200 (CEST)
Subject: Re: Re : Re: Project Visualization Tool...
Re: Re : Re: Project Visualization Tool...
On Saturday, April 18, 2015, herve.bout...@free.fr wrote:

> It was told the new site would use native JSON instead of DOAP. But I'm not convinced at all, since DOAP is an invaluable source of info, documented, and so on.

JSON is also a documented standard that is, in general, better known, and I believe it has more tools supporting it.

> Then IMHO it would be better to generate JSON from DOAP. I disabled the JSON edit feature recently since it will cause problems.

Which problems? With a defined JSON format it is simple to generate the DOAP file. I highly recommend staying with JSON and using it as the base for all our central data.

rgds
jan i

> Regards,
> Hervé
>
> ----- Original message -----
> From: Shane Curcuru a...@shanecurcuru.org
> To: dev@community.apache.org
> Sent: Sat, 18 Apr 2015 06:43:37 +0200 (CEST)
> Subject: Re: Project Visualization Tool...
>
> We had a great session, and a lot of energy; hopefully we can make some progress.
>
> One note: this needs to be a comdev PMC project, and we need to really plan the data part out if we want to be successful.
>
> Note that projects-new.a.o is the planned future replacement for projects.a.o. There are *significant* differences, so you need to look at the About page and the source repo. In particular, the new site uses its own new JSON-generated sources, which (I think) will no longer use the DOAPs.
>
> In particular, Infra currently does *not* consider either the data gathering (i.e. populating the JSON behind the projects-new site) or the visualizations (current or ones we want to build) as core supported services. So whatever we build needs to be maintained by this PMC to start with.
>
> Also, a link dump of useful related bits:
>
> Old service, based on crappy cron jobs and DOAP files from projects:
> https://projects.apache.org/
>
> New service, soon to be Infra-supported, relying on JSON data generated by Infra on a regular schedule:
> https://projects-new.apache.org/
>
> Useful PMC chair report helper that surfaces a number of different statistics about your PMC(s), including mailing list stats, PMC/committer changes, some software releases, etc. (Members have visibility to all PMCs):
> https://reporter.apache.org
>
> Rob Weir (AOO, Member) used to do some visualization stuff and might have code ideas:
> http://www.robweir.com/blog/2013/05/mapping-apache.html
>
> Ken Coar's old mailing list stats page:
> https://people.apache.org/~coar/mlists.html
>
> The AOO project wrote a mailing list visualizer for who talks to whom:
> https://blogs.apache.org/OOo/entry/visualizing_the_aoo_dev_list
>
> Some outside statistics FLOSSmole generated about Apache communities and lists:
> http://flossmole.org/category/tags/apache
>
> Random other interesting analytics: the Subversion project has the contribulyzer.
>
> - Shane

--
Sent from my iPad, sorry for any misspellings.
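To make jan i's point concrete: if the JSON has a defined shape, emitting a DOAP file from it is mostly a mapping exercise. Below is a minimal Python sketch using only the standard library. The JSON field names (`name`, `homepage`, `description`, `programming-languages`) are hypothetical and are not the actual projects-new.a.o schema; only the DOAP namespace and element names (`Project`, `name`, `homepage`, `shortdesc`, `programming-language`) come from the DOAP vocabulary.

```python
import json
import xml.etree.ElementTree as ET

# Namespaces used by DOAP files in their RDF/XML form.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("doap", DOAP_NS)


def doap(tag: str) -> str:
    """Return the ElementTree-style qualified name for a DOAP element."""
    return f"{{{DOAP_NS}}}{tag}"


def json_to_doap(record: dict) -> str:
    """Build a minimal DOAP RDF/XML document from one project record.

    The record keys used here are hypothetical, chosen for illustration.
    """
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    project = ET.SubElement(rdf, doap("Project"))
    ET.SubElement(project, doap("name")).text = record["name"]

    # doap:homepage refers to a resource, so the URL goes in rdf:resource.
    homepage = ET.SubElement(project, doap("homepage"))
    homepage.set(f"{{{RDF_NS}}}resource", record["homepage"])

    if "description" in record:
        ET.SubElement(project, doap("shortdesc")).text = record["description"]

    # One doap:programming-language element per listed language.
    for language in record.get("programming-languages", []):
        ET.SubElement(project, doap("programming-language")).text = language

    return ET.tostring(rdf, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical input record; not real project data.
    raw = (
        '{"name": "Example Project", "homepage": "https://example.org/", '
        '"programming-languages": ["Java", "Groovy"]}'
    )
    print(json_to_doap(json.loads(raw)))
```

The reverse direction Hervé suggests (JSON generated from DOAP) is the same mapping read the other way; which side is the source of truth is the real question in this thread, not the code.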
Re: Project Visualization Tool...
On Thu, Apr 16, 2015 at 04:12:17PM -0500, James Carman wrote:
> At ApacheCon, we discussed creating a project visualization tool to help folks navigate the ever-growing number of projects we have here at the ASF. The idea would be to allow folks to see some form of tag cloud or something (with the tags being the projects themselves), but the cloud is interactive, allowing filtering by various dimensions (size of project, age, relationships to other projects, programming language, etc.).
>
> We already have a new projects page in the works:
> https://projects-new.apache.org/
> which displays quite a bit of information. Where do we get that information?

Herve added an About page recently:
https://projects-new.apache.org/about.html

Each project manages their own DOAP file. Those files are listed at the old projects.a.o.

Watching these mail lists recently, I gather that someone needs to run a script manually to re-populate projects-new.a.o.

-David

> Do folks have any other ideas about different ways of browsing/exploring the projects? One idea we have is to lean on TinkerPop (currently incubating) to load the data into a graph structure to allow the data to be easily manipulated (the Gremlin language allows you to traverse the graph in this way very easily). Thoughts?
>
> James Carman
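James's TinkerPop idea amounts to treating projects and their attributes as vertices and relationships as edges, then traversing. The sketch below is not TinkerPop or Gremlin; it is a plain-Python stand-in, assuming each project record simply lists its programming languages (the project names are hypothetical), that shows the kind of two-hop traversal meant: from a project to its languages, then back out to other projects sharing one of them.

```python
from collections import defaultdict


def build_graph(projects: dict[str, list[str]]) -> dict[str, set[str]]:
    """Build an undirected graph linking project nodes to language nodes.

    Language nodes are prefixed with "lang:" so they cannot collide
    with project names.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for project, languages in projects.items():
        for language in languages:
            lang_node = f"lang:{language}"
            graph[project].add(lang_node)
            graph[lang_node].add(project)
    return graph


def related_by_language(graph: dict[str, set[str]], project: str) -> set[str]:
    """Two-hop traversal: project -> its languages -> other projects."""
    related: set[str] = set()
    # .get() avoids creating an empty entry for unknown projects.
    for lang_node in graph.get(project, ()):
        related |= graph[lang_node]
    related.discard(project)
    return related


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    g = build_graph({
        "project-a": ["Java", "Groovy"],
        "project-b": ["Groovy"],
        "project-c": ["Python"],
    })
    print(related_by_language(g, "project-a"))
```

A real graph database adds persistence, indexing and a query language on top; the traversal idea (filter by a dimension, hop to neighbours) is the same.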
Re : Re: Project Visualization Tool...
If you look at the sources, part of it is in a crontab, but not everything: I'm trying to improve the automated extractions, possibly fixing source data along the way, to be able to do the full extracts through cron.

The code is open to every committer: don't hesitate to modify it :)

Regards,
Hervé

----- Original message -----
From: David Crossley cross...@apache.org
To: dev@community.apache.org
Sent: Fri, 17 Apr 2015 08:46:10 +0200 (CEST)
Subject: Re: Project Visualization Tool...
Re: Project Visualization Tool...
Wow, great stuff! I was wondering how you get the projects-per-language stats. E.g., as a Groovy aficionado I looked at https://projects-new.apache.org/projects.html?language#Groovy and don't see Apache Bigtop, which uses Groovy and Gradle heavily.

Thanks!
Cos
Re: Project Visualization Tool...
This is all taken from our DOAP file.

On Thu, Apr 16, 2015 at 5:45 PM, Konstantin Boudnik c...@apache.org wrote:
> I was wondering how you get the projects-per-language stats.
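If the per-language listing is built from DOAP files, a project shows up under a language only when its DOAP declares that language with `doap:programming-language`. Here is a minimal sketch of such an aggregation in Python, assuming DOAP in RDF/XML with the standard DOAP namespace; the real projects-new.a.o extraction scripts may work differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DOAP_NS = "http://usefulinc.com/ns/doap#"
LANG_TAG = f"{{{DOAP_NS}}}programming-language"


def languages_from_doap(xml_text: str) -> set[str]:
    """Return the distinct programming languages declared in one DOAP file."""
    root = ET.fromstring(xml_text)
    return {
        el.text.strip()
        for el in root.iter(LANG_TAG)
        if el.text and el.text.strip()
    }


def count_projects_per_language(doap_documents) -> Counter:
    """Count, for each language, how many DOAP files declare it."""
    counts: Counter = Counter()
    for xml_text in doap_documents:
        # A set per file, so a language listed twice counts once per project.
        counts.update(languages_from_doap(xml_text))
    return counts


if __name__ == "__main__":
    # Hypothetical single-project DOAP document for illustration.
    sample = (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns="http://usefulinc.com/ns/doap#">'
        "<Project><name>Example Project</name>"
        "<programming-language>Java</programming-language>"
        "</Project></rdf:RDF>"
    )
    print(count_projects_per_language([sample]))
```

With this approach, the fix for a missing language entry is in the project's DOAP file, not in the site.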