Re: Re : Re: Re : Re: Project Visualization Tool...

2015-05-16 Thread sebb
The main problem I have with JSON is that AFAIK it does not support comments.
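
For illustration, a minimal Python sketch of that limitation, assuming the standard-library json module is the parser; the sample document and the parses helper are made up for this example:

```python
import json

# Hypothetical project metadata containing a "//" comment.
with_comment = """
{
  // maintained by hand
  "name": "Apache Ant"
}
"""

# The same metadata without the comment.
without_comment = '{"name": "Apache Ant"}'


def parses(text):
    """Return True if the standard json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(with_comment))
print(parses(without_comment))
```

A common workaround is to carry notes in an ordinary string field, at the cost of mixing documentation with data.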

On 18 April 2015 at 19:03,  herve.bout...@free.fr wrote:
 Yes, I have no problem with json vs xml: the question is more to define the 
 schema like doap did it, and write documentation for projects to know where 
 to publish what information

 editing current generated json just creates a new information source, without 
 any documentation

 My point is: afaik, the purpose of the site is to display info in newer ways, 
 then json generated from every existing piece of information is great, like 
 any other format that would better suit some other visualization

 But if we're creating any new source of information that competes with 
 existing one, this has to be done with great care on documentation, 
 explanation on how to migrate and so on

 of course the raw format is not an issue: no religion here on xml vs json vs 
 yaml vs ...

 Regards

 Hervé


 - Original message -
 From: jan i j...@apache.org
 To: dev@community.apache.org
 Sent: Sat, 18 Apr 2015 11:44:54 +0200 (CEST)
 Subject: Re: Re : Re: Project Visualization Tool...

 On Saturday, April 18, 2015, herve.bout...@free.fr wrote:

 We were told the new site would use native json, instead of doap
 But I'm not convinced at all, since Doap is an invaluable source of info,
 documented, and so on

 json is also a documented standard, that in general is more known, and I
 believe has more tools supporting it.



 then imho it would be better to generate json from doap

 I disabled the json edit feature recently since it will cause problems

 which problems?

 with a defined json it is simple to generate the doap file.

 I highly recommend staying at json and using that as base for all our
 central data.

 rgds
 jan i




 regards

 Hervé
 - Original message -
 From: Shane Curcuru a...@shanecurcuru.org
 To: dev@community.apache.org
 Sent: Sat, 18 Apr 2015 06:43:37 +0200 (CEST)
 Subject: Re: Project Visualization Tool...

 We had a great session, and a lot of energy, hopefully we can make some
 progress. One note: this needs to be a comdev PMC project, and we need
 to really plan the data part out if we want to be successful.

 Note that projects-new.a.o is the planned future replacement for
 projects.a.o - there are *significant* differences, so you need to look
 at the About page and the source repo. In particular, the new site uses
 its own new JSON generated sources which (I think) will no longer use
 the DOAPs.

 In particular, Infra currently does *not* consider either the data
 gathering (i.e. populating the JSON behind the projects-new site) nor
 the visualizations (current or ones we want to build) as core supported
 services. So whatever we build needs to be maintained by this PMC to
 start with.

 Also, Link dump of useful related bits: 

 Old service, based on crappy cron jobs and DOAP files from projects:
 https://projects.apache.org/

 New service, soon to be infra supported, relying on JSON data generated
 by infra on a regular schedule:
 https://projects-new.apache.org/

 Useful PMC chair report helper, that surfaces a number of different
 statistics about your PMC(s), including mailing list stats,
 PMC/committer changes, some software releases, etc. etc. (Members have
 visibility to all PMCs):
 https://reporter.apache.org

 Rob Weir (AOO, Member) used to do some visualization stuff and might
 have code ideas:
 http://www.robweir.com/blog/2013/05/mapping-apache.html

 Ken Coar's old mailing list stats page:

 https://people.apache.org/~coar/mlists.html

 The AOO project wrote a mailing list visualizer for who talks to whom:
 https://blogs.apache.org/OOo/entry/visualizing_the_aoo_dev_list

 Some outside statistics FLOSSmole generated about Apache communities and
 lists:
 http://flossmole.org/category/tags/apache

 Random other interesting analytics:
 The Subversion project has the contribulyzer



 - Shane






Re: Project Visualization Tool...

2015-05-16 Thread sebb
On 16 May 2015 at 08:31, Hervé BOUTEMY herve.bout...@free.fr wrote:
 On Saturday 16 May 2015 00:30:55, sebb wrote:
 On 15 May 2015 at 23:28, Hervé BOUTEMY herve.bout...@free.fr wrote:
  On Friday 15 May 2015 15:34:47, sebb wrote:
   I think we really have some data model problem here regarding what is a
   project's DOAP file: sometimes, a project is a PMC, sometimes a
   project
   is a deliverable, more like what is called in projectsnew.a.o a
   sub-project
 
  That is not how I understand DOAPs.
 
  DOAP == Description Of A Project
 
  i.e. some releaseable artifact.
 
  A single PMC may have multiple projects, each with its own releases
  and repositories.
  These are modelled quite well in the DOAPs that PMCs have created.
 
  +1
 
  Information about the PMC which manages the projects is NOT stored in
  a DOAP, it is stored in a PMC data file.
  This is referenced from a DOAP using
 
  <asfext:pmc rdf:resource="URL"/>
 
  where URL is either an actual URL of a PMC data file or a dummy URL e.g.
 
  <asfext:pmc rdf:resource="http://pmcname.apache.org" />
 
  which leads to a file here:
 
  https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data_files/pmcname.rdf
 
  I'm not an RDF expert, but this Apache-specific algorithm to find the PMC rdf
  file seems strange: I understand it is coded/known from projects.a.o xslt
  transformation
 Yes.

  But this should be usable from any RDF tooling, no?

 It's not currently usable except by using special processing.

 The problem is that the shorthand URL is used by all but about 4 of
 the PMCs, so it would be a major challenge to get this fixed.

 Some PMCs are quick to fix such issues; some may take weeks or months
 to fix even a simple error.
 I think that people don't understand this PMC information rdf file (I didn't
 until our current discussion)
 But with good explanations and visualization help given by projects-new.a.o,
 we can go really faster: I'm ready to try once we're clear :)


  Another problem I see with these PMC data rdf files is that they seem to
  not be really maintained: I doubt PMCs update PMC data rdf files on each
  PMC Chair change.

 Yes.

  That's why I had the idea of generating/updating the chair when
  parsing committee-info.txt.

 Fair enough, but that does not mean the code needs to create yet
 another RDF file.
 +1
 my intent was not to create a new one, but to replace it with generated info


  But other information manually written in current PMC data rdf files can't
  be found anywhere else, AFAIK.

 Yes.
 that's where it hurts: we need to mix handwritten with generated content...
 we need to be clear on the process


  Last problem: I personally really didn't understand this PMC data rdf
  file
  until now. I don't know who understands it :)
  IMHO, the magic algorithm to find the rdf file is a root cause.

 The PMC data file is documented here:

 http://projects.apache.org/docs/pmc.html
 yeah, I read it several times before; I knew I was not confident with what I
 read, and now I know I completely misread it until now.

Does it need clarifying? If so, what is not clear? How could it be improved?


   if you look at https://projects-new.apache.org/projects.html?pmc,
   typical
   cases for that are:
   - Incubator: there is the Incubator project, displayed without
   DOAP
   file since the incubator has special source info, and many sub-projects
   which provide DOAP files
   - Commons: there is no Commons DOAP file, so no TLP... one sub-project
   is quasi-randomly chosen... A Commons DOAP file, if it existed, would not
   release anything: it's a pure organizational project
 
  There is an ambiguity here: project can mean an organisational entity
  and project can mean a releaseable artifact.
 
  There are different RDF files for the two meanings; only the artifact
  has an associated DOAP.
 
   - Ant: there is an Ant DOAP file that represents the TLP and the main
   released artifact
 
  No, it only links to the TLP = PMC data file, it does not represent the
  TLP. The Ant DOAP file only represents the Ant product.
 
  ok, IIUC, I should rephrase
  https://projects-new.apache.org/project.html?ant : 1. Top Level Project
  data: to Apache Committee data:
  2. Project established: to Committee established:

 That does not seem necessary.

  3. Sub-projects (8): to Projects (8):, possibly bolding the TLP if
  one is the TLP

 No - none of the projects are the TLP.
 as said in the other thread, this assertion is confusing: none of the
 projects are the Top Level Project

On reflection, I think I was wrong about that.
The TLP is the original project which the PMC was created to manage.

 The TLP / PMC is not the same as any of its projects.

 Most PMCs happen to have the same name as one of their projects, but
 they are distinct entities.

 To take the Ant example, there needs to be an Ant PMC/TLP page and a
 separate Ant project page.
 These should be linked somehow.

  and I should rename tlps.json to 

Re: Project Visualization Tool...

2015-05-16 Thread Hervé BOUTEMY
On Saturday 16 May 2015 00:30:55, sebb wrote:
 On 15 May 2015 at 23:28, Hervé BOUTEMY herve.bout...@free.fr wrote:
  On Friday 15 May 2015 15:34:47, sebb wrote:
   I think we really have some data model problem here regarding what is a
   project's DOAP file: sometimes, a project is a PMC, sometimes a
   project
   is a deliverable, more like what is called in projectsnew.a.o a
   sub-project
  
  That is not how I understand DOAPs.
  
  DOAP == Description Of A Project
  
  i.e. some releaseable artifact.
  
  A single PMC may have multiple projects, each with its own releases
  and repositories.
  These are modelled quite well in the DOAPs that PMCs have created.
  
  +1
  
  Information about the PMC which manages the projects is NOT stored in
  a DOAP, it is stored in a PMC data file.
  This is referenced from a DOAP using
  
  <asfext:pmc rdf:resource="URL"/>
  
  where URL is either an actual URL of a PMC data file or a dummy URL e.g.
  
  <asfext:pmc rdf:resource="http://pmcname.apache.org" />
  
  which leads to a file here:
  
  https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data_files/pmcname.rdf
  
  I'm not an RDF expert, but this Apache-specific algorithm to find the PMC rdf
  file seems strange: I understand it is coded/known from projects.a.o xslt
  transformation
 Yes.
 
  But this should be usable from any RDF tooling, no?
 
 It's not currently usable except by using special processing.
 
 The problem is that the shorthand URL is used by all but about 4 of
 the PMCs, so it would be a major challenge to get this fixed.
 
 Some PMCs are quick to fix such issues; some may take weeks or months
 to fix even a simple error.
I think that people don't understand this PMC information rdf file (I didn't 
until our current discussion)
But with good explanations and visualization help given by projects-new.a.o, 
we can go really faster: I'm ready to try once we're clear :)

 
  Another problem I see with these PMC data rdf files is that they seem to
  not be really maintained: I doubt PMCs update PMC data rdf files on each
  PMC Chair change.
 
 Yes.
 
  That's why I had the idea of generating/updating the chair when
  parsing committee-info.txt.
 
 Fair enough, but that does not mean the code needs to create yet
 another RDF file.
+1
my intent was not to create a new one, but to replace it with generated info

 
  But other information manually written in current PMC data rdf files can't
  be found anywhere else, AFAIK.
 
 Yes.
that's where it hurts: we need to mix handwritten with generated content...
we need to be clear on the process

 
  Last problem: I personally really didn't understand this PMC data rdf
  file
  until now. I don't know who understands it :)
  IMHO, the magic algorithm to find the rdf file is a root cause.
 
 The PMC data file is documented here:
 
 http://projects.apache.org/docs/pmc.html
yeah, I read it several times before; I knew I was not confident with what I
read, and now I know I completely misread it until now.

 
   if you look at https://projects-new.apache.org/projects.html?pmc,
   typical
   cases for that are:
   - Incubator: there is the Incubator project, displayed without
   DOAP
   file since the incubator has special source info, and many sub-projects
   which provide DOAP files
   - Commons: there is no Commons DOAP file, so no TLP... one sub-project
   is quasi-randomly chosen... A Commons DOAP file, if it existed, would not
   release anything: it's a pure organizational project
  
  There is an ambiguity here: project can mean an organisational entity
  and project can mean a releaseable artifact.
  
  There are different RDF files for the two meanings; only the artifact
  has an associated DOAP.
  
   - Ant: there is an Ant DOAP file that represents the TLP and the main
   released artifact
  
  No, it only links to the TLP = PMC data file, it does not represent the
  TLP. The Ant DOAP file only represents the Ant product.
  
  ok, IIUC, I should rephrase
  https://projects-new.apache.org/project.html?ant : 1. Top Level Project
  data: to Apache Committee data:
  2. Project established: to Committee established:
 
 That does not seem necessary.
 
  3. Sub-projects (8): to Projects (8):, possibly bolding the TLP if
  one is the TLP
 
 No - none of the projects are the TLP.
as said in the other thread, this assertion is confusing: none of the 
projects are the Top Level Project

 The TLP / PMC is not the same as any of its projects.
 
 Most PMCs happen to have the same name as one of their projects, but
 they are distinct entities.
 
 To take the Ant example, there needs to be an Ant PMC/TLP page and a
 separate Ant project page.
 These should be linked somehow.
 
  and I should rename tlps.json to committees.json (and update code
  accordingly)
 No need.
given this problem with a TLP is not a project, I think using committee or 
PMC would avoid confusion

 
  then on https://projects-new.apache.org/ , do we really want to graph TLPs
 

Re: Project Visualization Tool...

2015-05-15 Thread sebb
On 16 May 2015 at 00:30, sebb seb...@gmail.com wrote:
 On 15 May 2015 at 23:28, Hervé BOUTEMY herve.bout...@free.fr wrote:
 On Friday 15 May 2015 15:34:47, sebb wrote:
  I think we really have some data model problem here regarding what is a
  project's DOAP file: sometimes, a project is a PMC, sometimes a project
  is a deliverable, more like what is called in projectsnew.a.o a
  sub-project
 That is not how I understand DOAPs.

 DOAP == Description Of A Project

 i.e. some releaseable artifact.

 A single PMC may have multiple projects, each with its own releases
 and repositories.
 These are modelled quite well in the DOAPs that PMCs have created.
 +1

 Information about the PMC which manages the projects is NOT stored in
 a DOAP, it is stored in a PMC data file.
 This is referenced from a DOAP using

 <asfext:pmc rdf:resource="URL"/>

 where URL is either an actual URL of a PMC data file or a dummy URL e.g.

 <asfext:pmc rdf:resource="http://pmcname.apache.org" />

 which leads to a file here:

 https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data_files/pmcname.rdf
 I'm not an RDF expert, but this Apache-specific algorithm to find the PMC rdf file 
 seems
 strange: I understand it is coded/known from projects.a.o xslt transformation

 Yes.

 But this should be usable from any RDF tooling, no?

 It's not currently usable except by using special processing.

 The problem is that the shorthand URL is used by all but about 4 of
 the PMCs, so it would be a major challenge to get this fixed.

 Some PMCs are quick to fix such issues; some may take weeks or months
 to fix even a simple error.

 Another problem I see with these PMC data rdf files is that they seem to not 
 be
 really maintained: I doubt PMCs update PMC data rdf files on each PMC Chair
 change.

 Yes.

 That's why I had the idea of generating/updating the chair when
 parsing committee-info.txt.

 Fair enough, but that does not mean the code needs to create yet
 another RDF file.

 But other information manually written in current PMC data rdf files can't be
 found anywhere else, AFAIK.


 Yes.

 Last problem: I personally really didn't understand this PMC data rdf file
 until now. I don't know who understands it :)
 IMHO, the magic algorithm to find the rdf file is a root cause.

 The PMC data file is documented here:

 http://projects.apache.org/docs/pmc.html

  if you look at https://projects-new.apache.org/projects.html?pmc, typical
  cases for that are:
  - Incubator: there is the Incubator project, displayed without DOAP
  file since the incubator has special source info, and many sub-projects
  which provide DOAP files
  - Commons: there is no Commons DOAP file, so no TLP... one sub-project
  is quasi-randomly chosen... A Commons DOAP file, if it existed, would not
  release anything: it's a pure organizational project

 There is an ambiguity here: project can mean an organisational entity
 and project can mean a releaseable artifact.

 There are different RDF files for the two meanings; only the artifact
 has an associated DOAP.

  - Ant: there is an Ant DOAP file that represents the TLP and the main
  released artifact

 No, it only links to the TLP = PMC data file, it does not represent the TLP.
 The Ant DOAP file only represents the Ant product.
 ok, IIUC, I should rephrase https://projects-new.apache.org/project.html?ant 
 :
 1. Top Level Project data: to Apache Committee data:
 2. Project established: to Committee established:

 That does not seem necessary.

 3. Sub-projects (8): to Projects (8):, possibly bolding the TLP if one
 is the TLP

 No - none of the projects are the TLP.
 The TLP / PMC is not the same as any of its projects.

 Most PMCs happen to have the same name as one of their projects, but
 they are distinct entities.

Note that the Creadur PMC does not have a Creadur project.

 To take the Ant example, there needs to be an Ant PMC/TLP page and a
 separate Ant project page.
 These should be linked somehow.

 and I should rename tlps.json to committees.json (and update code 
 accordingly)

 No need.

 then on https://projects-new.apache.org/ , do we really want to graph TLPs
 evolution or committees?

 No idea

 I suppose commons can be called a TLP, even if it does not have any main
 project that is the effective TLP

 Yes, Commons is a TLP/PMC.

 I don't think it's helpful to think of PMCs having a main project.

 PMCs have one or more projects; each project has a single PMC.
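
 The relationship stated above can be sketched as a small data model. This is only an illustration in Python: the Committee and Project class names are assumptions, and the project names used are examples rather than data taken from any DOAP or PMC data file.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """A releasable artifact, i.e. what a DOAP file describes."""
    name: str
    pmc: str  # each project has exactly one managing PMC


@dataclass
class Committee:
    """A PMC/TLP: an organisational entity, distinct from its projects."""
    name: str
    projects: list = field(default_factory=list)

    def add_project(self, project_name):
        project = Project(name=project_name, pmc=self.name)
        self.projects.append(project)
        return project


# Illustrative data only.
ant = Committee("ant")
ant.add_project("Apache Ant")
ant.add_project("Apache Ivy")

# A PMC need not have a project sharing its name.
creadur = Committee("creadur")
creadur.add_project("Apache Rat")

for committee in (ant, creadur):
    print(committee.name, [p.name for p in committee.projects])
```

 Keeping the committee and its same-named project as separate objects is what allows separate, linked pages for each.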

 comdev is not really a TLP: should probably not be listed in projects list,
 but as special committee not producing projects?

 Well, it is responsible for this mailing list and is probably
 responsible for the projects.a.o website.

 is Labs a TLP? or like comdev?

 What does committee-info.txt say?

 I suppose we can hard-code the list of committees that are not expected to
 have projects, the list should not change often: Labs and comdev seem to be
 the only 2 (that extend special committees from 5 to 7)

 and 

Re: Project Visualization Tool...

2015-05-15 Thread sebb
On 5 May 2015 at 07:38, Hervé BOUTEMY herve.bout...@free.fr wrote:
 On Saturday 18 April 2015 10:55:00, Shane Curcuru wrote:
 LOL, below.

 I highly recommend separating the model from the views, so that we can
 efficiently enable our volunteer's energy here to actually accomplish
 something valuable.
 +1


 So let's work on stuff to do that excites us, but remember to keep the
 technical problems focused on what this PMC believes we can truly create
 and maintain going forward.

 Don't worry about everything at once.  Just focus on separate bits:

 - Method to scrape source data from our various definitive or even not
 completely definitive but very close places (txt files, websites, LDAP)

 - Model and data source that actually holds info about committer lists
 and project metadata.  I'm betting Daniels' projects-new does this very
 well already.
 +1 it's a perfect starting point: just need to document and continue to
 improve
 then I started by documenting what are the current information sources used
 for generating projects-new.a.o json files:
 see https://projects-new.apache.org/json/foundation/ and
 http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/README.txt?view=markup


 --
 - Stable API to get at that model.  Would be really nice if we did this
 just once, so that people working above here don't interfere with people
 working below here.
 --
 +1

 Since there are multiple information sources for TLPs/PMCs/committers, I think
 I will consolidate to avoid what's currently happening: the projects.js (ie
 one visualization) contains a lot of code to consolidate the multiple
 information sources
 If the consolidation is done server side, in the generation scripts, it will
 be easier to use for projects.js and any other tool wanting to do other future
 visualizations


 - Visualizations.  There's lots of different stuff to do here, and I
 think it'd be super helpful if everyone just did something they want,
 and then show us the code.
 +1


 Sure, there's lots of what is important to focus on, but I for one
 would love to see real examples of all the cool visualization libraries
 out there, and I know a couple folks already use some of them.

 - UI additions for the projects-new/projects websites, which are
 featured at the top level of a.o.  I.e., this is our projects
 directory, how can we better lead people who arrive there at what they
 want to know?
 at the moment, I'm not trying to add any new UI, but improve the consistency
 of displayed data, since current state is not really consistent: some PMCs are
 not displayed, probably because they have not provided any DOAP file. But even
 without DOAP file, we have a lot of data to display for a TLP, most of what we
 display for a TLP (ie a project that does not have any subproject)

 I think we really have some data model problem here regarding what is a
 project's DOAP file: sometimes, a project is a PMC, sometimes a project is a
 deliverable, more like what is called in projectsnew.a.o a sub-project

That is not how I understand DOAPs.

DOAP == Description Of A Project

i.e. some releasable artifact.

A single PMC may have multiple projects, each with its own releases
and repositories.
These are modelled quite well in the DOAPs that PMCs have created.

Information about the PMC which manages the projects is NOT stored in
a DOAP, it is stored in a PMC data file.

This is referenced from a DOAP using

<asfext:pmc rdf:resource="URL"/>

where URL is either an actual URL of a PMC data file or a dummy URL e.g.

<asfext:pmc rdf:resource="http://pmcname.apache.org" />

which leads to a file here:

https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk/projects/data_files/pmcname.rdf
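
A minimal Python sketch of the lookup described above, for tools that want to follow the shorthand form. The function name and the exact matching rule (an apache.org host with no path counts as shorthand) are assumptions made for illustration; the thread only says the real logic is in the projects.a.o XSLT transformation.

```python
from urllib.parse import urlparse

# Location of the PMC data files, as given in the message above.
DATA_FILES_BASE = (
    "https://svn.apache.org/repos/asf/infrastructure/"
    "site-tools/trunk/projects/data_files/"
)

APACHE_SUFFIX = ".apache.org"


def pmc_data_file_url(resource):
    """Map an asfext:pmc resource value to the URL of a PMC data file."""
    resource = resource.strip()
    parsed = urlparse(resource)
    host = parsed.hostname or ""
    path = parsed.path.strip("/")
    # Assumed shorthand form: http://pmcname.apache.org with no path.
    if host.endswith(APACHE_SUFFIX) and not path:
        pmc_name = host[: -len(APACHE_SUFFIX)]
        return DATA_FILES_BASE + pmc_name + ".rdf"
    # Otherwise assume the value already points at an actual data file.
    return resource


print(pmc_data_file_url("http://ant.apache.org"))
```

Generic RDF tooling would not perform this mapping by itself, so any consumer would need an equivalent step.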


 if you look at https://projects-new.apache.org/projects.html?pmc, typical
 cases for that are:
 - Incubator: there is the Incubator project, displayed without DOAP file
 since the incubator has special source info, and many sub-projects which
 provide DOAP files
 - Commons: there is no Commons DOAP file, so no TLP... one sub-project is
 quasi-randomly chosen... A Commons DOAP file, if it existed, would not release
 anything: it's a pure organizational project

There is an ambiguity here: project can mean an organisational entity
and project can mean a releasable artifact.

There are different RDF files for the two meanings; only the artifact
has an associated DOAP.

 - Ant: there is an Ant DOAP file that represents the TLP and the main released
 artifact

No, it only links to the TLP = PMC data file, it does not represent the TLP.
The Ant DOAP file only represents the Ant product.

 I chose Commons, but it could have been HttpComponents or Logging Services, or
 Lucene (Lucene have been very clear that there is a Lucene core sub-
 project), Web Services, Axis, Xalan, Xerces, XML Graphics, Attic, Creadur, DB,
 jUDDI, Tcl

 I chose Ant, but it could have been Velocity, MINA, Directory, HTTP Server,
 MyFaces, Tomcat



 - (future) UI 

Re: Project Visualization Tool...

2015-05-05 Thread Hervé BOUTEMY
On Saturday 18 April 2015 10:55:00, Shane Curcuru wrote:
 LOL, below.
 
 I highly recommend separating the model from the views, so that we can
 efficiently enable our volunteer's energy here to actually accomplish
 something valuable.
+1

 
 So let's work on stuff to do that excites us, but remember to keep the
 technical problems focused on what this PMC believes we can truly create
 and maintain going forward.
 
 Don't worry about everything at once.  Just focus on separate bits:
 
 - Method to scrape source data from our various definitive or even not
 completely definitive but very close places (txt files, websites, LDAP)
 
 - Model and data source that actually holds info about committer lists
 and project metadata.  I'm betting Daniels' projects-new does this very
 well already.
+1 it's a perfect starting point: just need to document and continue to 
improve
then I started by documenting what are the current information sources used 
for generating projects-new.a.o json files:
see https://projects-new.apache.org/json/foundation/ and 
http://svn.apache.org/viewvc/comdev/projects.apache.org/scripts/README.txt?view=markup

 
 --
 - Stable API to get at that model.  Would be really nice if we did this
 just once, so that people working above here don't interfere with people
 working below here.
 --
+1

Since there are multiple information sources for TLPs/PMCs/committers, I think 
I will consolidate to avoid what's currently happening: the projects.js (ie 
one visualization) contains a lot of code to consolidate the multiple 
information sources
If the consolidation is done server side, in the generation scripts, it will 
be easier to use for projects.js and any other tool wanting to do other future 
visualizations
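
As a rough sketch of what server-side consolidation could look like, here is a small Python example. The input dictionaries, their field names and the precedence rule (the first source that provides a field wins) are all assumptions for illustration; they are not the format of the existing generation scripts.

```python
import json


def consolidate(*sources):
    """Merge per-committee records from several sources into one dict.

    Sources are given in order of trust: a later source only fills in
    fields that earlier sources did not provide.
    """
    merged = {}
    for source in sources:
        for committee_id, record in source.items():
            target = merged.setdefault(committee_id, {})
            for key, value in record.items():
                target.setdefault(key, value)
    return merged


# Hypothetical inputs, e.g. one parsed from committee-info.txt and one
# from the hand-written PMC data rdf files.
from_committee_info = {
    "ant": {"chair": "Example Chair"},
    "commons": {"chair": "Another Chair"},
}
from_pmc_rdf = {
    "ant": {"chair": "Stale Chair", "charter": "Example charter text"},
}

consolidated = consolidate(from_committee_info, from_pmc_rdf)
print(json.dumps(consolidated, indent=2, sort_keys=True))
```

With one consolidated file, projects.js and any other visualization would read a single record per committee instead of repeating the merge logic.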

 
 - Visualizations.  There's lots of different stuff to do here, and I
 think it'd be super helpful if everyone just did something they want,
 and then show us the code.
+1

 
 Sure, there's lots of what is important to focus on, but I for one
 would love to see real examples of all the cool visualization libraries
 out there, and I know a couple folks already use some of them.
 
 - UI additions for the projects-new/projects websites, which are
 featured at the top level of a.o.  I.e., this is our projects
 directory, how can we better lead people who arrive there at what they
 want to know?
at the moment, I'm not trying to add any new UI, but improve the consistency 
of displayed data, since current state is not really consistent: some PMCs are 
not displayed, probably because they have not provided any DOAP file. But even 
without DOAP file, we have a lot of data to display for a TLP, most of what we 
display for a TLP (ie a project that does not have any subproject)

I think we really have some data model problem here regarding what is a 
project's DOAP file: sometimes, a project is a PMC, sometimes a project is a 
deliverable, more like what is called in projectsnew.a.o a sub-project

if you look at https://projects-new.apache.org/projects.html?pmc, typical 
cases for that are:
- Incubator: there is the Incubator project, displayed without DOAP file 
since the incubator has special source info, and many sub-projects which 
provide DOAP files
- Commons: there is no Commons DOAP file, so no TLP... one sub-project is 
quasi-randomly chosen... A Commons DOAP file, if it existed, would not release 
anything: it's a pure organizational project
- Ant: there is an Ant DOAP file that represents the TLP and the main released 
artifact

I chose Commons, but it could have been HttpComponents or Logging Services, or 
Lucene (Lucene have been very clear that there is a Lucene core sub-
project), Web Services, Axis, Xalan, Xerces, XML Graphics, Attic, Creadur, DB, 
jUDDI, Tcl

I chose Ant, but it could have been Velocity, MINA, Directory, HTTP Server, 
MyFaces, Tomcat


 
 - (future) UI additions for *other* places.  It would be awesome, for
 example, to provide a tiny scriptlet that any project could inject in
 their website that displays a see also menu.  That would link to a
 specific URL on projects.a.o that would say hey, you came from
 Cassandra, here are: -other big data projects, -other projects in Java,
 -other projects with the same committers... etc. as a service.
 
 - Shane

I'll continue tonight on this
Any help appreciated

Regards,

Hervé



Re: Re : Re: Project Visualization Tool...

2015-04-18 Thread Shane Curcuru
LOL, below.

I highly recommend separating the model from the views, so that we can
efficiently enable our volunteer's energy here to actually accomplish
something valuable.

So let's work on stuff to do that excites us, but remember to keep the
technical problems focused on what this PMC believes we can truly create
and maintain going forward.

Don't worry about everything at once.  Just focus on separate bits:

- Method to scrape source data from our various definitive or even not
completely definitive but very close places (txt files, websites, LDAP)

- Model and data source that actually holds info about committer lists
and project metadata.  I'm betting Daniels' projects-new does this very
well already.

--
- Stable API to get at that model.  Would be really nice if we did this
just once, so that people working above here don't interfere with people
working below here.
--

- Visualizations.  There's lots of different stuff to do here, and I
think it'd be super helpful if everyone just did something they want,
and then show us the code.

Sure, there's lots of what is important to focus on, but I for one
would love to see real examples of all the cool visualization libraries
out there, and I know a couple folks already use some of them.

- UI additions for the projects-new/projects websites, which are
featured at the top level of a.o.  I.e., this is our projects
directory, how can we better lead people who arrive there at what they
want to know?

- (future) UI additions for *other* places.  It would be awesome, for
example, to provide a tiny scriptlet that any project could inject in
their website that displays a see also menu.  That would link to a
specific URL on projects.a.o that would say hey, you came from
Cassandra, here are: -other big data projects, -other projects in Java,
-other projects with the same committers... etc. as a service.

- Shane


On 4/18/15 5:44 AM, jan i wrote:
 On Saturday, April 18, 2015, herve.bout...@free.fr wrote:
 
 We were told the new site would use native json, instead of doap
 But I'm not convinced at all, since Doap is an invaluable source of info,
 documented, and so on
 
 json is also a documented standard, that in general is more known, and I
 believe has more tools supporting it.
 
 

 then imho it would be better to generate json from doap

 I disabled the json edit feature recently since it will cause problems
 
 which problems?
 
 with a defined json it is simple to generate the doap file.
 
 I highly recommend staying at json and using that as base for all our
 central data.
 
 rgds
 jan i
 
 
 

 regards

 Hervé
 - Original message -
 From: Shane Curcuru a...@shanecurcuru.org
 To: dev@community.apache.org
 Sent: Sat, 18 Apr 2015 06:43:37 +0200 (CEST)
 Subject: Re: Project Visualization Tool...

 We had a great session, and a lot of energy, hopefully we can make some
 progress. One note: this needs to be a comdev PMC project, and we need
 to really plan the data part out if we want to be successful.

 Note that projects-new.a.o is the planned future replacement for
 projects.a.o - there are *significant* differences, so you need to look
 at the About page and the source repo. In particular, the new site uses
 its own new JSON generated sources which (I think) will no longer use
 the DOAPs.

 In particular, Infra currently does *not* consider either the data
 gathering (i.e. populating the JSON behind the projects-new site) nor
 the visualizations (current or ones we want to build) as core supported
 services. So whatever we build needs to be maintained by this PMC to
 start with.

 Also, Link dump of useful related bits: 

 Old service, based on crappy cron jobs and DOAP files from projects:
 https://projects.apache.org/

 New service, soon to be infra supported, relying on JSON data generated
 by infra on a regular schedule:
 https://projects-new.apache.org/

 Useful PMC chair report helper, that surfaces a number of different
 statistics about your PMC(s), including mailing list stats,
 PMC/committer changes, some software releases, etc. etc. (Members have
 visibility to all PMCs):
 https://reporter.apache.org

 Rob Weir (AOO, Member) used to do some visualization stuff and might
 have code ideas:
 http://www.robweir.com/blog/2013/05/mapping-apache.html

 Ken Coar's old mailing list stats page:

 https://people.apache.org/~coar/mlists.html

 The AOO project wrote a mailing list visualizer for who talks to whom:
 https://blogs.apache.org/OOo/entry/visualizing_the_aoo_dev_list

 Some outside statistics FLOSSmole generated about Apache communities and
 lists:
 http://flossmole.org/category/tags/apache

 Random other interesting analytics:
 The Subversion project has the contribulyzer



 - Shane


 



Re : Re: Re : Re: Project Visualization Tool...

2015-04-18 Thread herve . boutemy
Yes, I have no problem with json vs xml: the question is more about defining the 
schema, as doap did, and writing documentation so that projects know where to 
publish what information.

Editing the currently generated json just creates a new information source, without 
any documentation.

My point is: afaik, the purpose of the site is to display info in newer ways, 
so json generated from every existing piece of information is great, like any 
other format that would better suit some other visualization.

But if we're creating a new source of information that competes with an existing 
one, this has to be done with great care for documentation, an explanation of how 
to migrate, and so on.

Of course the raw format is not an issue: no religion here on xml vs json vs 
yaml vs ... 

Regards

Hervé 


- Original message -
From: jan i j...@apache.org
To: dev@community.apache.org
Sent: Sat, 18 Apr 2015 11:44:54 +0200 (CEST)
Subject: Re: Re : Re: Project Visualization Tool...

On Saturday, April 18, 2015, herve.bout...@free.fr wrote:

 It was said that the new site would use native json, instead of doap
 But I'm not convinced at all, since Doap is an invaluable source of info,
 documented, and so on

json is also a documented standard that in general is better known, and I
believe has more tools supporting it.



 then imho it would be better to generate json from doap

 I disabled the json edit feature recently since it will cause problems

which problems?

with a defined json it is simple to generate the doap file.

I highly recommend staying at json and using that as base for all our
central data.

rgds
jan i




 regards

 Hervé
 - Original message -
 From: Shane Curcuru a...@shanecurcuru.org
 To: dev@community.apache.org
 Sent: Sat, 18 Apr 2015 06:43:37 +0200 (CEST)
 Subject: Re: Project Visualization Tool...

 We had a great session, and a lot of energy, hopefully we can make some
 progress. One note: this needs to be a comdev PMC project, and we need
 to really plan the data part out if we want to be successful.

 Note that projects-new.a.o is the planned future replacement for
 projects.a.o - there are *significant* differences, so you need to look
 at the About page and the source repo. In particular, the new site uses
 its own new JSON generated sources which (I think) will no longer use
 the DOAPs.

 In particular, Infra currently does *not* consider either the data
 gathering (i.e. populating the JSON behind the projects-new site) nor
 the visualizations (current or ones we want to build) as core supported
 services. So whatever we build needs to be maintained by this PMC to
 start with.

 Also, Link dump of useful related bits: 

 Old service, based on crappy cron jobs and DOAP files from projects:
 https://projects.apache.org/

 New service, soon to be infra supported, relying on JSON data generated
 by infra on a regular schedule:
 https://projects-new.apache.org/

 Useful PMC chair report helper, that surfaces a number of different
 statistics about your PMC(s), including mailing list stats,
 PMC/committer changes, some software releases, etc. etc. (Members have
 visibility to all PMCs):
 https://reporter.apache.org

 Rob Weir (AOO, Member) used to do some visualization stuff and might
 have code ideas:
 http://www.robweir.com/blog/2013/05/mapping-apache.html

 Ken Coar's old mailing list stats page:

 https://people.apache.org/~coar/mlists.html

 The AOO project wrote a mailing list visualizer for who talks to whom:
 https://blogs.apache.org/OOo/entry/visualizing_the_aoo_dev_list

 Some outside statistics FLOSSmole generated about Apache communities and
 lists:
 http://flossmole.org/category/tags/apache

 Random other interesting analytics:
 The Subversion project has the contribulyzer



 - Shane






Re: Re : Re: Project Visualization Tool...

2015-04-18 Thread jan i
On Saturday, April 18, 2015, herve.bout...@free.fr wrote:

 It was said that the new site would use native json, instead of doap
 But I'm not convinced at all, since Doap is an invaluable source of info,
 documented, and so on

json is also a documented standard that in general is better known, and I
believe has more tools supporting it.



 then imho it would be better to generate json from doap

 I disabled the json edit feature recently since it will cause problems

which problems?

with a defined json it is simple to generate the doap file.

I highly recommend staying at json and using that as base for all our
central data.
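
As an illustration of generating DOAP from a defined json, here is a minimal sketch using only the Python standard library. The JSON key names (name, homepage, languages) and the example project are assumptions made up for this sketch, not the format the site actually uses; the DOAP and RDF namespace URIs and element names are the standard ones.

```python
import json
import xml.etree.ElementTree as ET

DOAP_NS = "http://usefulinc.com/ns/doap#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

ET.register_namespace("doap", DOAP_NS)
ET.register_namespace("rdf", RDF_NS)


def json_to_doap(project_json):
    """Render one project record (hypothetical JSON layout) as DOAP RDF/XML."""
    data = json.loads(project_json)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    project = ET.SubElement(root, f"{{{DOAP_NS}}}Project")
    ET.SubElement(project, f"{{{DOAP_NS}}}name").text = data["name"]
    # DOAP expresses links as rdf:resource attributes rather than text.
    ET.SubElement(project, f"{{{DOAP_NS}}}homepage",
                  {f"{{{RDF_NS}}}resource": data["homepage"]})
    for language in data.get("languages", []):
        ET.SubElement(project, f"{{{DOAP_NS}}}programming-language").text = language
    return ET.tostring(root, encoding="unicode")


# Illustrative input; the key names are assumptions, not the site's real format.
EXAMPLE = ('{"name": "Example Project", "homepage": "https://example.org/", '
           '"languages": ["Java", "Groovy"]}')

doap_xml = json_to_doap(EXAMPLE)
print(doap_xml)
```

Going this direction is mechanical once the json fields are fixed; the open question in this thread is which side is the documented source of truth.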

rgds
jan i




 regards

 Hervé
 - Original message -
 From: Shane Curcuru a...@shanecurcuru.org
 To: dev@community.apache.org
 Sent: Sat, 18 Apr 2015 06:43:37 +0200 (CEST)
 Subject: Re: Project Visualization Tool...

 We had a great session, and a lot of energy, hopefully we can make some
 progress. One note: this needs to be a comdev PMC project, and we need
 to really plan the data part out if we want to be successful.

 Note that projects-new.a.o is the planned future replacement for
 projects.a.o - there are *significant* differences, so you need to look
 at the About page and the source repo. In particular, the new site uses
 its own new JSON generated sources which (I think) will no longer use
 the DOAPs.

 In particular, Infra currently does *not* consider either the data
 gathering (i.e. populating the JSON behind the projects-new site) nor
 the visualizations (current or ones we want to build) as core supported
 services. So whatever we build needs to be maintained by this PMC to
 start with.

 Also, Link dump of useful related bits: 

 Old service, based on crappy cron jobs and DOAP files from projects:
 https://projects.apache.org/

 New service, soon to be infra supported, relying on JSON data generated
 by infra on a regular schedule:
 https://projects-new.apache.org/

 Useful PMC chair report helper, that surfaces a number of different
 statistics about your PMC(s), including mailing list stats,
 PMC/committer changes, some software releases, etc. etc. (Members have
 visibility to all PMCs):
 https://reporter.apache.org

 Rob Weir (AOO, Member) used to do some visualization stuff and might
 have code ideas:
 http://www.robweir.com/blog/2013/05/mapping-apache.html

 Ken Coar's old mailing list stats page:

 https://people.apache.org/~coar/mlists.html

 The AOO project wrote a mailing list visualizer for who talks to whom:
 https://blogs.apache.org/OOo/entry/visualizing_the_aoo_dev_list

 Some outside statistics FLOSSmole generated about Apache communities and
 lists:
 http://flossmole.org/category/tags/apache

 Random other interesting analytics:
 The Subversion project has the contribulyzer



 - Shane



-- 
Sent from My iPad, sorry for any misspellings.


Re: Project Visualization Tool...

2015-04-17 Thread David Crossley
On Thu, Apr 16, 2015 at 04:12:17PM -0500, James Carman wrote:
 At ApacheCon, we discussed creating a project visualization tool to
 help folks navigate the ever-growing number of projects we have here
 at the ASF.  The idea would be to allow folks to see some form of tag
 cloud or something (with the tags being the projects themselves), but
 the cloud is interactive, allowing filtering by various dimensions
 (size of project, age, relationships to other projects, programming
 language, etc.).
 
 We already have a new projects page in the works:
 
 https://projects-new.apache.org/
 
 which displays quite a bit of information.  Where do we get that
 information?

Herve added an About page recently:
https://projects-new.apache.org/about.html

Each project manages their own DOAP file.

Those files are listed at the old projects.a.o

Watching these mailing lists recently, I gather that someone
needs to run a script manually to re-populate projects-new.a.o

-David

  Do folks have any other ideas about different ways of
 browsing/exploring the projects?  One idea we have is to lean on
 TinkerPop (currently incubating) to load the data into a graph
 structure to allow the data to be easily manipulated (the gremlin
 language allows you to traverse the graph in this way very easily).
 
 Thoughts?
 
 James Carman
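
To show what the graph idea amounts to, here is a small plain-Python sketch of the kind of two-hop traversal (project, to shared attribute, to other projects) that a graph database would make easy. It does not use TinkerPop or Gremlin; the ProjectGraph class, the edge kinds, and the project names alpha/beta/gamma are all hypothetical placeholders.

```python
from collections import defaultdict


class ProjectGraph:
    """Tiny undirected graph: project nodes linked to attribute nodes."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, project, kind, value):
        """Connect a project to an attribute node such as ("language", "Java")."""
        attribute = (kind, value)
        self.edges[("project", project)].add(attribute)
        self.edges[attribute].add(("project", project))

    def neighbours_via(self, project, kind):
        """Two-hop traversal: project -> attributes of `kind` -> other projects."""
        start = ("project", project)
        found = set()
        for attribute in self.edges[start]:
            if attribute[0] != kind:
                continue
            for node in self.edges[attribute]:
                if node != start:
                    found.add(node[1])
        return sorted(found)


# Placeholder data, only to exercise the traversal.
graph = ProjectGraph()
graph.link("alpha", "language", "Java")
graph.link("alpha", "category", "big-data")
graph.link("beta", "language", "Java")
graph.link("gamma", "language", "C")
graph.link("gamma", "category", "big-data")

print(graph.neighbours_via("alpha", "language"))
print(graph.neighbours_via("alpha", "category"))
```

Filtering by another dimension (age, size, shared committers) would just be another edge kind in this model.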


Re : Re: Project Visualization Tool...

2015-04-17 Thread herve . boutemy
If you look at the sources, part of it runs from a crontab,
but not everything: I'm trying to improve the automated extractions, fixing 
source data where needed, to be able to do the full extracts through cron

the code is open to every committer: don't hesitate to modify it :)

Regards

Hervé 

- Original message -
From: David Crossley cross...@apache.org
To: dev@community.apache.org
Sent: Fri, 17 Apr 2015 08:46:10 +0200 (CEST)
Subject: Re: Project Visualization Tool...

On Thu, Apr 16, 2015 at 04:12:17PM -0500, James Carman wrote:
 At ApacheCon, we discussed creating a project visualization tool to
 help folks navigate the ever-growing number of projects we have here
 at the ASF. The idea would be to allow folks to see some form of tag
 cloud or something (with the tags being the projects themselves), but
 the cloud is interactive, allowing filtering by various dimensions
 (size of project, age, relationships to other projects, programming
 language, etc.).
 
 We already have a new projects page in the works:
 
 https://projects-new.apache.org/
 
 which displays quite a bit of information. Where do we get that
 information?

Herve added an About page recently:
https://projects-new.apache.org/about.html

Each project manages their own DOAP file.

Those files are listed at the old projects.a.o

Watching these mailing lists recently, I gather that someone
needs to run a script manually to re-populate projects-new.a.o

-David

 Do folks have any other ideas about different ways of
 browsing/exploring the projects? One idea we have is to lean on
 TinkerPop (currently incubating) to load the data into a graph
 structure to allow the data to be easily manipulated (the gremlin
 language allows you to traverse the graph in this way very easily).
 
 Thoughts?
 
 James Carman



Re: Project Visualization Tool...

2015-04-16 Thread Konstantin Boudnik
Wow, great stuff! I was wondering how you get
the projects-per-language stats. E.g. as a Groovy aficionado I looked at

https://projects-new.apache.org/projects.html?language#Groovy

and don't see Apache Bigtop which uses Groovy and Gradle heavily.

Thanks!
  Cos


On Thu, Apr 16, 2015 at 04:12PM, James Carman wrote:
 At ApacheCon, we discussed creating a project visualization tool to
 help folks navigate the ever-growing number of projects we have here
 at the ASF.  The idea would be to allow folks to see some form of tag
 cloud or something (with the tags being the projects themselves), but
 the cloud is interactive, allowing filtering by various dimensions
 (size of project, age, relationships to other projects, programming
 language, etc.).
 
 We already have a new projects page in the works:
 
 https://projects-new.apache.org/
 
 which displays quite a bit of information.  Where do we get that
 information?  Do folks have any other ideas about different ways of
 browsing/exploring the projects?  One idea we have is to lean on
 TinkerPop (currently incubating) to load the data into a graph
 structure to allow the data to be easily manipulated (the gremlin
 language allows you to traverse the graph in this way very easily).
 
 Thoughts?
 
 James Carman


Re: Project Visualization Tool...

2015-04-16 Thread Roman Shaposhnik
This is all taken from our DOAP file
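
For anyone curious how a per-language listing can fall out of the DOAP files, here is a rough sketch using the Python standard library. It is not the site's actual code: the function names and the two sample documents are made up, and only the DOAP namespace and the programming-language element are taken from the DOAP vocabulary. Under this approach a project is listed under a language only if its DOAP file declares that language.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOAP_URI = "http://usefulinc.com/ns/doap#"
NS = {"doap": DOAP_URI}


def languages_in(doap_xml):
    """Return (project name, [languages]) declared in one DOAP document."""
    root = ET.fromstring(doap_xml)
    # iter() also matches the root itself, in case Project is the top element.
    project = next(root.iter(f"{{{DOAP_URI}}}Project"), None)
    if project is None:
        return "", []
    name = project.findtext("doap:name", default="", namespaces=NS)
    languages = [el.text.strip()
                 for el in project.findall("doap:programming-language", NS)
                 if el.text]
    return name, languages


def index_by_language(doap_documents):
    """Map each declared language to the projects that declare it."""
    index = defaultdict(list)
    for document in doap_documents:
        name, languages = languages_in(document)
        for language in languages:
            index[language].append(name)
    return dict(index)


# Made-up DOAP documents, only to exercise the functions.
TEMPLATE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#">
  <Project><name>{name}</name>{langs}</Project>
</rdf:RDF>"""


def sample(name, langs):
    tags = "".join(f"<programming-language>{l}</programming-language>" for l in langs)
    return TEMPLATE.format(name=name, langs=tags)


by_language = index_by_language([
    sample("Project One", ["Java", "Groovy"]),
    sample("Project Two", ["Java"]),
])
print(by_language)
```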

On Thu, Apr 16, 2015 at 5:45 PM, Konstantin Boudnik c...@apache.org wrote:
 Wow, great stuff! I was wondering how you get
 the projects-per-language stats. E.g. as a Groovy aficionado I looked at

 https://projects-new.apache.org/projects.html?language#Groovy

 and don't see Apache Bigtop which uses Groovy and Gradle heavily.

 Thanks!
   Cos


 On Thu, Apr 16, 2015 at 04:12PM, James Carman wrote:
 At ApacheCon, we discussed creating a project visualization tool to
 help folks navigate the ever-growing number of projects we have here
 at the ASF.  The idea would be to allow folks to see some form of tag
 cloud or something (with the tags being the projects themselves), but
 the cloud is interactive, allowing filtering by various dimensions
 (size of project, age, relationships to other projects, programming
 language, etc.).

 We already have a new projects page in the works:

 https://projects-new.apache.org/

 which displays quite a bit of information.  Where do we get that
 information?  Do folks have any other ideas about different ways of
 browsing/exploring the projects?  One idea we have is to lean on
 TinkerPop (currently incubating) to load the data into a graph
 structure to allow the data to be easily manipulated (the gremlin
 language allows you to traverse the graph in this way very easily).

 Thoughts?

 James Carman