Re: Standards for mail archive statistics gathering?

Hervé BOUTEMY Tue, 05 May 2015 23:06:58 -0700

On Tuesday 5 May 2015 21:26:36 Shane Curcuru wrote:
> On 5/5/15 7:33 AM, Boris Baldassari wrote:
> > Hi Folks,
> > 
> > Sorry for the late answer on this thread. Don't know what has been done
> > since then, but I've some experience to share on this, so here are my 2c..
> 
> No, more input is always appreciated!  Hervé is doing some
> centralization of the projects-new.a.o data capture, which is related
> but slightly separate.
+1
This gives us a common place to put code once experiments show that a new 
data source is worth adding.


> But this is going to be a long-term project
+1

> with
> plenty of different people helping I bet.
I hope so...

> 
> ...
> 
> > * Parsing mboxes for software repository data mining:
> > There is a suite of tools exactly targeted at this kind of duty on
> > github: Metrics Grimoire [1], developed (and used) by Bitergia [2]. I
> > don't know how they manage time zones, but the toolsuite is widely used
> > around (see [3] or [4] as examples) so I believe they are quite robust.
> > It includes tools for data retrieval as well as visualisation.
> 
> Drat.  Metrics Grimoire looks pretty nifty - essentially a set of
> frameworks for extracting metadata from a bunch of sources - but it's
> GPL, so personally I have no interest in working on it.  If someone else
> uses it to generate datasets that's great.
> 
> > * As for the feedback/thoughts about the architecture and formats:
> > I love the REST-API idea proposed by Rob. That's really easy to access
> > and retrieve through scripts on-demand. CSV and JSON are my favourite
> > formats, because they are, again, easy to parse and widely used -- every
> > language and library has some facility to read them natively.
> 
> Yup - again, like project visualization, to make any of this simple for
> newcomers to try stuff, we need to separate data gathering / model /
> visualization.  Since most of these are spare time projects, having easy
> chunks makes it simpler for different people to try their hand at it.
For visualization, JSON is certainly the natural format today when the data is 
consumed from the browser.
I don't have much experience with this, and what I'm missing with JSON 
is a common practice for documenting a structure: is there one?
For a simple JSON structure, documentation is not really necessary, but 
once the structure gets complex, documentation becomes a key requirement for 
people to use or extend it. I already see this shortcoming with the 11 JSON 
files from projects-new.a.o: https://projects-new.apache.org/json/foundation/
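
One widely used convention for documenting a JSON structure is JSON Schema, 
where each field carries a "type" and a "description". What follows is only a 
minimal sketch: the field names (name, pmc, mailing_lists) are hypothetical 
and not taken from the real projects-new.a.o files, and the checker is a small 
hand-written subset (type, required, properties, items), not a full validator.

```python
import json

# Hypothetical schema: field names are illustrative only, NOT the real
# projects-new.a.o layout. The keywords follow JSON Schema conventions.
PROJECT_SCHEMA = {
    "type": "object",
    "description": "One project record.",
    "required": ["name"],
    "properties": {
        "name": {"type": "string",
                 "description": "Display name of the project."},
        "pmc": {"type": "string",
                "description": "Identifier of the managing committee."},
        "mailing_lists": {
            "type": "array",
            "description": "Addresses of the project's mailing lists.",
            "items": {"type": "string"},
        },
    },
}

# Only these three JSON Schema types are checked; any other type is ignored.
_TYPES = {"object": dict, "array": list, "string": str}


def check(value, schema, path="$"):
    """Return a list of problems found in value (subset of JSON Schema)."""
    expected = schema.get("type")
    py_type = _TYPES.get(expected)
    if py_type is not None and not isinstance(value, py_type):
        return [f"{path}: expected {expected}"]
    problems = []
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                problems.extend(check(value[key], sub, f"{path}.{key}"))
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            problems.extend(check(item, schema["items"], f"{path}[{i}]"))
    return problems


def describe(schema, indent=0):
    """Render the schema's descriptions as plain-text documentation lines."""
    lines = []
    for key, sub in schema.get("properties", {}).items():
        kind = sub.get("type", "any")
        text = sub.get("description", "")
        lines.append(f"{' ' * indent}{key} ({kind}): {text}")
        lines.extend(describe(sub, indent + 2))
    return lines


if __name__ == "__main__":
    record = json.loads('{"name": "Example", "mailing_lists": ["dev@example.org"]}')
    print(check(record, PROJECT_SCHEMA))
    print("\n".join(describe(PROJECT_SCHEMA)))
```

The same schema then serves both as human-readable documentation (through 
describe) and as a machine check for people who extend the files.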

Regards,

Hervé

> 
> Thanks,
> 
> - Shane
