Re: Standards for mail archive statistics gathering?

Shane Curcuru Tue, 05 May 2015 18:28:44 -0700

On 5/5/15 7:33 AM, Boris Baldassari wrote:
> Hi Folks,
> 
> Sorry for the late answer on this thread. Don't know what has been done
> since then, but I've some experience to share on this, so here are my 2c..


No, more input is always appreciated!  Hervé is doing some
centralization of the projects-new.a.o data capture, which is related
but slightly separate.  But this is going to be a long-term project with
plenty of different people helping, I bet.

...
> * Parsing mboxes for software repository data mining:
> There is a suite of tools exactly targeted at this kind of duty on
> github: Metrics Grimoire [1], developed (and used) by Bitergia [2]. I
> don't know how they manage time zones, but the toolsuite is widely used
> around (see [3] or [4] as examples) so I believe they are quite robust.
> It includes tools for data retrieval as well as visualisation.

Drat.  Metrics Grimoire looks pretty nifty - essentially a set of
frameworks for extracting metadata from a bunch of sources - but it's
GPL, so personally I have no interest in working on it.  If someone else
uses it to generate datasets that's great.

> 
> * As for the feedback/thoughts about the architecture and formats:
> I love the REST-API idea proposed by Rob. That's really easy to access
> and retrieve through scripts on-demand. CSV and JSON are my favourite
> formats, because they are, again, easy to parse and widely used -- every
> language and library has some facility to read them natively.

Yup - again, like project visualization, to make any of this simple for
newcomers to try stuff, we need to separate data gathering / model /
visualization.  Since most of these are spare time projects, having easy
chunks makes it simpler for different people to try their hand at it.
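
To make the gathering / model / output split concrete, here is a minimal
sketch using only the Python standard library.  It is not how Metrics
Grimoire or projects-new.a.o work; the function names (gather, model,
to_json, to_csv) and the per-month message count are hypothetical choices
for illustration.  Dates are normalized to UTC, and a Date header with no
usable offset is assumed to already be in UTC.

```python
import csv
import io
import json
import mailbox
from collections import Counter
from datetime import timezone
from email.utils import parsedate_to_datetime


def gather(mbox_path):
    """Data gathering: yield one UTC datetime per message with a parseable Date."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            raw = msg.get("Date")
            if raw is None:
                continue  # no Date header at all
            try:
                when = parsedate_to_datetime(str(raw))
            except (TypeError, ValueError):
                continue  # malformed Date header, skip it
            if when.tzinfo is None:
                # Assumption: a date without a usable offset is treated as UTC.
                when = when.replace(tzinfo=timezone.utc)
            yield when.astimezone(timezone.utc)
    finally:
        box.close()


def model(dates):
    """Model: count messages per UTC month, sorted by month."""
    counts = Counter(d.strftime("%Y-%m") for d in dates)
    return sorted(counts.items())


def to_json(rows):
    """Output: JSON list of {"month": ..., "messages": ...} objects."""
    return json.dumps([{"month": m, "messages": n} for m, n in rows])


def to_csv(rows):
    """Output: CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["month", "messages"])
    writer.writerows(rows)
    return buf.getvalue()
```

Something like to_csv(model(gather("dev.mbox"))) (with "dev.mbox" standing
in for a real archive file) would then give a table that any plotting tool
can read, and each of the three steps can be swapped out independently.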

Thanks,

- Shane
