On Thu, 2 Oct 2025 at 16:26, Rich Bowen <[email protected]> wrote:
>
>
>
> > On Oct 2, 2025, at 10:45 AM, sebb <[email protected]> wrote:
> >
> > On Thu, 2 Oct 2025 at 15:30, Rich Bowen <[email protected]> wrote:
> >>
> >> Hi, folks,
> >>
> >> For the last couple of months I’ve been producing these - 
> >> https://boxofclue.com/apache-highlights/
> >>
> >> The process is that I have a checkout of every repo under 
> >> https://github.com/apache (just metadata, not actual files. 2.9 GB total) 
> >> and I grind through them to generate some metrics like:
> >>
> >> First-time commit (i.e., had a commit in a merged PR for the first time)
> >> 10th/100th/1000th/etc commit
> >>
> >> There are some false positives, which I think come from, for example, X 
> >> makes a first-time commit to iceberg-fortran but they’ve contributed to 
> >> iceberg-rust before. But for the most part, it gives a really great weekly 
> >> snapshot of who the new people in your project are. I’ve gotten a couple 
> >> positive comments from a handful of projects that are using this data to 
> >> welcome new contributors, which was the intent of the thing. (I post it to 
> >> Mastodon every week.)
> >>
> >> I’d like to run this on our VM, rather than running it on my laptop every 
> >> Monday morning. I’d also like to link to the reports from a couple of 
> >> places on our website (and don’t really want to link to boxofclue.com 
> >> from there!) But I wanted 
> >> to run it by you folks first, before taking the liberty to do that. Does 
> >> anybody have any objections to me doing this?
> >
> > Which VM did you have in mind?
>
> projects.apache.org seems to make the most 
> sense, since this is project metrics.
>
> Also, I suppose a related question is, do you think anyone would have any 
> objection to their name being listed on such a document on an Apache website? 
> I cannot personally think why they would (and this is all already-public 
> data) but I suppose it is possible that someone might, and I want to be 
> sensitive to that.

AIUI, just because a particular item of PII is published in one
location does not mean it can be published elsewhere.

Does the data have to be fully public?
Indeed, would it mean anything to the general public?

It could be restricted to committers, or perhaps to members (ASF or (P)PMC).

> —
> Rich Bowen
> [email protected]
>
>
>
>

