Re: Community renewal and project obsolescence

2023-12-28 Thread Mo Zhou

On 12/28/23 10:34, Rafael Laboissière wrote:


* M. Zhou  [2023-12-27 19:00]:

Thanks for the code and the figure. Indeed, the trend is confirmed by 
fitting a linear model count ~ year to the new members list. The 
coefficient is -1.39 member/year, which is significantly different 
from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data 
from year 2001, which could be interpreted as an outlier, the trend is 
still significant, with a drop of 0.98 member/year (F[1,21] = 8.48, p 
< 0.01).


I thought about using population models to estimate the DD birth rate 
and the DD retire/leave rate, and to make a prediction. But since the 
descendants of DDs do not naturally become new DDs, typical population 
models are unlikely to work well. The birth of a DD is more like a 
mutation, sort of.


Anyway, we do not need sophisticated math models to conclude that 
Debian is an aging community. And yet we don't seem to have a good way 
to reshape the curve using Debian's funds. This is one of the key 
problems behind the data.


P.S.1: The correct way to do the analysis above is by using a 
generalized linear model, with the count data from a Poisson 
distribution (or, perhaps, by considering overdispersed data). I will 
eventually add this to my code in Git.


Why not integrate them into nm.debian.org when they are ready?

P.S.2: In your Python code, it is possible to get the data frame 
directly from the web page, without copying and pasting. Just replace 
the line:


    df = pd.read_csv('members.csv', sep='\t')

by:

    df = pd.read_html("https://nm.debian.org/members/")[0]

I am wondering whether ChatGPT could have figured this out…


I just specified the CSV input format based on what I had copied. It 
produces well-formatted code with detailed documentation most of the 
time. I deleted much of its output to keep the snippet short.


I have to clarify one thing to avoid giving you the wrong impression 
about large language models. In fact, the performance of an LLM (such 
as ChatGPT) varies greatly with the prompt and the context people 
provide to it. Exploring this in-context learning capability is still 
a cutting-edge research topic. For current LLMs, answers on boilerplate 
code like plotting (matplotlib) and simple statistics (pandas) are 
remarkably good.




Re: Community renewal and project obsolescence

2023-12-28 Thread Rafael Laboissière

* M. Zhou  [2023-12-27 19:00]:

Thanks for sharing the figure. The data seems correlated with the 
number of new Debian accounts. See the figure below:

Python code for this figure:


```
# modified from ChatGPT.
# XXX: members.csv is copy-pasted from https://nm.debian.org/members/
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('members.csv', sep='\t')
df = df[df['Since'] != '(unknown)']  # filter out invalid data
df['Since'] = pd.to_datetime(df['Since'])
df['Year'] = df['Since'].dt.year
# count accounts created per year, then smooth with a 3-year rolling mean
account_counts = df['Year'].value_counts().sort_index()
smoothed_counts = account_counts.rolling(window=3).mean()
plt.figure(figsize=(10, 6))
plt.bar(account_counts.index, account_counts.values, color='skyblue')
plt.plot(smoothed_counts.index, smoothed_counts.values, color='orange',
         label='Smoothed (Window=3)')
plt.xlabel('Year')
plt.ylabel('Number of Accounts Created')
plt.title('Number of Accounts Created Each Year')
plt.legend()
plt.savefig('nm-year.png')
```


Thanks for the code and the figure. Indeed, the trend is confirmed by 
fitting a linear model count ~ year to the new members list. The 
coefficient is -1.39 member/year, which is significantly different from 
zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data from year 
2001, which could be interpreted as an outlier, the trend is still 
significant, with a drop of 0.98 member/year (F[1,21] = 8.48, p < 0.01).
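
For anyone who wants to try this kind of fit, here is a minimal sketch of an ordinary least-squares fit of count ~ year together with its F statistic, using only the Python standard library. The function name fit_linear_trend and the example counts are made up for illustration; they are not the nm.debian.org data, and this is not the code that produced the numbers quoted above.

```python
# Minimal sketch: ordinary least squares for count ~ year.
# The counts below are hypothetical illustration values,
# NOT the real nm.debian.org data.

def fit_linear_trend(years, counts):
    """Return (slope, intercept, F statistic, residual degrees of freedom)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # residual sum of squares around the fitted line
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(years, counts))
    # sum of squares explained by the regression (1 degree of freedom)
    ss_reg = slope * sxy
    df_res = n - 2
    f_stat = ss_reg / (ss_res / df_res)
    return slope, intercept, f_stat, df_res

years = list(range(2000, 2010))
counts = [60, 55, 58, 50, 47, 49, 41, 40, 36, 35]  # hypothetical counts
slope, intercept, f_stat, df_res = fit_linear_trend(years, counts)
print(f"slope = {slope:.2f} member/year, F[1,{df_res}] = {f_stat:.1f}")
```

The p-value would then come from the upper tail of the F distribution with (1, df_res) degrees of freedom, for instance via scipy.stats.f.sf if SciPy is available.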


Best,

Rafael Laboissière

P.S.1: The correct way to do the analysis above is by using a 
generalized linear model, with the count data from a Poisson distribution 
(or, perhaps, by considering overdispersed data). I will eventually add 
this to my code in Git.
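
As a rough illustration of what such a model could look like, here is a minimal sketch of a Poisson regression with a log link, log(mu) = b0 + b1 * (year - mean year), fitted by Newton-Raphson in plain Python. The function name fit_poisson_trend and the example counts are hypothetical; this is not the code kept in Git, and in practice a statistics package would be the more robust choice.

```python
import math

# Minimal sketch: Poisson GLM with log link, fitted by Newton-Raphson.
# The counts below are hypothetical, NOT the real nm.debian.org data.

def fit_poisson_trend(years, counts, iterations=50, tol=1e-10):
    """Return (b0, b1, dispersion) for log(mu) = b0 + b1 * (year - mean year)."""
    n = len(years)
    center = sum(years) / n
    xs = [x - center for x in years]      # centre years for numerical stability
    b0 = math.log(sum(counts) / n)        # start from the constant-rate model
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # score vector (gradient of the log-likelihood)
        u0 = sum(y - m for y, m in zip(counts, mus))
        u1 = sum((y - m) * x for y, m, x in zip(counts, mus, xs))
        # Fisher information matrix (2x2, symmetric)
        i00 = sum(mus)
        i01 = sum(m * x for m, x in zip(mus, xs))
        i11 = sum(m * x * x for m, x in zip(mus, xs))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system information * delta = score
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    mus = [math.exp(b0 + b1 * x) for x in xs]
    # Pearson chi-square over residual degrees of freedom
    pearson = sum((y - m) ** 2 / m for y, m in zip(counts, mus))
    dispersion = pearson / (n - 2)
    return b0, b1, dispersion

years = list(range(2000, 2010))
counts = [60, 55, 58, 50, 47, 49, 41, 40, 36, 35]  # hypothetical counts
b0, b1, dispersion = fit_poisson_trend(years, counts)
print(f"yearly change = {100 * (math.exp(b1) - 1):.1f}%, "
      f"dispersion = {dispersion:.2f}")
```

Here exp(b1) - 1 is the estimated relative change in new members per year. A dispersion estimate well above 1 would suggest overdispersed data, in which case a quasi-Poisson or negative binomial model may be more appropriate.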


P.S.2: In your Python code, it is possible to get the data frame directly 
from the web page, without copying and pasting. Just replace the line:


df = pd.read_csv('members.csv', sep='\t')

by:

df = pd.read_html("https://nm.debian.org/members/")[0]

I am wondering whether ChatGPT could have figured this out…



Re: Shutdown of servers at AQL (mips*el porterbox and buildds)

2023-12-28 Thread Aurelien Jarno
Dear all,

On 2023-11-23 07:11, Aurelien Jarno wrote:
> Dear all,
> 
> Our hosting agreement with AQL has ended. As a result we need to unrack
> the servers that were hosted there. We are working on relocating them or
> setting up new servers elsewhere.
> 
> The affected services are:
> - eller.d.o (mips*el porterbox)

eberlin.d.o has been set up as a mips*el porterbox to replace eller.d.o.

Regards
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://aurel32.net




Re: reinstallation of the riscv64 porterbox

2023-12-28 Thread Aurelien Jarno
Dear all,

On 2023-12-27 22:27, Aurelien Jarno wrote:
> Dear all,
> 
> As part of making riscv64 an official architecture, the riscv64 porterbox
> will be reinstalled. For this reason, it will be unavailable for a couple of
> days. It will then come back as ricci.debian.org.

The reinstallation is now done, and the porterbox is accessible as
ricci.debian.org.

Regards
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://aurel32.net

