Re: Community renewal and project obsolescence
On 12/28/23 10:34, Rafael Laboissière wrote:
> * M. Zhou [2023-12-27 19:00]:
>
> Thanks for the code and the figure. Indeed, the trend is confirmed by fitting a linear model count ~ year to the new members list. The coefficient is -1.39 member/year, which is significantly different from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data from year 2001, which could be interpreted as an outlier, the trend is still significant, with a drop of 0.98 member/year (F[1,21] = 8.48, p < 0.01).

I thought about using some models from population statistics, so that we could get data on the DD birth rate and the DD retirement/leave rate, as well as a prediction. But since the descendants of DDs are not naturally new DDs, the typical population models are unlikely to work well. The birth of a DD is more like a mutation, sort of.

Anyway, we do not need sophisticated math models to draw the conclusion that Debian is an aging community. And yet, we don't seem to have a good way to reshape the curve using Debian's funds. This is one of the key problems behind the data.

> P.S.1: The correct way to do the analysis above is by using a generalized linear model, with the count data from a Poisson distribution (or, perhaps, by considering overdispersed data). I will eventually add this to my code in Git.

Why not integrate them into nm.debian.org when they are ready?

> P.S.2: In your Python code, it is possible to get the data frame directly from the web page, without copying. Just replace the line:
>
>     df = pd.read_csv('members.csv', sep='\t')
>
> by:
>
>     df = pd.read_html("https://nm.debian.org/members/")[0]
>
> I am wondering whether ChatGPT could have figured this out…

I only specified the CSV input format, based on what I had copied. It produces well-formatted code with detailed documentation most of the time. I deleted a lot of its output to keep the snippet short.

I have to clarify one thing to avoid giving you a wrong impression about large language models.
In fact, the performance of an LLM (such as ChatGPT) varies greatly with the prompt and the context provided to it. Exploring this in-context learning capability is still a cutting-edge research topic. For current LLMs, answers involving boilerplate code such as plotting (matplotlib) and simple statistics (pandas) are close to perfect.
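For readers who want to reproduce the kind of linear fit quoted in this message (count ~ year), here is a minimal sketch in plain Python. It is not the code Rafael used, which is not shown in the thread. The function name `linear_trend` and the demo numbers are hypothetical and are not the nm.debian.org data. In a simple regression the residual degrees of freedom are n - 2, so the reported F[1,22] is consistent with 24 yearly counts.

```python
# Sketch: ordinary least-squares fit of count ~ year, plus the overall F statistic.
# The helper name and the demo numbers are illustrative assumptions, not thread data.

def linear_trend(years, counts):
    """Return (slope, intercept, f_stat, df_resid) for the model counts ~ years."""
    n = len(years)
    if n < 3 or n != len(counts):
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(years) / n
    y_mean = sum(counts) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares (unexplained) and regression sum of squares (explained)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, counts))
    ssr = slope ** 2 * sxx
    df_resid = n - 2
    # F statistic on (1, n - 2) degrees of freedom; infinite for a perfect fit
    f_stat = float("inf") if sse == 0 else ssr / (sse / df_resid)
    return slope, intercept, f_stat, df_resid


if __name__ == "__main__":
    # Made-up counts, only to show the call
    demo_years = [2000, 2001, 2002, 2003, 2004]
    demo_counts = [10, 8, 7, 5, 3]
    slope, intercept, f_stat, df_resid = linear_trend(demo_years, demo_counts)
    print(f"slope = {slope:.2f} per year, F[1,{df_resid}] = {f_stat:.1f}")
```

With the `account_counts` series from the plotting script, the call would be `linear_trend(list(account_counts.index), list(account_counts.values))`. The p-value is then the upper tail of the F distribution with (1, n - 2) degrees of freedom, for example via `scipy.stats.f.sf(f_stat, 1, df_resid)` if SciPy is available.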
Re: Community renewal and project obsolescence
* M. Zhou [2023-12-27 19:00]:

> Thanks for sharing the figure. The data seems correlated with the number of new Debian accounts. See the figure below:
>
> Python Code for this figure:
>
> ```
> # modified from ChatGPT.
> # XXX: members.csv is copy-pasted from https://nm.debian.org/members/
> import pandas as pd
> import matplotlib.pyplot as plt
>
> df = pd.read_csv('members.csv', sep='\t')
> df = df[df['Since'] != '(unknown)']  # filter out invalid data
> df['Since'] = pd.to_datetime(df['Since'])
> df['Year'] = df['Since'].dt.year
> account_counts = df['Year'].value_counts().sort_index()
> smoothed_counts = account_counts.rolling(window=3).mean()
>
> plt.figure(figsize=(10, 6))
> plt.bar(account_counts.index, account_counts.values, color='skyblue')
> plt.plot(smoothed_counts.index, smoothed_counts.values, color='orange', label='Smoothed (Window=3)')
> plt.xlabel('Year')
> plt.ylabel('Number of Accounts Created')
> plt.title('Number of Accounts Created Each Year')
> plt.legend()
> plt.savefig('nm-year.png')
> ```

Thanks for the code and the figure. Indeed, the trend is confirmed by fitting a linear model count ~ year to the new members list. The coefficient is -1.39 member/year, which is significantly different from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data from year 2001, which could be interpreted as an outlier, the trend is still significant, with a drop of 0.98 member/year (F[1,21] = 8.48, p < 0.01).

Best,

Rafael Laboissière

P.S.1: The correct way to do the analysis above is by using a generalized linear model, with the count data from a Poisson distribution (or, perhaps, by considering overdispersed data). I will eventually add this to my code in Git.

P.S.2: In your Python code, it is possible to get the data frame directly from the web page, without copying. Just replace the line:

    df = pd.read_csv('members.csv', sep='\t')

by:

    df = pd.read_html("https://nm.debian.org/members/")[0]

I am wondering whether ChatGPT could have figured this out…
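The Poisson generalized linear model mentioned in P.S.1 could look roughly like the sketch below. It is only an illustration in plain Python, not the code from Rafael's Git repository, which does not appear in this thread. It assumes a log link, log(E[count]) = a + b * (year - mean year), fitted by Newton's method. The name `poisson_trend` and its parameters are hypothetical. The returned Pearson dispersion gives a rough check for the overdispersion he mentions: values well above 1 suggest that a plain Poisson model is too optimistic about the uncertainty.

```python
# Sketch: Poisson regression of yearly counts on year with a log link,
# fitted by Newton's method. All names here are illustrative assumptions.
import math


def poisson_trend(years, counts, iterations=50, tol=1e-12):
    """Fit log(E[count]) = a + b * (year - mean_year).

    Returns (a, b, dispersion), where dispersion is the Pearson
    chi-square statistic divided by the residual degrees of freedom.
    """
    n = len(years)
    if n < 3 or n != len(counts):
        raise ValueError("need at least 3 paired observations")
    if sum(counts) <= 0:
        raise ValueError("counts must have a positive total")
    x_mean = sum(years) / n
    xs = [x - x_mean for x in years]  # centring keeps the iteration well conditioned
    a = math.log(sum(counts) / n)     # start from the constant-rate model
    b = 0.0
    for _ in range(iterations):
        mus = [math.exp(a + b * x) for x in xs]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(y - m for y, m in zip(counts, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, counts, mus))
        # Fisher information matrix (2 x 2, symmetric)
        h00 = sum(mus)
        h01 = sum(m * x for x, m in zip(xs, mus))
        h11 = sum(m * x * x for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    mus = [math.exp(a + b * x) for x in xs]
    pearson = sum((y - m) ** 2 / m for y, m in zip(counts, mus))
    return a, b, pearson / (n - 2)
```

Here `b` is the change in the log of the expected count per year, so `math.exp(b) - 1` is the relative change per year. In practice a statistics package would also provide standard errors and tests; this sketch only shows the shape of the model.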
Re: Shutdown of servers at AQL (mips*el porterbox and buildds)
Dear all,

On 2023-11-23 07:11, Aurelien Jarno wrote:
> Dear all,
>
> Our hosting agreement with AQL has ended. As a result we need to unrack
> the servers that were hosted there. We are working on relocating them or
> setting up new servers elsewhere.
>
> The list of affected services are:
> - eller.d.o (mips*el porterbox)

eberlin.d.o has been set up as a mips*el porterbox to replace eller.d.o.

Regards
Aurelien

-- 
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://aurel32.net
Re: reinstallation of the riscv64 porterbox
Dear all,

On 2023-12-27 22:27, Aurelien Jarno wrote:
> Dear all,
>
> As part of making riscv64 as an official architecture, the riscv64 porterbox
> will be reinstalled. For this reason, it will be unavailable for a couple of
> days. It will then come back as ricci.debian.org.

The reinstallation is now done, the porterbox is now accessible as ricci.debian.org.

Regards
Aurelien

-- 
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://aurel32.net