Re: [gentoo-user] Replacement for gcruft: gcrud

2018-08-17 Thread Corentin “Nado” Pazdera
August 17, 2018 1:09 AM, "Andrew Udvare"  wrote:

> The whitelist is the biggest work in progress right now. Most of what it
> lists from /etc for me is /etc/config-archive which AFAIK is not managed
> by Portage at all although Portage will place old files there? I don't
> use the feature because my /etc is controlled by Git. The stuff listed in
> /var/ is pretty accurate as there's a lot of old website cruft and this
> computer does not serve anything like that anymore.

Well, for example, I use eselect-repository, which puts repos in
/var/db/repos. I put the gentoo tree there as well, and the whole tree is
suggested for deletion.
A solution would be to read the /etc/portage/repos.conf file(s) for
repository locations during runtime detection, or to use the portageq
interface.
Or tell people to manually whitelist their repository locations once the
config file is available ;)
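For the repos.conf route, a minimal sketch in C could scan each repos.conf file for `location` keys. It assumes the simple `key = value` layout Portage uses there, ignores multi-line values and interpolation, and leaves out iterating over an /etc/portage/repos.conf directory; collect_repo_locations is a hypothetical helper, not part of gcrud. The portageq route would avoid reimplementing the parser altogether.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define LOCATION_MAX 256

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s) {
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/*
 * Hypothetical helper: scan the text of one repos.conf file and copy every
 * "location = ..." value into out. Comments, [section] headers and other
 * keys are skipped; multi-line values and interpolation are not handled.
 * Returns the number of locations copied (at most max_out).
 */
static int collect_repo_locations(const char *conf_text,
                                  char out[][LOCATION_MAX], int max_out) {
    assert(conf_text != NULL && out != NULL);
    int found = 0;
    const char *p = conf_text;
    while (*p != '\0' && found < max_out) {
        char line[512];
        size_t len = strcspn(p, "\n");
        size_t copy = len < sizeof line ? len : sizeof line - 1;
        memcpy(line, p, copy); /* over-long lines are truncated */
        line[copy] = '\0';
        p += len;
        if (*p == '\n')
            p++;

        char *s = trim(line);
        if (*s == '\0' || *s == '#' || *s == ';' || *s == '[')
            continue;
        char *eq = strchr(s, '=');
        if (eq == NULL)
            continue;
        *eq = '\0';
        if (strcmp(trim(s), "location") == 0) {
            snprintf(out[found], LOCATION_MAX, "%s", trim(eq + 1));
            found++;
        }
    }
    return found;
}
```

Each returned location could then be whitelisted as a whole directory.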

You could whitelist directories containing a .keep file, although I'm not
sure how to specify that.
Same goes for git repositories: I’d rather delete a whole git repo or
nothing at all inside it, so a rule meaning "pick the parent dir of a .git
dir to suggest for deletion, and ignore all children of said parent" would
help.
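The .git rule could be sketched as a pure string transformation: any reported path with a .git component is collapsed to the directory holding that .git, so after `sort -u` a repository shows up as a single line. collapse_git_repo is a hypothetical helper, and it only covers files inside .git itself; files in the working tree would need a filesystem check on each ancestor directory, which this sketch leaves out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical rule: if a path has a ".git" component, write the directory
 * that contains it (the repository root) to out and return 1. Otherwise copy
 * the path unchanged and return 0. Names like ".gitignore" or ".github" are
 * not treated as ".git" components.
 */
static int collapse_git_repo(const char *path, char *out, size_t out_size) {
    assert(path != NULL && out != NULL && out_size > 0);
    const char *p = path;
    while ((p = strstr(p, "/.git")) != NULL) {
        char next = p[5]; /* character right after "/.git" */
        if (next == '\0' || next == '/') {
            size_t root_len = (size_t)(p - path);
            if (root_len == 0)
                root_len = 1; /* "/.git" itself lives in "/" */
            snprintf(out, out_size, "%.*s", (int)root_len, path);
            return 1;
        }
        p += 5; /* skip past this non-matching occurrence */
    }
    snprintf(out, out_size, "%s", path);
    return 0;
}
```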

> The idea is to move everything in the whitelist.c file to a declarative
> (no code unless you count RE) configuration file. I have not decided on a
> format but I am leaning towards INI-style because GLib2 has a parser for
> that built-in. The config file will specify exact paths, RE, and globs.
> There will be a default dynamic list generated at runtime based on what
> packages you have installed (as gcruft had this feature).

That will be nice, I’m looking forward to it ;) Something basic might be
enough for running batches of tests before settling on a definite format.

>> I also caught some wrongly listed files because of the multilib system
>> with the /lib symlink.
>> For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, so the realpath
>> /lib64/dhcpcd/dhcpcd-hooks was listed in the removal suggestion. This
>> should be fixed with profile 17.1.
> 
> The /lib vs /lib64 issue will be resolved in a later version. I think I
> need to use lstat() everywhere instead of stat(), or I can call realpath()
> prior to storing values in the set. This file should be whitelisted, but
> only if you have dhcpcd installed (I've long since moved to dhcpd).

I’m in favor of the realpath() suggestion; it will be useful for any path
accessed through a symlink.

>> The log is so huge at the moment it is useless for me :/
>> 
>> % wc -l out.log
>> 461575 out.log
> 
> Any thoughts on how to simplify analysis?

A few, but I’m not sure how many of them are /universal/ across Gentoo
systems.
Do you plan to integrate the sorting part into gcrud directly?
If so, I’d suggest showing /usr/* entries first, because unowned files
there should be exceptions.
Same goes for /lib, but things like kernel modules should be treated
carefully: we can either whitelist the whole /lib{,32,64}/modules, or try
to be smart and select old kernel modules only.
This might be tricky given the number of ways someone can manage them.
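One possible review order, sketched as a qsort() comparator: /usr first, then the rest of /lib{,32,64}, then everything else, with kernel modules pushed to the end so they can be reviewed separately. The ranks, the helper names and the choice to put modules last are assumptions for illustration, not something gcrud currently does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int has_prefix(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical review order (lower rank is shown first):
 *   0  /usr          unowned files there should be rare exceptions
 *   1  /lib{,32,64}  except kernel modules
 *   2  everything else (/etc, /opt, /var, ...)
 *   3  /lib{,32,64}/modules, which needs separate, careful review
 */
static int path_rank(const char *path) {
    assert(path != NULL);
    if (has_prefix(path, "/usr/"))
        return 0;
    if (has_prefix(path, "/lib/modules/") || has_prefix(path, "/lib32/modules/") ||
        has_prefix(path, "/lib64/modules/"))
        return 3;
    if (has_prefix(path, "/lib/") || has_prefix(path, "/lib32/") ||
        has_prefix(path, "/lib64/"))
        return 1;
    return 2;
}

/* qsort() comparator: order by rank, then alphabetically within a rank. */
static int compare_paths(const void *a, const void *b) {
    const char *pa = *(const char *const *)a;
    const char *pb = *(const char *const *)b;
    int ra = path_rank(pa), rb = path_rank(pb);
    if (ra != rb)
        return ra < rb ? -1 : 1;
    return strcmp(pa, pb);
}

static void sort_for_review(const char **paths, size_t n) {
    qsort(paths, n, sizeof *paths, compare_paths);
}
```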

Also, here is a small analysis of file locations reported by gcrud.

% cut -d/ -f2 out.log|uniq -c
295 etc
3309 lib64
1178 lib
13 opt
39586 usr
417194 var

Since /var contains my various repos, it's logical that it has the most
occurrences.
Next comes /usr, which has the same lib{,32,64} layout with /usr/lib
pointing to /usr/lib64, and which holds installed Go packages (in
/usr/lib64/go).
Given this, I suppose most of these entries will disappear once gcrud uses
realpath() or I switch to the 17.1 profile.
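Note that `uniq -c` only merges adjacent lines, so the pipeline above relies on out.log already being sorted (it is, via `sort -u`). If gcrud printed this summary itself, a tally that does not depend on input order could look like the following; tally_top_dirs and struct top_count are hypothetical names, and the linear search assumes only a handful of top-level directories.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>

/* Hypothetical per-directory counter. */
struct top_count {
    char name[64];
    long count;
};

/*
 * Tally absolute paths by their first component ("/usr/lib64/x" -> "usr"),
 * like `cut -d/ -f2 | sort | uniq -c` but independent of input order.
 * Returns the number of distinct names stored in out (at most max_out);
 * new names that arrive once out is full are not counted.
 */
static size_t tally_top_dirs(const char *const *paths, size_t n,
                             struct top_count *out, size_t max_out) {
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        const char *p = paths[i];
        assert(p[0] == '/');
        size_t len = strcspn(p + 1, "/");
        if (len >= sizeof out[0].name)
            len = sizeof out[0].name - 1;

        size_t j = 0;
        while (j < used && !(strlen(out[j].name) == len &&
                             strncmp(out[j].name, p + 1, len) == 0))
            j++;
        if (j == used) {
            if (used == max_out)
                continue;
            memcpy(out[used].name, p + 1, len);
            out[used].name[len] = '\0';
            out[used].count = 0;
            used++;
        }
        out[j].count++;
    }
    return used;
}
```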

Thanks for your work, this will probably be an excellent tool in a few commits ;)

Regards,
Corentin “Nado” Pazdera



Re: [gentoo-user] Replacement for gcruft: gcrud

2018-08-16 Thread Andrew Udvare



> On 2018-08-16, at 14:22, james  wrote:
> 
> Yes, but it'll be a while for me. Offer an automated clean-up option,
> and I have dozens of systems to test.

I'll figure out the kind of tests I want to run sometime soon.
> 
> 
> GLEP 64 was on the path to systematically solve what you are doing
> after the fact:
> 
> https://wiki.gentoo.org/wiki/GLEP:64
> 
> More refs for your convenience
> 
> http://asic-linux.com.mx/~izto/checkinstall/
> 
> http://gittup.org/tup/
> ("It will automatically clean-up old files.")

Thanks for pointing these out.

It is really tempting to support macOS like tup does, although SIP and the
restored snapshot on boot kind of make it unnecessary. The idea of using a
newly created FS to see changes is also interesting.

A new GLEP could systematically delete extraneous files by restoring a
non-user-generated snapshot on boot, just like iOS/macOS do, but the problem
is that we don't always use the same filesystem or mount configuration.
Another way would be to use xattrs, but again the issue is compatibility.

-- 
Andrew


Re: [gentoo-user] Replacement for gcruft: gcrud

2018-08-16 Thread Andrew Udvare



> On 2018-08-16, at 16:09, Corentin “Nado” Pazdera  wrote:
> 
> Hi,
> 
> So I tested it, and I was surprised how many /etc files weren't in the
> whitelist.
> Actually, most of /etc shouldn't be suggested for deletion if the
> packages are still installed.

Thanks for testing! Really appreciate it.

The whitelist is the biggest work in progress right now. Most of what it lists 
from /etc for me is /etc/config-archive which AFAIK is not managed by Portage 
at all although Portage will place old files there? I don't use the feature 
because my /etc is controlled by Git. The stuff listed in /var/ is pretty 
accurate as there's a lot of old website cruft and this computer does not serve 
anything like that anymore.

> 
> Portage stuff like repositories could be whitelisted in a dynamic manner,
> or at least we should be able to tell it which directories are used to
> store them.

The idea is to move everything in the whitelist.c file to a declarative (no
code unless you count RE) configuration file. I have not decided on a format
but I am leaning towards INI-style because GLib2 has a parser for that
built-in. The config file will specify exact paths, RE, and globs. There will
be a default dynamic list generated at runtime based on what packages you have
installed (as gcruft had this feature).
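As a sketch of how the three kinds of entries could be matched once loaded (parsing the INI file itself, for example with GLib's GKeyFile, is left out), using POSIX fnmatch() for globs and regcomp()/regexec() for extended regular expressions. The rule structure and names are hypothetical. Globs are matched without FNM_PATHNAME, so `*` also matches `/` and a pattern like /var/db/repos/* covers a whole subtree.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fnmatch.h>
#include <regex.h>
#include <string.h>

/* Hypothetical rule types mirroring "exact paths, RE, and globs". */
enum rule_kind { RULE_EXACT, RULE_GLOB, RULE_REGEX };

struct rule {
    enum rule_kind kind;
    const char *pattern;
};

/* Returns 1 on match, 0 on no match, -1 if a regex fails to compile. */
static int rule_matches(const struct rule *r, const char *path) {
    assert(r != NULL && path != NULL);
    switch (r->kind) {
    case RULE_EXACT:
        return strcmp(r->pattern, path) == 0;
    case RULE_GLOB:
        /* No FNM_PATHNAME: '*' may match '/', so "dir/*" covers a subtree. */
        return fnmatch(r->pattern, path, 0) == 0;
    case RULE_REGEX: {
        regex_t re;
        /* Compiled per call for brevity; compile once at load time instead. */
        if (regcomp(&re, r->pattern, REG_EXTENDED | REG_NOSUB) != 0)
            return -1;
        int rc = regexec(&re, path, 0, NULL, 0);
        regfree(&re);
        return rc == 0;
    }
    }
    return 0;
}

/* A path is whitelisted if any rule matches it. */
static int is_whitelisted(const struct rule *rules, size_t n, const char *path) {
    for (size_t i = 0; i < n; i++)
        if (rule_matches(&rules[i], path) == 1)
            return 1;
    return 0;
}
```

A real implementation would compile each regex once when the config file is loaded rather than on every lookup.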

> I also caught some wrongly listed files because of the multilib system
> with the /lib symlink.
> For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, so the realpath
> /lib64/dhcpcd/dhcpcd-hooks was listed in the removal suggestion. This
> should be fixed with profile 17.1.

The /lib vs /lib64 issue will be resolved in a later version. I think I need to 
use lstat() everywhere instead of stat(), or I can call realpath() prior to 
storing values in the set. This file should be whitelisted, but only if you 
have dhcpcd installed (I've long since moved to dhcpd).
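A minimal sketch of the realpath() option, assuming it is applied to every package-owned path before it goes into the set. Only the directory part is resolved: calling realpath() on the full path would also follow a package-owned symlink (such as a versioned .so link) to its target, and the link itself would then be reported as unowned. canonical_path is a hypothetical helper, not gcrud code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: canonicalize a package-owned path before storing it,
 * so /lib/dhcpcd/dhcpcd-hooks and /lib64/dhcpcd/dhcpcd-hooks compare equal
 * when /lib is a symlink to /lib64. Only the directory part goes through
 * realpath(); the last component is kept, so an owned symlink is not
 * replaced by its target. Returns a malloc()ed string (caller frees), or
 * NULL on allocation failure. If the directory does not exist, the path is
 * returned unchanged.
 */
static char *canonical_path(const char *path) {
    assert(path != NULL && path[0] == '/');
    const char *slash = strrchr(path, '/');
    size_t dir_len = slash == path ? 1 : (size_t)(slash - path);

    char *dir = strndup(path, dir_len);
    if (dir == NULL)
        return NULL;
    char *real_dir = realpath(dir, NULL); /* POSIX.1-2008: allocates result */
    free(dir);
    if (real_dir == NULL)
        return strdup(path);

    const char *base = slash + 1;
    size_t need = strlen(real_dir) + 1 + strlen(base) + 1;
    char *out = malloc(need);
    if (out != NULL) {
        if (strcmp(real_dir, "/") == 0)
            snprintf(out, need, "/%s", base);
        else
            snprintf(out, need, "%s/%s", real_dir, base);
    }
    free(real_dir);
    return out;
}
```

With /lib pointing at /lib64, canonical_path("/lib/dhcpcd/dhcpcd-hooks") would give "/lib64/dhcpcd/dhcpcd-hooks", the same path the filesystem walk reports.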

I am trying my best to give zero false positives, so that you could
eventually do something like `% gcrud | ... | xargs rm -fR`.

> 
> The log is so huge at the moment it is useless for me :/
> 
> % wc -l out.log
> 461575 out.log

Any thoughts on how to simplify analysis?

-- 
Andrew


Re: [gentoo-user] Replacement for gcruft: gcrud

2018-08-16 Thread Corentin “Nado” Pazdera
August 16, 2018 8:07 AM, "Andrew Udvare"  wrote:

> gcruft seems to have died off (https://www.google.com/search?q=gcruft
> only returns ebuild results). I was using it quite a lot and wrote many
> exception files. It's gone now with no way for my or anyone else's
> ebuild to get the original source. I did preserve it though, here:
> https://gitlab.com/Tatsh/gcruft
> 
> I wrote a replacement in C named gcrud. It only needs GLib2 installed to
> work. It's much faster than gcruft ever was. The code is here:
> 
> https://gitlab.com/Tatsh/gcrud
> https://github.com/Tatsh/gcrud
> 
> I am placing preference in GitLab for issues and merge requests, but I
> will accept PRs from GitHub.
> 
> The whitelist https://gitlab.com/Tatsh/gcrud/blob/master/whitelist.c is
> currently hard-coded and limited but the results are satisfactory for
> now in my use cases.
> 
> Typical use case:
> 
> sudo ./gcrud | sort -u > out.log
> 
> Examine out.log for things you can delete. There are absolutely zero
> calls to delete files from the machine in my code and never will be any
> kind of automation support.
> 
> If anyone tries it out I certainly would like to see your output and get
> some bug reports or suggestions. The main feature planned is reading
> from a configuration file for exact file paths and regexes.
> 
> --
> Andrew

Hi,

So I tested it, and I was surprised how many /etc files weren't in the
whitelist.
Actually, most of /etc shouldn't be suggested for deletion if the packages
are still installed.

Portage stuff like repositories could be whitelisted in a dynamic manner, or
at least we should be able to tell it which directories are used to store
them.

I also caught some wrongly listed files because of the multilib system with
the /lib symlink.
For example, dhcpcd declared /lib/dhcpcd/dhcpcd-hooks, so the realpath
/lib64/dhcpcd/dhcpcd-hooks was listed in the removal suggestion. This should
be fixed with profile 17.1.

The log is so huge at the moment it is useless for me :/

% wc -l out.log
461575 out.log

--
Corentin “Nado” Pazdera