Re: [mmh] folder_read with scandir()

markus schnalke Sat, 21 Mar 2020 15:02:46 -0700

Hoi.

[2020-03-21 17:48] Philipp Takacs <phil...@bureaucracy.de>
> [2020-03-21 10:28] markus schnalke <mei...@marmaro.de>
> >
> > Some comments:
> >
> > [2020-03-19 12:45] Philipp Takacs <phil...@bureaucracy.de>
> > >
> > > Also noticed the other flag, not shoure if we need this realy. It's
> > > only used in folder and I don't see the point of knowing that there
> > > are other files in a folder.
> >
> > We should be a bit careful at that point to understand the
> > situation. Here are some aspects to think about:
> >
> > - Files starting with a comma. These were backups of deleted messages
> > (see the section on the trash folder in my master's thesis). Thus
> > they are considered MH files here but not further regarded. I'm
> > not sure if the code had been this way all the time, because there
> > used to the a `--with-hash-prefix' option (or similar) that
> > replaced the comma with a hash symbol as the prefix for deleted
> > messages backups. Anyways, we don't have such comma files anymore.
> > Mmh uses the trash folder for that case, i.e. we can get rid of
> > the comma prefix file handling ... possibly. Does anyone know the
> > situation in nmh: have they switched to a trash folder as well?
> > Because we're still (basically) mail storage compatible to nmh (I
> > think), we should not be fully ignorant of that situation. But we
> > could handle those backup files as `other' files. 
> 
> The files starting with a comma/dot aren't the problem they are totally
> ignored. The problem are files witch aren't totally ignored. I have a ugly
> workaround in the filter and I want to get rid of it.


How I understand the situation (without looking at the code): A
folder can contain:

- messages
- `.' and `..'
- `.mh_sequences'
- subfolders
- other files

Other files are everything else, no matter if they are named
`.foo' or `,foo' or just `foo'.

When we read a folder, only messages are of relevance. All the
rest is ignored.

Do we need to know about subfolders and other files? I'd say that
it would be nice, because there are mmh tools that use such
information.

Does it have to be part of this function? Not necessarily.

(Sorry for not looking at the code right now. I hope my thoughts
help nonetheless.)


> > - `Other' files seem to be files in the mail storage that have
> > nothing to do with MH. E.g. any random file someone created in the
> > mail storage without any connection to mmh. Normally mmh would
> > simply ignore them, i.e. act as if they wouldn't be there at all.
> 
> Not completely other files are also sub directories. folder uses this to
> check if it's required to search for sub directories with the recursive
> flag. This is an other ugly code path I found.

Like I layed out above, these are different classes of files that
could be in a folder. Merging subdirectories and other files (or
ignoring everything that starts with a dot) does not match the
problem structure. It might work and it might have been good
enough -- it might still be good enough -- but the goal should be
that the code reflects the problem domain.


> > There's one place, however, where `other' files are relevant:
> > rmf(1). We should not delete a mail folder that includes `other'
> > files. It seems this check is missing there. Hence, I would keep
> > the `other' files marker. It adds few complexity and can be
> > useful. Plus, we should use it in rmf(1).
> 
> rmf(1) just don't use folder_read(), therefor it's completely useless
> for it. It does check for other files on it's own.

Sounds a bit like code duplication. Surely both cases -- reading
message numbers from a folder and deleting a folder -- are quite
different, but the classification of files within a folder is the
same and maybe should be outfactored and reused (maybe with an
enum fileclass or such).


> So conclusion, currently I would suggest to let in in. But if I rewrite
> the recursive code path of folder(1), I'll bring this up again.

Would be okay for me.


> > > +static int others;
> > > +
> > > +static int
> > > +msgcmp(const struct dirent **d1, const struct dirent **d2)
> > > +{
> > > + size_t l1, l2;
> > > +
> > > + l1 = strlen((*d1)->d_name);
> > > + l2 = strlen((*d2)->d_name);
> > > + if (l1 < l2) {
> > > +         return -1;
> > > + }
> > > + if (l1 > l2) {
> > > +         return 1;
> > > + }
> > > + return strcmp((*d1)->d_name, (*d2)->d_name);
> > > +}
> >
> > Isn't that a strange comparison algorithm, for first compare
> > string lengths and only if eaqual their content? Here does mmh
> > sort by length of folder name?
> 
> This is only for msg-files and sorts by numeric order. So "9" is
> smaller the "10". But actually it's complete irrelevant, because the
> m_atoi() and the storage in the array will automatic get the right
> order. Therefor we could completly remove this function.

Even better. ;-)

Still I'd like to comment on this piece of code, only because we
all can learn from other peoples views on certain code snippets.

I would have been less irritated if the function would have been
named `msgnumcmp'. That would have implied what you explained upon
my question ("9" < "10").

As within the function, only the `d_name' member of the `dirent'
stuct is used, it would have been more appropriate to only pass
that filename string, not the whole dirent.

But nevermind. The first step always is to write some code at all,
which you much better achieve than myself (to my shame). Only from
there on we can improve the code. -- What I want to say: Keep
going! :-)


meillo

Re: [mmh] folder_read with scandir()

Reply via email to