Re: [incubator-ponymail-foal] branch master updated: store sources as sha3-256 of themselves, add a permalink reference to the digested doc.

sebb Wed, 26 Aug 2020 03:22:25 -0700

On Tue, 25 Aug 2020 at 20:27, Daniel Gruno <[email protected]> wrote:
>
> On 25/08/2020 21.24, sebb wrote:
> > On Tue, 25 Aug 2020 at 20:11, Daniel Gruno <[email protected]> wrote:
> >>
> >> On 25/08/2020 20.54, sebb wrote:
> >>> On Tue, 25 Aug 2020 at 19:42, Daniel Gruno <[email protected]> wrote:
> >>>>
> >>>> On 25/08/2020 20.35, sebb wrote:
> >>>>> On Tue, 25 Aug 2020 at 19:23, Daniel Gruno <[email protected]> wrote:
> >>>>>>
> >>>>>> On 25/08/2020 20.15, sebb wrote:
> >>>>>>> AFAICT this will generate different hashes for the same message if
> >>>>>>> they are loaded from a different source.
> >>>>>>
> >>>>>> Yeah, it will - at present, that is on purpose. We can look at doing
> >>>>>> something like using Sean's DKIM parser for this, and only hashing the
> >>>>>> output from that, with the x-archived-list-id added in from the command
> >>>>>> line --lid argument if different from the canonical list id.
> >>>>>>
> >>>>>>>
> >>>>>>> Whilst it should ensure that distinct messages don't clash, it won't
> >>>>>>> weed out actual duplicates.
> >>>>>>
> >>>>>> Right, aware of that. In most cases, if you are reloading, you are doing
> >>>>>> so with a fresh DB, and it won't matter much. In cases where you are
> >>>>>> "cascading" mbox files, it would make duplicates, but that's only a
> >>>>>> question of disk space for now, having duplicate source files won't
> >>>>>> cause malfunctions, just a few more bytes used and source alternatives.
> >>>>>
> >>>>> This has implications for the API and the UI.
> >>>>>
> >>>>> If there are multiple matches for a Permalink, in general one cannot
> >>>>> say which is correct, so all will have to be returned and displayed.
> >>>>
> >>>> I'm pondering how to address this. Currently, the prototype will return
> >>>> the first hit it finds that matches. This should really be fine, as they
> >>>> are all valid sources, so returning one or the other would not matter
> >>>> for the end-user.
> >>>
> >>> This assumes that the Permalink is sufficiently unique.
> >>> That is not true for some of the current designs.
> >>>
> >>
> >> This would be the case only if you lost your database and decided to
> >> re-image everything from scratch using foal with an older generator
> >> instead of the original pony mail, and two or more emails had collisions.
> >>
> >> I would strongly recommend against doing this unless you have no other
> >> choice or do not care about older permalinks that much.
> >>
> >> Foal is not meant as a drop-in replacement for the current Pony Mail. If
> >> you lose your old database and want complete assuredness against this,
> >> you should re-image using the old version first, and then migrate
> >> across. There will be differences in both the archiver and the UI that
> >> are not fully backwards compatible, as the 'old ways' are buggy here
> >> and there.
> >>
> >> The migrator will, once it's done, migrate everything over verbatim, so
> >> any overrides you had in the old system will apply to the new one as
> >> well, and you won't see multiple choices for old emails, only newly
> >> archived ones done with the foal archiver or importer.
> >
> > If Foal is to support non-unique generators, it must use their
> > Permalinks as the database Id, or it must support multiple matches.
> >
>
> I'm strongly in favor of ripping them out of the system altogether, and
> only supporting full and dkim for future operations. I haven't quite
> gotten around to it yet :)


The full generator is only useful for messages that always come from
the same source.
Unless all the headers are identical, it will produce a different output.
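
A minimal sketch of the effect, for illustration only. This is not the
actual generator code: the helper name and the sample messages are made up,
and it assumes the digest is taken over the whole raw message, headers
included.

```python
import hashlib


def source_digest(raw: bytes) -> str:
    """Hypothetical helper: hex SHA3-256 digest of the raw message bytes."""
    return hashlib.sha3_256(raw).hexdigest()


# The same message (same Message-ID, same body) arriving by two routes.
# Only the transport-added Received header differs.
message = b"Message-ID: <abc@example.org>\r\nSubject: test\r\n\r\nHello\r\n"
via_list = b"Received: from mx1.example.org\r\n" + message
via_mbox = b"Received: from mx2.example.org\r\n" + message

# One differing header is enough to change the digest,
# so the two copies are not recognised as duplicates.
print(source_digest(via_list) == source_digest(via_mbox))  # False

# Hashing identical bytes is stable.
print(source_digest(via_list) == source_digest(via_list))  # True
```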

And until the recent removal of the archived-at header, it would not
even produce the same result twice for identical archiver inputs.
Nor would import-mbox produce the same result as the archiver for the
same message.

It is the least stable generator.

I think it would be a mistake to keep it unless you are keeping them all.
