On Thu, 10 Sep 2020 at 13:46, Daniel Gruno <[email protected]> wrote:
>
> On 10/09/2020 14.44, sebb wrote:
> > On Thu, 10 Sep 2020 at 13:23, Daniel Gruno <[email protected]> wrote:
> >>
> >> On 10/09/2020 14.15, sebb wrote:
> >>> On Thu, 10 Sep 2020 at 12:32, Daniel Gruno <[email protected]> wrote:
> >>>>
> >>>> On 10/09/2020 13.25, sebb wrote:
> >>>>> Migration to Foal will be a huge job for some installations.
> >>>>>
> >>>>> Whilst hopefully all snags will have been ironed out of any conversion
> >>>>> tool before it is deployed in earnest, it's possible that some edge
> >>>>> cases will cause issues, and will need subsequent adjustment.
> >>>>
> >>>> Short of ironing out a standard for DKIM_ID, the migration tests I've
> >>>> done have gone relatively well. There were IIRC a few snags, most
> >>>> related to the ES 7.8.1 lib, but once I got migration started, it worked
> >>>> as intended and everything on the new ES server was compatible. If we
> >>>> could somehow get a migration test running on travis or such, that would
> >>>> be ideal - but that is quite tricky - we'd have to maybe dockerize two
> >>>> containers - one with old pony, one with foal, and then test migrating
> >>>> across and checking that each document is obtainable.
> >>>
> >>> What tests are planned for checking migration?
> >>>
> >>>>>
> >>>>> To this end, I think it will be essential to know which records have
> >>>>> been migrated, and which version of the software was used to do so (as
> >>>>> well as the date).
> >>>>>
> >>>>> It may be worth including version and timestamp info in the direct
> >>>>> archive and imports as well.
> >>>>
> >>>> Do you mean adding a key/value to the migrated doc with a migration
> >>>> note? That wouldn't be a bad idea, if nothing else, to keep score of
> >>>> what was migrated and what's new.
> >>>
> >>> Something like that.
> >>>
> >>> I think the data needs to be flexible and allow for multiple notes.
> >>> It won't always be sufficient to record the last change to the data.
> >>
> >> Yes, one wondrous thing about ES is a text field can be both text or an
> >> array of texts, so you can have one note or multiple notes, and it'll
> >> just work. I'm thinking of just having a "notes" field where we can put
> >> entries.
> >
> > Does that automatically append new entries, or does the user have to
> > amend the record to ensure previous entries are not lost?
>
> What I do right now is fetch the doc, ensure 'notes' is a list, then
> append new notes to it and save the entire doc.

i.e. care must be taken not to lose existing info.

> >
> > It would probably still be useful to have some fixed attributes such as
> > -archived-at
> > -imported-at
>
> That would be for archiver.py and import-mbox.py?

Yes, probably also need
-migrated-at

> >
> >>>
> >>>>>
> >>>>> One possible application would be to back-fill attachments which were
> >>>>> originally ignored.
> >>>>
> >>>> This could be run as a background re-indexer perhaps? That grabs the
> >>>> source document, re-parses attachments, and if it contained more than
> >>>> originally thought, add them and update the email document.
> >>>
> >>> Yes, and marks the document somehow so it does not need to be scanned
> >>> again.
> >>>
> >>> This is where the change context comes in.
> >>> If we knew which documents were created with which version of
> >>> software, it would be possible to know which ones did not need
> >>> processing.
> >>>
> >>>>>
> >>>>> S.
> >>>>>
> >>>>
> >>
>
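
For illustration, the fetch / ensure-list / append / save approach discussed
above might look roughly like the sketch below. It is not Pony Mail code: a
plain dict stands in for the ES index, and the names append_note and
migration_note, the note wording, and the version string "0.1" are all
hypothetical. The point is only that an existing single-string 'notes' value
is wrapped in a list before appending, so earlier entries are not lost.

```python
from datetime import datetime, timezone


def append_note(doc, note):
    """Return a copy of doc with note appended to its 'notes' list.

    The 'notes' field may be absent, a single string, or a list of strings;
    a single string is wrapped in a list so nothing already stored is lost.
    """
    existing = doc.get("notes")
    if existing is None:
        notes = []
    elif isinstance(existing, list):
        notes = list(existing)  # copy, so the caller's list is not mutated
    else:
        notes = [existing]
    notes.append(note)
    updated = dict(doc)
    updated["notes"] = notes
    return updated


def migration_note(version, when=None):
    """Build a note string (hypothetical format) with tool version and UTC time."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return "migrated by foal %s at %s" % (version, when.strftime("%Y-%m-%dT%H:%M:%SZ"))


# In-memory stand-in for the index: fetch the doc, amend it, save the whole doc.
store = {"abc123": {"subject": "hello", "notes": "imported from mbox"}}
doc = store["abc123"]
stamp = datetime(2020, 9, 10, 13, 46, tzinfo=timezone.utc)
store["abc123"] = append_note(doc, migration_note("0.1", stamp))
print(store["abc123"]["notes"])
```

Because the whole document is written back, two writers amending the same
record at once could still overwrite each other; the sketch does not try to
handle that.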
