On 10 March 2010 07:18, Stuart Bishop <[email protected]> wrote:
...
> If it is just inserting new data rather than modifying existing rows it
> should be ok at the moment. You say 'almost all new data' though, which is
> the catch. Even if it is all new data, that doesn't mean it will be fine in
> the future (eg. we add an ON INSERT trigger to update some cache
> information). It also doesn't protect us from long running imports, which we
> will kill off to avoid causing database bloat (garbage cannot be cleared up
> in the database by VACUUM until it is older than the longest running
> transaction).
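A dry run of the kind being discussed could presumably be done by performing the whole import inside one transaction and rolling it back at the end, so that constraint violations are reported but nothing is kept. The sketch below is illustrative only: it assumes a simplified `bug` table with hypothetical `nickname` and `title` columns, and it uses Python's built-in sqlite3 module (connection opened with `isolation_level=None`) purely so that it is self-contained, whereas Launchpad itself runs on PostgreSQL. Note that the single transaction stays open for the whole run, which is exactly the long-running-transaction problem Stuart describes.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple


def dry_run_import(
    conn: sqlite3.Connection, bugs: List[Dict[str, Optional[str]]]
) -> List[Tuple[Optional[str], str]]:
    """Attempt every insert, report constraint violations, keep nothing.

    Assumes `conn` was opened with isolation_level=None so that this
    function controls BEGIN/ROLLBACK explicitly. Table and column names
    are hypothetical.
    """
    problems = []
    conn.execute("BEGIN")
    try:
        for bug in bugs:
            # One savepoint per bug: a bad row is undone and recorded,
            # and the run carries on with the next bug.
            conn.execute("SAVEPOINT one_bug")
            try:
                conn.execute(
                    "INSERT INTO bug (nickname, title) VALUES (?, ?)",
                    (bug["nickname"], bug["title"]),
                )
            except sqlite3.IntegrityError as exc:
                conn.execute("ROLLBACK TO SAVEPOINT one_bug")
                problems.append((bug["nickname"], str(exc)))
            conn.execute("RELEASE SAVEPOINT one_bug")
    finally:
        # Always discard everything: this is only a dry run.
        conn.execute("ROLLBACK")
    return problems
```

Any trigger that writes to other tables (Stuart's ON INSERT cache example) would also run inside this transaction, so the rollback discards its work too, but the rows it touched stay locked until the run ends.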
Looking through bugimport.py, the only place I can see where existing data is
modified is a call to email_address.account.createPerson(), made when a user
has an account in SSO (?) but not in Launchpad. If this is a problem, we
could, for example, identify all bugs to be imported that refer to users who
are in SSO but not in LP, and process those last of all. Or we could say that
dry runs are not possible when an import contains such users.

> If the goal here is to avoid writing the cache file, I'd suggest just using
> another method to detect an already imported bug (eg. the bug nickname is
> set by the importer to allow old bug ids to map to launchpad bug ids).

Avoiding the cache file is certainly one goal. The bug importer does set the
bug nickname, so we should change it to check for that instead of using the
cache file.

The other goal is to let anyone do dry runs, so that we don't have to do them
ourselves, and as a step towards completely self-service bug imports.
Allowing trial runs feels important to that, but maybe it's not.

> The other points are valid rationales though. Perhaps we should import into
> temporary tables and, on success, move all the data from the temporary
> tables into the real ones. I'd suggest not worrying about these issues
> though - better validation of the import file before attempting the import
> would seem to be a better approach. For the database import to fail, you
> would need to violate database constraints or attempt to link to a
> non-existent row, and there are not that many constraints to check and I
> don't think there are any foreign key references that might get removed
> mid-run.

I don't think temporary tables are the way to go, because it's the
constraints and foreign keys that need to be exercised. The temp tables could
have those same constraints, I guess, but it seems like a lot of work.

I agree that validation is the way to go. Two validators might work well: one
static, and one that runs when the database is available.
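Putting these ideas together (a static validator, a database-backed pass, the bug nickname as the "already imported" marker, and deferring bugs whose reporters would need createPerson()) might look roughly like the sketch below. Everything in it is hypothetical: bug records are plain dicts with assumed `nickname`, `title` and `reporter` keys, and the database lookups are passed in as callables (`nickname_exists`, `has_launchpad_person`) rather than being real Launchpad APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical bug record with "nickname", "title" and "reporter" keys.
Bug = Dict[str, str]


def validate_static(bugs: List[Bug]) -> List[str]:
    """Checks that need only the import data, no database access."""
    errors = []
    seen = set()
    for i, bug in enumerate(bugs):
        nick = bug.get("nickname")
        if not nick:
            errors.append(f"bug {i}: missing nickname")
        elif nick in seen:
            errors.append(f"bug {i}: duplicate nickname {nick!r}")
        else:
            seen.add(nick)
        if not bug.get("title"):
            errors.append(f"bug {i}: missing title")
        if not bug.get("reporter"):
            errors.append(f"bug {i}: missing reporter")
    return errors


@dataclass
class ImportPlan:
    to_import: List[Bug] = field(default_factory=list)
    # Nicknames already present, i.e. bugs imported by an earlier run.
    already_imported: List[str] = field(default_factory=list)
    # Bugs whose reporter would need a new person created; run these last.
    deferred: List[Bug] = field(default_factory=list)

    def ordered(self) -> List[Bug]:
        return self.to_import + self.deferred


def plan_import(
    bugs: List[Bug],
    nickname_exists: Callable[[str], bool],
    has_launchpad_person: Callable[[str], bool],
) -> ImportPlan:
    """Database-backed pass; lookups are injected, so no DB driver here."""
    plan = ImportPlan()
    for bug in bugs:
        if nickname_exists(bug["nickname"]):
            plan.already_imported.append(bug["nickname"])
        elif not has_launchpad_person(bug["reporter"]):
            plan.deferred.append(bug)
        else:
            plan.to_import.append(bug)
    return plan
```

Injecting the lookups keeps `validate_static()` free of any database dependency, and the deferred list gives one place to either run the person-creating bugs last or refuse a dry run when it is non-empty.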
The static validator could be packaged stand-alone so that it's easy to reuse
by the developers of bug exporters.

Gavin.

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-dev

